exascaleproject.org
ECP Application Development
Andrew Siegel, ANL
HPC User Forum
Argonne National Laboratory
Sept. 10, 2019
2
Exascale Computing Project: Application Development
• Code Porting
• Algorithmic Restructuring
• Alternate choice of Physical Models
• New Numerical Approaches
Hardware has significant impact on all aspects of simulation strategy
Goal: Ensure that exascale hardware impacts DOE science/engineering mission
Approach: Significant investment in scientific applications well in advance of
exascale machines
3
Portfolio of ECP Applications
Application Categories            Number of Projects
Chemistry and Materials           6
Energy (generation)               5
Earth and Space Sciences          5
Data Analytics and Optimization   4
National Security                 4
24 Domain Science/Engineering Simulation Projects
50+ separate codes
Well-defined, evolving dependencies on ECP software technology projects
2/3 C ; 1/3 Fortran
Most pure MPI, or MPI+OpenMP at outset
4
What defines an application project?
New physics capabilities – Not just faster/bigger version of existing codes
1. Scientific or Engineering exascale challenge problem.
2. Detailed completion criteria for (1) on exascale platform
3. A Figure of Merit (FOM) > 50 for project success
5
Quick Flyover of all 21 non-NNSA AD Application Projects
6
Energy Applications
ExaWind: Turbine Wind Plant Efficiency
• Harden wind plant design and layout against energy loss susceptibility; higher penetration of wind energy
• Lead: NREL
• DOE EERE
MFIX-Exa: Scale-up of Clean Fossil Fuel Combustion
• Commercial-scale demo of transformational energy technologies - curbing CO2 emissions at fossil fuel power plants by 2030
• Lead: NETL
• DOE EERE
ExaSMR: Design and Commercialization of Small Modular Reactors
• Virtual test reactor for advanced designs via experimental-quality simulations of reactor behavior
• Lead: ORNL
• DOE NE
Combustion-PELE: High-Efficiency, Low-Emission Combustion Engine Design
• Reduction or elimination of current cut-and-try approaches for combustion system design
• Lead: SNL
• DOE BES, EERE
WarpX: Plasma Wakefield Accelerator Design
• Virtual design of 100-stage 1 TeV collider; dramatically cut accelerator size and design cost
• Lead: LBNL
• DOE HEP
WDMApp: High-Fidelity Whole Device Modeling of Magnetically Confined Fusion Plasmas
• Prepare for ITER experiments and increase ROI of validation data and understanding; prepare for beyond-ITER devices
• Lead: PPPL
• DOE FES
7
Chemistry and Materials Applications
ExaAM: Additive Manufacturing of Qualifiable Metal Parts
• Accelerate the widespread adoption of AM by enabling routine fabrication of qualifiable metal parts
• Lead: ORNL
• DOE NNSA / EERE
GAMESS: Biofuel Catalyst Design
• Design more robust and selective catalysts orders of magnitude more efficient at temperatures hundreds of degrees lower
• Lead: Ames
• DOE BES
EXAALT: Materials for Extreme Environments
• Simultaneously address time, length, and accuracy requirements for predictive microstructural evolution of materials
• Lead: LANL
• DOE BES, FES, NE
QMCPACK: Find, Predict, Control Materials & Properties at Quantum Level
• Design and optimize next-generation materials from first principles with predictive accuracy
• Lead: ORNL
• DOE BES
NWChemEx: Catalytic Conversion of Biomass-Derived Alcohols
• Develop new optimal catalysts while changing the current design processes that remain costly, time consuming, and dominated by trial-and-error
• Lead: PNNL
• DOE BER, BES
LatticeQCD: Validate Fundamental Laws of Nature
• Correct light quark masses; properties of light nuclei from first principles; <1% uncertainty in simple quantities
• Lead: FNAL
• DOE NP, HEP
8
Earth and Space Science Applications
Subsurface: Carbon Capture, Fossil Fuel Extraction, Waste Disposal
• Reliably guide safe long-term consequential decisions about storage, sequestration, and exploration
• Lead: LBNL
• DOE BES, EERE, FE, NE
EQSIM: Earthquake Hazard Risk Assessment
• Replace conservative and costly earthquake retrofits with safe purpose-fit retrofits and designs
• Lead: LBNL
• DOE NNSA / NE, EERE
E3SM-MMF: Accurate Regional Impact Assessment in Earth Systems
• Forecast water resources and severe weather with increased confidence; address food supply changes
• Lead: SNL
• DOE BER
ExaSky: Cosmological Probe of the Standard Model of Particle Physics
• Unravel key unknowns in the dynamics of the Universe: dark energy, dark matter, and inflation
• Lead: ANL
• DOE HEP
ExaStar: Demystify Origin of Chemical Elements
• What is the origin of the elements? Behavior of matter at extreme densities? Sources of gravitational waves?
• Lead: LBNL
• DOE NP
9
Data Analytics and Optimization Applications
ExaBiome: Metagenomics for Analysis of Biogeochemical Cycles
• Discover knowledge useful for environmental remediation and the manufacture of novel chemicals and medicines
• Lead: LBNL
• DOE BER
ExaFEL: Light Source-Enabled Analysis of Protein and Molecular Structure and Design
• Process data without beam time loss; determine nanoparticle size & shape changes; engineer functional properties in biology and material science
• Lead: SLAC
• DOE BES
ExaSGD: Reliable and Efficient Planning of the Power Grid
• Optimize power grid planning, operation, and control, and improve reliability and efficiency
• Lead: PNNL
• DOE EDER, CESER, EERE
CANDLE: Accelerate and Translate Cancer Research
• Develop predictive pre-clinical models & accelerate diagnostic and targeted therapy through predicting mechanisms of RAS/RAF driven cancers
• Lead: ANL
• NIH
10
Application Development Milestones
• AD: Mapping of applications to target exascale architecture with machine-specific performance analysis including challenges and projections.
• CD-2/3 Approval
• AD: Early results on pre-exascale architectures with analysis of performance challenges and projections.
• AD, ST, HI: Demonstration of Application Performance on Exascale Challenge Problems
• AD: Assess application status relative to challenge problem
• AD: Results on early exascale hardware
• CD-4 Approve Project Completion
[Timeline figure: quarterly axis, FY18 to FY23]
11
Baseline Platforms, O(10PF): Sequoia (10), Cori (12), Theta (24), Mira (21), Titan (9), Trinity (6)
Current ECP Focus, O(100PF): Summit (1), NERSC-9 Perlmutter
ECP Target Exascale Platforms, O(1EF): Aurora
12
Current Figure of Merit Improvements on Summit/Sierra
13
Applications Face Common Challenges
1) Flat performance profiles
2) Strong Scaling
3) Understanding/analyzing accelerator performance
4) Choice of programming model
5) Selecting mathematical models that fit architecture
6) Software dependencies
14
Strong scaling on modern high throughput cores
• High-throughput processors do not perform near their peak in the starvation limit
– High value of $n_{1/2}$ (the problem size at which half of peak performance is reached)
– Require an abundance of fine-grained tasks that are efficiently scheduled on available resources
– Otten et al., for example, demonstrate that, on Titan, certain problems can run faster by exploiting the additional granularity afforded through the all-CPU model rather than using the highly tuned GPU code (albeit with a 2.5x increase in power)
• Important because work is linear in time, so 50x has major performance impact
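The role of n_{1/2} can be illustrated with Hockney's two-parameter performance model, in which the achieved rate on n units of work is r(n) = r_inf / (1 + n_half / n). The following is a minimal sketch, not taken from the talk; the two "devices" and all of their numbers are hypothetical and chosen only to show the trend.

```python
def achieved_rate(n, r_inf, n_half):
    """Hockney model: rate achieved on n units of work.

    r_inf  : asymptotic (peak) rate for very large n
    n_half : work size at which half of r_inf is reached
    """
    return r_inf / (1.0 + n_half / n)


# Hypothetical devices (illustrative numbers only, not measurements).
cpu = {"r_inf": 1.0, "n_half": 1.0e2}   # low peak, saturates quickly
gpu = {"r_inf": 10.0, "n_half": 1.0e5}  # high peak, needs much more work

for n in (1e2, 1e3, 1e4, 1e5, 1e6, 1e7):
    r_cpu = achieved_rate(n, **cpu)
    r_gpu = achieved_rate(n, **gpu)
    print(f"n = {n:>10.0f}  cpu rate = {r_cpu:6.3f}  gpu rate = {r_gpu:6.3f}")
```

Under these assumed parameters the device with the small n_half is faster when little work is available per device, which is the regime strong scaling pushes toward, while the high-peak device only wins once n is well above its n_half.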
15
Fast reductions are another key component of strong scaling
• Conjugate Gradient
– If vector reductions are performed in software
• η = 0.5 (50% parallel efficiency): n/P ≥ 8500–12000 for P = $10^6$–$10^9$
– If vector reductions are performed in hardware
• η = 0.5: n/P ≥ 1200 for P = $10^6$–$10^9$ !
• Multigrid
– η = 0.5: n/P ~ 10000–20000 on a machine like BG/Q
– 2–4 times faster if hardware support for addition prefix ops
– Bottom line: enables same simulation to run faster
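Why the reduction cost sets a floor on n/P can be seen from a deliberately simple timing model. This is a sketch under stated assumptions and not the analysis behind the figures above: per-iteration time is taken as local work plus one reduction, a software reduction is assumed to cost about 2*log2(P) message latencies, a hardware reduction is assumed to cost a constant, and every constant below is hypothetical.

```python
import math


def n_over_p_at_half_efficiency(t_reduce, t_work_per_point):
    """Points per process at which eta = 0.5.

    With T = t_work_per_point * (n/P) + t_reduce and
    eta = t_work_per_point * (n/P) / T, eta = 0.5 exactly when the
    work time equals the reduction time.
    """
    return t_reduce / t_work_per_point


def software_reduction_time(p, latency):
    """Tree-based all-reduce: about 2*log2(P) message latencies (assumed)."""
    return 2.0 * latency * math.log2(p)


# Hypothetical machine constants, for illustration only.
LATENCY = 1.0e-6          # seconds per message
HW_REDUCE_TIME = 5.0e-6   # seconds, independent of P (assumed)
T_WORK = 5.0e-9           # seconds of local work per grid point

for p in (10**6, 10**9):
    sw = n_over_p_at_half_efficiency(software_reduction_time(p, LATENCY), T_WORK)
    hw = n_over_p_at_half_efficiency(HW_REDUCE_TIME, T_WORK)
    print(f"P = {p:.0e}: software n/P = {sw:8.0f}, hardware n/P = {hw:8.0f}")
```

In this model a cheaper reduction directly lowers the n/P needed to stay at 50% efficiency, so the same problem can be spread over more processes and finish sooner.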
16
Exemplar: Nuclear Engineering (ExaSMR)
Steve Hamilton (ORNL)
Approach: Monte Carlo Method
• Instead of solving equation, simulate individual neutrons directly
• Use known probability distributions for events (distance to collision, reaction, etc.)
• Count (or “tally”) the number of events that occur
• Simulating many (think millions+) particles gives average behavior
Why is this hard on accelerator architectures?
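The sampling-and-tallying idea can be sketched in a few lines. This is a toy illustration, not ExaSMR code: it assumes a uniform medium with a single hypothetical total cross section SIGMA_T, draws the distance to collision from the exponential distribution, picks the reaction from a hypothetical absorption probability, and tallies the outcomes.

```python
import math
import random


def sample_distance(sigma_t, rng):
    """Sample distance to next collision from p(d) = sigma_t * exp(-sigma_t * d)."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return -math.log(1.0 - rng.random()) / sigma_t


def sample_reaction(absorb_prob, rng):
    """Pick the collision outcome from a known discrete distribution."""
    return "absorb" if rng.random() < absorb_prob else "scatter"


def estimate_mean_free_path(sigma_t, n_particles, seed=0):
    """Average many sampled distances; should approach 1 / sigma_t."""
    rng = random.Random(seed)
    total = sum(sample_distance(sigma_t, rng) for _ in range(n_particles))
    return total / n_particles


SIGMA_T = 2.0       # hypothetical total cross section, 1/cm
ABSORB_PROB = 0.3   # hypothetical probability that a collision is an absorption

mfp = estimate_mean_free_path(SIGMA_T, 200_000)
print(f"estimated mean free path = {mfp:.4f} cm (analytic: {1.0 / SIGMA_T:.4f} cm)")

# Tally reaction types over many sampled collisions.
rng = random.Random(1)
tally = {"absorb": 0, "scatter": 0}
for _ in range(100_000):
    tally[sample_reaction(ABSORB_PROB, rng)] += 1
print(tally)
```

The averages converge toward the analytic values (1/SIGMA_T and ABSORB_PROB) as the number of samples grows, which is the sense in which many particles give average behavior.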
17
History-based Algorithm
One particle at a time: entire life of a particle history
for each particle do
    while particle is alive
        calculate next interaction
    endwhile
endfor
Thread divergence: not a natural fit for GPUs
18
Event-based Algorithm
Data-level parallelism?
• Do one step at a time
• Sort by event type
• Process as SIMD
Get vector of particles
while any particle alive do
    for each event type do
        for particle ∈ event queue do
            Process event
        end for
    end for
end while
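The two loop structures can be compared on a toy problem. This is a minimal sketch, not the production transport code: the only events are "scatter" and "absorb", the absorption probability is hypothetical, and each particle carries its own random stream (an assumption made here so that both orderings produce identical histories).

```python
import random

ABSORB_PROB = 0.3  # hypothetical probability that a collision absorbs the particle


def make_particles(n):
    """Each particle carries its own RNG so both algorithms see the same histories."""
    return [{"rng": random.Random(i), "alive": True} for i in range(n)]


def next_event(particle):
    return "absorb" if particle["rng"].random() < ABSORB_PROB else "scatter"


def process(event, particle, tally):
    tally[event] += 1
    if event == "absorb":
        particle["alive"] = False


def history_based(n):
    """Follow one particle through its entire life before starting the next."""
    tally = {"scatter": 0, "absorb": 0}
    for particle in make_particles(n):
        while particle["alive"]:
            process(next_event(particle), particle, tally)
    return tally


def event_based(n):
    """Advance all live particles one step, grouped by event type."""
    tally = {"scatter": 0, "absorb": 0}
    alive = make_particles(n)
    while alive:
        queues = {"scatter": [], "absorb": []}
        for particle in alive:                 # sort by event type
            queues[next_event(particle)].append(particle)
        for event, queue in queues.items():    # one uniform batch per event type
            for particle in queue:
                process(event, particle, tally)
        alive = [p for p in alive if p["alive"]]
    return tally


print(history_based(1000))
print(event_based(1000))
```

Both functions return the same tallies; what changes is the order of work. In the event-based form every item in a queue runs the same code path, which is the property that maps well onto SIMD-style hardware.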
19
Algorithmic mapping to hardware – neutron particle transport
• Reduce thread divergence – change from history- to event-based algorithm
• Flatten algorithms to reduce kernel size; smaller kernels = higher occupancy
• Partition events based on fuel and non-fuel regions
• Take advantage of other architectural improvements
20
Exemplar: SNAP Potential (Danny Perez, LANL)
• Machine-learned MD potential that targets quantum-chemistry accuracy
• Neighbors of each atom are mapped onto the unit sphere in 4D
• Density around each atom is expanded in a basis of 4D hyperspherical harmonics
• Bispectrum components of the 4D hyperspherical harmonic expansion are used as the geometric descriptors of the local environment
• Preserves universal physical symmetries
• Invariant to rotation, translation, permutation
• Size-consistent
• SNAP uses linear regression to fit coefficients to DFT data
Mapping of a neighbor at position (x, y, z) and distance r, with cutoff radius $r_{\mathrm{cut}}$:
$(\theta_0, \theta, \phi) = \left(\theta_0^{\max}\, r / r_{\mathrm{cut}},\ \cos^{-1}(z/r),\ \tan^{-1}(y/x)\right)$
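The mapping of a neighbor onto the 4D unit sphere can be written out directly from the three angles. This is a sketch under assumptions, not the LAMMPS implementation: the function name is hypothetical, the embedding uses the standard hyperspherical coordinate convention, `atan2` replaces tan^-1(y/x) to get the correct quadrant, and the value of `theta0_max` in the usage lines is illustrative only.

```python
import math


def map_to_4d_sphere(x, y, z, r_cut, theta0_max):
    """Map a neighbor displacement (x, y, z) to a point on the unit sphere in 4D."""
    r = math.sqrt(x * x + y * y + z * z)
    if not 0.0 < r <= r_cut:
        raise ValueError("neighbor must lie inside the cutoff and not at the origin")
    theta0 = theta0_max * r / r_cut
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # clamp guards against rounding
    phi = math.atan2(y, x)                         # quadrant-aware tan^-1(y / x)
    s0 = math.sin(theta0)
    # Standard hyperspherical embedding (assumed convention).
    return (
        s0 * math.sin(theta) * math.cos(phi),
        s0 * math.sin(theta) * math.sin(phi),
        s0 * math.cos(theta),
        math.cos(theta0),
    )


# Hypothetical cutoff and theta0_max, for illustration only.
point = map_to_4d_sphere(1.0, 2.0, 2.0, r_cut=5.0, theta0_max=0.99 * math.pi)
norm_sq = sum(c * c for c in point)
print(point, norm_sq)
```

Every neighbor inside the cutoff lands on the unit sphere, with the radial distance encoded in the third angle theta0.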
21
SNAP GPU Performance Over Time
OLCF GPU
Hackathon
22
SNAP Performance Improvements
• Aidan Thompson (Sandia) took the SNAP CPU code out of LAMMPS → TestSNAP, a stand-alone (realistic) force kernel that includes a correctness check
• Idea from Nick Lubbers (LANL) → Aidan made algorithmic improvements that reduced FLOP count and eliminated some intermediate storage → ~2x speedup on CPUs
• Aidan reduced memory use by collapsing multidimensional arrays into compact lists
• Rahul Gayatri (NERSC):
1. Broke up the one monster kernel into many smaller kernels, which reduces register pressure and allows tailoring launch parameters for each kernel, but blows up the memory
2. Inverted loops and changed data layouts to improve memory access
• Also had help from Sarah Anderson (Cray) and Evan Weinberg (NVIDIA)
• These improvements were ported to Kokkos SNAP in LAMMPS by Stan Moore
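The idea of collapsing a sparse multidimensional array into a compact list can be sketched generically. This is not the TestSNAP data structure: the index constraint below (a triangle-type condition on a triple (j1, j2, j) with j2 ≤ j1) and all names are assumptions chosen only to show how storing valid entries alone, plus an index map, shrinks the footprint.

```python
def build_compact_index(j_max):
    """List only the (j1, j2, j) triples that satisfy the assumed constraint."""
    triples = []
    for j1 in range(j_max + 1):
        for j2 in range(j1 + 1):
            for j in range(j1 - j2, min(j_max, j1 + j2) + 1):
                triples.append((j1, j2, j))
    return triples


J_MAX = 8  # hypothetical upper bound on each index
triples = build_compact_index(J_MAX)
slot = {t: i for i, t in enumerate(triples)}   # triple -> position in flat list
values = [0.0] * len(triples)                  # compact storage

values[slot[(3, 1, 2)]] = 1.5                  # write through the index map

dense_size = (J_MAX + 1) ** 3                  # entries in a full 3D array
print(f"dense entries: {dense_size}, compact entries: {len(triples)}")
```

The flat list is also contiguous in memory, which tends to suit the loop inversion and data-layout changes listed above.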
23
EXAALT FOM/KPP Projection for Summit
• Mira (IBM BG/Q) FOM baseline: 0.182 Katom-steps/s/node * 49152 Mira nodes
• 2018 LAMMPS performance on Summit: 33.7 Katom-steps/s/node * 4608 Summit nodes: projected 17.4x faster than Mira baseline
• New LAMMPS performance on Summit: 175.1 Katom-steps/s/node * 4608 Summit nodes: projected 90.2x faster than Mira baseline
• Recently ported energy minimization in LAMMPS to Kokkos, which is needed by ParSplice
• Danny Perez (LANL) is planning to validate these projections with a large-scale Summit run soon
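The projected speedups follow from multiplying each per-node rate by the node count and dividing by the Mira baseline. The short check below uses only the numbers quoted above; the variable and function names are illustrative.

```python
MIRA_RATE = 0.182      # Katom-steps/s/node
MIRA_NODES = 49152
SUMMIT_NODES = 4608
SUMMIT_RATES = {"2018 LAMMPS": 33.7, "new LAMMPS": 175.1}  # Katom-steps/s/node


def full_machine_rate(rate_per_node, nodes):
    """Projected whole-machine rate, assuming perfect scaling across nodes."""
    return rate_per_node * nodes


baseline = full_machine_rate(MIRA_RATE, MIRA_NODES)
speedups = {
    label: full_machine_rate(rate, SUMMIT_NODES) / baseline
    for label, rate in SUMMIT_RATES.items()
}
for label, speedup in speedups.items():
    print(f"{label}: {speedup:.1f}x over Mira baseline")
# prints:
# 2018 LAMMPS: 17.4x over Mira baseline
# new LAMMPS: 90.2x over Mira baseline
```

Both figures are projections because they assume the measured per-node rate holds across all 4608 Summit nodes.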
24
Overall …
• ECP is a very difficult project with many moving parts: specialized node architectures, system software, programming models, application-level libraries, etc., enabling ambitious science and performance goals.
• Early adoption of intermediate (100PF) systems, test hardware, and hardware simulators is critical to lowering risk by enabling progress tracking and early identification of issues.
• Surprisingly good progress to date. We need to continue to push early adoption of exascale-type hardware and ensure a proper balance of domain expertise and performance engineering. Facilities engagement programs are critical to achieving this.

Weitere ähnliche Inhalte

Was ist angesagt?

Petascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsPetascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big Analytics
Heiko Joerg Schick
 
High Performance Computing - Challenges on the Road to Exascale Computing
High Performance Computing - Challenges on the Road to Exascale ComputingHigh Performance Computing - Challenges on the Road to Exascale Computing
High Performance Computing - Challenges on the Road to Exascale Computing
Heiko Joerg Schick
 
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
Heiko Joerg Schick
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
inside-BigData.com
 

Was ist angesagt? (20)

Petascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big AnalyticsPetascale Analytics - The World of Big Data Requires Big Analytics
Petascale Analytics - The World of Big Data Requires Big Analytics
 
High Performance Computing - Challenges on the Road to Exascale Computing
High Performance Computing - Challenges on the Road to Exascale ComputingHigh Performance Computing - Challenges on the Road to Exascale Computing
High Performance Computing - Challenges on the Road to Exascale Computing
 
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
Experiences in Application Specific Supercomputer Design - Reasons, Challenge...
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic Computing
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
A CGRA-based Approach for Accelerating Convolutional Neural Networks
A CGRA-based Approachfor Accelerating Convolutional Neural NetworksA CGRA-based Approachfor Accelerating Convolutional Neural Networks
A CGRA-based Approach for Accelerating Convolutional Neural Networks
 
Targeting GPUs using OpenMP Directives on Summit with GenASiS: A Simple and...
Targeting GPUs using OpenMP  Directives on Summit with  GenASiS: A Simple and...Targeting GPUs using OpenMP  Directives on Summit with  GenASiS: A Simple and...
Targeting GPUs using OpenMP Directives on Summit with GenASiS: A Simple and...
 
The OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XDThe OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XD
 
Early Application experiences on Summit
Early Application experiences on Summit Early Application experiences on Summit
Early Application experiences on Summit
 
DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
 
Report on GPGPU at FCA (Lyon, France, 11-15 October, 2010)
Report on GPGPU at FCA  (Lyon, France, 11-15 October, 2010)Report on GPGPU at FCA  (Lyon, France, 11-15 October, 2010)
Report on GPGPU at FCA (Lyon, France, 11-15 October, 2010)
 
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom...
 
OpenPOWER System Marconi100
OpenPOWER System Marconi100OpenPOWER System Marconi100
OpenPOWER System Marconi100
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
2. Cnnecst-Why the use of FPGA?
2. Cnnecst-Why the use of FPGA? 2. Cnnecst-Why the use of FPGA?
2. Cnnecst-Why the use of FPGA?
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware Landscape
 
Ai Forum at Computex 2017 - Keynote Slides by Jensen Huang
Ai Forum at Computex 2017 - Keynote Slides by Jensen HuangAi Forum at Computex 2017 - Keynote Slides by Jensen Huang
Ai Forum at Computex 2017 - Keynote Slides by Jensen Huang
 
20072311272506
2007231127250620072311272506
20072311272506
 
Jg3515961599
Jg3515961599Jg3515961599
Jg3515961599
 

Ähnlich wie ECP Application Development

Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
confluent
 

Ähnlich wie ECP Application Development (20)

Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry ImpactAccelerators at ORNL - Application Readiness, Early Science, and Industry Impact
Accelerators at ORNL - Application Readiness, Early Science, and Industry Impact
 
The U.S. Exascale Computing Project: Status and Plans
The U.S. Exascale Computing Project: Status and PlansThe U.S. Exascale Computing Project: Status and Plans
The U.S. Exascale Computing Project: Status and Plans
 
Overview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing ProjectOverview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing Project
 
Arrhenius.jl: A Differentiable Combustion Simulation Package
Arrhenius.jl: A Differentiable Combustion Simulation PackageArrhenius.jl: A Differentiable Combustion Simulation Package
Arrhenius.jl: A Differentiable Combustion Simulation Package
 
05 Preparing for Extreme Geterogeneity in HPC
05 Preparing for Extreme Geterogeneity in HPC05 Preparing for Extreme Geterogeneity in HPC
05 Preparing for Extreme Geterogeneity in HPC
 
How HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental scienceHow HPC and large-scale data analytics are transforming experimental science
How HPC and large-scale data analytics are transforming experimental science
 
Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)Update on the Exascale Computing Project (ECP)
Update on the Exascale Computing Project (ECP)
 
Towards Exascale Engine Simulations with NEK5000
Towards Exascale Engine Simulations with NEK5000Towards Exascale Engine Simulations with NEK5000
Towards Exascale Engine Simulations with NEK5000
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
HPC + Ai: Machine Learning Models in Scientific Computing
HPC + Ai: Machine Learning Models in Scientific ComputingHPC + Ai: Machine Learning Models in Scientific Computing
HPC + Ai: Machine Learning Models in Scientific Computing
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
 
TeraGrid Communication and Computation
TeraGrid Communication and ComputationTeraGrid Communication and Computation
TeraGrid Communication and Computation
 
The Coming Age of Extreme Heterogeneity in HPC
The Coming Age of Extreme Heterogeneity in HPCThe Coming Age of Extreme Heterogeneity in HPC
The Coming Age of Extreme Heterogeneity in HPC
 
OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019OpenACC Monthly Highlights Summer 2019
OpenACC Monthly Highlights Summer 2019
 
CLIC accelerator overview
CLIC accelerator overviewCLIC accelerator overview
CLIC accelerator overview
 
OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020OpenACC Monthly Highlights: October2020
OpenACC Monthly Highlights: October2020
 
OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021OpenACC Monthly Highlights: September 2021
OpenACC Monthly Highlights: September 2021
 
Manycores for the Masses
Manycores for the MassesManycores for the Masses
Manycores for the Masses
 
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splinesOptimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
 
OpenACC Monthly Highlights March 2019
OpenACC Monthly Highlights March 2019OpenACC Monthly Highlights March 2019
OpenACC Monthly Highlights March 2019
 

Mehr von inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 

Mehr von inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Kürzlich hochgeladen (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

ECP Application Development

  • 1. ECP Application Development. Andrew Siegel, ANL. HPC User Forum, Argonne National Laboratory, Sept. 10, 2019. exascaleproject.org
  • 2. Exascale Computing Project: Application Development. Code porting; algorithmic restructuring; alternate choice of physical models; new numerical approaches. Hardware has significant impact on all aspects of simulation strategy. Goal: Ensure that exascale hardware impacts DOE science/engineering mission. Approach: Significant investment in scientific applications well in advance of exascale machines.
  • 3. Portfolio of ECP Applications. Number of projects by application category: Chemistry and Materials, 6; Energy (generation), 5; Earth and Space Sciences, 5; Data Analytics and Optimization, 4; National Security, 4. 24 domain science/engineering simulation projects. 50+ separate codes. Well defined, evolving dependencies on ECP software technology projects. 2/3 C; 1/3 Fortran. Most pure MPI, or MPI+OpenMP, at outset.
  • 4. What defines an application project? New physics capabilities, not just a faster/bigger version of existing codes. 1. Scientific or engineering exascale challenge problem. 2. Detailed completion criteria for (1) on an exascale platform. 3. A Figure of Merit (FOM) > 50 for project success.
  • 5. Quick flyover of all 21 non-NNSA AD application projects
  • 6. Energy Applications
      • ExaWind: Turbine Wind Plant Efficiency. Harden wind plant design and layout against energy loss susceptibility; higher penetration of wind energy. Lead: NREL. DOE EERE.
      • MFIX-Exa: Scale-up of Clean Fossil Fuel Combustion. Commercial-scale demo of transformational energy technologies, curbing CO2 emissions at fossil fuel power plants by 2030. Lead: NETL. DOE EERE.
      • ExaSMR: Design and Commercialization of Small Modular Reactors. Virtual test reactor for advanced designs via experimental-quality simulations of reactor behavior. Lead: ORNL. DOE NE.
      • Combustion-PELE: High-Efficiency, Low-Emission Combustion Engine Design. Reduction or elimination of current cut-and-try approaches for combustion system design. Lead: SNL. DOE BES, EERE.
      • WarpX: Plasma Wakefield Accelerator Design. Virtual design of 100-stage 1 TeV collider; dramatically cut accelerator size and design cost. Lead: LBNL. DOE HEP.
      • WDMApp: High-Fidelity Whole Device Modeling of Magnetically Confined Fusion Plasmas. Prepare for ITER experiments and increase ROI of validation data and understanding; prepare for beyond-ITER devices. Lead: PPPL. DOE FES.
  • 7. Chemistry and Materials Applications
      • ExaAM: Additive Manufacturing of Qualifiable Metal Parts. Accelerate the widespread adoption of AM by enabling routine fabrication of qualifiable metal parts. Lead: ORNL. DOE NNSA / EERE.
      • GAMESS: Biofuel Catalyst Design. Design more robust and selective catalysts orders of magnitude more efficient at temperatures hundreds of degrees lower. Lead: Ames. DOE BES.
      • EXAALT: Materials for Extreme Environments. Simultaneously address time, length, and accuracy requirements for predictive microstructural evolution of materials. Lead: LANL. DOE BES, FES, NE.
      • QMCPACK: Find, Predict, Control Materials & Properties at Quantum Level. Design and optimize next-generation materials from first principles with predictive accuracy. Lead: ORNL. DOE BES.
      • NWChemEx: Catalytic Conversion of Biomass-Derived Alcohols. Develop new optimal catalysts while changing the current design processes that remain costly, time consuming, and dominated by trial-and-error. Lead: PNNL. DOE BER, BES.
      • LatticeQCD: Validate Fundamental Laws of Nature. Correct light quark masses; properties of light nuclei from first principles; <1% uncertainty in simple quantities. Lead: FNAL. DOE NP, HEP.
  • 8. Earth and Space Science Applications
      • Subsurface: Carbon Capture, Fossil Fuel Extraction, Waste Disposal. Reliably guide safe long-term consequential decisions about storage, sequestration, and exploration. Lead: LBNL. DOE BES, EERE, FE, NE.
      • EQSIM: Earthquake Hazard Risk Assessment. Replace conservative and costly earthquake retrofits with safe purpose-fit retrofits and designs. Lead: LBNL. DOE NNSA / NE, EERE.
      • E3SM-MMF: Accurate Regional Impact Assessment in Earth Systems. Forecast water resources and severe weather with increased confidence; address food supply changes. Lead: SNL. DOE BER.
      • ExaSky: Cosmological Probe of the Standard Model of Particle Physics. Unravel key unknowns in the dynamics of the Universe: dark energy, dark matter, and inflation. Lead: ANL. DOE HEP.
      • ExaStar: Demystify Origin of Chemical Elements. What is the origin of the elements? Behavior of matter at extreme densities? Sources of gravity waves? Lead: LBNL. DOE NP.
  • 9. Data Analytics and Optimization Applications
      • ExaBiome: Metagenomics for Analysis of Biogeochemical Cycles. Discover knowledge useful for environmental remediation and the manufacture of novel chemicals and medicines. Lead: LBNL. DOE BER.
      • ExaFEL: Light Source-Enabled Analysis of Protein and Molecular Structure and Design. Process data without beam time loss; determine nanoparticle size & shape changes; engineer functional properties in biology and material science. Lead: SLAC. DOE BES.
      • ExaSGD: Reliable and Efficient Planning of the Power Grid. Optimize power grid planning, operation, and control, and improve reliability and efficiency. Lead: PNNL. DOE EDER, CESER, EERE.
      • CANDLE: Accelerate and Translate Cancer Research. Develop predictive pre-clinical models and accelerate diagnostic and targeted therapy through predicting mechanisms of RAS/RAF-driven cancers. Lead: ANL. NIH.
  • 10. Application Development Milestones [timeline figure: quarters Q1 to Q4, FY18 to FY23]
      • AD: Mapping of applications to target exascale architecture with machine-specific performance analysis including challenges and projections
      • CD-2/3 Approval
      • AD: Early results on pre-exascale architectures with analysis of performance challenges and projections
      • AD, ST, HI: Demonstration of application performance on exascale challenge problems
      • AD: Assess application status relative to challenge problem
      • AD: Results on early exascale hardware
      • CD-4 Approve Project Completion
  • 11. Platforms
      • Baseline platforms, O(10 PF): Sequoia (10), Cori (12), Theta (24), Mira (21), Titan (9), Trinity (6)
      • Current ECP focus, O(100 PF): Summit (1), NERSC-9 Perlmutter
      • ECP target exascale platforms, O(1 EF): Aurora
  • 12. Current Figure of Merit Improvements on Summit/Sierra
  • 13. Applications Face Common Challenges: 1) Flat performance profiles; 2) Strong scaling; 3) Understanding/analyzing accelerator performance; 4) Choice of programming model; 5) Selecting mathematical models that fit the architecture; 6) Software dependencies
  • 14. Strong scaling on modern high-throughput cores
      • High-throughput processors do not perform near their peak in the starvation limit
          - High value of $n_{1/2}$
          - Require an abundance of fine-grained tasks that are efficiently scheduled on available resources
          - For example, Otten et al. demonstrate that, on Titan, certain problems can run faster by exploiting the additional granularity afforded by the all-CPU model rather than using the highly tuned GPU code (albeit with a 2.5x increase in power)
      • Important because work is linear in time, so 50x has a major performance impact
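A small sketch may help make the "high value of $n_{1/2}$" point concrete. It assumes Hockney's classic performance model, in which a device given n independent work items runs at a fraction n / (n + n_half) of its peak rate, so n_half is the amount of work needed to reach half of peak. The function names and the two n_half values below are hypothetical illustrations, not measurements from the talk.

```python
def rate_fraction(n_local, n_half):
    """Fraction of peak rate under Hockney's model: n / (n + n_half)."""
    return n_local / (n_local + n_half)


def strong_scaling_table(n_total, n_half, process_counts):
    """For a fixed total problem size, list (P, n/P, fraction of peak) per process count."""
    rows = []
    for p in process_counts:
        n_local = n_total / p  # work per process shrinks as P grows
        rows.append((p, n_local, rate_fraction(n_local, n_half)))
    return rows


if __name__ == "__main__":
    N_TOTAL = 1_000_000  # hypothetical fixed problem size (strong scaling)
    # Hypothetical n_half values: small for a latency-oriented core,
    # large for a throughput-oriented accelerator.
    for label, n_half in (("low n_half", 100.0), ("high n_half", 10_000.0)):
        print(label)
        for p, n_local, frac in strong_scaling_table(N_TOTAL, n_half, [1, 10, 100, 1000]):
            print(f"  P={p:5d}  n/P={n_local:10.0f}  fraction of peak={frac:.3f}")
```

Under this model the device with the larger n_half falls away from its peak much earlier as P grows, which is the starvation limit the slide refers to.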
  • 15. Fast reductions are another key component of strong scaling
      • Conjugate Gradient
          - If vector reductions are performed in software: $\eta = 0.5$ for $n/P \geq 8500$ to $12000$, for $P = 10^6$ to $10^9$
          - If vector reductions are performed in hardware: $\eta = 0.5$ for $n/P \geq 1200$, for $P = 10^6$ to $10^9$ (!)
      • Multigrid
          - $\eta = 0.5$ for $n/P \sim 10000$ to $20000$ on a machine like BG/Q
          - 2 to 4 times faster with hardware support for addition prefix ops
          - Bottom line: enables the same simulation to run faster
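To see why reduction cost sets a floor on n/P, here is a deliberately simplified sketch. It assumes η denotes parallel efficiency, that each iteration costs `n_local * t_point` of useful work plus one global reduction, and that a software reduction costs `latency * log2(P)` while a hardware reduction costs a fixed time. All parameter values are hypothetical, and this is not the model behind the numbers on the slide.

```python
import math


def software_reduce_time(p, latency):
    """Tree-based allreduce estimate: one latency per level of a binary tree."""
    return latency * math.log2(p)


def efficiency(n_local, t_point, t_reduce):
    """Parallel efficiency: useful work divided by work plus reduction time."""
    work = n_local * t_point
    return work / (work + t_reduce)


def n_local_for_efficiency(eta, t_reduce, t_point):
    """Solve eta = n*t_point / (n*t_point + t_reduce) for n."""
    return eta / (1.0 - eta) * t_reduce / t_point


if __name__ == "__main__":
    T_POINT = 1.0e-9       # hypothetical time per grid point per iteration (s)
    LATENCY = 1.0e-6       # hypothetical network latency per tree level (s)
    T_HW_REDUCE = 2.0e-6   # hypothetical fixed hardware-reduction time (s)
    for p in (10**6, 10**9):
        sw = n_local_for_efficiency(0.5, software_reduce_time(p, LATENCY), T_POINT)
        hw = n_local_for_efficiency(0.5, T_HW_REDUCE, T_POINT)
        print(f"P={p:.0e}: n/P for eta=0.5 software={sw:.0f} hardware={hw:.0f}")
```

In this toy model the software threshold grows with log2(P) while the hardware threshold stays flat, which matches the qualitative message of the slide: faster reductions let the same simulation run efficiently with less work per process.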
  • 16. Exemplar: Nuclear Engineering (ExaSMR), Steve Hamilton (ORNL). Approach: Monte Carlo method
      • Instead of solving an equation, simulate individual neutrons directly
      • Use known probability distributions for events (distance to collision, reaction, etc.)
      • Count (or "tally") the number of events that occur
      • Simulating many (think millions+) particles gives average behavior
      Why is this hard on accelerator architectures?
  • 17. History-based Algorithm: entire life of a particle history, one particle at a time
      for each particle do
        while particle is alive do
          calculate next interaction
        end while
      end for
      Thread divergence: not a natural fit for GPUs
  • 18. Event-based Algorithm
      get vector of particles
      while any particle alive do
        for each event type do
          for particle ∈ event queue do
            process event
          end for
        end for
      end while
      Data-level parallelism?
      • Do one step at a time
      • Sort by event type
      • Process as SIMD
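The difference between the history-based and event-based loops can be sketched on a toy problem. The example below is an illustration only and is not ExaSMR code: it assumes a 1D slab, particles that move left or right, an exponential distance to collision, and a fixed absorption probability. The constants and function names are hypothetical. In the event-based version each queue holds particles waiting for the same event type, so on a GPU each inner loop could become one uniform kernel.

```python
import random

SIGMA_T = 1.0     # hypothetical total cross section (collisions per unit length)
THICKNESS = 2.0   # hypothetical slab thickness
P_ABSORB = 0.3    # hypothetical probability that a collision absorbs the particle


def history_based(n, seed=0):
    """Follow each particle from birth to death before starting the next one."""
    rng = random.Random(seed)
    tallies = {"absorbed": 0, "leaked": 0}
    for _ in range(n):
        x, mu = 0.0, 1.0
        while True:
            x += mu * rng.expovariate(SIGMA_T)      # fly to next collision site
            if not 0.0 <= x <= THICKNESS:
                tallies["leaked"] += 1
                break
            if rng.random() < P_ABSORB:
                tallies["absorbed"] += 1
                break
            mu = 1.0 if rng.random() < 0.5 else -1.0  # scatter
    return tallies


def event_based(n, seed=0):
    """Advance all live particles one event at a time, grouped by event type."""
    rng = random.Random(seed)
    x, mu = [0.0] * n, [1.0] * n
    tallies = {"absorbed": 0, "leaked": 0}
    move_queue = list(range(n))
    while move_queue:                               # while any particle is alive
        collide_queue = []
        for i in move_queue:                        # event type: flight
            x[i] += mu[i] * rng.expovariate(SIGMA_T)
            if 0.0 <= x[i] <= THICKNESS:
                collide_queue.append(i)
            else:
                tallies["leaked"] += 1
        scatter_queue = []
        for i in collide_queue:                     # event type: collision
            if rng.random() < P_ABSORB:
                tallies["absorbed"] += 1
            else:
                scatter_queue.append(i)
        for i in scatter_queue:                     # event type: scatter
            mu[i] = 1.0 if rng.random() < 0.5 else -1.0
        move_queue = scatter_queue                  # survivors fly again
    return tallies


if __name__ == "__main__":
    print("history-based:", history_based(10_000, seed=1))
    print("event-based:  ", event_based(10_000, seed=2))
```

Both versions sample the same physics, so their tallies agree statistically; only the loop order changes. The history-based inner `while` runs a different number of iterations per particle, which is the source of thread divergence, whereas each event-based loop applies the same operation to every particle in its queue.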
  • 19. Algorithmic mapping to hardware: neutron particle transport
      • Reduce thread divergence: change from history-based to event-based algorithm
      • Flatten algorithms to reduce kernel size; smaller kernels = higher occupancy
      • Partition events based on fuel and non-fuel regions
      • Take advantage of other architectural improvements
  • 20. Exemplar: SNAP Potential (Danny Perez, LANL)
      • Machine-learned MD potential that aims for quantum-chemistry accuracy
      • Neighbors of each atom (within $r_{\mathrm{cut}}$) are mapped onto the unit sphere in 4D: $(\theta_0, \theta, \phi) = \left(\theta_0^{\max}\, r / r_{\mathrm{cut}},\ \cos^{-1}(z/r),\ \tan^{-1}(y/x)\right)$
      • Density around each atom is expanded in a basis of 4D hyperspherical harmonics
      • Bispectrum components of the 4D hyperspherical harmonic expansion are used as the geometric descriptors of the local environment
      • Preserves universal physical symmetries
      • Invariant to rotation, translation, permutation
      • Size-consistent
      • SNAP uses linear regression to fit coefficients to DFT data
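The mapping of a neighbor onto the unit sphere in 4D can be sketched in a few lines. This is an illustrative sketch, not the LAMMPS implementation: it takes a neighbor's relative position (x, y, z), computes the three angles θ0 = θ0_max · r / r_cut, θ = arccos(z/r), and φ = arctan(y/x), and then embeds them as a point on the 3-sphere using the standard hyperspherical convention. The function names are hypothetical, `theta0_max` is left as a caller-supplied parameter, and `atan2` is used in place of a plain arctangent so the quadrant of φ is resolved (a choice made here, not stated in the talk).

```python
import math


def neighbor_angles(x, y, z, r_cut, theta0_max):
    """Map a neighbor's relative position to the angles (theta0, theta, phi)."""
    r = math.sqrt(x * x + y * y + z * z)
    if not 0.0 < r <= r_cut:
        raise ValueError("neighbor must lie within the cutoff and not at the origin")
    theta0 = theta0_max * r / r_cut
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # clamp guards against round-off
    phi = math.atan2(y, x)
    return theta0, theta, phi


def point_on_3sphere(theta0, theta, phi):
    """Embed the three angles as a unit vector in 4D (standard hyperspherical form)."""
    s0 = math.sin(theta0)
    return (
        s0 * math.sin(theta) * math.cos(phi),
        s0 * math.sin(theta) * math.sin(phi),
        s0 * math.cos(theta),
        math.cos(theta0),
    )


if __name__ == "__main__":
    angles = neighbor_angles(1.0, 1.0, 0.0, r_cut=2.0, theta0_max=1.0)
    print("angles:", angles)
    print("4D point:", point_on_3sphere(*angles))
```

Because the radial distance is folded into the third angle θ0, every neighbor inside the cutoff lands on the same unit sphere, which is what allows the density to be expanded in 4D hyperspherical harmonics.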
  • 21. SNAP GPU Performance Over Time [figure; annotation: OLCF GPU Hackathon]
  • 22. SNAP Performance Improvements
      • Aidan Thompson (Sandia) took the SNAP CPU code out of LAMMPS → TestSNAP: stand-alone (realistic) force kernel, includes correctness check
      • Idea from Nick Lubbers (LANL) → Aidan made algorithmic improvements that reduced FLOP count and eliminated some intermediate storage → ~2x speedup on CPUs
      • Aidan reduced memory use by collapsing multidimensional arrays into compact lists
      • Rahul Gayatri (NERSC):
          1. Broke up the one monster kernel into many smaller kernels; this reduces register pressure and allows tailoring launch parameters for each kernel, but blows up memory use
          2. Inverted loops and changed data layouts to improve memory access
      • Also had help from Sarah Anderson (Cray) and Evan Weinberg (NVIDIA)
      • These improvements were ported to Kokkos SNAP in LAMMPS by Stan Moore
  • 23. EXAALT FOM/KPP Projection for Summit
      • Mira (IBM BG/Q) FOM baseline: 0.182 Katom-steps/s/node * 49152 Mira nodes
      • 2018 LAMMPS performance on Summit: 33.7 Katom-steps/s/node * 4608 Summit nodes: projected 17.4x faster than Mira baseline
      • New LAMMPS performance on Summit: 175.1 Katom-steps/s/node * 4608 Summit nodes: projected 90.2x faster than Mira baseline
      • Recently ported energy minimization in LAMMPS to Kokkos, which is needed by ParSplice
      • Danny Perez (LANL) planning to validate these projections with a large-scale Summit run soon
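The projected speedups follow directly from the per-node rates and node counts quoted on the slide: multiply rate by node count to get a whole-machine rate, then divide by the Mira baseline. The short sketch below reproduces that arithmetic; the function and variable names are hypothetical, and it assumes perfect scaling to the full machine, as the slide's projection does.

```python
def machine_rate(katom_steps_per_s_per_node, nodes):
    """Whole-machine throughput in Katom-steps/s, assuming perfect scaling."""
    return katom_steps_per_s_per_node * nodes


def projected_fom(rate_per_node, nodes, baseline_rate_per_node, baseline_nodes):
    """Ratio of a machine's projected throughput to the baseline machine's."""
    return machine_rate(rate_per_node, nodes) / machine_rate(
        baseline_rate_per_node, baseline_nodes
    )


MIRA = (0.182, 49152)          # baseline: Katom-steps/s/node, node count
SUMMIT_2018 = (33.7, 4608)     # 2018 LAMMPS performance on Summit
SUMMIT_NEW = (175.1, 4608)     # new LAMMPS performance on Summit

if __name__ == "__main__":
    for label, (rate, nodes) in (("2018", SUMMIT_2018), ("new", SUMMIT_NEW)):
        fom = projected_fom(rate, nodes, *MIRA)
        print(f"Summit ({label}): {fom:.1f}x the Mira baseline")
```

With these inputs the new projection clears the project-level FOM threshold of 50 stated earlier in the deck, while the 2018 figure does not.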
  • 24. Overall
      • ECP is a very difficult project with many moving parts (specialized node architectures, system software, programming models, application-level libraries, etc.) enabling ambitious science and performance goals.
      • Early adoption of intermediate (100 PF) systems, test hardware, and hardware simulators is critical to lowering risk by enabling progress tracking and early identification of issues.
      • Surprisingly good progress to date. Need to continue to push early adoption of exascale-type hardware and ensure a proper balance of domain expertise and performance engineering. Facilities engagement programs are critical to achieving this.