In this deck from the 2019 Stanford HPC Conference, Rob Neely from Lawrence Livermore National Laboratory presents: Sierra - Science Unleashed.
"This talk will give an overview of Sierra and some of the early science results it has enabled. Sierra is an IBM system harnessing the power of over 17,000 NVIDIA Volta GPUs recently deployed at Lawrence Livermore National Laboratory and is currently ranked as the #2 system on the Top500. Before being turned over for use in the classified mission, Sierra spent months in an “open science campaign” where we got an early glimpse at some of the truly game-changing science this system will unleash – selected results of which will be presented."
Rob Neely is a Computer Scientist and Technical Manager at Lawrence Livermore National Laboratory, where he is the Weapon Simulation & Computing Program Coordinator for Computing Environments and the Associate Division Lead for the Center for Applied Scientific Computing (CASC). He is also the DOE Exascale Computing Project lead for Software Technologies Ecosystem and Delivery. He has been involved in High Performance Computing for his entire 25+ year career.
Learn more: https://computation.llnl.gov/computers/sierra
and
http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
1.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Sierra: Science Unleashed
HPC Advisory Council
Stanford University
Presenter: Rob Neely
Weapons Simulation and Computing Program Coordinator for Computing Environments
Feb 14, 2019
2.
Outline
• LLNL’s Mission for Stockpile Stewardship
• Sierra system overview
• Motivation for Sierra
• Application preparation with the “Sierra Center of Excellence”
• Exemplar unclassified NNSA application areas
• Open Science Period Results
  • Earthquake modeling with SW4
  • Biological modeling of cancer
  • Neutron lifetime
  • Extreme-scale RANS modeling
• Future directions: AI/ML in scientific computing
4.
Lawrence Livermore Nat'l Lab (LLNL) is a multidisciplinary national security laboratory
§ Established in 1952
§ Approximately 7,000 employees
§ 1 square mile, 684 facilities
§ Located in the SF Bay Area
5.
Science and Technology areas:
• Nuclear Security
• Other National Security
• Bio-Science and Bio-Engineering
• Earth and Atmospheric Science
• High-Energy-Density Science
• Lasers and Optical S&T
• Advanced Materials and Manufacturing
• Nuclear, Chemical, and Isotopic S&T
• High-Performance Computing, Simulation, and Data Science
6.
Nuclear security and the Stockpile Stewardship Program
§ Assess annually the safety, security, reliability, and effectiveness of the stockpile
§ Extend the life of stockpile warheads, adapting safety and security features to evolving requirements
§ Strengthen underpinning science, technology, and engineering
§ Nuclear nonproliferation and counterterrorism
The Stockpile Stewardship Program has successfully maintained the nuclear stockpile without explosive nuclear testing since 1992.
7.
Life extension programs can improve safety and security while providing no new military capabilities (consistent with policy)
(Diagram: a legacy system can follow a "remanufacture" approach, rebuilding as close as possible to the original, or an "improve safety and security" approach; either way, you still cannot take it on the track.)
8.
Nuclear security is more than nuclear weapons
Non-proliferation:
§ Verification monitoring
§ Sampling and analysis
§ Safety and security of special nuclear material
Treaty monitoring:
§ Monitoring and detection for nuclear test ban treaty agreements
§ Analysis and characterization of nuclear tests
Counterterrorism:
§ Forensics
§ Detection and diagnostics
§ Incident response
9.
Providing simulation capabilities to the end user involves a large, integrated ecosphere: computers, facilities, filesystems and networks, security access controls, visualization, codes, and development and debugging tools.
(The slide shows an ASCI-era network diagram of the classified SCF environment: the ASCI White and SKY platforms, Linux clusters, visualization servers, login nodes, NFS servers, and the DisCom2 WAN linking LLNL with SNL and LANL through ATM switches, routers, and encryptors.)
11.
What drives super-exascale requirements is 3D high-resolution, enhanced-physics calculations done in an uncertainty quantification regime. Our mission demands require progress along all three axes to advance our predictive capabilities:
• Advanced physics and algorithms (new codes?)
• Higher fidelity and resolution in 3D (new platforms, scalable codes)
• Predictive simulation through quantification of uncertainties (1000's of managed simulations and data analytics)
12.
LLNL has a long history of fielding Top500 systems
(Chart: normalized HPL performance of deployed systems over the first 50 Top500 lists, i.e., 25 years. Courtesy of Erich Strohmaier (LBL), SC'17 invited talk: https://www.top500.org/static/media/uploads/presentations/top500_ppt_invited_201711.pdf)
What is not captured here is the derivative: like a "moving average", this will change over the next decade as non-US/DOE sites dominate the top 3 positions.
14.
Sierra is a 125-petaflop peak system at the Department of Energy's Lawrence Livermore National Laboratory, supporting the national security mission and advancing science in the public interest.
15.
System Overview
System performance:
• Peak performance of 125 petaflops for modeling and simulation
Each node has:
• 2 IBM POWER9 processors
• 4 NVIDIA Tesla V100 GPUs
• 320 GiB of fast memory (256 GiB DDR4 + 64 GiB HBM2)
• 1.6 TB of NVMe memory
The system includes:
• 4,320 nodes
• 2:1 tapered Mellanox EDR InfiniBand tree topology (50% global bandwidth) with a dual-port HCA per node
• 154 PB IBM Spectrum Scale file system with 1.54 TB/s R/W bandwidth
16.
What makes Sierra so effective?
GPUs: Sierra links more than 17,000 deep-learning-optimized NVIDIA GPUs with the potential to deliver exascale-level performance (a billion-billion calculations per second) for AI applications.
High-Speed Data Movement: The high-speed Mellanox interconnect and NVLink high-bandwidth technology built into all of Sierra's processors supply the next-generation information superhighways.
Memory: Sierra's sizable memory gives researchers a convenient launching point for data-intensive tasks, an asset that allows for greatly improved application performance and algorithmic accuracy as well as AI training.
CPUs: IBM Power9 processors rapidly execute serial code, run storage and I/O services, and manage data so the compute is done in the right place.
17.
IBM Power9 Processor
• Up to 24 cores; Sierra's P9s have 22 cores for yield optimization on first processors
• PCI-Express 4.0: twice as fast as PCIe 3.0
• NVLink 2.0: coherent, high-bandwidth links to GPUs
• 14nm FinFET SOI technology, 8 billion transistors
• Cache:
  • L1I: 32 KiB per core, 8-way set associative
  • L1D: 32 KiB per core, 8-way
  • L2: 256 KiB per core
  • L3: 120 MiB eDRAM, 20-way
18.
NVIDIA Volta Details
Performance with NVIDIA GPU Boost:
• Tesla V100 for NVLink: 7.8 TeraFLOPS double-precision, 15.7 TeraFLOPS single-precision, 125 TeraFLOPS deep learning
• Tesla V100 for PCIe: 7 TeraFLOPS double-precision, 14 TeraFLOPS single-precision, 112 TeraFLOPS deep learning
Interconnect bandwidth (bi-directional): 300 GB/s NVLink vs. 32 GB/s PCIe
Memory (CoWoS stacked HBM2): 16 GB capacity, 900 GB/s bandwidth
19.
Lassen is an unclassified system similar to Sierra, but smaller in size: its peak performance is 20.9 petaflops vs. Sierra's 125 petaflops. Multiple programs and institutional users will use Lassen for leading-edge science, such as predictive biology, computer-aided drug discovery, or exploring novel materials, on a world-class system.
Lassen System Details (subject to change on final install):
• Zone: CZ (Unclassified)
• Nodes: 720
• POWER9 processors per node: 2
• GV100 (Volta) GPUs per node: 4
• Node peak (tFLOP/s): 29.1
• System peak (pFLOP/s): 20.9
• Node memory (GiB): 320
• System memory (PiB): 0.225
• Interconnect: 2x IB EDR
• Off-node aggregate bandwidth (GB/s): 45.5
• Compute racks: 40
• Network and infrastructure racks: 4
• Storage racks: 4
• Total racks: 48
• Peak power (MW): ~1.8
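As a quick sanity check (my arithmetic, not the deck's), the published system peaks follow from the per-node peak in the table above; Sierra nodes use the same 2 POWER9 + 4 V100 design, so the same node figure applies. A minimal sketch:

```cpp
// Sanity check of the published Lassen/Sierra peak numbers.
// All constants come from the tables above; nothing here is an
// official spec beyond what the slides state.
#include <cstdio>

int main() {
    const double node_peak_tflops = 29.1;  // per the Lassen table above
    const double lassen_nodes = 720;
    const double sierra_nodes = 4320;      // per the Sierra overview slide

    // 1 PFLOP/s = 1000 TFLOP/s
    std::printf("Lassen system peak: %.1f PF\n", lassen_nodes * node_peak_tflops / 1000.0);
    std::printf("Sierra system peak: %.1f PF\n", sierra_nodes * node_peak_tflops / 1000.0);
    // Prints ~21.0 PF and ~125.7 PF, consistent with the quoted
    // 20.9 and 125 PF peak figures.
}
```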
22.
Advancements in HPC have happened over three major epochs, and we're settling on a fourth (heterogeneity). Each of these eras defines a new common programming model, and transitions are taking longer; DOE/NNSA is entering a fourth era.
(Chart: peak flops of NNSA systems from Jan 1952 through the mid-2020s, spanning kiloscale to exascale. The eras are labeled Serial, Vector, Distributed Memory, and Many Core, with code-transition periods between them; systems plotted include Blue Pacific, White, BlueGene/L, Purple, ASCI Red, Red Storm, Roadrunner, Cielo, Sequoia, Trinity, Sierra, Crossroads, ATS-4, and ATS-5/Exascale.)
Transitions are essential for progress: between 1952 and 2012 (Sequoia), NNSA saw a factor of ~1 trillion improvement in peak speed. After a decade or so of uncertainty, history will most likely refer to this next era as the heterogeneous era.
23.
The drive for more capable high-end computing is driving disruption in HPC application development. Assuming continued increases in computational power are desired, the trickle down runs: external constraints → hardware designs → software supports → applications react.
• Power: unreasonable operating costs; with today's tech, $100M/year, exceeding capital investments
• Chip/processor efficiency: GPUs/accelerators; simpler cores; unreliability at near-threshold voltages, compounded by the scale of systems
• On-node "inscaling" challenges: complex hardware; massive on-node concurrency requirements; new programming and memory models
• Disruptive changes: code evolution; algorithm rewrites; platform uncertainty
Meanwhile, "Big Data" and AI are driving industry trends and investments:
• Social networking, machine/deep learning, cloud computing, …
• Software innovation in data-centric HPC is thriving
• Traditional HPC is facing strong headwinds if it can't adapt
24.
Processor trends tell the parallelism story…
(Chart: the familiar microprocessor trend curves, annotated: transistor counts, "Moore's Law – alive and well!"; single-thread performance, "barely hanging on (dynamic clock management, IPC tricks)"; clock frequency, "flat (or even slightly down)"; power, "mostly flat"; core counts, "exponential growth?")
25.
The LLNL/ASC procurement strategy relies on long-lived partnerships with HPC vendors.
(Timeline, concept to retirement, spanning 9–10 years: pre-competitive discussions with potential bidders; establish representative evaluation benchmarks; request for proposals with target requirements; contract selection; nail down target requirements as final deliverables; non-recurring engineering contracts (make the system better, not bigger); early delivery system(s); establish Center of Excellence for application development; machine delivery; machine acceptance; ~5 years of effective use; begin next procurement cycle; machine retirement.)
• Long lead times in procurements allow vendor roadmaps to be clear enough to bid, but still fungible
• Use of target requirements reduces risk to the vendor, and thus cost to the Program
• Testbed architectures make up the unofficial 3rd tier of the ASC platform strategy: evaluation of competing and potential future technologies at smaller scale
26.
Predicting the life cycle behavior of nuclear warheads requires simulation across a wide range of spatial and temporal scales.
(Scale bar, distance in meters from 10⁻¹⁰ to 1: nuclear physics, atomic physics, molecular level, turbulence, continuum models, up to the full system.)
27.
Large, integrated multi-physics codes provide the basic simulation capabilities for a broad range of application domains: HE cookoff, Navy railguns, fracture and failure, blast on structures, inertial confinement fusion, and additive manufacturing.
§ Long-lifetime projects: 15+ years of development by large teams (10–20+ people, ~50% computer scientists and ~50% physicists)
§ Source code grows to ~900 kloc over time, an average of 60 kloc/year of additional code to support in a representative IC
§ Algorithms tuned for minimal turn-around time based on today's hardware!
28.
Sierra represents a major shift for our mission-critical applications on advanced architectures.
2004–present: IBM BlueGene (BG/L, BG/P, BG/Q), along with copious amounts of Linux-based cluster technology, dominates the HPC landscape at LLNL. Apps scale to an unprecedented 1.6M MPI ranks, but are still largely MPI-everywhere on "lean" cores.
2014: Heterogeneous GPGPU-based computing begins to look attractive with NVLINK and Unified Memory (P9/Volta). Apps must undertake a 180° turn from BlueGene toward heterogeneity and large-memory nodes.
2015–20??: Heterogeneous computing is here to stay, and Sierra gives us a solid leg up in preparing for exascale and beyond.
29.
Application enablement through the Center of Excellence – the original (and enduring) concept
The CoE brings together vendor application/platform expertise, staff (code developers), and applications and key libraries through:
• Hands-on collaborations
• On-site vendor expertise
• Customized training
• Early delivery platform access
• Application bottleneck analysis
• Both portable and vendor-specific solutions
• Technical areas: algorithms, compilers, programming models, resilience, tools, …
Outputs include codes tuned for next-gen platforms, rapid feedback to vendors on early issues, publications and early science, improved software tools and environment, and performance portability, overseen by a steering committee. The LLNL/Sierra CoE is jointly funded by the ASC Program and the LLNL institution.
30.
High-level goals of the Center of Excellence:
• To train ASC and LLNL institutional application teams in the art and science of developing high performance software on the Sierra architecture
• To have our flagship applications utilizing advanced features of Sierra as soon as the machine is generally available, and fully optimized (4–6x improvement over Sequoia) for the architecture within 1–2 years of deployment
• To develop performance-portable solutions that allow application teams to focus on maintaining a single source code that will be effective on platforms deployed at our sister laboratories
• To build strong ties with our vendor partners as a key element of our co-design strategy, leveraging their deep knowledge of the hardware and algorithmic approaches specific to their platform, and informing them of our long-term application requirements as they will impact future machine designs
Specific strategies for each of these goals were captured in an "Execution Strategy" document and used to guide our implementation.
31.
High resolution ICF simulations help us understand yield degradation mechanisms.
Basics of Inertial Confinement Fusion: lasers heat the inside surface of the hohlraum; radiation drives the capsule inward; DT ignites, producing fusion energy.
32.
Taking Inertial Confinement Fusion (ICF) simulations to a higher level – Chris Clouse, LLNL
Until recently, 3D simulations were often run as a double check on routinely employed 2D approximations. These 3D simulations were computationally expensive and, as a result, were run as sparingly as possible. Design and analysis in LLNL's National Ignition Facility (NIF) Program has until now relied primarily on 2D approximations, because 3D simulations could not be turned around quickly enough to make them a useful routine design tool.
Sierra's architecture, which is expected to bring speed-ups on the order of 10X for many of our 3D applications, will be able to process these crucial simulations efficiently, changing the way users compute by making the use of 3D routine.
"Sierra provided an unprecedented (98 billion cell) simulation of two-fluid mixing in a spherical geometry. Understanding hydrodynamic instability and the transition to turbulence is important in inertial confinement fusion and High Energy Density (HED) Physics." —Chris Clouse
33.
Developing the next generation of algorithms and codes – Rob Rieben, LLNL
To maximize the full potential of new computer architectures like Sierra's, we are developing new algorithms that play to the strengths of the machine, specifically the high intensity of compute operations available to data that remain local.
We are also developing next-generation simulation codes for inertial confinement fusion and nuclear weapons analysis that employ high-order, compute-intensive algorithms that maximize the amount of computing done for each piece of data retrieved from memory. These schemes are very robust and should significantly improve the overall analysis workflow for users. These advanced simulation tools enabled by Sierra will improve throughput along two axes: faster turnaround and less user intervention.
"The combination of our next generation, compute intensive, high-order algorithms with the processing power of Sierra will deliver an exciting new era in multi-physics simulations." —Rob Rieben
34.
Early Science Applications
Early Science: that time between when the system is first booted up (~3/1/18) and when it is transitioned to the classified network for programmatic use (~1/20/19), when friendly users help "shake down" the system.
Characteristics: buggy or incomplete software stack, random bad hardware, rebooted on a whim, nursing problems along through the night, etc.
Benefits: when it's up, you can sometimes get the whole system to yourself for as long as you can keep things running. Invaluable feedback to the facility and vendors, and a one-time shot for open science apps.
35.
Sierra advances resolution of earthquake simulations – Arthur Rodgers, LLNL
The character of strong earthquake shaking near a fault is highly variable and poorly constrained by empirical data. Supercomputers enable simulation of earthquake motions to investigate the hazard and risk to buildings and infrastructure before damaging events occur. The large scale (~100 km) and fine detail (~10 m) of high-frequency waves (>5 Hz) from damaging earthquakes require today's most powerful computers.
Using SW4-RAJA, a 3D seismic simulation code ported to GPU hardware, LLNL researchers are able to increase the resolution of earthquake simulations to span most frequencies of engineering interest on regional domains, with rapid throughput enabling sampling of various rupture scenarios and sub-surface models.
"Sierra's thousands of GPU-accelerated nodes allow SW4-RAJA to compute earthquake ground motions with 100's of billions of grid points in shorter run times so we can resolve high-frequency waves and investigate different rupture scenarios or earth models." —Arthur Rodgers
37.
Computational Domain and Rupture
• Domain: 120 km x 80 km x 35 km, centered on San Leandro; hypocenter (green star in the figure)
• Rupture: MW 7.0, dimensions 77 x 13 km, draped onto the 3D geometry of the Hayward Fault
38.
National Lab HPC resources we run on: Cori-II at LBNL/NERSC and Sierra at LC/LLNL
• Cori Phase-II at LBNL/NERSC: #10 on Top500, 27 PetaFLOPS (peak), 68-core Intel Xeon Phi (CPU) per node
• Sierra at LLNL: #3 on Top500, 119 PetaFLOPS (peak), 2 IBM POWER CPUs and 4 NVIDIA GPUs per node
The SW4 port to GPU platforms was supported by a 3-year LLNL internal project. Future platforms will rely heavily on GPUs.
39.
Hayward Fault MW 7.0 runs, Summer 2018: SW4 (CPU) on Cori-II vs. SW4-RAJA (GPU) on Sierra
Both runs: fmax = 5.0 Hz, vSmin = 500 m/s, PPW = 8, T = 90 s, hmin = 12.5 m, Npoints = 25.8 billion, Ntime-steps = 63,063
• SW4 on Cori-II: checkpointing every 4,000 time-steps; ran on 8,192 nodes (85% of Cori-II; 524,288 CPU cores); solver time 10.3 hours
• SW4-RAJA on Sierra: no checkpointing; ran on 256 nodes (6% of Sierra; 1,024 GPUs); solver time 10.3 hours
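Put differently, Sierra matched the Cori-II solve time with a small fraction of the nodes. A rough back-of-the-envelope comparison from the numbers above (my arithmetic, not the slide's):

```cpp
// Rough comparison of the matched SW4 runs above: same problem,
// same 10.3-hour solver time, very different node counts.
#include <cstdio>

int main() {
    const double cori_nodes = 8192, sierra_nodes = 256;  // from the table above
    const double solver_hours = 10.3;                    // identical on both systems

    std::printf("Node ratio: %.0fx fewer nodes on Sierra\n", cori_nodes / sierra_nodes);
    std::printf("Node-hours: Cori-II %.0f vs Sierra %.0f\n",
                cori_nodes * solver_hours, sierra_nodes * solver_hours);
    // ~32x fewer nodes (84,378 vs 2,637 node-hours) for the same solve.
}
```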
40.
Verification of SW4 and SW4-RAJA: Peak Ground Velocity maps (HF MW 7.0 3DTOPO, fmax = 5 Hz, hmin = 12.5 m)
Two codes on different platforms; the PGV maps agree to machine precision:
• Cori-II: PGV = 2.1797349453 m/s
• Sierra: PGV = 2.1797349453 m/s
41.
Verification of SW4 and SW4-RAJA: waveforms agree to single precision (SAC files)
Overplotting the 5 Hz traces from SW4 on Cori-II and SW4-RAJA on Sierra shows the two codes, on different platforms, produce waveforms whose difference is ~1e-7, i.e., agreement to the precision of the single-precision output files.
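To illustrate the style of check being described, here is a minimal sketch that compares two traces sample-by-sample and reports the maximum absolute difference. The data below are made up; real traces would be read from the SAC files the slide mentions.

```cpp
// Minimal sketch of a waveform-agreement check: compare two
// single-precision seismograms and report the max absolute
// difference. Names and data are hypothetical.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

double max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    double m = 0.0;
    for (size_t i = 0; i < std::min(a.size(), b.size()); ++i)
        m = std::max(m, std::fabs(static_cast<double>(a[i]) - b[i]));
    return m;
}

int main() {
    // In practice these would be loaded from the SAC output files.
    std::vector<float> cori_trace   = {0.10f, 0.25f, -0.07f};
    std::vector<float> sierra_trace = {0.10f, 0.25f, -0.07f};
    std::printf("max |diff| = %.3e\n", max_abs_diff(cori_trace, sierra_trace));
    // A difference around 1e-7 is what single-precision output allows.
}
```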
42.
Comparison of near-fault accelerations for a range of resolutions (fmax = maximum frequency); a Sierra run in Aug. 2018 reached fmax = 10.0 Hz (station at Livermore, 37.6 km from the fault). As frequency content increases, amplitudes increase and are more consistent with observations. Doubling fmax requires 16x more computational effort.
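The 16x figure is the standard counting argument for an explicit wave solver, though the slide does not spell it out: halving the grid spacing to resolve twice the frequency grows the 3D grid by 2^3, and the stability (CFL) condition halves the time step. A one-line check:

```cpp
// Why doubling fmax costs ~16x: cost scales roughly as fmax^4
// (2^3 in space times 2 in time for an explicit 3D wave solver).
#include <cmath>
#include <cstdio>

int main() {
    const double f0 = 5.0, f1 = 10.0;                // fmax values from the runs above
    const double cost_ratio = std::pow(f1 / f0, 4);  // 2^3 (space) * 2 (time)
    std::printf("Cost(%.0f Hz) / Cost(%.0f Hz) = %.0fx\n", f1, f0, cost_ratio);
    // Prints 16x, matching the slide's estimate.
}
```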
45.
Joint Design of Advanced Computing Solutions for Cancer: a collaboration between DOE and NCI
• Pilot 1, Pre-clinical Model Development: Yvonne Evrard (FNLCR), Rick Stevens (ANL)
• Pilot 2, RAS Biology in Membranes: Dwight Nissley (FNLCR), Fred Streitz (LLNL)
• Pilot 3, Precision Oncology Surveillance: Lynn Penberthy (NCI), Gina Tourassi (ORNL)
Courtesy: Fred Streitz, Jim Glosli, LLNL
46.
Oncogenic RAS is responsible for many human cancers: 93% of all pancreatic, 42% of all colorectal, and 33% of all lung cancers; 1 million deaths/year worldwide; no effective inhibitors.
The RAS pathway transmits growth signals: RAS is a switch, and oncogenic RAS is "on". RAS localizes to the plasma membrane and binds effectors (RAF) to activate growth. (Simanshu, Cell 170, 2017)
47.
Essential strategy: utilize the appropriate-scale methodology for each component. It is a problem of length and time scale: the membrane evolves on a ms time frame across µm, while protein interactions involve µs across nm.
48.
Essential strategy: utilize the appropriate-scale methodology for each component. Model the membrane with RAS at the micron (continuum) scale; model protein behavior at the molecular scale.
49.
From continuum to molecular modeling
1. Identify regions of interest using AI techniques:
• Use a variational autoencoder to learn a reduced order model (ROM) of the system
• Map every region of the simulation into this ROM
• Rank every patch in the simulation by distance from others ("uniqueness") in the ROM
• Identify the most unique patches as the most interesting (a toy sketch of this ranking step follows below)
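A toy sketch of that ranking step, under the assumption that each patch has already been mapped to a latent vector by the autoencoder; all names and data are illustrative, not the actual pipeline:

```cpp
// Toy sketch of the "uniqueness" ranking described above: given
// latent-space coordinates for each membrane patch (produced
// elsewhere, e.g. by a variational autoencoder), score each patch
// by its mean distance to all others and sort, most unique first.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

using Latent = std::vector<double>;  // one reduced-order vector per patch

double dist(const Latent& a, const Latent& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

int main() {
    std::vector<Latent> patches = {{0.0, 0.1}, {0.1, 0.0}, {3.0, 2.5}, {0.05, 0.05}};
    std::vector<std::pair<double, size_t>> scored;  // (uniqueness, patch id)
    for (size_t i = 0; i < patches.size(); ++i) {
        double mean = 0.0;
        for (size_t j = 0; j < patches.size(); ++j)
            if (j != i) mean += dist(patches[i], patches[j]);
        scored.push_back({mean / (patches.size() - 1), i});
    }
    std::sort(scored.rbegin(), scored.rend());  // most unique (farthest) first
    for (auto& [score, id] : scored)
        std::printf("patch %zu  uniqueness %.2f\n", id, score);
    // Patch 2, the outlier in latent space, ranks first and would be
    // promoted to molecular-resolution simulation.
}
```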
50.
From continuum to molecular modeling
1. Identify region
2. Generate particle positions:
• Lipid positions are randomly generated consistent with composition
• RAS proteins, if present, are inserted with appropriate conformation
• The system is equilibrated to "smooth edges"
52.
From continuum to molecular modeling
1. Identify region
2. Generate particle positions
3. Simulate at molecular resolution
4. Feed molecular information back to the continuum model
53.
Preliminary Results
• 50 nm x 50 nm high-resolution study with 2 RAS proteins (40 μs)
• Investigates phenomena witnessed originally in a μm x μm scale simulation
• Aggregation/repulsion of charged lipids (PAP6, PAPS, DIPE, CHOL) following "collision" of RAS
• Results demonstrate the importance of time and length scale for simulation
54.
The Neutron Lifetime on Sierra Early Science
Pavlos Vranas LLNL, André Walker-Loud LBNL, et al. (CalLat Collaboration). Courtesy: Pavlos Vranas, LLNL
❑ The lifetime of a neutron is ~15 min (~881 s), and its value has a profound effect on the mass composition of the universe.
❑ Two experimental methods measure it. In the known theory of physics (QCD), neutrons decay to protons; beam experiments count how many protons emerge from a beam of neutrons (τ = 887.3 ± 3.1 s). Bottle experiments trap neutrons in a "bottle" and measure how many are left after a period of time (τ = 879.4 ± 0.6 s); if neutrons also decay to other new unknown particles (dark matter?), bottle results will not agree with QCD.
❑ These two methods result in measurements of the lifetime that have a 99% probability of being incompatible. Why? Is there an unaccounted-for systematic? Or, more exciting, is there new physics causing them to disagree? Which method is consistent with the prediction from the Standard Model? Answering this question requires access to leadership-class supercomputers.
❑ Using Sierra, the team simulated the fundamental theory of Quantum Chromodynamics (QCD) and calculated the lifetime to an unprecedented theoretical accuracy (blue bar in the figure). This preliminary result has an uncertainty that is 4 times smaller than the next best result in the field (equivalent to a factor of 16 times more data).
❑ This early science time allowed us to form a concrete plan to further reduce the uncertainty and achieve a discriminating level of precision in the theoretical prediction.
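A quick consistency check of the quoted ~99% incompatibility (my arithmetic, assuming Gaussian errors, not spelled out on the slide):

$$
\Delta\tau = 887.3 - 879.4 = 7.9\ \mathrm{s},\qquad
\sigma_{\Delta} = \sqrt{3.1^2 + 0.6^2} \approx 3.2\ \mathrm{s},\qquad
\Delta\tau / \sigma_{\Delta} \approx 2.5,
$$

and a ~2.5σ two-sided discrepancy corresponds to roughly a 99% probability that the two results are incompatible.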
55.
The Neutron Lifetime on Sierra Early Science
Pavlos Vranas LLNL, André Walker-Loud LBNL, et al. (CalLat Collaboration)
❑ The vertical gray band denotes the physical value of the pion mass; these points are significantly more expensive than the rest to compute, but the most valuable for the final predictions
❑ The green point in our publication cost as much computing time as all the other points combined
❑ The green point from Sierra has 10x more statistics than our publication
❑ The red point from our publication was not useful
❑ The red point from Sierra came from an entirely new calculation and is now very useful
❑ The blue point from Sierra was entirely unattainable on previous computers (it still needs more statistics to be useful)
(Figure: the nucleon axial coupling $g_A$ versus $\epsilon_\pi = m_\pi/(4\pi F_\pi)$ at lattice spacings $a \simeq 0.15$, $0.12$, and $0.09$ fm, comparing the published result, Nature 558 (2018) no. 7708, 91-94, with preliminary Sierra Early Science data; the model average $g_A^{\mathrm{LQCD}}(\epsilon_\pi, a = 0)$ is compared against $g_A^{\mathrm{PDG}} = 1.2723(23)$.)
56.
Stanford PSAAP Center Results: Extreme-Scale HPC with the Soleil-X Multiphysics Solver
M. Papadakis, L. Jofre, G. Iaccarino & A. Aiken (Stanford)
▪ Research objectives: performance & scalability at extreme scales; demonstrate the potential of Soleil-X
▪ Performance & scalability analysis: strong and weak scaling; cost distribution across the different physics; time-to-solution; task-mapping (CPU/GPU) studies
▪ Physics modeling: compressible flow formulation; Lagrangian particle tracking; algebraic & DOM radiation models
▪ Computational approach: written in Regent; targets the Legion runtime; GASNet library for communication; Realm for OpenMP; CUDA toolkit for GPU
(Figure: weak scaling efficiency of the total solver from 1 to 2,048 nodes; Soleil-X holds high efficiency until a runtime bottleneck appears at larger node counts, under investigation.)
Conclusions & ongoing work:
• At present, Soleil-X is (on average) 1.7 times faster than Soleil-MPI in terms of time-to-solution
• Soleil-MPI presents very good weak scaling efficiency on Titan up to 256 nodes
• Soleil-X presents very good weak scaling efficiency on both Titan and Sierra up to 256 nodes
• Ongoing investigation of the runtime bottleneck observed at 512 nodes
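For reference, the weak-scaling efficiency plotted in that figure is defined as E(N) = t(1)/t(N) with the per-node problem size held fixed. A toy sketch with made-up timings:

```cpp
// Weak-scaling efficiency as plotted above: with the work per node
// held fixed, efficiency at N nodes is t(1)/t(N). Timings here are
// hypothetical placeholders just to show the computation.
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical wall-clock seconds per step at 1, 16, 256, 512 nodes.
    std::vector<std::pair<int, double>> runs = {
        {1, 10.0}, {16, 10.4}, {256, 10.9}, {512, 14.8}};
    const double t1 = runs.front().second;
    for (auto& [nodes, t] : runs)
        std::printf("%4d nodes: efficiency %.2f\n", nodes, t1 / t);
    // A drop like the one at 512 nodes is the runtime bottleneck the
    // Soleil-X team reports investigating.
}
```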
57.
We ran a massive turbulent fluid mixing simulation on Sierra in October 2018 (in-situ visualization of the mixing layer; courtesy: Cyrus Harrison, LLNL)
§ A high-resolution simulation of two-fluid mixing in a spherical geometry was run to gain insight into the growth of a Rayleigh-Taylor instability.
§ High-resolution simulations of instability growth are not practical for routine use, so simulations like this help guide the development of sub-grid models that capture instability effects at much less computational cost, for use in Inertial Confinement Fusion (ICF) calculations.
58.
This simulation was the first to utilize over 16,000 GPUs on Sierra. Highlights:
§ The 97.8 billion element simulation ran across 16,384 GPUs on 4,096 Sierra compute nodes
§ The simulation application used CUDA via RAJA to run on the GPUs (a generic sketch follows below)
§ Time-varying evolution of the mixing was visualized in-situ using Ascent, also leveraging 16,384 GPUs
§ Ascent leveraged VTK-m to run visualization algorithms on the GPUs
§ The last time step was exported to the parallel file system for detailed post-hoc visualization using VisIt
This successful simulation is the result of many years of preparation!
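For readers unfamiliar with RAJA, the pattern is roughly the following: a single C++ loop body handed to an execution policy that targets CUDA. This is a minimal generic sketch (it requires RAJA built with CUDA support), not code from the LLNL application:

```cpp
// Minimal sketch of the RAJA pattern the slide describes: one loop
// body that RAJA dispatches to a CUDA backend.
#include <RAJA/RAJA.hpp>
#include <cuda_runtime.h>

int main() {
    const int N = 1 << 20;
    double* x = nullptr;
    cudaMallocManaged(&x, N * sizeof(double));  // visible to CPU and GPU

    // 256-thread CUDA blocks; swapping the execution policy (e.g. to
    // RAJA::seq_exec) retargets the same loop body to another backend.
    RAJA::forall<RAJA::cuda_exec<256>>(RAJA::RangeSegment(0, N),
        [=] RAJA_DEVICE (int i) { x[i] = 2.0 * i; });

    cudaDeviceSynchronize();
    cudaFree(x);
    return 0;
}
```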
59.
Porting has unlocked impressive performance on Sierra and positions us well for future architectures.
§ Running physics algorithms on the GPUs allows us to scale and turn around massive simulations in a power-efficient way
§ For this simulation, the physics calculations alone (excluding setup, I/O, etc.) took 60 hours of compute time on Sierra's V100 GPUs
§ We estimate that on the entire Sequoia system the same problem would take 30 days (~720 hours) of physics compute time, i.e., roughly 12x longer
Compared node-to-node to other HPC architectures, we expect Sierra will provide 6x to 12x faster simulation turnaround time.
60.
Similarly, DOE's visualization community has invested over many years to create tools for in-situ and GPU-enabled visualization.
(Timeline, 2010–2018: data-parallel visualization research in EAVL, DAX, and PISTON; the ASCAC Subcommittee on Exascale Computing; the distributed-memory Strawman prototype (ISAV15) and its performance modeling (SC16); ECP begins; Ascent introduced (ISAV17); in-situ visualization of a 97.8 billion element simulation on Sierra using Ascent in 2018.)
61.
On Sierra, computation far outpaces our ability to save data to the parallel file system for detailed post-hoc visualization. Knowing this, we used a mix of in-situ (in-memory) and post-hoc (file-based) visualization:
§ In-situ: We used Ascent (https://github.com/alpine-dav/ascent) to visualize the time-varying evolution of the mixing layer and export the final mesh data to compressed HDF5 files for post-hoc exploration. Ascent used VTK-m (http://m.vtk.org/) to run visualization algorithms on the GPUs.
§ Post-hoc: We used VisIt (https://visit.llnl.gov) to explore details of the final simulation state.
In-situ visualization of the time-varying details of the simulation avoided terabytes of I/O vs. post-hoc visualization.
62.
We ran a massive turbulent fluid mixing simulation on Sierra in October 2018 (post-hoc visualization of the mixing layer)
63.
Conclusion: This successful simulation is the result of many years of preparation in both our simulation and visualization tools.
Highlights:
§ The 97.8 billion element simulation ran across 16,384 GPUs on 4,096 Sierra compute nodes
§ The simulation application used CUDA via RAJA to run on the GPUs
§ Time-varying evolution of the mixing was visualized in-situ using Ascent, also leveraging 16,384 GPUs
§ Ascent leveraged VTK-m to run visualization algorithms on the GPUs
§ The last time step was exported to the parallel file system for detailed post-hoc visualization using VisIt
Links:
§ LLNL's Sierra supercomputer: https://computation.llnl.gov/computers/sierra
§ RAJA: https://github.com/llnl/raja
§ Ascent: https://github.com/alpine-dav/ascent
§ VTK-m: http://m.vtk.org/
§ VisIt: https://visit.llnl.gov
§ Conduit: https://github.com/llnl/conduit
64.
Future simulations for complex systems will be guided by machine learning. Cognitive simulation learns an approximate model for simulation state dynamics and output as it runs: a simulation ensemble pipeline feeds on-line machine learning, and the learned model, combined with experimental data, produces model predictions and uncertainties that can suggest both the next simulation and the next experiment for the complex design model. We're developing an integrated roadmap for NNSA and LLNL in cognitive simulation.
65.
The Virtuous Circle of HPC Simulation and Machine Learning: AI and machine learning are driven by large amounts of training data, which gets funneled down to a small amount of data useful for decision analysis. HPC simulation starts with a small amount of data as initial conditions and generates huge amounts of data suitable for AI training.
66.
Cognitive simulation integrates machine learning with simulation at multiple levels:
• Learning in the simulation loop: molecular dynamics systems that decide when and where to switch resolution, giving 100X longer simulation time spans
• Learning on the simulation loop: simulation systems that can predict and correct mesh tangling, greatly reducing time spent on mesh tuning
• Steering the simulation loop: learned models integrating simulation and experiment, for improved prediction, efficient UQ, and new designs
67.
Scientific computing challenges differ from industry challenges:
• Different type of scaling
• Data augmentation techniques
• Image classification defined by fine features
• Laws of physics
The goal is to leverage industry advancements and specialize when needed. Validation needs to be a part of this!
68.
Open Research Questions for Machine Learning in Science
• Physics constraints: results must honor conservation laws (e.g., energy, mass)
• Sparse data: it is sometimes intractable to generate sufficient training data
• Explainability: interpretability of results is necessary for predictive science
• Data collection: lots of data with no standards for specification or sharing
These are all critical research areas to pursue if we are to demonstrate the value of AI techniques in the pursuit of predictive science.
69.
Intelligence built into HPC simulations will change how we tackle problems that have overwhelming amounts of data.
• Exascale approach: higher-fidelity subgrid models run inline with multi-physics integrated codes. AI-based approach: AI mimics high-fidelity models at a fraction of the computational cost. Expected benefit: applications run faster, so we can increase the number of simulations for UQ and validation.
• Exascale approach: analysis is performed by knowing where to look for interesting features in the data. AI-based approach: smart integrated AI identifies features of interest in throngs of data. Expected benefit: insights from exascale simulations aren't lost in a data deluge.
• Exascale approach: performance optimized in applications to be best for the average case. AI-based approach: on-the-fly recompilation of code with optimal settings for the specific run. Expected benefit: more effective use of complex exascale architectures.
Intelligent Simulation complements our exascale investments. Sierra will allow us to research and explore the possibilities.
70.
Surrogate models to increase simulation coupling for multi-scale problems
§ Multi-scale problems are difficult to model: they are often solved using low-fidelity table lookups, and more accurate models are user-intensive and/or computationally expensive
§ Surrogate models can be trained on high-fidelity physics models, so expensive calculations may be reduced to function calls (a sketch follows below)
(Scale chain: ab-initio → atoms → long-time → microstructure → dislocation → crystal → continuum, covering inter-atomic forces, EOS, and excited states; defects and interfaces, nucleation; defects and defect structures; meso-scale multi-phase, multi-grain evolution; meso-scale strength; meso-scale material response; macro-scale material response.)
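A sketch of the "expensive calculation becomes a function call" idea: the host code sees one interface, and a trained surrogate can stand in for a table lookup or a high-fidelity model. All class and function names here are hypothetical:

```cpp
// Sketch: the multi-physics code calls one equation-of-state
// interface; a trained surrogate is a drop-in replacement for the
// low-fidelity table lookup. Interface and names are illustrative.
#include <cstdio>

struct EOSModel {
    virtual double pressure(double density, double temperature) const = 0;
    virtual ~EOSModel() = default;
};

// Today: interpolate a precomputed low-fidelity table (stubbed here).
struct TableEOS : EOSModel {
    double pressure(double rho, double T) const override { return rho * T * 0.40; }
};

// Goal: evaluate a model trained offline on high-fidelity physics runs.
struct SurrogateEOS : EOSModel {
    double pressure(double rho, double T) const override {
        // would invoke the learned model here; a constant stands in for it
        return rho * T * 0.41;
    }
};

double hydro_step(const EOSModel& eos) {  // host code is unchanged
    return eos.pressure(/*density=*/1.2, /*temperature=*/300.0);
}

int main() {
    TableEOS table;
    SurrogateEOS ml;
    std::printf("table: %.1f  surrogate: %.1f\n", hydro_step(table), hydro_step(ml));
}
```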
71.
The ATOM partnership is exploring new active learning approaches to accelerate drug design
§ Approach: an open public-private partnership; lead with computation, supported and validated by targeted experiments; data-sharing to build models using everyone's data
§ Partners: LLNL, GSK, NCI, UCSF
§ Product: an open-source framework of tools and capabilities
§ 25 FTEs in shared Mission Bay, SF space; R&D started March 2018
72.
ADAPD: Advanced Data Analytics for Proliferation Detection. The goal is innovation in low-profile nuclear proliferation detection, combining indirect data, focused data science R&D, SME process and physics models, and NNSA R&D testbeds to build new and responsive analytics capabilities at the Labs (predictive models; multimodal detection; crossing facilities and scenarios). The program leans on DOE/NNSA partnerships and student and university programs to establish the next generation of proliferation analytics expertise, focusing on hard problems while leveraging growing industry investments.
73.
LLNL's Data Science Institute (established 2017)
Big Machines. Big Data. Big Ideas.
The Data Science Institute acts as the central hub for all data science activity at LLNL,
working to help lead, build, and strengthen the data science workforce, research, and
outreach to advance the state of the art of LLNL’s data science capabilities.
https://data-science.llnl.gov/
74.
Strengthening the LLNL Data Science Workforce; fostering a sense of community
§ Strengthen and sustain LLNL's data science workforce
§ Enhance internal coordination and communication of the data science workforce across disciplines and programs
§ Promote and expand the visibility of data science work at LLNL
§ Evolve a strategic data science vision and guide S&T investments
Activities:
• Data Science Council: inform S&T investments; communication
• Workforce development: reading groups; courses; summer program
• Web presence: external/internal website; mailing list; Slack
• Outreach: academic engagement; info sessions; recruiting
• Seminar series/workshop: invited speakers; UC/LLNL multi-day workshop
75.
Data Science Summer Institute: The Focus is the Future
§ 12-week flexible internship focused on data science; 1000+ applicants in FY18
§ Students funded 50% to work on solutions to challenge problems, attend courses and seminars, and strengthen skills
§ 50 total students in FY17+FY18 (26 students in FY18, 24 in FY17)
§ Visiting faculty: James Flegal, UC Riverside; Robert Gramacy, Virginia Tech
§ Challenge problems: topology optimization; machine vision; multimodal data exploration; cyber
76.
Conclusions
§ The 125 PF Sierra system is currently being stood up for use in the Stockpile Stewardship mission of the NNSA
§ Preparation for the pivot to heterogeneous computing was difficult, but is demonstrating significant benefits in computational results
§ The Open Science period for Sierra allowed open applications significant allocations to demonstrate cutting-edge science results
§ LLNL is pursuing an application strategy that will encompass both traditional exascale computing needs and the burgeoning AI/ML trends
77. Disclaimer
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United
States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or
implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus,
product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific
commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or
imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security,
LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government
or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.