[Talk presented at Monterey Data Conference, August 31, 2022]
Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Thus, methods are required for configuring and running distributed computing pipelines—what we call flows—that link instruments, computers (e.g., for analysis, simulation, AI model training), edge computing (e.g., for analysis), data stores, metadata catalogs, and high-speed networks. We review common patterns associated with such flows and describe methods for instantiating these patterns. We present experiences with the application of these methods to the processing of data from five different scientific instruments, each of which engages powerful computers for data inversion, machine learning model training, or other purposes. We also discuss implications of such methods for operators and users of scientific facilities.
1. Linking Scientific Instruments and Computation:
Patterns, Technologies, Experiences
Ian Foster
The University of Chicago
Argonne National Laboratory
foster@anl.gov
Crescat scientia; vita excolatur
https://arxiv.org/abs/2204.05128
https://arxiv.org/abs/2208.09513
2. A new generation of
scientific instruments
New sensors produce data at high
velocities and in large volumes
New methods and structures are
required to capture and process
data, and to feed back to sensors
Increasing need to harness HPC,
cloud, edge computers
An instrument becomes a set of
flows, overlaid on distributed
physical resources and software
Mark Boland, https://bit.ly/3cfSosk, 2017
6-8. A modular, extensible approach to creating and running flows
Flows
Capture useful patterns as sequences of actions.
Resource-independent
Action providers
Implement actions.
Resource-independent
Compute Action Provider: Run function at A.
Transfer Action Provider: Transfer from A to B.
Search Action Provider: Publish metadata.
…
Fabric
Implements auth, data, and compute APIs for manipulating resources:
Authenticate user. Delegate credentials. Manage file transfers.
Run jobs on computers. Access data catalog. …
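The separation above can be sketched in code: a flow names a sequence of actions, and each action is implemented by a pluggable provider, so the same flow can be bound to different resources at run time. The classes below are hypothetical stand-ins for these concepts, not the actual Globus Flows or action-provider APIs.

```python
# Illustrative sketch only: Flow and ActionProvider here are hypothetical,
# not the Globus Flows API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ActionProvider:
    """Implements one kind of action (transfer, compute, search, ...)."""
    name: str
    run: Callable[[dict], dict]  # takes action parameters, returns results

@dataclass
class Flow:
    """A resource-independent sequence of named actions."""
    steps: list  # (provider_name, params) pairs

    def execute(self, providers: dict) -> dict:
        state = {}
        for provider_name, params in self.steps:
            # Each step's results feed into the next step's inputs.
            result = providers[provider_name].run({**params, **state})
            state.update(result)
        return state

# Stand-in providers: transfer data, run a computation, publish metadata.
providers = {
    "transfer": ActionProvider("transfer", lambda p: {"path": f"B:{p['src']}"}),
    "compute":  ActionProvider("compute",  lambda p: {"output": p["path"] + ".h5"}),
    "search":   ActionProvider("search",   lambda p: {"record": p["output"]}),
}

# The flow names actions, not machines: binding different providers
# (and hence different resources) requires no change to the flow itself.
flow = Flow(steps=[
    ("transfer", {"src": "/detector/scan42"}),
    ("compute",  {}),
    ("search",   {}),
])
print(flow.execute(providers))
```

Because the flow definition carries no resource identifiers of its own, re-targeting it (say, from one analysis cluster to another) only means handing `execute` a different provider table.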
16. Example flows: serial synchrotron crystallography, ptychography, high energy diffraction microscopy
[Diagram: instrument-to-computer flows — data collection & transfer (raw, position); AI model training and deployment (Cerebras); ptychographic reconstruction (AI accelerators); data reduction and structure refinement (AI accelerators, HPC); catalog & publish FAIR data; detector, injector, x-ray, target.]
Flows have been developed for light source
data analysis, biomedical and materials
science data ingest, on-demand simulation, …
17. Determining protein structures 10-100x faster
“These data services have taken the
time to solve a structure from
weeks to days and now to hours”
Darren Sherrell, SBC beamline
scientist APS Sector 19
• Developed a new automation pipeline to collect data, analyze and visualize the data, solve protein structures, and load results into a searchable portal for real-time feedback
• Achieved a 10-100x speedup in time to solution of protein structures at an APS beamline
• Leveraged unique DOE facilities at the Advanced Photon Source (SBC Sector 19) and ALCF (Theta/ThetaGPU, Petrel, and data portals)
• Deposited first results in open repositories
Automation pipeline
(Chard, Vescovi, Foster, Blaiszik, Sherrell, Joachimiak, et al.)
[Diagram: automation pipeline spanning APS and ALCF (Theta, Petrel, data portals)]
18. Flow invocations 2020-21 for five APS experiments
Numbers vary due to facility and experimental schedules.
19. We collect detailed performance data on flows
https://arxiv.org/abs/2204.05128
Transfer, compute, and cataloging costs for median flows
20. Round-trip latencies for various action providers
• Current architecture
has ~1 sec minimum
latency due to cloud
interaction
• funcX latencies higher
due to polling strategy
• Both can be improved
as needed
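The polling overhead noted above can be made concrete: a client that polls at a fixed interval observes a result only at the first poll after completion, so it adds up to one full interval (half an interval on average) to the measured round trip. The function and numbers below are an illustrative sketch, not measurements from the talk.

```python
import math

def observed_latency(completion_time: float, poll_interval: float) -> float:
    """A client polling at t = p, 2p, 3p, ... sees the result at the
    first poll at or after completion_time."""
    polls_needed = math.ceil(completion_time / poll_interval)
    return polls_needed * poll_interval

# A task finishing in 0.3 s, polled once per second, is observed after
# 1.0 s: the extra 0.7 s is pure polling overhead.
print(observed_latency(0.3, 1.0))  # 1.0
print(observed_latency(2.1, 1.0))  # 3.0
```

This is why such latencies "can be improved as needed": shrinking the poll interval, or replacing polling with server push, directly shrinks the gap between completion and observation.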
21. We build on a universal auth, compute, & data fabric
Globus Auth: authentication and delegation mechanisms to control what happens where
funcX: run functions anywhere funcX is deployed
Globus Connect: access data anywhere Globus Connect is deployed
* See also: Integrated Research Infrastructure, computing continuum, grid
24. Globus hybrid “SaaS” model: Compute fabric
Customer-owned and administered computer with a funcX agent running on it
funcX service orchestrates function execution via communication with the funcX agent
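The key property of this hybrid model is the direction of communication: the agent on the customer machine initiates all contact with the cloud service, so no inbound connection through the site firewall is needed. The sketch below illustrates that pattern with a hypothetical service and agent; the class and method names are illustrative, not the funcX API.

```python
import queue

# Illustrative sketch of the hybrid SaaS pattern: the cloud service only
# queues work and stores results; the agent polls it with outbound
# requests. CloudService and Agent are hypothetical names.

class CloudService:
    def __init__(self):
        self._tasks = queue.Queue()
        self.results = {}

    def submit(self, task_id, func, arg):
        """User-facing API: queue a function invocation."""
        self._tasks.put((task_id, func, arg))

    def fetch_task(self):
        """Called BY the agent (outbound from the customer site)."""
        try:
            return self._tasks.get_nowait()
        except queue.Empty:
            return None

    def post_result(self, task_id, value):
        """Also called by the agent, pushing results back out."""
        self.results[task_id] = value

class Agent:
    """Runs on the customer-owned computer; polls the service for work."""
    def __init__(self, service):
        self.service = service

    def poll_once(self):
        task = self.service.fetch_task()
        if task is not None:
            task_id, func, arg = task
            self.service.post_result(task_id, func(arg))

service = CloudService()
agent = Agent(service)
service.submit("t1", lambda x: x * x, 7)
agent.poll_once()
print(service.results["t1"])  # 49
```

Because the agent never accepts connections, the customer site keeps full administrative control while the central service handles orchestration, matching the division of responsibility on this slide.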
26. Building computationally-enhanced instruments:
There is much more to be done!
• We have worked so far with light sources and data ingest pipelines
• We are pleased with the adaptability and reliability of these methods
• Work is still required on capability (e.g., iteration) and performance
• Others are applying these tools to microscopes and other instruments
• New action providers are needed for instrument control
• We are eager to find partners who want to work with us on
developing and/or applying these methods and tools!
27. Thanks to talented colleagues!
Linking Scientific Instruments & HPC: Patterns, Technologies, Experiences
Globus Automation Services: Research process automation across the space-time continuum
Rachana
Ananthakrishnan
Josh Bryan Kyle Chard Ryan Chard Kurt McKee Jim Pruyne Brigitte Raumann
https://arxiv.org/abs/2204.05128 https://arxiv.org/abs/2208.09513
Raf Vescovi Ryan Chard Nick Saint Ben Blaiszik Jim Pruyne Tekin Bicer
Alex Lavens Zhengchun Liu Mike Papka Suresh Narayanan Nicholas Schwarz Kyle Chard
And sponsors, and the rest of the ALCF, APS, & Globus teams
28. Recap: Enabling
new instruments
Reusable flows
composed from an
extensible set of
actions
Built on global
auth, compute, data
fabric
Join us in applying
these methods!
https://arxiv.org/abs/2204.05128
https://arxiv.org/abs/2208.09513
https://www.globus.org/platform/services/flows
Speaker notes
Probe. Instrument. Meter.
Metacomputing revisited:
10^10 x faster
10^5 x more tasks
10^6 x more data
Link HPC, AI, instruments
c still 3 x 10^8 m/s
Need to mention other Braid people!
Eliu Huerta
Bogdan Nicolae
Justin Wozniak
MENTION Eliu work?