14. Better information faster via computation
[Figure: a query Q is sent; an answer A is returned]
Subject to constraints on: Time, computation, energy use, money, …
15. Better information faster via computation
[Figure: a query Q travels N_meters to a computer that performs N_operations and returns an answer A of N_bytes]
Compute at f ops/sec; send at b bytes/s; transfer at 0.67c in fiber
T_answer = N_operations/f + N_bytes/b + 2·N_meters/(0.67c)
Subject to constraints on: Time, computation, energy use, money, …
16. Better information faster as a systems problem
Compute at f ops/sec; send at b bytes/s; transfer at 0.67c m/s
T_answer = N_operations/f + N_bytes/b + 2·N_meters/(0.67c)
Reduce N_operations by: approximation
Increase f by: parallelism, specialization, centralization
Reduce N_bytes by: compression
Increase b by: better optics, parallelism
Reduce N_meters by: shorter networks, closer computing; avoid round trips by avoiding RPC
Increase signal speed (0.67c) by: hollow core fiber, free space optics
Subject to constraints on: Time, computation, energy use, money, …
17. Better systems: Do many different things at once
Swift, Parsl, …
Swift: Parallel Computing 2007
HPF+MPI: SC’96, JPDC’97
Parsl: HPDC’19
Colmena: MLHPC’21
18. Managed research acceleration services
Cloud hosted for reliability + global footprint for scientific impact
https://globus.org
19. Collins Australian Clear School Atlas, 1964
Wellington, New Zealand to Madrid, Spain: 19,855 km
Circumference of the earth: 40,075 km
Semi-circumference: 20,037 km
21. A universal communication fabric
Free-space optics
Aalyria’s “Spacetime” [originally “Minkowski”] platform
22. A universal communication fabric
Free-space optics
“Henceforth, space for itself, and time for itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.” – Hermann Minkowski, 1908
Aalyria’s “Spacetime” [originally “Minkowski”] platform
23. The space-time continuum in converged systems
Misquoted [2022]: “Henceforth, location for itself, and speed for itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality."
[Figure: space-time diagram with a local computer C1 at 0 km and a computer C2 at 500 km; Space axis 0–500 km, Time axis 0–10 ms]
The behaviors of the two computers are indistinguishable:
t(C1) = T1 = 0.01 sec
t(C2) = T2 + 2 × 500 × 5×10⁻⁶ sec = 0.01 sec
24. Example: High energy physics trigger analysis
[Figure: space-time diagram (not to scale); CPU at 0 km (Illinois), FPGA at 2000 km (Virginia), one-way latency 10 ms]
Local: T1 = 2 seconds on CPU = 2000 msec
Remote: T2 = 30 msec on FPGA; 30 + 10 + 10 = 50 msec
40x acceleration
Nhan Tran, FermiLab, et al., arXiv:1904.08986
25. Example: High energy diffraction microscopy @ LCLS
Model (re)training on local GPU: 1102 seconds
[Figure: requests dispatched from LCLS to a remote Cerebras CS-2]
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
26. Example: High energy diffraction microscopy @ LCLS
Remote retraining on Cerebras CS-2: 7 s + 19 s + 5 s = 31 s (data movement to and from the CS-2 plus retraining)
35x faster
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
27. Multiple techniques are used in these two examples
T_answer = N_operations/f + N_bytes/b + 2·N_meters/(0.67c)
Reduce N_operations by: approximation
Increase f by: parallelism, specialization, centralization
Reduce N_bytes by: compression
Increase b by: better optics, parallelism
Reduce N_meters by: shorter networks, closer computing; avoid round trips by avoiding RPC
Increase signal speed (0.67c) by: hollow core fiber, free space optics
28. Better information faster in the computing continuum is an application-infrastructure codesign problem
T_answer = N_operations/f + N_bytes/b + 2·N_meters/(0.67c)
Reduce N_operations by: approximation
Increase f by: parallelism, specialization, centralization
Reduce N_bytes by: compression
Increase b by: better optics, parallelism
Reduce N_meters by: shorter networks, closer computing; avoid round trips by avoiding RPC
Increase signal speed (0.67c) by: hollow core fiber, free space optics
What to compute
What algorithms to use
Where to compute
Where to place computers
How to communicate
What to communicate
What networks to use
Where to deploy networks
29. Continuum-aware programming
Compute fabric: compute anywhere
Data fabric: access data anywhere
Trust fabric: ensure only authorized operations occur in trusted locations
Develop efficient, reliable, reusable applications
New methods and tools are needed to enable exploration and exploitation
31. Continuum-aware programming
Compute fabric: funcX
Data fabric: Globus Transfer
Trust fabric: Globus Auth
Globus Automation Services
We are exploring new programming abstractions and managed research acceleration services*
*Cloud hosted for reliability + global footprint for scientific impact
https://braid-project.org
https://globus.org
32. “Flows” as an abstraction for smart instruments
[Figure: the diffraction microscopy example (7 + 19 + 5 = 31 s) expressed as a flow that includes a ModelTrain step]
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
33. Globus automation services: Managed research acceleration services for flow specification, execution, and management
R. Chard et al., https://doi.org/10.48550/arXiv.2208.09513
34. funcX: A managed research acceleration service that implements a universal computing fabric
https://funcx.org
Z. Li et al., https://doi.org/10.48550/arXiv.2209.11631
Deploy funcX agents:
  $ pip install funcx-endpoint
  $ funcx-endpoint configure myep
  $ funcx-endpoint start myep
Register functions:
  def F(in_args):
      # do something
      return results
  func_id = fxc.register_function(F)
Run functions:
  f = fxc.run("A", endpoint_id=ep, function_id=func_id)
  R = fxc.get_result(f)
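Putting the pieces above together, a minimal runnable sketch; it assumes the funcX Python SDK of this period (FuncXClient) and a placeholder endpoint UUID, so details of a real deployment will differ.
    # Register a function once with the funcX service, then run it on a chosen endpoint.
    from funcx.sdk.client import FuncXClient
    fxc = FuncXClient()   # triggers a Globus Auth login on first use
    def F(in_args):
        # do something with the inputs; here we simply echo them back
        return {"echo": in_args}
    func_id = fxc.register_function(F)            # returns a function UUID
    ep = "00000000-0000-0000-0000-000000000000"   # placeholder UUID for the "myep" endpoint
    task_id = fxc.run("A", endpoint_id=ep, function_id=func_id)
    result = fxc.get_result(task_id)              # raises if the task is still pending
    print(result)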
35. New applications mean new computing workloads
Globus Flows can invoke arbitrary functions via the funcX action provider
Functions may be executed in various locations: at a beamline, local server, cluster, cloud
Seven flows in use at the Advanced Photon Source
R. Vescovi et al., https://doi.org/10.1016/j.patter.2022.100606
36. New applications mean new computing workloads
Globus Flows can invoke arbitrary functions via the funcX action provider
Functions may be executed in various locations: at a beamline, local server, cluster, cloud
Execution times at the Argonne Leadership Computing Facility
R. Vescovi et al., https://doi.org/10.1016/j.patter.2022.100606
37. Globus Auth: A managed research acceleration service providing distributed authorization with delegation
Who do I trust to act on my behalf, when, and for what purpose?
[Figure: token hierarchy spanning the user, cloud-hosted services (Auth, Flows, funcX), and node A]
(1) “Run flow”, with consent
(2) “Provide access token for funcX”
(3) “Run F() on node A”
(4) “Provide dependent token for F() on node A”
(5) “Run F()”
(6) F() runs on node A
Not shown: token refresh
Leverage OAuth to:
• Increase security via scoped credentials
• Improve usability via browser compatibility
1.3 B access tokens
2.7 M consents
S. Tuecke et al., https://doi.org/10.1109/eScience.2016.7870901; R. Chard et al., https://doi.org/10.48550/arXiv.2208.09513
39. Foucault’s pendulum, Paris Panthéon
Problems we have solved, in part
• Authorization (“enterprise-grade security, consumer-grade usability”)
• Universal connectivity
• Location-independent computing
• Programming
New opportunities and challenges
• New devices and applications
• Massive performance and scale
• Energy as a ubiquitous constraint
• Global data space architecture
• Programming
Utility (‘60s) … xSP (’80s) … Grid (‘90s) … Cloud (2010) … Edge … Continuum …
The pendulum swings between centralization and decentralization
40. There are important questions to answer
• What will continuum networks look like in practice?
• Energy vs. accuracy vs. cost vs. … tradeoffs
• Where will we place computing and storage?
• What new instruments will be created?
• What new applications and new science will be enabled?
• What new abstractions, services, and tools do we need?
• Do we need new continuum-aware algorithm design methods?
• What will be the economic foundation of this new computing fabric?
• (How) Will we integrate quantum sensors, networks, computers?
41. Thank you for your attention!
Two wonderful institutions: Argonne National Laboratory and the University of Chicago
Federal agencies for continued support: DOE, NSF, NIH, NIST
Wonderful colleagues: Rachana Ananthakrishnan, Ben Blaiszik, Pete Beckman, Charlie Catlett,
Kyle Chard, Ryan Chard, Carl Kesselman, Rick Stevens, Vas Vasiliadis, Logan Ward, & many more
To learn more about our work: https://labs.globus.org
Ask questions or share your thoughts: foster@anl.gov
Experiment with tools:
https://braid-project.org
Thanks to: https://doi.org/10.1016/j.patter.2022.100606
Editor's Notes
A brief retrospective of my life from the perspective of the SC conference.
Here is me arriving at Argonne in January 1989.
Jack Dongarra had just left for Tennessee.
I had the great good fortune to be welcomed soon after my arrival into the Center for Research on Parallel Computation, an NSF Science and Technology Center, led by Ken Kennedy.
CRPC under Ken’s leadership was a remarkably collegial and productive community.
Ken was also very supportive of me, for example by providing resources that allowed me to recruit a young 22-year-old hot shot, Steve Tuecke.
Steve and I worked together for 30 years until his recent passing.
So much of what I talk about here was dreamed up in joint brainstorming sessions.
I dedicate this talk to his memory.
But to return to the SC conference.
I have published 30 papers here since 1993, my first SC conference.
Here are the faces of 91 of my 97 wonderful SC co-authors.
Time is too short to call them out by name, but what exceptional people.
The people that we work with are much of what makes HPC a joy.
Many of these were students or postdocs at the time.
Now, you may be thinking that publishing 30 papers in 30 years suggests a consistent ability to impress SC reviewers.
But with help from Mark Montague from Linklings, I refreshed my memory of all papers from 2011 onwards.
I can reveal here that some were less appreciated.
Furthermore, applying the powerful AI method of linear regression, I can build a predictor of my future personal acceptance rates.
My last paper to be accepted at SC will be in 2023, and only if I submit around 20 papers.
The power of data.
Back to the theme of my talk.
I have always been fascinated by the question of how to use computing to deliver better information faster to decision makers.
Perhaps relates to growing up in New Zealand, which is a long way away from anywhere.
Indeed, when I left in 1980, which for idiosyncratic reasons I did on a sailboat, it took me 90 days to reach North America.
Now we can transmit information much faster, reaching all the way to Dallas in just 90 msec: close to 100 million times faster.
But still quite slow for many purposes, and not directly improvable further, given light speed limits.
So let’s look briefly at what is involved in obtaining useful information.
We send a message to retrieve an answer, which often requires computation.
The time required to get that answer depends on the number of operations, data communication needs, the distance to be traversed, and the speeds of computation and communication.
We can accelerate each element in this equation in numerous ways, from better algorithms that reduce the number of operations to be performed to better computers and networks.
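To make that equation concrete, here is a minimal sketch in Python; the helper name, the parameter names, and the sample numbers are illustrative assumptions, not figures from the talk.
    # Back-of-the-envelope model: T_answer = N_operations/f + N_bytes/b + 2*N_meters/(0.67c)
    C = 3.0e8                  # speed of light in vacuum, m/s
    FIBER_SPEED = 0.67 * C     # approximate signal speed in optical fiber, m/s
    def t_answer(n_ops, f, n_bytes, b, n_meters):
        """Seconds to obtain an answer: compute + data transfer + round-trip propagation."""
        return n_ops / f + n_bytes / b + 2 * n_meters / FIBER_SPEED
    # Illustrative values: 10^12 operations at 10^11 ops/s, 1 GB returned at 1.25 GB/s
    # (10 Gbit/s), computer 2000 km away.
    print(t_answer(1e12, 1e11, 1e9, 1.25e9, 2e6))
    # -> about 10.82 s: 10 s compute + 0.8 s transfer + ~0.02 s propagation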
This complexity is of course what makes HPC so much fun.
My colleagues and I have worked to optimize various of these elements over the years.
Let me mention just a few of those activities.
One area of focus has been using HPC systems to do many different things at once.
Back in the day, HPC systems were used only for single program multiple data, or SPMD, computations.
To escape this straitjacket, my colleagues and I have developed new methods and tools, like Strand, Swift, Parsl, and Colmena, that allow applications that combine different components to be defined and executed. These tools have been used to build large heterogeneous applications that, for example, perform AI-guided simulation.
Many talented contributors here; I will call out Mike Wilde, Justin Wozniak, Kyle Chard, Shantenu Jha, Logan Ward as leaders of individual projects.
Today, such applications are largely mainstream, although arguably our runtimes and operating systems still need to catch up.
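For instance, here is a minimal Parsl sketch of that "many different things at once" pattern: independent Python apps run concurrently and are composed through futures. The app bodies are stand-ins, not a real application.
    # Two different kinds of work launched concurrently and composed via futures,
    # rather than a single SPMD program.
    import parsl
    from parsl import python_app
    from parsl.configs.local_threads import config
    parsl.load(config)   # local threads here; an HPC executor in a real run
    @python_app
    def simulate(x):
        return x * x                           # stand-in for a simulation task
    @python_app
    def train(samples):
        return sum(samples) / len(samples)     # stand-in for a model-training task
    sims = [simulate(i) for i in range(8)]     # all eight run concurrently
    model = train([s.result() for s in sims])  # results feed an AI/analysis step
    print(model.result())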
As a second example of work intended to produce better information faster, I will describe the work of the Globus team on science services to accelerate complex tasks, for example, by making fast and reliable data transfer a one-click operation.
The Globus system, developed and operated by a wonderful team at the University of Chicago, uses software running on the Amazon cloud to manage data transfers, and more, with extreme reliability and high performance: what we may call a managed research acceleration service.
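For a sense of what that looks like to a user, here is a rough sketch with the Globus Python SDK; the endpoint UUIDs, paths, and token are placeholders, and the OAuth flow that produces the token is omitted.
    # Hand a transfer to the cloud-hosted service, which then manages retries,
    # integrity checking, and completion on the user's behalf.
    import globus_sdk
    TOKEN = "..."                    # Globus transfer access token from an OAuth flow (placeholder)
    SRC = "<source-endpoint-uuid>"   # placeholder
    DST = "<destination-endpoint-uuid>"  # placeholder
    tc = globus_sdk.TransferClient(authorizer=globus_sdk.AccessTokenAuthorizer(TOKEN))
    tdata = globus_sdk.TransferData(tc, SRC, DST, label="example transfer")
    tdata.add_item("/data/results/", "/ingest/results/", recursive=True)
    task = tc.submit_transfer(tdata)
    print(task["task_id"])   # the service now drives the transfer to completion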
I created this animation of Globus transfers from late 2010 to today.
The x-axis is the great circle distance between source and destination, from 0 to 20,000 kilometers.
The y-axis is the transfer size, from 100 MB to several petabytes.
Each dot represents a transfer. I also track total exabytes and gigafiles.
I label a few source-destination pairs
You’ll see a large gap around 5000 km. It turns out that the sizes of the continents and oceans mean that few communications occur over that distance.
You may also see a point to the far right at 19855 kilometers.
Can any two places be that far apart?
It turns out those dots represent transfers between Madrid and Wellington,
and those two cities are about as far apart on the globe as you can get, as shown in this map from a childhood atlas explaining the concept of “antipodes.”
Now to the title of my talk: computing in the continuum.
Jack talked yesterday about how computers have grown ever faster.
Over that same period, networks have become not only faster but also immensely more reliable.
We can now envision a world in which barriers to remote computing, other than the speed of light, disappear.
When computation can be accessed remotely as easily as locally, we may talk about a computing continuum.
One aspect of this new world is that the large white spaces in our maps, in which little or no communication is possible, will be filled in.
5G and 6G are one reason.
Another technology that may be even more transformative is coherent light free-space optics: that is, space lasers.
This image depicts such a device, on the right, and the “Spacetime” platform that Aalyria (pronounced ah-LEER-ee-ah), a Google spinout, is developing to provide global communication.
The original name for their platform was Minkowski, after Hermann Minkowski.
Hermann Minkowski, a colleague of Einstein’s, observed about the special theory of relativity that it defined a space-time continuum.
With universal high-speed communications, we may adapt his words to refer to location and speed.
The speed of a computer, as perceived by its user, is not an absolute, but rather varies with its distance.
As shown here, a computer that is twice as fast but 500 km distant takes the same time to deliver an answer as a local computer.
For example, Nhan Tran of FermiLab observes that he can accelerate a trigger analysis code from 2000 to 50 ms by running it on an FPGA.
The FPGA is in an Amazon data center in Virginia, 10 ms away.
The actual computation takes 30 msec; adding the 20 msec round trip gives a total of 50 msec.
Net result is 40x acceleration.
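As a quick check of that arithmetic (a sketch; the 30 msec compute time and 10 msec one-way latency are from the slide, and data transfer time is assumed negligible):
    # Offloading pays off when remote compute + round-trip latency beats local compute.
    def remote_total_ms(compute_ms, one_way_latency_ms):
        return compute_ms + 2 * one_way_latency_ms
    local_ms = 2000                         # trigger analysis on the local CPU
    remote_ms = remote_total_ms(30, 10)     # FPGA in Virginia, ~10 ms away
    print(remote_ms, local_ms / remote_ms)  # 50 ms, 40x acceleration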
Another example, involving work by Zhengchun Liu and others at Argonne and SLAC.
High energy diffraction microscopy experiments at the Linac Coherent Light Source use a deep neural network, running on a GPU, to extract information from high-rate measurements of microstructure evolution in materials.
The GPU is not powerful enough, however, for the periodic retraining required to respond to structural deformation.
Dispatching requests to a Cerebras system at Argonne reduces this time to 31 seconds: 12 seconds to move data and the new model; 19 seconds for training.
So 35x faster
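Here the overhead is dominated by data movement rather than latency; a one-line check using the times from the slide (a sketch, with everything else ignored):
    # Offload decision when data movement dominates: move data + compute remotely vs. compute locally.
    local_s = 1102            # retraining on the local GPU
    remote_s = 7 + 19 + 5     # move data to the CS-2, retrain, return the model
    print(remote_s, local_s / remote_s)   # 31 s, roughly 35x faster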
Several techniques are used in these two examples.
Scientists use neural networks as efficient approximators, stream data, and use parallelism, specialized hardware, and data center AI systems.
ESNet is increasing communication rates and reducing latency, and ALCF is deploying specialized computers.
In general, we are solving a massive co-design problem: designing applications and infrastructure together.
We have identified the following key requirements. Surely not a complete list, but all essential.
As I’ll briefly describe, we are developing new methods in each area to allow experimentation.
One is a notation for expressing data pipelines.
Others include methods for managing flows, computing anywhere, accessing data anywhere, and delegating credentials.
As with Globus Transfer, we make extensive use of cloud-hosted research acceleration services,
which, I remind you, combine cloud-hosted management logic for high reliability with local agents for a global footprint.
Flows are built up from simple operations like data transfer, computation, and data cataloging.
A wonderful team at the University of Chicago led by Rachana Ananthakrishnan has developed Globus Automation Services to manage the execution of such flows.
As a cloud-hosted service, they can manage flows that run for seconds or months.
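To make the idea concrete, here is a schematic two-step flow definition in the states-language style that Globus Flows uses, written as a Python dict: move data, then invoke a registered training function. The action URLs, parameter names, and input fields are illustrative placeholders rather than a verbatim production flow.
    # Schematic flow: a Transfer action followed by a ModelTrain compute action.
    flow_definition = {
        "StartAt": "MoveData",
        "States": {
            "MoveData": {
                "Type": "Action",
                "ActionUrl": "https://actions.globus.org/transfer/transfer",  # illustrative
                "Parameters": {
                    "source_endpoint_id.$": "$.input.source_ep",
                    "destination_endpoint_id.$": "$.input.dest_ep",
                    "transfer_items": [
                        {"source_path.$": "$.input.src_path",
                         "destination_path.$": "$.input.dst_path"}
                    ],
                },
                "ResultPath": "$.TransferResult",
                "Next": "ModelTrain",
            },
            "ModelTrain": {
                "Type": "Action",
                "ActionUrl": "https://example.org/funcx-action-provider",  # placeholder
                "Parameters": {
                    "endpoint.$": "$.input.compute_ep",
                    "function.$": "$.input.train_function_id",
                },
                "ResultPath": "$.TrainResult",
                "End": True,
            },
        },
    }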
Another managed research acceleration service, funcX, led by Kyle Chard, …
I’ll mention that these new applications represent a new source of increasingly diverse workloads.
Here, for example, I show, for thousands of instances of seven synchrotron light source flows, the average duration of each action.
Durations range from seconds to hours.
Globus services ensure reliable execution.
Another essential building block, this one operational for several years, is the Globus Auth identity and access management platform service.
We need flows to perform actions on instruments, computers, repositories, and networks at remote locations.
To do this securely, we provide consent and token mechanisms for controlling what agents acting on a user’s behalf can do.
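A rough sketch of the delegation step with the Globus Python SDK; the client credentials and incoming token are placeholders, and a real deployment also handles consents, token storage, and refresh.
    # A service exchanges the access token presented by its caller for dependent
    # tokens, each scoped to one downstream service it must invoke on the user's behalf.
    import globus_sdk
    ac = globus_sdk.ConfidentialAppAuthClient("SERVICE_CLIENT_ID", "SERVICE_CLIENT_SECRET")
    incoming_token = "ACCESS-TOKEN-PRESENTED-BY-THE-CALLER"   # placeholder
    dependent = ac.oauth2_get_dependent_tokens(incoming_token)
    # One token per downstream resource server (e.g., funcX, Transfer), usable only
    # for the scopes the user originally consented to.
    for server, info in dependent.by_resource_server.items():
        print(server, info["scope"])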
Using managed research automation services, we can create what Ben Brown of DOE calls an “integrated research infrastructure” by deploying local agents at each system that is to be integrated into the continuum.
Here I show what this means at Argonne.
Most beamlines at the Advanced Photon Source run Globus Connect agents, and a variety of computers run funcX agents.
You might ask, how are the problems and solutions that I have presented here different from those deployed 10, 20, or 30 years ago?
Dan Reed has observed that computing differs from other sciences in that the questions often stay the same while the answers change.
Far more reliable and performant networks, and effective containerization, allow us to revisit old answers.
And new concerns, like energy, result in new questions.
Indeed, our early experiences programming the continuum suggest numerous other questions.