14. Better information faster via computation
[Figure: a query Q is sent; an answer A is returned]
Subject to constraints on: Time, computation, energy use, money, …
15. Better information faster via computation
[Figure: a query Q travels N_meters to a computer that performs N_operations and returns an answer A of N_bytes]
Compute at f ops/sec; send at b bytes/s; transfer at 0.67c in fiber
T_answer = N_operations/f + N_bytes/b + 2·N_meters/(0.67c)
Subject to constraints on: Time, computation, energy use, money, …
16. Better information faster as a systems problem
Compute at f ops/sec; send at b bytes/s; transfer at 0.67c m/s
T_answer = N_operations/f + N_bytes/b + 2·N_meters/(0.67c)
Reduce N_operations by: approximation
Increase f by: parallelism, specialization, centralization
Reduce N_bytes by: compression
Increase b by: better optics, parallelism
Reduce N_meters by: shorter networks, closer computing; avoid round trips by avoiding RPC
Increase signal speed (0.67c) by: hollow core fiber, free space optics
Subject to constraints on: Time, computation, energy use, money, …
17. Better systems: Do many different things at once
Swift, Parsl, …
Swift: Parallel Computing 2007
HPF+MPI: SC’96, JPDC’97
Parsl: HPDC’19
Colmena: MLHPC’21
18. Managed research acceleration services
Cloud hosted for reliability + global footprint for scientific impact
https://globus.org
19. Collins Australian Clear School Atlas, 1964
Wellington, New Zealand to Madrid, Spain: 19,855 km
Circumference of the earth: 40,075 km
Semi-circumference: 20,037 km
21. A universal communication fabric
Free-space optics
Aalyria’s “Spacetime” [originally “Minkowski”] platform
22. A universal communication fabric
Free-space optics
“Henceforth, space for itself, and time for itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.” – Hermann Minkowski, 1908
Aalyria’s “Spacetime” [originally “Minkowski”] platform
23. The space-time continuum in converged systems
Misquoted [2022]: “Henceforth, location for itself, and speed for itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality."
[Figure: space-time diagram with a local computer C1 at 0 km and a computer C2 at 500 km; Space axis 0–500 km, Time axis 0–10 ms]
The behaviors of the two computers are indistinguishable:
t(C1) = T1 = 0.01 sec
t(C2) = T2 + 2 × 500 × 5×10⁻⁶ sec = 0.01 sec
24. Example: High energy physics trigger analysis
[Figure: space-time diagram (not to scale); CPU at 0 km (Illinois), FPGA at 2000 km (Virginia), one-way latency 10 ms]
Local: T1 = 2 seconds on CPU = 2000 msec
Remote: T2 = 30 msec on FPGA; 30 + 10 + 10 = 50 msec
40x acceleration
Nhan Tran, FermiLab, et al., arXiv:1904.08986
25. Example: High energy diffraction microscopy @ LCLS
Model (re)training on local GPU: 1102 seconds
[Figure: requests dispatched from LCLS to a remote Cerebras CS-2]
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
26. Example: High energy diffraction microscopy @ LCLS
Remote retraining on Cerebras CS-2: 7 s + 19 s + 5 s = 31 s (data movement to and from the CS-2 plus retraining)
35x faster
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
27. Multiple techniques are used in these two examples
T_answer = N_operations/f + N_bytes/b + 2·N_meters/(0.67c)
Reduce N_operations by: approximation
Increase f by: parallelism, specialization, centralization
Reduce N_bytes by: compression
Increase b by: better optics, parallelism
Reduce N_meters by: shorter networks, closer computing; avoid round trips by avoiding RPC
Increase signal speed (0.67c) by: hollow core fiber, free space optics
28. Better information faster in the computing continuum is an application-infrastructure codesign problem
T_answer = N_operations/f + N_bytes/b + 2·N_meters/(0.67c)
Reduce N_operations by: approximation
Increase f by: parallelism, specialization, centralization
Reduce N_bytes by: compression
Increase b by: better optics, parallelism
Reduce N_meters by: shorter networks, closer computing; avoid round trips by avoiding RPC
Increase signal speed (0.67c) by: hollow core fiber, free space optics
What to compute
What algorithms to use
Where to compute
Where to place computers
How to communicate
What to communicate
What networks to use
Where to deploy networks
29. Continuum-aware programming
Compute fabric: compute anywhere
Data fabric: access data anywhere
Trust fabric: ensure only authorized operations occur in trusted locations
Develop efficient, reliable, reusable applications
New methods and tools are needed to enable exploration and exploitation
31. Continuum-aware programming
Compute fabric: funcX
Data fabric: Globus Transfer
Trust fabric: Globus Auth
Globus Automation Services
We are exploring new programming abstractions and managed research acceleration services*
*Cloud hosted for reliability + global footprint for scientific impact
https://braid-project.org
https://globus.org
32. “Flows” as an abstraction for smart instruments
[Figure: the diffraction microscopy example (7 + 19 + 5 = 31 s) expressed as a flow that includes a ModelTrain step]
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
33. Globus automation services: Managed research acceleration services for flow specification, execution, and management
R. Chard et al., https://doi.org/10.48550/arXiv.2208.09513
34. funcX: A managed research acceleration service that implements a universal computing fabric
https://funcx.org
Z. Li et al., https://doi.org/10.48550/arXiv.2209.11631
Deploy funcX agents:
  $ pip install funcx-endpoint
  $ funcx-endpoint configure myep
  $ funcx-endpoint start myep
Register functions:
  def F(in_args):
      # do something
      return results
  func_id = fxc.register_function(F)
Run functions:
  f = fxc.run("A", endpoint_id=ep, function_id=func_id)
  R = fxc.get_result(f)
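Putting the pieces above together, a minimal runnable sketch; it assumes the funcX Python SDK of this period (FuncXClient) and a placeholder endpoint UUID, so details of a real deployment will differ.
    # Register a function once with the funcX service, then run it on a chosen endpoint.
    from funcx.sdk.client import FuncXClient
    fxc = FuncXClient()   # triggers a Globus Auth login on first use
    def F(in_args):
        # do something with the inputs; here we simply echo them back
        return {"echo": in_args}
    func_id = fxc.register_function(F)            # returns a function UUID
    ep = "00000000-0000-0000-0000-000000000000"   # placeholder UUID for the "myep" endpoint
    task_id = fxc.run("A", endpoint_id=ep, function_id=func_id)
    result = fxc.get_result(task_id)              # raises if the task is still pending
    print(result)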
35. New applications mean new computing workloads
Globus Flows can invoke arbitrary functions via the funcX action provider
Functions may be executed in various locations: at a beamline, local server, cluster, cloud
Seven flows in use at the Advanced Photon Source
R. Vescovi et al., https://doi.org/10.1016/j.patter.2022.100606
36. New applications mean new computing workloads
Globus Flows can invoke arbitrary functions via the funcX action provider
Functions may be executed in various locations: at a beamline, local server, cluster, cloud
Execution times at the Argonne Leadership Computing Facility
R. Vescovi et al., https://doi.org/10.1016/j.patter.2022.100606
37. Globus Auth: A managed research acceleration service providing distributed authorization with delegation
Who do I trust to act on my behalf, when, and for what purpose?
[Figure: token hierarchy spanning the user, cloud-hosted services (Auth, Flows, funcX), and node A]
(1) “Run flow”, with consent
(2) “Provide access token for funcX”
(3) “Run F() on node A”
(4) “Provide dependent token for F() on node A”
(5) “Run F()”
(6) F() runs on node A
Not shown: token refresh
Leverage OAuth to:
• Increase security via scoped credentials
• Improve usability via browser compatibility
1.3 B access tokens
2.7 M consents
S. Tuecke et al., https://doi.org/10.1109/eScience.2016.7870901; R. Chard et al., https://doi.org/10.48550/arXiv.2208.09513
39. Foucault’s pendulum, Paris Panthéon
Problems we have solved, in part
• Authorization (“enterprise-grade security, consumer-grade usability”)
• Universal connectivity
• Location-independent computing
• Programming
New opportunities and challenges
• New devices and applications
• Massive performance and scale
• Energy as a ubiquitous constraint
• Global data space architecture
• Programming
Utility (‘60s) … xSP (’80s) … Grid (‘90s) … Cloud (2010) … Edge … Continuum …
The pendulum swings between centralization and decentralization
40. There are important questions to answer
• What will continuum networks look like in practice?
• Energy vs. accuracy vs. cost vs. … tradeoffs
• Where will we place computing and storage?
• What new instruments will be created?
• What new applications and new science will be enabled?
• What new abstractions, services, and tools do we need?
• Do we need new continuum-aware algorithm design methods?
• What will be the economic foundation of this new computing fabric?
• (How) Will we integrate quantum sensors, networks, computers?
41. Thank you for your attention!
Two wonderful institutions: Argonne National Laboratory and the University of Chicago
Federal agencies for continued support: DOE, NSF, NIH, NIST
Wonderful colleagues: Rachana Ananthakrishnan, Ben Blaiszik, Pete Beckman, Charlie Catlett,
Kyle Chard, Ryan Chard, Carl Kesselman, Rick Stevens, Vas Vasiliadis, Logan Ward, & many more
To learn more about our work: https://labs.globus.org
Ask questions or share your thoughts: foster@anl.gov
Experiment with tools:
https://braid-project.org
Thanks to: https://doi.org/10.1016/j.patter.2022.100606
Editor's Notes
A brief retrospective of my life from the perspective of the SC conference.
Here is me arriving at Argonne in January 1989.
Jack Dongarra had just left for Tennessee.
I had the great good fortune to be welcomed soon after my arrival into the Center for Research on Parallel Computation, an NSF Science and Technology Center, led by Ken Kennedy.
CRPC under Ken’s leadership was a remarkably collegial and productive community.
Ken was also very supportive of me, for example by providing resources that allowed me to recruit a young 22-year-old hot shot, Steve Tuecke.
Steve and I worked together for 30 years until his recent passing.
So much of what I talk about here was dreamed up in joint brainstorming sessions.
I dedicate this talk to his memory.
But to return to the SC conference.
I have published 30 papers here since 1993, my first SC conference.
Here are the faces of 91 of my 97 wonderful SC co-authors.
Time is too short to call them out by name, but what exceptional people.
The people that we work with are much of what makes HPC a joy.
Many of these were students or postdocs at the time.
Now, you may be thinking that publishing 30 papers in 30 years suggests a consistent ability to impress SC reviewers.
But with help from Mark Montague from Linklings, I refreshed my memory of all papers from 2011 onwards.
I can reveal here that some were less appreciated.
Furthermore, applying the powerful AI method of linear regression, I can build a predictor of my future personal acceptance rates.
My last paper to be accepted at SC will be in 2023, and only if I submit around 20 papers.
The power of data.
Back to the theme of my talk.
I have always been fascinated by the question of how to use computing to deliver better information faster to decision makers.
Perhaps relates to growing up in New Zealand, which is a long way away from anywhere.
Indeed, when I left in 1980, which for idiosyncratic reasons I did on a sailboat, it took me 90 days to reach North America.
Now we can transmit information much faster, reaching all the way to Dallas in just 90 msec: close to 100 million times faster.
But still quite slow for many purposes, and not directly improvable further, given light speed limits.
So let’s look briefly at what is involved in obtaining useful information.
We send a message to retrieve an answer, which often requires computation.
The time required to get that answer depends on the number of operations, data communication needs, the distance to be traversed, and the speeds of computation and communication.
We can accelerate each element in this equation in numerous ways, from better algorithms that reduce the number of operations to be performed to better computers and networks.
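To make that equation concrete, here is a minimal sketch in Python; the helper name, the parameter names, and the sample numbers are illustrative assumptions, not figures from the talk.
    # Back-of-the-envelope model: T_answer = N_operations/f + N_bytes/b + 2*N_meters/(0.67c)
    C = 3.0e8                  # speed of light in vacuum, m/s
    FIBER_SPEED = 0.67 * C     # approximate signal speed in optical fiber, m/s
    def t_answer(n_ops, f, n_bytes, b, n_meters):
        """Seconds to obtain an answer: compute + data transfer + round-trip propagation."""
        return n_ops / f + n_bytes / b + 2 * n_meters / FIBER_SPEED
    # Illustrative values: 10^12 operations at 10^11 ops/s, 1 GB returned at 1.25 GB/s
    # (10 Gbit/s), computer 2000 km away.
    print(t_answer(1e12, 1e11, 1e9, 1.25e9, 2e6))
    # -> about 10.82 s: 10 s compute + 0.8 s transfer + ~0.02 s propagation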
This complexity is of course what makes HPC so much fun.
My colleagues and I have worked to optimize various of these elements over the years.
Let me mention just a few of those activities.
One area of focus has been using HPC systems to do many different things at once.
Back in the day, HPC systems were used only for single program multiple data, or SPMD, computations.
To escape this straitjacket, my colleagues and I have developed new methods and tools, like Strand, Swift, Parsl, and Colmena, that allow applications that combine different components to be defined and executed. These tools have been used to build large heterogeneous applications that, for example, perform AI-guided simulation.
Many talented contributors here; I will call out Mike Wilde, Justin Wozniak, Kyle Chard, Shantenu Jha, Logan Ward as leaders of individual projects.
Today, such applications are largely mainstream, although arguably our runtimes and operating systems still need to catch up.
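For instance, here is a minimal Parsl sketch of that "many different things at once" pattern: independent Python apps run concurrently and are composed through futures. The app bodies are stand-ins, not a real application.
    # Two different kinds of work launched concurrently and composed via futures,
    # rather than a single SPMD program.
    import parsl
    from parsl import python_app
    from parsl.configs.local_threads import config
    parsl.load(config)   # local threads here; an HPC executor in a real run
    @python_app
    def simulate(x):
        return x * x                           # stand-in for a simulation task
    @python_app
    def train(samples):
        return sum(samples) / len(samples)     # stand-in for a model-training task
    sims = [simulate(i) for i in range(8)]     # all eight run concurrently
    model = train([s.result() for s in sims])  # results feed an AI/analysis step
    print(model.result())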
As a second example of work intended to produce better information faster, I will describe the work of the Globus team on science services to accelerate complex tasks, for example, by making fast and reliable data transfer a one-click operation.
The Globus system, developed and operated by a wonderful team at the University of Chicago, uses software running on the Amazon cloud to manage data transfers, and more, with extreme reliability and high performance: what we may call a managed research acceleration service.
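For a sense of what that looks like to a user, here is a rough sketch with the Globus Python SDK; the endpoint UUIDs, paths, and token are placeholders, and the OAuth flow that produces the token is omitted.
    # Hand a transfer to the cloud-hosted service, which then manages retries,
    # integrity checking, and completion on the user's behalf.
    import globus_sdk
    TOKEN = "..."                    # Globus transfer access token from an OAuth flow (placeholder)
    SRC = "<source-endpoint-uuid>"   # placeholder
    DST = "<destination-endpoint-uuid>"  # placeholder
    tc = globus_sdk.TransferClient(authorizer=globus_sdk.AccessTokenAuthorizer(TOKEN))
    tdata = globus_sdk.TransferData(tc, SRC, DST, label="example transfer")
    tdata.add_item("/data/results/", "/ingest/results/", recursive=True)
    task = tc.submit_transfer(tdata)
    print(task["task_id"])   # the service now drives the transfer to completion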
I created this animation of Globus transfers from late 2010 to today.
The x-axis is the great circle distance between source and destination, from 0 to 20,000 kilometers.
The y-axis is the transfer size, from 100 MB to several petabytes.
Each dot represents a transfer. I also track total exabytes and gigafiles.
I label a few source-destination pairs
You’ll see a large gap around 5000 km. It turns out that the sizes of the continents and oceans mean that few communications occur over that distance.
You may also see a point to the far right at 19855 kilometers.
Can any two places be that far apart?
It turns out those dots represent transfers between Madrid and Wellington,
and those two cities are about as far apart on the globe as you can get, as shown in this map from a childhood atlas explaining the concept of “antipodes.”
Now to the title of my talk: computing in the continuum.
Jack talked yesterday about how computers have grown ever faster.
Over that same period, networks have become not only faster but also immensely more reliable.
We can now envision a world in which barriers to remote computing, other than the speed of light, disappear.
When computation can be accessed remotely as easily as locally, we may talk about a computing continuum.
One aspect of this new world is that the large white spaces in our maps, in which little or no communication is possible, will be filled in.
5G and 6G are one reason.
Another technology that may be even more transformative is coherent light free-space optics: that is, space lasers.
This image depicts such a device, on the right, and the “Spacetime” platform that Aalyria (pronounced ah-LEER-ee-ah), a Google spinout, is developing to provide global communication.
The original name for their platform was Minkowski, after Hermann Minkowski.
Hermann Minkowski, a colleague of Einstein’s, observed about the special theory of relativity that it defined a space-time continuum.
With universal high-speed communications, we may adapt his words to refer to location and speed.
The speed of a computer, as perceived by its user, is not an absolute, but rather varies with its distance.
As shown here, a computer that is twice as fast but 500 km distant takes the same time to deliver an answer as a local computer.
For example, Nhan Tran of FermiLab observes that he can accelerate a trigger analysis code from 2000 to 50 ms by running it on an FPGA.
The FPGA is in an Amazon data center in Virginia, 10 ms away.
The actual computation takes 30 msec; adding the 20 msec round trip gives a total of 50 msec.
Net result is 40x acceleration.
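As a quick check of that arithmetic (a sketch; the 30 msec compute time and 10 msec one-way latency are from the slide, and data transfer time is assumed negligible):
    # Offloading pays off when remote compute + round-trip latency beats local compute.
    def remote_total_ms(compute_ms, one_way_latency_ms):
        return compute_ms + 2 * one_way_latency_ms
    local_ms = 2000                         # trigger analysis on the local CPU
    remote_ms = remote_total_ms(30, 10)     # FPGA in Virginia, ~10 ms away
    print(remote_ms, local_ms / remote_ms)  # 50 ms, 40x acceleration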
Another example, involving work by Zhengchun Liu and others at Argonne and SLAC.
High energy diffraction microscopy experiments at the Linac Coherent Light Source use a deep neural network, running on a GPU, to extract information from high-rate measurements of microstructure evolution in materials.
The GPU is not powerful enough, however, for the periodic retraining required to respond to structural deformation.
Dispatching requests to a Cerebras system at Argonne reduces this time to 31 seconds: 12 seconds to move data and the new model; 19 seconds for training.
So 35x faster
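Here the overhead is dominated by data movement rather than latency; a one-line check using the times from the slide (a sketch, with everything else ignored):
    # Offload decision when data movement dominates: move data + compute remotely vs. compute locally.
    local_s = 1102            # retraining on the local GPU
    remote_s = 7 + 19 + 5     # move data to the CS-2, retrain, return the model
    print(remote_s, local_s / remote_s)   # 31 s, roughly 35x faster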
Several techniques are used in these two examples.
Scientists use neural networks as efficient approximators, stream data, and use parallelism, specialized hardware, and data center AI systems.
ESNet is increasing communication rates and reducing latency, and ALCF is deploying specialized computers.
In general, we are solving a massive co-design problem: designing applications and infrastructure together.
We have identified the following key requirements. Surely not a complete list, but all essential.
As I’ll briefly describe, we are developing new methods in each area to allow experimentation.
One is a notation for expressing data pipelines.
Others include methods for managing flows, computing anywhere, accessing data anywhere, and delegating credentials.
As with Globus Transfer, we make extensive use of cloud-hosted research acceleration services,
which, I remind you, combine cloud-hosted management logic for high reliability with local agents for a global footprint.
Flows are built up from simple operations like data transfer, computation, and data cataloging.
A wonderful team at the University of Chicago led by Rachana Ananthakrishnan has developed Globus Automation Services to manage the execution of such flows.
As a cloud-hosted service, they can manage flows that run for seconds or months.
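To make the idea concrete, here is a schematic two-step flow definition in the states-language style that Globus Flows uses, written as a Python dict: move data, then invoke a registered training function. The action URLs, parameter names, and input fields are illustrative placeholders rather than a verbatim production flow.
    # Schematic flow: a Transfer action followed by a ModelTrain compute action.
    flow_definition = {
        "StartAt": "MoveData",
        "States": {
            "MoveData": {
                "Type": "Action",
                "ActionUrl": "https://actions.globus.org/transfer/transfer",  # illustrative
                "Parameters": {
                    "source_endpoint_id.$": "$.input.source_ep",
                    "destination_endpoint_id.$": "$.input.dest_ep",
                    "transfer_items": [
                        {"source_path.$": "$.input.src_path",
                         "destination_path.$": "$.input.dst_path"}
                    ],
                },
                "ResultPath": "$.TransferResult",
                "Next": "ModelTrain",
            },
            "ModelTrain": {
                "Type": "Action",
                "ActionUrl": "https://example.org/funcx-action-provider",  # placeholder
                "Parameters": {
                    "endpoint.$": "$.input.compute_ep",
                    "function.$": "$.input.train_function_id",
                },
                "ResultPath": "$.TrainResult",
                "End": True,
            },
        },
    }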
Another managed research acceleration service, funcX, led by Kyle Chard, …
I’ll mention that these new applications represent a new source of increasingly diverse workloads.
Here, for example, I show, for thousands of instances of seven synchrotron light source flows, the average duration of each action.
Durations range from seconds to hours.
Globus services ensure reliable execution.
Another essential building block, this one operational for several years, is the Globus Auth identity and access management platform service.
We need flows to perform actions on instruments, computers, repositories, and networks at remote locations.
To do this securely, we provide consent and token mechanisms for controlling what agents acting on a user’s behalf can do.
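A rough sketch of the delegation step with the Globus Python SDK; the client credentials and incoming token are placeholders, and a real deployment also handles consents, token storage, and refresh.
    # A service exchanges the access token presented by its caller for dependent
    # tokens, each scoped to one downstream service it must invoke on the user's behalf.
    import globus_sdk
    ac = globus_sdk.ConfidentialAppAuthClient("SERVICE_CLIENT_ID", "SERVICE_CLIENT_SECRET")
    incoming_token = "ACCESS-TOKEN-PRESENTED-BY-THE-CALLER"   # placeholder
    dependent = ac.oauth2_get_dependent_tokens(incoming_token)
    # One token per downstream resource server (e.g., funcX, Transfer), usable only
    # for the scopes the user originally consented to.
    for server, info in dependent.by_resource_server.items():
        print(server, info["scope"])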
Using managed research automation services, we can create what Ben Brown of DOE calls an “integrated research infrastructure” by deploying local agents at each system that is to be integrated into the continuum.
Here I show what this means at Argonne.
Most beamlines at the Advanced Photon Source run Globus Connect agents, and a variety of computers run funcX agents.
You might ask, how are the problems and solutions that I have presented here different from those deployed 10, 20, or 30 years ago?
Dan Reed has observed that computing differs from other sciences in that the questions often stay the same while the answers change.
Far more reliable and performant networks, and effective containerization, allow us to revisit old answers.
And new concerns, like energy, result in new questions.
Indeed, our early experiences programming the continuum suggest numerous other questions.