Better Information Faster: Programming the Continuum
Ian Foster
A life in supercomputing
Ken Kennedy
1945-2007
1998-2000
A life in supercomputing … papers accepted to SC
A life in supercomputing … papers accepted to SC … and co-authors
A life in supercomputing … papers accepted to SC … and student and postdoc co-authors
A life in supercomputing … papers accepted and rejected
Data available
Future
An overarching concern: Better information faster
90 days in 1980
90 msec in 2022
Better information faster via computation
Q → A
Compute $N_{\text{operations}}$ at f ops/sec
Send $N_{\text{bytes}}$ at b bytes/s
Transfer over $N_{\text{meters}}$ at 0.67c m/s in fiber
$T_{\text{answer}} = N_{\text{operations}}/f + N_{\text{bytes}}/b + 2 N_{\text{meters}}/(0.67c)$
Subject to constraints on: Time, computation, energy use, money, …
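The time-to-answer formula on this slide can be evaluated directly. The following is a minimal sketch, not part of the original talk: the function name `t_answer` and the example workload (operation count, message size, distance, and rates) are assumptions chosen only for illustration.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def t_answer(n_operations, n_bytes, n_meters, f, b, velocity_factor=0.67):
    """Seconds to an answer: compute time + send time + round-trip propagation."""
    compute = n_operations / f
    send = n_bytes / b
    propagate = 2 * n_meters / (velocity_factor * C)
    return compute + send + propagate


# Hypothetical workload: 1e9 operations at 1e12 ops/s,
# 1 MB sent at 1e9 bytes/s, to a computer 500 km away.
example = t_answer(1e9, 1e6, 500e3, f=1e12, b=1e9)
print(f"{example * 1e3:.2f} ms")
```

With these assumed numbers, propagation over 500 km of fiber costs more than compute and send combined, which is the kind of trade-off the formula is meant to expose.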
Better information faster as a systems problem
$T_{\text{answer}} = N_{\text{operations}}/f + N_{\text{bytes}}/b + 2 N_{\text{meters}}/(0.67c)$
$N_{\text{operations}}$: reduce by approximation
$f$: increase by parallelism, specialization, centralization
$N_{\text{bytes}}$: reduce by compression
$b$: increase by better optics, parallelism
Factor of 2 (round trips): reduce by avoiding RPC
$N_{\text{meters}}$: reduce by shorter networks, closer computing
$0.67c$: increase by hollow core fiber, free space optics
Subject to constraints on: Time, computation, energy use, money, …
Better systems: Do many different things at once
Swift, Parsl, …
Swift: Parallel Computing 2007
HPF+MPI: SC’96, JPDC’97
Parsl: HPDC’19
Colmena: MLHPC’21
Managed research acceleration services
Cloud hosted for reliability + global footprint for scientific impact
https://globus.org
Collins Australian Clear School Atlas, 1964
Wellington, New Zealand to Madrid, Spain: 19,855 km
Circumference of the earth: 40,075 km
Semi-circumference: 20,037 km
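The near-antipodal distance quoted here sets a floor on round-trip latency. The sketch below is an illustration under stated assumptions: it uses the haversine formula with a mean Earth radius of 6371 km, approximate city coordinates that are not from the slides, and the 0.67c fiber speed used elsewhere in the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0      # mean Earth radius (assumption)
C_KM_PER_S = 299_792.458      # speed of light in vacuum
FIBER_FACTOR = 0.67           # fraction of c in fiber, as in the talk


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    # atan2 form stays stable for nearly antipodal points.
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(max(0.0, 1 - a)))


# Approximate coordinates (assumptions): Wellington and Madrid.
distance_km = great_circle_km(-41.29, 174.78, 40.42, -3.70)
rtt_s = 2 * distance_km / (FIBER_FACTOR * C_KM_PER_S)
print(f"{distance_km:.0f} km, round trip of at least {rtt_s * 1e3:.0f} ms in fiber")
```

This is a lower bound only: a real fiber route is longer than the great circle and adds switching delay.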
Megascale
Gigascale
Terascale
Petascale
Exascale
Zettascale
Faster components Increased connectivity
Dial-up lines
155 Mbps
10 Gbps
5G
400 Gbps
Free space optics
Computing continuum
Time
Better information faster in a converged world
A universal communication fabric
Free-space optics
Aalyria’s “Spacetime” [originally “Minkowski”] platform
“Henceforth, space for itself, and time for itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.” – Hermann Minkowski, 1908
The space-time continuum in converged systems
[Figure: space-time diagram with a space axis from 0 km to 500 km and a time axis marked at 2.5, 5, 7.5, and 10 ms, showing computers C1 and C2]
Misquoted [2022]: “Henceforth, location for itself, and speed for itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.”
The behaviors of the two computers are indistinguishable
$t(C_1) = T_1 = 0.01$ sec
$t(C_2) = T_2 + 2 \times 500\,\text{km} \times 5 \times 10^{-6}\,\text{s/km} = 0.01$ sec
[Figure axes: Time, Space; compute times T1 and T2]
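The equivalence between a slower nearby computer and a faster distant one can be checked numerically. This is a minimal sketch: the 5 µs/km fiber delay comes from the equation on the slide, while the function names and the value T2 = 5 ms (implied by that equation, not stated on the slide) are assumptions.

```python
import math

FIBER_DELAY_S_PER_KM = 5e-6  # one-way delay per km of fiber, from the slide's equation


def observed_time(compute_s, distance_km):
    """Time seen by the requester: compute time plus round-trip fiber delay."""
    return compute_s + 2 * distance_km * FIBER_DELAY_S_PER_KM


def max_distance_km(local_s, remote_compute_s):
    """Farthest a faster remote computer can be and still match the local one."""
    return (local_s - remote_compute_s) / (2 * FIBER_DELAY_S_PER_KM)


t_c1 = observed_time(0.010, 0)    # C1: local, T1 = 10 ms
t_c2 = observed_time(0.005, 500)  # C2: 500 km away, T2 = 5 ms (implied)
print(math.isclose(t_c1, t_c2), max_distance_km(0.010, 0.005))
```

Under this model, speed and location trade off linearly: every millisecond of compute saved buys roughly 100 km of distance.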
Example: High energy physics trigger analysis
0 km (Illinois) to 2000 km (Virginia): 10 ms
T1 = 2 seconds on CPU
T2 = 30 msec on FPGA
Local: 2000 msec
Remote: 30 + 10 + 10 = 50 msec
40x acceleration
(not to scale)
Nhan Tran, Fermilab, et al. arXiv:1904.08986
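The arithmetic behind the 40x figure is short enough to write out. The sketch below uses only the numbers on the slide; the function name and the "network share" metric are illustrative assumptions.

```python
def remote_speedup(local_ms, remote_compute_ms, one_way_ms):
    """Speedup of remote execution, counting latency in both directions."""
    remote_total_ms = remote_compute_ms + 2 * one_way_ms
    return local_ms / remote_total_ms


speedup = remote_speedup(local_ms=2000, remote_compute_ms=30, one_way_ms=10)
network_share = (2 * 10) / (30 + 2 * 10)  # fraction of remote time spent in transit
print(speedup, network_share)
```

Even with 40% of the remote time spent on the network, the specialized remote device remains far ahead of the local CPU.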
Example: High energy diffraction microscopy
Model (re)training on local GPU: 1102 seconds
Cerebras CS-2
7 + 19 + 5 = 31 s (7 seconds, 19 seconds, 5 seconds)
35x faster
@ LCLS
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
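A per-stage breakdown shows where the 31 seconds go and what removing any one stage would buy. This sketch treats the three stages generically, because the slide text does not label which stage is which; the variable names are assumptions.

```python
local_s = 1102          # model (re)training on the local GPU
stages_s = [7, 19, 5]   # the three stages of the remote path, as listed on the slide

total_s = sum(stages_s)
speedup = local_s / total_s
print(f"total {total_s} s, speedup {speedup:.1f}x")

for stage in stages_s:
    share = stage / total_s
    without = local_s / (total_s - stage)  # speedup if this stage took no time
    print(f"{stage:>2} s stage: {share:.0%} of total, {without:.1f}x without it")
```

The quoted 35x is the rounded-down value of 1102 / 31, and the 19-second stage accounts for most of the remaining time.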
Multiple techniques are used in these two examples
$T_{\text{answer}} = N_{\text{operations}}/f + N_{\text{bytes}}/b + 2 N_{\text{meters}}/(0.67c)$
$N_{\text{operations}}$: reduce by approximation
$f$: increase by parallelism, specialization, centralization
$N_{\text{bytes}}$: reduce by compression
$b$: increase by better optics, parallelism
Factor of 2 (round trips): reduce by avoiding RPC
$N_{\text{meters}}$: reduce by shorter networks, closer computing
$0.67c$: increase by hollow core fiber, free space optics
Better information faster in the computing continuum
is an application-infrastructure codesign problem
$T_{\text{answer}} = N_{\text{operations}}/f + N_{\text{bytes}}/b + 2 N_{\text{meters}}/(0.67c)$
$N_{\text{operations}}$: reduce by approximation
$f$: increase by parallelism, specialization, centralization
$N_{\text{bytes}}$: reduce by compression
$b$: increase by better optics, parallelism
Factor of 2 (round trips): reduce by avoiding RPC
$N_{\text{meters}}$: reduce by shorter networks, closer computing
$0.67c$: increase by hollow core fiber, free space optics
What to compute
What algorithms to use
Where to compute
Where to place computers
How to communicate
What to communicate
What networks to use
Where to deploy networks
Continuum-aware programming
Compute fabric: Compute anywhere
Data fabric: Access data anywhere
Trust fabric: Ensure only authorized operations occur in trusted locations
Develop efficient, reliable, reusable applications
New methods and tools are needed to enable exploration and exploitation
Continuum-aware programming
Compute fabric Data fabric
Trust fabric
We are exploring new programming abstractions
https://braid-project.org
https://globus.org
“Flows”
Continuum-aware programming
Compute fabric Data fabric
Trust fabric
Globus Transfer
funcX
Globus Automation Services
Globus Auth
We are exploring new programming abstractions
and managed research acceleration services*
https://braid-project.org
https://globus.org
*Cloud hosted for reliability + global footprint for scientific impact
Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
7 seconds
7 + 19 + 5 = 31 s
5 seconds
19 seconds
“Flows” as an abstraction for smart instruments
ModelTrain
Globus automation services: Managed research acceleration
services for flow specification, execution, and management
R. Chard et al., https://doi.org/10.48550/arXiv.2208.09513
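The idea of a flow as an ordered series of actions over shared state can be sketched in plain Python. This is a toy model only: it is not the Globus Flows API or its definition format, and the step names, state keys, and dataset name are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Step = Tuple[str, Callable[[State], State]]


def run_flow(steps: List[Step], state: State) -> State:
    """Run each step in order, merging its output into the shared state."""
    completed = []
    for name, action in steps:
        state = {**state, **action(state)}
        completed.append(name)
    return {**state, "completed": completed}


# Hypothetical three-step flow: move data, retrain a model, return the model.
steps: List[Step] = [
    ("transfer_data", lambda s: {"data_location": "remote"}),
    ("train_model", lambda s: {"model": f"model trained on {s['dataset']}"}),
    ("return_model", lambda s: {"model_location": "instrument"}),
]

result = run_flow(steps, {"dataset": "scan_001"})
print(result["completed"])
```

A managed service adds what this sketch leaves out: authorization for each step, retries, and execution on remote resources.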
funcX: A managed research acceleration service that implements a universal computing fabric
https://funcx.org
Z. Li et al., https://doi.org/10.48550/arXiv.2209.11631
Deploy funcX agents
$ pip install funcx-endpoint
$ funcx-endpoint configure myep
$ funcx-endpoint start myep
Register functions
from funcx import FuncXClient
fxc = FuncXClient()
def F(in_args):
    results = in_args  # placeholder: do something with in_args
    return results
F_id = fxc.register_function(F)  # returns the ID used to invoke F
Run functions: F(ep, "A")
f = fxc.run("A", endpoint_id=ep, function_id=F_id)  # ep: ID of the endpoint started above
R = fxc.get_result(f)  # may need to be retried until the task completes
New applications mean new computing workloads
Globus Flows can invoke arbitrary functions via the funcX action provider
Functions may be executed in various locations: at a beamline, local server, cluster, cloud
Seven flows in use at the Advanced Photon Source
Execution times at the Argonne Leadership Computing Facility
R. Vescovi et al., https://doi.org/10.1016/j.patter.2022.100606
Globus Auth: A managed research acceleration service providing distributed authorization with delegation
Who do I trust to act on my behalf, when, and for what purpose?
(1) “Run flow” with consent
(2) “Provide access token for funcX”
(3) “Run F() on node A”
(4) “Provide dependent token for F() on node A”
(5) “Run F()”
(6) F() runs
Cloud-hosted services: Flows, Auth, funcX
Token hierarchy: consent, access token, dependent token
Not shown: Token refresh
Leverage OAuth to:
• Increase security via scoped credentials
• Improve usability via browser compatibility
1.3 B access tokens
2.7 M consents
S. Tuecke et al., https://doi.org/10.1109/eScience.2016.7870901, R. Chard et al., https://doi.org/10.48550/arXiv.2208.09513
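The delegation chain on this slide, where each service receives a token no broader than the one it was handed, can be modeled in a few lines. This is a toy illustration only: it is not the Globus Auth API or the OAuth wire protocol, and the class, function, and scope names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Token:
    holder: str                       # service allowed to present this token
    scopes: frozenset                 # operations the token permits
    parent: Optional["Token"] = None  # token this one was derived from


def dependent_token(parent: Token, holder: str, scopes: Iterable[str]) -> Token:
    """Issue a narrower token; refuse anything the parent does not cover."""
    requested = frozenset(scopes)
    if not requested <= parent.scopes:
        raise PermissionError(f"{holder} requested scopes beyond the consent")
    return Token(holder, requested, parent)


def chain(token: Token) -> List[str]:
    """Holders from the original access token down to this token."""
    holders = []
    current: Optional[Token] = token
    while current is not None:
        holders.append(current.holder)
        current = current.parent
    return list(reversed(holders))


# The user's consent yields an access token for Flows (hypothetical scope names).
access = Token("flows", frozenset({"run_flow", "run_F_on_node_A"}))
dependent = dependent_token(access, "funcx", {"run_F_on_node_A"})
print(chain(dependent))  # ['flows', 'funcx']
```

Scoping each token to a subset of its parent is what lets a user delegate "run F() on node A" without handing any service a general-purpose credential.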
Deployment at science facilities
Elements of an Integrated Research Infrastructure
Argonne Leadership Computing Facility: Polaris, Theta, Eagle store
Laboratory Computing Research Center: Bebop Cluster
Advanced Photon Source: APS Computing, Orthros Cluster, APS DM system
Portal server (two)
Key: funcX agent, Globus Connect agent
Globus-accessible storage and computing (10,000s of systems)
Foucault’s pendulum, Paris Panthéon
Problems we have solved, in part
• Authorization (“enterprise grade
security, consumer-grade usability”)
• Universal connectivity
• Location-independent computing
• Programming
New opportunities and challenges
• New devices and applications
• Massive performance and scale
• Energy as a ubiquitous constraint
• Global data space architecture
• Programming
Utility (‘60s) … xSP (’80s) … Grid (‘90s) … Cloud (2010) … Edge … Continuum …
The pendulum swings between centralization and decentralization
There are important questions to answer
• What will continuum networks look like in practice?
• Energy vs. accuracy vs. cost vs. … tradeoffs
• Where will we place computing and storage?
• What new instruments will be created?
• What new applications and new science will be enabled?
• What new abstractions, services, and tools do we need?
• Do we need new continuum-aware algorithm design methods?
• What will be the economic foundation of this new computing fabric?
• (How) Will we integrate quantum sensors, networks, computers?
Thank you for your attention!
Thanks to:
Two wonderful institutions: Argonne National Laboratory and the University of Chicago
Federal agencies for continued support: DOE, NSF, NIH, NIST
Wonderful colleagues: Rachana Ananthakrishnan, Ben Blaiszik, Pete Beckman, Charlie Catlett, Kyle Chard, Ryan Chard, Carl Kesselman, Rick Stevens, Vas Vasiliadis, Logan Ward, & many more
To learn more about our work: https://labs.globus.org
Ask questions or share your thoughts: foster@anl.gov
Experiment with tools: https://braid-project.org
https://doi.org/10.1016/j.patter.2022.100606

Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Better Information Faster: Programming the Continuum

  • 1. Better Information Faster Programming the Continuum Ian Foster
  • 2. A life in supercomputing
  • 3. A life in supercomputing Ken Kennedy 1945-2007 1998-2000
  • 4. A life in supercomputing
  • 5. A life in supercomputing … papers accepted to SC
  • 6. A life in supercomputing … papers accepted to SC … and co-authors
  • 7. A life in supercomputing … papers accepted to SC … and student and postdoc co-authors
  • 8. A life in supercomputing … papers accepted and rejected Data available
  • 9. Data available A life in supercomputing … papers accepted and rejected Future
  • 10. An overarching concern: Better information faster
  • 11. An overarching concern: Better information faster
  • 12. 90 days in 1980 An overarching concern: Better information faster
  • 13. 90 msec in 2022 An overarching concern: Better information faster
  • 14. Better information faster via computation. [Diagram labels: Q, A.] Subject to constraints on: time, computation, energy use, money, …
  • 15. Better information faster via computation. [Diagram labels: Q, A.] Compute at f ops/sec; send at b bytes/s; transfer at 0.67c in fiber. $T_{\text{answer}} = N_{\text{operations}}/f + N_{\text{bytes}}/b + 2N_{\text{meters}}/(0.67c)$. Subject to constraints on: time, computation, energy use, money, …
  • 16. Better information faster as a systems problem. Compute at f ops/sec; send at b bytes/s; transfer at 0.67c m/s. $T_{\text{answer}} = N_{\text{operations}}/f + N_{\text{bytes}}/b + 2N_{\text{meters}}/(0.67c)$. Annotations on the terms: Reduce by: shorter networks, closer computing. Reduce by: avoiding RPC. Reduce by: compression. Increase by: better optics, parallelism. Reduce by: approximation. Increase by: parallelism, specialization, centralization. Increase by: hollow core fiber, free space optics. Subject to constraints on: time, computation, energy use, money, …
  • 17. Better systems: Do many different things at once Swift, Parsl, … Swift: Parallel Computing 2007 HPF+MPI: SC’96, JPDC’97 Parsl: HPDC’19 Colmena: MLHPC’21
  • 18. Managed research acceleration services Cloud hosted for reliability + global footprint for scientific impact https://globus.org
  • 19. Collins Australian Clear School Atlas, 1964. Wellington, New Zealand to Madrid, Spain: 19,855 km. Circumference of the earth: 40,075 km. Semi-circumference: 20,037 km.
  • 20. Better information faster in a converged world. Faster components: Megascale, Gigascale, Terascale, Petascale, Exascale, Zettascale. Increased connectivity: dial-up lines, 155 Mbps, 10 Gbps, 5G, 400 Gbps, free space optics. [Figure labels: Time; Computing continuum.]
  • 21. A universal communication fabric Aalyria’s “Spacetime” [originally “Minkowski”] platform Free-space optics
  • 22. A universal communication fabric Free-space optics ”Henceforth, space for itself, and time for itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.” – Hermann Minkowski, 109 Aalyria’s “Spacetime” [originally “Minkowski”] platform
  • 23. The space-time continuum in converged systems. Misquoted [2022]: “Henceforth, location for itself, and speed for itself, are doomed to fade away into mere shadows, and only a kind of union of the two will preserve an independent reality.” The behaviors of the two computers are indistinguishable: $t(C_1) = T_1 = 0.01$ sec; $t(C_2) = T_2 + 2 \times 500 \times 5 \times 10^{-6} = 0.01$ sec. [Figure: space-time diagram with computers C1 and C2; space axis from 0 km to 500 km, time axis ticks at 2.5, 5, 7.5, and 10 ms.]
  • 24. Example: High energy physics trigger analysis. T1 = 2 seconds on CPU (not to scale); T2 = 30 msec on FPGA. Local: 2000 msec. Remote: 30 + 10 + 10 = 50 msec. 40x acceleration. [Figure: space-time diagram from 0 km (Illinois) to 2000 km (Virginia); time ticks at 10, 40, and 50 ms.] Nhan Tran, Fermilab, et al. arXiv:1904.08986
  • 25. Example: High energy diffraction microscopy @ LCLS. Model (re)training on local GPU: 1102 seconds. [Figure label: Cerebras CS-2.] Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
  • 26. Example: High energy diffraction microscopy @ LCLS. [Figure timings: 7 seconds, 19 seconds, 5 seconds. Figure label: Cerebras CS-2.] 7 + 19 + 5 = 31 s; 35x faster. Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
  • 27. Multiple techniques are used in these two examples. $T_{\text{answer}} = N_{\text{operations}}/f + N_{\text{bytes}}/b + 2N_{\text{meters}}/(0.67c)$. Annotations on the terms: Reduce by: shorter networks, closer computing. Reduce by: avoiding RPC. Reduce by: compression. Increase by: better optics, parallelism. Reduce by: approximation. Increase by: parallelism, specialization, centralization. Increase by: hollow core fiber, free space optics.
  • 28. Better information faster in the computing continuum is an application-infrastructure codesign problem. $T_{\text{answer}} = N_{\text{operations}}/f + N_{\text{bytes}}/b + 2N_{\text{meters}}/(0.67c)$. Annotations on the terms: Reduce by: shorter networks, closer computing. Reduce by: avoiding RPC. Reduce by: compression. Increase by: better optics, parallelism. Reduce by: approximation. Increase by: parallelism, specialization, centralization. Increase by: hollow core fiber, free space optics. Design questions: what to compute; what algorithms to use; where to compute; where to place computers; how to communicate; what to communicate; what networks to use; where to deploy networks.
  • 29. Continuum-aware programming Compute fabric Data fabric Trust fabric Compute anywhere Access data anywhere Ensure only authorized operations occur in trusted locations Develop efficient, reliable, reusable applications New methods and tools are needed to enable exploration and exploitation
  • 30. Continuum-aware programming Compute fabric Data fabric Trust fabric We are exploring new programming abstractions https://braid-project.org https://globus.org “Flows”
  • 31. Continuum-aware programming Compute fabric Data fabric Trust fabric Globus Transfer funcX Globus Automation Services Globus Auth We are exploring new programming abstractions and managed research acceleration services* https://braid-project.org https://globus.org *Cloud hosted for reliability + global footprint for scientific impact
  • 32. “Flows” as an abstraction for smart instruments. [Figure timings: 7 seconds, 19 seconds, 5 seconds; 7 + 19 + 5 = 31 s. Figure label: ModelTrain.] Z. Liu et al., https://doi.org/10.48550/arXiv.2105.13967
  • 33. Globus automation services: Managed research acceleration services for flow specification, execution, and management R. Chard et al., https://doi.org/10.48550/arXiv.2208.09513
  • 34. funcX: A managed research acceleration service that implements a universal computing fabric. https://funcx.org. Deploy funcX agents: $ pip install funcx-endpoint; $ funcx-endpoint configure myep; $ funcx-endpoint start myep. Register functions: def F(in_args): [# do something] return results; then fxc.register_function(F). Run functions: f = fxc.run("A", endpoint_id=ep, function_id=F); R = fxc.get_result(f). [Figure labels: F(ep, arg); F(ep, "A").] Z. Li et al., https://doi.org/10.48550/arXiv.2209.11631
  • 35. New applications mean new computing workloads. Globus Flows can invoke arbitrary functions via the funcX action provider. Functions may be executed in various locations: at a beamline, local server, cluster, cloud. [Figure: Seven flows in use at the Advanced Photon Source.] R. Vescovi et al., https://doi.org/10.1016/j.patter.2022.100606
  • 36. New applications mean new computing workloads. Globus Flows can invoke arbitrary functions via the funcX action provider. Functions may be executed in various locations: at a beamline, local server, cluster, cloud. [Figure: Execution times at the Argonne Leadership Computing Facility.] R. Vescovi et al., https://doi.org/10.1016/j.patter.2022.100606
  • 37. Globus Auth: A managed research acceleration service providing distributed authorization with delegation. Who do I trust to act on my behalf, when, and for what purpose? Leverage OAuth to: • Increase security via scoped credentials • Improve usability via browser compatibility. Token flow across the cloud-hosted services (Auth, Flows, funcX) and node A: (1) “Run flow” with consent; (2) “Provide access token for funcX”; (3) “Run F() on node A”; (4) “Provide dependent token for F() on node A”; (5) “Run F()”; (6) F() runs. Token hierarchy: consent, access token, dependent token. Not shown: token refresh. 1.3 B access tokens; 2.7 M consents. S. Tuecke et al., https://doi.org/10.1109/eScience.2016.7870901, R. Chard et al., https://doi.org/10.48550/arXiv.2208.09513
  • 38. Deployment at science facilities: Elements of an Integrated Research Infrastructure. [Figure labels: Argonne Leadership Computing Facility; Laboratory Computing Research Center; Advanced Photon Source; Polaris; Theta; Bebop Cluster; Eagle store; APS Computing; Orthros Cluster; APS DM system; Portal server (two). Key: funcX agent; Globus Connect agent; Globus-accessible storage and computing (10,000s of systems).]
  • 39. The pendulum swings between centralization and decentralization: Utility (’60s) … xSP (’80s) … Grid (’90s) … Cloud (2010) … Edge … Continuum … Problems we have solved, in part: • Authorization (“enterprise grade security, consumer-grade usability”) • Universal connectivity • Location-independent computing • Programming. New opportunities and challenges: • New devices and applications • Massive performance and scale • Energy as a ubiquitous constraint • Global data space architecture • Programming. [Image: Foucault’s pendulum, Paris Panthéon.]
  • 40. There are important questions to answer • What will continuum networks look like in practice? • Energy vs. accuracy vs. cost vs. … tradeoffs • Where will we place computing and storage? • What new instruments will be created? • What new applications and new science will be enabled? • What new abstractions, services, and tools do we need? • Do we need new continuum-aware algorithm design methods? • What will be economic foundation of this new computing fabric? • (How) Will we integrate quantum sensors, networks, computers?
  • 41. Thank you for your attention! Two wonderful institutions: Argonne National Laboratory and the University of Chicago Federal agencies for continued support: DOE, NSF, NIH, NIST Wonderful colleagues: Rachana Ananthakrishnan, Ben Blaiszik, Pete Beckman, Charlie Catlett, Kyle Chard, Ryan Chard, Carl Kesselman, Rick Stevens, Vas Vasiliadis, Logan Ward, & many more To learn more about our work: https://labs.globus.org Ask questions or share your thoughts: foster@anl.gov Experiment with tools: https://braid-project.org Thanks to: https://doi.org/10.1016/j.patter.2022.100606
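
The latency model on slides 15 and 16 and the worked numbers on slides 23 and 24 can be checked with a few lines of Python. The following is a minimal sketch, not code from the talk: the helper names are hypothetical, the operation counts and rates for slide 23 are assumed values chosen so that T1 = 0.01 s, and the data-transfer term is set to zero because the slides give no byte counts for these examples.

```python
# Minimal sketch of the slides' latency model (helper names are hypothetical).
C_M_PER_S = 299_792_458.0          # speed of light in vacuum
FIBER_M_PER_S = 0.67 * C_M_PER_S   # the slides' 0.67c propagation speed in fiber


def round_trip_s(n_meters: float) -> float:
    """Propagation term of the model: 2 * N_meters / (0.67c)."""
    return 2.0 * n_meters / FIBER_M_PER_S


def t_answer(n_operations: float, f: float, n_bytes: float, b: float,
             n_meters: float) -> float:
    """T_answer = N_operations/f + N_bytes/b + 2*N_meters/(0.67c)."""
    return n_operations / f + n_bytes / b + round_trip_s(n_meters)


# Slide 23: a local computer C1 versus a computer C2 that is twice as fast
# but 500 km away. Operation count and rates are assumed; n_bytes = 0 means
# no data transfer is modeled.
t_c1 = t_answer(n_operations=1e7, f=1e9, n_bytes=0, b=1.0, n_meters=0.0)
t_c2 = t_answer(n_operations=1e7, f=2e9, n_bytes=0, b=1.0, n_meters=500e3)

# Slide 24: 2 s on a local CPU versus 30 ms on an FPGA 2000 km away.
local_s = 2.0
remote_s = 0.030 + round_trip_s(2000e3)
speedup = local_s / remote_s

print(f"t(C1) = {t_c1 * 1e3:.2f} ms, t(C2) = {t_c2 * 1e3:.2f} ms")
print(f"remote = {remote_s * 1e3:.1f} ms, speedup = {speedup:.1f}x")
```

Under these assumptions both computers answer in about 10 ms, and the remote FPGA comes out roughly 40 times faster than the local CPU, in line with the slides. Real network paths add routing and queuing delay that this model omits.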

Editor's notes

  1. A brief retrospective of my life from the perspective of the SC conference. Here is me arriving at Argonne in January 1989. Jack Dongarra had just left for Tennessee.
  2. I had the great good fortune to be welcomed soon after my arrival into the Center for Research on Parallel Computation, an NSF Science and Technology Center, led by Ken Kennedy. CRPC under Ken’s leadership was a remarkably collegial and productive community. Ken was also very supportive of me, for example by providing resources that allowed me to recruit a young 22-year-old hot shot, Steve Tuecke.
  3. Steve and I worked together for 30 years until his recent passing. So much of what I talk about here was dreamed up in joint brainstorming sessions. I dedicate this talk to his memory.
  4. But to return to the SC conference. I have published 30 papers here since 1993, my first SC conference.
  5. Here are the faces of 91 of my 97 wonderful SC co-authors. Time is too short to call them out by name, but what exceptional people. The people that we work with are much of what makes HPC a joy.
  6. Many of these were students or postdocs at the time. Now, you may be thinking that publishing 30 papers in 30 years suggests a consistent ability to impress SC reviewers.
  7. But with help from Mark Montague from Linklings, I refreshed my memory of all papers from 2011 onwards. I can reveal here that some were less appreciated.
  8. Furthermore, applying the powerful AI method of linear regression, I can build a predictor of my future personal acceptance rates. My last paper to be accepted at SC will be in 2023, and only if I submit around 20 papers. The power of data.
  9. Back to the theme of my talk. I have always been fascinated by the question of how to use computing to deliver better information faster to decision makers.
  10. Perhaps this relates to growing up in New Zealand, which is a long way from anywhere.
  11. Indeed, when I left in 1980, which for idiosyncratic reasons I did on a sailboat, it took me 90 days to reach North America.
  12. Now we can transmit information much faster, reaching all the way to Dallas in just 90 msec: close to 100M times faster. But that is still quite slow for many purposes, and not directly improvable further, given light speed limits.
  13. So let’s look briefly at what is involved in obtaining useful information. We send a message to retrieve an answer, which often requires computation.
  14. The time required to get that answer depends on the number of operations, data communication needs, the distance to be traversed, and the speeds of computation and communication.
  15. We can accelerate each element in this equation in numerous ways, from better algorithms that reduce the number of operations to be performed to better computers and networks. This complexity is of course what makes HPC so much fun. My colleagues and I have worked to optimize various of these elements over the years. Let me mention just a few of those activities.
  16. One area of focus has been using HPC systems to do many different things at once. Back in the day, HPC systems were used only for single program multiple data, or SPMD, computations. To escape this straitjacket, my colleagues and I have developed new methods and tools, like Strand, Swift, Parsl, and Colmena, that allow applications that combine different components to be defined and executed. These tools have been used to build large heterogeneous applications that, for example, perform AI-guided simulation. Many talented contributors here; I will call out Mike Wilde, Justin Wozniak, Kyle Chard, Shantenu Jha, and Logan Ward as leaders of individual projects. Today, such applications are largely mainstream, although arguably our runtimes and operating systems still need to catch up.
  17. As a second example of work intended to produce better information faster, I will describe the work of the Globus team on science services to accelerate complex tasks, for example, by making fast and reliable data transfer a one-click operation. The Globus system, developed and operated by a wonderful team at the University of Chicago, uses software running on the Amazon cloud to manage data transfers, and more, with extreme reliability and high performance: what we may call a managed research acceleration service. I created this animation of Globus transfers from late 2010 to today. The x-axis is the great circle distance between source and destination, from 0 to 20,000 kilometers. The y-axis is the transfer size, from 100MB to several petabytes. Each dot represents a transfer. I also track total exabytes and gigafiles. I label a few source-destination pairs. You’ll see a large gap around 5000 km. It turns out that the sizes of the continents and oceans mean that few communications occur over that distance. You may also see a point to the far right at 19855 kilometers. Can any two places be that far apart?
  18. It turns out those dots represent transfers between Madrid and Wellington, and those two cities are about as far apart on the globe as you can get, as shown in this map from a childhood atlas explaining the concept of “antipodes.”
  19. Now to the title of my talk, computing in the continuum. Jack talked yesterday about how computers have grown ever faster. Over that same period, networks have become not only faster but also immensely more reliable. We can now envision a world in which barriers to remote computing, other than the speed of light, disappear. When computation can be accessed remotely as easily as locally, we may talk about a computing continuum.
  20. One aspect of this new world is that the large white spaces in our maps, in which little or no communication is possible, will be filled in. 5G and 6G are one reason. Another technology that may be even more transformative is coherent light free-space optics: that is, space lasers. This image depicts such a device, on the right, and the “Spacetime” platform that Aalyria, a Google spinout, is developing to provide global communication. The original name for their platform was Minkowski, after Hermann Minkowski.
  21. Hermann Minkowski, a colleague of Einstein’s, observed about the special theory of relativity that it defined a space-time continuum.
  22. With universal high-speed communications, we may adapt his words to refer to location and speed. The speed of a computer, as perceived by its user, is not an absolute, but rather varies with its distance, as shown here, where a computer that is twice as fast but 500 km distant takes the same time to return an answer as a local computer.
  23. For example, Nhan Tran of Fermilab observes that he can accelerate a trigger analysis code from 2000 to 50 ms by running it on an FPGA. The FPGA is in an Amazon data center in Virginia, 10 ms away. The computation itself takes 30 msec and the network adds 10 msec in each direction, for a total of 50 msec. The net result is a 40x acceleration.
  24. Another example, involving work by Zhengchun Liu and others at Argonne and SLAC. High energy diffraction microscopy experiments at the Linac Coherent Light Source use a deep neural network, running on a GPU, to extract information from high-rate measurements of microstructure evolution in materials. The GPU is not powerful enough, however, for the periodic retraining required to respond to structural deformation.
  25. Dispatching requests to a Cerebras system at Argonne reduces this time to 31 seconds: 12 seconds to move data and the new model, and 19 seconds for training. That is 35x faster.
  26. Several techniques are used in these two examples. Scientists use neural networks as efficient approximators, stream data, and use parallelism, specialized hardware, and data center AI systems. ESnet is increasing communication rates and reducing latency, and ALCF is deploying specialized computers.
  27. In general, we are solving a massive co-design problem: designing applications, …
  28. We have identified the following key requirements. Surely not a complete list, but all essential.
  29. As I’ll briefly describe, we are developing new methods in each area to allow experimentation. One is a notation for expressing data pipelines.
  30. Others include methods for managing flows, computing anywhere, accessing data anywhere, and delegating credentials. As with Globus Transfer, we make extensive use of cloud-hosted research acceleration services, which, I remind you, combine cloud-hosted management logic for high reliability with local agents for global footprint.
  31. Flows are built up from simple operations like data transfer, computation, data cataloging.
  32. A wonderful team at the University of Chicago led by Rachana Ananthakrishnan has developed Globus Automation Services to manage the execution of such flows. As a cloud-hosted service, it can manage flows that run for seconds or months.
  33. Another managed research acceleration service, funcX, led by Kyle Chard, …
  34. I’ll mention that these new applications represent a new source of increasingly different workloads. Here, for example, I show the average duration of each action across thousands of instances of seven synchrotron light source flows. Durations range from seconds to hours. Globus services ensure reliable execution.
  35. I’ll mention that these new applications represent a new source of increasingly different workloads. Here, for example, I show the average duration of each action across thousands of instances of seven synchrotron light source flows. Durations range from seconds to hours. Globus services ensure reliable execution.
  36. Another essential building block, this one operational for several years, is the Globus Auth identity and access management platform service. We need flows to perform actions on instruments, computers, repositories, and networks at remote locations. To do this securely, we provide consent and token mechanisms for controlling what agents acting on a user’s behalf can do.
  37. Using managed research automation services, we can create what Ben Brown of DOE calls an “integrated research infrastructure” by deploying local agents at each system that is to be integrated into the continuum. Here I show what this means at Argonne. Most beamlines at the Advanced Photon Source run Globus Connect agents, and a variety of computers run funcX agents.
  38. You might ask, how are the problems and solutions that I have presented here different from those deployed 10, 20, or 30 years ago? Dan Reed has observed that computing differs from other sciences in that the questions often stay the same while the answers change. Far more reliable and performant networks, and effective containerization, allow us to revisit old answers. And new concerns, like energy, result in new questions.
  39. Indeed, our early experiences programming the continuum suggest numerous other questions.
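
Notes 22 and 23 compare local and remote execution. One way to explore that tradeoff is to ask how far away a faster computer can be before its advantage disappears. The sketch below rearranges the talk's latency model to solve for distance; the function name and the zero default for transfer time are assumptions, and routing and queuing delays are ignored.

```python
# Hypothetical helper: break-even distance under the talk's latency model.
FIBER_M_PER_S = 0.67 * 299_792_458.0  # 0.67c, as on the slides


def break_even_km(local_s: float, remote_compute_s: float,
                  transfer_s: float = 0.0) -> float:
    """Largest one-way distance (km) at which remote execution still wins.

    Remote wins while remote_compute_s + transfer_s + 2*d/(0.67c) < local_s,
    so the break-even distance is d = (0.67c / 2) * slack.
    """
    slack_s = local_s - remote_compute_s - transfer_s
    if slack_s <= 0:
        return 0.0  # the remote computer is not faster even at zero distance
    return slack_s * FIBER_M_PER_S / 2.0 / 1000.0


SEMI_CIRCUMFERENCE_KM = 20_037  # from the atlas slide

# A computer twice as fast as a 10 ms local one: break-even near 500 km.
twice_as_fast_km = break_even_km(local_s=0.010, remote_compute_s=0.005)

# The trigger analysis: 2 s locally versus 30 ms on a remote FPGA.
trigger_km = break_even_km(local_s=2.0, remote_compute_s=0.030)

print(f"twice-as-fast break-even: {twice_as_fast_km:.0f} km")
print(f"trigger break-even: {trigger_km:.0f} km "
      f"(semi-circumference: {SEMI_CIRCUMFERENCE_KM} km)")
```

In this simplified model the trigger analysis would benefit from the remote FPGA anywhere on Earth, since its break-even distance exceeds the semi-circumference, while a merely twice-as-fast computer stops paying off at about 500 km.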