SlideShare ist ein Scribd-Unternehmen logo
1 von 25
Downloaden Sie, um offline zu lesen
Vas Vasiliadis
vas@uchicago.edu
February 28, 2024
Reliable, Remote Computation at all Scales
Why do we need remote computing?
Workloads
• Interactive, real-time
workloads
• Machine learning training
and inference
• Components may best be
executed in different places
Users
• Diverse backgrounds
and expertise
• Different user interfaces
(e.g., notebooks)
Resources
• Hardware specialization
• Specialization leads to
distribution
Why do researchers “hate” remote computing?
• Remote computing is notoriously complicated
– Authentication
– Network connections
– Configuring/managing jobs
– Interacting with resources (waiting in queues, scaling nodes)
– Configuring execution environment
– Getting results back again
• Researchers need to overcome the same obstacles
every time they move to a new resource
3
How can we bridge this gap?
Borrow page from data management playbook
è“Fire-and-forget” computation
èUniform access interface
èFederated access control
èFast networks making compute resource “local”
How can we bridge this gap?
Move closer to researchers’ environments
èResearchers primarily work in high level languages
èFunctions are a natural unit of computation
Function-as-a-Service (FaaS)!
• Researchers work in familiar language, e.g., Python…
• …using familiar interfaces, e.g., Jupyter
FaaS as offered by cloud providers
6
• Single provider, single
location to submit and
manage tasks
• Homogenous execution
environment
• Transparent and elastic
execution (of even very
small tasks)
• Integrated with
cloud provider data
management
Cloud
Provider 1
Cloud
Provider 2
FaaS as an interface to the advanced computing
ecosystem
7
• Multiple provider,
multiple location, single
interface
• Homogenous execution
environment
• Transparent and elastic
execution
• Integrated with
institution-wide data
management
Globus Compute helps bridge the gap
Managed, federated
Function-as-a-Service for
reliably, scaleably and
securely executing functions
on remote endpoints from
laptops to supercomputers
The Globus Compute model
• Compute service — Highly available cloud-hosted
service for managed, fire-and-forget function execution
• Compute endpoint — Abstracts access to compute
resources, from edge device to supercomputer
• Compute SDK — Python interface for interacting with the
service, suing the familiar (to many) Globus look and feel
• Security — Leverages Globus Auth; Compute endpoints
are resource servers, authN and access via OAuth tokens
9
User interaction with Globus Compute
10
A B
You request a function be
executed on endpoints A and B
1
2 Globus Compute manages
the reliable and secure
execution on these endpoints
3
Globus Compute returns results or
stores them until requested
Compute
Service
You register your code
(as a Python function)
0
A compute
resource
Another
compute
resource
Globus Compute transforms any computing
resource into a function serving endpoint
• Python pip installable agent
• Elastic resource provisioning
from local, cluster, or cloud
system (via Parsl)
• Parallel execution using local
fork or via common schedulers
– Slurm, PBS, LSF, Cobalt, K8s
11
Compute
Service
Executing functions with Globus Compute
• Users invoke functions as tasks
– Register Python function
– Pass input arguments
– Select endpoint(s)
• Service stores tasks in the cloud
• Endpoints fetch waiting tasks (when
online), run tasks, and return results
• Results stored in the cloud and on
Globus storage endpoints
• Users retrieve results asynchronously
12
Compute
Service
Common Use Cases
13
Use case 1: fire-and-forget execution
Executing a bag of tasks, e.g., running simulations with
different parameters, executing ML inferences, on one
or more remote computers directly from your
environment, e.g., Jupyter on your laptop
• Fire-and-forget execution managed by Globus Compute;
tasks/results cached until endpoint/client are online
• Portability across different systems—optionally making use of
specialized hardware
• Elastic scaling to provision resources as needed (from HPC and
cloud systems)
ML-based drug screening
15
National Virtual Biotechnology Laboratory, arXiv:2006.02431
Use case 2: automated analysis of data
Constructing and running automated analysis
pipelines with data processing steps that need to be
executed in different locations
• Automatically process data as they are acquired (event- and
workflow-based)
• Integrate with data movement and other actions—human and
machine
• Execute functions across the computing continuum e.g., near
instrument, in data center, on specialized hardware
Enabling serial crystallography at scale
• Serially image chips with
000’s of embedded crystals
• Quality control first 1,000 to
report failures
• Analyze batches of images as
they are collected
• Report statistics and images
during experiment
• Return structure to scientist
Darren Sherrell, Gyorgy Babnigg, Andrzej Joachimiak
Globus
Flows
How remote computing enables this use case
Image
processing
Local workstation
Carbon!
Check
threshold
Data publication
Transfer
Transfer
raw files
Transfer
Move results
to repo
Compute
Analyze
images
Compute
Visualize
Compute
Gather
metadata
Share
Set access
controls
Compute
Launch QA
job
Search
Ingest to
index
Supercomputer
GPU Cluster
Globus enables “smart” instruments
19
aining of Deep Neural Networks
ources
7sec
19sec
5sec
31sec.
cycle time
Seems too easy? Let’s take a look…
• Run function on my laptop …test, debug
• Scale (a little) on a cloud VM (or K8s cluster)
• Scale (a bit more) on my campus cluster
• …run complete model on DOE supercomputer :--(
20
Common pattern of computing in a flow
Transfer raw
instrument
images
Run a compute
job to process
image files
Transfer Compute
Move processed
images to sharing
system/repository
Set access
controls for
sharing data
Share
Transfer
1 2 3 4
EC2
Instance
“Instrument”
EC2
Instance
Compute
Resource
Our environment Registered
Function
Monitor
script
transfer control
access
result files
0 trigger
flow run
transfer
raw files
1
invoke image
processing function 2
set
permissions
4
transfer
result files
3
Data
Endpoint
Data
Endpoint
Data
Endpoint
ALCF
Eagle
Compute
Endpoint
Globus Compute current state
• Service is running in production
– Tried and tested by 10MMs of invocations on 100Ks endpoints
• Globus Computer Agent supports single user
– Akin to Globus Connect Personal
• Endpoints visible via Globus web app
• Function registration and invocation via SDK
23
Compute service will evolve rapidly
• Multi-user compute endpoints
– Akin to Globus Connect Server
• Native integration with transfer for stage in and stage
out of data for compute tasks
– Can be done today via Globus Flows
• Expanding compute service interfaces in the webapp
for administrators and users
Support resources
• Globus documentation: docs.globus.org
• YouTube channel: youtube.com/GlobusOnline
• Helpdesk: support@globus.org
• Mailing Lists: globus.org/mailing-lists
• Customer engagement team (office hours)
• Professional services team (advisory, custom work)

Weitere ähnliche Inhalte

Ähnlich wie Reliable, Remote Computation at All Scales

Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudDatadog
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduriRavi Madduri
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterIan Foster
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASAIan Foster
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCOlga Lavrentieva
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!Ian Foster
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Openstack For Beginners
Openstack For BeginnersOpenstack For Beginners
Openstack For Beginnerscpallares
 
StarlingX - Project Onboarding
StarlingX - Project OnboardingStarlingX - Project Onboarding
StarlingX - Project OnboardingShuquan Huang
 
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmGenomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmDmitri Zimine
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesIan Foster
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systemsSri Prasanna
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user storylaurabeckcahoon
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?inside-BigData.com
 
Globus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio, Inc.
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 

Ähnlich wie Reliable, Remote Computation at All Scales (20)

Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
 
Scaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and JupyterScaling collaborative data science with Globus and Jupyter
Scaling collaborative data science with Globus and Jupyter
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASA
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPC
 
Taming Big Data!
Taming Big Data!Taming Big Data!
Taming Big Data!
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Openstack For Beginners
Openstack For BeginnersOpenstack For Beginners
Openstack For Beginners
 
StarlingX - Project Onboarding
StarlingX - Project OnboardingStarlingX - Project Onboarding
StarlingX - Project Onboarding
 
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmGenomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systems
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
 
Globus Labs: Forging the Next Frontier
Globus Labs: Forging the Next FrontierGlobus Labs: Forging the Next Frontier
Globus Labs: Forging the Next Frontier
 
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & AlluxioAlluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
Alluxio 2.0 & Near Real-time Big Data Platform w/ Spark & Alluxio
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 

Mehr von Globus

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration TopicsGlobus
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowGlobus
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaSGlobus
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusGlobus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for ResearchersGlobus
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with GlobusGlobus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System AdministratorsGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersGlobus
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersGlobus
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Globus
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeGlobus
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System AdministratorsGlobus
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New UsersGlobus
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsGlobus
 
Globus Automation
Globus AutomationGlobus Automation
Globus AutomationGlobus
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System AdministrationGlobus
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to GlobusGlobus
 

Mehr von Globus (20)

Advanced Globus System Administration Topics
Advanced Globus System Administration TopicsAdvanced Globus System Administration Topics
Advanced Globus System Administration Topics
 
Instrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a FlowInstrument Data Automation: The Life of a Flow
Instrument Data Automation: The Life of a Flow
 
Building Research Applications with Globus PaaS
Building Research Applications with Globus PaaSBuilding Research Applications with Globus PaaS
Building Research Applications with Globus PaaS
 
Best Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using GlobusBest Practices for Data Sharing Using Globus
Best Practices for Data Sharing Using Globus
 
An Introduction to Globus for Researchers
An Introduction to Globus for ResearchersAn Introduction to Globus for Researchers
An Introduction to Globus for Researchers
 
Introduction to Research Automation with Globus
Introduction to Research Automation with GlobusIntroduction to Research Automation with Globus
Introduction to Research Automation with Globus
 
Globus for System Administrators
Globus for System AdministratorsGlobus for System Administrators
Globus for System Administrators
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for ResearchersIntroduction to Data Transfer and Sharing for Researchers
Introduction to Data Transfer and Sharing for Researchers
 
Introduction to the Globus Platform for Developers
Introduction to the Globus Platform for DevelopersIntroduction to the Globus Platform for Developers
Introduction to the Globus Platform for Developers
 
Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)Introduction to the Command Line Interface (CLI)
Introduction to the Command Line Interface (CLI)
 
Automating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and ComputeAutomating Research Data with Globus Flows and Compute
Automating Research Data with Globus Flows and Compute
 
Automating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus PlatformAutomating Research Data Flows and Introduction to the Globus Platform
Automating Research Data Flows and Introduction to the Globus Platform
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus for System Administrators
Introduction to Globus for System AdministratorsIntroduction to Globus for System Administrators
Introduction to Globus for System Administrators
 
Introduction to Globus for New Users
Introduction to Globus for New UsersIntroduction to Globus for New Users
Introduction to Globus for New Users
 
Working with Globus Platform Services and Portals
Working with Globus Platform Services and PortalsWorking with Globus Platform Services and Portals
Working with Globus Platform Services and Portals
 
Globus Automation
Globus AutomationGlobus Automation
Globus Automation
 
Advanced Globus System Administration
Advanced Globus System AdministrationAdvanced Globus System Administration
Advanced Globus System Administration
 
Introduction to Globus
Introduction to GlobusIntroduction to Globus
Introduction to Globus
 

Kürzlich hochgeladen

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 

Kürzlich hochgeladen (20)

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

Reliable, Remote Computation at All Scales

  • 1. Vas Vasiliadis vas@uchicago.edu February 28, 2024 Reliable, Remote Computation at all Scales
  • 2. Why do we need remote computing? Workloads • Interactive, real-time workloads • Machine learning training and inference • Components may best be executed in different places Users • Diverse backgrounds and expertise • Different user interfaces (e.g., notebooks) Resources • Hardware specialization • Specialization leads to distribution
  • 3. Why do researchers “hate” remote computing? • Remote computing is notoriously complicated – Authentication – Network connections – Configuring/managing jobs – Interacting with resources (waiting in queues, scaling nodes) – Configuring execution environment – Getting results back again • Researchers need to overcome the same obstacles every time they move to a new resource 3
  • 4. How can we bridge this gap? Borrow page from data management playbook è“Fire-and-forget” computation èUniform access interface èFederated access control èFast networks making compute resource “local”
  • 5. How can we bridge this gap? Move closer to researchers’ environments èResearchers primarily work in high level languages èFunctions are a natural unit of computation Function-as-a-Service (FaaS)! • Researchers work in familiar language, e.g., Python… • …using familiar interfaces, e.g., Jupyter
  • 6. FaaS as offered by cloud providers 6 • Single provider, single location to submit and manage tasks • Homogenous execution environment • Transparent and elastic execution (of even very small tasks) • Integrated with cloud provider data management Cloud Provider 1 Cloud Provider 2
  • 7. FaaS as an interface to the advanced computing ecosystem 7 • Multiple provider, multiple location, single interface • Homogenous execution environment • Transparent and elastic execution • Integrated with institution-wide data management
  • 8. Globus Compute helps bridge the gap Managed, federated Function-as-a-Service for reliably, scaleably and securely executing functions on remote endpoints from laptops to supercomputers
  • 9. The Globus Compute model • Compute service — Highly available cloud-hosted service for managed, fire-and-forget function execution • Compute endpoint — Abstracts access to compute resources, from edge device to supercomputer • Compute SDK — Python interface for interacting with the service, suing the familiar (to many) Globus look and feel • Security — Leverages Globus Auth; Compute endpoints are resource servers, authN and access via OAuth tokens 9
  • 10. User interaction with Globus Compute 10 A B You request a function be executed on endpoints A and B 1 2 Globus Compute manages the reliable and secure execution on these endpoints 3 Globus Compute returns results or stores them until requested Compute Service You register your code (as a Python function) 0 A compute resource Another compute resource
  • 11. Globus Compute transforms any computing resource into a function serving endpoint • Python pip installable agent • Elastic resource provisioning from local, cluster, or cloud system (via Parsl) • Parallel execution using local fork or via common schedulers – Slurm, PBS, LSF, Cobalt, K8s 11 Compute Service
  • 12. Executing functions with Globus Compute • Users invoke functions as tasks – Register Python function – Pass input arguments – Select endpoint(s) • Service stores tasks in the cloud • Endpoints fetch waiting tasks (when online), run tasks, and return results • Results stored in the cloud and on Globus storage endpoints • Users retrieve results asynchronously 12 Compute Service
  • 14. Use case 1: fire-and-forget execution Executing a bag of tasks, e.g., running simulations with different parameters, executing ML inferences, on one or more remote computers directly from your environment, e.g., Jupyter on your laptop • Fire-and-forget execution managed by Globus Compute; tasks/results cached until endpoint/client are online • Portability across different systems—optionally making use of specialized hardware • Elastic scaling to provision resources as needed (from HPC and cloud systems)
  • 15. ML-based drug screening 15 National Virtual Biotechnology Laboratory, arXiv:2006.02431
  • 16. Use case 2: automated analysis of data Constructing and running automated analysis pipelines with data processing steps that need to be executed in different locations • Automatically process data as they are acquired (event- and workflow-based) • Integrate with data movement and other actions—human and machine • Execute functions across the computing continuum e.g., near instrument, in data center, on specialized hardware
  • 17. Enabling serial crystallography at scale • Serially image chips with 000’s of embedded crystals • Quality control first 1,000 to report failures • Analyze batches of images as they are collected • Report statistics and images during experiment • Return structure to scientist Darren Sherrell, Gyorgy Babnigg, Andrzej Joachimiak
  • 18. Globus Flows How remote computing enables this use case Image processing Local workstation Carbon! Check threshold Data publication Transfer Transfer raw files Transfer Move results to repo Compute Analyze images Compute Visualize Compute Gather metadata Share Set access controls Compute Launch QA job Search Ingest to index Supercomputer GPU Cluster
  • 19. Globus enables “smart” instruments 19 aining of Deep Neural Networks ources 7sec 19sec 5sec 31sec. cycle time
  • 20. Seems too easy? Let’s take a look… • Run function on my laptop …test, debug • Scale (a little) on a cloud VM (or K8s cluster) • Scale (a bit more) on my campus cluster • …run complete model on DOE supercomputer :--( 20
  • 21. Common pattern of computing in a flow Transfer raw instrument images Run a compute job to process image files Transfer Compute Move processed images to sharing system/repository Set access controls for sharing data Share Transfer 1 2 3 4
  • 22. EC2 Instance “Instrument” EC2 Instance Compute Resource Our environment Registered Function Monitor script transfer control access result files 0 trigger flow run transfer raw files 1 invoke image processing function 2 set permissions 4 transfer result files 3 Data Endpoint Data Endpoint Data Endpoint ALCF Eagle Compute Endpoint
  • 23. Globus Compute current state • Service is running in production – Tried and tested by 10MMs of invocations on 100Ks endpoints • Globus Computer Agent supports single user – Akin to Globus Connect Personal • Endpoints visible via Globus web app • Function registration and invocation via SDK 23
  • 24. Compute service will evolve rapidly • Multi-user compute endpoints – Akin to Globus Connect Server • Native integration with transfer for stage in and stage out of data for compute tasks – Can be done today via Globus Flows • Expanding compute service interfaces in the webapp for administrators and users
  • 25. Support resources • Globus documentation: docs.globus.org • YouTube channel: youtube.com/GlobusOnline • Helpdesk: support@globus.org • Mailing Lists: globus.org/mailing-lists • Customer engagement team (office hours) • Professional services team (advisory, custom work)