We introduce the various Globus approaches available for automating data flows, including the command line interface (CLI), the Globus Timer service and the Globus Flows service.
2. Globus Automation Capabilities
• Timer service: scheduled and recurring transfers (a.k.a. Globus cron)
• Command line interface: ad hoc scripting and integration
• Globus Flows service: comprehensive task (data and compute) orchestration with human-in-the-loop interactions
3. “Simple” Automation Use Cases
• Data backup – as user, as system
• Stage data in or out as part of a compute job
• Portal/science gateway submits a transfer of compute results as the user
• Portal/science gateway monitors a user's transfer and initiates processing or backup of the data
4. Recurring transfers with sync option
• Example: copy /ingest daily at 3:30 am
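The daily 3:30 am schedule on this slide comes down to one rule: find the next occurrence of a fixed wall-clock time. The sketch below shows that arithmetic in plain Python. It does not call the Timer service; the function name `next_daily_run` is hypothetical, and time zones are deliberately ignored.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now, at=time(3, 30)):
    """Return the first occurrence of `at` strictly after `now` (naive local time)."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate


# At 04:00 the 03:30 slot is gone, so the next run falls on the following day.
print(next_daily_run(datetime(2024, 1, 1, 4, 0)))
```

A real recurring transfer would also carry the source, destination, and sync settings; only the scheduling step is shown here.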
7. Globus Command Line Interface
• Automation of simple data management tasks
• Integration with existing scripts (job submission …)
• Open source, uses the Python SDK
8. Commands refer to resources by UUID
• UUIDs identify endpoints, tasks, user identities, groups…
• Use the search/list commands to look up UUIDs
• Use get-identities to map an identity username to its UUID
$ globus endpoint search 'Tutorial Endpoint 1'
$ globus task list
$ globus get-identities vas@globusid.org
bfc122a3-af43-43e1-8a41-d36f28a2bc0a
9. Parsing CLI output
• Default output is text; for JSON output use --format json
$ globus endpoint search --filter-scope my-endpoints
$ globus endpoint search --filter-scope my-endpoints --format json
• Extract specific attributes using --jmespath <expression>
$ globus endpoint search --filter-scope my-endpoints --jmespath 'DATA[].[id, display_name]'
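When the JSON output is consumed from a script rather than filtered with --jmespath, the same projection takes a few lines of Python. The sketch below assumes the output contains a top-level DATA list whose items have id and display_name keys, which is what the JMESPath expression above implies. The sample document, UUIDs, and names are invented for illustration.

```python
import json

# Hypothetical sample shaped like the DATA list that the JMESPath expression
# on this slide assumes; the UUIDs and names are made up.
SAMPLE_OUTPUT = """
{
  "DATA": [
    {"id": "11111111-1111-1111-1111-111111111111", "display_name": "Lab Storage"},
    {"id": "22222222-2222-2222-2222-222222222222", "display_name": "Laptop"}
  ]
}
"""


def ids_and_names(cli_json):
    """Return [[id, display_name], ...], mirroring DATA[].[id, display_name]."""
    document = json.loads(cli_json)
    # .get() yields None for a missing key, similar to a JMESPath null.
    return [[item.get("id"), item.get("display_name")] for item in document["DATA"]]


for endpoint_id, name in ids_and_names(SAMPLE_OUTPUT):
    print(endpoint_id, name)
```

In a real script the string would come from the captured standard output of the CLI command instead of a literal.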
11. A simple, yet very common use case
• Step 1, Transfer: transfer data
• Step 2, Share: set access controls for sharing data
• Analyze raw data from an instrument
• Distribute results from computation
12. Key Globus capabilities for automation
• Applications are first-class entities
– Register the application at developers.globus.org
– Application identity: <client_id>@clients.auth.globus.org
• Guest collections
– No human in the loop for data access
– Creation of guest collection requires user authentication
13. Key Globus capabilities for automation
• Permissions management can be delegated
– Applications can be access managers
• Applications can renew tokens
– Refresh tokens are issued along with access tokens
– A refresh token can be used to get new access tokens
– A refresh token stays valid for 6 months after its last use
– Rescinding consent revokes the refresh token
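The renewal pattern in these bullets can be sketched without any network calls: keep the access token until it is about to expire, then exchange the refresh token for a new one. The names `Tokens`, `current_access_token`, and `fake_refresh` are hypothetical, and the 60-second leeway is an arbitrary choice. The Globus Python SDK provides its own authorizer classes for this, which the sketch does not use.

```python
import time
from dataclasses import dataclass


@dataclass
class Tokens:
    access_token: str
    expires_at: float  # expiry as epoch seconds
    refresh_token: str


def current_access_token(tokens, refresh, now=None, leeway=60.0):
    """Return a usable access token, renewing it through `refresh` when needed.

    `refresh` is any callable that takes a refresh token and returns
    (new_access_token, new_expires_at).
    """
    now = time.time() if now is None else now
    if tokens.expires_at - leeway <= now:
        tokens.access_token, tokens.expires_at = refresh(tokens.refresh_token)
    return tokens.access_token


def fake_refresh(refresh_token):
    # Stand-in for the real token exchange with the auth service.
    return "new-access-token", 2000.0


tokens = Tokens("old-access-token", expires_at=1000.0, refresh_token="example-refresh-token")
print(current_access_token(tokens, fake_refresh, now=500.0))  # still valid, reused
print(current_access_token(tokens, fake_refresh, now=990.0))  # inside the leeway, renewed
```

Passing the exchange in as a callable keeps the expiry logic testable without credentials.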
14. Examples: automation using CLI
github.com/globus/automation-examples
• ./share_data.sh
– Transfer a folder and set permissions for a user
• ./cli-sync.sh
– Sync one folder with another
• See README for installation
• Python scripts that use the SDK
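A folder sync like the one in cli-sync.sh can also be driven from Python by building the CLI command and handing it to subprocess. This is a rough analogue and not a transcription of the repository's script: the helper names are hypothetical, the endpoint IDs are placeholders, and the choice of checksum as the sync level is an assumption.

```python
import shlex
import subprocess


def build_sync_command(src_endpoint, src_path, dst_endpoint, dst_path,
                       label="folder sync example"):
    """Build a `globus transfer` command line that syncs one folder to another."""
    return [
        "globus", "transfer",
        "--recursive",
        "--sync-level", "checksum",  # copy only files whose checksums differ
        "--label", label,
        f"{src_endpoint}:{src_path}",
        f"{dst_endpoint}:{dst_path}",
    ]


def run_sync(argv):
    """Run the command; requires an installed, logged-in Globus CLI."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


# Placeholder endpoint IDs; substitute real UUIDs before calling run_sync().
command = build_sync_command("SOURCE-ENDPOINT-UUID", "/data/raw/",
                             "DEST-ENDPOINT-UUID", "/backup/raw/")
print(shlex.join(command))
```

Building the argument list separately from running it makes the command easy to log and to test without contacting Globus.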
16. CityCOVID
• Integrated COVID-19 pandemic monitoring, modeling, and analysis capability
• CityCOVID is a city-scale agent-based model
• Automate flow
– Scrape daily Chicago reports
– Perform simulations at ALCF
– Postprocess data at LCRC
Jonathan Ozik, Nick Collier, and Charles Macal
17. Enabling serial crystallography at scale
• Serially image chips with thousands of embedded crystals
• Quality control first 1,000 to report failures
• Analyze batches of images as they are collected
• Report statistics and images during experiment
• Return crystal structure to scientist
Darren Sherrell, Gyorgy Babnigg, Andrzej Joachimiak
18. Automation using the Globus platform
Managed, secure, reliable task orchestration across heterogeneous resources, using a declarative language for composition and an event-driven execution model, extensible via custom actions, for automation at scale
19. The Globus Flows service
• Flows: A platform service for defining, applying, and sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* Coming soon
20. Create and deploy flows
• Define the flow and deploy to Flows service
• Uses declarative language (JSON or YAML)
• Set policy: visibility, runnable by
[Figure: two example flows, one a sequence of Actions 1 to 4, the other using a Choice state among Actions 1 to 5]
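To make the declarative definition concrete, the sketch below writes a two-step flow as a Python dictionary (which serializes directly to JSON) and runs a small consistency check on it. The StartAt and States keys appear elsewhere in this deck; the remaining field names follow the state-machine style that Flows definitions use as far as I know, and should be checked against the Flows documentation. The state names, example.org URLs, and parameters are hypothetical, and `check_flow` handles only simple action states, not Choice states.

```python
import json

flow_definition = {
    "Comment": "Transfer data, then set permissions (illustrative only)",
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            "ActionUrl": "https://example.org/hypothetical/transfer",  # placeholder URL
            "Parameters": {"source_path.$": "$.input.source_path"},
            "ResultPath": "$.TransferResult",
            "Next": "SetPermission",
        },
        "SetPermission": {
            "Type": "Action",
            "ActionUrl": "https://example.org/hypothetical/set_permission",  # placeholder URL
            "Parameters": {"path.$": "$.input.destination_path"},
            "ResultPath": "$.PermissionResult",
            "End": True,
        },
    },
}


def check_flow(definition):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    states = definition.get("States", {})
    if definition.get("StartAt") not in states:
        problems.append("StartAt does not name a defined state")
    for name, state in states.items():
        target = state.get("Next")
        if target is not None and target not in states:
            problems.append(f"{name}: Next points to unknown state {target!r}")
        if target is None and not state.get("End"):
            problems.append(f"{name}: needs either Next or End")
    return problems


print(check_flow(flow_definition))
print(json.dumps(flow_definition, indent=2))
```

A check like this catches dangling state references before the definition is submitted for deployment.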
21. Start and manage runs
• A run is an instance of flow execution
– Provide input parameters
– Check status
– Cancel
• Set policy: monitor, manager
• Triggers to start flows
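Checking the status of a run usually means polling until it reaches a final state. The sketch below shows that loop with the status lookup injected as a callable, so it runs without contacting any service. The function name `wait_for_run` is hypothetical, and the status strings are assumptions to be confirmed against the Flows documentation.

```python
import time

# Assumed final status values; confirm against the Flows documentation.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_run(get_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll `get_status()` until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("run did not finish within the polling budget")


# Simulated run that is active for two polls and then succeeds.
simulated = iter(["ACTIVE", "ACTIVE", "SUCCEEDED"])
print(wait_for_run(lambda: next(simulated), poll_seconds=0))
```

In practice `get_status` would wrap whatever API or SDK call reports the run status.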
24. Flow 1: transfer and set permissions
• Notebook at jupyter.demo.globus.org
• Choose “Automation Using Globus Flows”
• Define and deploy flow using notebook (Section A and B)
• Use Globus webapp to run the flow and manage the run
25. Programmatic start of flows
• API to start and manage runs
• Globus Automate CLI and SDK
• Event driven start of flows: Triggers
– When a file of a specific type is created
– Every 12 hours
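A file-creation trigger can be approximated by repeatedly scanning a directory for files of the wanted type that have not been seen before. The sketch below uses simple polling with the standard library instead of a filesystem event library, and `start_flow` is a hypothetical stand-in that only prints what a real trigger would submit.

```python
import tempfile
from pathlib import Path


def find_new_files(directory, suffix, seen):
    """Return files with `suffix` not seen on earlier scans, and remember them."""
    new_files = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix == suffix and p not in seen
    )
    seen.update(new_files)
    return new_files


def start_flow(path):
    # Hypothetical stand-in: a real trigger would start a flow run here.
    print(f"would start a flow run for {path.name}")


with tempfile.TemporaryDirectory() as watch_dir:
    seen = set()
    (Path(watch_dir) / "scan1.tif").write_text("data")
    (Path(watch_dir) / "notes.txt").write_text("ignored")
    for path in find_new_files(watch_dir, ".tif", seen):
        start_flow(path)
    # A later scan reports only the file created since the previous scan.
    (Path(watch_dir) / "scan2.tif").write_text("data")
    for path in find_new_files(watch_dir, ".tif", seen):
        start_flow(path)
```

Calling the scan function on a fixed interval from a loop or scheduler covers the time-based case as well.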
26. Trigger: start flow when file is created
• SSH to the tutorial machine
• Set up GCP (if not done)
• Edit simple_sync.py
– Set it to run the flow created using the notebook
• Run simple_sync.py
• Monitor runs on the webapp
bit.ly/gw-tut
27. End to end instrument data management
• Trigger:
– Watch for file of specific type
– Start a flow with folder path and metadata about folder
• Flow
– Transfer data
– Set permissions
– Ingest public metadata to index
– Ingest restricted metadata to index
28. Flow 2: transfer, set permissions & ingest
• Notebook at jupyter.demo.globus.org
• Choose “Automation Using Flows with Search”
• Define and deploy flow using notebook (Section A and B)
29. Trigger: start flow when file is created
• SSH to the tutorial machine
• cd globus-flows-trigger-examples/
• Set up GCP (if not done)
• Edit trigger_transfer_share_flow.py
– Set it to run the flow created using the notebooks
• Edit and run trigger_transfer_publish_flow.py
• Monitor runs on the webapp
bit.ly/gw-tut
30. Automation services ecosystem
GET /provider_url/
POST /provider_url/run
GET /provider_url/action_id/status
POST /provider_url/action_id/cancel
Create Action Providers
Define and deploy flows
{ "StartAt": "ToProject",
"States" : {
"ToProject" : { … },
"SetPermission" : { … },
"ProcessData" : { … } … }}
Run flows
31. Build action providers
• An Action Provider is a service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit: action-provider-tools.readthedocs.io/en/latest
[Figure: example action providers, Globus provided and custom built: Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form]
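The run, status, cancel, and release operations listed on this slide can be illustrated with a small in-memory class. This is a sketch of the lifecycle only, not the Action Provider Toolkit API: the class name, the `finish` helper, and the status strings are hypothetical, there is no HTTP layer or authentication, and resume is omitted.

```python
import uuid


class InMemoryActionProvider:
    """Toy action provider that tracks actions in a dictionary."""

    def __init__(self):
        self._actions = {}

    def run(self, body):
        """Start a new action and return its initial record."""
        action_id = str(uuid.uuid4())
        self._actions[action_id] = {"action_id": action_id, "status": "ACTIVE", "body": body}
        return dict(self._actions[action_id])

    def status(self, action_id):
        """Return a copy of the current record for an action."""
        return dict(self._actions[action_id])

    def finish(self, action_id):
        """Hypothetical helper that simulates the work completing."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "SUCCEEDED"
        return dict(action)

    def cancel(self, action_id):
        """Stop an action that is still active; finished actions are unchanged."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "CANCELLED"
        return dict(action)

    def release(self, action_id):
        """Forget a finished action; refuse while it is still active."""
        if self._actions[action_id]["status"] == "ACTIVE":
            raise RuntimeError("cannot release an action that is still active")
        return self._actions.pop(action_id)


provider = InMemoryActionProvider()
started = provider.run({"message": "hello"})
print(provider.status(started["action_id"])["status"])
print(provider.cancel(started["action_id"])["status"])
provider.release(started["action_id"])
```

A real provider would expose these operations as authenticated HTTP endpoints and persist action state between requests.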
32. Automating computation with funcX*
Managed, federated Functions-as-a-Service for reliably, scalably, and securely executing functions on remote endpoints, from laptops to supercomputers
* funcX is currently under development and in limited production use