3. Instrument data orchestration: A common design pattern
Facilities and services: Instrument Facility, Advanced Computing Facility, Distribution Store, Data Portal
Pipeline stages: (1) Imaging, (2) Acquisition, (3) Image Analysis, (4) Description/Identification, (5) Search/Discovery, (6) Science!
4. Three Degrees of Automation
• Timer Service: scheduled and recurring transfers (a.k.a. Globus cron)
• Command Line Interface: ad hoc scripting and integration
• Platform Services (Search, Flows, Transfer & Sharing): comprehensive data and compute orchestration (with human in the loop)
7. The Globus Timer service
• Scheduled/recurring file transfers
• Well suited to backup/sync tasks
• Service with a command line interface
– Simple installation: pypi.org/project/globus-timer-cli
– One-time authentication with a user identity
• Example: NIH – hpc.nih.gov/storage/globus_cron.html
8. Use case: Data replication
• For backup: initiated by the user or by a system backup
• Automated transfer of data from science instrument
• Example: recurring transfers with sync option (copy /ingest, daily at 3:30am)
9. Using the Globus Timer service
$ globus-timer session {login, logout, whoami}
$ globus-timer job transfer \
    --name example-job \
    --label "Timer Transfer Job" \
    --interval 28800 \
    --start '2020-01-01T12:34:56' \
    --source-endpoint ddb59aef-6d04-11e5-ba46-22000b92c6ec \
    --dest-endpoint ddb59af0-6d04-11e5-ba46-22000b92c6ec \
    --item ~/file1.txt ~/new_file1.txt false \
    --item ~/file2.txt ~/new_file2.txt false
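The start time and interval in the command above determine when each run fires. The following is a minimal sketch of that arithmetic, assuming each run is simply the start time plus a whole number of intervals (in seconds). The helper name run_times is hypothetical and is not part of globus-timer-cli.

```python
from datetime import datetime, timedelta


def run_times(start: str, interval_seconds: int, count: int) -> list[str]:
    """Return the first `count` run times as ISO 8601 strings.

    Assumption: run i fires at start + i * interval.
    """
    first = datetime.fromisoformat(start)
    step = timedelta(seconds=interval_seconds)
    return [(first + i * step).isoformat() for i in range(count)]


if __name__ == "__main__":
    # 28800 seconds is 8 hours, so this job would run three times a day.
    for t in run_times("2020-01-01T12:34:56", 28800, 3):
        print(t)
```

With the values from the slide, the first three runs fall at 12:34:56, 20:34:56, and 04:34:56 the next day.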
10. Globus Timer service options
• --items-file {file_name}
• --stop-after-runs
• --stop-after-date
• Transfer behavior (equivalent to options in web app)
– --sync-level (how timer behaves if files exist)
– --verify-checksum
– --encrypt-data
– --preserve-timestamp
13. Relevant Globus platform capabilities
• Data transfer and sharing
• Data description and discovery
• Data (and compute) orchestration
• Authentication and Authorization
Platform services: Auth, Search, Transfer, Groups, Flows
14. Globus Auth: Foundational IAM service
Brokers authentication and authorization among…
– End-users
– Identity providers: enterprise, external (federated identities)
– Services: resource servers with REST APIs
– Apps: web, mobile, desktop, command line clients
– Services acting as clients to other services
• OAuth 2.0 Authorization Framework (a.k.a. OAuth2)
• OpenID Connect Core 1.0 (a.k.a. OIDC)
15. Several authentication models supported
• Application acting as user with consent
– Authorization code grant
• Application authenticating as itself
– Client credentials grant
• Application able to manage tokens for offline or long-running tasks
– Refresh tokens
16. Authorization Code Grant
Participants: Browser (User); Client (Web Portal, Application); Globus Auth (Authorization Server); Identity Provider; Globus service (Resource Server)
1. Access portal
2. Redirect user
3. User authenticates and consents
4. Authorization code
5. Authenticate using client id and secret, send authorization code
6. Access token(s)
7. Authenticate with access token(s), giving client authority to invoke the requested service
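The client side of steps 2, 4, and 5 can be sketched with the Python standard library alone. This is an illustrative sketch, not the Globus SDK: the function names, the client id, and the callback URL are placeholders, and only the two Globus Auth endpoint URLs and the standard OAuth2 parameter names are taken as given.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"


def build_authorize_url(client_id, redirect_uri, scopes, state):
    """Step 2: URL the portal redirects the user's browser to."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
        "response_type": "code",
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


def parse_callback(callback_url, expected_state):
    """Step 4: extract the authorization code from the redirect back."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch; possible CSRF")
    return query["code"][0]


def build_token_request(code, redirect_uri):
    """Step 5: form fields the portal POSTs to TOKEN_URL.

    The client id and secret are sent separately as HTTP Basic credentials.
    """
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }


if __name__ == "__main__":
    state = secrets.token_urlsafe(16)  # anti-CSRF value, checked on callback
    print(build_authorize_url(
        "my-client-id",  # placeholder client id
        "https://portal.example.org/callback",  # placeholder callback URL
        ["openid", "urn:globus:auth:scope:transfer.api.globus.org:all"],
        state,
    ))
```

Nothing here touches the network; a real portal would POST the step 5 fields to the token endpoint and receive the access token(s) of step 6.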
17. Client credential grant
Participants: Client (Application, Science Gateway, Data Portal); Globus Auth (Authorization Server); Globus Transfer (Resource Server)
1. Authenticate with app client id and secret
2. Access Tokens
3. Authenticate as app with access tokens to invoke service (on behalf of authorized user, within a given scope)
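Steps 1 and 3 can be sketched as plain HTTP request construction. This is a hedged illustration using only the standard library, not the Globus SDK: the function names and the credentials shown are placeholders, and the request is built but never sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"


def build_client_credentials_request(client_id, client_secret, scopes):
    """Step 1: token request authenticated with the app's own id and secret."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),
    }).encode()
    return Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def bearer_header(access_token):
    """Step 3: header the app attaches when invoking the resource server."""
    return {"Authorization": f"Bearer {access_token}"}


if __name__ == "__main__":
    req = build_client_credentials_request(
        "my-client-id",      # placeholder
        "my-client-secret",  # placeholder; keep real secrets out of source
        ["urn:globus:auth:scope:transfer.api.globus.org:all"],
    )
    print(req.get_method(), req.full_url)
```

Unlike the authorization code grant, no browser or user consent step is involved, which is why this grant suits unattended services.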
18. Step 0: Application registration
• Set desired scopes
• Set callback URL
• Get client ID and secret
• Consents implement the principle of least privilege
developers.globus.org
19. Data transfer and sharing
…you already know how to do this ;-)
• Move data to collection → Submit Transfer task (Transfer service: POST /transfer)
• Make data accessible → Set guest collection access rule (Transfer service: POST /endpoint/{endpoint_id}/access)
• Grant user/app access → Add/confirm Group membership (Groups service: GET /groups/my_groups)
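The request bodies for the two Transfer calls above can be sketched as plain JSON documents. This is an illustrative sketch rather than the Globus SDK: the function names, paths, and identifiers are placeholders, and the field names follow the Transfer API document types as I understand them, so check them against the API reference before relying on them.

```python
import json

TRANSFER_BASE = "https://transfer.api.globus.org/v0.10"


def transfer_document(submission_id, source, dest, items):
    """Body for POST /transfer.

    `items` is a list of (source_path, destination_path, recursive) tuples.
    The submission_id is obtained from the service before submitting.
    """
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source,
        "destination_endpoint": dest,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": recursive,
            }
            for src, dst, recursive in items
        ],
    }


def access_rule_document(principal_type, principal, path, permissions):
    """Body for POST /endpoint/{endpoint_id}/access on a guest collection."""
    if permissions not in ("r", "rw"):
        raise ValueError("permissions must be 'r' or 'rw'")
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,  # e.g. "identity" or "group"
        "principal": principal,
        "path": path,
        "permissions": permissions,
    }


if __name__ == "__main__":
    doc = transfer_document(
        "SUBMISSION-ID",  # placeholder
        "ddb59aef-6d04-11e5-ba46-22000b92c6ec",
        "ddb59af0-6d04-11e5-ba46-22000b92c6ec",
        [("~/file1.txt", "~/new_file1.txt", False)],
    )
    print(f"POST {TRANSFER_BASE}/transfer")
    print(json.dumps(doc, indent=2))
```

Granting access to a group rather than to individual identities keeps the rule stable while membership changes, which is the pattern the third bullet describes.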
20. Using guest collections in your apps
• Create a guest collection; requires authentication
– Cannot be completely automated – must "log in"
– Create once and automate rest of the steps
• Grant the application Access Manager role
– Allows the application to manage permissions on the collection
– Set for application identity: appclientid@clients.auth.globus.org
• Grant roles for management of endpoint and tasks
22. Data description and discovery
• Metadata store with fine-grained visibility controls
• Schema agnostic → dynamic schemas
• Simple search using URL query parameters
• Complex search using search request document
docs.globus.org/api/search
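The difference between the simple and complex search styles can be sketched as follows. This is a hedged illustration built with the standard library: the index id and the field names (stage, registry) are hypothetical, and the filter and facet field names reflect my reading of the search request document format, so verify them against docs.globus.org/api/search.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://search.api.globus.org/v1"


def simple_search_url(index_id, q, limit=10):
    """Simple search: everything is carried in URL query parameters (GET)."""
    query = urlencode({"q": q, "limit": limit})
    return f"{SEARCH_BASE}/index/{index_id}/search?{query}"


def complex_search_document(q, filter_field, filter_values, facet_field):
    """Complex search: a request document sent as the body of a POST."""
    return {
        "q": q,
        "limit": 10,
        # Keep only entries whose field matches any of the given values.
        "filters": [
            {
                "type": "match_any",
                "field_name": filter_field,
                "values": filter_values,
            }
        ],
        # Ask for per-value counts so a portal can render facet charts.
        "facets": [
            {
                "name": f"{facet_field} counts",
                "type": "terms",
                "field_name": facet_field,
                "size": 10,
            }
        ],
    }


if __name__ == "__main__":
    index_id = "INDEX-ID"  # placeholder
    print(simple_search_url(index_id, "lung cancer", limit=5))
    print(complex_search_document("lung", "stage", ["I", "II"], "registry"))
```

The request document is worth the extra effort once a portal needs filters and facets together, as in the faceted search case study that follows.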
23. Cancer Registry Records for Research (CR3)
• Create network of federated cancer registries
– Deploy similar infrastructure at other cancer registries
– Enable queries across multiple registries
• Federation via Globus: network scale ↔ local control
– Data owners input/export data sets, apply QC, set access policies
– Registry data remain at the institution where they were generated
– Identities are provided/authenticated by the institution, not Globus
– System scale depends on data owners providing storage resources
24. CR3 Architecture
Components: CR3 Discovery Portal; Globus Auth (GA); Globus Search (GS); Globus Transfer (GT); UPMC/Pitt Identity Providers; Elasticsearch; Request Service; Cancer Registry De-identified Data Index (minimal criteria data: e.g., staging)
• Researcher logs in with UPMC/Pitt credentials; authentication initiated to GA
• Cohort search initiated to GS; cohort aggregate counts returned
• Registry Staff manage authorization
• Data transfer from registrar to researcher mediated by GT
25. CR3 requirements
• Search Index
– Only de-identified data in search index
– No record-level access for researchers
• Portal
– Fine-grained access control
– Researchers must use a specific identity
– Access must be logged
– Render graphs based on search results
– Faceted search in real time
26. CR3 Portal (simulated data)
• Federated logon using Globus Auth with Pitt/UPMC as identity providers
• Dynamically updating charts as facets change
• Variable facets based on source registry index
• Google-like text search with facets for filtering
• Developed using a framework based on the Globus Modern Research Data Portal* design pattern (docs.globus.org/mrdp)
* PeerJ Articles:cs-144 https://peerj.com/articles/cs-144/
34. Data (and compute) automation
• Flows: A platform service for defining, applying, and sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* In development
35. Automation with Globus Flows
• Built on AWS Step Functions
– Simple JSON-based state machine language
– Conditions, loops, fault tolerance, etc.
– Propagates state through the flow
• Standardized API for integrating custom event and action services
– Actions: synchronous or asynchronous
– Custom Web forms prompt for user input
• Actions secured with Globus Auth
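A flow definition in this JSON state machine style might look like the sketch below, written as a Python dict together with a small consistency check. The state names, the Parameters keys, and the input paths are illustrative assumptions rather than a verified flow, and the check function is a hypothetical helper, not part of any Globus tool.

```python
# Illustrative flow: run a transfer action, then branch on its result.
FLOW_DEFINITION = {
    "StartAt": "TransferFiles",
    "States": {
        "TransferFiles": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            # Parameter names below are assumptions for illustration.
            "Parameters": {
                "source_endpoint_id.$": "$.input.source_endpoint_id",
                "destination_endpoint_id.$": "$.input.destination_endpoint_id",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source_path",
                        "destination_path.$": "$.input.destination_path",
                    }
                ],
            },
            "ResultPath": "$.TransferResult",  # state propagated onward
            "Next": "CheckResult",
        },
        "CheckResult": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.TransferResult.status",
                    "StringEquals": "SUCCEEDED",
                    "Next": "Done",
                }
            ],
            "Default": "Failed",
        },
        "Done": {"Type": "Pass", "End": True},
        "Failed": {"Type": "Fail", "Error": "TransferFailed",
                   "Cause": "The transfer action did not succeed"},
    },
}


def undefined_targets(definition):
    """Return state names that are referenced but never defined."""
    targets = {definition["StartAt"]}
    for state in definition["States"].values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return targets - set(definition["States"])


if __name__ == "__main__":
    print(undefined_targets(FLOW_DEFINITION) or "all transitions resolve")
```

Running a check like this before deploying a flow catches misspelled state names, a common mistake in hand-written state machines.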
36. Extending the ecosystem: Action providers
• Action Provider is a service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit: action-provider-tools.readthedocs.io/en/latest
• Action providers shown (Globus provided and custom built): Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form
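The run/status/cancel/release lifecycle can be illustrated with a toy in-memory provider. This is a sketch of the idea only: it is not the Action Provider Toolkit API, the class and method names are hypothetical, the status strings are assumptions, and resume is omitted for brevity.

```python
import uuid


class ToyActionProvider:
    """In-memory stand-in for a service exposing action operations."""

    def __init__(self):
        self._actions = {}

    def run(self, body):
        """Start an action and return its initial status document."""
        action_id = str(uuid.uuid4())
        self._actions[action_id] = {
            "action_id": action_id,
            "status": "ACTIVE",
            "details": {"request": body},
        }
        return self.status(action_id)

    def status(self, action_id):
        """Return a copy of the current status document."""
        return dict(self._actions[action_id])

    def complete(self, action_id, result):
        """Stand-in for the real work finishing asynchronously."""
        action = self._actions[action_id]
        action["status"] = "SUCCEEDED"
        action["details"] = result

    def cancel(self, action_id):
        """Stop an action that is still running."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "FAILED"
            action["details"] = {"reason": "cancelled"}
        return self.status(action_id)

    def release(self, action_id):
        """Forget a finished action; its state is returned one last time."""
        if self._actions[action_id]["status"] == "ACTIVE":
            raise RuntimeError("cannot release an action that is still active")
        return self._actions.pop(action_id)


if __name__ == "__main__":
    provider = ToyActionProvider()
    started = provider.run({"message": "hello"})
    provider.complete(started["action_id"], {"ok": True})
    print(provider.release(started["action_id"])["status"])
```

A flow polls status for asynchronous actions like this one, whereas a synchronous action would return its final state directly from run.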