Instrument data orchestration with
Globus Search and Flows
Vas Vasiliadis
vas@uchicago.edu
October 13, 2021
Why we’re all here this week…
Instrument data orchestration: A common design pattern
Facilities: Instrument Facility, Advanced Computing Facility, Distribution Store, Data Portal
Steps: 1. Imaging, 2. Acquisition, 3. Image Analysis, 4. Description/Identification, 5. Search/Discovery, 6. Science!
Three Degrees of Automation
• Timer Service: scheduled and recurring transfers (a.k.a. Globus cron)
• Command Line Interface: ad hoc scripting and integration
• Platform Services: comprehensive data and compute orchestration (with human in the loop)
– Search
– Flows
– Transfer & Sharing
Globus Command Line Interface (CLI)
…you’re all experts on this already!
Globus Timer Service
The Globus Timer service
• Scheduled/recurring file transfers
• Well suited to backup/sync tasks
• Service with a command line interface
– Simple installation: pypi.org/project/globus-timer-cli
– One-time authentication with a user identity
• Example: NIH – hpc.nih.gov/storage/globus_cron.html
Use case: Data replication
• For backup: initiated by a user or by a system backup
• Automated transfer of data from a science instrument
Recurring transfers with sync option: copy /ingest daily at 3:30am
Using the Globus Timer service
$ globus-timer session {login, logout, whoami}
$ globus-timer job transfer \
    --name example-job \
    --label "Timer Transfer Job" \
    --interval 28800 \
    --start '2020-01-01T12:34:56' \
    --source-endpoint ddb59aef-6d04-11e5-ba46-22000b92c6ec \
    --dest-endpoint ddb59af0-6d04-11e5-ba46-22000b92c6ec \
    --item ~/file1.txt ~/new_file1.txt false \
    --item ~/file2.txt ~/new_file2.txt false
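The --interval value is given in seconds, so 28800 means one run every 8 hours. As a rough illustration only (it is not part of the globus-timer tool), the Python sketch below lists the first few run times implied by the --start and --interval values above. The function name run_times is made up for this example, and it assumes each run fires exactly at start + n × interval.

```python
from datetime import datetime, timedelta
from typing import List


def run_times(start: str, interval_seconds: int, count: int) -> List[datetime]:
    """Return the first `count` run times for a job that begins at `start`.

    Assumption for this sketch: runs fire exactly every `interval_seconds`.
    """
    first = datetime.fromisoformat(start)
    step = timedelta(seconds=interval_seconds)
    return [first + n * step for n in range(count)]


if __name__ == "__main__":
    # Values taken from the example job on the slide.
    for when in run_times("2020-01-01T12:34:56", 28800, 3):
        print(when.isoformat())
```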
Globus Timer service options
• --items-file {file_name}
• --stop-after-runs
• --stop-after-date
• Transfer behavior (equivalent to options in web app)
– --sync-level (how the timer behaves if files exist)
– --verify-checksum
– --encrypt-data
– --preserve-timestamp
Timer options in the webapp
Coming soon…
Platform Services
Relevant Globus platform capabilities
• Data transfer and sharing
• Data description and discovery
• Data (and compute) orchestration
• Authentication and Authorization
Services: Auth, Search, Transfer, Groups, Flows
Globus Auth: Foundational IAM service
Brokers authentication and authorization among…
– End-users
– Identity providers: enterprise, external (federated identities)
– Services: resource servers with REST APIs
– Apps: web, mobile, desktop, command line clients
– Services acting as clients to other services
• OAuth 2.0 Authorization Framework (a.k.a. OAuth2)
• OpenID Connect Core 1.0 (a.k.a. OIDC)
Several authentication models supported
• Application acting as user with consent
– Authorization code grant
• Application authenticating as itself
– Client credentials grant
• Application able to manage tokens for offline or long-running tasks
– Refresh tokens
Authorization Code Grant
Participants: Browser (User); Client (Web Portal, Application); Globus Auth (Authorization Server); Identity Provider; Globus service (Resource Server)
1. Access portal
2. Redirect user
3. User authenticates and consents
4. Authorization code
5. Authenticate using client ID and secret, send authorization code
6. Access token(s)
7. Authenticate with access token(s), giving client authority to invoke the requested service
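To make steps 2 and 5 of the authorization code grant more concrete, here is a minimal Python sketch that builds the redirect URL and the form fields for the code exchange, without sending anything over the network. The client ID, redirect URI, and state value are placeholders invented for this example; the endpoint path and the OpenID Connect scopes reflect Globus Auth as I understand it and should be checked against the current documentation.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

# Globus Auth authorization endpoint (assumed; verify against the docs).
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"


def build_authorize_url(client_id: str, redirect_uri: str,
                        scopes: List[str], state: str) -> str:
    """Step 2: the URL the portal redirects the user's browser to."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),   # OAuth2 scopes are space-separated
        "state": state,              # opaque value echoed back to the callback
        "response_type": "code",     # ask for an authorization code
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def build_token_form(code: str, redirect_uri: str) -> Dict[str, str]:
    """Step 5: form fields the client posts (with its ID and secret)."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    }


if __name__ == "__main__":
    url = build_authorize_url(
        "my-client-id",                          # placeholder
        "https://portal.example.org/callback",   # placeholder
        ["openid", "profile", "email"],
        "random-state-value",                    # placeholder
    )
    print(url)
    print(parse_qs(urlparse(url).query))
```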
Client credentials grant
Participants: Application, Science Gateway, Data Portal (Client); Globus Auth (Authorization Server); Globus Transfer (Resource Server)
1. Authenticate with app client ID and secret
2. Access tokens
3. Authenticate as app with access tokens to invoke service (on behalf of authorized user, within a given scope)
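Step 1 of the client credentials grant can be sketched as an HTTP request object that is built but not sent. This is an illustrative sketch rather than official client code: the client ID and secret are placeholders, and the token endpoint, the use of HTTP Basic authentication, and the Transfer scope string are my understanding of Globus Auth and should be confirmed against its documentation.

```python
import base64
from typing import List
from urllib.parse import urlencode
from urllib.request import Request

# Globus Auth token endpoint (assumed; verify against the docs).
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"


def build_client_credentials_request(client_id: str, client_secret: str,
                                     scopes: List[str]) -> Request:
    """Build (but do not send) the token request for an app acting as itself."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),
    }).encode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            # The app authenticates with its own client ID and secret.
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


if __name__ == "__main__":
    request = build_client_credentials_request(
        "id", "secret",  # placeholders, never hard-code real secrets
        ["urn:globus:auth:scope:transfer.api.globus.org:all"],
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("ascii"))
```

Sending the request (for example with urllib.request.urlopen) would return the access tokens of step 2; that part is left out so the sketch runs without network access or real credentials.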
Step 0: Application registration
• Set desired scopes
• Set callback URL
• Get client ID and secret
• Consents implement the principle of least privilege
developers.globus.org
Data transfer and sharing
…you already know how to do this ;-)
• Move data to collection → Submit Transfer task
• Make data accessible → Set guest collection access rule
• Grant user/app access → Add/confirm Group membership
Transfer service: POST /transfer
Transfer service: POST /endpoint/{endpoint_id}/access
Groups service: GET /groups/my_groups
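As a hedged sketch of what a body for POST /transfer might look like, the Python function below assembles a transfer document from a list of (source, destination, recursive) items. The field names follow the Transfer API as I recall it and should be checked against the API reference. The submission ID is a placeholder (in the real API it is obtained from the service before submitting), and the endpoint UUIDs are the ones used in the Timer example earlier in the deck.

```python
from typing import Dict, Iterable, Tuple


def build_transfer_document(submission_id: str,
                            source_endpoint: str,
                            destination_endpoint: str,
                            items: Iterable[Tuple[str, str, bool]]) -> Dict:
    """Assemble a transfer task document; nothing is sent to the service."""
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": source_path,
                "destination_path": destination_path,
                "recursive": recursive,  # True for directories
            }
            for source_path, destination_path, recursive in items
        ],
    }


if __name__ == "__main__":
    import json

    document = build_transfer_document(
        "SUBMISSION-ID-PLACEHOLDER",  # hypothetical value
        "ddb59aef-6d04-11e5-ba46-22000b92c6ec",
        "ddb59af0-6d04-11e5-ba46-22000b92c6ec",
        [("~/file1.txt", "~/new_file1.txt", False)],
    )
    print(json.dumps(document, indent=2))
```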
Using guest collections in your apps
• Create a guest collection; requires authentication
– Cannot be completely automated: must "log in"
– Create once and automate rest of the steps
• Grant the application Access Manager role
– Allows the application to manage permissions on the collection
– Set for application identity: appclientid@clients.auth.globus.org
• Grant roles for management of endpoint and tasks
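Once an application holds the Access Manager role, managing permissions amounts to creating access rules on the guest collection (POST /endpoint/{endpoint_id}/access). The sketch below only builds such a rule as a Python dict. The field names and allowed values reflect my understanding of the Transfer API and are assumptions to verify, and the identity UUID is simply the example identity used in the Search slides.

```python
from typing import Dict

# Values this sketch accepts; assumed to mirror the Transfer API.
PRINCIPAL_TYPES = {"identity", "group", "all_authenticated_users", "anonymous"}
PERMISSIONS = {"r", "rw"}


def build_access_rule(principal_type: str, principal: str,
                      path: str, permissions: str) -> Dict[str, str]:
    """Build an access rule document for a guest collection."""
    if principal_type not in PRINCIPAL_TYPES:
        raise ValueError(f"unknown principal_type: {principal_type!r}")
    if permissions not in PERMISSIONS:
        raise ValueError(f"permissions must be one of {sorted(PERMISSIONS)}")
    if not path.startswith("/") or not path.endswith("/"):
        # Rules apply to directories, so the path begins and ends with "/".
        raise ValueError("path must be a directory path such as '/ingest/'")
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,
        "principal": principal,
        "path": path,
        "permissions": permissions,
    }


if __name__ == "__main__":
    rule = build_access_rule(
        "identity",
        "46bd0f56-e24f-11e5-a510-131bef46955c",
        "/ingest/",
        "r",
    )
    print(rule)
```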
Globus Search Service
Data description and discovery
• Metadata store with fine-grained visibility controls
• Schema agnostic → dynamic schemas
• Simple search using URL query parameters
• Complex search using search request document
docs.globus.org/api/search
Cancer Registry Records for Research (CR3)
• Create network of federated cancer registries
– Deploy similar infrastructure at other cancer registries
– Enable queries across multiple registries
• Federation via Globus: network scale ↔ local control
– Data owners input/export data sets, apply QC, set access policies
– Registry data remain at the institution where they were generated
– Identities are provided/authenticated by the institution, not Globus
– System scale depends on data owners providing storage resources
CR3 Architecture
Components: CR3 Discovery Portal; Globus Auth (GA); Globus Search (GS); Globus Transfer (GT); UPMC/Pitt Identity Providers; Elasticsearch; Request Service; Cancer Registry De-identified Data Index (minimal criteria data: e.g., staging)
• Researcher logs in with UPMC/Pitt credentials; authentication initiated to GA
• Cohort search initiated to GS; cohort aggregate counts returned
• Registry staff manage authorization
• Data transfer from registrar to researcher mediated by GT
CR3 requirements
• Search Index
– Only de-identified data in search index
– No record-level access for researchers
• Portal
– Fine-grained access control
– Researchers must use a specific identity
– Access must be logged
– Render graphs based on search results
– Faceted search in real time
CR3 Portal (simulated data)
• Federated logon using Globus Auth with Pitt/UPMC as identity providers
• Dynamically updating charts as facets change
• Variable facets based on source registry index
• Google-like text search with facets for filtering
• Distinct access policies may be applied to data and metadata
• Developed using a framework based on the Globus Modern Research Data Portal* design pattern (docs.globus.org/mrdp)
* PeerJ article cs-144: https://peerj.com/articles/cs-144/
Data ingest with Globus Search
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "filetype",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["public"],
        "content": {
          "metadata-schema/file#type": "file"
        }
      },
      ...
    ]
  }
}
Data ingest with Globus Search
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "weight",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
        "content": {
          "metadata-schema/file#size": "37.6",
          "metadata-schema/file#size_human": "<50lb"
        }
      },
      ...
    ]
  }
}
Visibility limited to Globus Auth identity
- Single user
- Globus Group
- Registered client application
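The two ingest documents above differ mainly in their visible_to lists. A small Python sketch, using only field names that appear on the slides, shows how such a GMetaList could be assembled programmatically. The helper names gmeta_entry and gmeta_list_ingest are invented for this example, and nothing is sent to the Search service.

```python
import json
from typing import Dict, Iterable, List, Optional


def gmeta_entry(subject: str, content: Dict, visible_to: List[str],
                entry_id: Optional[str] = None) -> Dict:
    """One metadata entry about `subject`, visible to the listed principals."""
    entry = {
        "subject": subject,
        "visible_to": list(visible_to),
        "content": dict(content),
    }
    if entry_id is not None:
        entry["id"] = entry_id
    return entry


def gmeta_list_ingest(entries: Iterable[Dict]) -> Dict:
    """Wrap entries in the GMetaList ingest document shown on the slides."""
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


if __name__ == "__main__":
    subject = "https://search.api.globus.org/abc.txt"
    document = gmeta_list_ingest([
        # Public entry: anyone can discover the file type.
        gmeta_entry(subject, {"metadata-schema/file#type": "file"},
                    ["public"], entry_id="filetype"),
        # Restricted entry: only one Globus Auth identity sees the size.
        gmeta_entry(subject, {"metadata-schema/file#size": "37.6"},
                    ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
                    entry_id="weight"),
    ])
    print(json.dumps(document, indent=2))
```

Keeping the public and restricted metadata in separate entries for the same subject is what allows different visibility for each piece of metadata.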
Data discovery with Globus Search
Simple query
GET /index/{index_id}/search?q=type%3Ahdf5
{
  "@datatype": "GSearchResult",
  "@version": "2017-09-01",
  "count": 1,
  "gmeta": [
    {
      "@datatype": "GMetaResult",
      "@version": "2019-08-27",
      "entries": [
        { ... }
      ],
      "subject": "https://..."
    }
  ],
  "offset": 0,
  "total": 1
}
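For the simple query, the only subtlety is URL-encoding the query string (type:hdf5 becomes type%3Ahdf5). The Python sketch below builds the request URL and pulls the subjects out of a result shaped like the one shown here. The base URL is what I believe the Search API uses and the index ID is a made-up placeholder; both are assumptions, and the sample result is hand-written for the example rather than real service output.

```python
from typing import Dict, List
from urllib.parse import quote, urlencode

# Assumed base URL of the Globus Search API; verify against the docs.
SEARCH_BASE = "https://search.api.globus.org/v1"


def simple_search_url(index_id: str, query: str) -> str:
    """URL for a simple query: GET /index/{index_id}/search?q=..."""
    return (f"{SEARCH_BASE}/index/{quote(index_id, safe='')}/search?"
            + urlencode({"q": query}))


def result_subjects(result: Dict) -> List[str]:
    """Collect the subject of every GMetaResult in a GSearchResult."""
    return [item["subject"] for item in result.get("gmeta", [])]


if __name__ == "__main__":
    # Placeholder index ID, not a real index.
    print(simple_search_url("00000000-0000-0000-0000-000000000000", "type:hdf5"))

    # Hand-written result in the shape shown on the slide.
    sample = {
        "@datatype": "GSearchResult",
        "count": 1,
        "gmeta": [{"@datatype": "GMetaResult", "entries": [{}],
                   "subject": "https://example.org/data/scan-001.h5"}],
        "offset": 0,
        "total": 1,
    }
    print(result_subjects(sample))
```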
Data discovery with Globus Search
Complex query
POST /index/{index_id}/search
{
  "filters": [
    {
      "type": "range",
      "field_name": "pubdate",
      "values": [
        {
          "from": "*",
          "to": "2020-12-31"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "pubdate",
      ...
    }
  ]
}
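A complex query is just a JSON document, so it can be composed from small helpers. The Python sketch below reproduces the range filter from the request above. The helper names are invented for this example, and facets are passed through unchanged because the slide elides the remaining facet fields; consult the Search API reference for the full request schema.

```python
import json
from typing import Dict, Iterable, Optional


def range_filter(field_name: str, start: str = "*", end: str = "*") -> Dict:
    """A range filter; "*" leaves that side of the range open."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": start, "to": end}],
    }


def search_request(filters: Iterable[Dict],
                   facets: Optional[Iterable[Dict]] = None) -> Dict:
    """Compose the body for POST /index/{index_id}/search."""
    request = {"filters": list(filters)}
    if facets:
        # Facet documents are supplied by the caller as-is.
        request["facets"] = list(facets)
    return request


if __name__ == "__main__":
    # Everything published up to the end of 2020, as on the slide.
    body = search_request([range_filter("pubdate", end="2020-12-31")])
    print(json.dumps(body, indent=2))
```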
Working with Globus Search
jupyter.demo.globus.org
Metadata, Search and Discovery
Globus Flows Service
Data (and compute) automation
• Flows: A platform service for defining, applying, and sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* In development
Automation with Globus Flows
• Built on AWS Step Functions
– Simple JSON-based state machine language
– Conditions, loops, fault tolerance, etc.
– Propagates state through the flow
• Standardized API for integrating custom event and action services
– Actions: synchronous or asynchronous
– Custom Web forms prompt for user input
• Actions secured with Globus Auth
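To give a feel for the JSON-based state machine language, here is a hedged Python sketch of a two-step flow definition plus a tiny walker that follows it from its start state to its end state. The state and field names ("StartAt", "States", "Next", "End", "ActionUrl", "Parameters", "ResultPath") are modeled on the Amazon States Language and on my recollection of Globus flow definitions, so treat them as assumptions. The action URLs are hypothetical placeholders, and the walker handles only linear flows (no choice states or loops).

```python
from typing import Dict, List

# Illustrative flow: transfer data, then ingest metadata about it.
flow_definition: Dict = {
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/transfer",  # hypothetical
            "Parameters": {
                # Keys ending in ".$" take their value from the flow's state.
                "source.$": "$.input.source",
                "destination.$": "$.input.destination",
            },
            "ResultPath": "$.TransferResult",  # where the result is stored
            "Next": "IngestMetadata",
        },
        "IngestMetadata": {
            "Type": "Action",
            "ActionUrl": "https://example.org/actions/ingest",  # hypothetical
            "Parameters": {"subject.$": "$.input.destination"},
            "ResultPath": "$.IngestResult",
            "End": True,
        },
    },
}


def linear_order(definition: Dict) -> List[str]:
    """Follow Next links from StartAt until a state with End: true."""
    states = definition["States"]
    order: List[str] = []
    name = definition["StartAt"]
    while True:
        if name not in states:
            raise ValueError(f"undefined state: {name!r}")
        if name in order:
            raise ValueError(f"cycle detected at state: {name!r}")
        order.append(name)
        state = states[name]
        if state.get("End"):
            return order
        if "Next" not in state:
            raise ValueError(f"state {name!r} has neither Next nor End")
        name = state["Next"]


if __name__ == "__main__":
    print(" -> ".join(linear_order(flow_definition)))
```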
Extending the ecosystem: Action providers
• Action Provider is a service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit: action-provider-tools.readthedocs.io/en/latest
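The operations listed above can be pictured with a toy, in-memory Python class. This is not the Action Provider Toolkit and not the real HTTP interface: the class name, the method signatures, the status strings (ACTIVE, SUCCEEDED, FAILED), and the choice to mark a cancelled action as FAILED are all assumptions made for illustration. Resume is omitted, and finish is an extra helper that simulates the work completing.

```python
import uuid
from typing import Dict


class InMemoryActionProvider:
    """Toy action provider that keeps all action state in a dict."""

    def __init__(self) -> None:
        self._actions: Dict[str, Dict] = {}

    def run(self, body: Dict) -> Dict:
        """Start a new action and return its initial status."""
        action_id = str(uuid.uuid4())
        self._actions[action_id] = {
            "action_id": action_id,
            "status": "ACTIVE",
            "details": {"request": body},
        }
        return dict(self._actions[action_id])

    def status(self, action_id: str) -> Dict:
        """Report the current state of an action."""
        return dict(self._actions[action_id])

    def finish(self, action_id: str, result: Dict) -> Dict:
        """Helper (not a listed operation): simulate the work completing."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "SUCCEEDED"
            action["details"] = {"result": result}
        return dict(action)

    def cancel(self, action_id: str) -> Dict:
        """Stop an action that is still running."""
        action = self._actions[action_id]
        if action["status"] == "ACTIVE":
            action["status"] = "FAILED"
            action["details"] = {"reason": "cancelled"}
        return dict(action)

    def release(self, action_id: str) -> Dict:
        """Forget a finished action; refuse while it is still active."""
        if self._actions[action_id]["status"] == "ACTIVE":
            raise RuntimeError("cannot release an action that is still active")
        return self._actions.pop(action_id)


if __name__ == "__main__":
    provider = InMemoryActionProvider()
    action = provider.run({"message": "hello"})
    provider.finish(action["action_id"], {"ok": True})
    print(provider.status(action["action_id"])["status"])
    provider.release(action["action_id"])
```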
Action providers shown (Globus-provided and custom-built): Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form
Working with Globus Flows
jupyter.demo.globus.org
Automation Using Globus Flows
Coming soon: Globus Trigger service
• Trigger–Action platform
• Predefined triggers and
actions to create rules
• Globus processes triggers
and reliably executes actions
globus.org
docs.globus.org
outreach@globus.org
support@globus.org

Instrument Data Orchestration with Globus Search and Flows

  • 1. Instrument data orchestration with Globus Search and Flows Vas Vasiliadis vas@uchicago.edu October 13, 2021
  • 2. Why we’re all here this week… 2
  • 3. Distribution Store Data Portal Advanced Computing Facility Instrument Facility Instrument data orchestration: A common design pattern Image Analysis 3 Search/Discovery 5 Science! 6 Imaging 1 Acquisition 2 Description/Identification 4 v
  • 4. Three Degrees of Automation Timer Service Scheduled and recurring transfers (a.k.a. Globus cron) Command Line Interface Ad hoc scripting and integration Platform Services Comprehensive data—and compute—orchestration (with human in the loop) Search Flows Transfer & Sharing
  • 5. Globus Command Line Interface (CLI) …you’re all experts on this already!
  • 7. The Globus Timer service • Scheduled/recurring file transfers • Well suited to backup/sync tasks • Service with a command line interface – Simple installation: pypi.org/project/globus-timer-cli – One-time authentication with a user identity • Example: NIH – hpc.nih.gov/storage/globus_cron.html 7
  • 8. Use case: Data replication • For backup: initiated by user or system back up • Automated transfer of data from science instrument 8 Recurring transfers with sync option Copy /ingest Daily @ 3:30am
  • 9. Using the Globus Timer service 9 $ globus–timer session {login, logout, whoami} $ globus–timer job transfer --name example–job --label "Timer Transfer Job" --interval 28800 --start '2020–01–01T12:34:56' --source–endpoint ddb59aef–6d04–11e5–ba46–22000b92c6ec --dest–endpoint ddb59af0–6d04–11e5–ba46–22000b92c6ec --item ~/file1.txt ~/new_file1.txt false --item ~/file2.txt ~/new_file2.txt false
  • 10. Globus Timer service options • ––items–file {file_name} • ––stop–after–runs • ––stop–after–date • Transfer behavior (equivalent to options in web app) ––sync–level (how timer behaves if files exist) ––verify–checksum ––encrypt–data ––preserve–timestamp 10
  • 11. Timer options in the webapp Coming soon….
  • 13. Relevant Globus platform capabilities • Data transfer and sharing • Data description and discovery • Data (and compute) orchestration • Authentication and Authorization 13 Auth Search Transfer Groups Flows
  • 14. Globus Auth: Foundational IAM service Brokers authentication and authorization among… – End-users – Identity providers: enterprise, external (federated identities) – Services: resource servers with REST APIs – Apps: web, mobile, desktop, command line clients – Services acting as clients to other services • OAuth 2.0 Authorization Framework (a.k.a. OAuth2) • OpenID Connect Core 1.0 (a.k.a. OIDC) Auth 14
  • 15. Several authentication models supported • Application acting as user with consent – Authorization code grant • Application authenticating as itself – Client credentials grant • Application able to manage tokens for offline or long running tasks – Refresh tokens
  • 16. Authorization Code Grant 16 Client (Web Portal, Application) Globus service (Resource Server) Globus Auth (Authorization Server) 5. Authenticate using client id and secret, send authorization code Browser (User) 1. Access portal 2. Redirect user 3. User authenticates and consents 4. Authorization code 6. Access token(s) 7. Authenticate with access token(s), giving client authority to invoke the requested service Identity Provider
  • 17. Client credential grant Parties: Application, Science Gateway, Data Portal (Client); Globus Auth (Authorization Server); Globus Transfer (Resource Server) 1. Authenticate with app client id and secret 2. Access tokens 3. Authenticate as app with access tokens to invoke service (on behalf of authorized user, within a given scope)
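Step 1 of the client credentials grant is a single POST to the token endpoint, authenticated with the client ID and secret. A minimal sketch that prepares (but does not send) that request is shown below. The credentials are placeholders and the token endpoint URL is an assumption based on the Globus Auth documentation; in practice the Globus SDK wraps this exchange.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: Globus Auth token endpoint.
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"


def build_client_credentials_request(client_id, client_secret, scopes):
    """Prepare the OAuth2 client credentials token request (not sent here)."""
    # HTTP Basic authentication carries the client ID and secret.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urlencode({
        "grant_type": "client_credentials",
        "scope": " ".join(scopes),
    }).encode()
    return Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


req = build_client_credentials_request(
    client_id="my-client-id",          # placeholder
    client_secret="my-client-secret",  # placeholder
    scopes=["urn:globus:auth:scope:transfer.api.globus.org:all"],
)
print(req.get_method(), req.full_url)
# Sending it with urllib.request.urlopen(req) would return the access tokens
# (step 2), which are then presented to the resource server (step 3).
```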
  • 18. Step 0: Application registration (developers.globus.org) • Set desired scopes • Set callback URL • Get client ID and secret • Consents implement the principle of least privilege
  • 19. Data transfer and sharing …you already know how to do this ;-) • Move data to collection → Submit Transfer task (Transfer service: POST /transfer) • Make data accessible → Set guest collection access rule (Transfer service: POST /endpoint/{endpoint_id}/access) • Grant user/app access → Add/confirm Group membership (Groups service: GET /groups/my_groups)
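The two Transfer calls above each take a small JSON document. The sketch below builds both as plain dictionaries, following the Transfer API document shapes as I understand them (DATA_TYPE "transfer" and "access"); the helper names, the submission ID, the group ID, and the paths are hypothetical, and the endpoint IDs are the ones from the Timer slide.

```python
import json


def transfer_document(submission_id, source, dest, items, label=None):
    """Body for POST /transfer; `items` holds (src, dst, recursive) tuples."""
    doc = {
        "DATA_TYPE": "transfer",
        # Obtained beforehand from the Transfer API; placeholder below.
        "submission_id": submission_id,
        "source_endpoint": source,
        "destination_endpoint": dest,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": recursive,
            }
            for src, dst, recursive in items
        ],
    }
    if label:
        doc["label"] = label
    return doc


def access_rule_document(principal_type, principal, path, permissions="r"):
    """Body for POST /endpoint/{endpoint_id}/access on a guest collection."""
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,  # e.g. "identity" or "group"
        "principal": principal,
        "path": path,                      # directory paths end with "/"
        "permissions": permissions,        # "r" or "rw"
    }


task = transfer_document(
    submission_id="SUBMISSION-ID-PLACEHOLDER",
    source="ddb59aef-6d04-11e5-ba46-22000b92c6ec",
    dest="ddb59af0-6d04-11e5-ba46-22000b92c6ec",
    items=[("/ingest/run42/", "/archive/run42/", True)],
    label="Instrument run 42",
)
rule = access_rule_document(
    "group", "11111111-2222-3333-4444-555555555555", "/archive/run42/"
)
print(json.dumps(task, indent=2))
print(json.dumps(rule, indent=2))
```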
  • 20. Using guest collections in your apps • Create a guest collection; requires authentication – Cannot be completely automated: must "log in" – Create once and automate the rest of the steps • Grant the application the Access Manager role – Allows the application to manage permissions on the collection – Set for application identity: appclientid@clients.auth.globus.org • Grant roles for management of endpoint and tasks
  • 22. Data description and discovery • Metadata store with fine-grained visibility controls • Schema agnostic → dynamic schemas • Simple search using URL query parameters • Complex search using search request document • Docs: docs.globus.org/api/search
  • 23. Cancer Registry Records for Research (CR3) • Create network of federated cancer registries – Deploy similar infrastructure at other cancer registries – Enable queries across multiple registries • Federation via Globus: network scale ↔ local control – Data owners input/export data sets, apply QC, set access policies – Registry data remain at the institution where they were generated – Identities are provided/authenticated by the institution, not Globus – System scale depends on data owners providing storage resources
  • 24. CR3 Architecture • Components: CR3 Discovery Portal; Globus Auth (GA); Globus Search (GS); Globus Transfer (GT); UPMC/Pitt identity providers; Request Service; Elasticsearch; Cancer Registry; de-identified data index (minimal criteria data, e.g., staging) • Researcher logs in with UPMC/Pitt credentials; authentication is initiated to GA • Cohort search is initiated to GS; cohort aggregate counts are returned to the researcher • Registry staff manage authorization • Data transfer from registrar to researcher is mediated by GT
  • 25. CR3 requirements • Search Index – Only de-identified data in search index – No record-level access for researchers • Portal – Fine-grained access control – Researchers must use a specific identity – Access must be logged – Render graphs based on search results – Faceted search in real time
  • 26. CR3 Portal (simulated data) • Federated logon using Globus Auth with Pitt/UPMC as identity providers • Google-like text search with facets for filtering • Variable facets based on source registry index • Dynamically updating charts as facets change • Developed using a framework based on the Globus Modern Research Data Portal design pattern (docs.globus.org/mrdp); see PeerJ article cs-144, https://peerj.com/articles/cs-144/
  • 27. Distinct access policies may be applied to Data and Metadata
  • 28. Data ingest with Globus Search POST /index/{index_id}/ingest { "ingest_type": "GMetaList", "ingest_data": { "gmeta": [ { "id": "filetype", "subject": "https://search.api.globus.org/abc.txt", "visible_to": ["public"], "content": { "metadata-schema/file#type": "file" } }, ... ] } }
  • 29. Data ingest with Globus Search POST /index/{index_id}/ingest { "ingest_type": "GMetaList", "ingest_data": { "gmeta": [ { "id": "weight", "subject": "https://search.api.globus.org/abc.txt", "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"], "content": { "metadata-schema/file#size": "37.6", "metadata-schema/file#size_human": "<50lb" } }, ... ] } } Visibility limited to a Globus Auth identity: single user, Globus Group, or registered client application
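The two ingest documents above differ only in their entry ID, content, and visible_to list, so they are easy to generate. The following sketch rebuilds both entries from the slides as one GMetaList; the helper names gmeta_entry and ingest_document are hypothetical, and the result would be the body of POST /index/{index_id}/ingest.

```python
import json


def gmeta_entry(subject, content, visible_to=("public",), entry_id=None):
    """One entry of a GMetaList; `visible_to` controls who can see it."""
    entry = {
        "subject": subject,
        "visible_to": list(visible_to),
        "content": content,
    }
    if entry_id is not None:
        entry["id"] = entry_id
    return entry


def ingest_document(entries):
    """Wrap entries in the GMetaList envelope shown on the slides."""
    return {"ingest_type": "GMetaList", "ingest_data": {"gmeta": list(entries)}}


SUBJECT = "https://search.api.globus.org/abc.txt"

doc = ingest_document([
    # Publicly visible metadata about the file.
    gmeta_entry(SUBJECT, {"metadata-schema/file#type": "file"},
                entry_id="filetype"),
    # Metadata restricted to a single Globus Auth identity.
    gmeta_entry(
        SUBJECT,
        {"metadata-schema/file#size": "37.6",
         "metadata-schema/file#size_human": "<50lb"},
        visible_to=["urn:globus:auth:identity:"
                    "46bd0f56-e24f-11e5-a510-131bef46955c"],
        entry_id="weight",
    ),
])
print(json.dumps(doc, indent=2))
```

Both entries share one subject, which is how distinct access policies end up applying to different pieces of metadata about the same object.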
  • 30. Data discovery with Globus Search Simple query: GET /index/{index_id}/search?q=type%3Ahdf5 Response: { "@datatype": "GSearchResult", "@version": "2017-09-01", "count": 1, "gmeta": [ { "@datatype": "GMetaResult", "@version": "2019-08-27", "entries": [ { ... } ], "subject": "https://..." } ], "offset": 0, "total": 1 }
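The %3A in the simple query is just the URL-encoded colon of q=type:hdf5. A small sketch of building that URL and reading the subjects out of a GSearchResult is given below. The base URL is an assumption from the Globus Search documentation, the index ID is a placeholder, and the sample result uses a made-up subject because the slide elides it.

```python
from urllib.parse import urlencode

# Assumption: Globus Search API base URL.
SEARCH_BASE = "https://search.api.globus.org/v1"


def simple_search_url(index_id, q):
    """URL for GET /index/{index_id}/search with a URL-encoded query."""
    return f"{SEARCH_BASE}/index/{index_id}/search?{urlencode({'q': q})}"


def subjects(result):
    """Subjects of all matches in a GSearchResult document."""
    return [match["subject"] for match in result.get("gmeta", [])]


url = simple_search_url("00000000-0000-0000-0000-000000000000", "type:hdf5")
print(url)

# Response shaped like the slide; the subject is a hypothetical stand-in.
sample_result = {
    "@datatype": "GSearchResult",
    "count": 1,
    "gmeta": [
        {
            "@datatype": "GMetaResult",
            "entries": [{}],
            "subject": "https://example.org/data/scan-001.hdf5",
        }
    ],
    "offset": 0,
    "total": 1,
}
print(subjects(sample_result))
```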
  • 31. Data discovery with Globus Search Complex query: POST /index/{index_id}/search { "filters": [ { "type": "range", "field_name": "pubdate", "values": [ { "from": "*", "to": "2020-12-31" } ] } ], "facets": [ { "name": "Publication Date", "field_name": "pubdate", ... } ] }
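The filter part of the complex query above can be produced by a small helper, which keeps portal code readable when several filters are combined. This is a sketch only: the helper names are hypothetical, the q value is an illustrative assumption, and facets are passed through untouched because the slide elides the remaining facet fields.

```python
import json


def range_filter(field_name, start="*", end="*"):
    """Range filter as on the slide; "*" leaves that side of the range open."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": start, "to": end}],
    }


def search_request(q=None, filters=(), facets=()):
    """Body for POST /index/{index_id}/search; empty parts are omitted."""
    doc = {}
    if q is not None:
        doc["q"] = q
    if filters:
        doc["filters"] = list(filters)
    if facets:
        doc["facets"] = list(facets)
    return doc


# Everything published up to the end of 2020, as in the slide's filter.
request_body = search_request(
    q="*",  # assumption: match-all query text
    filters=[range_filter("pubdate", end="2020-12-31")],
)
print(json.dumps(request_body, indent=2))
```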
  • 34. Data (and compute) automation • Flows: A platform service for defining, applying, and sharing distributed research automation flows • Flows comprise Actions • Action Providers: Called by Flows to perform tasks • Triggers*: Start flows based on events * In development
  • 35. Automation with Globus Flows • Built on AWS Step Functions – Simple JSON-based state machine language – Conditions, loops, fault tolerance, etc. – Propagates state through the flow • Standardized API for integrating custom event and action services – Actions: synchronous or asynchronous – Custom Web forms prompt for user input • Actions secured with Globus Auth
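To make the "JSON-based state machine language" concrete, here is a sketch of a one-action flow definition together with a small structural check. The StartAt/States/Next/End layout follows the Amazon States Language; the "Action" state type, the ActionUrl value, and the parameter names are written from memory of the Globus Flows documentation and should be treated as assumptions. The check_flow helper is hypothetical and only verifies that transitions point at states that exist.

```python
flow_definition = {
    "Comment": "Move instrument data to a distribution store (sketch)",
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            # Assumption: URL of the Globus-provided Transfer action provider.
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                # Keys ending in ".$" are filled from the flow input (JSONPath).
                "source_endpoint_id.$": "$.input.source_endpoint_id",
                "destination_endpoint_id.$": "$.input.destination_endpoint_id",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source_path",
                        "destination_path.$": "$.input.destination_path",
                        "recursive": True,
                    }
                ],
            },
            # State is propagated: the action result is stored under this key.
            "ResultPath": "$.TransferResult",
            "Next": "Done",
        },
        "Done": {"Type": "Pass", "End": True},
    },
}


def check_flow(definition):
    """Return a list of structural problems (empty if none were found)."""
    states = definition.get("States", {})
    problems = []
    if definition.get("StartAt") not in states:
        problems.append(f"StartAt names unknown state {definition.get('StartAt')!r}")
    for name, state in states.items():
        target = state.get("Next")
        if target is not None and target not in states:
            problems.append(f"{name}: Next names unknown state {target!r}")
    return problems


print(check_flow(flow_definition))
```

A check like this catches typos in state names before a definition is deployed; it does not validate action parameters.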
  • 36. Extending the ecosystem: Action providers • Action Provider is a service endpoint – Run – Status – Cancel – Release – Resume • Action Provider Toolkit: action-provider-tools.readthedocs.io/en/latest • Providers shown (Globus provided and custom built): Search, Transfer, Notification, ACLs, Identifier, Delete, Ingest, User Form, Describe, Xtract, funcX, Web Form
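The run/status/cancel/release operations describe a simple lifecycle for each action. The sketch below models that lifecycle in memory to show how the operations relate. It is not the Action Provider Toolkit and exposes no HTTP endpoints; the class, the complete helper, and the error handling are hypothetical, the status names are the ones I recall from the Action Provider interface, and resume is left out.

```python
import uuid


class InMemoryActionProvider:
    """Toy model of an action provider's run/status/cancel/release calls."""

    TERMINAL = {"SUCCEEDED", "FAILED"}

    def __init__(self):
        self._actions = {}

    def run(self, body):
        """Start an action and return its initial status document."""
        action_id = uuid.uuid4().hex
        self._actions[action_id] = {
            "action_id": action_id,
            "status": "ACTIVE",
            "details": {"request": body},
        }
        return dict(self._actions[action_id])

    def status(self, action_id):
        """Report the current state; flows poll this for asynchronous actions."""
        return dict(self._actions[action_id])

    def complete(self, action_id, result):
        """Hypothetical helper that simulates the work finishing."""
        action = self._actions[action_id]
        action["status"] = "SUCCEEDED"
        action["details"] = result

    def cancel(self, action_id):
        """Stop an action that has not finished yet."""
        action = self._actions[action_id]
        if action["status"] not in self.TERMINAL:
            action["status"] = "FAILED"
            action["details"] = {"reason": "cancelled"}
        return dict(action)

    def release(self, action_id):
        """Forget a finished action; only allowed in a terminal state."""
        if self._actions[action_id]["status"] not in self.TERMINAL:
            raise RuntimeError("action is still running; cancel it first")
        return self._actions.pop(action_id)


provider = InMemoryActionProvider()
started = provider.run({"message": "hello"})
provider.complete(started["action_id"], {"echo": "hello"})
final = provider.release(started["action_id"])
print(started["status"], "->", final["status"])
```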
  • 38. Coming soon: Globus Trigger service • Trigger–Action platform • Predefined triggers and actions to create rules • Globus processes triggers and reliably executes actions