An overview of developments in the Globus platform during 2020-2021, presented at a webinar hosted by Internet2. Includes an overview of Globus Connect Server v5, cloud storage connectors, and platform services for developers (e.g., Globus Search and Globus Flows).
3. What a difference three years make!
We’ve been busy…
• …Rebuilding the technical foundation
• …Growing the connected storage ecosystem
• …Expanding the data management platform
• …Automating data management at scale
4. Rebuilding the technical foundation: GCSv5
• Modern security model
• Support compliance requirements
• Deployment flexibility
• Enhanced sharing policies
• Simplified scaling/availability
10. Partnership with the community
to develop new connectors
Community Connector Program
11. Easy egress and ingress of data
Data sharing with collaborators
via unified interface
12. POSIX Staging Connector
• For POSIX file systems that cache from tertiary storage
• Custom plug-in for staging files
• Example:
– IBM Spectrum Archive plugin, Brock Palen at University of Michigan: github.com/brockpalen/ltfsee-globus
18. 3 Degrees of Automation
Timer Service
Scheduled and recurring transfers
(a.k.a. Globus cron)
Command Line Interface
Ad hoc scripting and integration
Globus Flows service
Comprehensive task (data and
compute) orchestration with human in
the loop interactions
20. CLI v3
• Collection support
• Group support
• Search support
• Multiple profiles
• Python SDK updates
21. Globus Flows: Managed task automation
• Flows: A platform service for defining, applying, and
sharing distributed research automation flows
• Flows comprise Actions
• Action Providers: Called by Flows to perform tasks
• Triggers*: Start flows based on events
* In development
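To make the Flows/Actions relationship concrete, here is a minimal sketch of what a flow definition might look like, written as a Python dict. It assumes that a flow is a JSON state machine in which each "Action" state names the Action Provider it calls by URL. The ActionUrl value, the parameter names, and the check_flow helper are illustrative assumptions, not taken from the slides, and the structural check is not the validation the Flows service itself performs.

```python
import json

# Hypothetical two-state flow: one Action that asks a transfer
# Action Provider to move a file, then a terminal state.
# ActionUrl and the parameter names below are assumptions for illustration.
flow_definition = {
    "Comment": "Sketch: move one file between two collections",
    "StartAt": "TransferFile",
    "States": {
        "TransferFile": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id.$": "$.input.source_id",
                "destination_endpoint_id.$": "$.input.destination_id",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source_path",
                        "destination_path.$": "$.input.destination_path",
                    }
                ],
            },
            "ResultPath": "$.TransferResult",
            "Next": "Done",
        },
        "Done": {"Type": "Pass", "End": True},
    },
}


def check_flow(definition):
    """Return a list of structural problems (empty if none were found)."""
    problems = []
    states = definition.get("States", {})
    if definition.get("StartAt") not in states:
        problems.append("StartAt does not name a defined state")
    for name, state in states.items():
        next_state = state.get("Next")
        if next_state is not None and next_state not in states:
            problems.append(f"{name}: Next points to unknown state {next_state!r}")
        if next_state is None and not state.get("End", False):
            problems.append(f"{name}: neither Next nor End is set")
    return problems


def action_urls(definition):
    """List the Action Provider URLs that the flow would call."""
    return [
        state["ActionUrl"]
        for state in definition["States"].values()
        if state.get("Type") == "Action"
    ]


if __name__ == "__main__":
    print(json.dumps(flow_definition, indent=2))
    print("problems:", check_flow(flow_definition))
    print("action providers:", action_urls(flow_definition))
```

The point of the sketch is the shape: the flow is data, each Action delegates work to a provider, and the flow only wires states together.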
22. Serial Synchrotron Crystallography of SARS-CoV-2 proteins
www.alcf.anl.gov/news/argonne-researchers-use-theta-real-time-analysis-covid-19-proteins
The pipeline generates large image batches at a high rate, with data transfers achieving speeds of
700 megabytes per second thanks to Globus, a University of Chicago-run data management service.
23. Gladier: The Globus Architecture for Data-Intensive
Experimental Research
• Accelerate and simplify flow
development and deployment
• Combine tools into reliable,
flexible, secure, distributed
flows
• Bridge instruments and
computing facilities
• Automate data collection and
publication to create FAIR data
25. funcX: managed and federated FaaS
• Cloud-hosted service for managing compute
• Register and share compute endpoints
• Register and share Python functions
• Reliably, scalably, and securely execute functions on
remote endpoints
• Integrated with Globus Auth and data ecosystem
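The register-then-invoke pattern in the bullets above can be illustrated with a toy, in-process model. This is not the funcX SDK: the ToyFaaS class, its method names, and the count_above example function are all hypothetical, and a local thread pool stands in for a remote compute endpoint. It only shows the idea that functions and endpoints are registered once, receive IDs, and are then paired at invocation time.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class ToyFaaS:
    """Hypothetical stand-in for a function-as-a-service registry."""

    def __init__(self):
        self._functions = {}  # function ID -> callable
        self._endpoints = {}  # endpoint ID -> executor standing in for a remote site

    def register_function(self, fn):
        """Store a function and return the ID used to invoke or share it."""
        function_id = str(uuid.uuid4())
        self._functions[function_id] = fn
        return function_id

    def register_endpoint(self, workers=2):
        """Create a local worker pool that plays the role of a compute endpoint."""
        endpoint_id = str(uuid.uuid4())
        self._endpoints[endpoint_id] = ThreadPoolExecutor(max_workers=workers)
        return endpoint_id

    def run(self, function_id, endpoint_id, *args, **kwargs):
        """Submit a registered function to a registered endpoint; returns a future."""
        fn = self._functions[function_id]
        return self._endpoints[endpoint_id].submit(fn, *args, **kwargs)

    def shutdown(self):
        """Release all worker pools."""
        for executor in self._endpoints.values():
            executor.shutdown()


def count_above(values, threshold):
    """Example payload: count how many values exceed a threshold."""
    return sum(1 for v in values if v > threshold)


if __name__ == "__main__":
    service = ToyFaaS()
    fid = service.register_function(count_above)
    eid = service.register_endpoint()
    future = service.run(fid, eid, [1, 5, 9], threshold=4)
    print(future.result())
    service.shutdown()
```

Because callers hold only IDs, the same function can be sent to different endpoints, and the same endpoint can serve many functions, which is what makes sharing both of them practical.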
Try funcX on Binder
https://funcx.org
26. Our Mission
Increase the efficiency and
effectiveness of researchers
engaged in data-driven
science and scholarship
through sustainable software