In this presentation from June 26, 2018, Globus co-founder Steve Tuecke discussed Globus Connect Server 5.1 with HTTPS file access; plans for new premium storage connectors; upcoming publication services including the new Globus Search and Identifiers services; the new Globus Web App, SSH with Globus Auth, and more.
1. What's New with Globus
Q&A Webinar Series
June 26, 2018
Steve Tuecke, Globus Co-Founder
2. Unify access to data across tiers
• Research Computing HPC
• Desktop Workstations
• Mass Storage
• Instruments
• Personal Resources
• Public Cloud
• National Resources
3. Share with collaborators/community
• Public / private cloud stores
• External campus storage
• EC2
• Project repositories, replication stores
• Public repositories
6. Globus by the numbers
• 8,000 active shared endpoints
• 90 subscribers
• 425 PB transferred
• 18,000 active GCP endpoints
• 70 billion files processed
• 1,700 active GCS endpoints
• 3 months: longest running transfer
• 1 PB: largest single transfer to date
• 99.9% availability
• 500 identity providers
• 1,042: most shared endpoints at a single institution
• 100,000 users
9. Connectors for S3 "compatible" systems
• S3 API is de-facto standard API for object storage
• Make it easier to validate and support connectors for S3 "compatible" object storage systems
  – Functionality and performance test suite
  – Improving connector robustness and performance
  – E.g., Ceph, ActiveScale, SwiftStack, Wasabi, IBM Cloud Object Storage System (CleverSafe)
• Also requires vendor engagement and market interest
10. HPSS Connector
• Community has agreed on sustainability model
• NERSC & ORNL investing in enhancements
• Premium storage connector subscription
11. Globus Connect Server version 5.x
• HTTPS access to storage
• Globus Auth (OAuth2) for authentication and authorization
• Scale out deployment without shared file system
• Multiple storage systems simultaneously
• Single port for data access
• Improved endpoint administration
• And more
GCSv5.1 Webinar: https://www.youtube.com/watch?v=Ubu0KhIbIA0
12. GCS v5 Milestones
• v5.0: Google Drive
• v5.1: POSIX guest collections, HTTPS
• v5.2: High assurance (e.g. HIPAA)
• v5.3: …
• v5.N: includes all version 4.0 features
• Other features: Multi DTN support, other storage types, custom identity providers, …
• Upgrade from GCS v4
20. GCS v5 differences from GCS v4
• Globus ID is not needed
• Endpoint created using endpoint client identity
  – <client_id>@clients.auth.globus.org
  – Managed through https://developers.globus.org
• Port 443 rather than 2811 used for GridFTP control channel
  – Also HTTPS access, and (eventually) GridFTP data channel
• Each collection is assigned a DNS name under dn.glob.us
  – E.g. 988c.8540.dn.glob.us
• Certificates are obtained from Let's Encrypt
• Need to create storage gateway(s) for data access
21. Globus Connect Server v5.1 features
• HTTPS (and GridFTP) access to data
• Multiple storage connectors
  – POSIX
  – Google Drive
• Guest collections (shared endpoints) only
• Single DTN install only
• Authentication for data access using only identity providers used to log in to Globus
https://docs.globus.org/globus-connect-server-v5-installation-guide
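As a rough illustration of HTTPS access with Globus Auth (OAuth2), the sketch below builds, but does not send, an HTTPS GET request for a file on a collection. The hostname reuses the example collection DNS name from the slides. The file path, the token value, and the helper name `build_download_request` are hypothetical, and presenting the token in a standard OAuth2 `Authorization: Bearer` header is an assumption rather than something stated in the talk.

```python
from urllib.parse import quote
from urllib.request import Request


def build_download_request(collection_host: str, path: str, access_token: str) -> Request:
    """Build (but do not send) an HTTPS GET request for a file on a collection."""
    # Percent-encode the path while keeping "/" separators intact.
    url = f"https://{collection_host}/{quote(path.lstrip('/'))}"
    # Assumption: the access token is presented as a standard OAuth2 bearer token.
    headers = {"Authorization": f"Bearer {access_token}"}
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # Hypothetical path and placeholder token, for illustration only.
    req = build_download_request(
        "988c.8540.dn.glob.us", "/shared/results 2018.csv", "EXAMPLE_TOKEN"
    )
    print(req.full_url)
    print(req.get_header("Authorization"))
```

Because HTTPS uses port 443 by default, the URL needs no explicit port, which matches the "single port for data access" point above.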
22. Use GCS v5.1 only if you need…
• Google Drive support
  – Migrate from 5.0 to 5.1
  – Contact us for migration documentation
• HTTPS access to data
  – with guest collections (shared endpoint)
• Else wait for feature complete GCS 5.N
GCSv5.1 Webinar: https://www.youtube.com/watch?v=Ubu0KhIbIA0
24. Protected data
• High assurance endpoints
  – User must authenticate with specific identity within a specified time period, with browser session and native app device instance isolation
  – Audit logging
  – Multi-factor authentication
• For data that requires additional security
  – HIPAA Protected Health Information (PHI) w/ BAA
  – Personally Identifiable Information (PII)
  – Sensitive but unclassified
    • NIST 800-171 Low
• Requires Globus Connect Server v5.2
• Two additional subscription tiers
  – High assurance tier: for all added security features
  – BAA tier: high assurance features plus BAA with UChicago
• Available this summer
  – Transfer, sharing, web app, CLI only
  – Excludes publish, search, identifiers, hosted CLI, GlobusID
25. Command Line Interface
• New Globus CLI is generally available
  – Fully functional
  – Many enhancements
  – Simple updater
• Deprecating old hosted SSH CLI
  – Will be turned off August 1
• pip install --upgrade --user globus-cli
https://docs.globus.org/cli
27. Publication v1
• Publication v1 app
  – Publish datasets to Globus Search
  – Internationalization
• Canadian Federated Research Data Repository
  – https://frdr.ca/
  – Uses v1 open source and Globus Search
28. Publication v2 platform
• Decompose Publication v1 into platform components
• Allow flexible re-composition & adaptation by customers
• Describe: Get metadata
• Auth: Get credentials
• Identifiers: Mint DOI
• Search: Catalog
• Transfer: Create folder
• Transfer: Transfer data
• Set ACL
• Automate …
29. Globus Search platform service
• Search service:
  – Schema agnostic: can use standard (e.g., DataCite) or custom metadata
  – Fine grain access control: only returns results that are visible to user
  – Plain text search: ranked results
  – Faceted search: for data discovery
  – Rich query language: ranges, expressions, regex, fuzzy, stemming, etc.
  – Scalable: to billions of entries
• Limited production
  – Contact us at support@globus.org if you are interested in using it
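The "fine grain access control" point can be pictured with a toy, in-memory example. This is not the Globus Search API: the `Entry` class, its field names, the `"public"` marker, and the `search` function are all made up for illustration, and real ranking, facets, and query syntax are ignored. It only shows the idea that the same query returns more results for a caller who belongs to more groups.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    subject: str
    content: dict
    # Principals (identities or groups) allowed to see this entry.
    # In this toy model, "public" means visible to everyone.
    visible_to: set = field(default_factory=lambda: {"public"})


def search(entries, text, principals):
    """Return subjects of entries that match `text` and are visible to the caller."""
    allowed = set(principals) | {"public"}
    needle = text.lower()
    return [
        e.subject
        for e in entries
        if e.visible_to & allowed  # access check: at least one shared principal
        and any(needle in str(v).lower() for v in e.content.values())  # plain text match
    ]


if __name__ == "__main__":
    catalog = [
        Entry("dataset-a", {"title": "GPFS benchmark"}),
        Entry("dataset-b", {"title": "gpfs private run"}, {"group:lab"}),
    ]
    print(search(catalog, "gpfs", []))             # anonymous caller
    print(search(catalog, "gpfs", ["group:lab"]))  # caller who is in the lab group
```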
32. Globus Identifiers platform service
• Issue persistent identifiers
  – DOI, ARK, Handle, Globus
  – E.g., https://identifiers.globus.org/doi:10.1145/2076450.2076468
• Within a namespace
  – E.g., Your University's DataCite namespace
  – Control which identities and groups can create identifiers in your namespace
• Each identifier has:
  – Link to data: one or more https URLs, to file, folder or manifest
  – Landing page: provided by service, or by user
  – Visibility: which identities and groups can see identifier
  – Checksum: of the file or manifest
  – Metadata: as required by identifier (e.g., DataCite), extensible
  – Replaces / Replaced-by: for versioning
• Limited beta available now
  – Contact us at support@globus.org if you are interested in using it
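A minimal sketch of two of the pieces listed above: the resolver-style URL shown in the example, and a checksum for a file or manifest. The function names are hypothetical, the URL layout is inferred only from the single example on the slide, and SHA-256 is an assumed algorithm (the slides do not say which checksum the service uses).

```python
import hashlib


def identifier_url(scheme: str, value: str, base: str = "https://identifiers.globus.org") -> str:
    """Join a scheme (e.g. "doi") and an identifier value into a resolver-style URL."""
    return f"{base.rstrip('/')}/{scheme.lower()}:{value}"


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a file's or manifest's bytes (assumed algorithm)."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    print(identifier_url("doi", "10.1145/2076450.2076468"))
    print(sha256_hex(b"example manifest contents"))
```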
33. Jupyter integration
• Authenticate to JupyterHub with Globus Auth
  – Passes tokens into notebooks as environment variable
• Use Globus data management platform from notebooks
  – With Globus Python SDK
https://github.com/globus/globus-jupyter-notebooks
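A notebook could pick up tokens passed in through the environment along these lines. This is a sketch under stated assumptions: the variable name `GLOBUS_TOKENS`, the JSON encoding, and the `transfer` key are all hypothetical, since the slides say only that tokens arrive "as environment variable". Check the linked repository for the actual name and format.

```python
import json
import os


def load_tokens(env_var: str = "GLOBUS_TOKENS") -> dict:
    """Read a JSON-encoded token mapping from an environment variable.

    Both the variable name and the JSON encoding are assumptions for illustration.
    """
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set; was the notebook started via JupyterHub?")
    return json.loads(raw)


if __name__ == "__main__":
    # Simulate what a hub might inject; the token value is a placeholder.
    os.environ["GLOBUS_TOKENS"] = json.dumps({"transfer": "EXAMPLE_TOKEN"})
    tokens = load_tokens()
    print(sorted(tokens))
```

The resulting token would then be handed to the Globus Python SDK to call services from the notebook.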
35. SSH with Globus Auth
• Securely access resource using SSH with federated identity
  – Leverage same security model as rest of data infrastructure
  – Facilitates automation
  – Eliminate need to manage SSH key lifecycle and provisioning
• Replaces GSI SSH
• Client side wrapper around local SSH client (globus-ssh …)
• No changes to the SSH server (PAM module)
• Status:
  – Beta is imminent, for early customer feedback
  – Generally available by end of year
37. Groups
• Generally available in web app
• REST API has been in limited production
• Plan on opening some portion to general availability
  – Please tell us your use cases
38. New web app
• Complete file manager for any research storage
• Improved browser experience
  – Accessibility: WCAG 2.0 AA
  – Responsiveness: from large desktop to small phone
  – Touch support: for phones and pads
• Leverage Globus Connect HTTPS
  – E.g., Preview, download
• Beta available now: https://app.globus.org
The first in a semi-annual series, this webinar will cover new HTTPS capabilities, upcoming HIPAA support, plans for new storage connectors, SSH with Globus Auth, and much more. We'll also leave plenty of time for audience questions.
Cleanup before demo:
rm -rf ~/.local ~/.globus.cfg
Demo:
pip install --upgrade --user globus-cli
globus login
globus endpoint search "midway tuecke"
… copy <UUID> from output listing …
globus ls <UUID>
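The manual "copy <UUID>" step could be scripted. The sketch below pulls UUID-shaped strings out of captured command output with a regular expression. The sample listing, including its column layout and the UUID in it, is made up for illustration and is not real `globus endpoint search` output; `extract_uuids` is a hypothetical helper.

```python
import re

# Matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def extract_uuids(cli_output: str) -> list:
    """Return the distinct UUIDs found in `cli_output`, lowercased, in order of appearance."""
    found = []
    for match in UUID_RE.findall(cli_output):
        uuid = match.lower()
        if uuid not in found:
            found.append(uuid)
    return found


if __name__ == "__main__":
    # Made-up listing; the real column layout may differ.
    sample = (
        "ID                                   | Owner               | Display Name\n"
        "------------------------------------ | ------------------- | ----------------\n"
        "01234567-89ab-cdef-0123-456789abcdef | someone@example.org | Example endpoint\n"
    )
    print(extract_uuids(sample))
```

The first element of the returned list could then be passed to `globus ls` in place of the hand-copied value.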
https://ramsesproject.org
Demo:
Start logged out
Log in to show that more facets and more datasets show up
Search for "gpfs"
Select "2G" facet
Select first dataset
Show overview with data dictionary
Show preview
Show transfer