In this talk we briefly introduce the potential and the shortcomings of Swift when it is used with high-latency storage such as tape, optical disc, or MAID-based archival storage. We then explain how we think those shortcomings should be overcome, and share the most relevant technical aspects and experience from our initial experiments with tape. We cover customized data collocation, an API for ILM between fast-access and high-latency storage, and data auditing.
Adapting Swift for Tape Storage or other high-latency media
1. Adapting Swift for Tape Storage or other high-latency media
October 27, 2015
Harald Seipp (IBM Systems – Presenter)
Slavisa Sarafijanovic (IBM Research)
2. Goal
Augment cloud object storage with a low-cost, cold storage tier for archive/backup use cases
Reduced cost
● significantly lower than disk
Reduced availability
● on the order of minutes
[Diagram: a client application uses the standard API (REST) of an OpenStack Swift cluster. Inside the cluster, highly available primary storage (HDD) archives to and restores from low-cost archival storage (high-latency media).]
3. Main Idea
Single Object Storage name space for Objects on
● Tape or
● Optical Disc or
● SMR or MAID Disk
integrated with a standard disk-based OpenStack Swift installation
4. Facts about Tape
Tape is 5x-10x cheaper than disk
Tape density scaling and cost are projected to be advantageous over disk for the next 10 years (see 220 TB cartridge demo)
Tape is a mature technology
Tape is already used in today’s cloud offerings
LTFS is a widely adopted standard
[Diagram: a client application uses the standard API (REST) of an OpenStack Swift cluster. Inside the cluster, highly available primary storage (HDD) archives to and restores from low-cost archival storage (LTFS Tape).]
5. Shortcomings to be solved
Time-to-data
● Up to (single-digit) minutes
→ Does not play well with the time-out assumptions of the Swift infrastructure (application/load balancer)
Resource availability
● Few drives per hundreds of cartridges
→ Random access (mounts/seeks) can lead to resource congestion
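The congestion concern can be illustrated with a toy model. The sketch below assumes a single drive and a hypothetical recall queue (the cartridge IDs and object names are made up); it counts how many mounts are needed when requests are served in arrival order versus grouped by cartridge.

```python
from collections import defaultdict


def count_mounts(requests):
    """Count cartridge mounts on one drive serving requests in the given order.

    Each request is a (cartridge_id, object_name) pair. A mount is needed
    whenever the requested cartridge differs from the one currently loaded.
    """
    mounts = 0
    loaded = None
    for cartridge, _obj in requests:
        if cartridge != loaded:
            mounts += 1
            loaded = cartridge
    return mounts


def group_by_cartridge(requests):
    """Reorder requests so all objects on one cartridge are served together."""
    by_cartridge = defaultdict(list)
    for cartridge, obj in requests:
        by_cartridge[cartridge].append(obj)
    return [(c, o) for c, objs in by_cartridge.items() for o in objs]


# Hypothetical recall queue: six objects spread over two cartridges.
queue = [("A", "o1"), ("B", "o2"), ("A", "o3"),
         ("B", "o4"), ("A", "o5"), ("B", "o6")]

print(count_mounts(queue))                      # arrival order: 6 mounts
print(count_mounts(group_by_cartridge(queue)))  # grouped order: 2 mounts
```

A real library has several drives and also pays for seeks within a cartridge, so this only shows the direction of the effect, not its size.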
6. Addressing shortcomings
Swift API for archiving operations
● Support explicit bulk operations (to minimize tape mounts and seeks)
● Store/provide object state (“offline bit”) in a standardized way
● Provide additional error code (“in transit”) upon access of migrated object
Improved timeout management
Configurable Data Ring Auditing
● Support asynchronous tape data verification
Policy based global cluster object distribution
● Assumption: related data (e.g. container) is likely to be accessed together
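How a client might react to the “offline bit” and the “in transit” error code can be sketched as follows. The talk does not define a header name or a status value, so `OFFLINE_HEADER` and `IN_TRANSIT_STATUS` below are placeholders chosen for illustration only, and `fetch` stands in for whatever call the client uses to query the object.

```python
import time

# Assumptions for illustration only: the proposal does not fix these values.
OFFLINE_HEADER = "X-Object-Meta-Offline"  # hypothetical "offline bit" header
IN_TRANSIT_STATUS = 409                   # hypothetical "in transit" status


def classify(status, headers):
    """Map a response to 'online', 'offline', 'in_transit' or 'error'."""
    headers = {k.lower(): v for k, v in headers.items()}
    if status == IN_TRANSIT_STATUS:
        return "in_transit"
    if headers.get(OFFLINE_HEADER.lower(), "").lower() == "true":
        return "offline"
    if 200 <= status < 300:
        return "online"
    return "error"


def wait_until_online(fetch, attempts=5, delay=1.0, sleep=time.sleep):
    """Poll fetch() until the object is online.

    fetch is a caller-supplied function returning (status, headers).
    Returns True once the object is online, False on an error response
    or when the attempts are used up.
    """
    for _ in range(attempts):
        state = classify(*fetch())
        if state == "online":
            return True
        if state == "error":
            return False
        sleep(delay)  # offline or in transit: wait instead of timing out
    return False
```

Polling with an explicit state check lets the client distinguish "data is on its way" from a genuine failure, which is the distinction the time-out assumptions of a disk-only cluster do not make.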
7. Addressing shortcomings (discussed at Vancouver Summit)
Reference: https://etherpad.openstack.org/p/liberty-swift-tape-storage
8. Swift archiving API through SwiftILM
Swift API ILM* extensions:
• Migrate (Disk → high-latency media)
• Recall (High-latency media → Disk)
• Query status
Implementation proposal:
• SwiftILM middleware
• Control path to ILM capable backend:
• (1) Swift EA ←→ file attribute (async)
• (2) Backend executable (sync/async)
*Information Lifecycle Management
[Diagram: Swift exposes the Swift API and the Swift API ILM extensions. The SwiftILM middleware reaches the ILM capable backend through a POSIX file system (1) or by calling an executable (2). The backend consists of a disk cache in front of tape, optical disc, or MAID/SMR.]
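The middleware idea can be sketched as a plain WSGI wrapper. This is a minimal sketch under stated assumptions, not the SwiftILM implementation: the class name, the `backend` callable, and the `202 Accepted` and `400 Bad Request` responses are all hypothetical, and only the Python standard library is used.

```python
from urllib.parse import parse_qs

ILM_OPS = {"MIGRATE", "RECALL", "STATUS"}


class SwiftILMSketch:
    """WSGI middleware that intercepts ILM query parameters (illustrative)."""

    def __init__(self, app, backend):
        self.app = app          # the wrapped WSGI application
        self.backend = backend  # hypothetical callable(op, path) -> str

    def __call__(self, environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""),
                         keep_blank_values=True)
        ops = ILM_OPS.intersection(query)
        if not ops:
            # Not an ILM request: pass through to the rest of the pipeline.
            return self.app(environ, start_response)
        if len(ops) > 1:
            return self._respond(start_response, "400 Bad Request",
                                 "only one ILM operation per request")
        # Hand the operation to the backend and report its answer.
        result = self.backend(ops.pop(), environ.get("PATH_INFO", ""))
        return self._respond(start_response, "202 Accepted", result)

    @staticmethod
    def _respond(start_response, status, text):
        body = text.encode("utf-8")
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]
```

In this sketch the backend callable plays the role of control path (2); control path (1) would instead set an attribute on the object's file and let the backend pick it up asynchronously.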
9. SwiftILM API proposal
To migrate a single object, issue the following HTTP POST:
http://SWIFT-URL/ACCT/CONT/OBJ?MIGRATE
● Similar GET/HEAD requests for RECALL and STATUS
Bulk operations on container level:
http://SWIFT-URL/ACCT/CONT?MIGRATE
...or through regular expressions on the Swift namespace
● Get back a request ID for efficient status tracking
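The request shapes above can be assembled with a small helper. This is a sketch only: `ilm_request` is a hypothetical name, and the mapping of RECALL to GET and STATUS to HEAD is one reading of "similar GET/HEAD requests", not something the proposal spells out.

```python
from urllib.parse import quote

# POST for MIGRATE is from the proposal; the other two verbs are assumed.
METHODS = {"MIGRATE": "POST", "RECALL": "GET", "STATUS": "HEAD"}


def ilm_request(swift_url, account, container, obj=None, op="MIGRATE"):
    """Return (http_method, url) for an object or container level ILM call.

    Leaving obj as None addresses the whole container (bulk operation).
    """
    if op not in METHODS:
        raise ValueError("unknown ILM operation: " + op)
    parts = [account, container]
    if obj is not None:
        parts.append(obj)
    path = "/".join(quote(part) for part in parts)
    return METHODS[op], swift_url.rstrip("/") + "/" + path + "?" + op


print(ilm_request("http://SWIFT-URL", "ACCT", "CONT", "OBJ"))
print(ilm_request("http://SWIFT-URL", "ACCT", "CONT", op="RECALL"))
```

The helper only builds the request; sending it, and reading the request ID from the response, depends on details the proposal leaves open.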
10. SwiftILM API proposal – advanced
(Optional) Setting ILM operations through SwiftILM API
● Migration/recall based on object age/size/type etc.
(Optional) Backend-specific additions
● e.g. to control placement to specific library/medium/pool
(Optional) Co-existence with Swift3
● enabling ILM for S3 protocol as well
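A migration rule based on object age, size and type could look like the sketch below. The proposal does not define a rule format, so `ObjectInfo`, `MigrationRule`, their fields and the sample objects are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectInfo:
    name: str
    size_bytes: int
    age_days: float
    content_type: str


@dataclass
class MigrationRule:
    """Hypothetical rule: an object must satisfy every configured condition."""
    min_age_days: float = 0
    min_size_bytes: int = 0
    content_type_prefix: Optional[str] = None

    def matches(self, obj):
        if obj.age_days < self.min_age_days:
            return False
        if obj.size_bytes < self.min_size_bytes:
            return False
        if (self.content_type_prefix is not None
                and not obj.content_type.startswith(self.content_type_prefix)):
            return False
        return True


def select_for_migration(objects, rule):
    """Return the names of all objects the rule would migrate."""
    return [obj.name for obj in objects if rule.matches(obj)]


# Made-up sample objects for illustration.
objects = [
    ObjectInfo("video-old", 5_000_000_000, 400, "video/mp4"),
    ObjectInfo("video-new", 5_000_000_000, 2, "video/mp4"),
    ObjectInfo("note-old", 1_000, 400, "text/plain"),
]
rule = MigrationRule(min_age_days=90, min_size_bytes=1_000_000,
                     content_type_prefix="video/")
print(select_for_migration(objects, rule))  # ['video-old']
```

The selected names could then be submitted as one bulk request, which keeps the number of tape mounts low.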
11. Add ILM to your existing Swift cluster
Take unmodified Swift
Configure ILM-based Data Ring
Add SwiftILM middleware
Add ILM-capable backend
[Diagram: a client application talks to OpenStack Swift through the standard Swift API with SwiftILM extensions (REST). Behind the SwiftILM middleware are two scale-out rings: the standard disk data ring (replication or erasure code) and the ILM-based data ring (replication across nodes). Each storage node in the ILM-based data ring runs an ILM capable backend with a disk cache in front of tape, optical disc, or MAID/SMR.]
12. Join us at the Design Summit or IBM booth for further discussions!
seipp@de.ibm.com
IRC: hseipp
Twitter: @HaraldSeipp
http://www.research.ibm.com/labs/zurich/sto/tier_icetier.html