1. Cinder Enhancements for Replication (and more) using Stateless Snapshots
Using Stateless Snapshots with Taskflow
• Caitlin.Bestler@nexenta.com
2. Snapshots
• With Havana, all Cinder Volume Drivers support snapshots.
• But some vendors provide “stateless” volume snapshots:
– Taking the snapshot does not interfere with use of the Volume.
– The Volume remains fully readable and writeable
• Stateless/Low-overhead snapshots are useful for many other activities
– Replication, Migration, Fail-over, Archiving, Deploying Master Images, …
• What is proposed:
– A set of optional enhancements for Cinder Volume Drivers.
– A pattern of usage for Taskflows to take advantage of stateless
snapshots.
3. Backup, Migration, Replication and pre-failover Preparation
• Multiple methods, but a common pattern with the same issues:
• Need for NDMP/OST-style direct appliance-to-appliance transfers.
– Volumes are big, transferring them twice is not acceptable
– Transferring them “through” the volume-manager is not
acceptable either
• Low-cost snapshots enable “stateless” methods
• Volume Drivers must report their capabilities:
– can-snap-stateless, storage-assist, etc.
4. Handful of Stones, Many Birds
• Proposal: Volume Drivers to optionally implement:
– Snapshot Replication.
– Severing the tie to the Volume status.
– Reporting capabilities.
• Variety of ways that Taskflow could use those to:
– Backup and Migrate volumes
– Implement a variety of data protection strategies
– Enhance automatic failover
– Improve deployment of cloned images
– Implement sophisticated snap-retention policies
– https://review.openstack.org/#/c/53480/
– https://blueprints.launchpad.net/cinder/+spec/volume-backup-createtask-flow
5. When Cinder manages non-local storage
• This deployment was cited as one of two in the deep dive presentation.
• But the first implementation of backup does not work acceptably for these deployments.
[Diagram: Nova hosts a VM Instance with /dev/vda; the Hypervisor's iSCSI Initiator connects to the Storage Backend's Storage Controller and iSCSI Target; Cinder drives the backend's Specific Volume Manager]
6. Current Cinder Backup
1. Volume Driver fetches content.
2. Volume Driver puts Backup Object as a client for Object Storage.
• Problem: this doubles network traffic.
– OK, compression reduces the second step.
– But even with 90% compression it would still be 1.1x just transferring the data.
• What we want is a direct transfer (3 on the diagram), which would match other Cinder backend actions.
[Diagram: Cinder's Specific Volume Manager drives a Block Initiator fetch from the Storage Backend's Block Target (1), a put to Swift as Backup Target (2), and the desired direct transfer (3)]
7. Ongoing Use of Volume with Concurrent Backup/Replication/Etc.
• The existing Cinder pattern for volume migration could be applied:
– Use of an override flag can enable doing long operations on an attached volume.
– Allows clients to continue to use the volume while the backup/replication (or whatever) is in progress.
[Sequence diagram: Client, Volume Manager, Cinder, Storage Controller, and Backup Target. The Volume Manager issues Snapshot/Replicate to the Storage Controller; the Client's Snap Write I/O continues to be Acked while replication is in progress; the snapshot is released afterwards]
8. Agenda
• Optional Cinder Enhancements
– Track Status Independent of the Volume Status
– Snapshot Replication
– Volume Driver Attributes
• Taskflow Usage
9. Volume Status alone blocks 24/7 Volumes
• The problem is that the Volume Status is set to Backing Up
– Or Migrating, or Replicating, etc.
• Other Cinder actions are blocked by this:
– At most one backup/migration/whatever can be in progress at a time.
– You cannot reassign a volume while it is being backed up.
• Proposed Solution: Use a different Status variable
– Allow Backends to modify the Task state, rather than Volume state.
• Backend must declare itself to be “stateless” for this method.
• Progress is reported via the Task state just as it would have via the
Volume state.
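As a rough sketch of the proposal, a task-level state variable lets a stateless backend report progress without ever changing the volume status. The class and attribute names below are illustrative, not actual Cinder code:

```python
# Hypothetical model of the proposed status split: a long-running task
# tracks its own state, so a "stateless" backend never flips the volume
# out of 'available'. Illustrative names only.

class Volume:
    def __init__(self, name):
        self.name = name
        self.status = 'available'

class BackupTask:
    """Progress is reported via the task state, not the volume state."""
    def __init__(self, volume, driver_is_stateless):
        self.volume = volume
        self.stateless = driver_is_stateless
        self.state = 'created'

    def start(self):
        if self.stateless:
            # Backend works on a low-cost snapshot; the volume stays usable.
            self.state = 'backing-up'
        else:
            # Legacy behaviour: the volume status blocks other operations.
            self.volume.status = 'backing-up'
            self.state = 'backing-up'

    def finish(self):
        self.state = 'success'
        if not self.stateless:
            self.volume.status = 'available'

vol = Volume('v1')
task = BackupTask(vol, driver_is_stateless=True)
task.start()
print(vol.status, task.state)   # volume stays available while backup runs
```

With a stateful driver the same task flips the volume status exactly as today, so legacy code paths are unchanged.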
10. Impact of allowing Alternate Status
• First, it is optional
– It allows implementations that can do long-term actions without
restricting access to the Volume to do so.
– Stateful implementations are not required to change their code.
• If taking a snapshot is expensive, you don’t want Cinder using this as a “shortcut”.
• This is safe. No reliance on end user knowing when to override.
• For “stateless” Volume Drivers:
– Cinder understands that launching long term methods (such as backup or
replicate) has no impact on the Volume itself.
– The action is actually being performed on a low-cost snapshot.
11. States of a Taskflow Using a Cinder Volume (such as Backup)
cinder.backup.manager
cinder.backup.flows.backup_volume_flow
from taskflow import states
....
transitions = FlowObjectsTransitions()
transitions.add_transition(volume, states.IN_PROGRESS, "BACKING-UP")
transitions.add_transition(volume, states.ERROR, "FAILED")
transitions.add_transition(volume, states.SUCCESS, "SUCCESS")
backup_flow = Flow("backup_flow_api", transitions)

https://review.openstack.org/#/c/54590/
[State diagram: In Progress (Backing-up), Error (Failed), Success (Success)]
12. Agenda
• Optional Cinder Enhancements
– Track Status Independent of the Volume Status
– Snapshot Replication
– Volume Driver Attributes
• Taskflow Usage
13. Proposed Method: Replicate Snapshot
• Why?
– Why not?
– For many/most implementations snapshots can be migrated.
– Certain tasks are simpler with snapshots.
• Snapshots are not volatile.
• Method on an existing snapshot.
– Specifies a different backend as the target.
• Must be under the same Volume Driver.
• Snapshot formats are inherently vendor specific.
– Optionally suppresses incremental transfers, requiring a full copy from scratch.
[Diagram: Cinder's Specific Volume Manager issues "Replicate Snapshot X to Backend Y"; the Storage Backend holding Snapshot X transfers it to Storage Backend Y, each via its own Storage Controller]
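A minimal sketch of what such a driver method could look like, using an in-memory Backend model. The method name, parameters, and return values are illustrative assumptions; the review above defines the actual interface:

```python
# Illustrative-only model: backends hold opaque, vendor-specific snapshot
# blobs; replicate_snapshot copies one to a second backend under the same
# driver, optionally forcing a full copy.

class Backend:
    def __init__(self, backend_id):
        self.backend_id = backend_id
        self.snapshots = {}   # snapshot_id -> opaque vendor-specific blob

class VolumeDriver:
    def replicate_snapshot(self, snapshot_id, src, dst, allow_incremental=True):
        """Copy a snapshot to another backend controlled by this driver.

        Snapshot formats are vendor specific, so both backends must be
        under the same Volume Driver. allow_incremental=False forces a
        full copy from scratch.
        """
        blob = src.snapshots[snapshot_id]
        if allow_incremental and snapshot_id in dst.snapshots:
            return 'no-op'            # destination already holds it
        dst.snapshots[snapshot_id] = blob
        return 'full-copy'

src = Backend('X')
dst = Backend('Y')
src.snapshots['V.s1'] = b'...vendor data...'
VolumeDriver().replicate_snapshot('V.s1', src, dst)
print(sorted(dst.snapshots))   # ['V.s1']
```

A real driver would replace the dictionary copy with its own wire protocol (for example ZFS send/receive), which is exactly why the method is vendor specific.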
14. Replicating Snapshots differs from Volume Migration
• Replicates snapshot rather than a volume.
• Original snapshot is not deleted.
• Volume Drivers may use incremental transfer techniques.
– Such as ZFS incremental snapshots.
• Snapshots have vendor specific formats
– So method to replicate them is inherently vendor specific.
– This allows for vendor specific optimization beyond incremental
snapshots:
• Compression.
• Multi-path transfer.
15. Periodic Incremental Snapshots approaches Continuous Data Replication
• Replicate snapshot can provide Continuous Data Replication if:
– The Volume Driver supports incremental snapshots.
– The snapshots are performed quickly enough.
– Old snapshots are cleaned up automatically.
• Difference between “snapshots” and “remote
mirroring” is more a matter of degree than a
fundamental difference.
16. Benefits of Snapshot Replication
• Several tasks where Snapshot Replication helps
– “Warm Standby” – pool of servers synchronized at snapshot frequency.
– Enhanced deployment of VM boot images from a common master.
– Disaster Recovery.
– “Backup” to other servers.
– Volume migration.
– Check-in/Check-out of Volumes from a central storage server as
VM is deployed.
17. Replicated Snapshots are versatile
1. Restore a volume from a Snapshot where the snapshot was replicated.
– Fast restore of a volume, but not at the optimum location.
• Or:
1. Replicate the Snapshot to a preferred location.
2. And clone it there.
[Diagram: a Storage Backend holds Volume V and Snapshot V.s3 (1, 2); the Preferred Location receives the replicated Snapshot V.s3 and clones Volume V from it (3)]
18. Other Issues
• Where does Storage Backend come from?
– At least two methods:
• From a backend_id in the DB, as suggested in Avishay’s Volume
Mirroring proposal.
• By querying the Volume Driver for a list of backends that it controls.
• Volume Driver and/or Backend is responsible for tracking
dependencies created by any incremental snapshot feature.
– The delta snapshot must be made a full snapshot before the
referenced prior snapshot can be deleted on a given server.
19. Agenda
• Optional Cinder Enhancements
– Track Status Independent of the Volume Status
– Snapshot Replication
– Volume Driver Attributes
• Taskflow Usage
20. Why Volume Driver Attributes
• We do not want to mandate that all snapshots be stateless.
– It’s relatively easy for copy-on-write systems, but not everyone is
copy-on-write.
• My philosophy for building consensus on open-source and standards:
– They should be flexible enough to allow my competition to be
stupid.
– Especially since they think what I’m doing is stupid.
• Volume Driver Attributes let vendor-neutral code decide what will
work well and what will not.
– Taking a snapshot does not optimize replication if it requires
making a copy of the data before making a copy of the data.
21. Proposed Attributes: Volume Driver Capabilities
• Problem: how to optimize long (bulk data intensive) operations of Cinder
volumes.
– Vendor specific algorithms are needed.
– But do we want to require every task to be implemented by each vendor?
• Proposal: Have each Volume Driver advertise when they have certain
optional capabilities.
– If the capability is advertised, vendor independent taskflow code can take
advantage of it.
– One method can be useful for many taskflows.
• Publication of these attributes is optional
– If you don’t do X you don’t have to do anything to say you don’t do X.
– If you have no optional capabilities then you don’t have to say anything.
22. Suggested Implementation for Volume Driver attributes
• Suggestion: use Python capabilities decorators.
• Included in source code of the Volume Driver.
– Already used by some Volume Drivers.
• Easily referenced in code.
• https://review.openstack.org/#/c/54803/
• https://blueprints.launchpad.net/cinder/+spec/backendactivity

cinder.volume.drivers.storwize_svc:

from cinder.volume import capabilities

class StorwizeSVCDriver(san.SanDriver):
    ...
    @capabilities.storage_assist
    def migrate_volume(self, ctxt, volume, host):
        ....

cinder.volume.manager:

from cinder import capabilities

class VolumeManager(manager.SchedulerDependentManager):
    ...
    @utils.require_driver_initialized
    def migrate_volume(self, ctxt, volume_id, host, force_host_copy=False):
        ...
        if capabilities.is_supported(self.driver.migrate_volume, 'storage_assist'):
            # Then check that destination host is the same backend.
        elif capabilities.is_supported(self.driver.migrate_volume, 'local'):
            # Then check that destination host is the same host.
        ...
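For illustration, here is a minimal implementation of what such a capabilities module could look like, inferred from the usage above. This is an assumption for the sketch, not the contents of the actual patch (54803):

```python
# Hypothetical sketch of a capabilities helper: decorators tag driver
# methods with capability names; is_supported() lets vendor-neutral code
# check the tags. Illustrative only, not the real cinder module.

def _capability(name):
    """Return a decorator that tags a function with a capability name."""
    def decorator(func):
        caps = getattr(func, '_capabilities', set())
        func._capabilities = caps | {name}
        return func
    return decorator

storage_assist = _capability('storage_assist')
local = _capability('local')

def is_supported(method, capability):
    # Bound methods delegate attribute lookups to the underlying function,
    # so this works on both driver.migrate_volume and the bare function.
    return capability in getattr(method, '_capabilities', set())

class ExampleDriver:
    @storage_assist
    def migrate_volume(self, ctxt, volume, host):
        pass

driver = ExampleDriver()
print(is_supported(driver.migrate_volume, 'storage_assist'))  # True
print(is_supported(driver.migrate_volume, 'local'))           # False
```

Because undecorated methods simply have no `_capabilities` attribute, a driver that supports nothing optional does not need to declare anything, matching the "if you don't do X you don't have to say so" goal.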
23. Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / “Warm Standby”
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume from Central Repository
24. Warm Standby – Before the Failover
1. Snapshot Volume V.
2. Fully Replicate to new Backend.
Periodically/Continuously:
3. Take new snapshot.
4. Transfer incremental snapshot to standby Backend.
5. Apply incremental snapshot to make new full image versioned snapshot.
[Diagram: the Backend currently hosting Volume V takes Snapshot V.s1 (1) and fully replicates it to the standby Backend (2); it later takes Snapshot V.s2 (3), transfers the increment to the standby (4), which applies it to form a full V.s2 image (5)]
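The steps above can be sketched as follows, modelling backends as snapshot dictionaries and an incremental transfer as sending only changed blocks. The block model and all names are illustrative assumptions:

```python
# Toy model of warm-standby replication: a volume is a dict of
# block -> content, a snapshot is a frozen copy, and incremental
# replication ships only blocks that differ from the prior snapshot.

def take_snapshot(volume_blocks, name, backend):
    backend[name] = dict(volume_blocks)

def replicate_incremental(name, prev, src, dst):
    """Ship the delta since `prev`, then materialize a full image on the
    standby (steps 4 and 5). Returns how many blocks were transferred."""
    if prev is None or prev not in dst:
        dst[name] = dict(src[name])        # step 2: first full replication
        return len(src[name])
    delta = {k: v for k, v in src[name].items() if dst[prev].get(k) != v}
    dst[name] = {**dst[prev], **delta}     # apply delta -> full versioned image
    return len(delta)

primary, standby = {}, {}
volume = {'b0': 'A', 'b1': 'B'}
take_snapshot(volume, 'V.s1', primary)                            # step 1
sent = replicate_incremental('V.s1', None, primary, standby)      # step 2
volume['b1'] = 'C'                                                # client keeps writing
take_snapshot(volume, 'V.s2', primary)                            # step 3
sent2 = replicate_incremental('V.s2', 'V.s1', primary, standby)   # steps 4-5
print(sent, sent2)   # 2 blocks for the full copy, then only 1 changed block
```

The standby always ends each cycle holding a full image, which is what makes the failover on the next slide a simple clone.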
25. Failing Over to the Warm Standby
1. Current host fails.
2. Clone new Volume V from Snapshot.
3. Select new standby target and repeat prior slide.
[Diagram: the Backend currently hosting Volume V fails (1); the standby Backend clones a new Volume V from Snapshot V.s1 (2)]
26. Adjacent to proposed Volume Mirroring solution
• Not fully overlapping, but frequently taken snapshots replicated
incrementally begins to resemble Volume Mirroring.
– It cannot match Volume Mirroring with near-instant relay of
transactions.
– But it consumes far fewer network resources, especially peak network resources.
– It is operationally more flexible. There is no need to set up one-to-one mirror relationships in Cinder.
• We can offer both solutions and let end users decide which is
best for their needs.
27. Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / “Warm Standby”
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume from Central Repository
28. Proposed New Taskflow: Provision boot volume with minimal overhead
• Optimize for common boot images that provide a common OS for many VMs.
• Creating two VMs from the same template should not require 2x the bandwidth.
[Diagram: Glance holds Volume Template VT; the Cinder Backend holds Volume V1 and Volume V2, both based on Template VT]
29. Snapshot optimized Image Provisioning - 1
1. Use Glance to create a reference image.
– Already adapted for a specific deployment format.
2. Take a snapshot of that volume.
3. Clone additional targets from that snapshot.
4. Repeat as more VMs from the same template are launched.
[Diagram: Glance supplies Volume Template VT to the Cinder Backend; Snapshot V-Prime is taken from the initial Volume V1 (2); Volume V2 is cloned from that snapshot (3); further volumes repeat the clone (4)]
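A toy sketch of why cloning from a snapshot avoids the 2x bandwidth cost. The Backend model and byte accounting are illustrative assumptions:

```python
# Illustrative model: the template is transferred once and snapshotted;
# clones share the snapshot's blocks copy-on-write, so each additional
# VM costs no further transfer.

class Backend:
    def __init__(self):
        self.bytes_transferred = 0
        self.snapshot = None
        self.clones = []

    def import_template(self, image):
        # Steps 1-2: write the Glance image once, then snapshot it.
        self.bytes_transferred += len(image)
        self.snapshot = image

    def clone_volume(self, name):
        # Steps 3-4: a clone references the snapshot's blocks directly.
        self.clones.append(name)
        return {'name': name, 'base': self.snapshot}

backend = Backend()
backend.import_template(b'x' * 1024)   # 1 KiB template, paid for once
v1 = backend.clone_volume('V1')
v2 = backend.clone_volume('V2')
print(backend.bytes_transferred)       # still 1024, not 2048
```

Each VM then accumulates only its own divergent blocks on top of the shared base, which is the copy-on-write behaviour stateless-snapshot backends already provide.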
30. Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / “Warm Standby”
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume from Central Repository
31. Add live migration using incremental snapshots
• This is essentially how Hypervisors Live Migrate VMs
– Loop
• Make incremental snapshot
• If empty
– Break
• Send incremental snapshot to destination
– De-activate source volume
– Clone volume at destination from snapshots.
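The loop above can be sketched as follows, with the source's dirty blocks between rounds modelled as a list of sets (an illustrative assumption):

```python
# Convergence loop modelled on hypervisor live migration: keep shipping
# incremental snapshots until one captures no new writes, then cut over.

def live_migrate(source_writes):
    """source_writes: dirty-block sets arriving between rounds, in order.

    Returns the list of deltas that were transferred before convergence.
    """
    transferred = []
    rounds = iter(source_writes)
    while True:
        delta = next(rounds, set())      # make an incremental snapshot
        if not delta:                    # empty delta: source has converged
            break
        transferred.append(delta)        # send the increment to the destination
    # At this point: de-activate the source volume and clone the volume
    # at the destination from the accumulated snapshots.
    return transferred

history = [{'b1', 'b2'}, {'b2'}, set()]
print(live_migrate(history))   # two rounds of deltas, then convergence
```

As with VM live migration, this only terminates if writes slow down enough for a round to come back empty; a real taskflow would add a round limit and a brief write freeze for the final delta.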
32. Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / “Warm Standby”
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume
33. Possible Taskflow: Manage retention/replication of snapshots
• Set a policy for retention of snapshots
– Frequency for taking snapshots.
– Which snapshots to retain.
• Automatically replicate some snapshots to other
backend targets.
• Backup some to Object Storage.
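One possible shape for such a retention policy, as an illustrative sketch. The keep_recent knob and the one-snapshot-per-day rule are assumptions for the example, not a proposed Cinder API:

```python
# Toy retention policy: keep the N most recent snapshots plus the newest
# snapshot from each calendar day. Everything else becomes a candidate
# for cleanup (or for replication/backup before deletion).

from datetime import datetime

def snapshots_to_retain(snaps, keep_recent=3):
    """snaps: list of (name, datetime) pairs, oldest first."""
    keep = {name for name, _ in snaps[-keep_recent:]}   # most recent N
    seen_days = set()
    for name, ts in reversed(snaps):                    # newest per day
        if ts.date() not in seen_days:
            seen_days.add(ts.date())
            keep.add(name)
    return keep

snaps = [
    ('V.s1', datetime(2013, 11, 4, 9)),
    ('V.s2', datetime(2013, 11, 4, 21)),
    ('V.s3', datetime(2013, 11, 5, 9)),
    ('V.s4', datetime(2013, 11, 5, 15)),
    ('V.s5', datetime(2013, 11, 5, 21)),
]
print(sorted(snapshots_to_retain(snaps)))   # ['V.s2', 'V.s3', 'V.s4', 'V.s5']
```

A taskflow applying this policy would also have to respect incremental-snapshot dependencies (see slide 18): a delta snapshot must be made full before its referenced prior snapshot is deleted.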
34. Agenda
• Optional Cinder Enhancements
• Taskflow Usage
– Maintain Volume Pool / “Warm Standby”
– Optimized provisioning of master image
– Live Migration with Incremental Snapshots
– Apply policy for Snapshot Retention and Replication
– Check-in/Checkout Volume
35. Possible taskflow: Check-out and Check-in of Volumes
• Use Case: Persistent disk images for intermittent compute jobs.
– Example: a non-continuous compute job needs its disk image near it whenever it is launched.
– Example: a VDI desktop needs access to a persistent disk image.
• This is especially useful when this is a thin image that relies on the
central image for blocks not altered or referenced yet.
– Periodically snapshot and post delta to the central repository.
– Check-in when done with final snapshot.
– Then delete the remote volume and change status in central
repository to allow a new check-out.
36. Steps for Check-out/Check-In
1. Snapshot the Volume being checked out.
2. Replicate the Snapshot to a Host-Adjacent (or co-located) backend.
3. De-activate the Volume on the Master.
4. Clone the Volume on the host-adjacent backend.
5. Periodically snapshot the Volume on the host-adjacent backend.
6. Replicate those Snapshots to the Master storage backend.
7. Snapshot a final time on the Storage Backend.
8. Replicate to the Master.
9. Remove the volume on the host-adjacent backend.
10. Clone a new volume on the master backend from the final snapshot.
[Diagram: the Host-Adjacent Storage Backend and the Master Storage Backend exchange Snapshots V.s1 through V.s5, numbered to match the steps above]
37. Summary
• Taskflow can automate several Cinder related tasks
– This logic can be vendor neutral
• But to do so efficiently it needs a handful of Cinder enhancements:
– Optional separation from Volume Status for long-running activities.
– Snapshot Replication.
– Volume Driver Attributes.
• Wiki: https://wiki.openstack.org/wiki/CERSS
• Questions?
– Caitlin.Bestler@nexenta.com, irc: caitlin56.
– Victor.Rodionov@nexenta.com, irc: vito-ordaz.