Audience Level
Intermediate
Synopsis
Ceph – the most popular storage solution for OpenStack – stores all data as a collection of objects. This object store was originally implemented on top of a POSIX filesystem, an approach that turned out to have a number of problems, notably with performance and complexity.
BlueStore, a new storage backend for Ceph, was created to solve these issues; the Ceph Jewel release included an early prototype. The on-disk format was declared stable in Ceph Kraken, although the code itself was still considered experimental, and in the upcoming Ceph Luminous release BlueStore will become the recommended default storage backend.
With a 2-3x performance boost on offer, you’ll want to look at migrating your Ceph clusters to BlueStore. This talk goes into detail about what BlueStore does, the problems it solves, and what you need to do to use it.
Speaker Bio:
Tim works for SUSE, hacking on Ceph and related technologies. He has spoken often about distributed storage and high availability at conferences such as linux.conf.au. In his spare time he wrangles pigs, chickens, sheep and ducks, and was declared by one colleague “teammate most likely to survive the zombie apocalypse”.
Ceph Provides
● A resilient, scale-out storage cluster
  ● On commodity hardware
  ● No bottlenecks
  ● No single points of failure
● Three interfaces
  ● Object (radosgw)
  ● Block (rbd)
  ● Distributed Filesystem (cephfs)
Internally, Regardless of Interface
● All data stored as “objects”
  ● Aggregated into Placement Groups
● Objects have (see the example below):
  ● Data (byte stream)
  ● Attributes (key/value pairs)
● Objects live on Object Storage Devices
  ● Managed by Ceph Object Storage Daemons
  ● An OSD is a disk in a server
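To make the object model concrete, here is a minimal sketch using the python3-rados bindings (a real library; the pool name “mypool” and the object and attribute names are placeholders invented for illustration):

```python
# Minimal sketch of the Ceph object model via the python3-rados bindings.
# Assumes a reachable cluster, /etc/ceph/ceph.conf, and an existing pool
# named "mypool" -- all assumptions, not part of the talk.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('mypool')
    try:
        # Data: an opaque byte stream.
        ioctx.write_full('greeting', b'hello world')
        # Attributes: arbitrary key/value pairs attached to the object.
        ioctx.set_xattr('greeting', 'owner', b'tim')
        print(ioctx.read('greeting'))                # b'hello world'
        print(ioctx.get_xattr('greeting', 'owner'))  # b'tim'
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```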
BlueStore = Block + NewStore
● Raw block devices
● RocksDB key/value store for metadata
● Data written directly to the block device (toy model below)
● 2-3x performance boost over FileStore
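BlueStore itself is C++ inside the OSD, so the following is only a toy mental model, not real Ceph code: a Python dict stands in for RocksDB, and a plain file stands in for the raw block device. The file name, the 4 KiB allocation unit, and the bump allocator are all invented simplifications:

```python
# Toy mental model of the BlueStore split (NOT the real implementation):
# object data goes straight to a raw "block device" at allocated offsets,
# while a key/value store (RocksDB in real BlueStore) maps object names
# to those extents.
ALLOC_UNIT = 4096  # pretend allocation unit

class ToyBlueStore:
    def __init__(self, path):
        self.dev = open(path, 'r+b')   # stands in for the raw block device
        self.kv = {}                   # stands in for RocksDB: name -> (offset, length)
        self.next_free = 0             # trivial bump allocator

    def write(self, name, data):
        # Allocate whole units and write the data directly to the "device"...
        offset = self.next_free
        self.next_free += -(-len(data) // ALLOC_UNIT) * ALLOC_UNIT
        self.dev.seek(offset)
        self.dev.write(data)
        # ...then record only metadata in the key/value store.
        self.kv[name] = (offset, len(data))

    def read(self, name):
        offset, length = self.kv[name]
        self.dev.seek(offset)
        return self.dev.read(length)

# Usage: create a sparse 1 MiB "device" file, then store an object.
with open('dev.img', 'wb') as f:
    f.truncate(1024 * 1024)
store = ToyBlueStore('dev.img')
store.write('obj1', b'some bytes')
print(store.read('obj1'))  # b'some bytes'
```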
Availability
● Early prototype in Ceph Jewel (April 2016)
● Stable on-disk format in Ceph Kraken (January 2017)
● Recommended default in Ceph Luminous (RSN, “real soon now”)
● Can co-exist with FileStore (see the check below)
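Because FileStore and BlueStore OSDs can co-exist in one cluster, it is useful to see which backend each OSD is running. A small example shelling out to the ceph CLI; the osd_objectstore metadata field is reported by Jewel and later:

```python
# List the object store backend of every OSD in the cluster, to watch
# FileStore and BlueStore co-existing during a migration.
import json
import subprocess

out = subprocess.check_output(['ceph', 'osd', 'metadata', '--format=json'])
for osd in json.loads(out):
    print('osd.%s -> %s' % (osd['id'], osd.get('osd_objectstore', 'unknown')))
```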
Several Approaches
● Fail in place → period of reduced redundancy (sketched below)
  ● Fail FileStore OSD
  ● Create new BlueStore OSD on same device
● Disk-wise replacement → reduced online redundancy, requires extra drive slot
  ● Create new BlueStore OSD on spare disk
  ● Stop FileStore OSD on same host
● Host-wise replacement → no reduced redundancy, requires spare host
  ● Provision entire new host with BlueStore OSDs
  ● Swap into old host’s CRUSH position
● Evacuate and rebuild in place → no reduced redundancy, requires free disk space
  ● Evacuate FileStore OSD, fail when empty
  ● Create new BlueStore OSD on same device
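As a rough illustration of the first approach (“fail in place”), here is a hedged Python sketch driving the Luminous-era CLI. The OSD id and device are placeholders, the exact ceph-volume flags should be verified against your release, and nothing like this should be run without first checking cluster health:

```python
# Hedged sketch of "fail in place" for a single OSD: stop and destroy the
# FileStore OSD, then recreate it as BlueStore on the same device, reusing
# the OSD id so it keeps its CRUSH position. OSD_ID and DEVICE are
# placeholders; flag names follow Luminous-era ceph-volume.
import subprocess
import time

OSD_ID = '5'          # placeholder: the FileStore OSD to convert
DEVICE = '/dev/sdb'   # placeholder: the device that OSD lives on

def run(*cmd):
    print('+', ' '.join(cmd))
    subprocess.check_call(cmd)

# 1. Stop the daemon and destroy the OSD, keeping its id for reuse.
#    Redundancy is reduced from this point until backfill completes.
run('systemctl', 'stop', 'ceph-osd@' + OSD_ID)
run('ceph', 'osd', 'destroy', OSD_ID, '--yes-i-really-mean-it')

# 2. Wipe the device and create a new BlueStore OSD with the same id.
run('ceph-volume', 'lvm', 'zap', DEVICE)
run('ceph-volume', 'lvm', 'create', '--bluestore',
    '--data', DEVICE, '--osd-id', OSD_ID)

# 3. Wait for recovery/backfill to restore full redundancy.
while b'HEALTH_OK' not in subprocess.check_output(['ceph', 'health']):
    time.sleep(60)
print('osd.%s migrated to BlueStore' % OSD_ID)
```

The other three approaches trade that window of reduced redundancy for spare hardware or free capacity, as annotated in the list above.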