RECOVERY, ERASURE CODING, AND CACHE TIERING
SAMUEL JUST – 2015 CEPH DAY SHANGHAI
RADOS
ARCHITECTURAL COMPONENTS
● RGW – A web services gateway for object storage, compatible with S3 and Swift
● LIBRADOS – A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
● RADOS – A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
● RBD – A reliable, fully-distributed block device with cloud platform integration
● CEPHFS – A distributed file system with POSIX semantics and scale-out metadata management
[Diagram: apps sit on RGW or LIBRADOS, hosts/VMs on RBD, clients on CEPHFS; everything sits on RADOS]
RADOS
● Flat object namespace within each pool
● Strong consistency (CP system)
● Infrastructure aware, dynamic topology
● Hash-based placement (CRUSH)
● Direct client to server data path
LIBRADOS Interface
● Rich object API (example below)
– Bytes, attributes, key/value data
– Partial overwrite of existing data
– Single-object compound atomic operations
– RADOS classes (stored procedures)
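As a concrete illustration, here is a minimal sketch using the python-rados bindings (the pool name 'data' and the object names are placeholders); it writes bytes, sets an attribute, and applies a compound key/value operation that executes atomically on a single object:

    import rados

    # Connect using the standard config file; 'data' is a placeholder pool name.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('data')

    # Bytes and attributes on a single object.
    ioctx.write_full('greeting', b'hello rados')   # object data
    ioctx.set_xattr('greeting', 'lang', b'en')     # object attribute

    # Compound operation: all mutations in the op apply atomically, or none do.
    with rados.WriteOpCtx() as op:
        ioctx.set_omap(op, ('owner',), (b'sam',))  # key/value data
        ioctx.operate_write_op(op, 'greeting')

    ioctx.close()
    cluster.shutdown()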
RADOS COMPONENTS
OSDs:
● 10s to 1000s in a cluster
● One per disk (or one per SSD, RAID group…)
● Serve stored objects to clients
● Intelligently peer for replication & recovery
RADOS COMPONENTS
Monitors:
● Maintain cluster membership and state
● Provide consensus for distributed decision-making
● Small, odd number (e.g., 5)
● Not part of data path
DATA PLACEMENT
WHERE DO OBJECTS LIVE?
[Diagram: an application holds an object and faces the cluster and its monitors; where should the object go?]

A METADATA SERVER?
[Diagram: one option is a lookup service: the application asks a metadata server first (1), then accesses the data (2)]
CALCULATED PLACEMENT
[Diagram: a better option: the application computes a function F that maps each object directly to a bucket of OSDs (A-G, H-N, O-T, U-Z), with no lookup]
CRUSH
[Diagram: objects in the cluster hash into placement groups (PGs), and CRUSH maps each PG to a set of OSDs]
CRUSH IS A QUICK CALCULATION
[Diagram, repeated on two slides: a client computes its object's PG and target OSDs in the RADOS cluster directly, with no lookup table]

CRUSH AVOIDS FAILED DEVICES
[Diagram: when a target OSD fails, CRUSH deterministically selects a replacement OSD for the object's PG]
CRUSH: DECLUSTERED PLACEMENT
[Diagram: each PG's replicas are spread across the RADOS cluster]
● Each PG independently maps to a pseudorandom set of OSDs
● PGs that map to the same OSD generally have replicas that do not
● When an OSD fails, each PG it stored will generally be re-replicated by a different OSD
– Highly parallel recovery
CRUSH: DYNAMIC DATA PLACEMENT
CRUSH (sketched in code below):
● Pseudo-random placement algorithm
– Fast calculation, no lookup
– Repeatable, deterministic
● Statistically uniform distribution
● Stable mapping
– Limited data migration on change
● Rule-based configuration
– Infrastructure topology aware
– Adjustable replication
– Weighting
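The essential property is that placement is a pure function of the object name, the pool, and the cluster map. Here is a hedged sketch of the two-step mapping, using rendezvous hashing as a stand-in for the real CRUSH algorithm (which additionally walks the infrastructure topology and honors rules and weights):

    import hashlib

    def stable_hash(s: str) -> int:
        # Deterministic hash so every client computes the same placement.
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], 'little')

    def object_to_pg(pool_id: int, obj_name: str, pg_num: int) -> str:
        # Step 1: hash the object name into one of the pool's PGs.
        return f"{pool_id}.{stable_hash(obj_name) % pg_num:x}"

    def pg_to_osds(pg: str, osds: list, size: int) -> list:
        # Step 2 (stand-in for CRUSH): rank OSDs by a per-PG pseudorandom
        # score and take the first `size`. Removing one OSD only disturbs
        # placements that involved it, so the mapping stays stable.
        return sorted(osds, key=lambda osd: stable_hash(f"{pg}:{osd}"))[:size]

    pg = object_to_pg(1, "myobject", pg_num=256)
    print(pg, pg_to_osds(pg, osds=list(range(12)), size=3))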
DATA IS ORGANIZED INTO POOLS
[Diagram: objects from several applications hash into the PGs of four pools (A, B, C, D); each pool has its own PGs spread across the cluster]
Peering and Recovery
Peering
● Each OSDMap is numbered with an epoch number
● The Monitors and OSDs store a history of OSDMaps
● Using this history, an OSD which becomes a new member of a PG can deduce every OSD which could have received a write it needs to know about (see the sketch below)
● The process of discovering the authoritative state of the objects stored in the PG by contacting old PG members is called Peering
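A hedged sketch of that deduction (the interval bookkeeping is simplified, and `acting_set` is an illustrative accessor, not the real OSDMap API):

    def past_intervals(osdmap_history, pg, last_clean_epoch, current_epoch):
        """Collect the acting sets that could have accepted writes for `pg`.

        `osdmap_history` is illustrative: a dict of epoch -> map, where
        each map can answer acting_set(pg)."""
        intervals = []
        prev_acting = None
        for epoch in range(last_clean_epoch, current_epoch + 1):
            acting = osdmap_history[epoch].acting_set(pg)
            if acting != prev_acting:
                intervals.append((epoch, acting))  # a new interval begins
                prev_acting = acting
        # Peering must hear from at least one OSD of each interval that
        # could have served writes before the PG can go active again.
        return intervals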
Peering
[Diagram sequence across OSDMap epochs 19884, 20113, and 20220: the PG's acting set changes over time, and the current members contact OSDs from the earlier intervals to learn the authoritative state]
Recovery, Backfill, and PG Temp
● The state of a PG on an OSD is represented by a PG Log which contains the most recent operations witnessed by that OSD
● The authoritative state of a PG after Peering is represented by constructing an authoritative PG Log from an up-to-date peer
● If a peer's PG Log overlaps the authoritative PG Log, we can construct a set of out-of-date objects to recover (see the sketch below)
● Otherwise, we do not know which objects are out-of-date, and we must perform Backfill
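A hedged sketch of that decision (real PG Log entries carry versions and operation types; here each log is reduced to ordered (version, object) pairs):

    def plan_recovery(authoritative_log, peer_log):
        """Choose log-based recovery or backfill for one peer.

        Logs are ordered lists of (version, object) entries. The peer's
        log overlaps ours if its newest entry is no older than the oldest
        entry we still retain."""
        if not peer_log or peer_log[-1][0] < authoritative_log[0][0]:
            return ("backfill", None)   # no overlap: stale objects unknown
        peer_head = peer_log[-1][0]
        # Every object modified after the peer's last-seen version is stale.
        missing = {obj for version, obj in authoritative_log if version > peer_head}
        return ("recover", missing)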
Recovery, Backfill, and PG Temp
● If we do not know which objects are invalid, we certainly cannot serve reads from such a peer
● If, after peering, the primary determines that it or any other peer requires Backfill, it will ask the Monitor cluster to publish a new map with an exception to the CRUSH mapping for this PG, mapping it to the best set of up-to-date peers it can find
● Once that map is published, peering will happen again, and the up-to-date peers will independently conclude that they should serve reads and writes while concurrently backfilling the correct peers
Backfill Example
[Diagram sequence across epochs 1130, 1340, 1345, and 1391: CRUSH remaps the PG to new OSDs; the primary requests a temporary exception (PG Temp) pinning the PG to the old, up-to-date OSDs, which serve I/O while backfilling the new ones; once backfill completes, the exception is dropped and the CRUSH mapping takes effect]
Backfill
● Backfill proceeds in object order (basically hash order) within the PG
● info.last_backfill records the progress boundary (see the sketch below):
– Obj o <= info.last_backfill → object is up to date
– Obj o > info.last_backfill → object is not up to date
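A hedged sketch of one backfill round under those invariants (the `target.store` method and the batch size are illustrative):

    def backfill_step(source_objects, target, last_backfill, batch=16):
        """Copy the next `batch` objects in hash order and advance the
        last_backfill high-water mark.

        `source_objects` is an iterable of (hash_key, data) sorted by
        hash_key; `target` is illustrative, with a store(key, data) method."""
        copied = 0
        for hash_key, data in source_objects:
            if hash_key <= last_backfill:
                continue                  # already up to date on the target
            target.store(hash_key, data)
            last_backfill = hash_key      # objects <= this are now current
            copied += 1
            if copied == batch:
                break
        return last_backfill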
TIERED STORAGE
TWO WAYS TO CACHE
● Within each OSD
– Combine SSD and HDD under each OSD
– Make localized promote/demote decisions
– Leverage existing tools
● dm-cache, bcache, FlashCache
● Variety of caching controllers
– We can help with hints
● Cache on separate devices/nodes
– Different hardware for different tiers
● Slow nodes for cold data
● High performance nodes for hot data
– Add, remove, scale each tier independently
● Unlikely to choose right ratios at procurement time
[Diagram: an OSD on a filesystem over a block device that combines an HDD with an SSD cache]
TIERED STORAGE
[Diagram: the application talks to a replicated cache pool layered over an erasure coded backing pool, both inside the Ceph storage cluster]
RADOS TIERING PRINCIPLES
● Each tier is a RADOS pool
– May be replicated or erasure coded
● Tiers are durable
– e.g., replicate across SSDs in multiple hosts
● Each tier has its own CRUSH policy
– e.g., map the cache pool to SSD devices/hosts only
● librados adapts to the tiering topology
– Transparently directs requests accordingly (e.g., to the cache)
– No changes to RBD, RGW, CephFS, etc.
WRITE INTO CACHE POOL
[Diagram: the client writes to the writeback cache pool (SSD) and gets an ACK; the backing pool (HDD) is not in the write path]

WRITE (MISS)
[Diagram: on a write miss, the object is first promoted from the backing pool into the cache pool, then the write is applied and ACKed]

WRITE (MISS) PROXY!
[Diagram: alternatively, the cache pool proxies the write through to the backing pool and ACKs, optionally promoting the object afterwards]

READ (CACHE HIT)
[Diagram: the client reads from the cache pool, which replies directly]

READ (CACHE MISS)
[Diagram: the cache pool redirects the client, which reads from the backing pool]

READ (CACHE MISS) PROXY!
[Diagram: the cache pool proxies the read to the backing pool and returns the reply itself]

READ (CACHE MISS)
[Diagram: the cache pool proxies the read and concurrently promotes the object into the cache]
CACHE TIERING ARCHITECTURE
● Cache and backing pools are otherwise independent RADOS pools with independent placement rules
● OSDs in the cache pool handle promotion and eviction of their objects independently, allowing for scalability
● RADOS clients understand the tiering configuration and are able to send requests to the right place, mostly without redirects
● librados users can operate on a cache pool transparently, trusting the library to correctly route requests between the cache pool and base pool as needed
ESTIMATING TEMPERATURE
● Each PG constructs in-memory bloom filters
– Insert records on both read and write
– Each filter covers a configurable period (e.g., 1 hour)
– Tunable false positive probability (e.g., 5%)
– Maintain the most recent N filters on disk
● Estimate temperature (sketched below)
– Has the object been accessed in any of the last N periods?
– ...in how many of them?
– Informs flush/evict decisions
● Estimate “recency”
– How many periods have passed since the object was last accessed?
– Informs read miss behavior: promote vs. redirect
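A hedged sketch of the scheme (a plain set per period stands in for the real bloom filter so the example stays self-contained):

    from collections import deque

    class TemperatureEstimator:
        """Keep one access filter per time period and score object hotness.

        Only the most recent `n` filters are retained; a set stands in
        for the per-period bloom filter."""
        def __init__(self, n=8):
            self.periods = deque(maxlen=n)  # oldest filters fall off the end
            self.periods.append(set())

        def roll_period(self):
            self.periods.append(set())      # e.g., called once per hour

        def record_access(self, obj):       # insert on both read and write
            self.periods[-1].add(obj)

        def temperature(self, obj):
            # In how many of the last N periods was the object accessed?
            return sum(obj in p for p in self.periods)

        def recency(self, obj):
            # Periods since the most recent access (None if never seen).
            for age, p in enumerate(reversed(self.periods)):
                if obj in p:
                    return age
            return None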
FLUSH AND/OR EVICT COLD DATA
[Diagram: the cache pool flushes a dirty cold object to the backing pool, gets an ACK, and then evicts the clean copy]
TIERING AGENT
● Each PG has an internal tiering agent (see the sketch below)
– Manages the PG based on administrator-defined policy
● Flush dirty objects
– When the pool reaches its target dirty ratio
– Tries to select cold objects
– Marks objects clean when they have been written back to the base pool
● Evict clean objects
– Greater “effort” as pool/PG size approaches the target size
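A hedged sketch of one agent pass (the thresholds, object records, and the flush/evict callables are all illustrative; the real agent works incrementally and scales its effort with fullness):

    def agent_pass(objects, dirty_ratio, target_dirty_ratio,
                   fullness, target_full_ratio, flush, evict):
        """One illustrative pass over a PG's objects, coldest first.

        Each object record has .dirty (needs writeback) and .temperature."""
        by_coldness = sorted(objects, key=lambda o: o.temperature)
        if dirty_ratio > target_dirty_ratio:
            for obj in by_coldness:
                if obj.dirty:
                    flush(obj)   # write back to the base pool, then mark clean
                    break
        if fullness > target_full_ratio:
            for obj in by_coldness:
                if not obj.dirty:
                    evict(obj)   # drop the clean cached copy
                    break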
CACHE TIER USAGE
● The cache tier should be faster than the base tier
● The cache tier should be replicated (not erasure coded)
● Promote and flush are expensive
– Best results when object temperatures are skewed
● Most I/O goes to a small number of hot objects
– The cache should be big enough to capture most of the acting set
● Challenging to benchmark
– Needs a realistic workload (e.g., not 'dd') to determine how it will perform in practice
– Takes a long time to “warm up” the cache
ERASURE CODING
ERASURE CODING
[Diagram: a replicated pool stores full object copies on three OSDs; an erasure coded pool stores the object as data shards 1-4 plus coding shards X and Y]
Replication – full copies of stored objects:
● Very high durability
● 3x (200% overhead)
● Quicker recovery
Erasure coding – one copy plus parity:
● Cost-effective durability
● 1.5x (50% overhead)
● Expensive recovery
ERASURE CODING SHARDS
[Diagram: an object in the erasure coded pool is split across OSDs 1-4 (data shards) and OSDs X and Y (coding shards)]
ERASURE CODING SHARDS
Stripe units are distributed round-robin across the data shards, with the coding shards holding the per-stripe parity chunks:

  OSD 1: 0   4   8   12  16
  OSD 2: 1   5   9   13  17
  OSD 3: 2   6   10  14  18
  OSD 4: 3   7   11  15  19
  OSD X: A   B   C   D   E
  OSD Y: A'  B'  C'  D'  E'

● Variable stripe size (e.g., 4 KB)
● Zero-fill shards (logically) in partial tail stripe (see the sketch below)
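A hedged sketch of that layout (bytewise XOR stands in for the real erasure code, which would compute m independent coding shards):

    import functools
    import operator

    def stripe(data: bytes, k: int = 4, unit: int = 4096):
        """Split `data` into k data shards, unit by unit, zero-filling the
        partial tail stripe, plus one XOR "parity" shard.

        Returns k + 1 byte strings: shards for OSDs 1..k plus one coding shard."""
        stripe_bytes = k * unit
        pad = (-len(data)) % stripe_bytes
        data += b'\x00' * pad              # logical zero-fill of the tail stripe
        # Unit i of the object goes to data shard i mod k (round robin).
        units = [data[i:i + unit] for i in range(0, len(data), unit)]
        shards = [b''.join(units[i::k]) for i in range(k)]
        # Stand-in coding shard: bytewise XOR across the k data shards.
        parity = bytes(functools.reduce(operator.xor, column)
                       for column in zip(*shards))
        return shards + [parity]

    shards = stripe(b'hello erasure coding' * 1000)
    print([len(s) for s in shards])   # equal-length shards, tail zero-filled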
EC READ
[Diagram sequence: the client sends a READ to the erasure coded PG; the primary issues sub-reads to enough shards, reconstructs the data, and returns the REPLY]
EC WRITE
[Diagram sequence: the client sends a WRITE to the primary, which encodes it into per-shard sub-writes, distributes them to all shards, and ACKs the client once they are applied]
EC WRITE: DEGRADED
[Diagram: with one shard's OSD down, the primary still distributes sub-writes to the remaining shards]

EC WRITE: PARTIAL FAILURE
[Diagram sequence: a write reaches only some shards before the primary fails, leaving some shards at version B while others remain at the older version A]
EC RESTRICTIONS
● Overwrite in place will not work in general
● A log and 2PC would increase complexity and latency
● We chose to restrict the allowed operations
– create
– append (on stripe boundary)
– remove (keep previous generation of object for some time)
● These operations can all easily be rolled back locally (see the sketch below)
– create → delete
– append → truncate
– remove → roll back to previous generation
● Object attrs are preserved in existing PG logs (they are small)
● Key/value data is not allowed on EC pools
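A hedged sketch of that local rollback (the shard-store methods and the log-entry shape are illustrative; the real OSD records rollback information in each PG Log entry):

    def rollback(shard_store, entry):
        """Undo one logged EC operation on this shard.

        `entry` is (op, oid, detail); `detail` is the pre-append shard
        size for 'append' operations and unused otherwise."""
        op, oid, detail = entry
        if op == 'create':
            shard_store.delete(oid)               # create  -> delete
        elif op == 'append':
            shard_store.truncate(oid, detail)     # append  -> truncate
        elif op == 'remove':
            shard_store.restore_generation(oid)   # remove  -> revive old gen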
EC WRITE: PARTIAL FAILURE
[Diagram: peering discovers shards at mixed versions, some at B and some at A]

EC WRITE: PARTIAL FAILURE
[Diagram: the partially applied write is rolled back locally, returning every shard to version A]
EC RESTRICTIONS
● This is a small subset of the allowed librados operations
– Notably, you cannot (over)write any extent
● Coincidentally, these operations are also inefficient for erasure codes
– They generally require read/modify/write of the affected stripe(s)
● Some applications can consume EC directly
– RGW (no object data update in place)
● Others can combine EC with a cache tier (RBD, CephFS)
– Replication for warm/hot data
– Erasure coding for cold data
– The tiering agent skips objects with key/value data
WHICH ERASURE CODE?
● The EC algorithm and implementation are pluggable
– jerasure (free, open, and very fast)
– ISA-L (Intel library; optimized for modern Intel procs)
– LRC (local recovery code – layers over existing plugins)
● Parameterized
– Pick k and m, stripe size
● The OSD handles the data path, placement, rollback, etc.
● The plugin handles (interface sketched below)
– Encode and decode
– Given these available shards, which ones should I fetch to satisfy a read?
– Given these available shards and these missing shards, which ones should I fetch to recover?
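A hedged sketch of that division of labor as a Python interface (the method names are illustrative; the real plugin API is a C++ interface in the Ceph tree):

    from abc import ABC, abstractmethod

    class ErasureCodePlugin(ABC):
        """Illustrative plugin contract: the OSD owns placement and
        rollback; the plugin owns the math and shard-selection answers."""

        @abstractmethod
        def encode(self, stripe: bytes) -> dict:
            """Encode one stripe into {shard_id: chunk}."""

        @abstractmethod
        def decode(self, chunks: dict) -> bytes:
            """Reconstruct the stripe from a sufficient subset of chunks."""

        @abstractmethod
        def minimum_to_decode(self, want: set, available: set) -> set:
            """Which available shards should be fetched to satisfy a read?"""

        @abstractmethod
        def minimum_to_recover(self, missing: set, available: set) -> set:
            """Which available shards should be fetched to rebuild `missing`?"""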
COST OF RECOVERY
[Diagram: a 1 TB OSD fails]

COST OF RECOVERY (REPLICATION)
[Diagram: with a single replacement peer, 1 TB is read from one replica and rewritten]
[Diagram: with declustered PGs, the same 1 TB is recovered in parallel, roughly .01 TB from each of many OSDs]
[Diagram: either way, replication reads about 1 TB in total to restore 1 TB]

COST OF RECOVERY (EC)
[Diagram: erasure coding must read roughly 1 TB from each of several surviving shards (here 4 TB in total) to rebuild the lost 1 TB]
LOCAL RECOVERY CODE (LRC)
[Diagram: alongside the data shards (OSDs 1-4) and global coding shards (OSDs X, Y), LRC stores local parity shards (OSDs A, B, C) so a single lost shard can be rebuilt from a small local group instead of reading the whole stripe]
BIG THANKS TO
● Ceph
– Loic Dachary (CloudWatt, FSF France, Red Hat)
– Andreas Peters (CERN)
– David Zafman (Inktank / Red Hat)
● jerasure / gf-complete
– Jim Plank (University of Tennessee)
– Kevin Greenan (Box.com)
● Intel (ISA-L plugin)
● Fujitsu (SHEC plugin)
ROADMAP
WHAT'S NEXT
● Erasure coding
– Allow (optimistic) client reads directly from shards
– ARM optimizations for jerasure
● Cache pools
– Better agent decisions (when to flush or evict)
– Supporting different performance profiles
● e.g., slow / “cheap” flash can read just as fast
– Complex topologies
● Multiple read-only cache tiers in multiple sites
● Tiering
– Support “redirects” to a (very) cold tier below the base pool
– Enable dynamic spin-down, dedup, and other features
OTHER ONGOING WORK
● Performance optimization (SanDisk, Intel, Mellanox)
● Alternative OSD backends
– New backend: hybrid key/value and file system
– leveldb, rocksdb, LMDB
● Messenger (network layer) improvements
– RDMA support (libxio – Mellanox)
– Event-driven TCP implementation (UnitedStack)
FOR MORE INFORMATION
● http://ceph.com
● http://github.com/ceph
● http://tracker.ceph.com
● Mailing lists
– ceph-users@ceph.com
– ceph-devel@vger.kernel.org
● irc.oftc.net
– #ceph
– #ceph-devel
● Twitter
– @ceph
THANK YOU!
Samuel Just
sjust@redhat.com