Ceph Object Store
or: How to store terabytes of documents
Daniel Schneller
daniel.schneller@centerdevice.de
@dschneller
Who are we?
@dschneller
@drivebytesting
What do we do?
Where did we come from?
Why did we want to get away from there?
Where did we want to go?
And how did we get there?
Ceph Basics
“Unified, distributed storage system designed for
excellent performance, reliability and scalability”
Highly scalable
Commodity Hardware
No Single Point of Failure
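Two commands make these properties visible on a running cluster; purely illustrative:

> ceph -s          # overall health, monitor quorum, OSDs up/in
> ceph osd tree    # OSDs grouped by host as CRUSH sees them; add nodes to scale out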
Ceph Components
OSD Daemons
Object Storage Device Daemons
CRUSH Algorithm
Intelligent object distribution without central metadata
RADOS
Reliable Autonomic Distributed Object Store
Objects
Data Pools
Containers for objects with the same requirements
Placement Groups
Monitors
First point of contact for clients
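A short CLI sketch that ties these pieces together; pool name, object name, and PG count are made up:

> ceph osd pool create documents 1024          # data pool with 1024 placement groups
> rados -p documents put doc-0001 report.pdf   # store a local file as an object in that pool
> ceph osd map documents doc-0001              # PG and OSDs that CRUSH calculated for the object
> ceph quorum_status                           # monitor quorum; object data never flows through the MONs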
Hardware Setup
[Stack diagram: Bare Metal Hardware, Storage Virtualization, Compute Virtualization, Network Virtualization, Virtual Infrastructure, Application]
Baseline Benchmarks
Define expectations
Storage
Disk I/O per node
Network
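Before Ceph enters the picture, both layers can be baselined with standard tools. A sketch for the disk side; device paths, file names, and sizes are placeholders (the network side follows below with iperf):

> dd if=/dev/zero of=/mnt/osd-disk/ddtest bs=1M count=4096 oflag=direct   # sequential write throughput
> fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
      --bs=4k --size=4G --runtime=60 --time_based --iodepth=32 \
      --filename=/mnt/osd-disk/fiotest                                    # random 4k write IOPS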
IEEE 802.3ad != IEEE 802.3ad
> cat /etc/network/interfaces
...
auto bond2
iface bond2 inet manual
bond-slaves p2p3 p2p4 # interfaces to bond
bond-mode 802.3ad # activate LACP
bond-miimon 100 # monitor link health
bond-xmit_hash_policy layer3+4 # use Layer 3+4 for link selection
pre-up ip link set dev bond2 mtu 9000 # set Jumbo Frames
auto vlan-ceph-clust
iface vlan-ceph-clust inet static
pre-up ip link add link bond2 name vlan-ceph-clust type vlan id 105
pre-up ip link set dev vlan-ceph-clust mtu 9000 # Jumbo Frames
post-down ip link delete vlan-ceph-clust
address ...
netmask ...
network ...
broadcast ...
...
IEEE 802.3ad != IEEE 802.3ad
[node01] > iperf -s -B node01.ceph-cluster
[node02] > iperf -c node01.ceph-cluster -P 2
[node03] > iperf -c node01.ceph-cluster -P 2
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address node01.ceph-cluster
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.102.5.11 port 5001 connected with 10.102.5.12 port 49412
[ 5] local 10.102.5.11 port 5001 connected with 10.102.5.12 port 49413
[ 6] local 10.102.5.11 port 5001 connected with 10.102.5.13 port 59947
[ 7] local 10.102.5.11 port 5001 connected with 10.102.5.13 port 59946
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 342 MBytes 286 Mbits/sec
[ 5] 0.0-10.0 sec 271 MBytes 227 Mbits/sec
[SUM] 0.0-10.0 sec 613 MBytes 513 Mbits/sec
[ 6] 0.0-10.0 sec 293 MBytes 246 Mbits/sec
[ 7] 0.0-10.0 sec 338 MBytes 283 Mbits/sec
[SUM] 0.0-10.0 sec 631 MBytes 529 Mbits/sec
IEEE 802.3ad != IEEE 802.3ad
(Same iperf output again, this time with the SUM lines highlighted: roughly 513 and 529 Mbit/s per client, noticeably less than the bonded links were expected to deliver.)
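When the sums look like this, the bond itself is the first thing to inspect; interface names as configured above:

> cat /proc/net/bonding/bond2   # bonding mode, hash policy, LACP partner and slave status
> ethtool p2p3 | grep Speed     # confirm each slave negotiated the expected link speed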
Measure!
…and understand the results
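Ceph ships its own simple load generator, so the same discipline can be applied one layer up once the cluster runs; pool name illustrative:

> rados bench -p documents 60 write --no-cleanup   # 60 seconds of 4 MB object writes
> rados bench -p documents 60 seq                  # sequential reads of the objects just written
> rados -p documents cleanup                       # remove the benchmark objects afterwards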
CenterDevice
Overall Architecture
[Cluster diagram, built up over several slides:]
Bare metal: Node 1 through Node 4, together hosting OSD 1 through OSD 48 (the Ceph layer)
One Rados Gateway per node
On top, VMs (VM 1, VM …), each running HAProxy, the CenterDevice application, and Swift, talking to the Rados Gateways
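A minimal HAProxy sketch for the load-balancing layer in front of the four Rados Gateways; hostnames reuse the node names from the iperf example, and port 7480 (civetweb default) plus the health check are assumptions, not the actual CenterDevice configuration:

# /etc/haproxy/haproxy.cfg (relevant sections only)
frontend radosgw_in
    mode http
    bind *:80
    default_backend radosgw_nodes

backend radosgw_nodes
    mode http
    balance roundrobin
    option httpchk GET /                           # radosgw answers a plain GET on /
    server node01 node01.ceph-cluster:7480 check
    server node02 node02.ceph-cluster:7480 check
    server node03 node03.ceph-cluster:7480 check
    server node04 node04.ceph-cluster:7480 check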
Advantages
Disadvantages
Caveats
CephFS
Not recommended for production data.
Scrubbing
Integrity has its price. But there are knobs to turn!
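The knobs in question are ordinary cluster flags and OSD options; the values below are examples, and the begin/end hour settings only exist in newer Ceph releases:

> ceph osd set noscrub          # temporarily stop regular scrubs cluster-wide
> ceph osd set nodeep-scrub     # temporarily stop deep scrubs, e.g. during peak load
> ceph osd unset noscrub        # re-enable afterwards
> ceph osd unset nodeep-scrub

# ceph.conf, [osd] section (example values)
# osd max scrubs = 1            # at most one concurrent scrub per OSD
# osd scrub begin hour = 22     # confine scrubbing to a nightly window
# osd scrub end hour = 6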
Future Plans
Rados Gateway
Ceph Caching Tier (command sketch below)
SSD-based Journaling
10 GBit/s Networking
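For the cache tier idea, the upstream workflow looks roughly like this; pool names are placeholders, and this is a sketch of the documented commands, not a tested setup:

> ceph osd pool create documents-cache 128             # small pool intended for SSD-backed OSDs (CRUSH rule not shown)
> ceph osd tier add documents documents-cache          # attach it as a tier to the data pool
> ceph osd tier cache-mode documents-cache writeback   # cache absorbs writes, flushes to the backing pool
> ceph osd tier set-overlay documents documents-cache  # route client I/O through the cache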
Wrapping Up
Slides on Slideshare
http://www.slideshare.net/dschneller
Handout on CenterDevice
https://public.centerdevice.de/399612bf-ce31-489f-bd58-04e8d030be52
@drivebytesting
@dschneller
The End
Daniel Schneller
daniel.schneller@centerdevice.de
@dschneller
