Private Cloud with Ceph
and OpenStack
Daniel Schneller
daniel.schneller@centerdevice.de
@dschneller
Lukas Pustina
lukas.pustina@centerdevice.de
@drivebytesting
From the Daily Madness of Cloud Operations
Who are we?
@dschneller
@drivebytesting
What do we do?
Where did we come from?
Why did we want
to move away from it?
Where did we want to go?
Private Cloud
Hardware Abstraction
Bare Metal Hardware
Virtual Environment
Application
Virtualization
Storage, Network, Compute
Bare Metal Hardware
Baseline Benchmarks
Define expectations
Storage
Disk I/O per node
Network
IEEE 802.3ad != IEEE 802.3ad
> cat /etc/network/interfaces
...
auto bond2
iface bond2 inet manual
    # interfaces to bond
    bond-slaves p2p3 p2p4
    # activate LACP
    bond-mode 802.3ad
    # monitor link health every 100 ms
    bond-miimon 100
    # use layer 3+4 for link selection
    bond-xmit_hash_policy layer3+4
    # set jumbo frames
    pre-up ip link set dev bond2 mtu 9000

auto vlan-ceph-clust
iface vlan-ceph-clust inet static
    pre-up ip link add link bond2 name vlan-ceph-clust type vlan id 105
    # set jumbo frames
    pre-up ip link set dev vlan-ceph-clust mtu 9000
    post-down ip link delete vlan-ceph-clust
address ...
netmask ...
network ...
broadcast ...
...
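The `layer3+4` hash policy decides per flow which bonded link carries the traffic. The following Python sketch illustrates the idea using the classic formula from older Linux bonding documentation. Newer kernels and the switch side may hash differently, so the formula and the helper name `pick_slave` are illustrative assumptions, not the exact behavior of this setup. The addresses and ports are taken from the iperf output shown later in these slides.

```python
import ipaddress


def pick_slave(src_ip, src_port, dst_ip, dst_port, slave_count=2):
    """Return the index of the bonded link a flow would be sent on.

    Simplified layer3+4 policy (assumption, per older bonding docs):
    ((src_port XOR dst_port) XOR ((src_ip XOR dst_ip) AND 0xffff)) mod slaves
    """
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    ip_bits = (src ^ dst) & 0xFFFF
    return ((src_port ^ dst_port) ^ ip_bits) % slave_count


# Two TCP connections from the same client to the same iperf server.
flows = [
    ("10.102.5.12", 49412, "10.102.5.11", 5001),
    ("10.102.5.12", 49413, "10.102.5.11", 5001),
]
for flow in flows:
    print(flow, "-> link", pick_slave(*flow))
```

All packets of one connection hash to the same link, so bonding can raise aggregate throughput across many flows but not the speed of a single flow.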
[node01] > iperf -s -B node01.ceph-cluster
[node02] > iperf -c node01.ceph-cluster
[node03] > iperf -c node01.ceph-cluster
Storage
“Unified, distributed storage system designed for
excellent performance, reliability and scalability”
Highly scalable
Commodity hardware
No single point of failure
Ceph Components
OSD Daemons
Object Storage Device Daemons
Ceph Benchmark
Virtualization
OpenStack Components
CenterDevice
Overall Architecture
Bare Metal: Node 1, Node 2, Node 3, Node 4
Ceph: OSD 1 … OSD 48 …
Overall Architecture
Ceph on bare metal: Node 1, Node 2, Node 3, Node 4 (OSD 1 … OSD 48 …)
VMs on bare metal: Node 5 (CD-VM1), Node 6 (CD-VM2), Node 7 (CD-VM3), Node 8 (CD-VM4), Node … (VM …, VM …)
Inside each CD-VM: ES, Mo, CD, CD
Overall Architecture
Ceph on bare metal: Node 1, Node 2, Node 3, Node 4, each with a Rados GW (OSD 1 … OSD 48 …)
VMs on bare metal: Node 5 (CD-VM1), Node 6 (CD-VM2), Node 7 (CD-VM3), Node 8 (CD-VM4), Node … (VM …, VM …)
Inside each CD-VM: ES, Mo, CD, CD
Overall Architecture
Ceph on bare metal: Node 1, Node 2, Node 3, Node 4, each with a Rados GW (OSD 1 … OSD 48 …)
VMs on bare metal: Node 5 (CD-VM1), Node 6 (CD-VM2), Node 7 (CD-VM3), Node 8 (CD-VM4), Node … (VM …, VM …)
Inside each CD-VM: HAProxy, ES, Mo, CD, CD
Advantages
Disadvantages
Hardware Caveats
Bonding Modes
Half speed ahead!
[node01] > iperf -s -B node01.ceph-cluster
[…]
[ 4] local 10.102.5.11 port 5001 connected with 10.102.5.12 port 49412
[ 5] local 10.102.5.11 port 5001 connected with 10.102.5.12 port 49413
[ 6] local 10.102.5.11 port 5001 connected with 10.102.5.13 port 59947
[ 7] local 10.102.5.11 port 5001 connected with 10.102.5.13 port 59946
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 342 MBytes 286 Mbits/sec
[ 5] 0.0-10.0 sec 271 MBytes 227 Mbits/sec
[SUM] 0.0-10.0 sec 613 MBytes 513 Mbits/sec
[ 6] 0.0-10.0 sec 293 MBytes 246 Mbits/sec
[ 7] 0.0-10.0 sec 338 MBytes 283 Mbits/sec
[SUM] 0.0-10.0 sec 631 MBytes 529 Mbits/sec ???
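In the spirit of measuring instead of eyeballing, a report like this can be totalled with a few lines of Python. This is a minimal sketch that assumes the classic iperf text layout shown above; the function name `total_mbits` is made up for this example.

```python
import re

# One per-stream result row, e.g. "[ 4] 0.0-10.0 sec 342 MBytes 286 Mbits/sec"
ROW = re.compile(
    r"^\[\s*(\d+)\]\s+[\d.]+-\s*[\d.]+ sec\s+[\d.]+ \w+\s+([\d.]+) Mbits/sec"
)


def total_mbits(report):
    """Sum the bandwidth of all per-stream rows, skipping [SUM] and header rows."""
    total = 0.0
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match:
            total += float(match.group(2))
    return total


REPORT = """\
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 342 MBytes 286 Mbits/sec
[ 5] 0.0-10.0 sec 271 MBytes 227 Mbits/sec
[SUM] 0.0-10.0 sec 613 MBytes 513 Mbits/sec
[ 6] 0.0-10.0 sec 293 MBytes 246 Mbits/sec
[ 7] 0.0-10.0 sec 338 MBytes 283 Mbits/sec
[SUM] 0.0-10.0 sec 631 MBytes 529 Mbits/sec
"""

print(total_mbits(REPORT), "Mbits/sec across all streams")
```

Comparing this total with the expectation defined in the baseline benchmarks makes a shortfall visible immediately.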
CAUTION: TRUST NOTHING!
CAUTION: MEASURE EVERYTHING!
Bonding and VLANs
A switch as a Christmas tree
01: Handle up event of interface Po4
02: VPC Po4(VPC ID: 4) UP on self. Inform peer to program peer-link to BLOCK traffic
03: Received control message of type MLAG_INTERFACE_DOWN, for VPC ID 4
04: Handle down event of interface Po2
05: VPC Po2(VPC ID: 2) DOWN on self. Inform peer to program peer-link to ALLOW traffic
06: Handle up event of interface Po2
07: VPC Po2(VPC ID: 2) UP on self. Inform peer to program peer-link to BLOCK traffic
08: Handle down event of interface Po42
09: VPC Po42(VPC ID: 42) DOWN on self. Inform peer to program peer-link to ALLOW traffic
10: Handle up event of interface Po42
11: VPC Po42(VPC ID: 42) UP on self. Inform peer to program peer-link to BLOCK traffic
12: Received control message of type MLAG_INTERFACE_DOWN, for VPC ID 42
13: Received control message of type MLAG_INTERFACE_UP, for VPC ID 14
14: Handle down event of interface Po16
15: VPC Po16(VPC ID: 16) DOWN on self. Inform peer to program peer-link to ALLOW traffic
16: Received control message of type MLAG_INTERFACE_DOWN, for VPC ID 39
17: Received control message of type MLAG_INTERFACE_UP, for VPC ID 42
[…]
CAUTION: TRUST NOTHING!
CAUTION: FIRMWARE FAILS, TOO!
BIOS Updates
3 operating systems and several gigabytes
CAUTION: KNOW THE TOOLCHAIN!
Ceph Caveats
Scrubbing
Integrity has its price.
CAUTION: DISK I/O INTENSIVE!
CAUTION: SCHEDULE MANUALLY!
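One way to schedule scrubbing manually is to trigger deep scrubs for a few placement groups at a time during quiet hours. The sketch below is an assumption about how that could look, not the setup used by the speakers: `pick_pgs_to_scrub`, `scrub_commands`, and the sample timestamps are hypothetical, and the commands are only built, not executed.

```python
from datetime import datetime


def pick_pgs_to_scrub(last_deep_scrub, batch_size):
    """Return the batch_size placement groups whose last deep scrub is oldest."""
    oldest_first = sorted(last_deep_scrub, key=last_deep_scrub.get)
    return oldest_first[:batch_size]


def scrub_commands(pg_ids):
    """Build one `ceph pg deep-scrub <pgid>` command per placement group."""
    return [["ceph", "pg", "deep-scrub", pg_id] for pg_id in pg_ids]


# Hypothetical sample data: placement group ID -> time of last deep scrub.
sample = {
    "1.a": datetime(2015, 3, 1, 2, 0),
    "1.b": datetime(2015, 2, 20, 2, 0),
    "1.c": datetime(2015, 3, 5, 2, 0),
}

for command in scrub_commands(pick_pgs_to_scrub(sample, batch_size=2)):
    print(" ".join(command))
```

Run in small batches like this, the scrub I/O is spread out instead of hitting the disks all at once.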
RADOS Gateway
Use as recommended for maximum happiness
CAUTION: OBEY USER LIMITS!
CAUTION: KNOW THE CONFIG
CAUTION: NO SNAPS ALLOWED!
OpenStack Caveats
Network: totally trivial
Node 5: CD-VM1
Node 6: CD-VM2
Node 7: LB1 (HAProxy), …
Node 8: LB2 (HAProxy), …
The wolf in network clothing
https://openstack.redhat.com/Networking_in_too_much_detail
$ cat /etc/default/qemu-kvm
# To load the vhost_net module, which in some cases can speed up
# network performance, set VHOST_NET_ENABLED to 1.
VHOST_NET_ENABLED=0
$ cat /etc/default/qemu-kvm
# To load the vhost_net module, which in some cases can speed up
# network performance, set VHOST_NET_ENABLED to 1.
VHOST_NET_ENABLED=1
CAUTION: VIRTUAL NETWORKS!
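A setting like `VHOST_NET_ENABLED` is easy to verify by parsing the defaults file rather than assuming its value. This is a minimal Python sketch; the helper name `vhost_net_enabled` is made up for illustration, and it only understands shell-style `KEY=value` lines like the ones in the excerpt above.

```python
def vhost_net_enabled(config_text):
    """Return True if the last uncommented VHOST_NET_ENABLED assignment is 1."""
    enabled = False
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip comments and anything that is not an assignment.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "VHOST_NET_ENABLED":
            enabled = value.strip().strip('"') == "1"
    return enabled


EXAMPLE = """\
# To load the vhost_net module, which in some cases can speed up
# network performance, set VHOST_NET_ENABLED to 1.
VHOST_NET_ENABLED=1
"""

print(vhost_net_enabled(EXAMPLE))
```

On a real host the text would come from `/etc/default/qemu-kvm`; whether the module is actually loaded still has to be checked separately.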
Live Migration
“Patches with Wolves”
CAUTION: CHECK BUG TRACKERS!
CAUTION: YOU’RE NOT THE DEFAULT!
Heat Stacks
A wolf seldom comes alone
CAUTION: CHECK BUG TRACKERS!
CAUTION: YOU’RE NOT THE DEFAULT!
Conclusion
http://bit.ly/1FktZok #AllesKaputt
But open source lets you help yourself.
In Closing
Slides on Slideshare
http://www.slideshare.net/dschneller
http://www.cloudfibel.de
codecentric Cloud Fibel 1.2015
PRACTICE, INSIDER STORIES, OVERVIEW
6 OPENSTACK - A PRIVATE MATTER
18 DOCKER - SERVICES IN CONTAINERS
29 CEPH - POWERFULLY FLEXIBLE
42 ANSIBLE - SIMPLY POWERFUL
58 DOCKER TOOLS
68 CEPH BENCHMARKS
74 HARDWARE FAILS
78 NEVER CHANGE A RUNNING SYSTEM
80 PROVISIONING IAAS CLOUDS
86 JINJA2 FOR BETTER ANSIBLE TEMPLATES
93 SSH TWO-FACTOR AUTHENTICATION
100 AN OPENSTACK CRIME STORY
Blog Posts
• https://blog.codecentric.de/en/2015/03/true-kvm-live-migration-openstack-icehouse-ceph-based-vm-storage/
• https://blog.codecentric.de/en/2014/06/provisioning-iaas-clouds-dynamic-ansible-inventories-openstack-metadata/
• https://blog.codecentric.de/en/2014/08/jinja2-better-ansible-playbooks-templates/
• https://blog.codecentric.de/en/2014/06/ansible-simple-yet-powerful-automation/
• https://blog.codecentric.de/en/2014/09/openstack-crime-story-solved-tcpdump-sysdig-iostat-episode-1/
• https://blog.codecentric.de/en/2014/12/haproxy-http-header-rate-limiting/
• https://blog.codecentric.de/en/2014/12/centerdevice-cloud-architecture-revisited/
• https://blog.codecentric.de/en/2013/12/never-change-running-system-wrong/
• https://blog.codecentric.de/en/2013/11/hardware-will-fail-just-way-expect/
Thank you.
Daniel Schneller
daniel.schneller@centerdevice.de
@dschneller
Dr. Lukas Pustina
lukas.pustina@centerdevice.de
@drivebytesting
