99.999% Available OpenStack Cloud - A Builder's Guide
Danny Al-Gaaf (Deutsche Telekom)
OpenStack Summit 2015 - Tokyo
Overview
● Motivation
● Availability and SLAs
● Data centers
○ Setup and failure scenarios
● OpenStack and Ceph
○ Architecture and Critical Components
○ HA setup
○ Quorum?
● OpenStack and Ceph == HA?
○ Failure scenarios
○ Mitigation
● Conclusions
2
Motivation
NFV Cloud @ Deutsche Telekom
● Datacenter design
○ Backend DCs
■ Few but classic DCs
■ High SLAs for infrastructure and services
■ For private/customer data and services
○ Frontend DCs
■ Small but many
■ Near to the customer
■ Lower SLAs, can fail at any time
■ NFVs:
● Spread over many FDCs
● Failures are handled by services and not the infrastructure
● Run telco core services @OpenStack/KVM/Ceph
4
Availability
Availability
● Measured relative to “100 % operational”
6
availability    downtime               classification
99.9%           8.76 hours/year        high availability
99.99%          52.6 minutes/year      very high availability
99.999%         5.26 minutes/year      highest availability
99.9999%        0.526 minutes/year     disaster tolerant
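The downtime budgets in the table above follow directly from the availability percentage; a minimal sketch of the arithmetic (plain Python, no external dependencies):

```python
# Downtime budget per year for a given availability target.
HOURS_PER_YEAR = 365 * 24  # 8760 h, the same basis as the table above

def downtime_per_year(availability_pct: float) -> float:
    """Return the allowed downtime in hours per year."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for target in (99.9, 99.99, 99.999, 99.9999):
    hours = downtime_per_year(target)
    print(f"{target}% -> {hours:.3f} h/year ({hours * 60:.2f} min/year)")
# 99.999% leaves roughly 5.26 minutes per year -- the budget this talk aims for.
```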
High Availability
● Continuous system availability in case of component
failures
● Which availability?
○ Server
○ Network
○ Datacenter
○ Cloud
○ Application/Service
● End-to-end availability is what matters most
7
High Availability
● Calculation
○ Each component contributes to the service availability
■ Infrastructure
■ Hardware
■ Software
■ Processes
○ Likelihood of disaster and failure scenarios
○ The model can get very complex (see the sketch below)
● SLAs
○ ITIL (IT Infrastructure Library)
○ Depending on the SLA, planned maintenance may be excluded
8
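To make the "model can get very complex" point concrete, a minimal sketch of the two basic building blocks: components in series multiply their availabilities, while redundant components only fail together. The numbers are illustrative assumptions, not measured values.

```python
# End-to-end availability: serial components multiply, redundant
# (parallel) components fail only if all replicas fail at once.
from functools import reduce

def serial(*avail: float) -> float:
    return reduce(lambda a, b: a * b, avail, 1.0)

def redundant(avail: float, copies: int) -> float:
    return 1.0 - (1.0 - avail) ** copies

# Illustrative numbers (assumptions, not measurements):
network    = redundant(0.999, 2)   # two independent network paths
control    = redundant(0.995, 2)   # redundant control nodes
storage    = 0.9999                # storage backend as a whole
end_to_end = serial(network, control, storage)
print(f"end-to-end availability: {end_to_end:.6f}")
# A single 99.5% component in series caps the whole chain near 99.5%,
# which is why every layer needs its own redundancy to reach five nines.
```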
Data centers
Failure scenarios
● Power outage
○ External
○ Internal
○ Backup UPS/Generator
● Network outage
○ External connectivity
○ Internal
■ Cables
■ Switches, routers
● Failure of a server or a component
● Failure of a software service
10
Failure scenarios
● Human error is still often the leading
cause of outages
○ Misconfiguration
○ Accidents
○ Emergency power-off
● Disaster
○ Fire
○ Flood
○ Earthquake
○ Plane crash
○ Nuclear accident
11
Data Center Tiers
12
Mitigation
● Identify potential SPoF
● Use redundant components
● Careful planning
○ Network design (external/internal)
○ Power management (external/internal)
○ Fire suppression
○ Disaster management
○ Monitoring
● Five nines at the DC/HW level is hard to achieve
○ Tier IV usually too expensive (compared with Tier III or III+)
○ Requires an HA concept at the cloud and application level
13
Example: Network
● Spine/leaf architecture
● Redundant
○ DC-R
○ Spine switches
○ Leaf switches (ToR)
○ OAM switches
○ Firewall
● Server
○ Redundant NICs
○ Redundant power lines
and supplies
14
Ceph and OpenStack
Architecture: Ceph
16
Architecture: Ceph Components
● OSDs
○ 10s - 1000s per cluster
○ One per device (HDD/SSD/RAID group, SAN …)
○ Store objects
○ Handle replication and recovery
● MONs:
○ Maintain cluster membership and state
○ Use the Paxos protocol to establish quorum consensus
○ Small, lightweight
○ Odd number
17
Architecture: Ceph and OpenStack
18
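In this architecture Glance, Cinder, and Nova all end up as librados/librbd clients of the same cluster. A minimal connectivity check with the python-rados binding; the conffile path and the pool names are deployment-specific assumptions:

```python
# Minimal librados client, the same code path OpenStack services use
# underneath (via librbd for images and volumes). Paths are assumptions.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # reads MON addresses and cephx keys
cluster.connect()
try:
    print("cluster fsid:", cluster.get_fsid())
    print("pools:", cluster.list_pools())          # e.g. images, volumes, vms
    print("usage:", cluster.get_cluster_stats())   # kb, kb_used, kb_avail, num_objects
finally:
    cluster.shutdown()
```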
HA - Critical Components
Which services need to be HA?
● Control plane
○ Provisioning, management
○ API endpoints and services
○ Admin nodes
○ Control nodes
● Data plane
○ Steady states
○ Storage
○ Network
19
HA Setup
● Stateless services
○ No dependency between requests
○ After the reply no further attention is required
○ API endpoints (e.g. nova-api, glance-api, ...) or nova-scheduler (see the sketch below)
● Stateful services
○ An action typically consists of multiple requests
○ Subsequent requests depend on the results of earlier requests
○ Databases, RabbitMQ
20
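Because stateless API endpoints carry no per-request state, any healthy replica can serve any call, so a load balancer (or even the client) can simply try the next one. A hedged sketch with made-up endpoint addresses:

```python
# Client-side illustration of why stateless services are easy to make HA:
# every request is self-contained, so it can be retried against any replica.
# The endpoint URLs below are placeholders, not a real deployment.
import requests

NOVA_API_ENDPOINTS = [
    "http://ctrl1.example.net:8774",
    "http://ctrl2.example.net:8774",
    "http://ctrl3.example.net:8774",
]

def get_with_failover(path: str, timeout: float = 2.0) -> requests.Response:
    last_error = None
    for endpoint in NOVA_API_ENDPOINTS:           # in practice HAProxy does this
        try:
            return requests.get(endpoint + path, timeout=timeout)
        except requests.RequestException as exc:  # node down -> try the next one
            last_error = exc
    raise RuntimeError("all API endpoints unreachable") from last_error

# Stateful services (databases, RabbitMQ) cannot be handled this way:
# the next request depends on state created by the previous one.
```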
HA Setup
21
            active/passive                        active/active
stateless   ● load balance redundant services     ● load balance redundant services
stateful    ● bring replacement resource online   ● redundant services, all with the same state
                                                  ● state changes are passed to all instances
OpenStack HA
22
Quorum?
● Required to decide which cluster partition/member is
primary to prevent data/service corruption
● Examples:
○ Databases
■ MariaDB/Galera, MongoDB, Cassandra
○ Pacemaker/corosync
○ Ceph Monitors
■ Paxos
■ Odd number of MONs required
■ At least 3 MONs for HA, simple majority (2:3, 3:5, 4:7, …; see the sketch below)
■ Without quorum:
● no changes to cluster membership (e.g. adding new MONs/OSDs)
● Clients can't connect to the cluster
23
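The majority rule behind those MON counts is simple enough to write down; a short sketch:

```python
# Quorum math for Ceph MONs (or any majority-based cluster): quorum needs
# a strict majority, so N monitors tolerate (N - 1) // 2 failures.
def quorum_size(monitors: int) -> int:
    return monitors // 2 + 1

def tolerated_failures(monitors: int) -> int:
    return (monitors - 1) // 2

for n in (1, 3, 5, 7):
    print(f"{n} MONs: quorum = {quorum_size(n)}, tolerated failures = {tolerated_failures(n)}")
# 3 MONs -> quorum 2, tolerates 1; 5 -> 3/2; 7 -> 4/3 (the 2:3, 3:5, 4:7 above).
# An even count adds no extra failure tolerance, hence the odd numbers.
```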
OpenStack and Ceph == HA ?
SPoF
● OpenStack HA
○ No SPoF assumed
● Ceph
○ No SPoF assumed
○ Availability of RBDs is critical to VMs
○ Availability of RadosGW can be easily managed via HAProxy
● What about failures at a higher level?
○ Data center cores or fire compartments
○ Network
■ Physical
■ Misconfiguration
○ Power
25
Setup - Two Rooms
26
Failure scenarios - FC fails
27
Failure scenarios - FC fails
28
Failure scenarios - Split brain
29
● Ceph:
○ Quorum selects B
○ Storage in A stops
● OpenStack HA:
○ Selects B
○ VMs in B still running
● Best-case scenario
Failure scenarios - Split brain
30
● Ceph:
○ Quorum selects B
○ Storage in A stops
● OpenStack HA:
○ Selects A
○ VMs in A and B stop working
● Worst-case scenario
Other issues
● Replica distribution
○ Two-room setup:
■ 2 or 3 replicas carry the risk of having only one replica left (see the sketch after this list)
■ Would require 4 replicas (2:2)
● Reduced performance
● Increased traffic and costs
○ Alternative: erasure coding
■ Reduced performance, less space required
● Spare capacity
○ Remaining room requires spare capacity to restore
○ Depends on
■ Failure/restore scenario
■ Replication vs. erasure coding
○ Costs
31
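A small sketch of the two-room replica arithmetic behind these bullets: with all replicas split across two rooms, losing one room leaves only the smaller half of the copies.

```python
# Worst-case replicas surviving a full room failure in a two-room setup:
# CRUSH spreads R replicas as evenly as it can, so the larger half
# ceil(R / rooms) can disappear at once.
import math

def surviving_replicas(total_replicas: int, rooms: int = 2) -> int:
    """Worst-case number of replicas left after one full room fails."""
    biggest_share = math.ceil(total_replicas / rooms)
    return total_replicas - biggest_share

for r in (2, 3, 4):
    print(f"size={r}: worst case {surviving_replicas(r)} replica(s) left after a room failure")
# size=2 -> 1, size=3 -> 1, size=4 -> 2  (hence the 2:2 suggestion above)
```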
Mitigation - Three FCs
32
● Third FC/failure zone hosting all services
● Usually higher costs
● More resistant against failures
● Better replica distribution (see the sketch below)
● More east/west traffic
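With a third fire compartment in place, Ceph can be told to keep one replica per room. A hypothetical sketch of the commands, wrapped in Python: the command names follow recent Ceph releases (older releases use `create-simple` and `crush_ruleset` instead), the CRUSH map is assumed to contain `room` buckets, and the pool names are assumptions.

```python
# Hedged sketch: one replica per room so a single fire compartment
# failure can never take out all copies. Adjust commands to your Ceph version.
import subprocess

def ceph(*args: str) -> None:
    subprocess.run(["ceph", *args], check=True)

# CRUSH rule that uses the "room" bucket type as the failure domain
ceph("osd", "crush", "rule", "create-replicated", "rep_per_room", "default", "room")

# three copies, one per room, for the pools backing Cinder/Glance/Nova (assumed names)
for pool in ("volumes", "images", "vms"):
    ceph("osd", "pool", "set", pool, "crush_rule", "rep_per_room")
    ceph("osd", "pool", "set", pool, "size", "3")
    ceph("osd", "pool", "set", pool, "min_size", "2")
```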
Mitigation - Quorum Room
33
● Most DCs have backup rooms
● Only a few servers to host quorum-related services
● Less cost intensive
● Can mitigate split brain between FCs (depending on network layout)
Mitigation - Pets vs Cattle
34
● NO pets allowed !!!
● Only cloud-ready applications
Mitigation - Failure tolerant applications
35
● The DC tier level is not the most relevant layer
● Applications must build their own clustering mechanisms on top of the DC
→ increases the availability significantly
● Data replication must be done across multiple regions
● In case of a disaster, route traffic to a different DC
● Many VNFs (virtual network functions) already support such setups
Mitigation - Federated Object Stores
36
● The best way to synchronize and replicate data across multiple DCs is to use object storage (see the sketch below)
● Sync is done asynchronously
Open issues:
● Doesn’t solve replication of databases
● Many applications don’t support object
storage and need to be adapted
● Applications also need to support
regions/zones
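A sketch of what "adapting an application to object storage" can look like in practice: write through the local RadosGW S3 endpoint and let the gateway federation replicate asynchronously. The endpoint URL, credentials, and bucket name are placeholders.

```python
# Hedged sketch: applications that speak the S3 API write to the local
# RadosGW and never block on cross-DC replication.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.dc1.example.net:7480",   # local RadosGW (assumption)
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="app-data")
s3.put_object(Bucket="app-data", Key="backup/db-dump.sql.gz", Body=b"...")
# The gateway's federation/multi-site sync ships the object to the remote
# DC asynchronously in the background.
```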
Mitigation - Outlook
● “OpenStack follows Storage”
○ Use RBDs as fencing devices
○ Extend Ceph MONs
■ Include information about physical placement similar to CRUSH map
■ Enable HA setup to query quorum decisions and map quorum to physical layout
● Passive standby Ceph MONs to ease deployment of
MONs if quorum fails
○ http://tracker.ceph.com/projects/ceph/wiki/Passive_monitors
● Generic quorum service/library?
37
Conclusions
Conclusions
● OpenStack and Ceph provide HA if carefully planned
○ Be aware of potential failure scenarios!
○ All quorums must be kept in sync
○ A third room must be used
○ Replica distribution and spare capacity must be considered
○ Ceph needs more extended quorum information
● The five-nines target is end-to-end
○ Five nines at the data center level is very expensive
○ No pets !!!
○ Distribute applications or services over multiple DCs
39
Get involved !
● Ceph
○ https://ceph.com/community/contribute/
○ ceph-devel@vger.kernel.org
○ IRC: OFTC
■ #ceph,
■ #ceph-devel
○ Ceph Developer Summit
● OpenStack
○ Cinder, Glance, Manila, ...
40
danny.al-gaaf@telekom.de
dalgaaf
linkedin.com/in/dalgaaf
Danny Al-Gaaf
Senior Cloud Technologist
IRC
Q&A - THANK YOU!
