SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Compute node HA
a.k.a. “can pets survive in OpenStack?”
Adam Spiers
Senior Software Engineer, Cloud & High Availability
aspiers@suse.com
OpenStack London Meetup, Wednesday 18th
November
– short update on upstream development
High Availability in
a typical OpenStack cloud today
3
Typical HA control plane in OpenStack
Pacemaker Cluster
Control Node 1 Node
DRBD
PostgreSQL
RabbitMQ
Keystone
Glance
Nova
Dashboard
Cinder
Neutron
Database Cluster
Node 1 Node 2
DRBD or shared storage
Database
Message Queue
Services Cluster
Node 1 Node 2 Node 3
Orchestration
Keystone
Glance
Nova
Dashboard
Cinder
Telemetry
Neutron
• Maximises cloud uptime
• Automatic restart of
OpenStack controller
services
• Active/Active API services
with load balancing
• DB + MQ either
Active/Active or
Active/Passive
4
Under the covers
Services Cluster
Node 1 Node 2 Node 3
• Recommended by
official HA guide
• HAProxy distributes service
requests
• Pacemaker
‒ monitoring and control of nodes
and services
• Corosync
‒ cluster membership /
messaging / quorum /
leadership election
Corosync
Pacemaker
HAProxy
But what I really want to do is keep my workloads up!
6
HA Cluster
Control node
OS
Message queue
Database
Identity
Images
Block storage
Networking
Dashboard
Compute
OS
Compute node
nova-compute
libvirt
HA only on control plane
OS
Compute node
nova-compute
libvirt
OS
Compute node
nova-compute
libvirt
7
HA Cluster
Control node
OS
Message queue
Database
Identity
Images
Block storage
Networking
Dashboard
Compute
OS
Compute node
nova-compute
libvirt
Can we simply extend the cluster?
OS
Compute node
nova-compute
libvirt
OS
Compute node
nova-compute
libvirt
8
9
Scaling up
• Corosync requires <= 32 nodes
• But we want lots of compute nodes!
• The obvious workarounds are ugly
‒ Multiple compute clusters
‒ introduces unwanted artificial boundaries
‒ Clusters inside / between guest VM instances
‒ requires cloud users to modify guest images (installing & configuring cluster
software)
‒ cluster stacks are not OS-agnostic
‒ cloud is supposed to make things easier not harder!
10
pacemaker_remote to the rescue!
• New(-ish) Pacemaker feature
• Allows arbitrary scalability of an existing
Pacemaker cluster
11
Extending the cluster to compute nodes
Services Cluster
Node 1 Node 2 Node 3
Corosync
Pacemaker
HAProxy Compute node
pacemaker_remote
Compute node
pacemaker_remote
Compute node
pacemaker_remote
Compute node
pacemaker_remote
12
Capabilities
• Increases availability of compute nodes
‒ Detects failed compute services
‒ Automatic recovery of compute services where possible
• “Quarantines” failing compute nodes
‒ STONITH (fencing) extends to remote nodes
• Coordinates with control plane
‒ VMs on dead compute nodes are resurrected elsewhere
‒ In nova, this is described as “evacuation”
13
Public Health Warning
nova evacuate does not really mean evacuation!
14
Think about earthquakes
Not too late
to evacuate
Too late to
evacuate
15
nova terminology
nova live-migration
nova evacuate
16
Public Health Warning
• nova evacuate does not do evacuation
• nova evacuate does resurrection
• In Vancouver, nova developers considered a rename
‒ Hasn't happened yet
‒ Due to impact, seems unlikely to happen any time soon
‒ Whenever you see “evacuate” in a nova-related context,
pretend you saw “resurrect”
17
Existing solutions
• NovaCompute / NovaEvacuate custom OCF RAs
‒ used by Red Hat / SUSE / Intel
‒ works with known limitations
• EvacuationD
‒ PoC to address above limitations
‒ decouples resurrection workflow from Pacemaker
• Masakari (NTT)
‒ similar architecture, different code
‒ monitoring at 3 layers (node, process, hypervisor)
• Approach of AWcloud / ChinaMobile
‒ very different; uses consul / raft / gossip
18
Proposed solutions
• Use Mistral to orchestrate resurrection workflow
• Intel currently working on prototype
• Possibly the most promising approach
‒ Mistral considered pretty solid
‒ This is exactly the kind of thing it was designed for
• However, Mistral currently a SPoF … oops
‒ Don't worry, should be fixed in mitaka cycle
• Feasibility of convergence with Masakari will probably
be analysed within next week or two
19
Community developments
• openstack-resource-agents project now on
stackforge
‒ maintained by me
• New #openstack-ha IRC channel on FreeNode
‒ automatic notifications for activity on HA repositories
• New topic category on openstack-dev@ mailing list
Subject: [HA] i can haz pets in my cloud?
• Weekly IRC meetings at Monday 9am UTC
• HA guide currently undergoing a revamp
• Everyone welcome to get involved!
21
Unpublished Work of SUSE LLC. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC.
Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of
their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated,
abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE.
Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a
product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making
purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document,
and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The
development, release, and timing of features or functionality described for SUSE products remains at the sole
discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at
any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in
this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All
third-party trademarks are the property of their respective owners.

Weitere ähnliche Inhalte

Was ist angesagt?

High availability and fault tolerance of openstack
High availability and fault tolerance of openstackHigh availability and fault tolerance of openstack
High availability and fault tolerance of openstackDeepak Mane
 
High Availability in OpenStack Cloud
High Availability in OpenStack CloudHigh Availability in OpenStack Cloud
High Availability in OpenStack CloudQiming Teng
 
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...OpenStack Korea Community
 
Moving to Nova Cells without Destroying the World
Moving to Nova Cells without Destroying the WorldMoving to Nova Cells without Destroying the World
Moving to Nova Cells without Destroying the WorldMike Dorman
 
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...OpenNebula Project
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionJohn Garbutt
 
Hostvn ceph in production v1.1 dungtq
Hostvn   ceph in production v1.1 dungtqHostvn   ceph in production v1.1 dungtq
Hostvn ceph in production v1.1 dungtqViet Stack
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HAtcp cloud
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Vietnam Open Infrastructure User Group
 
Heart of the SwarmKit: Store, Topology & Object Model
Heart of the SwarmKit: Store, Topology & Object ModelHeart of the SwarmKit: Store, Topology & Object Model
Heart of the SwarmKit: Store, Topology & Object ModelDocker, Inc.
 
Integrate Openshift with Cloudforms
Integrate Openshift with CloudformsIntegrate Openshift with Cloudforms
Integrate Openshift with CloudformsMichael Lessard
 
HVX: Virtualizing the Cloud
HVX: Virtualizing the CloudHVX: Virtualizing the Cloud
HVX: Virtualizing the CloudAlex Fishman
 
Containerizing Network Services - Alon Harel - OpenStack Day Israel 2016
Containerizing Network Services - Alon Harel - OpenStack Day Israel 2016Containerizing Network Services - Alon Harel - OpenStack Day Israel 2016
Containerizing Network Services - Alon Harel - OpenStack Day Israel 2016Cloud Native Day Tel Aviv
 
Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.
Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.
Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.Opcito Technologies
 
Is there still room for innovation in container orchestration and scheduling
Is there still room for innovation in container orchestration and scheduling Is there still room for innovation in container orchestration and scheduling
Is there still room for innovation in container orchestration and scheduling LinuxCon ContainerCon CloudOpen China
 
CoreOS Overview and Current Status
CoreOS Overview and Current StatusCoreOS Overview and Current Status
CoreOS Overview and Current StatusSreenivas Makam
 

Was ist angesagt? (20)

High availability and fault tolerance of openstack
High availability and fault tolerance of openstackHigh availability and fault tolerance of openstack
High availability and fault tolerance of openstack
 
High Availability in OpenStack Cloud
High Availability in OpenStack CloudHigh Availability in OpenStack Cloud
High Availability in OpenStack Cloud
 
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...
[OpenStack Days Korea 2016] Track1 - Mellanox CloudX - Acceleration for Cloud...
 
Moving to Nova Cells without Destroying the World
Moving to Nova Cells without Destroying the WorldMoving to Nova Cells without Destroying the World
Moving to Nova Cells without Destroying the World
 
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer Introduction
 
Hostvn ceph in production v1.1 dungtq
Hostvn   ceph in production v1.1 dungtqHostvn   ceph in production v1.1 dungtq
Hostvn ceph in production v1.1 dungtq
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
 
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
Meetup 23 - 01 - The things I wish I would have known before doing OpenStack ...
 
Rethinking the OS
Rethinking the OSRethinking the OS
Rethinking the OS
 
Heart of the SwarmKit: Store, Topology & Object Model
Heart of the SwarmKit: Store, Topology & Object ModelHeart of the SwarmKit: Store, Topology & Object Model
Heart of the SwarmKit: Store, Topology & Object Model
 
Integrate Openshift with Cloudforms
Integrate Openshift with CloudformsIntegrate Openshift with Cloudforms
Integrate Openshift with Cloudforms
 
HVX: Virtualizing the Cloud
HVX: Virtualizing the CloudHVX: Virtualizing the Cloud
HVX: Virtualizing the Cloud
 
Containerizing Network Services - Alon Harel - OpenStack Day Israel 2016
Containerizing Network Services - Alon Harel - OpenStack Day Israel 2016Containerizing Network Services - Alon Harel - OpenStack Day Israel 2016
Containerizing Network Services - Alon Harel - OpenStack Day Israel 2016
 
Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.
Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.
Securing & Monitoring Your K8s Cluster with RBAC and Prometheus”.
 
OpenStack on AArch64
OpenStack on AArch64OpenStack on AArch64
OpenStack on AArch64
 
Openstack nova
Openstack novaOpenstack nova
Openstack nova
 
Scale Kubernetes to support 50000 services
Scale Kubernetes to support 50000 servicesScale Kubernetes to support 50000 services
Scale Kubernetes to support 50000 services
 
Is there still room for innovation in container orchestration and scheduling
Is there still room for innovation in container orchestration and scheduling Is there still room for innovation in container orchestration and scheduling
Is there still room for innovation in container orchestration and scheduling
 
CoreOS Overview and Current Status
CoreOS Overview and Current StatusCoreOS Overview and Current Status
CoreOS Overview and Current Status
 

Andere mochten auch

Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA Architecture
Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA ArchitectureRed Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA Architecture
Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA ArchitectureEtsuji Nakai
 
Deep dive into highly available open stack architecture openstack summit va...
Deep dive into highly available open stack architecture   openstack summit va...Deep dive into highly available open stack architecture   openstack summit va...
Deep dive into highly available open stack architecture openstack summit va...Arthur Berezin
 
OpenStack Best Practices and Considerations - terasky tech day
OpenStack Best Practices and Considerations  - terasky tech dayOpenStack Best Practices and Considerations  - terasky tech day
OpenStack Best Practices and Considerations - terasky tech dayArthur Berezin
 
[오픈소스컨설팅]오픈스택에 대하여
[오픈소스컨설팅]오픈스택에 대하여[오픈소스컨설팅]오픈스택에 대하여
[오픈소스컨설팅]오픈스택에 대하여Ji-Woong Choi
 
Závislosti, injekce a vůbec
Závislosti, injekce a vůbecZávislosti, injekce a vůbec
Závislosti, injekce a vůbecDavid Grudl
 
10.000 followerů na Twitteru snadno a šupem
10.000 followerů na Twitteru snadno a šupem10.000 followerů na Twitteru snadno a šupem
10.000 followerů na Twitteru snadno a šupemDavid Grudl
 

Andere mochten auch (8)

Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA Architecture
Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA ArchitectureRed Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA Architecture
Red Hat Enterprise Linux OpenStack Platform 7 - VM Instance HA Architecture
 
Deep dive into highly available open stack architecture openstack summit va...
Deep dive into highly available open stack architecture   openstack summit va...Deep dive into highly available open stack architecture   openstack summit va...
Deep dive into highly available open stack architecture openstack summit va...
 
OpenStack Best Practices and Considerations - terasky tech day
OpenStack Best Practices and Considerations  - terasky tech dayOpenStack Best Practices and Considerations  - terasky tech day
OpenStack Best Practices and Considerations - terasky tech day
 
[오픈소스컨설팅]오픈스택에 대하여
[오픈소스컨설팅]오픈스택에 대하여[오픈소스컨설팅]오픈스택에 대하여
[오픈소스컨설팅]오픈스택에 대하여
 
Závislosti, injekce a vůbec
Závislosti, injekce a vůbecZávislosti, injekce a vůbec
Závislosti, injekce a vůbec
 
SMART Response PGO
SMART Response PGOSMART Response PGO
SMART Response PGO
 
10.000 followerů na Twitteru snadno a šupem
10.000 followerů na Twitteru snadno a šupem10.000 followerů na Twitteru snadno a šupem
10.000 followerů na Twitteru snadno a šupem
 
Marianne faithfull slide
Marianne faithfull slideMarianne faithfull slide
Marianne faithfull slide
 

Ähnlich wie Compute node HA - current upstream development

Open stack meetup 2014 11-13 - 101 + high availability
Open stack meetup 2014 11-13 - 101 + high availabilityOpen stack meetup 2014 11-13 - 101 + high availability
Open stack meetup 2014 11-13 - 101 + high availabilityRick Ashford
 
Learning to fly with Airship - Simon Briggs, SUSE
Learning to fly with Airship - Simon Briggs, SUSELearning to fly with Airship - Simon Briggs, SUSE
Learning to fly with Airship - Simon Briggs, SUSEOpenInfra Days Poland 2019
 
Deploying SUSE Cloud in a Multi-Hypervisor Enterprise Environment
Deploying SUSE Cloud in a Multi-Hypervisor Enterprise EnvironmentDeploying SUSE Cloud in a Multi-Hypervisor Enterprise Environment
Deploying SUSE Cloud in a Multi-Hypervisor Enterprise EnvironmentRick Ashford
 
Using Ceph in a Private Cloud - Ceph Day Frankfurt
Using Ceph in a Private Cloud - Ceph Day Frankfurt Using Ceph in a Private Cloud - Ceph Day Frankfurt
Using Ceph in a Private Cloud - Ceph Day Frankfurt Ceph Community
 
Hands-On with Heat: Service Orchestration in SUSE Cloud
Hands-On with Heat: Service Orchestration in SUSE CloudHands-On with Heat: Service Orchestration in SUSE Cloud
Hands-On with Heat: Service Orchestration in SUSE CloudRick Ashford
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...Daniel Krook
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...Animesh Singh
 
SUSE: Alien Life Forms
SUSE: Alien Life FormsSUSE: Alien Life Forms
SUSE: Alien Life FormsKangaroot
 
Casos de uso para aplicaciones tradicionales en un mundo de contenedores
Casos de uso para aplicaciones tradicionales en un mundo de contenedoresCasos de uso para aplicaciones tradicionales en un mundo de contenedores
Casos de uso para aplicaciones tradicionales en un mundo de contenedoresSUSE España
 
Expert Day 2019 - SUSE OpenStack Cloud
Expert Day 2019 - SUSE OpenStack CloudExpert Day 2019 - SUSE OpenStack Cloud
Expert Day 2019 - SUSE OpenStack CloudSUSE
 
Productos de SUSE basados en CaaSP
Productos de SUSE basados en CaaSPProductos de SUSE basados en CaaSP
Productos de SUSE basados en CaaSPSUSE España
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebula Project
 
Gartner Data Center Conference 2014 - When Downtime is Not an Option.
Gartner Data Center Conference 2014 - When Downtime is Not an Option.Gartner Data Center Conference 2014 - When Downtime is Not an Option.
Gartner Data Center Conference 2014 - When Downtime is Not an Option.Joe Felisky
 
OSMC 2010 | Insides SUSE Linux by Joachim Werner
OSMC 2010 | Insides SUSE Linux by Joachim WernerOSMC 2010 | Insides SUSE Linux by Joachim Werner
OSMC 2010 | Insides SUSE Linux by Joachim WernerNETWAYS
 
Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2
Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2 Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2
Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2 WSO2
 
Running SAP on SUSE Cloud 2.0
Running SAP on SUSE Cloud 2.0Running SAP on SUSE Cloud 2.0
Running SAP on SUSE Cloud 2.0Dirk Oppenkowski
 
SUSE: Software Defined Storage
SUSE: Software Defined StorageSUSE: Software Defined Storage
SUSE: Software Defined StorageKangaroot
 

Ähnlich wie Compute node HA - current upstream development (20)

Open stack meetup 2014 11-13 - 101 + high availability
Open stack meetup 2014 11-13 - 101 + high availabilityOpen stack meetup 2014 11-13 - 101 + high availability
Open stack meetup 2014 11-13 - 101 + high availability
 
Learning to fly with Airship - Simon Briggs, SUSE
Learning to fly with Airship - Simon Briggs, SUSELearning to fly with Airship - Simon Briggs, SUSE
Learning to fly with Airship - Simon Briggs, SUSE
 
Deploying SUSE Cloud in a Multi-Hypervisor Enterprise Environment
Deploying SUSE Cloud in a Multi-Hypervisor Enterprise EnvironmentDeploying SUSE Cloud in a Multi-Hypervisor Enterprise Environment
Deploying SUSE Cloud in a Multi-Hypervisor Enterprise Environment
 
Using Ceph in a Private Cloud - Ceph Day Frankfurt
Using Ceph in a Private Cloud - Ceph Day Frankfurt Using Ceph in a Private Cloud - Ceph Day Frankfurt
Using Ceph in a Private Cloud - Ceph Day Frankfurt
 
High Availability in Neutron
High Availability in NeutronHigh Availability in Neutron
High Availability in Neutron
 
Hands-On with Heat: Service Orchestration in SUSE Cloud
Hands-On with Heat: Service Orchestration in SUSE CloudHands-On with Heat: Service Orchestration in SUSE Cloud
Hands-On with Heat: Service Orchestration in SUSE Cloud
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
 
SUSE: Alien Life Forms
SUSE: Alien Life FormsSUSE: Alien Life Forms
SUSE: Alien Life Forms
 
Casos de uso para aplicaciones tradicionales en un mundo de contenedores
Casos de uso para aplicaciones tradicionales en un mundo de contenedoresCasos de uso para aplicaciones tradicionales en un mundo de contenedores
Casos de uso para aplicaciones tradicionales en un mundo de contenedores
 
Expert Day 2019 - SUSE OpenStack Cloud
Expert Day 2019 - SUSE OpenStack CloudExpert Day 2019 - SUSE OpenStack Cloud
Expert Day 2019 - SUSE OpenStack Cloud
 
Productos de SUSE basados en CaaSP
Productos de SUSE basados en CaaSPProductos de SUSE basados en CaaSP
Productos de SUSE basados en CaaSP
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
 
Gartner Data Center Conference 2014 - When Downtime is Not an Option.
Gartner Data Center Conference 2014 - When Downtime is Not an Option.Gartner Data Center Conference 2014 - When Downtime is Not an Option.
Gartner Data Center Conference 2014 - When Downtime is Not an Option.
 
OSMC 2010 | Insides SUSE Linux by Joachim Werner
OSMC 2010 | Insides SUSE Linux by Joachim WernerOSMC 2010 | Insides SUSE Linux by Joachim Werner
OSMC 2010 | Insides SUSE Linux by Joachim Werner
 
Nova states summit
Nova states summitNova states summit
Nova states summit
 
Mastering Real-time Linux
Mastering Real-time LinuxMastering Real-time Linux
Mastering Real-time Linux
 
Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2
Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2 Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2
Build Platform as a Service (PaaS) with SUSE Studio, WSO2 Middleware, and EC2
 
Running SAP on SUSE Cloud 2.0
Running SAP on SUSE Cloud 2.0Running SAP on SUSE Cloud 2.0
Running SAP on SUSE Cloud 2.0
 
SUSE: Software Defined Storage
SUSE: Software Defined StorageSUSE: Software Defined Storage
SUSE: Software Defined Storage
 

Kürzlich hochgeladen

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Kürzlich hochgeladen (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Compute node HA - current upstream development

  • 1. Compute node HA a.k.a. “can pets survive in OpenStack?” Adam Spiers Senior Software Engineer, Cloud & High Availability aspiers@suse.com OpenStack London Meetup, Wednesday 18th November – short update on upstream development
  • 2. High Availability in a typical OpenStack cloud today
  • 3. 3 Typical HA control plane in OpenStack Pacemaker Cluster Control Node 1 Node DRBD PostgreSQL RabbitMQ Keystone Glance Nova Dashboard Cinder Neutron Database Cluster Node 1 Node 2 DRBD or shared storage Database Message Queue Services Cluster Node 1 Node 2 Node 3 Orchestration Keystone Glance Nova Dashboard Cinder Telemetry Neutron • Maximises cloud uptime • Automatic restart of OpenStack controller services • Active/Active API services with load balancing • DB + MQ either Active/Active or Active/Passive
  • 4. 4 Under the covers Services Cluster Node 1 Node 2 Node 3 • Recommended by official HA guide • HAProxy distributes service requests • Pacemaker ‒ monitoring and control of nodes and services • Corosync ‒ cluster membership / messaging / quorum / leadership election Corosync Pacemaker HAProxy But what I really want to do is keep my workloads up!
  • 5. 6 HA Cluster Control node OS Message queue Database Identity Images Block storage Networking Dashboard Compute OS Compute node nova-compute libvirt HA only on control plane OS Compute node nova-compute libvirt OS Compute node nova-compute libvirt
  • 6. 7 HA Cluster Control node OS Message queue Database Identity Images Block storage Networking Dashboard Compute OS Compute node nova-compute libvirt Can we simply extend the cluster? OS Compute node nova-compute libvirt OS Compute node nova-compute libvirt
  • 7. 8
  • 8. 9 Scaling up • Corosync requires <= 32 nodes • But we want lots of compute nodes! • The obvious workarounds are ugly ‒ Multiple compute clusters ‒ introduces unwanted artificial boundaries ‒ Clusters inside / between guest VM instances ‒ requires cloud users to modify guest images (installing & configuring cluster software) ‒ cluster stacks are not OS-agnostic ‒ cloud is supposed to make things easier not harder!
  • 9. 10 pacemaker_remote to the rescue! • New(-ish) Pacemaker feature • Allows arbitrary scalability of an existing Pacemaker cluster
  • 10. 11 Extending the cluster to compute nodes Services Cluster Node 1 Node 2 Node 3 Corosync Pacemaker HAProxy Compute node pacemaker_remote Compute node pacemaker_remote Compute node pacemaker_remote Compute node pacemaker_remote
  • 11. 12 Capabilities • Increases availability of compute nodes ‒ Detects failed compute services ‒ Automatic recovery of compute services where possible • “Quarantines” failing compute nodes ‒ STONITH (fencing) extends to remote nodes • Coordinates with control plane ‒ VMs on dead compute nodes are resurrected elsewhere ‒ In nova, this is described as “evacuation”
  • 12. 13 Public Health Warning nova evacuate does not really mean evacuation!
  • 13. 14 Think about earthquakes Not too late to evacuate Too late to evacuate
  • 15. 16 Public Health Warning • nova evacuate does not do evacuation • nova evacuate does resurrection • In Vancouver, nova developers considered a rename ‒ Hasn't happened yet ‒ Due to impact, seems unlikely to happen any time soon ‒ Whenever you see “evacuate” in a nova-related context, pretend you saw “resurrect”
  • 16. 17 Existing solutions • NovaCompute / NovaEvacuate custom OCF RAs ‒ used by Red Hat / SUSE / Intel ‒ works with known limitations • EvacuationD ‒ PoC to address above limitations ‒ decouples resurrection workflow from Pacemaker • Masakari (NTT) ‒ similar architecture, different code ‒ monitoring at 3 layers (node, process, hypervisor) • Approach of AWcloud / ChinaMobile ‒ very different; uses consul / raft / gossip
  • 17. 18 Proposed solutions • Use Mistral to orchestrate resurrection workflow • Intel currently working on prototype • Possibly the most promising approach ‒ Mistral considered pretty solid ‒ This is exactly the kind of thing it was designed for • However, Mistral currently a SPoF … oops ‒ Don't worry, should be fixed in mitaka cycle • Feasibility of convergence with Masakari will probably be analysed within next week or two
  • 18. 19 Community developments • openstack-resource-agents project now on stackforge ‒ maintained by me • New #openstack-ha IRC channel on FreeNode ‒ automatic notifications for activity on HA repositories • New topic category on openstack-dev@ mailing list Subject: [HA] i can haz pets in my cloud? • Weekly IRC meetings at Monday 9am UTC • HA guide currently undergoing a revamp • Everyone welcome to get involved!
  • 19. 21
  • 20. Unpublished Work of SUSE LLC. All Rights Reserved. This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability. General Disclaimer This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.