KVM High Availability
regardless of storage
CloudStack™ European User Group Virtual - May 27th 2021
Who am I?
gabriel@apache.org
• Gabriel Beims Bräscher, Brazilian
• Software Developer at PCextreme B.V.
○ Dutch hosting company founded in 2004
• 2013: First time using CloudStack (CloudStack 4.1.0)
• 2017: Apache CloudStack Committer
• 2019: CloudStack Project Management Committee (PMC)
• 2021: Appointed by the ASF as PMC Chair (VP) of CloudStack
Summary
What does this presentation cover?
• CloudStack KVM HA
• Health Check with NFS
• Can we have KVM HA without NFS?
• KVM HA regardless of storage
• Take away: future
CloudStack KVM HA
Why configure HA for Hosts?
Why?
• Improve QoS
○ VMs should run as much as possible
○ Hosts should not stay “Down”
How it works?
• Detect the problematic Host
• Recover or fence it
• Restart its stopped VMs
Why recover or fence first? We don’t want two VMs mapped to the same storage path
• CloudStack cannot reach a Host
• Its VMs may still be running and reading from/writing to storage
Host HA States
• Disabled: HA operations are disabled
• Available: The resource is healthy
• Ineligible: The current state does not support HA/recovery
• Suspect: Most recent health check failed
• Degraded: The resource cannot be managed, but still serves end-user requests
• Checking: The activity checks are currently being performed
• Recovering: The resource is undergoing a recovery operation
• Recovered: The resource is recovered
• Fencing: The resource is undergoing a fence operation
• Fenced: The resource is fenced
Link: https://github.com/apache/cloudstack/blob/master/api/src/main/java/org/apache/cloudstack/ha/HAConfig.java
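For readers who prefer code, the state list above can be written down as a small enumeration. This is only an illustrative Python sketch: the class name `HostHAState`, the `IN_PROGRESS` grouping and the helper `is_in_progress` are assumptions made here and are not taken from CloudStack's `HAConfig.java`.

```python
from enum import Enum


class HostHAState(Enum):
    """Host HA states as named on the slide, with their descriptions."""
    DISABLED = "HA operations are disabled"
    AVAILABLE = "The resource is healthy"
    INELIGIBLE = "The current state does not support HA/recovery"
    SUSPECT = "Most recent health check failed"
    DEGRADED = "The resource cannot be managed, but still serves end-user requests"
    CHECKING = "The activity checks are currently being performed"
    RECOVERING = "The resource is undergoing a recovery operation"
    RECOVERED = "The resource is recovered"
    FENCING = "The resource is undergoing a fence operation"
    FENCED = "The resource is fenced"


# Hypothetical grouping: states whose description says an operation is still running.
IN_PROGRESS = {HostHAState.CHECKING, HostHAState.RECOVERING, HostHAState.FENCING}


def is_in_progress(state: HostHAState) -> bool:
    """Return True while a check, recovery or fence operation is under way."""
    return state in IN_PROGRESS
```

The transitions between states are not modelled here, since the slide lists only the states themselves.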
Link: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
Requirements
Out-of-band management
• IPMI
• Redfish (CloudStack 4.15.0+)
Enable HA
• VM service offerings enabled for HA
• Hosts enabled for HA
Use NFS as shared primary storage pool
Health Check with NFS
Why use NFS?
• Hosts in the same cluster can check the same storage
• Check the storage activity
How it works?
• A heartbeat script running on the KVM nodes checks whether it can write to and read from the mounted NFS partition
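The write/read heartbeat idea can be sketched in a few lines of Python. This is not the script CloudStack ships; the function name `heartbeat_ok` and the `hb-<host>` file name are assumptions chosen for illustration.

```python
import os
import time


def heartbeat_ok(mount_dir: str, host_name: str) -> bool:
    """Write a timestamp to a per-host heartbeat file and read it back.

    Returns False if the directory cannot be written to or read from,
    or if the content read back does not match what was written.
    """
    path = os.path.join(mount_dir, f"hb-{host_name}")  # hypothetical file name
    stamp = str(int(time.time()))
    try:
        with open(path, "w") as f:
            f.write(stamp)
            f.flush()
            os.fsync(f.fileno())  # push the write towards the storage backend
        with open(path) as f:
            return f.read() == stamp
    except OSError:
        return False
```

Note that this sketch has no timeout: on a slow or stalled share the calls may block instead of failing, which is the kind of behaviour that makes a storage-bound health check fragile.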
[Diagram slide: Today, with NFS]
Currently KVM HA works by monitoring an NFS based heartbeat file and it can often fail whenever this network share becomes slower, causing the hypervisors to reboot. This can be particularly annoying when you have different kinds of primary storages in place which are working fine (people running CEPH etc).
...
This is embarrassing. How can we fix it? Ideas, suggestions? How are other hypervisors doing it?
– Nux, 09 October 2015
JIRA Issue: CLOUDSTACK-8943
Link: https://issues.apache.org/jira/browse/CLOUDSTACK-8943
Can we have KVM HA without NFS?
What are the possible validations?
• Request to the CloudStack Agent (JVM) -- Java can crash
• Check storage activity -- costly to implement and maintain (for each storage type)
• Check via Libvirt
• Ping the host -- ping is limited and firewalls often block it
KVM HA regardless of storage
CloudStack + KVM + HA - NFS
[Diagram slide: Today, with NFS]
[Diagram slide: Proposal with KVM HA Agent Helper web-service]
[Diagram slide: HTTP Request for checking neighbour hosts]
[Diagram slide: What if the NFS check fails? What if the KVM HA Helper fails? What if both fail?]
In a nutshell
HTTP REST API that checks Libvirt - KVM HA Agent
• The web-service runs Libvirt commands to list VMs ( ~$ virsh list )
• Checks neighbour hosts via the same agent
• The KVM HA Agent checks can be enabled or disabled
• If NFS is used on the cluster, it is also taken into account
• If no NFS is used, heartbeat checks are skipped
Example:
• HTTP GET -> http://host.name:8080/
○ response: {"count": 3, "virtualmachines": ["r-123-VM", "v-134-VM", "s-111-VM"]}
• HTTP GET -> http://host.name:8080/check-neighbour/neighbour.name:8080
○ response: {"status": "Up"} OR {"status": "Down"}
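To make the first response shape concrete, here is a minimal Python sketch of how a list of VM names could be turned into that JSON body. It assumes the names come from `virsh list --name` (one name per line), which is a variation on the plain `virsh list` mentioned above; the function names are hypothetical and this is not the code from the pull request.

```python
import json
from typing import List


def parse_virsh_names(output: str) -> List[str]:
    """Turn the text printed by `virsh list --name` into a list of VM names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def build_vm_response(names: List[str]) -> str:
    """Build a JSON body in the shape shown in the example above."""
    return json.dumps({"count": len(names), "virtualmachines": names})


# Sample text standing in for real virsh output (assumed, not captured from a host).
sample = "r-123-VM\nv-134-VM\ns-111-VM\n\n"
body = build_vm_response(parse_virsh_names(sample))
print(body)
# prints {"count": 3, "virtualmachines": ["r-123-VM", "v-134-VM", "s-111-VM"]}
```

Keeping the parsing separate from the response building means the same helper can be exercised with canned text, without a running Libvirt daemon.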
Possible outcomes
All Good
• HTTP request gets a response listing VMs that match the DB
Warning
• HTTP request gets a response, but the listed VMs do not match the DB
Recover/Fence
• HTTP request gets a response listing zero VMs, but according to the DB there are VMs running
• HTTP request gets an error code (e.g. 404); the service is not reachable
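The outcome rules above can be read as a small decision function. The Python sketch below is one possible reading of the slide, with assumed names (`classify_host`, `reported`, `expected`); a failed request is represented as `None`, and the case where both lists are empty is treated as a match.

```python
from typing import Iterable, Optional


def classify_host(reported: Optional[Iterable[str]], expected: Iterable[str]) -> str:
    """Map an agent response to one of the outcomes listed above.

    reported: VM names returned by the agent, or None if the request failed
              (error code or service not reachable).
    expected: VM names the database says are running on the host.
    """
    if reported is None:
        return "Recover/Fence"
    reported_set, expected_set = set(reported), set(expected)
    if not reported_set and expected_set:
        # The agent answered but sees no VMs while the DB expects some.
        return "Recover/Fence"
    if reported_set == expected_set:
        return "All Good"
    return "Warning"
```

For example, `classify_host(["r-123-VM"], ["r-123-VM", "v-134-VM"])` yields "Warning", while `classify_host([], ["r-123-VM"])` yields "Recover/Fence".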
Take away
Future
• HA systems are critical and will always need attention
• HA can be done regardless of storage
• However, combining multiple checks can lead to more robust systems
• Code is already available at PR #4978
• Running on a test environment
• Aiming for implementation in 4.16.0.0 or the next LTS
Link for PR: https://github.com/apache/cloudstack/pull/4978
Thanks!
Questions?
#CSEUGvirtual #cloudstack #cloustackworks
contact: gabriel@apache.org

KVM High Availability Regardless of Storage - Gabriel Brascher, VP of Apache CloudStack - CloudStack European User Group Virtual, May 2021
