© 2015 NTT Software Innovation Center
Masakari: Virtual Machine-HA for OpenStack
27/Oct/2015
Masahito Muroi, NTT
Copyright © 2015 NTT corp. All Rights Reserved.
What’s Masakari?
• In general context
• まさかり (Masakari) is the Japanese word for an “axe” or a “hatchet”
• Used for cutting down trees, not as a weapon
• Trademark of 金太郎 (KINTARO)
• The name of a Japanese fairy tale and of its main character
• In engineering context
• “まさかりを投げる (masakari wo nageru)”
• Roughly translated as “Throwing a Masakari”
• Meaning “to point out a mistake at a conference or in a presentation”
• In OpenStack context
• Virtual Machine High Availability (VM-HA) service
• Rescues virtual machines when errors occur
• Published as OSS on GitHub: https://github.com/ntt-sic/masakari
Copyright © いらすとや. All Rights Reserved.
Motivations
• Pets vs. cattle
• Not all apps can be converted to cloud native at once
• Open source
Requirements for the Pets Model
• Detect 3 types of VM failure
• Unexpected VM down
• VM manager process down
• Host down
• Recover VMs within 10 mins
• Work automatically
Architecture Overview
[Figure: architecture diagram spanning Compute Nodes, Controller Nodes & Backend Nodes]
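The speaker notes describe the controller as calling OpenStack APIs according to the type of notification it receives from the monitoring processes. The following is a minimal Python sketch of that dispatch under stated assumptions: the notification type names, the `Notification` fields, evacuating to a single reserved host, and the `FakeNova` methods are all hypothetical stand-ins, not Masakari's actual code or the real Nova client API. The three actions mirror the slides: rebuild a down VM, disable scheduling on a host whose manager process is down, and evacuate every VM from a down host.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical notification types, one per failure the monitors detect.
VM_DOWN = "VM_DOWN"
PROCESS_DOWN = "PROCESS_DOWN"
HOST_DOWN = "HOST_DOWN"


@dataclass
class Notification:
    type: str
    host: str
    instance_id: Optional[str] = None  # only meaningful for VM_DOWN


@dataclass
class FakeNova:
    """Hypothetical stand-in for a Nova client: records calls instead of calling the API."""
    instances_by_host: Dict[str, List[str]]
    calls: List[Tuple[str, ...]] = field(default_factory=list)

    def instances_on(self, host: str) -> List[str]:
        return self.instances_by_host.get(host, [])

    def rebuild(self, instance_id: str) -> None:
        self.calls.append(("rebuild", instance_id))

    def disable_scheduling(self, host: str) -> None:
        self.calls.append(("disable_scheduling", host))

    def evacuate(self, instance_id: str, target_host: str) -> None:
        self.calls.append(("evacuate", instance_id, target_host))


def handle(n: Notification, nova: FakeNova, reserved_host: str) -> None:
    """Choose the recovery action for one notification from a monitor."""
    if n.type == VM_DOWN:
        # A single VM died: rebuild it.
        nova.rebuild(n.instance_id)
    elif n.type == PROCESS_DOWN:
        # The manager process could not be restarted: stop scheduling new VMs there.
        nova.disable_scheduling(n.host)
    elif n.type == HOST_DOWN:
        # The whole host is gone: move each of its VMs to the reserved host.
        for instance_id in nova.instances_on(n.host):
            nova.evacuate(instance_id, reserved_host)
    else:
        raise ValueError(f"unknown notification type: {n.type}")
```

For example, handling a `HOST_DOWN` notification for `host-b` with `FakeNova({"host-b": ["vm5", "vm6"]})` records one evacuate call per VM. Keeping the monitors simple and the recovery policy in one controller follows the two-part split the speaker notes describe.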
How to detect the 3 types of failure
• VM down
• Monitoring libvirt events
• Manager process down
• Monitoring the manager process
• Host down
• Using Pacemaker
Detect VM Down
[Figure: a Libvirt Monitor runs next to libvirt on each host (VM1, VM2, VM3 on one host; VM5, VM6 on the other); one VM goes down]
1. The Libvirt Monitor detects the VM down and notifies Masakari of the down VM’s info (VM ID, host name, etc.)
2. Masakari calls the Nova Rebuild API for the down VM
3. Nova rebuilds the VM
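A minimal sketch of the Libvirt Monitor's side of step 1, assuming it has already turned libvirt lifecycle events into simple strings. The event names, the set of events treated as an unexpected down, and the payload keys are hypothetical. With the libvirt Python bindings, a callback like this would typically be registered through `virConnect.domainEventRegisterAny` with `VIR_DOMAIN_EVENT_ID_LIFECYCLE` after setting up an event loop (for example with `virEventRegisterDefaultImpl`); mapping libvirt's numeric event and detail codes onto these strings is left out.

```python
from typing import Callable, Dict

# Hypothetical event names a monitor might derive from libvirt lifecycle events.
# Only these count as an *unexpected* VM down; user-requested stops are ignored.
UNEXPECTED_DOWN_EVENTS = {"CRASHED", "STOPPED_FAILED"}


def on_lifecycle_event(vm_id: str, event: str, host: str,
                       notify: Callable[[Dict[str, str]], None]) -> bool:
    """Forward an unexpected VM-down event to the Masakari controller.

    Returns True if a notification was sent.
    """
    if event not in UNEXPECTED_DOWN_EVENTS:
        return False  # e.g. a normal start or a shutdown requested by the user
    # Step 1 on the slide: send the down VM's info (VM ID, host name, ...).
    notify({"type": "VM_DOWN", "vm_id": vm_id, "host": host})
    return True


if __name__ == "__main__":
    sent = []
    on_lifecycle_event("vm2", "CRASHED", "host-1", sent.append)
    on_lifecycle_event("vm3", "STOPPED_BY_USER", "host-1", sent.append)
    print(sent)  # only vm2 is reported
```

Filtering in the monitor keeps intentional stops from triggering a rebuild; where exactly that line is drawn is a policy choice this sketch does not take from the source.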
Manager Process Down
[Figure: Host A and Host B, each running libvirt, nova-compute and a Process Monitor; the manager process on Host A goes down]
1. The Process Monitor restarts the manager process when it goes down
2. If restarting fails a few times, it notifies Masakari that the manager process is down
3. Masakari tells Nova to disable scheduling for Host A
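Steps 1 and 2 amount to a bounded restart loop. The sketch below is a hypothetical illustration: `is_alive`, `restart` and `notify_down` are injected callables (in practice they might wrap a service manager and a call to the Masakari controller), and `max_retries=3` is an assumed value standing in for the unspecified number of restart attempts on the slide.

```python
import time
from typing import Callable


def supervise_once(is_alive: Callable[[], bool],
                   restart: Callable[[], None],
                   notify_down: Callable[[], None],
                   max_retries: int = 3,
                   wait_seconds: float = 0.0) -> str:
    """One check of a manager process such as nova-compute.

    Returns "ok", "restarted" or "notified".
    """
    if is_alive():
        return "ok"
    for _ in range(max_retries):
        restart()  # step 1: try to bring the process back
        if wait_seconds:
            time.sleep(wait_seconds)  # give the process time to come up
        if is_alive():
            return "restarted"
    # Step 2: every restart failed, so report it; Masakari then
    # asks Nova to disable scheduling for this host (step 3).
    notify_down()
    return "notified"
```

Returning a status string keeps the function easy to test; a real monitor would call it periodically in a loop.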
Host Down
[Figure: Host A and Host B each run Pacemaker (a CIB holding node status, plus resource agents, RAs, with Start/Stop/Monitor), a WatchDog & Shutdowner, and a Host Fail Monitor that polls the node status; the two hosts exchange heartbeat communications, and Host B goes down]
1. The Host Fail Monitor on Host A checks its own host’s status, then notifies Masakari that another host (Host B) is down
2. Masakari calls the Nova Evacuate API for all VMs on Host B
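One polling round of the Host Fail Monitor can be sketched as follows. It assumes the node status Pacemaker keeps in its CIB has already been read into a dictionary mapping host name to an online flag; how that status is read, and the fencing done by the WatchDog & Shutdowner, are out of scope. Following the slide, the monitor first checks its own host's status and only then reports peers that are down. The function names and the `already_reported` set, which avoids reporting the same host twice, are assumptions of this sketch.

```python
from typing import Callable, Dict, List, Set


def find_down_peers(node_online: Dict[str, bool], self_host: str) -> List[str]:
    """Return the other hosts currently reported offline, sorted by name."""
    return sorted(h for h, online in node_online.items()
                  if h != self_host and not online)


def poll_once(self_host: str,
              read_status: Callable[[], Dict[str, bool]],
              notify_host_down: Callable[[str], None],
              already_reported: Set[str]) -> None:
    """One polling round of a (hypothetical) host-fail monitor."""
    status = read_status()
    # Check our own host first: if we look offline we may be the isolated
    # side of a partition, so we must not declare the others dead.
    if not status.get(self_host, False):
        return
    for host in find_down_peers(status, self_host):
        if host not in already_reported:
            already_reported.add(host)
            notify_host_down(host)  # step 1; Masakari then evacuates (step 2)
```

The evacuation itself stays in the controller, so the monitor only needs to answer "which peer is down?".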
How to use Masakari
1. Prerequisites
• Set up Nova with KVM compute nodes
• Set up shared storage for ephemeral disks in each cluster (e.g. NFS)
2. Install and configure Masakari
• Download the source from GitHub
• https://github.com/ntt-sic/masakari
• Install the Masakari package
• Initialize the Masakari database
• Configure Masakari’s 4 config files
3. Start Masakari
• Start all processes
• Register a reserved host to take over when a host goes down
4. Wait for a failure
• Masakari only acts when a failure occurs
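The shared-storage prerequisite can be checked before installing Masakari. This sketch is an illustration, not part of Masakari: it finds the mount that contains a directory by parsing text in the Linux `/proc/mounts` layout and treats `nfs`/`nfs4` as shared. The filesystem set is an assumption to extend for other shared filesystems, and escaped characters in mount points (such as `\040` for a space) are not decoded.

```python
import os
from typing import Optional, Tuple

# Assumption: filesystem types treated as shared storage for ephemeral disks.
SHARED_FS_TYPES = {"nfs", "nfs4"}


def mount_for(path: str, mounts_text: str) -> Optional[Tuple[str, str]]:
    """Return (mount_point, fstype) of the deepest mount containing `path`.

    `mounts_text` uses the /proc/mounts layout:
    device mount_point fstype options dump pass
    """
    path = os.path.normpath(path)
    best: Optional[Tuple[str, str]] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Keep the longest (most specific) matching mount point.
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fstype)
    return best


def is_on_shared_storage(path: str, mounts_text: str) -> bool:
    """True if `path` lives on a mount whose fstype is in SHARED_FS_TYPES."""
    found = mount_for(path, mounts_text)
    return found is not None and found[1] in SHARED_FS_TYPES
```

On a compute node you could pass the contents of `/proc/mounts` together with Nova's `instances_path` (commonly `/var/lib/nova/instances`) and run the check on every node in the cluster.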
Challenges
• No fork of OpenStack master: Masakari works without modifying OpenStack
Other sessions related to Masakari
• Korejanai Story: How To Integrate OpenStack Into Your Business Strategy (http://sched.co/49wG)
GitHub: https://github.com/ntt-sic/masakari
Mail: muroi.masahito@lab.ntt.co.jp
Marketplace: booth S14, NTT Group

Speaker Notes

  1. Masakari has several meanings depending on the context. First, let me explain how we use the word ‘Masakari’ in everyday life. Generally speaking, Masakari reminds us of an ‘axe’ and/or of KINTARO. KINTARO is the name of a Japanese fairy tale and of its main character. In an engineering context, we use the phrase ‘まさかりをなげる’ (masakari wo nageru), roughly translated as “Throwing a Masakari”. It means “pointing out a mistake at a conference or in someone’s presentation”, especially a technical mistake. Finally, in the OpenStack context, “Masakari” is the Virtual Machine High Availability service that NTT developed. This service rescues virtual machines when errors occur on them. It is now available on GitHub, so check it out after this presentation. Now that you’ve learned what Masakari means in the OpenStack context, let’s get going.
  2. We had 3 motivations for developing Masakari. First, to enable pet-type virtual machines and applications to run on OpenStack. As you know, the pets vs. cattle issue has been discussed in the community for a long time. Ideally, from a developer’s perspective, when we replace our infrastructure with OpenStack, the apps on OpenStack should be cloud-native applications, which means we would have to change the apps and the infrastructure at once. In general, however, this is hard to accomplish because of budget, release schedules, or other reasons. We developed Virtual Machine HA so that each app on OpenStack can be changed to cloud native step by step, even though we understand that introducing the pets model to the cloud is a bad practice. That’s why we named this HA service ‘Masakari’, after the engineering sense of the word: we are pointing out our own mistake. Finally, we wanted to make ‘Masakari’ open. We’re running OpenStack in production, so we’d like to contribute to OpenStack in any way we can. We think making “Masakari” open is one of our contributions. In the next slides, I’ll give you a quick overview of ‘Masakari’.
  3. We had 3 requirements for Masakari: 1. Detect 3 types of VM down. 2. Recover VMs within a few minutes. 3. Of course, the recovery works automatically.
  4. This is a quick architecture overview of Masakari. Masakari is roughly divided into 2 parts. One is the Masakari controller, shown in the light blue box at the top of the slide. The other is the state-monitoring processes, shown in the boxes at the bottom of the slide. The controller process is in charge of calling OpenStack APIs depending on the type of notification it receives from the monitoring processes. The monitoring processes watch for each type of error Masakari needs to detect. In the following slides, I’ll show you how each monitoring process detects a different type of error.
  5. As I mentioned before, the monitoring processes detect 3 types of failure, in the following ways. To detect VM down, Masakari hooks the events sent from libvirt. To detect manager process down, Masakari monitors whether the manager processes are working properly. To detect host down, Masakari uses Pacemaker.
  6. Here are quick instructions for using Masakari. Before setting up Masakari, there are 2 prerequisites: Masakari assumes that compute nodes use KVM as their virtualization technology and that they share storage for ephemeral disks, such as NFS or Ceph. 2.
  7. Masakari doesn’t require its users to modify OpenStack. This means anyone can start using Masakari in their OpenStack cloud. We chose this approach because we want to upgrade OpenStack whenever a new version is released. OpenStack improves its features in each release, so we’d like to use new features as much as we can. Unfortunately, the more code we change in one release, the harder it is to upgrade to the next one. We took on this challenge and accomplished it.
  8. Once again, check the URL and send me any questions you have. We also have a display in the Marketplace at booth S14, so please come and ask us there.