PVHVM Linux guest
why doesn't kexec work?
Vitaly Kuznetsov
Red Hat
Xen Developer Summit, 2015
Why?
● We support Red Hat Enterprise Linux.
● Bare hardware, virtualized and cloud environments, ...
● Kernel issues happen.
● Analyse stack traces.
● In complicated cases use kdump!
Kexec/kdump
● “kexec … is a mechanism of the Linux kernel that
allows "live" booting of a new kernel "over" the
currently running kernel”
● Kdump uses kexec:
    ● Some memory is reserved at boot (crashkernel=).
    ● The crash kernel/initrd are loaded into the reserved area.
    ● On crash we trigger the crash kernel's boot.
    ● The crash initrd dumps all of the domain's memory and reboots.
    ● You have a crash file to analyse! (profit!!!)
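The reservation step can be checked from inside a guest. A minimal sketch, not part of the talk: `parse_crashkernel` and `crash_kernel_loaded` are hypothetical helper names, and the parser only extracts the raw `crashkernel=` value without interpreting it. The sysfs file `/sys/kernel/kexec_crash_loaded` is the standard Linux indicator that a crash kernel has been loaded.

```python
from typing import Optional


def parse_crashkernel(cmdline: str) -> Optional[str]:
    """Return the value of the first crashkernel= parameter, or None."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token[len("crashkernel="):]
    return None


def crash_kernel_loaded(path: str = "/sys/kernel/kexec_crash_loaded") -> Optional[bool]:
    """Return True/False from sysfs, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        return None


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("crashkernel =", parse_crashkernel(f.read()))
    except OSError:
        print("no /proc/cmdline on this system")
    print("crash kernel loaded:", crash_kernel_loaded())
```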
Doesn't work for Xen guests
Issues with Kexec on PVHVM
● Previously used structures cause problems, and there is no good way to transfer knowledge to the kexec kernel.
    ● And we need these interfaces working!
● Xen/guest interfaces we need to re-establish:
    ● shared_info frame (XENMAPSPACE_shared_info)
    ● vcpu_info (VCPUOP_register_vcpu_info)
    ● Event channels (EVTCHNOP_bind_*, ABI)
    ● + Emuirq/pirq mappings (PHYSDEVOP_map_pirq)
    ● Granted pages
shared_info page:
● 4k page, belongs to the Xen hypervisor.
● Required for events; vcpu_info for the first 32 VCPUs lives here.
● Upon boot the guest chooses one of its pages to sacrifice.
● XENMEM_add_to_physmap(XENMAPSPACE_shared_info) frees the guest's frame and maps shared_info there.
● The kexec kernel does the same for another frame → we get a hole, as shared_info is unmapped from its previous place.
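How the hole appears can be shown with a toy model. This is an illustration only, not Xen code: `ToyPhysmap` and its methods are invented names, and the model encodes just the behaviour described above (the page moves to the new frame, and nothing backs the old one afterwards).

```python
SHARED_INFO = "xen:shared_info"


class ToyPhysmap:
    """Toy guest physmap: gfn -> 'ram' or SHARED_INFO; a missing gfn is a hole."""

    def __init__(self, nr_frames):
        self.nr_frames = nr_frames
        self.frames = {gfn: "ram" for gfn in range(nr_frames)}

    def map_shared_info(self, gfn):
        # The single shared_info page leaves its previous location, if any ...
        for old_gfn, owner in list(self.frames.items()):
            if owner == SHARED_INFO:
                del self.frames[old_gfn]  # nothing backs this gfn any more
        # ... and replaces the RAM frame the guest chose to sacrifice.
        self.frames[gfn] = SHARED_INFO

    def holes(self):
        return [g for g in range(self.nr_frames) if g not in self.frames]


physmap = ToyPhysmap(8)
physmap.map_shared_info(2)  # first kernel picks gfn 2
physmap.map_shared_info(5)  # kexec kernel picks gfn 5
print(physmap.holes())      # the old location, gfn 2, is now a hole
```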
Event channels:
● Already bound event channels:
    ● “(XEN) event_channel.c:370:d2v0 EVTCHNOP failure: error -17”
● 2-level → FIFO ABI switch at boot:
    ● Mapped control block, event array pages.
● Some INTERDOMAIN channels are set up by the toolstack:
    ● Xenstore, xenconsole, ...
● EVTCHNOP_reset resets everything; there is no way back.
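Error -17 in the log line above is -EEXIST. The following toy model (hypothetical class and method names, not the Xen implementation) sketches both halves of the problem, under the assumption that rebinding an already bound VIRQ is what gets rejected: the kexec kernel's rebind fails, while a blanket reset also closes the channels the toolstack created.

```python
import errno


class ToyEventChannels:
    """Toy per-domain event channel table."""

    def __init__(self):
        # Channels created by the toolstack before the guest booted.
        self.ports = {1: "xenstore", 2: "console"}
        self.virq_port = {}  # (vcpu, virq) -> port

    def bind_virq(self, vcpu, virq):
        if (vcpu, virq) in self.virq_port:
            return -errno.EEXIST  # already bound by the previous kernel
        port = max(self.ports, default=0) + 1
        self.ports[port] = f"virq{virq}/vcpu{vcpu}"
        self.virq_port[(vcpu, virq)] = port
        return port

    def reset(self):
        # Closes everything, including what the guest cannot recreate itself.
        self.ports.clear()
        self.virq_port.clear()


ec = ToyEventChannels()
ec.bind_virq(0, 0)         # old kernel binds a VIRQ on vcpu 0
print(ec.bind_virq(0, 0))  # kexec kernel tries again and gets -EEXIST
ec.reset()
print(ec.ports)            # xenstore and console channels are gone too
```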
Grant pages:
● Memory sharing mechanism in Xen.
● We can't do anything guest-side:
    ● Forcibly unmapping a page from a backend domain will crash it.
    ● Requesting new pages requires additional memory.
    ● Some grants are “persistent”.
● Maybe not an issue for kdump, because its memory region is separate, but we still need functional backends for the kexec kernel!
Possible solutions
“Obvious solution”
● Implement a set of hypercalls to tear all interfaces down:
    ● reset_vcpu_info
    ● evtchn_switch_to_2l
    ● unmap_shared_info
    ● do_something_with_granted_pages
    ● …
● Good from the “if there is a way to set something up, there should be one to tear it down” PoV.
● Good for hypervisor testing :-)
● Issues:
    ● The domain needs to follow a special protocol: what if it doesn't?
    ● The granted pages story is complicated.
    ● Not all bits are set up by the domain.
    ● Too many possible issues (including security).
“New domain with the same memory”
● Destroy the original domain, leaving its memory intact.
● Create a new domain, reassign all memory pages, copy vcpu contexts.
● Benefits:
    ● No cumbersome teardown required!
    ● The migration path is reused!
    ● Supportability: new interfaces/objects should “just work”.
● Issues:
    ● Memory reassignment appears to be cumbersome :-(
    ● Superpages, PoD, mem_access issues.
    ● No m2p on ARM.
    ● Non-trivial toolstack part repeating migration code.
    ● Too complicated.
“Reset everything”
● No cumbersome memory reassignment.
● Explicit list of interfaces to reset with one hypercall:
    ● shared_info, vcpu_info, event channels, pirq_to_emuirq, ioreq servers.
● Toolstack involvement required:
    ● Restart the device model.
    ● Reopen xenstore/xenconsole event channels.
    ● ..
● Hypervisor maintainers like it :-)
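The division of labour can be sketched as a toy model. All names here (`ToyDomain`, `soft_reset`, `toolstack_after_reset`) are invented for illustration and do not correspond to the actual patch series; the model assumes only what the slide states: one call resets an explicit list of interfaces, guest memory stays in place, and the toolstack then restores its own pieces.

```python
RESET_LIST = ("shared_info", "vcpu_info", "event_channels",
              "pirq_to_emuirq", "ioreq_servers")


class ToyDomain:
    def __init__(self):
        self.state = {name: "set up by old kernel" for name in RESET_LIST}
        self.memory = ["page contents"] * 4  # never touched by the reset
        self.toolstack_channels = {"xenstore", "console"}


def soft_reset(dom):
    """Hypervisor side: one call resets every interface on the explicit list."""
    for name in RESET_LIST:
        dom.state[name] = "pristine"
    dom.toolstack_channels.clear()  # closed along with all other channels


def toolstack_after_reset(dom):
    """Toolstack side: redo what the guest cannot redo on its own."""
    steps = ["restart device model",
             "reopen xenstore/xenconsole event channels"]
    dom.toolstack_channels.update({"xenstore", "console"})
    return steps


dom = ToyDomain()
soft_reset(dom)
for step in toolstack_after_reset(dom):
    print(step)
```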
● Granted pages: let's do (almost) nothing!
    ● Remove the domain from xenstore and add it back: all backends are supposed to release all mappings.
    ● Xenconsoled doesn't release its mapping (but that's fine).
    ● Special debug print to find future issues.
    ● Hunt for misbehaving backends! (if there are any)
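A toy model of the "do (almost) nothing" idea, with invented names (`ToyBackend`, `remove_and_readd`) and made-up backend labels: well-behaved backends drop their mappings when the domain disappears from xenstore, and a debug print reports anything that still holds one.

```python
class ToyBackend:
    def __init__(self, name, releases_on_removal=True):
        self.name = name
        self.releases_on_removal = releases_on_removal
        self.mapped = True  # currently maps a page granted by the guest

    def on_domain_removed(self):
        if self.releases_on_removal:
            self.mapped = False


def remove_and_readd(backends):
    """Simulate removing the domain from xenstore; return leftover mappers."""
    for backend in backends:
        backend.on_domain_removed()
    leftover = [b.name for b in backends if b.mapped]
    for name in leftover:
        print(f"debug: {name} still maps a granted page")
    return leftover


backends = [
    ToyBackend("blkback"),
    ToyBackend("netback"),
    ToyBackend("xenconsoled", releases_on_removal=False),  # expected, and fine
]
remove_and_readd(backends)
```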
Current status and future work
● [PATCH v10 00/11] “toolstack-assisted approach to PVHVM guest kexec” is out, waiting for reviewers!
    ● … and testers too!
● PVH (as "HVM without device model") should "just work".
    ● Not tested, minor issues are possible.
● The ARM-specific part is an -ENOSYS stub for now.
    ● shared_info page needs handling (same as x86).
    ● Some GIC cleanup?
Thank you!
Questions?
Vitaly Kuznetsov
vkuznets@redhat.com
