Evaluation and Enhancement to Memory Sharing and Swapping in Xen 4.1
Xiaowei Yang, Chuan Ye, Qiangmin Lin
{xiaowei.yang, yechuan, linqiangmin}@huawei.com
HUAWEI
Agenda
• Background
• Memory Overcommit Features / Policy
• Evaluation
Background
• Challenges
  • Memory becomes bottleneck as VM# grows
       • E.g.: VM# > 100 in VDI scenario on 2-socket server
  • What’s the proper vMem size?
       • vMem# too small: bad performance
       • vMem# too big: low utilization

• Memory overcommit is a key factor to high VM density /
memory utilization

• ESX / KVM have rich memory overcommit features / policies
  • ESX: Balloon, TPS, Host swap, compression
  • KVM: Balloon, KSM, Host swap

• Xen has offered sharing / swapping since 4.0, but they
  • Are untested in production
  • Need improvements in efficiency and performance
  • Have no policies

 HUAWEI TECHNOLOGIES CO., LTD.     Huawei Confidential
Goals
• Evaluate current memory overcommit features in Xen
4.1, particularly memory sharing / swapping

• Make enhancements to current memory sharing /
swapping features

• Design a memory overcommit policy to reach higher
VM density w/o sacrificing performance




Blktap2 Sharing
How it works
Relies on linked clones of a parent image

Share
1st read:
• tapdisk2 reads from the parent image on disk and records the
read / page (p1)

Later read:
• tapdisk2 finds the record and notifies the Xen HV to share with p1

Unshare
• A write to the RO shared page triggers unshare
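
The share / unshare flow above can be sketched as a small bookkeeping model. This is only an illustration under stated assumptions: the class `BlockSharingTable` and its methods are hypothetical names, not tapdisk2 or Xen APIs, and a Python dictionary stands in for the global hash table kept between tapdisk2 processes.

```python
class BlockSharingTable:
    """Hypothetical model of the per-block read record (not real tapdisk2 code)."""

    def __init__(self):
        self.sharers = {}   # parent-image block -> set of (vm, gfn) mapping the shared copy
        self.block_of = {}  # (vm, gfn) -> block currently backing that guest page

    def on_read(self, vm, gfn, block):
        """Return 'disk' for the first read of a block, 'shared' for later reads."""
        page = (vm, gfn)
        self.on_write(vm, gfn)                 # a page being refilled drops its old mapping
        first = block not in self.sharers      # no record yet: must read the parent image
        self.sharers.setdefault(block, set()).add(page)
        self.block_of[page] = block
        return "disk" if first else "shared"

    def on_write(self, vm, gfn):
        """A write to a read-only shared page unshares it; return True if it was shared."""
        page = (vm, gfn)
        block = self.block_of.pop(page, None)
        if block is None:
            return False
        self.sharers[block].discard(page)
        if not self.sharers[block]:            # last sharer gone: forget the record
            del self.sharers[block]
        return True


table = BlockSharingTable()
print(table.on_read("vm1", 10, block=7))   # first read goes to disk
print(table.on_read("vm2", 20, block=7))   # later read is shared with p1
print(table.on_write("vm2", 20))           # write breaks sharing for that page
```

Only VMs that read the same parent-image block ever land in the same entry, which is why sharing is limited to VMs cloned from one parent image.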




Blktap2 Sharing (2)
• Pros
  • Overhead is lower than content-based page sharing
       • No need to calculate/compare page contents in the background
  • Relieves the VM boot storm issue
       • Later reads are served from the memory cache


• Cons
  • Sharing% is much lower compared to content-based sharing
  • Only VMs with the same parent image can share pages
  • Relies on the PV driver
       • No sharing before the PV driver is loaded, i.e. during startup
  • Page sharing # is limited by shm #
       • Shared memory between tapdisk2 processes stores the global hash table




0-page Sharing
How it works

Share
• zerosharing triggers the Xen HV to scan the VM's page contents
periodically
• If the Xen HV finds that a page's content is all 0, it frees the
original page and points the VM's corresponding p2m entry to a
special RO 0-page

Unshare
• A write to the RO 0-page triggers unshare
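
A minimal sketch of the scan / share / unshare cycle described above, assuming 4 KiB pages. `GuestMemory`, `scan` and `write` are hypothetical names for illustration; the real work happens on p2m entries inside the hypervisor, which a Python dictionary only approximates.

```python
PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)  # stands in for the single read-only shared 0-page


class GuestMemory:
    """Hypothetical p2m model: gfn -> private bytearray, or ZERO_PAGE when shared."""

    def __init__(self, pages):
        self.p2m = {gfn: bytearray(data) for gfn, data in pages.items()}

    def scan(self):
        """One scan pass: share every private all-zero page; return how many were shared."""
        shared = 0
        for gfn, page in self.p2m.items():
            if page is not ZERO_PAGE and page == ZERO_PAGE:
                self.p2m[gfn] = ZERO_PAGE          # original page is freed
                shared += 1
        return shared

    def write(self, gfn, offset, value):
        """A write to the shared 0-page unshares it first (copy-on-write)."""
        if self.p2m[gfn] is ZERO_PAGE:
            self.p2m[gfn] = bytearray(PAGE_SIZE)   # newly allocated private zeroed page
        self.p2m[gfn][offset] = value


mem = GuestMemory({0: bytes(PAGE_SIZE), 1: b"\x01" + bytes(PAGE_SIZE - 1)})
print(mem.scan())      # number of pages shared in this pass
mem.write(0, 5, 9)     # triggers unshare of gfn 0
```

Each scan pass has to compare every private page against zero, which is the CPU cost that the scan-rate setting trades against how quickly memory is reclaimed.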



* Red words are our enhancements
0-page Sharing (2)
• 0-page sharing is the most valuable part of content-based
page sharing per our evaluation

• 0-page sharing is more useful to Windows VMs than to
Linux VMs
  • All free memory is 0-pages in Windows: it scrubs pages before freeing them

• A proper scan rate for 0-page sharing is important
  • Slow: can't relieve memory pressure in time
  • Fast: high CPU util%

• Actually, a 0-page sweep is already used in PoD
  • Usage is limited: only when the PoD cache is under pressure



Host Swap
How it works

Swap out
• xenpaging selects guest pages through a policy
• xenpaging saves their contents to disk, then notifies the Xen HV to
free them

Swap in
• Guest access to a swapped-out page triggers a violation
• The Xen HV notifies xenpaging to read the contents back into a
newly allocated page
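
The swap-out / swap-in round trip above can be modeled in a few lines. This is a toy sketch, not xenpaging itself: the `Pager` class is a hypothetical name, and a dictionary stands in for the swap file on disk.

```python
class Pager:
    """Toy model of host swap; names are illustrative, not the xenpaging interface."""

    def __init__(self, memory):
        self.memory = dict(memory)  # gfn -> contents of pages currently resident
        self.swapfile = {}          # gfn -> contents saved "on disk"

    def swap_out(self, gfn):
        """Save the page contents, then free the page."""
        self.swapfile[gfn] = self.memory.pop(gfn)

    def access(self, gfn):
        """Guest access; a paged-out gfn faults and is read back into a new page."""
        if gfn not in self.memory:
            self.memory[gfn] = self.swapfile.pop(gfn)
        return self.memory[gfn]


pager = Pager({1: b"kernel data", 2: b"app data"})
pager.swap_out(2)
print(pager.access(2))   # faults, then returns the restored contents
```

Every `access` that misses pays a disk read, which is the latency cost listed under the cons of host swap.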




Host Swap (2)
• Pros
  • The only memory overcommit feature that guarantees
    overcommit ratio


• Cons
  • Page selection is hard
       • Which page is proper to be swapped out?
  • Inefficient
       • Disk access latency is much higher
       • Double swap scenario




Host Swap Policies
• Basics
  • Random
  • Sequential

• Improvements
  • Skip low memory, which is used for the BIOS and kernel image
  • MRU: prevent recently swapped-in pages from being swapped out
  • Aggressive MRU: prevent X contiguous pages adjacent to each
    MRU page from being swapped out
  • Swap->sharing: if the selected page is a 0-page, share it instead of
    swapping it

• Advanced
  • Based on statistics of guest OS page usage in the HV
  • Based on guest OS MM knowledge (enlightened)
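
A sketch of how the improvement policies could combine on top of a sequential walk. Everything here is an assumption for illustration: the function name `pick_victims`, the `low_mem_limit` default (256 frames, i.e. 1 MiB with 4 KiB pages), and the `guard` parameter that plays the role of X in the aggressive MRU policy.

```python
def pick_victims(candidates, count, recent_swapins,
                 low_mem_limit=256, guard=0, is_zero=lambda gfn: False):
    """Walk candidates in order and return (to_swap, to_share).

    guard=0 gives plain MRU; guard=X also protects X frames on either
    side of each recently swapped-in frame (aggressive MRU).
    """
    protected = set()
    for gfn in recent_swapins:
        protected.update(range(gfn - guard, gfn + guard + 1))

    to_swap, to_share = [], []
    for gfn in candidates:
        if len(to_swap) + len(to_share) >= count:
            break
        if gfn < low_mem_limit or gfn in protected:
            continue                      # skip low memory and MRU-protected frames
        if is_zero(gfn):
            to_share.append(gfn)          # swap->sharing: share the 0-page instead
        else:
            to_swap.append(gfn)
    return to_swap, to_share


swap, share = pick_victims(range(20), 4, recent_swapins=[10],
                           low_mem_limit=8, guard=1,
                           is_zero=lambda gfn: gfn == 8)
print(swap, share)
```

Routing zero pages to sharing avoids both the disk write and the later swap-in fault for those frames.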


Memory Overcommit Policy
• Metrics
  •   Host Free Memory #
  •   VM Free Memory #
  •   VM Maximum Memory
  •   VM Reserved Memory
  •   VM Current Zero Page #
  •   VM Current Balloon Page #
  •   VM Current Swap Page #
  •   VM Current Sharing Page #

• Configuration options
  •   Sharing threshold (default: 20%)
  •   Balloon threshold (default: 10%)
  •   Swap threshold (default: 5%)
  •   0-page sharing scan rate (frequency, page #)
  •   …

Memory Overcommit Policy (2)
• No memory pressure (used% < 80%)
  • Turn off memory overcommit

• Host memory pressure is moderate (used% > 80%)
  • Turn on blktap2 / 0-page sharing
  • Set 0-page sharing scan rate

• Host memory pressure is severe (used% > 90%)
  • Adjust 0-page sharing scan rate
  • Start balloon; balloon # by VM metrics

• Host memory pressure is critical (used% > 95%)
  • Balloon can't keep up with memory consumption
  • Start swap

• When the pressure goes down, return memory to the VMs
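
The escalation above amounts to a threshold ladder. The sketch below assumes the default sharing / balloon / swap thresholds (20% / 10% / 5%) are percentages of free host memory, which lines up with used% > 80% / 90% / 95%. The function name and the returned labels are hypothetical.

```python
def overcommit_actions(host_free_pct, sharing_thr=20.0, balloon_thr=10.0, swap_thr=5.0):
    """Return the overcommit features that should be active at this free-memory level.

    Assumption: thresholds are free-memory percentages, so free < 20% is the
    same condition as used% > 80%.
    """
    actions = []
    if host_free_pct < sharing_thr:
        actions.append("sharing")   # blktap2 / 0-page sharing, scan rate set
    if host_free_pct < balloon_thr:
        actions.append("balloon")   # sized per VM from the VM metrics
    if host_free_pct < swap_thr:
        actions.append("swap")      # last resort when balloon can't keep up
    return actions


for free in (25, 15, 8, 3):
    print(free, overcommit_actions(free))
```

Because the function is stateless, re-evaluating it as pressure drops naturally turns features off again, mirroring the "return memory to the VMs" step.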

Memory Overcommit Support Matrix
                      Balloon                                 Sharing          Host Swap
QEMU                  OK                                      Sharing breaks   Triggers swap in
PV driver             OK                                      Sharing breaks   Triggers swap in
I/O device passthru   OK                                      Conflict!        Conflict!
VM live migration     OK                                      Sharing breaks   1. Triggers swap in; 2. Swapfile accessible after live migration
VM save               OK                                      OK               Triggers swap in
vMem snapshot         OK                                      OK               Triggers swap in
VM resume             OK                                      Sharing breaks   OK
VM hibernate          1. Balloon in; 2. Redirect to 0-page    OK               Triggers swap in


Experimental Environment
                                     Host Configuration
   Processor                    2x Intel X5670 @ 2.93GHz, SMT enabled
   Memory                       96GB DDR3
   Storage                      Intel 160G X25-M SSD

                                     VM Workloads
   SPECjbb                      1 vCPU, 4G vMem; SLES 11
                                Heap size: 2.5GB
   Kernel Build                 1 vCPU, 512M vMem; SLES 11
                                Linux kernel: 2.6.32
   Sysbench OLTP                1 vCPU, 1G vMem; SLES 11
                                Database: mysql
   VDI benchmark                1 vCPU, 1G vMem; Windows 7
                                Workload: Office, IE, PDF, Java



Blktap2 / 0-page Sharing # -- VM startup

[Chart: blktap2 / 0-page sharing # and unshared # at VM startup]




• 0-page sharing # is dominant
  • Xen scrubs pages (on host startup, domain destroy, …)
  • Much of the 'free memory' is 0-pages and can be shared

• Linux uses more memory to boot up
  • Less 0-page sharing #
  • More blktap2 sharing #

Blktap2 / 0-page Sharing # -- VDI workload




VM's sharing # before and after the VDI workload
                                Before (MB)       After (MB)   After / Before
   0-page sharing #             750               400          ↓53%
   Blktap2 sharing #            42                171          ↑409%
   Unshared #                   199               418          ↑210%
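
The percentages in the last column are consistent with After expressed as a fraction of Before rather than with a relative change. A quick check, using the rounded MB values from the table (so a small mismatch such as 407% against the reported 409% is expected):

```python
# Rounded values taken from the table above: name -> (before_mb, after_mb)
rows = {
    "0-page sharing": (750, 400),
    "blktap2 sharing": (42, 171),
    "unshared": (199, 418),
}


def ratio_and_change(before, after):
    """Return (after as % of before, relative change in %), both rounded."""
    ratio = round(100 * after / before)
    change = round(100 * (after - before) / before)
    return ratio, change


for name, (before, after) in rows.items():
    ratio, change = ratio_and_change(before, after)
    print(f"{name}: after/before = {ratio}%, change = {change:+d}%")
```

Read as relative changes, the same data correspond to roughly -47%, +307% and +110%.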

0-page Sharing # -- Windows vs. Linux




• On startup almost all 0-pages come from 'free memory'
• Windows: free memory is 0-pages all the time
  • Windows scrubs pages before freeing them
  • More friendly to page sharing
• Linux: free memory is 0-pages only on startup
  • Linux doesn't scrub free pages

* Mem Hog test case consumes 500MB memory when running
Performance Impact of blktap2 Sharing




• Performance impact of blktap2 sharing is negligible
• Scalability is very good

• In theory blktap2 sharing could benefit read-intensive workloads
   • First time: from disk; afterwards: from cache
   • But the benefit is not observed in the KB (kernel build) test


Performance Impact of 0-page Sharing




• Little impact on the benchmarks' scores
• The impacts of different scan rates are almost the same
  • 5%-7% CPU overhead in dom0
  • Few new 0-pages are generated during the test, so the scan finishes fast
  • A better benchmark?


Host Swap Policies




• Different policies result in different performance
• The swap->sharing policy gives the best performance most of the
time
• When the remaining vMem# < working set, performance
drops dramatically


Host Swap vs. Balloon




• Balloon usually performs better than swap
  • Balloon transfers the memory pressure from host to guest
  • The guest OS knows its own memory usage better: which pages are free,
    and which are least / most used

• The swap->sharing policy narrows the gap between swap and balloon


VM Density -- VDI Workload
[Chart: VM density with the VDI workload, including a projected value]
* Host memory = 96GB


  1. W/ no memory overcommit: VM# = 85, memory is the bottleneck
  2. W/ balloon: VM# = 120, CPU and memory are both bottlenecks
  3. W/ balloon+sharing: when VM# = 120, host free memory = 17GB,
  CPU is the bottleneck
  4. W/ balloon+sharing: w/o the CPU bottleneck, the same memory can host
  145 VMs (projected)
  * Tests 1/2/3 are performed w/ QoE unchanged
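
The density gains implied by these numbers can be worked out directly. The second function is one possible way to arrive at a figure close to the projected 145: a linear extrapolation from test 3 that ignores memory used by dom0 and the hypervisor. Whether the authors computed the projection this way is an assumption, and the function names are hypothetical.

```python
import math


def density_gain_pct(baseline_vms, vms):
    """Relative increase in VM count over the no-overcommit baseline, in percent."""
    return 100 * (vms - baseline_vms) / baseline_vms


def projected_vms(host_mem_gb, free_gb, vms):
    """Scale the VM count linearly until the free memory would be used up.

    Assumption: per-VM memory stays at (host_mem_gb - free_gb) / vms.
    """
    return math.floor(host_mem_gb * vms / (host_mem_gb - free_gb))


print(round(density_gain_pct(85, 120)))   # gain w/ balloon
print(round(density_gain_pct(85, 145)))   # gain w/ balloon+sharing (projected)
print(projected_vms(96, 17, 120))         # extrapolated VM count on 96GB
```

With 17GB free at 120 VMs, each VM accounts for about 0.66GB of host memory, so 96GB extrapolates to 145 VMs, about 71% above the 85-VM baseline.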
Takeaways
• 0-page sharing complements blktap2 sharing. The
combination of both is competitive

• The performance impact and overhead of blktap2 / 0-page
sharing are small if used properly

• Windows is more friendly to 0-page sharing than Linux

• Host swap policy does matter. The swap->sharing policy
narrows the gap between swap and balloon

• In the VDI scenario a good memory overcommit policy can
increase VM density by 70+% w/ QoE uncompromised


Thank you
www.huawei.com

  • 4. Goals • Evaluate current memory overcommit features in Xen 4.1, particularly memory sharing / swapping • Make enhancements to current memory sharing / swapping features • Design a memory overcommit policy to reach higher VM density w/o sacrificing performance
  • 5. Agenda • Background • Memory Overcommit Features / Policy • Evaluation
  • 6. Blktap2 Sharing • How it works • Relies on linked clones of the parent image • Share • 1st read: tapdisk2 reads from the parent image on disk and records the read / page (p1) • Later read: tapdisk2 finds the record and notifies Xen HV to share with p1 • Unshare • A write to the RO shared page triggers unshare
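The share / unshare flow on slide 6 can be sketched as a toy model. This is only an illustration under stated assumptions: `ReadTracker`, its fields, and the integer page ids are hypothetical names, and none of this is tapdisk2 or Xen code.

```python
class ReadTracker:
    """Toy model of read-tracking page sharing (hypothetical, not tapdisk2)."""

    def __init__(self):
        self.records = {}    # (parent_image, block) -> page id recorded on 1st read
        self.mappings = {}   # (vm, block) -> page id the VM currently sees
        self.private = set() # page ids created by unsharing
        self._next_page = 0

    def _new_page(self):
        page = self._next_page
        self._next_page += 1
        return page

    def read(self, vm, parent_image, block):
        """1st read records a page; later reads map the VM to that same page."""
        if (vm, block) in self.mappings:
            return self.mappings[(vm, block)]
        key = (parent_image, block)
        if key not in self.records:
            self.records[key] = self._new_page()  # stands in for the disk read
        self.mappings[(vm, block)] = self.records[key]
        return self.records[key]

    def write(self, vm, block):
        """A write to a shared (read-only) page gives the VM a private copy."""
        page = self.mappings[(vm, block)]
        if page in self.private:
            return page                           # already unshared
        copy = self._new_page()
        self.private.add(copy)
        self.mappings[(vm, block)] = copy
        return copy


tracker = ReadTracker()
p1 = tracker.read("vm1", "parent.img", 7)   # first read: recorded
p2 = tracker.read("vm2", "parent.img", 7)   # later read: shared with p1
p3 = tracker.write("vm2", 7)                # write: vm2 gets its own page
```

In this model only VMs that name the same parent image can ever land on the same record, which mirrors the limitation listed on the next slide.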
  • 7. Blktap2 Sharing (2) • Pros • Overhead is lower than content-based page sharing • No need to calculate / compare page contents in the background • Relieves the VM boot storm issue • Later reads are served from the memory cache • Cons • Sharing% is much lower compared to content-based sharing • Only VMs with the same parent image can share pages • Relies on PV driver • No sharing before the PV driver is loaded (startup) • Page sharing # is limited by shm # • Shm is used between tapdisk2 processes to store the global hash table
  • 8. 0-page Sharing • How it works • Share • zerosharing triggers Xen HV to scan the VM’s page contents periodically • If Xen HV finds that a page’s content is all 0, it frees the original page and points the VM’s corresponding p2m entry to a special RO 0-page • Unshare • A write to the RO 0-page triggers unshare • * Red words are our enhancements
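A minimal sketch of the scan and unshare steps on slide 8, assuming a guest's p2m can be modelled as a dictionary from guest frame number to a backing frame. `scan_zero_pages`, `write_byte`, `ZERO_PAGE`, and the `budget` parameter (standing in for the scan rate) are hypothetical names, not Xen interfaces.

```python
PAGE_SIZE = 4096
ZERO_PAGE = "shared-zero-page"  # hypothetical id of the special read-only 0-page


def scan_zero_pages(p2m, memory, budget):
    """Scan up to `budget` private pages; share the ones that are all zero.

    p2m:    dict gfn -> frame id (or ZERO_PAGE)
    memory: dict frame id -> bytearray of PAGE_SIZE bytes
    Returns the number of pages newly pointed at the 0-page.
    """
    shared = 0
    for gfn in sorted(p2m):
        if budget == 0:
            break
        frame = p2m[gfn]
        if frame == ZERO_PAGE:
            continue                  # already shared, costs no scan budget
        budget -= 1
        if not any(memory[frame]):    # every byte is 0
            del memory[frame]         # free the original page
            p2m[gfn] = ZERO_PAGE
            shared += 1
    return shared


def write_byte(p2m, memory, gfn, offset, value):
    """Guest write; a write to the 0-page first unshares it into a fresh page."""
    if p2m[gfn] == ZERO_PAGE:
        new_frame = max(memory, default=-1) + 1
        memory[new_frame] = bytearray(PAGE_SIZE)
        p2m[gfn] = new_frame
    memory[p2m[gfn]][offset] = value


memory = {0: bytearray(PAGE_SIZE), 1: bytearray(b"\x01" * PAGE_SIZE), 2: bytearray(PAGE_SIZE)}
p2m = {10: 0, 11: 1, 12: 2}
newly_shared = scan_zero_pages(p2m, memory, budget=10)
write_byte(p2m, memory, 10, 5, 0xFF)
```

The `budget` argument is where a scan-rate setting would bite: a small budget leaves zero pages unshared for longer, a large one costs more CPU per pass.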
  • 9. 0-page Sharing (2) • 0-page sharing is the most valuable part of content-based page sharing per our evaluation • 0-page sharing is more useful to Windows VMs than to Linux VMs • All free memory is 0-page in Windows – it scrubs pages before freeing them • A proper scan rate of 0-page sharing is important • Slow: can’t relieve memory pressure in time • Fast: high CPU util% • Actually 0-page sweep is already used in POD • Usage is limited: only when the POD cache is under pressure
  • 10. Host Swap • How it works • Swap out • xenpaging selects guest pages thru policy • xenpaging saves their contents to disk, then notifies Xen HV to free them • Swap in • Guest access to a swapped-out page triggers a violation • Xen HV notifies xenpaging to read the contents back into a newly allocated page
  • 11. Host Swap (2) • Pros • The only memory overcommit feature that guarantees the overcommit ratio • Cons • Page selection is hard • Which pages are suitable to swap out? • Inefficient • Disk access latency is much higher • Double swap scenario
  • 12. Host Swap Policies • Basics • Random • Sequential • Improvements • Skip low memory, which is used for BIOS, kernel image • MRU: prevent recently swapped-in pages from being swapped out • Aggressive MRU: prevent X continuous pages adjacent to each MRU page from being swapped out • Swap->sharing: if the selected page is a 0-page, share it instead of swapping it • Advanced • Based on statistics of Guest OS page usage in HV • Based on Guest OS MM knowledge – enlightened
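The "Improvements" on slide 12 can be read as filters applied to whatever the basic policy (random or sequential) proposes. The sketch below is one possible reading, not xenpaging's implementation: `pick_victim` and its parameters are hypothetical, and treating "X continuous pages adjacent" as a window of ±`guard` frames around each MRU page is an assumption.

```python
from collections import deque


def pick_victim(candidates, low_mem_limit, mru, guard, is_zero):
    """Return ("share" | "swap", gfn) for the first acceptable candidate, else None.

    candidates:    gfns in the order the basic policy proposes them
    low_mem_limit: gfns below this are skipped (BIOS, kernel image)
    mru:           recently swapped-in gfns
    guard:         0 for plain MRU; X > 0 also protects gfns within X of an MRU page
    is_zero:       callable gfn -> bool, True if the page content is all 0
    """
    for gfn in candidates:
        if gfn < low_mem_limit:
            continue                                  # skip low memory
        if any(abs(gfn - recent) <= guard for recent in mru):
            continue                                  # MRU / aggressive MRU
        if is_zero(gfn):
            return ("share", gfn)                     # swap->sharing
        return ("swap", gfn)
    return None


mru = deque([100], maxlen=4)   # bounded history of swapped-in pages
plain = pick_victim(range(0, 200), 16, mru, 0, lambda gfn: False)
aggressive = pick_victim([100, 101, 103], 16, mru, 2, lambda gfn: gfn == 103)
nothing = pick_victim([5, 100], 16, mru, 0, lambda gfn: False)
```

With `guard=0` only the MRU pages themselves are protected; raising it trades a smaller candidate pool for fewer immediate swap-ins of neighbouring pages.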
  • 13. Memory Overcommit Policy • Metrics • Host Free Memory # • VM Free Memory # • VM Maximum Memory • VM Reserved Memory • VM Current Zero Page # • VM Current Balloon Page # • VM Current Swap Page # • VM Current Sharing Page # • Configuration options • Sharing threshold (default: 20%) • Balloon threshold (default: 10%) • Swap threshold (default: 5%) • 0-page sharing scan rate (frequency, page #) • …
  • 14. Memory Overcommit Policy (2) • No memory pressure (used% < 80%) • Turn off memory overcommit • Host memory pressure is moderate (used% > 80%) • Turn on blktap2 / 0-page sharing • Set 0-page sharing scan rate • Host memory pressure is severe (used% > 90%) • Adjust 0-page sharing scan rate • Start balloon; balloon # by VM metrics • Host memory pressure is critical (used% > 95%) • Balloon can’t keep up with memory consumption • Start swap • When the pressure goes down, return memory to VM
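The escalation on slides 13 and 14 amounts to a threshold ladder. A minimal sketch follows, assuming the slide 13 thresholds are percentages of free host memory (so 20% free corresponds to 80% used) and that a level only starts strictly above its limit; the function name and the action labels are hypothetical.

```python
def overcommit_actions(host_used_pct, sharing_threshold=20, balloon_threshold=10,
                       swap_threshold=5):
    """Map host memory usage (percent) to the overcommit features to enable.

    The thresholds are free-memory percentages with the defaults from slide 13.
    Returns a list ordered from the cheapest feature to the most expensive.
    """
    actions = []
    if host_used_pct > 100 - sharing_threshold:
        actions.append("sharing")   # moderate: blktap2 / 0-page sharing
    if host_used_pct > 100 - balloon_threshold:
        actions.append("balloon")   # severe: also start ballooning
    if host_used_pct > 100 - swap_threshold:
        actions.append("swap")      # critical: also start host swap
    return actions


idle = overcommit_actions(50)
moderate = overcommit_actions(85)
severe = overcommit_actions(92)
critical = overcommit_actions(97)
```

An empty list corresponds to "turn off memory overcommit"; calling the function again as usage falls gives the "return memory to VM" direction, although the slides do not say whether any hysteresis is applied.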
  • 15. Memory Overcommit Support Matrix (each row: Balloon / Sharing / Host Swap) • QEMU: OK / Sharing breaks / Triggers swap in • PV driver: OK / Sharing breaks / Triggers swap in • I/O device passthru: OK / Conflict! / Conflict! • VM live migration: OK / Sharing breaks / 1. Triggers swap in, 2. Swapfile accessible after L.M. • VM save: OK / OK / Triggers swap in • vMem snapshot: OK / OK / Triggers swap in • VM resume: OK / Sharing breaks / OK • VM hibernate: 1. Balloon in, 2. Redirect to 0-page / OK / Triggers swap in
  • 16. Agenda • Background • Memory Overcommit Features / Policy • Evaluation
  • 17. Experimental Environment • Host configuration • Processor: 2x Intel X5670 @ 2.93GHz, SMT enabled • Memory: 96GB DDR3 • Storage: Intel 160G X25-M SSD • VM workloads • SPECjbb: 1 vCPU, 4G vMem; SLES 11; heap size: 2.5GB • Kernel Build: 1 vCPU, 512M vMem; SLES 11; Linux kernel: 2.6.32 • Sysbench OLTP: 1 vCPU, 1G vMem; SLES 11; database: mysql • VDI benchmark: 1 vCPU, 1G vMem; Windows 7; workload: Office, IE, PDF, Java
  • 18. Blktap2 / 0-page Sharing # -- VM startup • 0-page sharing # is dominant • Xen scrubs pages (on host startup, domain destroy, …) • Lots of "free memory" is 0-page and can be shared • Linux uses more memory to boot up • Less 0-page sharing # • More blktap2 sharing #
  • 19. Blktap2 / 0-page Sharing # -- VDI workload • VM’s sharing # diff w/ VDI workload (each row: before in MB / after in MB / diff) • 0-page sharing #: 750 / 400 / ↓53% • Blktap2 sharing #: 42 / 171 / ↑409% • Unshared #: 199 / 418 / ↑210%
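When reading the Diff column on slide 19, it may help to compute both the after / before ratio and the relative change from the MB figures. The short calculation below suggests the column is closer to the ratio than to the relative change; that is an observation from the arithmetic, not something the slides state, and the helper names are hypothetical.

```python
# (before MB, after MB) as given on slide 19
rows = {
    "0-page sharing": (750, 400),
    "blktap2 sharing": (42, 171),
    "unshared": (199, 418),
}


def ratio_pct(before, after):
    """After as a percentage of before."""
    return round(100 * after / before)


def change_pct(before, after):
    """Relative change from before to after, in percent."""
    return round(100 * (after - before) / before)


ratios = {name: ratio_pct(*values) for name, values in rows.items()}
changes = {name: change_pct(*values) for name, values in rows.items()}
```

The ratios come out at 53, 407 and 210, against 53%, 409% and 210% on the slide; the small gap on the blktap2 row could come from rounding in the MB figures. The relative changes are -47, 307 and 110.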
  • 20. 0-page Sharing # -- Windows vs. Linux • On startup almost all 0-pages are from "free memory" • Windows: free memory is 0-page all the time • Windows scrubs pages before freeing them • More friendly to page sharing • Linux: free memory is 0-page only on startup • Linux doesn’t scrub free pages • * Mem Hog test case consumes 500MB memory when running
  • 21. Performance Impact of blktap2 Sharing • Performance impact of blktap2 sharing is negligible • Scalability is very good • In theory blktap2 sharing could benefit read-intensive workloads • First time: from disk; afterwards: from cache • But the benefit is not observed in the KB (Kernel Build) test
  • 22. Performance Impact of 0-page Sharing • Little impact on the benchmarks’ scores • Impacts of different scan rates are almost the same • 5%-7% CPU overhead in dom0 • Few new 0-pages are generated during the test, so the scan finishes fast • A better benchmark?
  • 23. Host Swap Policies • Different policies result in different performance • Swap->sharing policy brings the best performance most of the time • When the remaining vMem# < working set, the performance drops dramatically
  • 24. Host Swap vs. Balloon • Balloon usually performs better than swap • Balloon transfers the memory pressure from host to guest • Guest OS knows better about memory usage: which pages are free, which are not, and which are least / most used • Swap->sharing policy narrows the gap between swap / balloon
  • 25. VM Density -- VDI Workload • * Host memory = 96GB • 1. W/ no memory overcommit: VM# = 85, memory is bottleneck • 2. W/ balloon: VM# = 120, CPU / memory are both bottlenecks • 3. W/ balloon+sharing: when VM# = 120, host free memory = 17GB, CPU is bottleneck • 4. W/ balloon+sharing: w/o CPU bottleneck, same memory can host 145 VMs (projected) • * Tests 1/2/3 are performed w/ QoE unchanged
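The density figures on slide 25 can be cross-checked with a few lines of arithmetic. The slides do not say how the 145-VM projection was derived; the sketch below shows one reading that reproduces it, assuming per-VM memory use stays at the level measured with 120 VMs and ignoring memory held by the hypervisor and dom0. The function names are hypothetical.

```python
HOST_MEM_GB = 96


def gain_pct(baseline_vms, vms):
    """Density gain over the no-overcommit baseline, in percent."""
    return 100 * (vms - baseline_vms) / baseline_vms


def projected_vms(host_mem_gb, free_gb, running_vms):
    """VMs that fit if each keeps using what it used with `running_vms` running."""
    per_vm_gb = (host_mem_gb - free_gb) / running_vms
    return int(host_mem_gb / per_vm_gb)


balloon_gain = gain_pct(85, 120)                    # test 2 vs. test 1
projected_gain = gain_pct(85, 145)                  # test 4 vs. test 1
projection = projected_vms(HOST_MEM_GB, 17, 120)    # from test 3's 17GB free
```

Under these assumptions 79GB serves 120 VMs, about 0.66GB each, so 96GB would fit 145 VMs; 145 against the baseline of 85 is a gain of roughly 70.6%, which lines up with the 70+% figure in the takeaways.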
  • 26. Takeaways • 0-page sharing complements blktap2 sharing. The combination of both is competitive • The performance impact and overhead of blktap2 / 0-page sharing are small if used properly • Windows is more friendly to 0-page sharing than Linux • Host swap policy does matter. Swap->sharing policy narrows the gap between swap / balloon • In the VDI scenario a good memory overcommit policy can increase VM density by 70+% w/ QoE uncompromised