Tackling the Management Challenges of Server Consolidation on Multi-core Systems


                                                    Hui Lv (hui.lv@intel.com)
                                      Intel SSG/SSD/SOTC/PRC Scalability Lab
                                                                    June 2011




Agenda

   • SPECvirt_sc2010* Introduction
   • SPECvirt_sc2010* Workload Scalability Analysis
   • Hypervisor Overhead Analysis
   • Credit Scheduler Optimizations
   • Conclusions



 * The benchmark runs discussed here are for our research and non-compliant with the SPEC run-rules. The data presented here are only
 to illustrate the points discussed in this paper and cannot be compared with any other SPECvirt_sc2010 results
SPECvirt_sc2010* Workload Introduction
  • Three sub-workloads: SPECjAppServer*, SPECimap*, SPECweb*
  • Six VMs comprise a tile – the goal is to run as many tiles as possible
  • Score: compute the arithmetic mean of the three normalized sub-workload scores per tile, then
   sum the per-tile scores across all tiles (formula at the end of this slide)

        Tile 1: Infrastructure VM, Webserver VM, IMAP Server VM, App Server VM, Database VM, Idle Server VM

                                    Virtualization Layer (Xen) and Hardware

               SPECweb2005* Driver            SPECimap2007* Driver          SPECjAppServer2004* Driver
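A worked form of the scoring rule above (the symbols are ours, not SPEC's; n denotes a sub-workload's normalized score within a tile):

$$\text{Score} = \sum_{t=1}^{T} \frac{1}{3}\left(n_{\text{web},t} + n_{\text{imap},t} + n_{\text{jApp},t}\right)$$

For example, a tile whose three normalized scores are 0.95, 0.90 and 1.00 contributes (0.95 + 0.90 + 1.00)/3 = 0.95 to the total.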


Performance Scalability* Overview
  • Performance scaling got worse as system load increased
  • Response time became longer – worse QoS

   * Response time: geometric mean of the three sub-workloads’ response times
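For reference, the geometric mean used for the aggregate response time is (sub-workload labels are ours):

$$\text{RT} = \left(\text{RT}_{\text{web}} \cdot \text{RT}_{\text{imap}} \cdot \text{RT}_{\text{jApp}}\right)^{1/3}$$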


   * The benchmark runs discussed here are for our research and non-compliant with the SPEC run-rules. The data presented here are
   only to illustrate the points discussed in this paper and cannot be compared with any other SPECvirt_sc2010 results
CPU Cycles Components Breakdown
  • The hypervisor consumed 28% of the total CPU cycles per transaction – very high overhead!




Hypervisor Overhead Analysis
     • The VMExit event of “External Interrupt” consumed ~48% of hypervisor cycles
     • Context switches consumed 27% of total hypervisor cycles
     • Most of the context switches happened in the “External Interrupt” VMExit path
     • Context switches: ~15k per second for one physical core at peak performance
           -- the average running time slice for a vcpu, once scheduled, is less than 0.1 ms (1 s / 15,000 ≈ 0.067 ms).




* The cost of VMExit is calculated after excluding domain0 and cpuidle (7fff); it is the real overhead for the hypervisor to process VMExits.
* Context switch means the process of de-scheduling the currently running vcpu and scheduling in the next vcpu.

Optimizations for Scheduler

  • The scheduling process consumed a large share of hypervisor cycles. Meanwhile, highly frequent
  context switches also make the caches cold and thus increase the cycles per instruction (CPI)
  • We worked out a way to optimize the scheduling process, so as to reduce overhead
  and improve performance




Generic Scheduler Process
  • Xen supplies a generic scheduler API for the specific implementations (credit1 and credit2)
  • Two major parts in this flow (sketched below):
      1. Pick the next vcpu (SCHED_P)
      2. Do the context switch when a new vcpu is selected (SCHED_C)
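A minimal C sketch of this two-part flow, assuming simplified names (sched_ops, pick_next, context_switch) rather than the exact Xen interfaces:

/* Illustrative sketch of the generic scheduling path described above.
 * The type and function names are simplified stand-ins, not the exact Xen source. */
struct vcpu { int id; };

struct sched_ops {
    /* SCHED_P: scheduler-specific policy (credit1/credit2) picks the next vcpu */
    struct vcpu *(*pick_next)(int cpu);
};

/* SCHED_C: de-schedule prev, save/restore its state, run next.
 * Stubbed here; in the hypervisor this is the expensive part. */
static void context_switch(struct vcpu *prev, struct vcpu *next)
{
    (void)prev;
    (void)next;
}

static void generic_schedule(struct sched_ops *ops, int cpu, struct vcpu **curr)
{
    struct vcpu *prev = *curr;
    struct vcpu *next = ops->pick_next(cpu);   /* SCHED_P: pick the next vcpu */

    if (next == prev)
        return;                                /* nothing to switch */

    context_switch(prev, next);                /* SCHED_C: switch to the new vcpu */
    *curr = next;
}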




Context Switch Rate Controller (SRC)
  Solution: control the scheduling rate under the following conditions (a C sketch follows the flow below)
  1) Skip the current scheduling pass if the context-switch frequency exceeded the threshold during the last
  period (10 ms) and the last running vcpu is still runnable (not blocked)
  2) Skip the current scheduling pass if the last running vcpu has run for less than a minimum time slice (1 ms)
  and is still runnable
          Schedule triggered (current vcpu = VCPU1):
            ? Rate-control threshold hit   – Y → ? VCPU1 runnable – Y → return VCPU1 (skip scheduling)
            N ↓
            ? Running less than 1 ms       – Y → ? VCPU1 runnable – Y → return VCPU1 (skip scheduling)
            N ↓
            do_schedule (normal pick-next and context switch)
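A minimal C sketch of the two SRC checks above, as they might sit at the top of the scheduling path; the threshold value, window bookkeeping and field names are illustrative assumptions, not the exact patch:

#include <stdint.h>

/* Context Switch Rate Controller (SRC) sketch. Constants and field names are
 * illustrative assumptions; only the two skip conditions follow the slide. */
#define SRC_PERIOD_MS      10      /* rate-accounting window (from the slide)     */
#define SRC_CS_THRESHOLD   50      /* max context switches per window (assumed)   */
#define SRC_MIN_SLICE_US   1000    /* 1 ms minimum running slice (from the slide) */

struct vcpu {
    int      runnable;             /* still runnable, i.e. not blocked            */
    uint64_t run_start_us;         /* timestamp when it was last scheduled in     */
};

struct src_state {
    unsigned cs_in_window;         /* context switches counted in the last window */
};

/* Returns the vcpu to keep running, or NULL to fall through to do_schedule(). */
static struct vcpu *src_filter(const struct src_state *src,
                               struct vcpu *curr, uint64_t now_us)
{
    /* 1) Context-switch rate over the last 10 ms window exceeded the threshold
     *    and the current vcpu can still run: keep it and skip scheduling.      */
    if (src->cs_in_window > SRC_CS_THRESHOLD && curr->runnable)
        return curr;

    /* 2) Current vcpu has run for less than 1 ms and can still run: keep it.   */
    if (now_us - curr->run_start_us < SRC_MIN_SLICE_US && curr->runnable)
        return curr;

    return NULL;   /* normal path: pick the next vcpu and context switch */
}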
Performance Increase with SRC Optimization
  • Perf/(CPU utilization) improved by 15%
  • The number of context switches was reduced by ~50%, and hypervisor cycles were reduced by 22%
  • With fewer context switches: fewer cache misses → lower CPI → lower CPU cycles for both
    guest and hypervisor
                                        Base     With SRC             SRC/Base
    Perf/(cpu cycles)                     945        1,088                 1.15
    CPU% (Total)                      92.00%       80.88%                  0.88
    Guest U                           31.21%       28.56%                  0.92
    Guest K                           31.58%       28.63%                  0.91
    Dom0                               2.96%        3.20%                  1.08
    Xen                               26.23%       20.48%                  0.78
    SCHED_Total                        7.28%        4.40%                  0.60
    SCHED_Pick (credit)                2.40%        1.54%                  0.64
    SCHED_Context_Switch               2.33%        1.16%                  0.50
    Sched: runs through scheduler   6,312,866    5,304,230                 0.84
    Sched: context switches         6,008,568    3,329,377                 0.55
Credit1 vs. Credit2
  • Credit2 is the prototype scheduler introduced in Xen 4.x
  • So far, it can work in a complex consolidation environment
  • Currently, the overhead of credit2 is a bit higher than credit1's – a much faster pick-next process in
    credit2, but a slower context switch process
                                          Credit1       Credit2      Credit2/Credit1
    Perf/transaction                        1,254         1,077                0.86
    CPU% (Total)                         46.68%        54.47%                  1.17
    Guest U                              15.21%        16.64%                  1.09
    Guest K                              15.61%        17.24%                  1.10
    Dom0                                   1.82%         2.02%                  1.11
    Xen                                  14.04%        18.58%                  1.32
    SCHED_Total (cycles)                     0.04          0.05                1.24
    SCHED_P (cycles)                       1.32%         0.62%                 0.47
    SCHED_C (cycles)                       0.95%         1.92%                 2.02
    Sched: runs through scheduler      6,339,737     5,808,118                 0.92
    Sched: context switches            4,689,289     4,615,206                 0.98
Conclusion

  • Performance scalability got worse as system load increased in the consolidation environment
  • The hypervisor accounted for a large share of the total system cycles, ~28%
  • Overly frequent context switches resulted in high overhead
  • A rate controller for the Credit scheduler benefits performance
  • We call for continued community effort on a more powerful scheduler for Xen in
   complex consolidation environments


    ® Intel and Xeon are trademarks of Intel Corporation in the United States and other countries
    * Other names and brands may be claimed as the property of others.
Backup




Hardware Layout
                     SUT: Intel Xeon X5680 @ 3.33GHz
                     – Intel 82599 10Gbit Ethernet adapter (SR-IOV VFs) → switch → client machines
                     – iSCSI direct link → iSCSI target → storage bay (HBA card)


Server Under Test Configurations
                   Processor              Intel® Xeon® X5680
                   Sockets/Cores/Threads  2/12/24
                   Frequency              3.33GHz
                   LLC                    12MB
                   BIOS                   HT ON, Turbo OFF, Power OFF, NUMA ON
                   Memory                 12 x 8GB DDR3
                   Platform               S5520UR
                   Controller             LSI 3801 HBA
                   Storage                iSCSI for data disk, QEMU disk for OS disk
                   Network                82599 10G NIC
                   Hypervisor             Xen upstream c/s 22940
                   VM configs             HVM Guests



What Caused the Worse Scalability
  • The increase in cycles/transaction was caused by increases in both CPI and path length (see the identity below)
     -- The increase in CPI was partially due to an increasing cache miss rate
     -- The increase in PL indicated that software bottlenecks exist
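Written out, the standard identity behind this decomposition (our restatement, not from the slide):

$$\frac{\text{Cycles}}{\text{Transaction}} \;=\; \underbrace{\frac{\text{Instructions}}{\text{Transaction}}}_{\text{Path Length (PL)}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}}$$

so growth in either factor raises the cycles spent per transaction.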




Hypervisor Events Overview
  • Do we really need so much context-switch work – ~15k per second for one physical core at
    peak performance? It means the average running time slice for a vcpu, once scheduled, is
    less than 0.1 ms.
              Events (number/s)                1tile       9tile            9tile/1tile
              VMExits                        55,862     700,542                 12.54
              Hypercalls                     52,612     417,770                   7.94
              APIC timer interrupts           5,733      31,591                   5.51
              IRQ                            10,633     115,244                 10.84
              IPI                            14,245     139,991                   9.83
              sched: runs through schedule   42,774     348,230                   8.14
              sched: context switches        28,917     302,803                 10.47
              csched: migrate_queued               7     39,757                 5,847
              csched: migrate_running              0           3                   N/A



VMExit Events Distribution
  • At peak performance, the top three VMExit events were ‘APIC Access’, ‘External Interrupt’ and
      ‘CR Access’
  •   However, a larger count does not mean higher overhead – it depends on the cost of the related
      VMExit event




