Key Considerations for
Virtual Infrastructure Benchmarking
White Paper
Diversity of Virtualization Platforms
Cloud services, whether SaaS, IaaS, CaaS, VDI, or PaaS, are adding incredible diversity in
both the number of providers and the services offered. However, most of this ecosystem
runs on a handful of virtualization platforms, or hypervisors, from vendors such as VMware,
Amazon, and Microsoft, or on an open source platform like KVM. Compared to traditional
client-server applications running over traditional IP networking products, virtualization
platforms are relatively new. They are also considerably less tested in terms of network
application performance, as they run on standard x86-based server hardware as opposed
to the purpose-built proprietary hardware of networking equipment vendors. In fact, the
longstanding demarcation between boxes hosting server applications and those performing
packet forwarding operations is breaking down. Now a cluster of servers, or even a single
host, can serve simultaneously as an application server and a major network gateway
forwarding device. The challenge this presents comes from the significant differences
between application workload processing and network packet processing, and from deciding
how best to share CPU time and allocate other critical compute resources. This paper focuses
on these challenges and explains why testing hypervisor performance for cloud and NFV
use cases is critical, especially in a live cloud and virtual environment.
Benchmarking hypervisor performance
Virtualization introduces a number of issues in the area of shared resource allocation. Some,
like the overhead of shadow tables, have been addressed with specific technologies, while
others, such as the choice of scheduling algorithm, are less clear cut because there is no
best-fit scheduler for all application job types. While the variables affecting hypervisor
performance are many, the following sections take a closer look at a few key factors.
The world of computing is changing
at a rapid pace. Cloud computing,
Software-Defined Data Centers
(SDDC), and Network Functions
Virtualization (NFV) all rely heavily
on virtualization technologies to
deliver their benefits. Virtualization
not only enables new functionality
like NFV, VM migration, and elastic
computing, but also brings with it a
tremendous opportunity for capital
and operational cost savings. Most
of the savings come from utilization
optimization. By redeploying more
functionality into software over
hypervisors on COTS hardware,
and by consolidating the number
of physical servers deployed,
previously unseen levels of resource
optimization and cost savings can
be achieved. This hyper-optimized
configuration comes at a cost,
however: platform centralization and
greater software resource scheduling
complexity.
Virtual CPUs
Most hypervisor schedulers are modeled with fairness between VMs in mind, whether through round-robin servicing or a resource
allocation budget that prioritizes access time for the VMs that have been least active. Design differences between the guest
scheduler and the hypervisor scheduler can cause significant application degradation, especially for time-sensitive jobs like those
associated with network applications.
This hierarchical scheduling problem, also referred to as a semantic gap between the guest OS scheduler and the hypervisor
scheduler, can delay application processing beyond the normal job execution time, because two completely different schedulers
may each play a role in scheduling the same job. There is also the potential for conflicting scheduling designs. At the heart
of the issue is the hypervisor scheduler's unawareness of the guest scheduler's decisions. For example, guests running Windows and
Mac OS X implement a multilevel feedback queue. This design uses a number of FIFO queues, each assigned a time
quantum from lowest to highest value. Shorter-duration processes are executed first in the lower-order queues, while
longer jobs drop one queue at a time into queues with a higher quantum until execution finishes, receiving less precedence with
each queue change.
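To make the mechanics concrete, the following is a minimal multilevel feedback queue sketch in Python. The queue count, quantum values, and job names are illustrative assumptions, not any particular OS implementation.

```python
from collections import deque

# Minimal multilevel feedback queue sketch: three FIFO queues with
# increasing time quanta. Jobs start in the highest-priority queue and
# are demoted one level each time they exhaust their quantum.
QUANTA = [2, 4, 8]  # illustrative time slices per priority level

def mlfq(jobs):
    """jobs: dict mapping job name -> remaining execution time."""
    queues = [deque() for _ in QUANTA]
    for name in jobs:
        queues[0].append(name)           # all jobs enter at top priority
    clock = 0
    while any(queues):
        level = next(i for i, q in enumerate(queues) if q)
        name = queues[level].popleft()
        ran = min(QUANTA[level], jobs[name])
        clock += ran
        jobs[name] -= ran
        if jobs[name] == 0:
            print(f"{name} finished at t={clock}")
        else:                            # demote: longer jobs lose precedence
            queues[min(level + 1, len(queues) - 1)].append(name)

mlfq({"short_io_job": 3, "long_batch_job": 20})
```

Even in this toy form, the short job finishes quickly while the long job sinks to the lowest queue; a hypervisor scheduler that preempts the vCPU mid-quantum invalidates the timing assumptions such a guest scheduler makes.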
Figure 1: Blocking between two VMs on a monolithic type 1 host configured with 1-n vCPUs
The potential issues arising from double scheduling that impact application performance are:
• Conflicting precedence between guest and hypervisor scheduling algorithms
• Preemption of critical sections on a guest vCPU by the hypervisor scheduler
• vCPU stacking, where a semaphore holder is scheduled after the semaphore waiter
• Task priority inversion and CPU fragmentation due to co-scheduling implementations with multi-processor VMs
Virtual platform vendors have implemented, or are considering, various hybrid scheduler designs for their virtual machine monitors
(VMMs), or are adding heuristics to existing ones, to mitigate these problems: relaxed co-scheduling (a relaxation of gang
scheduling), borrowed virtual time (BVT) scheduling, the real-time deferrable server (RTDS), or combination approaches that
select round-robin or weighted round-robin (WRR) algorithms depending on the process requiring CPU time. The designs continue
to evolve, but ultimately hypervisor scheduling comes down to a basis of fairness, prioritization, or execution time, or some
combination of the three. Regardless of the algorithm(s) chosen, only testing will expose the true performance of a scheduler
under real workloads in large-scale environments.
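One practical way to expose double-scheduling effects from inside a guest is to measure timer jitter: how far past its deadline a fixed-interval sleep actually wakes. The sketch below is a minimal example; the 1 ms interval and sample count are arbitrary choices, and the numbers only mean something relative to a bare-metal baseline.

```python
import time
import statistics

# Measure scheduling jitter inside a VM: request a fixed sleep interval
# and record how far past the deadline the process actually wakes up.
# Hypervisor preemption of the vCPU shows up as outliers in this delta.
INTERVAL_S = 0.001   # 1 ms target interval (illustrative)
SAMPLES = 5000

deltas = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    time.sleep(INTERVAL_S)
    overshoot = time.perf_counter() - start - INTERVAL_S
    deltas.append(overshoot * 1e6)   # microseconds past the deadline

deltas.sort()
print(f"median wakeup overshoot: {statistics.median(deltas):8.1f} us")
print(f"p99    wakeup overshoot: {deltas[int(0.99 * SAMPLES)]:8.1f} us")
print(f"max    wakeup overshoot: {deltas[-1]:8.1f} us")
```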
Virtual Memory
One of the performance implications of virtualization is the duplication of dynamic address translation (DAT) lookups: once in the
main page table and again in the shadow table instances maintained for the VMs. Similar to double scheduling, virtual-to-physical
address translations are duplicated by default unless newer technologies such as Second Level Address Translation (SLAT), or
"nested paging", are enabled. Because memory does not store values contiguously, applications use virtual addresses. The OS is
then responsible for mapping these virtual addresses, and the data stored there, to the corresponding physical regions; when
physical memory is exhausted, pages spill to disk in a "swap file" via "paging". Just as main memory uses caches to speed up
searches, paged memory has Translation Lookaside Buffers (TLBs) to speed up virtual-to-physical mapping searches. A miss in
cache prompts a much slower search in system memory; a miss in the TLB forces the slower page-table walk. Initially, hypervisors
simply replicated the page table per guest, thus duplicating the mapping effort.
Figure 2: Duplicate address translation lookups via shadow tables vs. nested page tables
Current hardware-assisted virtualization technologies like SLAT eliminate double page walks by removing the overhead of the
virtual machine page table to host machine page table mappings that required frequent updating. However, there are specific
application cases where the page size should be set higher to avoid performance loss even with SLAT optimization active, because
larger pages increase the ratio of TLB hits¹. Increasing page size, however, has the drawback of increasing the likelihood
of internal fragmentation: the number of pages needed by a process is highly variable, and a process requiring only slightly
more than one page wastes the majority of the second page. This wasted page memory can be exacerbated on hosts
running many VMs, especially when different types of applications run many concurrent processes, increasing the chance of
page allocation failures.
¹ VMware: Performance Evaluation of Intel EPT Hardware Assist
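The cost of the double page walk that SLAT hardware absorbs can be shown with simple arithmetic. On x86-64, guest and host each use 4-level page tables; under nested paging, every guest page-table reference must itself be translated through the host table, so a worst-case TLB miss costs (n+1)(m+1)−1 memory references. The sketch below works the numbers through; the latency figures are purely illustrative assumptions, not measurements.

```python
# Worked example: worst-case memory references on a TLB miss.
# n = guest page-table levels, m = host (nested) page-table levels.
# Nested walk: (n + 1) * (m + 1) - 1, since each of the n guest levels
# (plus the final guest-physical access) must itself be translated
# through the m-level host table.
def walk_cost(n_guest=4, m_host=4):
    native = n_guest
    nested = (n_guest + 1) * (m_host + 1) - 1
    return native, nested

native, nested = walk_cost()
print(f"native TLB miss: {native} memory references")    # 4
print(f"nested TLB miss: {nested} memory references")    # 24

# Illustrative effective access time, assuming (not measured) a 100 ns
# memory reference. Larger pages raise the TLB hit ratio, which is why
# some workloads benefit from huge pages even with SLAT enabled.
MEM_NS = 100.0
for hit_ratio in (0.95, 0.99, 0.999):
    eat = hit_ratio * MEM_NS + (1 - hit_ratio) * (nested + 1) * MEM_NS
    print(f"hit ratio {hit_ratio:5.3f}: effective access ~{eat:6.1f} ns")
```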
Storage I/O
Storage I/O performance is another aspect of virtualization that requires significant benchmarking to ensure application
performance is acceptable in cloud and NFV deployments. There are many possible storage implementations: direct attached
(DAS), storage area network (SAN), and network attached (NAS), on magnetic, flash, or DRAM media, all of which have benefits
and drawbacks in cost, performance, configuration, and durability. For SaaS applications, spinning disks offer acceptable
performance at the most attractive price point. Typical hardware-based NFV appliances, however, rely heavily on NVRAM/SDRAM
to perform high-speed lookup operations, and on DRAM to store, retrieve, and frequently modify large-scale control plane tables.
VNFs will therefore require much larger footprints, consuming far more DRAM than a typical server application. VNFs also
typically require setting processor affinity, or CPU pinning, to ensure proper performance. This hoards what are intended to be
compute resources shared with adjacent VMs, and complicates VNF placement during provisioning if significant performance
degradation from resource contention is to be avoided.
Determining storage performance comes down to three key metrics: throughput, IOPS, and latency. Note that each type of storage
media supports a maximum data transfer rate, which bounds all three metrics discussed here (a minimal measurement sketch
follows the list below).
• Throughput is simply the maximum amount of data that storage will process with respect to time. Bandwidth is often used
interchangeably, but is more accurately described as a storage system's ability to sustain a specific throughput level over time,
whether that is the maximum or some level below it. Measurements will differ between direct-attached and network-attached
storage because of the throughput the storage network can support at a given time. This underscores the importance of live
benchmarking of virtual infrastructure.
• Input/Output Operations per Second (IOPS) measures read and/or write operations per second and should be considered in
the context of data block size (BS). IOPS will typically be higher for smaller block sizes and lower for larger block sizes.
• Latency measures the response time to complete a storage operation and, most importantly, factors in all subcomponents
involved in a data transaction over virtual infrastructure: storage appliances, SAN components, internal buses,
controllers, etc.
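The Python sketch below measures all three metrics for sequential writes to a test file. It is a minimal illustration under stated assumptions, not a production benchmark: the file path, block size, and operation count are arbitrary, there is no direct I/O or queue-depth control, and fsync is used so the device, rather than the OS page cache, is what gets measured.

```python
import os
import time

# Minimal storage micro-benchmark sketch: sequential writes of a fixed
# block size, reporting IOPS, throughput, and average latency.
# PATH, BLOCK_SIZE, and OPS are illustrative assumptions.
PATH = "testfile.bin"
BLOCK_SIZE = 4096          # 4 KiB blocks
OPS = 2000

buf = os.urandom(BLOCK_SIZE)
fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
latencies = []
start = time.perf_counter()
for _ in range(OPS):
    t0 = time.perf_counter()
    os.write(fd, buf)
    os.fsync(fd)           # force the write to media, not just the cache
    latencies.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start
os.close(fd)

iops = OPS / elapsed
print(f"IOPS:        {iops:10.0f}")
print(f"Throughput:  {iops * BLOCK_SIZE / 2**20:10.1f} MiB/s")
print(f"Avg latency: {sum(latencies) / OPS * 1e3:10.2f} ms")
```

Sweeping BLOCK_SIZE in such a harness demonstrates the IOPS/block-size trade-off described above: small blocks maximize IOPS while large blocks maximize throughput.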
Figure 3: Impact of varying IOPS and BS profiles on volume storage hitting the same SAN target
Cost considerations aside, one of the tougher choices is between flash and conventional magnetic storage for a given software
application. While magnetic disks offer parity between read and write latency, flash offers superior access times under specific
conditions. Key factors to evaluate when deciding whether to use flash-based storage in virtual
infrastructure are:
• Storage access profile – flash excels at reads but carries a write penalty
• Block sizes of data transfers – larger, smaller, varied
• Queue depths – flash performs better at larger queue depths
• I/O pattern – read, write, sequential, burst, random, etc.
• Utilization of storage – flash performance degrades as storage approaches capacity
Regardless of the storage type used in a cloud or NFVI, performance depends largely on the variation in block sizes and IOPS
read and written by applications, the total latency between the source VM and the target storage media, and especially contention
from other VMs accessing the same storage resources. A simple queue-depth sweep, sketched below, is one way to expose these
differences between candidate media.
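The sketch approximates queue depth with concurrent reader threads at random offsets. It is a rough illustration only: it assumes a POSIX host and a pre-created test file (such as the one written by the previous sketch), and a real evaluation would use a dedicated I/O tool against raw devices.

```python
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Queue-depth sweep sketch: issue random reads with increasing numbers
# of concurrent workers and watch how aggregate IOPS responds. Flash
# typically keeps scaling at depths where magnetic media flattens out.
PATH = "testfile.bin"          # assumed pre-created test file
BLOCK_SIZE = 4096
READS_PER_WORKER = 500
file_size = os.path.getsize(PATH)

def worker():
    fd = os.open(PATH, os.O_RDONLY)
    for _ in range(READS_PER_WORKER):
        offset = random.randrange(0, file_size - BLOCK_SIZE)
        os.pread(fd, BLOCK_SIZE, offset)   # random read at offset
    os.close(fd)

for depth in (1, 4, 16, 64):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=depth) as pool:
        for _ in range(depth):
            pool.submit(worker)            # pool exit waits for all reads
    elapsed = time.perf_counter() - start
    total_reads = depth * READS_PER_WORKER
    print(f"queue depth {depth:3d}: {total_reads / elapsed:10.0f} IOPS")
```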
Network I/O
The final piece of the puzzle we will examine is network I/O implementation in virtualization. Outside of special applications
like video processing, network I/O is the resource where virtualized application workloads most often require near bare-metal
performance for a good user experience. Like schedulers and storage, there are a number of options for virtualizing I/O devices.
The two earliest methods, device emulation and para-virtualized drivers, were strong in sharing but weaker in performance. The
degradation comes from the excessive overhead of packet processing on the host CPU; the performance cost of emulating an
Ethernet card can exceed 50% compared to running on bare metal. This was quickly answered with hardware "pass-through"
options like Intel® VT-d, which, together with VT-x, gives the VM's network driver direct access to the resources of the network
card installed on the host. However, configuration restrictions, namely one-to-one virtual port to physical port mappings,
sacrificed the device-sharing benefits of emulation and para-virtualization. This is why Single Root I/O Virtualization and
Sharing (SR-IOV) was introduced by PCI-SIG as a standard to combine the performance gains of pass-through with the flexibility
of sharing seen with emulation and para-virtualization.
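On a Linux host, a NIC's SR-IOV capability and the number of virtual functions (VFs) it exposes can be read from sysfs. The sketch below is a minimal check; it assumes a Linux host whose NIC drivers populate the standard sriov_totalvfs / sriov_numvfs attributes.

```python
import glob
import os

# List NICs on a Linux host and report SR-IOV virtual function (VF)
# counts from sysfs. Devices without the sriov_* attributes either
# lack SR-IOV support or have a driver that does not expose it.
for dev in sorted(glob.glob("/sys/class/net/*")):
    name = os.path.basename(dev)
    total_path = os.path.join(dev, "device", "sriov_totalvfs")
    num_path = os.path.join(dev, "device", "sriov_numvfs")
    if os.path.exists(total_path):
        total = open(total_path).read().strip()
        in_use = open(num_path).read().strip()
        print(f"{name}: SR-IOV capable, {in_use}/{total} VFs configured")
    else:
        print(f"{name}: no SR-IOV attributes exposed")
```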
Figure 4: NFV service chain throughput bottleneck between SR-IOV and vSwitch connectivity
The throughput difference between hardware-assisted solutions like SR-IOV
and internal software-based virtual switches can be extreme, especially for open
source virtual switches. This particularly impacts virtual service chains, where VNFs
process packets sequentially. If the latency introduced by the vSwitch
is excessive, the buffers on the physical network cards can become congested,
even leading to drops. To address this performance discrepancy, VNF vendors are
enhancing performance with data plane optimization toolkits that improve the
buffering and handling of packets, particularly for smaller packet sizes, by reserving
specific areas of main memory and utilizing huge page support. Many of these toolkit
enhancements are still underway at most VNF vendors, so validating packet
forwarding performance is crucial to determining whether a VNF meets its SLA and QoE
expectations. It is imperative that such testing can occur in live environments.
Summary
In conclusion, the demands placed on virtual infrastructure, and the complex
interactions within it, necessitate considerable testing and benchmarking to
determine and validate realistic expectations of cloud services and of the virtual
service chains delivered by NFV. The hypervisor is by far the most critical component
of the infrastructure. From the chosen scheduling algorithm to memory management
to storage and network configuration, the potential for conflicts and the resulting
discord can wreak havoc on application performance. The hypervisor's ability to
handle software workloads across massive-scale deployments spanning a large
number of compute nodes is key to ensuring that the performance of a particular
service offering or service chain meets the customer's needs. The demands placed
on any given hypervisor are so great that only live testing of NFV service chains,
and of cloud platforms themselves, can provide accurate assessments of their true
performance. Spirent Cloud solutions greatly simplify cloud and virtual testing by
making performance benchmarking and optimization of virtualized networks and
cloud infrastructure more transparent and effective, in turn maximizing your cloud
investments and enabling money-back SLAs for your customers. For more information
about Spirent Cloud solutions, please visit www.spirent.com/Cloud.
About Spirent
Spirent provides software solutions to
high-tech equipment manufacturers
and service providers that simplify and
accelerate device and system testing.
Developers and testers create and
share automated tests that control and
analyze results from multiple devices,
traffic generators, and applications while
automatically documenting each test with
pass-fail criteria. With Spirent solutions,
companies can move along the path toward
automation while accelerating QA cycles,
reducing time to market, and increasing
the quality of released products. Industries
such as communications, aerospace and
defense, consumer electronics, automotive,
industrial, and medical devices have
benefited from Spirent products.