2. Topics Covered
• Introduction
• Root Cause Analysis
• Performance Characteristics
  – CPU
  – Networking
  – Memory
  – Disk
• Virtual Machine Optimisation
• esxtop
• vm-support
• Service Console
• Resource Groups
• Design Guidelines
• Capacity Planner Limitations and Cautions
• Conclusion
• Reference Articles
3. Introduction
Multiple layers of virtualisation are used to increase service levels, availability and manageability.
However, multiple layers of virtualisation often mask performance and configuration issues, making them more of a challenge to troubleshoot and correct.
The worst outcome is that performance issues after a virtualisation project lead to the perception that VMware reduces performance, which can affect future confidence in VMware.
9. Monitoring Performance
• Do not rely on guest tools alone, although they:
  – Can show high CPU and memory utilisation
  – Can measure latency and throughput of disk and network interfaces
• Use the virtualisation layer to diagnose the cause:
  – The guest is unaware of the virtualisation workload
  – Guest OSs account for time differently
  – The guest has no visibility of the available resources
10. Performance Analysis Tools
• esxtop (Service Console only)
• resxtop (remote command-line utilities)
• Performance graphs in vCenter
11. esxtop
• esxtop can be run:
  – Interactively
  – In batch mode (e.g. esxtop -a -b > analysis.csv)
  – Batch output can be loaded into Windows Perfmon or Microsoft Excel
• Two keys to remember:
  – h: help
  – f: fields to display
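Batch output can also be summarised with a short script instead of Perfmon or Excel. The sketch below is a minimal example, assuming the batch file is a Perfmon-style CSV whose first row holds counter names and whose first column holds timestamps. The counter name and values in `SAMPLE` are made up for illustration, and `load_esxtop_batch` and `column_average` are hypothetical helpers, not part of any VMware tool.

```python
import csv
import io

# Made-up two-sample batch file; real esxtop batch files contain many more counters.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Physical Cpu(_Total)\% Processor Time"',
    '"01/01/2010 10:00:00","40.0"',
    '"01/01/2010 10:00:05","60.0"',
])


def load_esxtop_batch(text):
    """Return (header, rows); each row is (timestamp, [float or None, ...])."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        timestamp, values = record[0], record[1:]
        rows.append((timestamp, [float(v) if v.strip() else None for v in values]))
    return header, rows


def column_average(header, rows, substring):
    """Average the first counter (timestamp column excluded) whose name contains substring."""
    idx = next(i for i, name in enumerate(header[1:]) if substring in name)
    samples = [values[idx] for _, values in rows if values[idx] is not None]
    return sum(samples) / len(samples)


header, rows = load_esxtop_batch(SAMPLE)
print(column_average(header, rows, "% Processor Time"))  # 50.0
```

Empty cells are kept as `None` rather than zero so that gaps in a capture do not drag the average down.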
12. esxtop basics
[Screenshot: esxtop display annotated with the host resources, the name of the resource pool, virtual machine or world, and the number of worlds]
13. Performance Characteristics
CPU               Memory            Networking      Disk
Slow Processing   Slow Processing   Packet Loss     Log Stalls
High CPU Wait     Disk Swapping     Slow Network    Disk Queue
Common consequences:
• Slow Application Performance
• Reduced User Experience
• Data Loss and Corruption
14. CPU
ESX Scheduler
• Basic world states: Ready / Run / Wait
• CPU states (Service Console and virtual machines): Ready / Usage / Wait
• Limits / Shares / Reservations
15. CPU
High %RDY + high %USED can imply over-commitment
esxtop
• PCPU(%): CPU utilisation
• %USED: utilisation
• %RDY: ready time
• %RUN: run time
• %WAIT: wait and idle time
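The "high %RDY + high %USED" rule of thumb can be expressed as a quick check. This is a sketch: it assumes the figures come from a VM's group row in esxtop, where %RDY and %USED are summed across the VM's worlds (so they are divided by the vCPU count here), and the 10% ready and 80% used thresholds are hypothetical defaults rather than values from this deck.

```python
def per_vcpu(value_pct, vcpus):
    """esxtop's VM group row sums %RDY/%USED over the VM's worlds; normalise per vCPU."""
    return value_pct / vcpus


def possible_overcommit(rdy_pct, used_pct, vcpus, rdy_threshold=10.0, used_threshold=80.0):
    """True when both per-vCPU ready time and per-vCPU usage are high.

    The default thresholds are hypothetical rules of thumb, not VMware-defined limits.
    """
    return (per_vcpu(rdy_pct, vcpus) > rdy_threshold
            and per_vcpu(used_pct, vcpus) > used_threshold)


print(possible_overcommit(rdy_pct=30.0, used_pct=180.0, vcpus=2))  # True
print(possible_overcommit(rdy_pct=8.0, used_pct=180.0, vcpus=2))   # False
```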
16. CPU
VI Client
[Chart: VI Client CPU performance graph plotting Used Time and Ready Time]
Used Time > Ready Time: possible CPU over-commitment
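The VI Client charts ready time as a summation in milliseconds per sample rather than as a percentage. A minimal conversion sketch, assuming the 20-second real-time chart interval (historical charts use longer intervals, which must be passed in); `ready_percent` is a hypothetical helper name.

```python
def ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU Ready summation (ms per sample) into average % ready per vCPU.

    interval_s is the chart's sample interval: 20 s for real-time charts.
    """
    return ready_ms * 100.0 / (interval_s * 1000.0) / vcpus


print(ready_percent(2000))           # 10.0
print(ready_percent(2000, vcpus=2))  # 5.0
```

Converting to a percentage puts the VI Client figure on the same footing as esxtop's %RDY.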
19. VMware Memory Management
• Transparent Page Sharing
• VMware Tools Balloon Driver to force the VM to swap to disk
• Virtual Machine Page File
20. Memory
Ballooning vs. Swapping
• The balloon driver causes the guest OS to swap pages that it chooses to disk.
• ESX swapping can swap any of the VM's pages to disk, without knowing how the guest is using them.
21. Memory
• Ballooning can be disabled (value 0) or limited on a per-virtual-machine basis using:
sched.mem.maxmemctl
• The default limit is 65% of a VM's configured memory; this can be controlled at host level (Mem.CtlMaxPercent).
• Ballooning is only an issue in resource-contention scenarios (or for latency-sensitive VMs, e.g. Citrix).
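To see how the two limits interact, the sketch below estimates the most memory ballooning could reclaim from one VM. It is a simplified model, assuming the 65% default applies to the VM's configured memory and that sched.mem.maxmemctl is expressed in MB; `max_balloon_mb` is a hypothetical helper, not VMware's algorithm.

```python
def max_balloon_mb(configured_mb, ctl_max_percent=65, maxmemctl_mb=None):
    """Simplified upper bound on memory the balloon driver may reclaim from one VM.

    ctl_max_percent models the host-level default (65%); maxmemctl_mb models a
    per-VM sched.mem.maxmemctl value in MB, where 0 disables ballooning.
    """
    ceiling = configured_mb * ctl_max_percent / 100
    if maxmemctl_mb is not None:
        ceiling = min(ceiling, maxmemctl_mb)
    return ceiling


print(max_balloon_mb(4096))                  # 2662.4
print(max_balloon_mb(4096, maxmemctl_mb=0))  # 0
```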
22. Memory - Host
The VI Client shows the memory usage of the host. This is calculated as "consumed + overhead memory + Service Console".
Performance charts are a very good way of showing the virtual machine memory breakdown:
• Consumed Memory
• Ballooned Memory
• Shared Memory
• Swapped Memory
23. Memory - Guest
Host Memory = Consumed + Overhead Memory
Guest Memory = Active Memory for Guest OS
25. Memory
Virtual Machine Memory Metrics – VI Client
Memory Active (KB): Physical pages touched recently by a VM
Memory Usage (%): Active memory / configured memory
Memory Consumed (KB): Machine memory mapped to a virtual machine, including its portion of shared pages. Doesn't include overhead memory.
Memory Granted (KB): Physical pages allocated to a virtual machine. May be less than configured memory. Includes shared pages. Doesn't include overhead memory.
Memory Shared (KB): Physical pages shared with other virtual machines
Memory Balloon (KB): Physical memory ballooned from a virtual machine
Memory Swapped (KB): Physical memory in swap file (approx. "swap out − swap in"). Swap out and swap in are cumulative.
Overhead Memory (KB): Machine pages used for virtualisation
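Because swap out and swap in are cumulative counters, both the current swapped amount and the swap rates can be derived from them. A minimal sketch with hypothetical helper names, assuming two samples of the counters taken `interval_s` seconds apart:

```python
def approx_swapped_kb(swap_out_kb, swap_in_kb):
    """Approximate memory still in the swap file: cumulative swap out minus swap in."""
    return max(swap_out_kb - swap_in_kb, 0)


def swap_rates_kb_per_s(prev, curr, interval_s):
    """Swap-out and swap-in rates between two samples of the cumulative counters.

    prev and curr are (swap_out_kb, swap_in_kb) tuples.
    """
    out_rate = (curr[0] - prev[0]) / interval_s
    in_rate = (curr[1] - prev[1]) / interval_s
    return out_rate, in_rate


print(approx_swapped_kb(5000, 1000))                          # 4000
print(swap_rates_kb_per_s((5000, 1000), (7000, 1400), 20))  # (100.0, 20.0)
```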
26. Memory
Host Memory Metrics – VI Client
Memory Active (KB): Physical pages touched recently by the host
Memory Usage (%): Active memory / configured memory
Memory Consumed (KB): Total host physical memory − free memory on host. Includes overhead and Service Console memory.
Memory Granted (KB): Sum of physical pages allocated to all virtual machines. Doesn't include overhead memory.
Memory Shared (KB): Physical pages shared by virtual machines on host
Shared Common (KB): Total machine pages used by shared pages
Memory Balloon (KB): Machine pages ballooned from virtual machines
Memory Swap Used (KB): Physical memory in swap file (approx. "swap out − swap in"). Swap out and swap in are cumulative.
Overhead Memory (KB): Machine pages used for virtualisation
27. Memory
esxtop
• PMEM: total physical memory breakdown
• VMKMEM: memory managed by the VMkernel
• COSMEM: Service Console memory breakdown
• PSHARE: page sharing statistics
• SWAP: swap statistics
• MEMCTL: balloon driver data
28. Memory
esxtop / VI Client metrics: Virtual Machines
VI Client metric: esxtop field
Active Memory: TCHD
Memory Usage: %ACTV
Consumed Memory: N/A
Memory Granted: N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared: SHRD (+ SHRDSVD per VM). COW stats must be enabled in esxtop.
Memory Balloon: MCTLSZ
Memory Swapped: SWCUR (SWR/s and SWW/s are rates)
Overhead Memory: OVHD & OVHDMAX
29. Memory
esxtop / VI Client metrics: Host Usage
VI Client metric: esxtop field
Memory Active: N/A (try /proc/vmware/sched/mem-verbose)
Memory Usage: N/A (try /proc/vmware/sched/mem-verbose)
Memory Consumed: PMEM total − PMEM free
Memory Granted: N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared: PSHARE (shared)
Memory Shared Common: PSHARE (common)
Memory Balloon: MEMCTL
Memory Swap Used: SWAP (r/s and w/s are rates)
Overhead Memory: OVHD & OVHDMAX
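Two of the host values above are simple differences. The sketch below shows them, assuming all inputs are in MB as read from the PMEM and PSHARE lines; the page-sharing saving (shared minus common) is included for illustration and is not listed in the table. Helper names are hypothetical.

```python
def host_consumed_mb(pmem_total_mb, pmem_free_mb):
    """VI Client 'Memory Consumed' for the host: PMEM total minus PMEM free."""
    return pmem_total_mb - pmem_free_mb


def page_sharing_saving_mb(pshare_shared_mb, pshare_common_mb):
    """Memory saved by page sharing: shared pages minus the common copies backing them."""
    return pshare_shared_mb - pshare_common_mb


print(host_consumed_mb(32768, 4096))      # 28672
print(page_sharing_saving_mb(2048, 512))  # 1536
```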
35. Disk
Varying Factors
• File system performance
• Disk subsystem configuration (SAN, NAS, iSCSI, local disk)
• Disk caching
• Disk formats (thick, sparse, thin)
ESX Storage Stack
• Different latencies for different disks
• Queuing within the kernel
[Diagram: ESX storage stack, with latency measured at K (Kernel), D (Device) and G (Guest)]
36. Disk
VI Client statistics (quite coarse)
• Disk read / write rate (KB/s)
• Disk usage: sum of read bandwidth and write bandwidth (KB/s)
• Disk read / write requests (per 20 s interval)
• Bus resets / command aborts (per 20 s interval)
• Per-LUN or aggregated stats
37. Disk
esxtop statistics
Aggregated stats similar to the VI Client
• Disk reads / writes per second (READS/s, WRITES/s)
• MB read / written per second (MBREAD/s, MBWRTN/s)
Latency statistics
• Kernel average per command (KAVG/cmd)
• Device average per command (DAVG/cmd)
• Guest average per command (GAVG/cmd)
Queuing information
• Adapter queue length (AQLEN)
• LUN queue length (LQLEN)
• Commands queued in the VMkernel (QUED)
• Active commands (ACTV)
• %Used (%USD = ACTV / LQLEN)
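The latency and queue counters relate to each other arithmetically: the guest sees the kernel and device latencies combined, and %USD is the fraction of the LUN queue in use. A minimal sketch with hypothetical helper names; GAVG = KAVG + DAVG is standard esxtop behaviour, though individually sampled values may not add up exactly.

```python
def guest_latency_ms(kavg_ms, davg_ms):
    """GAVG/cmd is approximately KAVG/cmd + DAVG/cmd."""
    return kavg_ms + davg_ms


def lun_queue_used_pct(actv, lqlen):
    """%USD: active commands as a percentage of the LUN queue length."""
    return 100.0 * actv / lqlen


def commands_waiting_in_kernel(qued):
    """A non-zero QUED means commands are queued in the VMkernel waiting for a slot."""
    return qued > 0


print(guest_latency_ms(1.5, 20.0))    # 21.5
print(lun_queue_used_pct(16, 32))     # 50.0
print(commands_waiting_in_kernel(0))  # False
```

Splitting GAVG this way shows where to look: a high DAVG points at the array or fabric, while a high KAVG points at queuing inside the host.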
38. Disk
SAN Rough Estimates
Purely looking at a single ESX host, roughly:
Throughput (MBps) = (Outstanding IOs * Block size in KB) / Latency in ms
FC, rough maximums:
Effective link bandwidth = ~80–90% of real bandwidth
Effective (2 Gbps) = 200–230 MBps
Effective (4 Gbps) = 410–460 MBps
Effective (8 Gbps) = 820–920 MBps
iSCSI / NFS / FCoE, rough maximums:
Effective link bandwidth = ~70–80% of real bandwidth
Effective (1 GigE) = 90–100 MBps
Effective (10 GigE) = 900–1000 MBps
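The rough estimate above can be checked numerically. In this sketch (hypothetical helper names), the link rate in Gbps is converted to MBps by dividing by 8 before applying the efficiency range, and the formula's MB is taken as 1000 KB; both are assumptions made for the arithmetic, and the slide's own figures are rounded.

```python
def throughput_mbps(outstanding_ios, block_kb, latency_ms):
    """(Outstanding IOs * block size in KB) / latency in ms; KB per ms equals MB per s."""
    return outstanding_ios * block_kb / latency_ms


def effective_range_mbps(link_gbps, low, high):
    """Effective bandwidth range for a link, given an efficiency range such as 0.8-0.9."""
    raw_mbps = link_gbps * 1000 / 8
    return raw_mbps * low, raw_mbps * high


def achievable_mbps(outstanding_ios, block_kb, latency_ms, link_gbps, efficiency):
    """The lower of the latency-bound estimate and the effective link bandwidth."""
    link_cap = link_gbps * 1000 / 8 * efficiency
    return min(throughput_mbps(outstanding_ios, block_kb, latency_ms), link_cap)


print(throughput_mbps(32, 32, 10))          # 102.4
print(effective_range_mbps(2, 0.8, 0.9))    # (200.0, 225.0)
print(achievable_mbps(32, 32, 1, 1, 0.8))   # 100.0
```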
39. Disk
Desired Latency Calculations
Desired latency (ms) <= (Outstanding IOs * Block size in KB) / Throughput per host (MBps)
Example:
Number of hosts: 16
Effective link bandwidth: 90 MBps
Outstanding IOs: 32; block size: 32 KB
Throughput per host: 90 / 16 = 5.6 MBps
Desired latency: (32 * 32) / 5.6 = 182.86 ms

Workload                 Cached Sequential Read   Cached Sequential Write
Desired latency (ms)     182.86                   182.86
Observed latency (ms)    ~350                     ~180
Throughput drop?         Yes                      No
Throughput (MBps)        ~45                      ~90
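The example can be reproduced with a few lines of arithmetic. This sketch uses hypothetical helper names. Note that the slide rounds the per-host throughput to 5.6 MBps before dividing; using the exact 90 / 16 = 5.625 MBps gives about 182.04 ms. The last helper applies the table's reading: a throughput drop is expected when the observed latency exceeds the desired latency.

```python
def per_host_throughput_mbps(link_mbps, hosts):
    """Share of the effective link bandwidth available to each host."""
    return link_mbps / hosts


def desired_latency_ms(outstanding_ios, block_kb, throughput_per_host_mbps):
    """(Outstanding IOs * block size in KB) / throughput per host in MBps."""
    return outstanding_ios * block_kb / throughput_per_host_mbps


def expect_throughput_drop(observed_ms, desired_ms):
    """True when observed latency is above the desired latency."""
    return observed_ms > desired_ms


print(round(desired_latency_ms(32, 32, 5.6), 2))                                # 182.86 (slide's rounding)
print(round(desired_latency_ms(32, 32, per_host_throughput_mbps(90, 16)), 2))  # 182.04 (unrounded)
print(expect_throughput_drop(350, 182.86))  # True: cached sequential read
print(expect_throughput_drop(180, 182.86))  # False: cached sequential write
```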
40. Disk
VI Client
[Charts: disk throughput with SAN cache enabled (high throughput) and with SAN cache disabled (poor throughput)]
41. Disk
esxtop
[Screenshots: latency is quite high; after enabling the cache, latency is reduced]
42. Virtual Machine Optimisation
Deploy all machines from an optimised template!
• VMware Tools MUST be installed
• The disks MUST be block-aligned to the storage (even when using NFS and SAN)
• Where possible, always separate data disks from OS disks
• Windows performance settings should be optimised for application performance
• Guest operating system timeouts should be set as defined by the SAN vendor
• The pagefile should be separated where appropriate (this can, however, impact VMware SRM)
• Unused Windows services should be disabled (wireless configuration, print spooler, audio, etc.)
• Last-access time updates should be disabled (unless required)
• Logging of the VM should be disabled (enable it only for troubleshooting)
• Remove any unused virtual hardware (floppy drives, USB, etc.)
• Disable screen savers and power-saving features, including the logon screen saver
• Enable Remote Desktop; avoid using the VI Client for remote administration
• Install standard applications into the template (BGInfo, antivirus, any host agents, etc.)
• Multiple vCPUs should be allocated sparingly
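Block alignment can be checked from a partition's starting offset. The sketch assumes a 64 KB alignment boundary, a common choice but an assumption here (use whatever boundary the storage vendor specifies); `is_aligned` is a hypothetical helper. For example, older Windows versions such as Windows Server 2003 start the first partition at sector 63 (63 × 512 = 32,256 bytes), which is not aligned.

```python
ASSUMED_BOUNDARY = 64 * 1024  # assumption: 64 KB alignment boundary


def is_aligned(partition_offset_bytes, boundary_bytes=ASSUMED_BOUNDARY):
    """True if the partition starts on a multiple of the boundary."""
    return partition_offset_bytes % boundary_bytes == 0


print(is_aligned(63 * 512))    # False: legacy 63-sector start (32,256 bytes)
print(is_aligned(128 * 512))   # True: 65,536 bytes
print(is_aligned(2048 * 512))  # True: 1 MB
```

The same check applies to P2V machines, which keep whatever partition offset the physical disk had.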
44. esxtop
Command options when inside esxtop
Command   Action
space     Update the display
?         Show the help page
q         Quit
f/F       Add or remove columns from the display
o/O       Change the order in which the display is sorted
s         Change the update interval
#         Change the number of instances to display
W         Write configuration to file
e         Expand / roll up CPU stats
V         View only VM instances
L         Change the length of the NAME field
m         Display memory statistics
n         Display network statistics
i         Display interrupt statistics
d         Display disk adapter statistics
u         Display disk device statistics
v         Display disk VM statistics
45. esxtop
Command-line options (from the console)
Option    Action
-b        Batch mode
-l        Locks the objects available in the first snapshot
-s        Enables secure mode
-a        Show all statistics
-c        Sets the configuration file
-R        Enables replay mode (used with "vm-support -S")
-d        Sets the update interval
-n        Runs esxtop for n iterations
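For batch collection, the -d and -n options together set how long esxtop runs. A minimal sketch that builds the argument list for a capture of a given length; `esxtop_batch_args` is a hypothetical helper.

```python
import math


def esxtop_batch_args(duration_s, interval_s, all_stats=True):
    """Build an esxtop batch-mode argument list covering roughly duration_s seconds."""
    iterations = math.ceil(duration_s / interval_s)
    args = ["esxtop", "-b"]
    if all_stats:
        args.append("-a")
    args += ["-d", str(interval_s), "-n", str(iterations)]
    return args


print(esxtop_batch_args(300, 5))
# ['esxtop', '-b', '-a', '-d', '5', '-n', '60']
```

On the host, the list could be passed to `subprocess.run` with stdout redirected to a CSV file, matching the `esxtop -a -b > analysis.csv` usage shown earlier.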
47. vm-support
Creates a compressed archive containing the following sections:
• boot
  – contains the GRUB configuration
• etc
  – contains the Console OS configuration files (cron, TCP wrappers, syslog, etc.)
• proc
  – contains much of the hardware configuration, modules and variables
• tmp
  – contains much of the ESX-specific configuration output
• var
  – contains log files and any core dumps
• vmfs
  – contains the structure of the VMFS datastores
• esx3-installation (where appropriate)
  – contains a copy of the previous ESX 3 configuration variables
48. vm-support
Using vm-support to extract performance information:
vm-support -S -d <duration> -i <interval>
<duration> and <interval> are in seconds.
The output from this can then be replayed in esxtop for review after it has been extracted:
esxtop -R <path_to_vm-support_output>
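The capture and replay steps can be scripted as argument lists, for example from a maintenance script. This sketch uses only the options shown above; the helper names and the extracted path are hypothetical.

```python
import shlex


def vm_support_snapshot_args(duration_s, interval_s):
    """vm-support performance snapshot capture (both values in seconds)."""
    return ["vm-support", "-S", "-d", str(duration_s), "-i", str(interval_s)]


def esxtop_replay_args(extracted_path):
    """Replay an extracted vm-support snapshot in esxtop."""
    return ["esxtop", "-R", extracted_path]


print(shlex.join(vm_support_snapshot_args(300, 10)))
# vm-support -S -d 300 -i 10
print(shlex.join(esxtop_replay_args("/tmp/vm-support-extract")))
# esxtop -R /tmp/vm-support-extract
```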
49. Service Console Performance
• Multiple Service Console networks, for network resiliency
• Increased Service Console memory, up to 800 MB
• Use host agents supplied by your vendors
• Apply the storage vendor's recommended tweaks, such as HBA queue depth and I/O timeouts
• Minimal use of the VI Client console: use RDP or SSH instead
• Properly sized vCenter Server: 64-bit OS where possible
50. Resource Groups
Dynamically reallocate resource shares
• Add a VM: shares allow you to over-commit resources and re-allocate them gracefully
• Remove a VM: the extra resources are exploited across all remaining VMs
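The graceful re-allocation described above follows from shares being relative. A simplified sketch, assuming every VM is busy and ignoring reservations, limits and idle VMs (which the real scheduler takes into account); the VM names, share values and capacity units are hypothetical.

```python
def allocate_by_shares(capacity, shares):
    """Split capacity in proportion to each VM's shares (all VMs assumed busy)."""
    total = sum(shares.values())
    return {vm: capacity * s / total for vm, s in shares.items()}


pool = {"vm-a": 1000, "vm-b": 1000}
print(allocate_by_shares(2000, pool))  # {'vm-a': 1000.0, 'vm-b': 1000.0}
pool["vm-c"] = 2000                    # add a VM: existing VMs are scaled back
print(allocate_by_shares(2000, pool))  # {'vm-a': 500.0, 'vm-b': 500.0, 'vm-c': 1000.0}
del pool["vm-c"]                       # remove it: the freed capacity is redistributed
print(allocate_by_shares(2000, pool))  # {'vm-a': 1000.0, 'vm-b': 1000.0}
```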
51. Design Guidelines
• Full resilience / multiple paths
• Standard configuration across all aspects (ESX, storage, networking, etc.)
• Standard naming conventions
• Learn from others' mistakes
• Follow vendors' best-practice guidelines
• Rule out the basics before requesting support
52. Capacity Planner & P2V Cautions and Limitations
• Peak CPU usage can sometimes be misleading
• Back-end storage system performance
• P2V machines will require block-aligning to the storage
• P2V machines will still require guest OS optimisation
53. Conclusion
• Performance issues can often be traced with simple root cause analysis using basic tools (VI Client / esxtop)
• Performance tools help diagnose issues and help rule out non-issues
• Performance tools are useful in different contexts; it is not always either/or:
  – Real-time data and troubleshooting: esxtop
  – Historical data: VI Client
  – Coarse resource / cluster usage: VI Client
  – Detailed resource usage: esxtop
• Combine information from various tools to get a complete picture
• Always benchmark your systems first so you know the optimal performance you can expect