Virtualizing Tier One Applications - Varrow

Virtualizing Tier One
Applications
May 10, 2012
Varrow

Andrew Miller
Senior Technical Consultant, vExpert
t: @andriven w: www.thinkmeta.net

Housekeeping
• If tweeting, include #varrow and maybe #vbca

• Feel free to send me commentary at @andriven

• Hours of stuff packed into a single hour so…

• No shame about content source.

Agenda
• Top 10 Myths About Virtualizing Business-Critical
Applications
• Best Practices for Virtualizing Mission Critical
Applications (courtesy of @cxi and VMware)
• Real-world Tools
– Confio IgniteVM
– vCenter Operations

– Note: Varrow is 1 of 10 VBCA Competency Holders.

Top 10 Myths
About Virtualizing
Business-Critical Applications

Myth 2: Newer applications may be "built for the cloud," but my
legacy business-critical applications are not designed to benefit
from cloud infrastructure.
Truth: Virtualization brings cloud-like benefits to existing legacy
applications by providing dynamic scalability, built-in high
availability, provisioning in minutes and automated disaster
recovery at the infrastructure level.

Myth 4: Virtualization is about cost savings, and I'm not willing to
risk the health of my business applications to save on hardware
costs.
Truth: Virtualization is not just about cost reduction. It also helps
improve application quality of service by enabling applications to
scale up or scale out on demand. The need for fully tested disaster
recovery is one of the key drivers for many organizations to
virtualize their most important applications.
[Figure: Site A (Primary) and Site B (Recovery), each with vCenter Server, VMware Site Recovery Manager, VMware vSphere and servers]

Myth 9: Virtualization can handle everything except my most I/O-
intensive applications.
Truth: vSphere features like storage and network I/O controls, for
example, allow reservations and priorities to enable policy-based
compute, network and storage resource management for business
applications.

“Oh, and one more thing…”

• Link: http://tinyurl.com/bca-bundle

Best Practices
for Virtualizing
Mission Critical Applications
(courtesy of @cxi and VMware)

Virtualizing Tier 1 is Impossible

Who’s doing it?
• United States Navy/Marine Corps – 750,000
mailboxes
• University of Plymouth – 40,000 mailboxes
• VMware IT – 9,000 very heavy mailboxes
• University of Texas at Brownsville – 25,000
mailboxes
• EMC IT – 53,000 mailboxes

Virtual Exchange Start Here
• Refer to Support Policies, Recommendations and
Best Practice Documents
• Architect for the application, not for the
virtualization solution
• Pretend like you’re doing it physically… and Just
do it virtually
• Defaults unless requiring optimization!

Start Simple
• Deploy VMs with similar roles on separate hosts
– MBX VMs in same DAG should not co-locate
– Deploy with VMFS
– Scale up and scale out
– Spread your CAS around

Licensing Exchange in the Virtual!
• One server license is required for each running
instance of Exchange Server 2010 – whether it is
installed natively on a physical machine or on a
virtual machine

• That’s pretty simple!

Configure Storage
• Review the Exchange Calculator to determine your memory, spindle and
IOPS requirement
• Configure your storage how you would handle it physically, then present
it to your VMs
• Size your MBX VMDK <2TB
– Some suggest 2040GB to be on the safe side
• Take advantage of “Optimized for Virtualization” acceleration
technologies by storage vendors
– Storage Offloading (VAAI)
– Per VMDK Locking
• Unlike in the physical world, most data stores host more than one VM
so account for that IO
• Auto-tiering with small granularity (768k) can result in significant
storage savings

Exchange Best Practices
• Do not P2V your Exchange Servers
– Build new servers virtually and move mailboxes
• Split your roles and size their CPU/Mem on a role by
role basis
• Analyze performance characteristics before and after if
performing migration
• Fewer physical servers != fewer resources

Exchange Best Practices
• Size Exchange VMs to fit within NUMA nodes for best
performance
• Do not over commit memory unless absolutely required
• Consider DAG for local site HA, and SRM for site
resiliency/DR
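
A minimal sketch of the NUMA sizing check, assuming a hypothetical host whose NUMA nodes each have 8 cores and 64 GB. The `NumaNode` type and the numbers are illustrative only; read the real node layout from your hardware.

```python
from dataclasses import dataclass


@dataclass
class NumaNode:
    """One NUMA node of a host (values are hypothetical examples)."""
    cores: int
    memory_gb: float


def fits_in_numa_node(vcpus: int, vm_memory_gb: float, node: NumaNode) -> bool:
    """True if the VM's vCPUs and memory both fit inside a single node."""
    return vcpus <= node.cores and vm_memory_gb <= node.memory_gb


node = NumaNode(cores=8, memory_gb=64)
print(fits_in_numa_node(4, 32, node))   # fits in one node
print(fits_in_numa_node(12, 48, node))  # spans nodes: too many vCPUs
```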

Get on the road to Virtual SQL

Virtual SQL Start Here
• Refer to Support Policies, Recommendations and
Best Practice Documents
• Architect for the application, not for the
virtualization solution
• Pretend like you’re doing it physically… and Just
do it virtually
• Defaults unless requiring optimization!

Start Simple
• The average physical SQL Server has 2 CPUs at 6% utilization,
3 GB of memory at 60% utilization, and ~20 IOPS
• Light workload?
– Start with 2 vCPUs, 3 GB RAM
• Heavy workload?
– Start with 4 vCPUs, 8 GB+ RAM
• Really heavy workload?
– Architect as if physical in the virtual
– Use a capacity planner tool to assist
• Remember: what’s above is for Tier 1. You can start
smaller if you want (and it’s a good idea overall).
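
A quick bit of arithmetic on the "average physical SQL Server" numbers above shows how much headroom the light-workload starting point leaves. The helper name is made up for illustration.

```python
def effective_demand(configured: float, utilization_pct: float) -> float:
    """Resources actually consumed, given configured size and % utilization."""
    return configured * utilization_pct / 100.0


# Figures from the slide: 2 CPUs at 6%, 3 GB at 60%.
avg_cpu = effective_demand(2, 6)    # CPUs' worth of work
avg_mem = effective_demand(3, 60)   # GB actually in use

print(f"CPU demand: {avg_cpu:.2f} CPUs of a 2 vCPU starting size")
print(f"Memory demand: {avg_mem:.1f} GB of a 3 GB starting size")
```

So the 2 vCPU / 3 GB light-workload start is already generous for the average server, which is why starting smaller is often fine.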

Licensing SQL in the Virtual?!
• Standard, Workgroup, Enterprise per proc
– You must license SQL for each virtual processor
• Standard, Workgroup per Server/CAL
– You must license each virtual operating system
• Enterprise per physical proc
– Licensing each physical processor entitles you to run any
number of SQL server instances
• 2012 switches to per core licensing!
• Unsure? Contact licensing professionals!

Virtualized SQL is blazing fast!

Configure Storage Correctly
• Database LUN needs enough spindles
• Log LUN needs enough spindles
• Mixing sequential (logs) and random (database) can
result in random behavior
– Avoid mixing workloads, refer to storage vendor
• Eager-Zeroed Thick VMDK for your Database and Log
volumes

Configure Storage Continued
• vMotion is supported with SQL Server
• Try to leverage Array Tiering and Acceleration
technologies if possible
– Use Array based caching to improve performance
• In most DBs, even high-IO ones, only ~10-15% of the
database is hot; the rest is cold IO
– Automatic Tiering makes for higher performance and
higher efficiency while reducing cost

Migrating SQL
• Analyze your existing environment
• Perform a virtualization assessment
• Pay attention to disk spindles not total space
• Easy Migration: Use converter to clone server
• Easier mgmt and provisioning: Use Templates
• In between: Open Migrator → P2V + vRDM →
Storage vMotion = VM with VMDKs.
– More complicated but minimizes downtime.

Database Best Practices
• Follow Microsoft Best Practices for SQL Server
• Evaluate workloads for SQL-intensive ops
• Consider Scaling Out for high end deployments
• Defrag SQL Databases
• Design back-end to support workload (IOPS)
• Monitor DB/Logs for Disk r/w, Disk Queues
• Use Fibre-channel connectivity for storage

Configuring Physical Files
• Os/App, Data, Log and TempDB on separate spindles –
Separate LUNs on single datastore will not provide IO
separation
• Use RAID10 or RAID5 (read-only)
– Refer to your storage vendor’s best practices
• Pre-size data files, do not AUTOGROW
• Pre-size log files, ~10% of DB on average

Configuring TempDB
• Move TempDB to dedicated LUN
• # of TempDB files = # of CPU cores
• All TempDB files should be equal in size
• Pre-Allocate TempDB space for workload
• Set file growth increment to minimize expand
• Microsoft recommends FILEGROWTH incr 10%
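
The TempDB rules above can be sketched as a small generator of T-SQL. This is an illustration only: the `T:\tempdb` path, the `tempdevN` file names and the 8 GB total are hypothetical, and it assumes the default first data file is named `tempdev`.

```python
from typing import List


def tempdb_plan(cpu_cores: int, total_tempdb_mb: int, growth: str = "10%",
                data_dir: str = r"T:\tempdb") -> List[str]:
    """One equally sized TempDB data file per CPU core, pre-allocated."""
    size_mb = total_tempdb_mb // cpu_cores  # all files get the same size
    statements = [
        # Resize the existing primary data file (assumed to be 'tempdev').
        f"ALTER DATABASE tempdb MODIFY FILE "
        f"(NAME = tempdev, SIZE = {size_mb}MB, FILEGROWTH = {growth});"
    ]
    for n in range(2, cpu_cores + 1):
        statements.append(
            f"ALTER DATABASE tempdb ADD FILE (NAME = tempdev{n}, "
            f"FILENAME = '{data_dir}\\tempdev{n}.ndf', "
            f"SIZE = {size_mb}MB, FILEGROWTH = {growth});"
        )
    return statements


for stmt in tempdb_plan(cpu_cores=4, total_tempdb_mb=8192):
    print(stmt)
```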

SQL Failover Clustering Best Practices
• Failover clustering is supported with caveats
– Follow best practices guide for SQL Clustering
– Use RDMs for DB and Log volumes
– Use eager-zeroed thick disks
– Use separate vSCSI controller for OS and Data
– Use separate vSwitches for Public and Heartbeat
– Team NICs for network redundancy

SQL Failover Clustering Best Practices
• SQL Database Mirroring (SQL 2008) or AlwaysOn
Availability Groups (2012) can provide similar
levels of availability as failover clusters but
without the strict requirements or vendor
support issues.
• Most DBs are not clustered and have no failover
capability. Making them virtual and letting
them take advantage of vSphere HA adds
availability not possible with physical servers

General Best Practices - Memory
• Allocate your memory based upon your application
workload
• Database memory doesn’t dedupe well
• Do not over subscribe mission critical workloads
• Do NOT OVER SUBSCRIBE MISSION CRITICAL
WORKLOADS
– Use memory reservations for mission critical SQL workloads to
avoid memory contention issues.

General Best Practices - CPU
• Only allocate vCPUs which are being used
– Idle vCPUs will compete for system resources
• If workload is unknown, size for fewer vCPUs
– You can always add more later if reqs demand
• For Performance Critical VMs
– Try to ensure total number of vCPUs assigned to all
VMs is <= total number of cores on the host
– CPU load average of <=1. If greater, add more cpu

FCoTR is the key to the future.

General Best Practices - Networking
• Separate vMotion, Logging and console traffic; or use
VLAN tagging
• Use a paravirtualized vNIC for high performance
workloads
• Leverage 802.1q using Virtual Switch Tagging (VST)
– VST is the most common configuration
• Follow networking design guidelines
• Do NOT use Jumbo Frames*
– Let’s chat afterwards if questions.

Clusters
• Microsoft does not support migration of running virtual
machines running cluster software.
– Caveat*

Alignment
• Ensure your VMs have their disks aligned
– Boot alignment is auto in 2008, manual in 2003
– Application LUN is manual, follow application and
storage vendor best practices
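
A small sketch of what "aligned" means: the partition's starting offset is a whole multiple of the array's boundary. The 64 KB boundary is an assumption for illustration; use the value your storage vendor specifies. The two offsets are the default starting offsets of Windows Server 2003 (63 sectors of 512 bytes) and Windows Server 2008 (1 MB).

```python
def is_aligned(offset_bytes: int, boundary_bytes: int = 64 * 1024) -> bool:
    """True if the partition start falls exactly on a boundary multiple."""
    return offset_bytes % boundary_bytes == 0


WIN2003_DEFAULT_OFFSET = 63 * 512      # 32,256 bytes
WIN2008_DEFAULT_OFFSET = 1024 * 1024   # 1,048,576 bytes

print(is_aligned(WIN2003_DEFAULT_OFFSET))  # misaligned: needs manual fix
print(is_aligned(WIN2008_DEFAULT_OFFSET))  # aligned automatically
```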

Images courtesy of Vaughn Stewart, @vStewed

Links
• Microsoft Support Policies and Recommendations for Exchange Servers in Hardware Virtualization Environments
• Exchange 2010 on VMware - Best Practices Guide
• http://www.vmware.com/pdf/Virtualizing_Exchange2003.pdf
• http://www.vmware.com/files/pdf/solutions/08Q4_VM_Exchange_Server_2007_VI3_WP.pdf
• http://www.vmware.com/files/pdf/Exchange_2010_on_VMware_-_Best_Practices_Guide.pdf

• Microsoft Virtualization Best Practices for Exchange
• Policies and Recommendations for Exchange Servers in Virtualization Environments

• Refer to this great blog series, which covers Exchange and VMware
• http://www.clearpathsg.com/blogs/2010/07/13/exchange-2010-vsphere-4-best-practices-part-1

• Duncan Epping
• http://www.yellow-bricks.com/2008/12/17/exchange-2007on-vmware/

• Best Practices for SQL Server with VMware
• Microsoft SQL Server and VMware Virtual Infrastructure Best Practices
• Consolidation Guidance for SQL Server
• Licensing SQL
• Alignment

Database Performance Analysis
When Virtualized

(aka Confio IgniteVM)

Monitoring - vSphere
• Get access to vSphere client
– Need a user account
– http://<machine> - provides download link
• Why should I use vSphere?
– Standard O/S Counters may be wrong!

VMware Perfmon Counters

Special Perfmon counters on Windows VMs

O/S Counter Problem

This is what the O/S thinks,
but it is based on 6GB.
Because of 2GB limit, the
correct utilization is 83%
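
The arithmetic behind this can be sketched as follows. The "used" figure is back-derived from the 83% on the slide, so treat it as an assumption rather than a measured value.

```python
def utilization_pct(used_gb: float, capacity_gb: float) -> float:
    """Memory utilization as a percentage of the given capacity."""
    return 100.0 * used_gb / capacity_gb


configured_gb = 6.0   # what the guest O/S believes it has
limit_gb = 2.0        # what the VM limit actually allows
used_gb = 0.83 * limit_gb  # assumption: derived from the slide's 83%

guest_view = utilization_pct(used_gb, configured_gb)
effective = utilization_pct(used_gb, limit_gb)
print(f"Guest O/S sees {guest_view:.1f}%, real utilization is {effective:.0f}%")
```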

Monitoring - Memory

• Primary Metric – Swapping, Ballooning
• Secondary Metrics – VM & Host Memory Utilization, VM
Memory Reservation, VM Memory Limit
Rules:
• If Any Swapping is occurring
– Host needs more memory because it cannot satisfy current demands
– Lessen demands for memory – lower reservations where possible
• Excessive Ballooning
– May be ok for now, but could be a pending issue
• VM Memory Utilization High
– May not be a problem now unless Guest O/S swapping is occurring
– If VM is limited, may want to increase memory this VM can get
• If Host Memory Utilization High
– May not be a problem now if no swapping or ballooning
– Could be a problem soon for all VMs on this host
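
The memory rules above could be encoded roughly like this. The slide gives no numbers for "excessive" ballooning or "high" utilization, so the 1024 MB and 90% defaults are placeholders to tune, and the function name is hypothetical.

```python
from typing import List


def memory_findings(swap_rate_kbps: float, balloon_mb: float,
                    vm_mem_pct: float, host_mem_pct: float,
                    balloon_warn_mb: float = 1024.0,
                    high_pct: float = 90.0) -> List[str]:
    """Apply the memory rules in order of severity."""
    findings = []
    if swap_rate_kbps > 0:
        # Any swapping at all is treated as a problem.
        findings.append("swapping: host needs more memory or lower demand")
    if balloon_mb >= balloon_warn_mb:
        findings.append("excessive ballooning: possible pending issue")
    if vm_mem_pct >= high_pct:
        findings.append("VM memory high: check guest swapping and VM limit")
    if host_mem_pct >= high_pct:
        findings.append("host memory high: may soon affect all VMs on host")
    return findings


print(memory_findings(swap_rate_kbps=12, balloon_mb=0,
                      vm_mem_pct=60, host_mem_pct=95))
```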

CPU Metrics

• Primary Metric – VM Ready Time
• Secondary Metrics – VM CPU Utilization, Host CPU
Utilization
Rules:
• If VM Ready Time > 10-20%
– If Host CPU Utilization is high => Need more CPU resources on Host
– If Host CPU Utilization ok => VM is limited, give more CPU resources
• If VM CPU Utilization high (sustained over 80%)
– May not be a problem now if no ready time
– could be a problem soon for this VM
• If Host CPU Utilization high (sustained over 80%)
– May not be a problem now if no ready time on any VM
– Could be a problem soon for all VMs on this host
– Balance VM resources better
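
A sketch of the CPU rules. vCenter reports ready time as milliseconds per sample interval, so it is first converted to a percentage; the 20-second interval is the real-time chart interval and is an assumption if you read other rollups. The 10% trigger is the low end of the 10-20% range above.

```python
from typing import List


def ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a ready-time summation (ms) into % of the sample interval."""
    return ready_ms / (interval_s * 1000.0) * 100.0


def cpu_findings(ready_pct: float, vm_cpu_pct: float, host_cpu_pct: float,
                 ready_threshold: float = 10.0, high: float = 80.0) -> List[str]:
    findings = []
    if ready_pct > ready_threshold:
        if host_cpu_pct > high:
            findings.append("host needs more CPU resources")
        else:
            findings.append("VM is limited: give it more CPU resources")
    else:
        if vm_cpu_pct > high:
            findings.append("VM CPU high: could be a problem soon")
        if host_cpu_pct > high:
            findings.append("host CPU high: rebalance VMs")
    return findings


pct = ready_percent(3000)  # 3000 ms ready in a 20 s sample
print(pct, cpu_findings(pct, vm_cpu_pct=70, host_cpu_pct=90))
```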

Monitoring - Storage

• Primary Metrics – Host maxTotalLatency, Host Device
Latency (by device), VM Disk Commands Aborted, VM
Command Latency
• Secondary Metrics – Host Disk Read Rate, Host Disk Write
Rate, VM Disk Usage Rate
Rules:
• If Host Latency >= 20-30 ms
– Review Device Latencies to understand which one has latencies
– Review Disk Read / Write rates
– If Close to Storage Capacity - Overloaded Storage
– Otherwise - Slow Storage
• If VM Command Latency >= 30ms only for your VM
– Tune Disk I/O intensive processes on database
– Are Memory / CPU issues causing I/O problems
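
The storage rules might be sketched like this, using the low end of the 20-30 ms range for the host and 30 ms for the VM. The `near_capacity` flag stands in for a judgement you would make from the read/write rates; device names are invented.

```python
from typing import Dict, List


def storage_findings(host_latency_ms: float,
                     device_latency_ms: Dict[str, float],
                     vm_latency_ms: float, near_capacity: bool,
                     host_threshold_ms: float = 20.0,
                     vm_threshold_ms: float = 30.0) -> List[str]:
    findings = []
    if host_latency_ms >= host_threshold_ms:
        if device_latency_ms:
            worst = max(device_latency_ms, key=device_latency_ms.get)
            findings.append(f"host latency high; worst device: {worst}")
        findings.append("overloaded storage" if near_capacity else "slow storage")
    elif vm_latency_ms >= vm_threshold_ms:
        # Host is fine, so the problem is specific to this VM.
        findings.append("only this VM is slow: tune I/O-heavy DB processes, "
                        "check for memory/CPU pressure")
    return findings


print(storage_findings(25, {"lun1": 8.0, "lun2": 41.0}, 35, near_capacity=False))
```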

Monitoring - Network

• Primary Metric – Dropped Receive Packets, Dropped
Transmit Packets
• Secondary Metrics – Network Rate
Rules:
• If any packets are being dropped
– Look for errors on the Host’s NIC
– See if one NIC is getting all traffic
– Understand which VM is causing the most traffic and reduce it
• If Network Rate is getting close to maximum for hardware
– Understand which VM is causing load
– May need to get better network hardware
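
A rough encoding of the network rules. "Close to maximum" is not quantified on the slide, so the 80% default is an assumption, and the VM names are invented.

```python
from typing import Dict, List


def network_findings(dropped_rx: int, dropped_tx: int, rate_mbps: float,
                     link_mbps: float, vm_rates_mbps: Dict[str, float],
                     near_max_fraction: float = 0.8) -> List[str]:
    findings = []
    busiest = max(vm_rates_mbps, key=vm_rates_mbps.get) if vm_rates_mbps else None
    if dropped_rx > 0 or dropped_tx > 0:
        findings.append("packets dropped: check host NIC errors and NIC balance")
    if rate_mbps >= near_max_fraction * link_mbps:
        findings.append("network rate near hardware maximum")
    if findings and busiest is not None:
        # Either rule points at finding the VM that drives the most traffic.
        findings.append(f"busiest VM: {busiest}")
    return findings


print(network_findings(3, 0, 400, 1000, {"sql01": 320.0, "web01": 80.0}))
```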

Layers shown in the view:
• Database Response Time Metrics
• Database Health Metrics
• O/S and Virtual Machine Metrics
• Metrics for the Physical Host
• Metrics for the Storage Layer

Tooltip: Another VM (ProdServerB) moved
onto this Physical Host


Confio Software

• Award Winning Performance Tools
• Ignite8 for Oracle, SQL Server, DB2, Sybase
• IgniteVM for Databases on VMware
– Download at www.confio.com
• Provides Answers for
– What changed recently that affected end users
– What layer (VM or DB) is causing the problem
– Who should fix the problem, and how
Download free trial at www.confio.com

4 Big Things

• Performance Monitoring
• Performance Trending
• Capacity Planning
• Root Cause Analysis

Managing Performance

Is it healthy?
• Every VM & ESX performing well? CPU, RAM, Network, Disk?
• Are they behaving as expected?
• Any fault on any component?

Is it enough?
• Enough CPU, RAM, Network, Disk? Future risk?
• Time remaining?
• Capacity remaining?
• Where are the “stress points” in time?

Is it optimised?
• Which VMs need adjustment?
• What are my key ratios?
• How much can I claim back from “fat” VMs?
• How many more VMs can I put without impacting performance?

• Is it healthy = Health
– Workload
– Anomalies
– Faults
• Is it enough = Risk
– Time remaining
– Capacity remaining
– Stress period
• Is it optimised = Efficiency
– What can we reclaim?
– Density. Key ratios for
management

Threshold: Shift in Mindset
• vCenter sets “static” threshold, which can be misleading
– During peak, it is common for VM to reach high utilisation.
• Static threshold will generate alerts when they should not.
• vSphere admin quickly learns to ignore them, defeating the purpose of alert to begin with.
– During non-peak, it might be abnormal for VM to reach even 50% utilisation.
• Static threshold will not generate alerts when they should have.
• vCenter only sets high threshold
– Do you set static threshold when CPU or RAM utilisation drops below 5%?
• A drop in entire array storage IOPS might be a sign of terrible day ahead.
– Will not alert when these happen:
• Utilisation drops from 75% to 1% when it should not.
• Utilisation changes from 5% to 70% when it should not.
– We need to plot both upper range and lower range
• But each VM differs. And the same VM differs depending on day/time…
– Intelligence required to analyse each metric and its expected “normal”
behaviour.
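
To make the idea concrete, here is a deliberately simple dynamic threshold: a per-time-slot band of mean ± k standard deviations learned from history. This is not the algorithm vCenter Operations uses, just a sketch of "upper and lower range per VM per time of day"; the sample data is invented.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple


def build_baseline(samples: Iterable[Tuple[int, float]],
                   k: float = 2.0) -> Dict[int, Tuple[float, float]]:
    """samples: (hour_slot, utilisation %) pairs -> {slot: (low, high)}."""
    buckets = defaultdict(list)
    for hour, value in samples:
        buckets[hour].append(value)
    return {h: (mean(v) - k * pstdev(v), mean(v) + k * pstdev(v))
            for h, v in buckets.items()}


def is_anomalous(baseline: Dict[int, Tuple[float, float]],
                 hour: int, value: float) -> bool:
    """Flag values outside the learned band, whether too high or too low."""
    low, high = baseline[hour]
    return value < low or value > high


history = [(10, v) for v in (70, 75, 80, 72, 78)] + \
          [(3, v) for v in (4, 5, 6, 5, 5)]
baseline = build_baseline(history)
print(is_anomalous(baseline, 10, 75))  # busy at peak: normal
print(is_anomalous(baseline, 10, 1))   # drop to 1% at peak: flagged
print(is_anomalous(baseline, 3, 70))   # 70% at 3am: flagged
```

A static 90% threshold would have stayed silent in all three cases, including the two that matter.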

[Equation slide: the joint density of P1,1, P1,2, ..., Pm,m at (p1,1, p1,2, ..., pm,m), written in Dirichlet form as a product of powers of the p i,j normalised by gamma functions, where Γ(z) = ∫ from 0 to ∞ of t^(z−1) e^(−t) dt. The marginal distribution of the i-th row of J is also Dirichlet, with one form for i = 1, ..., m−1 and another for i = m.]

It is pretty difficult for a human to beat the computer in analysis of the data.
The above is one of the many algorithms applied by vCenter Operations.
Thank goodness I don’t have to explain this.

Recap
• Figures out normal – this is huge.
• 500 VMs, 50 ESX Hosts = 10,000+ Counters
• Setup and walk away for a while.
• Walkthrough Demo by Clint Kitson
– http://www.youtube.com/watch?v=Z-DJuTiqKag
• Less technical but much more fun overviews
– http://www.vmwarecloudmanagement.com/
• Great in-depth training doc up on VMware
Communities (179 slides with notes).
– http://communities.vmware.com/docs/DOC-18592

Questions?

(I’m hanging around.)
