VMworld 2013
Justin King, VMware
Ravi Soundararajan, VMware
2. 2
Goals
Help you understand vCenter Architecture
Help you use this knowledge to guide vCenter deployment
3. 3
vCenter Deployment Options
One vCenter
Many vCenters
1 vCenter per site
Multiple vCenters using linked mode within a single site
Multiple vCenters using linked mode across sites
…
5. 5
For Most of You, This Is vCenter
[Diagram: C# clients and API clients; vCenter server (vpxd); DB]
6. 6
However, This is Approximately vCenter (We Will Dissect This…)
[Diagram: vCenter server containing VPXD, DB, App Server, Inv Serv (Java), SSO, PBSM, Log, and vctomcat (Health, SRS, …). Around it: ESXi + HostD + VPXA with STORAGE and NETWORK, vSphere Web Clients, VI Clients, API Clients, Update Manager, Converter, AD, …]
7. 7
Understanding vCenter Control Flow: Web Client Login
[Diagram: vSphere Web Clients, App Server (vCenter server), SSO, AD]
1. Login
2. SSO authenticates
3. After the user is authenticated, the user has access to all providers registered with SSO (e.g., vCenter)
8. 8
Understanding vCenter Control Flow: C# Client Login
[Diagram: VI Clients, VPXD (vCenter server), SSO, AD]
1. Login request to vCenter (vpxd service)
2. vpxd contacts SSO for authentication
3. User is able to view inventory
Note: vpxd no longer talks directly to AD
9. 9
Understanding vCenter Control Flow: A PowerOn Operation
[Diagram: vSphere Web Clients, App Server, Inv Serv, VPXD, DB (vCenter server); ESXi + HostD + VPXA with STORAGE and NETWORK]
1. PowerOn
2. To vpxd
3. DRS + Admission Control
4. Issue command to ESX. Report status.
5. Persist to DB
6a,b. Notify clients; persist to Inv Svc
Note: the client is already authenticated, so SSO is not invoked during the operation
10. 10
Agenda for vCenter Architectural Deep Dive
vCenter to ESX interactions
vCenter server internals
Database
Clients
11. 11
Architecture Deep Dive: vCenter to ESX Interactions
[Diagram: VPXD (vCenter service); ESXi + HostD + VPXA with STORAGE and NETWORK]
3 main interactions:
1. Command traffic (depends on load)
2. Update host status (host sync)
3. Statistics (bursty)
12. 12
vCenter-to-ESX Considerations: Latency and Throughput
• Data transferred is typically small (KBs, not MBs)
• Latency from VC-to-ESX has larger impact than throughput
• Latency example: 4x diff (100ms vs. 500ms) gives a 2x powerOn latency difference
• Throughput example: 3x diff (512Kbps vs. 1.5Mbps) gives no powerOn latency difference
• Other implications of high latency or low throughput
• Impact on statistics
• Slower stats collection
• Slower real-time queries
• Impact on browsing
• Console slower
• Host config slower
• Other operations should be unaffected
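A rough way to see why latency dominates when payloads are small: model an exchange as (round trips x round-trip latency) + (payload size / bandwidth). The sketch below is a simplified model, not a measurement. The 4 KB payload and 10 round trips are hypothetical values chosen for illustration, so the numbers will not reproduce the powerOn figures quoted above.

```python
def transfer_time_s(payload_bytes, round_trips, rtt_s, bandwidth_bps):
    """Estimate exchange time: latency cost plus serialization cost."""
    latency_cost = round_trips * rtt_s
    serialization_cost = (payload_bytes * 8) / bandwidth_bps
    return latency_cost + serialization_cost


# Hypothetical workload: a few KB spread over several round trips.
PAYLOAD_BYTES = 4 * 1024
ROUND_TRIPS = 10

baseline = transfer_time_s(PAYLOAD_BYTES, ROUND_TRIPS, rtt_s=0.100, bandwidth_bps=512_000)
slow_link = transfer_time_s(PAYLOAD_BYTES, ROUND_TRIPS, rtt_s=0.500, bandwidth_bps=512_000)
fast_pipe = transfer_time_s(PAYLOAD_BYTES, ROUND_TRIPS, rtt_s=0.100, bandwidth_bps=1_500_000)

print(f"100ms, 512Kbps: {baseline:.3f}s")
print(f"500ms, 512Kbps: {slow_link:.3f}s")
print(f"100ms, 1.5Mbps: {fast_pipe:.3f}s")
```

In this model, raising latency multiplies the total time, while tripling bandwidth saves only a few hundredths of a second because the payload is so small.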
13. 13
Architecture Deep Dive: vCenter Server Internals
[Diagram: vCenter server component overview, as on the earlier architecture slide]
vpxd (core business logic)
• Sends tasks to appropriate hosts
• Retrieves config changes from hosts
• Pushes config updates to DB
• Inserts stats into DB
• Satisfies queries from clients
CPU/Memory important
14. 14
Architecture Deep Dive: vCenter Server
[Diagram: vCenter server component overview, as on the earlier architecture slide]
Inv Serv (Inventory Service)
• Cache of DB data
• Stores extension data (SRM, PBSM)
• Satisfies Web client queries
• Helps with Linked Mode search
• Contains embedded DB
IO crucial: install on different spindles from vpxd
Multi-threaded: CPU/mem important
App Server (Web Client Server)
• Satisfies web client requests
• Forwards to Inv Serv, SSO, etc.
• Spawns remote console service
1-1.5 CPUs should be enough
15. 15
Architecture Deep Dive: vCenter Server
[Diagram: vCenter server component overview, as on the earlier architecture slide]
SSO (Single Sign-On)
• C/C++ plus Java-based STS (secure-token service)
• Handles authentication
• Communicates with AD, etc.
vctomcat
• Contains Health service
• Contains SRS (stats reporting service for overview perf charts; retrieves data from DB)
• Contains EAM (ESX Agent Manager for manager VMs)
16. 16
Architecture Deep Dive: vCenter Server
[Diagram: vCenter server component overview, as on the earlier architecture slide]
Log (Log Browser service)
• Allows log viewing in web client
PBSM (Policy-based storage mgr)
• Contains SMS + policy engines
• Satisfies “Storage View” queries from clients
• Every 2 hrs, queries DB and Inv Serv for most up-to-date data
Can be CPU/Mem-intensive during queries
17. 17
vCenter Server Resource Usage
[Charts: resource usage per service: vctomcat (SRS, EAM, Health, etc.), Inventory Service, Web Client App Server and remote console, PBSM, STS, Log Browser]
18. 18
vCenter Server Performance Considerations (1 of 2)
Resource requirements
• Many new services
• Need sufficient CPU and Memory
• May need to tune JVM heap sizes according to inventory size
• Rules of thumb (Unofficial…please check documentation):
• Small setups (< 1000 VMs): 2-4 vCPUs, 8-12GB
• Medium setups (< 4000 VMs): 4-8 vCPUs, 12-24GB
• Large setups (> 4000 VMs): 8-16 vCPUs, 24-32GB
• Embedded database for Inventory Service
• IO requirements higher (2-3K IOPs depending on load)
• Place on its own spindles (separate from other services)
• Consider SSDs
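The unofficial rules of thumb above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and because the slide gives "< 4000" and "> 4000", treating exactly 4000 VMs as "large" is an assumption. Check the documentation before sizing a real deployment.

```python
def suggest_vcenter_sizing(vm_count):
    """Return (label, (min_vcpu, max_vcpu), (min_gb, max_gb)) per the rules of thumb."""
    if vm_count < 0:
        raise ValueError("vm_count must be non-negative")
    if vm_count < 1000:
        return ("small", (2, 4), (8, 12))
    if vm_count < 4000:
        return ("medium", (4, 8), (12, 24))
    # Assumption: exactly 4000 VMs is treated as large (the slide leaves it open).
    return ("large", (8, 16), (24, 32))


for count in (500, 2500, 6000):
    label, vcpus, mem_gb = suggest_vcenter_sizing(count)
    print(f"{count} VMs: {label}, {vcpus[0]}-{vcpus[1]} vCPUs, {mem_gb[0]}-{mem_gb[1]}GB")
```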
19. 19
vCenter Server Performance Considerations (2 of 2)
Inventory Structure
• Single datastore/datacenter/network can sometimes be vCenter bottleneck
• Several smaller clusters may be better than 1 big cluster
• Spreading hosts/networks/datastores across different datacenters relieves
some bottlenecks
20. 20
Architecture Deep Dive: vCenter-to-Database Interactions
[Diagram: VPXD, DB; ESXi + HostD + VPXA with STORAGE and NETWORK]
VC talks to DB when…
1. Persisting statistics (5-minute intervals)
2. Persisting config changes (e.g., host syncs); higher when more tasks
3. Answering certain UI queries (e.g., cluster/datacenter charts, historical stats queries like past-day, past-week, etc.)
4. Persisting version information (for inv svc)
DB also performs these tasks:
• Stats Rollups: VPX_HIST_STATX
• 30 minutes, 2 hours, 1 day
• Purging stats
• when entities deleted
• Purging events (if auto-purge configured)
• Purging tasks (if auto-purge configured)
• TopN computation
• 10 min, 30 min, 2 hours, 1 day
• Satisfying SMS data refresh for Storage
views (every 2 hours)
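To make the rollup intervals concrete, the sketch below rolls one day of synthetic 5-minute samples up to 30 minutes, 2 hours, and 1 day. It assumes a plain average rollup and runs in Python for illustration. The actual rollups run inside the database and may use other rollup types.

```python
def rollup(samples, factor):
    """Average each run of `factor` consecutive samples into one value."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    rolled = []
    for start in range(0, len(samples), factor):
        chunk = samples[start:start + factor]
        rolled.append(sum(chunk) / len(chunk))
    return rolled


# One day of synthetic 5-minute samples (24 h * 12 samples per hour = 288).
five_min = [float(i % 10) for i in range(288)]

thirty_min = rollup(five_min, 6)    # 6 x 5 minutes = 30 minutes
two_hour = rollup(thirty_min, 4)    # 4 x 30 minutes = 2 hours
one_day = rollup(two_hour, 12)      # 12 x 2 hours = 1 day

print(len(five_min), len(thirty_min), len(two_hour), len(one_day))  # 288 48 12 1
```

Each stage shrinks the row count, which is why rolled-up history is much cheaper to keep than raw 5-minute data.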
21. 21
DB Performance Considerations (1 of 2)
Latency to DB important (often more so than ESX-to-VC latency)
• Almost everything involves the DB…
• Stats persistence
• Certain UI queries
• Updating configuration information
• Historical queries (events, alarms, task history)
• …
Recommendation:
Place DB and vCenter close together
Note: DB and vCenter on different hosts/VMs allows for independent
sizing and tuning
22. 22
DB Performance Considerations (2 of 2)
DB traffic is write-mostly
• Stats inserts and rollups, version updates, config changes, purges
• Sufficient disk subsystem needed. If SSDs are an option, use them (2K IOPs)
Manage database disk growth
• Majority of DB data is “SEAT” data (Stats, events, alarms, tasks): 80-85% (10s
of GBs or more in big setups)
• Inventory data: 10-15% of data (usually < 10GB for large inventories)
• Choose stats levels wisely to avoid excessive growth
• Utilize automatic purging of event/task tables if possible
Recompute DB stats on highly-volatile tables (at least once a day)
• VPX_PROPERTY_BULLETIN
• VPX_TOPN*
23. 23
Architecture Deep Dive: Client Interactions
• C# VI client: refreshes frequently, which induces load on vpxd. More clients, more load.
• Web client: does not auto-refresh. Read requests are satisfied by the app server, not vpxd, so there is less load on vpxd.
• API clients: if listening to a subset of inventory/properties, small load on vCenter
• Limit of 2000 sessions to vCenter: includes all clients + remote console
App server: can be placed in the same geo or on the same server as Inv Svc
[Diagram: vCenter server component overview with clients, as on the earlier architecture slide]
24. 24
Client Considerations
Clients add load
• If you aren’t using a session, log out
Web Client App Server can go in same server as Inventory Service
• Small resource footprint
• Low latency to inventory service
For API clients, try to be a good citizen
• Avoid frequent/expensive DB calls
• Example: frequent createEventHistoryCollector calls with a complex EventFilterSpec
• Monitor specific inventory items or properties, not all entities and all properties
• Log out when you are done (don’t waste sessions!)
25. 25
Client Notes: Simple Example of “Bad” Client in PowerCLI
Example of a good vs. bad client in PowerCLI
PowerCLI:
• Simple to use, but involves client-side filtering
• Example: Get-VM gets all VMs from the server, then filters the list at the client
$vmList = Get-VM -Name "vm1","vm2","vm3","vm4"
Good: 1 server call; the client throws away all but vm1, vm2, vm3, vm4
$nameList = "vm1","vm2","vm3","vm4"
foreach ($name in $nameList) {
    Get-VM -Name $name
}
Bad: 4 server calls, gets all VMs 4 times…excess client/server work
Also: Please log out when you are done!
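The cost difference between the two patterns can be counted with a toy model. The sketch below is in Python and uses a hypothetical FakeInventoryServer that can only return the full VM list. It models the behavior this slide describes, not PowerCLI internals.

```python
class FakeInventoryServer:
    """Hypothetical stand-in for a server that only offers 'list everything'."""

    def __init__(self, vm_names):
        self._vm_names = list(vm_names)
        self.calls = 0
        self.items_sent = 0

    def list_all_vms(self):
        self.calls += 1
        self.items_sent += len(self._vm_names)
        return list(self._vm_names)


def get_vms(server, wanted):
    """One server call, then filter on the client side."""
    wanted = set(wanted)
    return [name for name in server.list_all_vms() if name in wanted]


inventory = [f"vm{i}" for i in range(1, 1001)]
names = ["vm1", "vm2", "vm3", "vm4"]

batched = FakeInventoryServer(inventory)
get_vms(batched, names)            # "good": one call for all four names

looped = FakeInventoryServer(inventory)
for name in names:                 # "bad": one call per name
    get_vms(looped, [name])

print("batched:", batched.calls, "call(s),", batched.items_sent, "items sent")
print("looped: ", looped.calls, "call(s),", looped.items_sent, "items sent")
```

With a 1000-VM inventory, the looped version makes 4 calls and moves 4000 items to get the same four VMs.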
26. 26
vCenter Architecture: Summary (Whew!)
[Diagram: full vCenter server component overview: VPXD, DB, App Server, Inv Serv (Java), SSO, PBSM, Log, vctomcat (Health, SRS, …); ESXi + HostD + VPXA with STORAGE and NETWORK; vSphere Web Clients, VI Clients, API Clients, Update Manager, Converter, AD, …]
28. 28
You Say n VMs/Hosts, but I Can Only Reach N. Why?
How we set limits
Create a ‘large environment’
Attach clients, solutions, etc.
Run management operations (clones, powerOps, etc.)
Measure latency and throughput
Why your setup may not reach our scale
Different stats level
Different device configuration of hosts/VMs (e.g., # of datastores)
Different DB configuration (less memory, different recovery mode)
Different latencies from VC-to-ESX or VC-to-DB
Viewing Different Client Pages
Accumulating events and tasks vs. purging them
Each might stress your vCenter/DB/network etc. more than ours
29. 29
How Many Concurrent Operations Can I Perform? (1 of 2)
vCenter hard limits
• 640 concurrent operations before incoming requests are queued
• 2000 concurrent sessions (incoming requests plus remote console sessions)
Per-host or per-datastore limits
• A host can perform up to 8 provisioning operations at once
(provisioning = clone, VMotion, relocate)
• If host is source and destination, host can only do 4 operations at once
• A datastore can perform up to 128 VMotions at once
• A datastore can perform up to 8 Storage VMotions at once
• Limits can be changed, but changes are not officially supported
Other limits
• Datacenter/host/datastore synchronization at VC can limit concurrency
30. 30
vCenter Concurrency (2 of 2)
Clone VM from host A to host B
• Cost to A: 1; cost to B: 1
• Each host can participate in 7 other provisioning operations
Clone VM from host A to host A
• Cost to A: 2
• Host A can only participate in 6 more operations
[Diagrams: vCenter with Host A (VM 1) and Host B (VM 2); vCenter with Host A (VM 1, VM 2)]
Do not use a single host as the source of all clones (i.e., spread out templates)
• Better disk performance and better concurrency
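The per-host cost accounting above can be sketched as a small tracker. The class and method names are hypothetical, and the limit of 8 comes from the per-host provisioning limit on the previous slide. This illustrates the arithmetic only, not how vCenter schedules operations.

```python
HOST_LIMIT = 8  # per-host provisioning limit quoted on the previous slide


class ProvisioningTracker:
    """Tracks in-flight provisioning cost per host (illustrative only)."""

    def __init__(self, limit=HOST_LIMIT):
        self.limit = limit
        self.in_flight = {}

    def remaining(self, host):
        return self.limit - self.in_flight.get(host, 0)

    def start_clone(self, source, destination):
        """Charge 1 to each host, or 2 to one host that is both source and destination."""
        costs = {}
        for host in (source, destination):
            costs[host] = costs.get(host, 0) + 1
        if any(self.remaining(host) < cost for host, cost in costs.items()):
            return False  # would exceed a host limit; the caller should wait
        for host, cost in costs.items():
            self.in_flight[host] = self.in_flight.get(host, 0) + cost
        return True


cross_host = ProvisioningTracker()
cross_host.start_clone("A", "B")
print(cross_host.remaining("A"), cross_host.remaining("B"))  # 7 7

same_host = ProvisioningTracker()
same_host.start_clone("A", "A")
print(same_host.remaining("A"))  # 6
```

Four same-host clones use up all 8 units on that host, which matches the limit of 4 operations when a host is both source and destination.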
31. 31
Why Should I Upgrade from VC5.0?
One big reason: In 5.1 and 5.5, stats tables are partitioned
• Stats inserts more efficient (into a small partition at a time)
• Rollups more efficient (plus, amount of data rolled up at once is throttled)
• Stats data purging more efficient (simply truncating a partition)
• vCenter can support higher stats levels for longer periods of time
• Still recommend running higher stats levels (2-4) only for temporary troubleshooting
[Charts: Inserts, Rollups, Purge]
32. 32
What Is the Real Dirt on Stats Levels?
Changing stats levels increases load on the database
Rough rules of thumb (not official VMware recommendations)
• Level 1 stats: per-VM and per-host aggregate stats
• Level 2 stats: additional per-VM/per-host stats
4x or more stats than Level 1 depending on configuration
• Level 3 stats: per-instance stats
6x or more stats than Level 2 depending on configuration
• Level 4 stats: additional rollup types
1.4x more stats than Level 3 depending on configuration
• Use the stats calculator in vCenter
• Try to use higher stats levels only for temporary debugging
• If the stat you want is at the wrong level, let us know
• Consider VCOps for more advanced stats functionality?
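Chaining the rough multipliers above gives a feel for how fast stats volume grows relative to Level 1. This sketch treats each "or more" figure as a lower bound and reads "1.4x more" as 1.4 times the Level 3 volume. Both are assumptions, and the results are not official numbers.

```python
# Multiplier applied when moving up to each level from the one below it.
STEP_MULTIPLIER = {2: 4.0, 3: 6.0, 4: 1.4}


def relative_stats_volume(level):
    """Rough lower-bound stats volume at `level`, relative to Level 1."""
    if level not in (1, 2, 3, 4):
        raise ValueError("stats level must be 1, 2, 3, or 4")
    volume = 1.0
    for step in range(2, level + 1):
        volume *= STEP_MULTIPLIER[step]
    return volume


for level in (1, 2, 3, 4):
    print(f"Level {level}: at least {relative_stats_volume(level):.1f}x Level 1")
```

Under these assumptions Level 3 is at least 24x Level 1 and Level 4 is roughly 34x, which is why higher levels are best kept for temporary debugging.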
33. 33
Should I Distribute VC Services across VMs? (1 of 2)
You can distribute services (Inv Svc, SSO, vpxd, DB) to multiple
VMs, but…
• Better performance when vpxd and Inv Svc are co-located
• Better performance when Web Client service and Inventory Service are
close together
• Better performance when vpxd and DB are close together
34. 34
Should I Distribute VC Services across VMs? (2 of 2)
Typical deployment pre-5.1
• VC and assorted services in 1 VM
• VC DB in another VM
Will still work fine with VC 5.5
Another suggestion
• Put all in 1 VM
• Make sure VM has sufficient CPU/Memory/Disk/Network
(follow best practices)
• Put Inventory Service partition on separate spindles from vpxd and DB
• Put DB partition on separate spindles
• Advantage: looks ahead to future ‘single-VM’ appliance
35. 35
Why Are Cluster/Datacenter Charts Sometimes Slow?
These charts are computed on the fly
They require collection of data from hosts and VMs
A single slow host can hurt performance
37. 37
When Should I Use Multiple vCenters?
Considerations
• Have you exceeded the limits of a single vCenter?
• Do you want one vCenter per geography?
• Do you want one vCenter per organizational boundary?
(finance, engineering, etc.)
• Do you want a primary and secondary site (e.g., SRM)?
• Do you prefer to manage smaller VCs?
38. 38
Single Site with Multiple vCenters
[Diagram: Site A with two vCenter Servers, each managing three ESX hosts; AD; VI Clients and API Clients]
Important Considerations
• How do I decide how many vCenters I need? (Consider vCenter limits, organizational boundaries)
• Do I want a single view of inventory managed by all vCenters?
• How do I synchronize roles/permissions across vCenters?
Yes? Consider “linked mode” …
39. 39
Linked Mode
• Single pane of glass from UI for inventory data
• Search across VC instances
• Unified roles and permissions via AD
41. 41
Multiple vCenters in a Single Site in Linked Mode
[Diagram: Site A with three vCenter Servers, each managing three ESX hosts; AD; VI Clients and API Clients]
Important Considerations:
• At most 10 vCenters can be linked together
• Does not work on vCenter Server Appliance (ADAM Replication)
• Cross-vCenter operations not available
• API is not linked mode aware
42. 42
Linked Mode and Single Sign-On Considerations
Linked Mode
• Should I use linked mode across multiple sites?
• Business units that have computing needs across data centers
• What impact does bandwidth have on cross site linked mode?
• Except for query federation, linked mode sites only communicate via ADAM
• Linked mode adds minimal cross-site network overhead compared with multi-site without linked mode
• Bandwidth tradeoffs are the same as for multi-site vCenters without linked mode
Single Sign-On
• Extend the vSphere authentication domain across sites
• Use Domain accounts for permissions instead of Local OS
• Define replication partners for WAN replication
44. 44
Looking Ahead (No Timelines…)
Many things, but a few main ones:
• Single VM vCenter appliance that can support increasing scale and federation
• Improved performance and scalability
• Operations across VC (like cross-VC VMotion)
45. 45
Conclusion
Single vCenter…some key takeaways
• Services can be placed in the same VM
• IO performance is critical for vCenter and inventory service
• DB provisioning is critical
• VC-to-DB latency is important
Multiple vCenters…Why?
• Exceeding single vCenter limits
• Organizational boundaries
• Security and compliance
• Local/remote administration
Should I use linked mode?
• Single pane of glass from UI? Yes (but also possible with just Web Client…)
• Synchronized roles? Yes
46. 46
Performance Community Resources
Performance Technology Pages
• http://www.vmware.com/technical-resources/performance/resources.html
Technical Marketing Blog
• http://blogs.vmware.com/vsphere/performance/
Performance Engineering Blog VROOM!
• http://blogs.vmware.com/performance
Performance Community Forum
• http://communities.vmware.com/community/vmtn/general/performance
Virtualizing Business Critical Applications
• http://www.vmware.com/solutions/business-critical-apps/
48. 48
Don’t miss:
vCenter of the Universe – Session # VSVC5234
Monster Virtual Machines – Session # VSVC4811
Network Speed Ahead – Session # VSVC5596
Storage in a Flash – Session # VSVC5603
Big Data:
Virtualized SAP HANA Performance, Scalability and Practices –
Session # VAPP5591
49. 49
Other VMware Activities Related to This Session
HOL:
HOL-SDC-1304
vSphere Performance Optimization
Group Discussions:
VSVC1001-GD
Performance with Mark Achtemichuk