On occasion, Expedient customers request large numbers of vCPUs be added to their virtual machines hosted on Expedient’s Virtual Colocation platform. Most of these requests stem from a perception that more vCPUs will automatically increase virtual machine performance or are the recommendation from a vendor that usually deals with physical hardware installations. However, adding additional vCPUs to a virtual machine can actually have a negative performance impact on the virtual machine.
www.expedient.com
CPU Ready and Virtual Colocation
Visualization
To help understand this scenario a bit better, let’s imagine going to eat at a popular restaurant on a Saturday night. If it is a busy night, you are more than likely going to be waiting for a table. You arrive at the restaurant and get placed onto a waiting list for your table.
The restaurant will probably have an assortment of table sizes available, with 2 seats being the most popular. For larger parties, they will either have larger tables or push together smaller tables to accommodate the larger group.
The larger your party, the less opportunity you will have for a seating area compared to a party of 2. You will either have to wait for a large table, or wait until enough small tables in the same area are available. The length of your wait will vary depending on the other parties in the restaurant. CPU scheduling works in much the same way as restaurant seating.
Figure 1: Restaurant floor plan showing tables with 2, 4, 8, and 12 seats, the host/hostess, and the waiting list for tables.
CPU Ready
We can correlate each virtual CPU (vCPU) to a person in a dinner party, and the dinner party to a virtual machine. As each vCPU of a virtual machine requests resources (a table), it is placed into a queue that is processed by the CPU Scheduler service (the host/hostess) on the ESXi host (the restaurant).
Just as you wouldn’t want your party of 8 to be separated among four different 2-person tables, the CPU Scheduler has to schedule all of the vCPUs in a virtual machine to run concurrently. The CPU Scheduler’s job is to pair each vCPU with a physical CPU core* (pCPU). The time that elapses between the instruction being placed into the queue by the vCPU and the instruction being executed by the pCPU is called CPU Ready. As CPU Ready times increase, the performance of the virtual machine decreases.
Figure 2 shows the time slots that an 8-core ESXi host has available to the CPU Scheduler.
In the 6-time-slot example in Figure 2, an instruction from a virtual machine with a single vCPU can be executed in any of the 48 available pCPU time slots (8 pCPUs × 6 time slots). In comparison, an 8 vCPU virtual machine would have only 6 opportunities to be executed, since all 8 pCPUs would need to be available at the same time to process its instruction, even if some of the vCPUs were idle. As the number of vCPUs increases, the opportunities for execution decrease, which increases the time the instruction waits to be scheduled, thus increasing CPU Ready and decreasing performance.
This is the reason that right-sizing your virtual
machines is so important. An oversized virtual
machine with too many vCPUs is similar to a party
of 4 going into a restaurant and asking for a table
that will seat 8 or 12. While this can be done, you
will be waiting longer for a larger table to become
available just to have empty seats doing absolutely
nothing.
*In this case, hyperthreading is ignored due to the fact that
the hypervisor attempts to avoid scheduling vCPUs from
the same virtual machine on threads of the same physical
CPU core.
Figure 2
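The slot counting described for Figure 2 can be sketched in a few lines of Python. This is only an illustration of the simplified model used in this paper: it assumes every vCPU of a virtual machine must run in the same time slot, each on its own pCPU, on an otherwise idle host. The function name and the way intermediate sizes are counted (disjoint groups of cores per time slot) are assumptions made for the example, not a description of the actual ESXi scheduler.

```python
def scheduling_opportunities(pcpus: int, timeslots: int, vcpus: int) -> int:
    """Count scheduling opportunities under the paper's simplified model.

    Assumption: all vCPUs of a VM must run in the same time slot, each on
    its own pCPU, and the host is otherwise idle.
    """
    if vcpus < 1 or vcpus > pcpus:
        return 0
    # In each time slot, count how many disjoint groups of `vcpus` cores fit.
    return timeslots * (pcpus // vcpus)


if __name__ == "__main__":
    # 8-core host with 6 time slots, as in the Figure 2 example.
    for vcpus in (1, 2, 4, 8):
        print(vcpus, "vCPU:", scheduling_opportunities(8, 6, vcpus))
```

With the numbers from the text, a 1 vCPU virtual machine gets 48 opportunities and an 8 vCPU virtual machine gets 6.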
The Expedient Virtual Colocation infrastructure is provisioned to customers as dedicated resources. A customer that subscribes to 24 units of Virtual Colocation is provided 24 GB of memory and 6 GHz of CPU. If a single pCPU is capable of 3 GHz, that customer is essentially provided 2 full pCPUs. However, this customer would still be able to assign more than 2 vCPUs to a single virtual machine. All 16 pCPUs would still be available for scheduling, but no more than 6 GHz could be used at any one time. If an 8 vCPU virtual machine were provisioned, each vCPU could run at no more than 25% (0.75 GHz of a 3 GHz pCPU) when all 8 run simultaneously. In this scenario, the virtual machine would perform better with 2 vCPUs that are each able to run at 100% utilization. The performance increase would come from more opportunities for 2 vCPUs to be scheduled versus 8 vCPUs being scheduled.
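The arithmetic behind the 25% figure can be checked with a short Python sketch. The function name and parameters are hypothetical, and the model is deliberately simple: it assumes the GHz limit is shared evenly across all vCPUs when they are busy at the same time, and that no vCPU can exceed the speed of one pCPU.

```python
def per_vcpu_ceiling(limit_ghz: float, pcpu_ghz: float, vcpus: int) -> tuple[float, float]:
    """Return (GHz per vCPU, fraction of one pCPU) when all vCPUs are busy.

    Assumption: the limit is split evenly across vCPUs, and a single vCPU
    can never run faster than one physical core.
    """
    if vcpus < 1:
        raise ValueError("a virtual machine needs at least one vCPU")
    ghz = min(limit_ghz / vcpus, pcpu_ghz)
    return ghz, ghz / pcpu_ghz


if __name__ == "__main__":
    # 6 GHz limit on 3 GHz cores, as in the example above.
    print(per_vcpu_ceiling(6.0, 3.0, 8))  # 0.75 GHz each, 25% of a pCPU
    print(per_vcpu_ceiling(6.0, 3.0, 2))  # 3.0 GHz each, 100% of a pCPU
```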
Recommendations
If possible, start with a single vCPU. Monitor the CPU usage over time and, if it is consistently using over 80-90% of the vCPU, assign an additional vCPU. Repeat the monitoring, adding additional vCPUs only as required.

Scale applications out instead of up. If a virtual machine is consuming all of the vCPUs assigned, consider scaling the application across multiple virtual machines if feasible. There will be more scheduling opportunities for two 2 vCPU virtual machines than for a single 4 vCPU virtual machine.

Only assign multiple vCPUs to virtual machines hosting applications that run multithreaded processes.

Limit the number of vCPUs to the number of pCPUs of a single CPU socket. The CPU Scheduler will perform faster if it only has to schedule the cores across a single physical CPU socket.
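The first and last recommendations above can be expressed as a small decision rule. The sketch below is one possible reading, not an Expedient tool: the function name, the 85% default threshold (a midpoint of the 80-90% range), and the definition of "consistently" as 90% of samples are all assumptions chosen for illustration. Usage samples are expected as fractions of the assigned vCPU capacity, gathered from whatever monitoring is already in place.

```python
def next_vcpu_count(current_vcpus, usage_samples, threshold=0.85,
                    sustained_fraction=0.9, cores_per_socket=None):
    """Suggest a vCPU count from observed usage (hypothetical helper).

    usage_samples: fractions (0.0 to 1.0) of assigned vCPU capacity in use.
    threshold: usage level treated as "busy" (assumed 85%).
    sustained_fraction: share of samples that must be busy to count as
        "consistently" busy (assumed 90%).
    cores_per_socket: optional cap, per the single-socket recommendation.
    """
    if current_vcpus < 1:
        raise ValueError("a virtual machine needs at least one vCPU")
    if not usage_samples:
        return current_vcpus  # no data, no change
    busy = sum(1 for u in usage_samples if u >= threshold)
    wants_more = busy / len(usage_samples) >= sustained_fraction
    # Grow by one vCPU at a time, and never past one socket's core count.
    if wants_more and (cores_per_socket is None or current_vcpus < cores_per_socket):
        return current_vcpus + 1
    return current_vcpus
```

Adding one vCPU at a time and then re-measuring mirrors the "repeat the monitoring" step; the function never suggests removing vCPUs, which would need its own criteria.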
Virtual Colocation & CPU Limits
Figure 3: An 8 vCPU virtual machine with a 6 GHz limit runs each vCPU at 0.75 GHz (6 GHz total used); a 2 vCPU virtual machine with a 6 GHz limit runs each vCPU at 3 GHz (6 GHz total used).