Electricity accounts for approximately 27% of total data-center operating cost. For this reason, ARM servers have been attracting attention for future energy-efficient data centers, and the performance of ARM processors keeps increasing (approaching 3 GHz). To utilize ARM cores efficiently, ARM PVH was introduced in Xen 4.3; building on it, we implemented a live migration feature and evaluated it on a dual-core ARM board. Specifically, we chose a multimedia streaming workload, measured the maximum number of concurrent clients, and used clients per watt (CPW) as the performance metric. We found that even a dual-core ARM processor with virtualization yields higher CPW (7 CPW) than the x86 case (6 CPW). In addition, server consolidation reduced energy consumption by around 70% (4-to-1 consolidation of lightly loaded servers).
XPDS13: Performance Evaluation of Live Migration based on Xen ARM PVH - Jaeyong Yoo, Samsung
1. Performance Evaluation of Live Migration based on Xen ARM PVH for Energy-efficient ARM Server
2013-10-24
Jaeyong Yoo, Sangdok Mo, Sung-Min Lee, ChanJu Park, Ivan Bludov, Nikolay Martyanov
Software R&D Center, Samsung Electronics
Software Center
2. Contents
• Motivation
• Live Migration in Xen ARM PVH
– Design and Implementation
• Performance Evaluation
1. Streaming service with ARM vs. x86
2. Streaming server consolidation with live migration
3. Streaming service with quad-core ARM board
• Concluding Remark
4. Energy Problem in Datacenters
• Datacenters eat up a tremendous amount of electricity
• Datacenter operation cost breakdown:
  – Electricity (27%)
  – Engineering & Installation (19%)
  – Space (17%)
  – Power Equipment (17%)
  – Service (13%)
  – Cooling Equipment (6%)
  – Racks (3%)
Ref: Jaroslav Rajić, "Evolving Toward the Green Data Center," http://stack.nil.si/ipcorner/GreenDC/#chapter2
5. ARM Servers for Future Green Data Center
• Economical choice
  – Significant advantage in compute/watt
• Vendors of ARM server SoCs
  – AMD: Seattle (64-bit ARM server processor, 2H 2014)
  – Calxeda: Energy Core ECX-1000
  – Applied Micro: X-Gene
• OS for ARM servers
  – Linaro LEG
  – Red Hat deploys ARM-based servers for the Fedora Project
6. ARM Servers for Future Green Data Center
• Further energy-efficiency maximization: server consolidation by virtualization
8. Overall Architecture
• Components for live migration in Xen ARM PVH (diagram summarized):
  – Dom0 toolstack: libvirt → xl → libxl → libxc, plus a newly implemented ARM-migrate module (perform-migrate)
  – DomU: apache, mysql, and a streaming server on the guest kernel
  – Hypervisor: dirty-page detecting (get dirty bitmap), memory-data save/restore, memory-map get/set, VCPU save/restore, HVM-context save/restore, suspend/resume
  – Hardware (Arndale): Cortex-A15 dual-core 1.7 GHz, 2GB memory, SATA3, USB 3.0
  – Legend distinguishes existing, newly implemented, and modified modules
9. Sequence of Live Migration
• Migration source (xl/xc: migrate-domain -save) and migration destination (xl/xc: receive-domain -restore); the diagram reduces to these steps:
  1. Get/set the memory map
  2. Start dirty-page tracing; store dirty pages
  3. Loop until stop-condition: get the dirty bitmap, save/restore memory contents
  4. Suspend domU; transfer the last dirty pages
  5. Save/restore the HVM context
  6. Save/restore the VCPU context
  7. Resume DomU at the destination
10. Major Hypercalls for Live Migration
• Hypercalls implemented to enable the live migration feature in Xen ARM PVH:
  – Memory migration
    • XENMEM_get/set_memory_map: save/restore the physical memory map of DomU
    • XEN_DOMCTL_shadow_op: enable dirty-page detection; get the dirty-page bitmap
    • XENMEM_add_to_physmap_range: access the domU's memory from dom0
  – VCPU migration
    • XEN_DOMCTL_get/setvcpucontext: save/restore the VCPU registers
  – HVM migration
    • XEN_DOMCTL_get/sethvmcontext: save/restore the HVM contexts (e.g., timer, interrupt controller)
11. Dirty-page Tracing: Get-dirty Bitmap
• libxc (ARM-migrate) issues the XEN_DOMCTL_shadow_op hypercall (peek dirty pages); the toolstack passes the dirty-page bitmap as a hypercall parameter
• Xen fills the bitmap from its temporary dirty-page store
• Candidates for storing detected dirty pages:
  1. Embedded in the page table (use unused bits in the PTE)
  2. Linked list of PFNs
  3. Bitmap of PFNs
12. Dirty-page Tracing: Dirty-page Detection
• Address translation: a guest VA is translated by the guest page table (levels 1-3) to an IPA, and the IPA is translated by Xen's p2m (physical-to-machine) page table (levels 1-3) to an MA
• Xen maintains separate page tables: one side for Xen itself, one side for the domU (the p2m)
13. Dirty-page Tracing: Dirty-page Detection
• To start tracing, Xen clears the write bit (w=0) in the level-3 PTEs of the domU's p2m page table
14. Dirty-page Tracing: Dirty-page Detection
• When the guest issues a write request to a page whose p2m PTE has w=0, the resulting permission fault is trapped by Xen, which records the page as dirty
16. Manual Walking of p2m Table
• To modify the write bit of a level-3 PTE for a given IPA, Xen walks the domU's p2m table by hand: at each level (1, 2, 3) it creates a temporary mapping of the table page into Xen (3 mappings per walk), checks for superpages at level 2, then modifies the w bit of the PTE pointing to the machine address (MA)
17. Virtual-linear Page Table
• Consider the third-level page tables as one contiguous memory block in virtual address space
  – Virtually contiguous third-level page table (an 8GB DomU requires 16MB of third-level page tables)
  – Xen's own page table (levels 1-3) maps the guest's scattered third-level tables (e.g., #1, #2, #5) back-to-back in virtual memory
ref: http://www.technovelty.org/linux/virtual-linear-page-table.html
18. Virtual-linear Page Table
• With the third-level tables virtually contiguous: for a given IPA, with some arithmetic, calculate the Xen VA of its PTE and just read it (no manual walk)
ref: http://www.technovelty.org/linux/virtual-linear-page-table.html
20. Experiment Environment (Hardware/Software)
• x86 hardware
  – 8 cores (i7-2600 3.4GHz)
  – Intel 1Gbps NIC
  – 4GB memory
• ARM hardware
  – Arndale board, 2 cores
  – 1Gbps network card (USB 3.0)
  – SSD mSATA
  – 2GB memory
• Xen source: Xen 4.4 staging
• Domain kernels:
  – Dom0: Linaro kernel 3.11
  – DomU: Linaro kernel 3.9
• Streaming server: ffserver (RTSP streaming)
• Testbed: Exp. Platform 1 (x86 HW, Linux, streaming server) and Exp. Platform 2 (Arndale board, Xen, Linux guests running streaming servers); clients attached via a 1G switch; 220V power source metered with a Yokogawa WT3000 power meter
21. Experiment Environment (Hardware/Software)
• Note: the major evaluations are performed on a mobile-featured ARM board; a performance evaluation of a server-featured ARM board is presented at the end of the slides.
22. Experiment Environment (Scenarios)
• Test case 1: Streaming service with ARM vs. x86
  – Saturate the streaming server to get the maximum number of streaming clients
  – Measurement 1: maximum number of streaming clients for each test platform
  – Measurement 2: energy-efficiency comparison for each test platform
• Test case 2: Streaming server consolidation with live migration
  – Load each server with 10% of the maximum number of streaming clients
  – Consolidate streaming servers within Xen-virtualized servers
  – Measurement 3: total live migration time, service downtime
  – Measurement 4: dirty-page detection time, dirty-page get-bitmap time, total dirty-page counts
• Test case 3: Streaming with quad-core ARM board (in progress)
  – Maximum clients with varying number of ARM cores
23. Case 1: Streaming Service ARM vs. x86 (Maximum capacity of ARM virtualized server)
• Max streaming clients with varying number of VMs
  – Dual-core ARM board, single VCPU for each VM

  Number of VMs | Per-VM Memory | Max Streaming Clients | Watt
  1             | 512MB         | ~110                  | 14.8
  2             | 512MB         | ~80                   | 12.6
  3             | 256MB         | ~90                   | 14.5
  4             | 256MB         | ~80                   | 11.8

• Finding: the ARM cores are the major bottleneck
24. Case 1: Streaming Service ARM vs. x86 (Energy-efficiency comparison to x86 hardware)
• Compare with the best case of ARM* virtualization

  OS                      | Total memory in server | Max Streaming Clients | Watt    | Clients/Watt | Required memory
  x86 with Linux          | 4GB                    | ~750                  | 121.5 W | 6.17 CPW     | ~2.4GB
  ARM with native Linux   | 2GB                    | ~200                  | 11.7 W  | 17.09 CPW    | ~707MB
  ARM with virtualization | 512MB                  | ~110                  | 14.8 W  | 7.43 CPW     | ~340MB

  * Dual-core ARM CPU
• Finding: even dual-core ARM with virtualization shows higher CPW than x86
25. Case 2: Streaming Server Consolidation of ARM Virtualized Server
• Scenario:
  – 4 ARM boards, each running a 256MB VM
  – Each VM has 10 clients
  – Consolidate all VMs onto one ARM board, and turn off the other 3 ARM boards

  Consolidation         | Watts before | Watts after | Energy saving
  2-to-1 [extrapolated] | 2 x 8w = 16w | 8.6w        | 46% saving
  3-to-1 [extrapolated] | 3 x 8w = 24w | 8.9w        | 63% saving
  4-to-1                | 4 x 8w = 32w | 9.4w        | 71% saving

• Finding: server consolidation can significantly save energy consumption
26. Case 2: Live Migration Performance
• Migrate one VM at a time
  – With different domU memory sizes (128MB, 256MB, 512MB)
• Measurements:
  – Live migration time
    • Whole time for live migration
  – Total dirty pages
    • Number of pages dirtied during live migration
27. Case 2: Live Migration Performance
• Number of dirty pages per iteration
• Configuration for stop-condition:
  – max iter: 29
  – max_mem_factor: 3
  – min_dirty_per_iter: 50
28. Case 2: Service Downtime due to Live Migration
• Service downtime
  – The time during which the VM is not responding to outside interaction
  – Measurement method:
    • flood-ping the migrating domain
    • take the time difference between packets sent from the migrating domain
29. Case 2: Performance of Dirty-page Detection
• Measure the elapsed time of two major functions:
  – dirty-page detection
  – dirty-page collection
30. Case 3: Quad-core ARM Board (In progress)
• ARM board: 4 ARM cores with 8GB memory

  Number of VMs | Per-VM Memory | Max Streaming Clients | Watt   | CPW
  1             | 1GB           | ~120                  | 17.0 W | 7.06 CPW
  2             | 1GB           | ~250                  | 18.5 W | 13.51 CPW
  3             | 1GB           | ~300                  | 18.9 W | 15.87 CPW

• x86 case (see slide 24):

  OS             | Total memory | Max Streaming Clients | Watt    | Clients/Watt
  x86 with Linux | 4GB          | ~750                  | 121.5 W | 6.17 CPW
31. Concluding Remark
• ARM servers are a good candidate for green data centers
  – Even ARM mobile processors with virtualization achieve better CPW than x86
  – Virtualization on ARM servers can further improve energy efficiency through server consolidation
• Device pass-through to DomU could significantly increase performance