Infrastructure-as-a-Service compute clouds provide a flexible hardware platform on which customers host applications as a set of appliances, e.g., web servers or databases. Each appliance is a VM image containing an OS kernel and userspace processes, within which applications access resources via traditional APIs such as POSIX. However, the flexibility provided by the hypervisor comes at a cost: the addition of another layer in the already complex software stack which impacts runtime performance, and increases the size of the trusted computing base.
Given that modern software is generally written in high-level languages that abstract the underlying OS, we revisit how these appliances are constructed with our Mirage operating system. Mirage supports the progressive specialisation of source code, and gradually replaces traditional OS components with customisable libraries, ultimately resulting in "unikernel" VMs: sealed, fixed-purpose VMs that run directly on the hypervisor.
Developers no longer need to become sysadmins, expert in the configuration of all manner of system components, to use cloud resources. At the same time, they can develop their code using their usual tools, only making the final push to the cloud once they are satisfied their code works. As they explicitly link in components that would normally be provided by the host OS, the resulting unikernels are also highly compact: facilities that are not used are simply not included in the resulting microkernel binary.
This talk will describe the architecture of Mirage, and show a quick demonstration of how to build a web-server that runs as a unikernel on a standard Xen installation.
Mirage: extreme specialisation of virtual appliances
1. Mirage: OCaml Appliances
for Xen Clouds
Anil Madhavapeddy, University of Cambridge
with a merry crew
Haris Rotsos, Balraj Singh, Steven Smith
Jon Crowcroft, Steve Hand
(University of Cambridge)
Richard Mortier (University of Nottingham)
Thomas Gazagnaire (OCamlPro)
Dave Scott (Citrix Systems R&D)
openmirage.org @avsm
Monday, 27 August 12
2. Modern Stacks are Too Large
Application • Millions of lines of code & configuration
• Why build for clouds as for desktops?
Threads • We can simplify!
Language
Processes
OS Kernel
Hypervisor
Hardware
openmirage.org
Monday, 27 August 12
3. Modern Stacks are Too Large
Application • Millions of lines of code & configuration
• Why build for clouds as for desktops?
Threads • We can simplify!
Language
Processes Application Application
OS Kernel Language Language
Hypervisor Xen POSIX
Hardware Hardware Hardware
openmirage.org
Monday, 27 August 12
4. Modern Stacks are Too Large
Application • Millions of lines of code & configuration
• Why build for clouds as for desktops?
Threads • We can simplify!
Language
Processes Application Application
t
t
en
en
OS Kernel Language Language
m
ym
op
lo
Hypervisor Xen POSIX
el
ep
ev
D
D
Hardware Hardware Hardware
openmirage.org
Monday, 27 August 12
5. How Large is Large?
600
OCaml
500 C/C++
Lines of code (x 10 )
3
ASM
400
300
200
100
0
Li
gl
Bi
ht
O
O
N
M
O
nu
ib
pe
pe
ira
tp
nd
X-
c
d
x
nS
n
ge
9.
2.
za
2.
3.
vS
9.
SH
15
4.
2.
ku
w
0
2
2
itc
6.
h
0p
openmirage.org
1.
1
4.
0
Monday, 27 August 12
6. Why Do We Care?
• Critical memory bugs still occur regularly in mature software
• CVE-2012-1182 – Samba
– RPC code generator overflow
– Variable containing buffer length checked independently
of variable used to allocate memory for buffer
– Leads to root exploit
• CVE-2012-2110 – OpenSSL
– Integer conversion bug mixed with realloc wrappers
– Unsigned cast to signed, but realloc’d buffer not clamped
– Leads to heap corruption
openmirage.org
Monday, 27 August 12
7. The Cloud Threat Model
• Attack from without, not within: trust
hypervisor, but not I/O traffic or other VMs.
• VMs are no longer multi-user, but primarily
single-purpose deployments
• …and they are in a multi-tenant datacenter
• …and they are always network connected
openmirage.org
Monday, 27 August 12
9. Key Features of Mirage
• Static typing
– Pragmatic functional language (OCaml)
– Large set of libraries provided, some used in XCP already
Net.Manager.bind
(fun mgr dev !
let src = ’any_addr, 53 in
Dns.Server.listen dev src zones
)
openmirage.org
Monday, 27 August 12
10. Key Features of Mirage
• Static typing
– Pragmatic functional language (OCaml)
– Large set of libraries provided, some used in XCP already
• Cooperative concurrency
– Wrapped up in Lwt syntax extensions
– Threads encapsulated and hidden within typed modules
let main () =
lwt zones = read key "zones" "zone.db" in
Net.Manager.bind (fun mgr dev !
let src = ’any_addr, 53 in
Dns.Server.listen dev src zones)
openmirage.org
Monday, 27 August 12
11. Key Features of Mirage
• Static typing
– Pragmatic functional language (OCaml)
– Large set of libraries provided, some used in XCP already
• Cooperative concurrency
– Wrapped up in Lwt syntax extensions
– Threads encapsulated and hidden within typed modules
• Library-based operating system
– Fully re-entrant functional libraries
• No dynamic loading
– Configuration evaluated at compile-time, and sealed
– Recompile and redeploy to reconfigure
openmirage.org
Monday, 27 August 12
12. Progressive Specialization
Develop Test Deploy
ubuild posix-socket ubuild posix-direct ubuild xen-direct
kernel sockets tuntap+safe I/O stack safe device drivers
bytecode VM x86_64 native code link time optimisation
ELF REPL ELF ELF ELF μ μ μ μ μ μ
Linux Linux FreeBSD Xen
openmirage.org
Monday, 27 August 12
13. Progressive Specialization
• Development under UNIX:
Develop Test Deploy
– full debugging environment
ubuild posix-socket
(bytecode, debugger, xen-direct
ubuild posix-direct ubuild
gdb)
kernel sockets tuntap+safe I/O stack safe device drivers
– can use kernel sockets or tuntap
bytecode VM x86_64 native code and IO to optimisation
networking link time isolate bugs
ELF REPL ELF interactive REPLμfor editor μ μ
– ELF ELF μ μ μ
integration and other niceties.
Linux Linux FreeBSD Xen
openmirage.org
Monday, 27 August 12
14. Progressive Specialization
• Testing:
Develop Test Deploy
– Simulation
ubuild posix-socket ubuild posix-direct ubuild xen-direct
backend using
NS3/MPI
kernel sockets tuntap+safe I/O stack safe device drivers
bytecode VM
– Emulate your x86_64 native code link time optimisation
code at scale
before REPL
ELF ELF ELF ELF μ μ μ μ μ μ
deploying it
Linux Linux FreeBSD Xen
openmirage.org
Monday, 27 August 12
15. Progressive Specialization
• Xen output kernel is specialised:
Develop Test
– “operating system” is a set of Deploy
libraries (e.g Netfront, Blkfront,
ubuild posix-socket ubuild posix-direct ubuild xen-direct
TCP) linked with small C runtime.
– Configuration files are evaluated at
kernel sockets tuntap+safe I/O stack safe device drivers
this stage, and image must be
bytecode VM x86_64 native code
recompiled to reconfigure. link time optimisation
– Data files can also be linked in
relatively small. ELF ELF
directly, ifREPL
ELF ELF μ μ μ μ μ μ
– Dead-code elimination is across
Linux Linux FreeBSD Xen
whole OS image.
openmirage.org
Monday, 27 August 12
17. Optional VM Seal Hypercall
• Single address-space and no dynamic loading
– W^X address space
– Address offsets are randomized at compile-time
• Dropping page table privileges:
– Added freeze hypercall called just before app starts
– Subsequent page table updates are rejected by Xen.
– Exception for I/O mappings if they are non-exec and do
not modify any existing mappings.
• Very easy in unikernels due to focus on compile-time
specialisation instead of run-time complexity.
openmirage.org
Monday, 27 August 12
18. Event Driven co-threads
Cumulative frequency (%)
100
80
60
40 xen-direct
20 linux-native
linux-pv
0
0 0.05 0.1 0.15 0.2
Jitter (ms)
The microkernel is event-driven, with no pre-emption at all.
Graph shows CDF of thread wakeup latency for a Mirage VM
running directly on Xen, vs native or PV Linux userspace.
openmirage.org
Monday, 27 August 12
19. Single-instance Thread Scaling
4 linux-pv
Execution time (s)
linux-native
3 xen-direct (malloc)
xen-direct (extent)
2
1
0
0 5 10 15 20
Number of threads (millions)
Threads are heap allocated values, so benefit from the faster
garbage collection cycle in the Mirage Xen version, and the
scheduler can be overridden by application-specific needs.
openmirage.org
Monday, 27 August 12
20. Appliance Image Size
Appliance Standard Build Dead Code Elimination
DNS 0.449 MB 0.184 MB
Web Server 0.674 MB 0.172 MB
Openflow learning switch 0.393 MB 0.164 MB
Openflow controller 0.392 MB 0.168 MB
All configuration and data compiled into the
image by the toolchain, so no separate VBD
required beyond the PV kernel.
Live migration is easy and fun :-)
openmirage.org
Monday, 27 August 12
21. Microbenchmarks: Boot Time
linux-pv minimal-linux xen-direct
3
Time (s)
2
1
0
8
16
32
64
12
25
51
10
20
30
8
6
2
24
48
72
Memory size (MiB)
• Minimal Linux is a custom initrd which directly calls ioctls to
send a UDP packet. The Linux-PV is more realistic.
• The Xen-Direct is a standard Mirage VM, and is limited by
dom0 toolstack latency (XCP in this case).
openmirage.org
Monday, 27 August 12
22. Zero-Copy IO Buffer Management
free
IO Page Pool
recycle
get_page make_writebuf channel output_buf reset_buf
uninitialised reference
Grant Table
fixed data
mutable data Ring response
request
openmirage.org
Monday, 27 August 12
23. Microbenchmarks: TCP
1000
Throughput (reqs/s x 10 )
3
linux-pv, tx & rx
800 linux-pv tx, xen-direct rx
xen-direct tx, linux-pv rx
600
• Simple throughput test with all
400 offload turned off to stress CPU
• Performance bug in TX path
200
– …being fixed (ACKs being sent
out of order)
0
10 flo
1 flo
si w
pa ws
ng
ra
le
lle
l
openmirage.org
Monday, 27 August 12
25. Microbenchmarks: Block Storage
• Same Ring API as Net
– Unlike POSIX (libaio and so on)
– Caching is controlled via libraries, not API
• No reordering in the front-end
– How do we know what the backend storage is?
70
60 Single spindle SATA Linux domU
Throughput (MiB/s)
50 direct/buffered
40
30
20 Mirage
10
0
1
2
4
8
16
32
64
12
25
51
10
20
openmirage.org 40
8
6
2
24
48
96
Monday, 27 August 12 Block size (KiB)
26. Microbenchmarks: Summary
• Mirage is comparable to or exceeds Linux PV performance
– I.e., the functional programming language
overheads are offset via single-address space
specialization and careful buffer management.
• But what about in some more “realistic” scenarios?
– DNS server appliance
– HTTP server scaling across 6 vCPUs
– OpenFlow Controller and Switch implementations
openmirage.org
Monday, 27 August 12
27. DNS Server Performance
80
Throughput (reqs/s x 10 )
3
70
60 Bind9, Linux
50 NSD, Linux
NSD, MiniOS -O
40
NSD, MiniOS -O3
30 Mirage (no memo)
20 Mirage (memo)
10
0
100 1000 10000
Zone size (entries)
openmirage.org
Monday, 27 August 12
28. DNS Server Performance
let main () =
lwt zones = read key "zones" "zone.db" in
Net.Manager.bind (fun mgr dev →
let src = ’any_addr, 53 in
Dns.Server.listen dev src zones
)
openmirage.org
Monday, 27 August 12
30. Openflow Controller performance
180
160 maestro
Requests/s (x 10 )
3
140 nox-fast
120 xen-direct
100
80
60
40
20
0
Batch Single
• Openflow controller is competitive with Nox (C++), but much
more high-level. Applications can link directly against the
switch to route their data.
openmirage.org
Monday, 27 August 12
31. Design Discussion
• Service domains should be built this way
– Debugging and moving from dom0/domU is a lot easier.
– Dependencies are explicitly managed.
– Resource control can be fine-grained.
– Very useful for unit testing Xen features!
• Which language?
– Several viable alternatives: HalVM (Haskell), GuestVM
(Java), ErlangOnXen. More code sharing worthwhile?
– Protocol implementations take time to mature.
– Working on Mirage/XCP integration via “project Windsor”
• Stub domains?
– Less practical for stub domains, due to hardware devices.
Turn Linux into a library OS?
openmirage.org
Monday, 27 August 12
34. Mirage Roadmap
• Developer Release for Q3 2012
– First release will have pure-OCaml implementations of:
• Device drivers (netfront/blkfront/xenstore)
• TCP/IPv4 and DHCPv4
• HTTP
• DNS(SEC)
• SSH
• OpenFlow (controller/switch)
• vchan IPC
• 9P :-)
• NFS
• FAT32
• Distributed k/v store: arakoon.org
openmirage.org
Monday, 27 August 12
35. Mirage Roadmap
• Online resources:
– http://www.openmirage.org
– http://tutorial.openmirage.org
– https://lists.cam.ac.uk/mailman/listinfo/cl-mirage
– (draft in Q4 2012) http://realworldocaml.org
• Offline resources:
– Find me at the hotel pool
openmirage.org
Monday, 27 August 12