From L3 to seL4: What have we learnt in 20 years of L4 microkernels

From L3 to seL4:
What Have We Learnt in
20 Years of L4 Microkernels?
Kevin Elphinstone & @GernotHeiser
NICTA and UNSW Australia
SOSP'13

Copyright Notice
These slides are distributed under the Creative Commons
Attribution 3.0 License
• You are free:
– to share—to copy, distribute and transmit the work
– to remix—to adapt the work
• under the following conditions:
– Attribution: You must attribute the work (but not in any way that
suggests that the author endorses you or your use of the work) as
follows:
• “Courtesy of Gernot Heiser, [Institution]”, where [Institution] is one of
“UNSW” or “NICTA”
The complete license text can be found at
http://creativecommons.org/licenses/by/3.0/legalcode
©2013 Gernot Heiser, NICTA
COMP9242 S2/2014 W01

1993: “Improving IPC by Kernel Design” [SOSP]

1993 IPC Performance
[Figure: IPC time [μs] (0 to 400) against message length [B] (0 to 6000) for Mach, L4 and raw copy, on an i486 @ 50 MHz]
• Mach: 115 μs
• L4: 5 μs
• Culprit: cache footprint [SOSP’95]

IPC Performance over 20 Years
Name       Year  Processor               Clock [MHz]  Cycles  Time [μs]
Original   1993  i486                    50           250     5.00
Original   1997  Pentium                 160          121     0.75
L4/MIPS    1997  R4700                   100          86      0.86
L4/Alpha   1997  21064                   433          45      0.10
Hazelnut   2002  Pentium 4               1,400        2,000   1.38
Pistachio  2005  Itanium                 1,500        36      0.02
OKL4       2007  XScale 255              400          151     0.64
NOVA       2010  i7 Bloomfield (32-bit)  2,660        288     0.11
seL4       2013  i7 Haswell (32-bit)     3,400        301     0.09
seL4       2013  ARM11                   532          188     0.35
seL4       2013  Cortex A9               1,000        316     0.32
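
At a fixed clock rate, a cycle count converts to time as μs = cycles / MHz. The following is a small sketch of that arithmetic, checked against two rows of the table; the function name `ipc_latency_us` is made up for this illustration and is not from the talk.

```python
def ipc_latency_us(cycles: int, clock_mhz: float) -> float:
    """Convert an IPC cost in cycles to microseconds.

    A clock of f MHz executes f cycles per microsecond,
    so latency [us] = cycles / f.
    """
    return cycles / clock_mhz


# Two rows taken from the table above.
print(f"{ipc_latency_us(250, 50):.2f}")    # Original, i486 @ 50 MHz -> 5.00
print(f"{ipc_latency_us(301, 3400):.2f}")  # seL4, i7 Haswell @ 3,400 MHz -> 0.09
```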

Core Microkernel Principle: Minimality
A concept is tolerated inside the
microkernel only if moving it outside
the kernel, i.e. permitting competing
implementations, would prevent the
implementation of the system’s
required functionality. [SOSP’95]

Minimality: Source Code Size
All sizes in kSLOC.
Name         Architecture  C/C++  asm   Total
Original     i486          0      6.4   6.4
L4/Alpha     Alpha         0      14.2  14.2
L4/MIPS      MIPS64        6.0    4.5   10.5
Hazelnut     x86           10.0   0.8   10.8
Pistachio    x86           22.4   1.4   23.0
L4-embedded  ARMv5         7.6    1.4   9.0
OKL4 3.0     ARMv6         15.0   0.0   15.0
Fiasco.OC    x86           36.2   1.1   37.6
seL4         ARMv6         9.7    0.5   10.2

L4 Family Tree
[Figure: L4 family tree on a timeline from 1993 to 2013; edges show API inheritance and code inheritance]
• Kernels: L3 → L4 “X”, L4/MIPS, L4/Alpha, Hazelnut, Pistachio, L4-embedded, OKL4 μKernel, OKL4 Microvisor, seL4, Fiasco, Fiasco.OC, NOVA, Codezero, P4 → PikeOS
• Groups: GMD/IBM/Karlsruhe, UNSW/NICTA, Dresden, OK Labs, Commercial Clone

L4 Deployments – in the Billions
SiMKo 3 “Merkelphone”

seL4: Unprecedented Dependability
[Figure: seL4 proof chain from security properties through the abstract model and the C implementation down to binary code]
• Abstract Model to C Implementation: proof of functional correctness [SOSP’09]
• C Implementation to Binary code: proof of translation correctness [PLDI’13]
• Security properties proved:
– Integrity [ITP’11]
– Confidentiality: non-interference [S&P’13]
– Availability
• Timeliness [RTSS’11, EuroSys’12]
• First & only OS kernel with security proofs to binary code
• First & only protected-mode OS kernel with sound timeliness analysis

L4 Design and Implementation
Implementation Tricks [SOSP’93]
• Process kernel
• Virtual TCB array
• Lazy scheduling
• Direct process switch
• Non-preemptible
• Non-portable
• Non-standard calling convention
• Assembler
Design Decisions [SOSP’95]
• Synchronous IPC
• Rich message structure, arbitrary out-of-line messages
• Zero-copy register messages
• User-mode page-fault handlers
• Threads as IPC destinations
• IPC timeouts
• Hierarchical IPC control
• User-mode device drivers
• Process hierarchy
• Recursive address-space construction
Objective: Minimise cache footprint and TLB misses

DESIGN

L4 Synchronous IPC
[Figure: rendezvous model. Thread1 (Running, then Blocked) issues Send (dest, msg); Thread2 (Blocked, then Running) issues Wait (src, msg); the kernel copies the message]
Kernel executes in sender’s context
• copies memory data directly to receiver (single-copy)
• leaves message registers unchanged during context switch (zero copy)
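
The rendezvous model can be illustrated with a toy simulation. This is a minimal Python sketch, not L4 code: the `Thread` class, the state names and the `send`/`wait` helpers are all assumptions made for illustration. Whichever party arrives first blocks, and the message is handed over exactly once, directly from sender to receiver, with no kernel buffer in between.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Thread:
    name: str
    state: str = "running"  # "running", "send_blocked" or "recv_blocked"
    outgoing: Optional[Tuple["Thread", str]] = None  # (dest, msg) while blocked in send
    received: Optional[str] = None  # last message delivered to this thread


def send(sender: Thread, dest: Thread, msg: str) -> None:
    """Rendezvous send: deliver at once if dest is waiting, else block the sender.

    Simplification: a waiting receiver accepts a message from any sender.
    """
    if dest.state == "recv_blocked":
        dest.received = msg  # the single copy: sender -> receiver
        dest.state = "running"
    else:
        sender.outgoing = (dest, msg)  # message stays with the blocked sender
        sender.state = "send_blocked"


def wait(receiver: Thread, src: Thread) -> None:
    """Rendezvous receive: take src's pending message if any, else block."""
    if src.state == "send_blocked" and src.outgoing[0] is receiver:
        receiver.received = src.outgoing[1]
        src.outgoing = None
        src.state = "running"
    else:
        receiver.state = "recv_blocked"


t1, t2 = Thread("Thread1"), Thread("Thread2")
wait(t2, t1)           # receiver arrives first and blocks
send(t1, t2, "hello")  # sender finds it waiting: direct hand-over
print(t2.state, t2.received)  # running hello
```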

“Long” IPC
[Figure: the kernel copies from the sender address space to the receiver address space; a page fault can occur during the copy]
• IPC page faults are nested exceptions ⇒ in-kernel concurrency
– L4 executes with interrupts disabled for performance, no concurrency
• Must invoke untrusted user-mode page-fault handlers
– potential for denial of service (DoS) against the other thread
• Timeouts to avoid DoS attacks
– complexity
Why have long IPC?
• POSIX-style APIs, e.g. write (fd, buf, nbytes)
• Usually prefer shared buffers
Long IPC: ABANDONED

Timeouts
[Figure: Thread1 (Running, then Blocked) issues Send (dest, msg); Thread2 (Blocked, then Running) issues Wait (src, msg); the kernel copies the message]
• Timeouts limit IPC blocking time
[Figure: timed wait. Thread1 (Running, then Blocked) issues Rcv(NIL_THRD, delay)]
• No theory/heuristics for determining timeouts
• Typically a server replies with zero timeout, everything else uses ∞
• Can do timed wait with timer syscall
IPC timeouts: ABANDONED in seL4, OKL4
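
The observation that only two timeout values see real use can be made concrete. This is a hedged sketch in Python; `ZERO`, `INFINITE` and `try_send` are hypothetical names for illustration, not part of any L4 API. It shows how a send behaves when the receiver is not ready.

```python
import math

ZERO = 0.0           # hypothetical "do not block at all" timeout
INFINITE = math.inf  # hypothetical "never time out" value


def try_send(receiver_ready: bool, timeout: float) -> str:
    """Outcome of a send for a given timeout (toy model)."""
    if receiver_ready:
        return "delivered"
    if timeout == ZERO:
        return "failed"   # server reply: never block on an unready client
    if timeout == INFINITE:
        return "blocked"  # client request: wait until the server is ready
    # Any other value needs a heuristic nobody knows how to choose.
    return "blocked-until-timeout"


print(try_send(False, ZERO))      # failed
print(try_send(False, INFINITE))  # blocked
```

With only these two cases left, a zero timeout amounts to a non-blocking send and an infinite one to a plain blocking send, which is why a general timeout parameter adds little.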

Synchronous IPC Issues
[Figure: a single thread, Thread1 (Running, then Blocked), calls Initiate_IO(…,…) and later IO_Wait(…,…): not generally possible]
[Figure: two threads, Worker_Th (Running, then Blocked) and IO_Th (Blocked, then Running); calls shown: Unblock (IO_Th), Call (IO,msg), Sync(Worker_Th), Sync(IO_Th)]
• Sync IPC forces multi-threaded code!
• Also poor choice for multi-core

Asynchronous Notifications
[Figure: Thread1 issues Send (Thr_2, …) twice; Thread2 issues w = Poll (…) and later w = Wait (…); thread states shown: Running, Blocked]
• Delivers few bits (destructively)
• Kernel only buffers single word
• Maps well to interrupts, exceptions
• Thread can wait for synchronous and asynchronous messages concurrently
[Figure: Client, Server and Driver connected by Sync and Async IPC]
Sync IPC complemented with async
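
A single-word notification buffer can be modelled as follows. This is a minimal Python sketch under two assumptions: sends are merged into the word with a bitwise OR, and a read returns the word and clears it (one reading of "destructively"). The class and method names are illustrative only, and the blocking `Wait` is not modelled.

```python
class Notification:
    """One-word, bit-accumulating signal buffer (illustrative model only)."""

    def __init__(self) -> None:
        self.word = 0  # the only state the kernel keeps per notification

    def send(self, bits: int) -> None:
        """Never blocks: merge the sender's bits into the word."""
        self.word |= bits

    def poll(self) -> int:
        """Return the accumulated bits and clear them (0 if nothing pending)."""
        w, self.word = self.word, 0
        return w


n = Notification()
n.send(0x01)  # e.g. one interrupt source
n.send(0x10)  # e.g. another source, before the receiver has looked
print(hex(n.poll()))  # 0x11: both signals seen, but not how often each fired
print(hex(n.poll()))  # 0x0: reading was destructive
```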

IPC Destination Naming
Original L4 addressed IPC to threads
[Figure: Client sends IPC directly to Server threads. With a load balancer in front of Workers: all IPCs duplicated! Without one: client must do load balancing? RPC reply from wrong thread!]
• Inefficient designs
• Poor information hiding
• Covert channels [Shapiro ’02]
Thread IDs replaced by IPC “endpoints” (ports)

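Endpoints
[Figure: Client and Server communicate by IPC through an endpoint: Send on one side, Rcv on the other]
Sync EP
• Sync EP queues senders/receivers
• Does not buffer messages
Async EP
• Async EP accumulates bits
[Figure: values 0x01, 0x10 and 0x30 sent to an Async EP and accumulated]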
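
A synchronous endpoint that queues waiting parties rather than messages might look like this. It is a sketch in Python with made-up names (`SyncEndpoint`, `send`, `recv`, `delivered`), not the seL4 API. A queued entry stands for a blocked sender still holding its own message, so the endpoint itself buffers nothing.

```python
from collections import deque


class SyncEndpoint:
    """Toy synchronous endpoint: queues blocked threads, not messages."""

    def __init__(self) -> None:
        self.senders = deque()    # blocked senders as (name, their message)
        self.receivers = deque()  # names of blocked receivers
        self.delivered = []       # log of (sender, receiver, msg) hand-overs

    def send(self, sender: str, msg: str) -> bool:
        """Hand over to a waiting receiver, else block. True if delivered."""
        if self.receivers:
            self.delivered.append((sender, self.receivers.popleft(), msg))
            return True
        self.senders.append((sender, msg))
        return False

    def recv(self, receiver: str):
        """Take the message of the first blocked sender, else block (None)."""
        if self.senders:
            sender, msg = self.senders.popleft()
            self.delivered.append((sender, receiver, msg))
            return msg
        self.receivers.append(receiver)
        return None


ep = SyncEndpoint()
ep.recv("worker1")          # both workers wait on the same endpoint
ep.recv("worker2")
ep.send("client", "req-A")  # the client names the endpoint, not a thread
ep.send("client", "req-B")
print(ep.delivered)
```

Because the client addresses the endpoint, it neither knows nor chooses which worker serves a request, so no load-balancer thread has to forward each IPC.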

Other Design Issues
IPC Control: “Clans & Chiefs”
[Figure: a clan of threads with its chief; IPC outside clan re-directs to chief]
Process Hierarchy
[Figure: processes create child processes; hierarchical resource management]
Inflexible, clumsy, inefficient hierarchies!
Hierarchies replaced by delegatable cap-based access control

IMPLEMENTATION

Virtual TCB Array
[Figure: the thread ID indexes an array of TCBs in kernel virtual memory (VM); each entry holds the TCB proper and the kernel stack]
• Fast TCB & stack lookup
• Get own TCB base by masking stack pointer
• Trades cache for TLB footprint and virtual address space
• Not worthwhile on modern processors!
• Stacks can dominate kernel memory use!
• Trades TLB footprint for cache and kernel memory
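
Both lookups are plain address arithmetic, shown here on Python integers. The constants are assumptions chosen for illustration (a 4 KiB, power-of-two-sized and equally aligned slot holding TCB plus kernel stack, and an arbitrary array base); real kernels pick their own values.

```python
TCB_SIZE = 0x1000            # assumed: 4 KiB slot per thread (TCB + kernel stack)
TCB_AREA_BASE = 0xE000_0000  # assumed: base of the virtual TCB array


def tcb_of_thread(thread_no: int) -> int:
    """Thread number -> TCB address: one multiply-add, no table walk."""
    return TCB_AREA_BASE + thread_no * TCB_SIZE


def tcb_of_stack_pointer(sp: int) -> int:
    """Own TCB base: clear the low bits of any address inside the slot."""
    return sp & ~(TCB_SIZE - 1)


sp = 0xE000_5F80  # some kernel stack pointer inside thread 5's slot
print(hex(tcb_of_thread(5)))          # 0xe0005000
print(hex(tcb_of_stack_pointer(sp)))  # 0xe0005000
```

The masking trick only works if each slot is aligned to its own size, which is one reason the per-thread stack size directly drives kernel memory use.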

“Lazy” Scheduling
• In IPC-based systems, threads block and unblock frequently
• Many ready queue manipulations
Idea: leave blocked threads in ready queue, scheduler cleans up
thread_t schedule() {
    foreach (prio in priorities) {
        foreach (thread in runQueue[prio]) {
            if (isRunnable(thread))
                return thread;
            else
                /* blocked thread left in the queue: remove it lazily */
                schedDequeue(thread);
        }
    }
    return idleThread;
}
Scheduler execution time is unbounded!
“Benno scheduling”:
• All threads on ready queue are runnable
• All runnable threads in ready queue except the running one
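
The difference between the two schemes can be seen in a toy model. This Python sketch rests on assumptions made for illustration: one FIFO queue per priority, a larger number meaning higher priority, and hypothetical names throughout. The lazy scheduler may remove arbitrarily many stale entries in a single call, while under the Benno invariant the head of the highest non-empty queue is always the answer.

```python
from collections import deque


class Thread:
    def __init__(self, name: str, prio: int) -> None:
        self.name, self.prio, self.runnable = name, prio, True


def lazy_schedule(run_queue):
    """Lazy scheduling: blocked threads may still be queued and are
    dequeued during the scan. Returns (thread or None, entries removed)."""
    removed = 0
    for prio in sorted(run_queue, reverse=True):
        queue = run_queue[prio]
        while queue:
            if queue[0].runnable:
                return queue[0], removed
            queue.popleft()  # stale entry: clean up now
            removed += 1
    return None, removed     # None stands for the idle thread


def benno_schedule(run_queue):
    """Benno scheduling: every queued thread is runnable, so the head of
    the highest non-empty queue is chosen without any clean-up."""
    for prio in sorted(run_queue, reverse=True):
        if run_queue[prio]:
            return run_queue[prio][0]
    return None


threads = [Thread(f"t{i}", prio=1) for i in range(4)]
for t in threads[:3]:
    t.runnable = False  # three threads blocked in IPC, still queued

chosen, removed = lazy_schedule({1: deque(threads)})
print(chosen.name, removed)  # t3 3: work grows with the number of stale entries

benno_queue = {1: deque([threads[3]])}  # invariant: only runnable threads queued
print(benno_schedule(benno_queue).name)  # t3
```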

L4 Design and Implementation
Implementation Tricks [SOSP’93]
• Process kernel
• Virtual TCB array
• Lazy scheduling
• Direct process switch
• Non-preemptible
• Non-portable
• Non-standard calling convention
• Assembler
Design Decisions [SOSP’95]
• Synchronous IPC
• Rich message structure, arbitrary out-of-line messages
• Zero-copy register messages
• User-mode page-fault handlers
• Threads as IPC destinations
• IPC timeouts
• Hierarchical IPC control
• User-mode device drivers
• Process hierarchy
• Recursive address-space construction

What are the Principles?
• Minimality is an excellent driver of design decisions
– L4 kernels have become simpler over time
– Policy-mechanism separation (user-mode page-fault handlers)
– Device drivers really belong at user level
– Minimality is a key enabler for formal verification!
• IPC speed still matters
– But not everywhere; premature optimisation is wasteful
– Compilers have got so much better
– Verification does not compromise performance
– Verification invariants can help improve speed! [Shi, OOPSLA’13]
• Also found that capabilities are the way to go
– Shapiro (EROS) was right
• However, a clean abstraction of time is still elusive

Conclusions
• Details changed, but principles remained
• Microkernels rock! (If done right!)
Where to find more:
• UNSW Advanced Operating Systems Course
http://www.cse.unsw.edu.au/~cs9242
• NICTA Trustworthy Systems research
http://trustworthy.systems
• seL4 open-source portal
http://sel4.systems
• L4 Microkernel Headquarters
http://l4hq.org
• Gernot’s blog:
http://microkerneldude.wordpress.com/
• Gernot’s research home page:
http://ssrg.nicta.com.au/people/?cn=Gernot+Heiser
