Self-awareness and Adaptive Technologies: the Future of Operating Systems?
1. Self-awareness and Adaptive Technologies: the Future of Operating Systems?
Lamia Youseff*
fos team: Nathan Beckmann, Harshad Kasture, Charles Gruenwald III, Adam Belay, David Wentzlaff**, Lamia Youseff, Jason Miller, Anant Agarwal
And the e-fos team, in collaboration with the Angstrom team.
Live fos web server - http://fos.csail.mit.edu
* Lamia worked on fos while she was a postdoctoral associate with the Carbon group at CSAIL, MIT.
** David worked on fos while he was a Ph.D. candidate with the Carbon group at CSAIL, MIT.
2. Emerging Computer Architectures Provide Opportunities for OS
- Clouds
  - Huge computing power
  - Scalability with a "click"
- Multicore
  - Large number of cores
  - Very fast on-chip core-to-core communication

All logos and trademarks appearing in this presentation are the property of their respective owners.
3. What is Wrong with Current OSs?
- Unclear how to scale contemporary OSs on multicore and manycore
- Programming on an Infrastructure-as-a-Service (IaaS) cloud, Amazon EC2 style, is a challenge
4. Linux Does not Scale – Page Allocation Example
- Physical page allocation (toy example)
- Allocate physical pages as quickly as possible
- Examine the scalability of where the time goes
  - Precision timers & OProfile
  - 16-core quad-quad Intel server

ON EACH CORE:

    #include <stdlib.h>

    /* The sizes were defined elsewhere in the benchmark;
       these values are representative assumptions. */
    #define MEMORY_SIZE_IN_MB  1024
    #define MB_IN_BYTES        (1024L * 1024L)
    #define PAGE_SIZE_IN_BYTES 4096L

    int main(void)
    {
        char *big_array;
        long long count;

        big_array = malloc(MEMORY_SIZE_IN_MB * MB_IN_BYTES);

        /* Touch one byte per page so the kernel must allocate
           a physical page for each. */
        for (count = 0;
             count < ((MEMORY_SIZE_IN_MB * MB_IN_BYTES) / PAGE_SIZE_IN_BYTES);
             count++) {
            big_array[count * PAGE_SIZE_IN_BYTES] = count % 256;
        }
        return 0;
    }
5. Lock Contention Dominates Runtime
[Chart: cycles (in billions, 0-12) vs. number of cores (1-16), broken down into useful work (main, clear_page_c), architectural overhead (page_fault, handle_mm_fault, other), and lock contention (get_page_from_freelist, __pagevec_lru_add_active, __rmqueue_smallest, free_pages_bulk, rmqueue_bulk). Lock contention grows to dominate as cores are added.]
6. Problem with Current Day Clouds
[Diagram: four multicore blades joined by a network switch; each blade runs a hypervisor hosting many Linux VMs (one of them SMP Linux), with applications on top.]
Independent Linux VM instances cobbled together in an ad-hoc manner at the application layer
7. OSes Need to be Rethought
- Shared memory and locks lead to poor scalability
- Non-composable due to locks – hard to modify or add services
- OS and application fight for in-core cache
- OS relies on shared memory, which does not work across a cloud
- No resilience to faults; a single failure causes the OS to crash
(Manycore is moving from a few cores to 100s of cores.)
8. Outline
- fos Introduction
  - Vision, Structure and Architecture
  - Anatomy of a malloc()
- Scalability in fos
  - Fleets of servers
  - Example: File System Stack
- Why adaptable and self-aware OS services?
- Adaptability in fos
  - Hybrid Messaging System
  - Elastic Fleets
- Enabling Automatic Self-Awareness in fos
9. What is fos?
A novel elastic operating system for multicores and clouds
[Diagram: applications and fos spanning hypervisors on two multicore blades.]
- Scalability and reliability for future multicores with 100's to 1000's of cores
- Provides a single-system image across cloud machines
- Dynamically grows and shrinks services to meet application demands
Provides typical OS services:
- Process management, scheduling of multiple applications, virtual memory management, protection, file systems, networking
Also provides special multicore services:
- Process migration, fault resilience, dynamic goal optimization, elastic resource allocation/deallocation
An Operating System as a Service (OSaaS)
10. fos Structure
[Diagram: application cores message server cores - a "File read" goes to a file service (FS) core, a "Need new page" goes to a page service (PS) core.]
- Internet-inspired structure, based on messaging.
- OS is a collection of services (e.g. name service, page service PS, file service FS)
- Each service is implemented by a fleet of distributed servers
- Each server is bound to a core
- Application cores message a particular server core to utilize a service
- Server cores collaborate/communicate to implement the needed OS service
11. fos Architecture
[Diagram: a microkernel on every core hosting applications and fleet members: file system servers, a block device driver, process management servers, name servers (NS), page allocators, and the network interface.]
- Microkernel
  - Executes on all cores
  - Provides protection mechanism but not policy
  - Provides memory management mechanism, not policy
- Naming
  - Translates symbolic names to physical destinations
  - Standard high-level API hides location
  - Provides one-to-many maps
  - Provides redirection
  - Provides resilience when a server crashes
12. fos Architecture
[Diagram: same core/server layout as the previous slide.]
- System Servers
  - Internet-inspired servers
  - Fleets: cooperating processes that provide a service
  - Communicate only via messaging
    - Facilitates transparent communication, inter-machine or intra-machine
    - Easily move server processes anywhere
- Messaging
  - Microkernel-level messaging for core-to-core communication.
  - Lightweight user-space messaging.
  - Inter-machine vs. intra-machine communication.
13. Anatomy of a malloc Call in fos
[Diagram, steps 1-6: the application's malloc goes through libc and libfos into the microkernel, which uses its namecache to route the message to the paging system server (fos server (b)); the reply returns at step 6. Proxy and network servers on each multicore blade handle off-blade traffic.]
14. Anatomy of a malloc Call in fos when Server is on Remote Machine in the Cloud
[Diagram, steps 1-11: the same path as the local case, but the namecache resolves to a remote machine, so the message is forwarded through the local proxy and network servers, across the network to the remote blade's network and proxy servers, and on to the paging system server (fos server (a)); the reply retraces the route.]
15. Scalability in fos
- The three key design guidelines:
  - Inspired by the Internet, the OS is built as a collection of services (e.g., naming and resource discovery, scheduler, file system)
  - Each service is fully distributed with no global shared state or locks – implemented as a collection of cooperating servers
  - OS servers are bound to cores
    - Eliminates interference between OS and application
16. fos Scalable Fleet Example: File System Fleet
[Diagram: two applications (via libfos) message a File System fleet - a coordinator plus several File System servers, each on its own microkernel - while other cores run page servers (PS), a block device driver (blk), and further applications.]
17. Outline
- fos Introduction
  - Vision, Structure and Architecture
  - Anatomy of a malloc()
- Scalability in fos
  - Fleets of servers
  - Example: File System Stack
- New Challenges for today's OS
- Adaptability in fos
  - Hybrid Messaging System
  - Elastic Fleets
- Enabling Automatic Self-Awareness in fos
18. New Challenges for Today's OS
[Diagram: applications issue system calls (with heartbeats and goals) into OS components - scheduler, memory manager, file system, device drivers - running over cores, caches, disk and I/O devices, with knobs such as core speed, cache size and associativity, power, and miss rate.]
- Unprecedented amounts of resources and variables
  - Increased number of resources => increased number of failures
- Unprecedented variability in demand
  - For the same application, its performance characteristics and requirements usually change during its runtime (e.g. the allocated number of cores)
  - Different simultaneously-executing applications have different performance requirements (e.g. a memory-intensive vs. a computation-intensive app).
19. Building Adaptability and Self-awareness into fos
[Diagram: fos as a self-aware operating system closing an Observe-Decide-Act loop - application heartbeats and goals are observed; the Analysis & Optimization Engine (performance models, learner) decides; actions (voltage, frequency, activity, precision) are applied to the scheduler, memory manager, file system, device drivers, and hardware (cores, caches, DRAM, disk, I/O devices).]
20. Building Automatic Self-awareness into fos
- fos Analysis and Optimization Engine.
  - A closed feedback loop of ODA: Observe, Decide, Act.
- Each component provides a new OS service:
  - The Observer provides vital-signs services (e.g. heartbeats).
  - The Decision Engine provides performance models, a control system, and an AI learning engine.
  - The Actuators change system status, acting on applications (e.g. algorithms), OS sub-systems (e.g. growing fleets), and hardware components (e.g. frequency scaling).
- Each component is either implemented as a fos fleet or integrated into another fleet.
21. Sefos: Building Self-awareness into fos
Vital Signs Fleet
- Observes the status of software and hardware components (e.g. heartbeats).
- Provides a global knowledge-base of the system.
- Implemented as a fleet, storing the status information in a distributed data object
  - key-value store
- Collects measurements from:
  1) Applications - e.g. app heartbeats and app-specific measurements (fps in a video encoder or flops in a scientific app).
  2) OS subsystems - e.g. the utilization ratio of the file system fleet.
  3) Hardware components - e.g. temperature, core frequency, power, cache miss rate.
22. Sefos: Building Self-awareness into fos
Decision Engine
- A new OS service in fos.
- Implemented through a fleet of servers for scalability.
- Processes input from the vital signs service.
- Provides three approaches to runtime decision making:
  - Machine learning
  - Classical control theory
  - Performance models
A uniform API across the mechanisms allows them to be switched at runtime and composed.
23. Sefos: Building Self-awareness into fos
Actuators
- Actions that allow the system to adapt at runtime.
- Three sets of actions, based on their impact:
  - Application actions
    - Allocating or de-allocating cores to an application
    - Switching between algorithms
    - Migrating processes for better data locality and cache usage
  - OS services actions
    - Growing and shrinking fleets
    - Migrating servers
  - Hardware actions
    - Frequency scaling to save power
- Implemented as extensions to fos fleets; hardware actions go through OS tools or application-specific actions.
24. Adaptability in fos
- fos adapts to the varying demand in multicores and clouds:
  - Hybrid Messaging
  - Elastic Fleets
25. I. Messaging in fos
- Messaging in fos provides IPC through a message-passing abstraction
- Mailbox-based; each mailbox is associated with a name and a capability
- Can be implemented via a variety of mechanisms, multiplexed via the libfos library layer
  - Transparent to the application
- fos currently supports three mechanisms
26. Messaging Mechanisms in fos
- Kernel messaging
  - Implemented in the microkernel via shared memory
  - Messages are sent by trapping into the microkernel
- User-level messaging
  - A channel-based mechanism via URPC
  - Runs entirely in user space
  - Lower messaging latency, at the cost of initial overhead for channel creation
- Inter-machine messaging
  - The message goes through the proxy server
  - The proxy server routes the message to the correct machine
27. Hybrid Messaging
- fos adapts automatically among its messaging mechanisms
[Chart: cumulative cycles (up to 1x10^5) vs. number of messages for the different mechanisms.]
28. II. Elastic Fleets
- fos fleets can grow and shrink to meet varying demand
- A fleet can grow
  - New servers are spawned on new cores
  - The naming service provides load-balancing
  - Requests are directed to the new fleet member
- A fleet can shrink
  - Load is distributed to the other fleet members
29. fos Name Service Enables Elasticity
[Diagram: a user application, the name server (NS), and File Servers FS1 and FS2 spread across cores alongside page servers (PS).]
File System Service Elasticity Example
1. Name server points to File Server 1
2. Application request rate rises; the File Service spawns a new File Server 2
3. Name Server directs file system accesses to the new server, transparently to the application
Name Service enables service fleets to grow elastically with demand
30. fos File System Service Elasticity
- fos services can grow and shrink to match demand
- A synthetic benchmark app makes requests at a varying rate:
  - 2 clients, each repeatedly opens a file, reads 1 KB and closes the file
  - Clients increase their request rate until the midpoint, then decrease
- The fos filesystem grows from 1 to 2 servers to match demand, then shrinks as demand lessens
31. fos File System Service Elasticity
[Chart build: the region served by 1 server is highlighted.]
- fos services can grow and shrink to match demand
- A synthetic application makes requests at a varying rate
- The fos filesystem grows from 1 to 2 servers to match demand, then shrinks as demand lessens
32. fos File System Service Elasticity
[Chart build: the regions served by 1 server and by 2 servers are highlighted.]
- fos services can grow and shrink to match demand
- A synthetic application makes requests at a varying rate
- The fos filesystem grows from 1 to 2 servers to match demand, then shrinks as demand lessens
33. fos File System Service Elasticity
[Chart build: annotations mark "Grow fleet to 2 servers" and "Shrink fleet to 1 server" across the 1-server, 2-server, and elastic-fleet regions.]
- fos services can grow and shrink to match demand
- A synthetic application makes requests at a varying rate
- The fos filesystem grows from 1 to 2 servers to match demand, then shrinks as demand lessens
34. An Early Prototype of a Self-aware FS Service
- Observer: leverage fs fleet utilization to indicate the load of the OS subsystems.
- Decision Engine: use simplified high- and low-watermarks.
- Actions: grow and shrink the fleet.
[Chart: system throughput (transactions per million cycles) and fleet size (1-4) vs. number of transactions (0-10,000).]
35. Conclusions
- New many-core and cloud architectures present new challenges for the OS community:
  - Scalability
  - Varying demand
- fos is a new factored operating system that:
  - Addresses scalability through the fleet design
  - Addresses varying demand through the analysis and optimization engine
- We designed the analysis and optimization engine as a new OS service in the form of a closed ODA loop
- We demonstrated how adaptability and self-awareness can be achieved through the analysis and optimization engine in two toy prototypes:
  - Hybrid messaging
  - Elastic fs fleet
36. Questions
Live fos web server at http://fos.csail.mit.edu
38. Elastic Fleets Example: fos File System
[Chart: throughput (transactions per 10^6 cycles, 0-4) vs. cycles elapsed (0 to 6x10^9), annotated with the 1-server, 2-server, and elastic-fleet regions and the Grow and Shrink transitions.]
- 2 clients, each repeatedly opens a file, reads 1 KB and closes the file
- Clients increase their request rate until the midpoint, then decrease
39. Anatomy of a malloc Call in fos
[Backup slide: same diagram as slide 13.]
40. Anatomy of a malloc Call in fos when Server is on Remote Machine in the Cloud
[Backup slide: same diagram as slide 14.]
41. Overview of Sefos ODA
O: Heartbeats services (sensors or observers) - the knowledge-base of the system.
- Two sets of APIs:
  - Collecting status updates from apps and system services
  - Responding to queries about the status of apps and system services
D: Decision engine service (decisions or controllers):
- A generalized form of the scheduling services
- Deploys control-theoretic and AI models to make decisions.
A: Variables (actions or actuators); e.g. leveraging:
- Elasticity: growing and shrinking fleets.
- Spatial locality: migrating processes.
- Improving performance: controlling the number of cores allocated to a service/app.
- Changing core frequency: to save overall power.
42. O: Heartbeats Service
- A new OS service that keeps the current status of the system:
  - App process statistics
  - Other system fleet statistics (utilization level, incoming queue length, etc.)
  - Hardware component status (temp, core freq, etc.)
- Implemented as a fleet of servers.
- Functionality:
  1. Collects heartbeats from different applications and other system services, via the heartbeats API
  2. Responds to inquiries about overall system status or specific application heartbeats, via the heartbeats_monitor API
43. D: Decision Engine
- Another OS service; a generalized form of the scheduling service.
- Can maximize overall performance, minimize power, or pursue other goals.
- Queries the heartbeats service for system status.
- Makes decisions using:
  - A closed feedback loop to decide on the best course of action to achieve a certain goal
    - Works best for changing one variable at a time
    - The model can become very difficult to understand as the number of goals and variables increases.
  - Alternatively, an AI model
    - Works best for changing several variables at a time
44. A: Actions
- Each sub-system provides a set of actions (variables).
- Three types of actions, according to their direct impact:
  - App actions; e.g.
    - Allocating or de-allocating cores to an application
    - Migrating app processes for better spatial allocation
      - To move a process to its data: improves cache efficiency.
      - To decrease communication overhead (in asymmetric machines)
  - System services actions; e.g.
    - Leveraging fleet elasticity: grow and shrink the fleet
    - Migrating servers to improve spatial locality
  - Hardware actions; e.g.
    - Frequency scaling to save power