Presented at the 14th Workshop on Hot Topics in Operating Systems (HotOS XIV)
This talk presents Celias, a new concurrent programming model for data-intensive scalable computing. The goal is a large-scale computation framework for complex algorithms, with elastic scalability and automatic fault tolerance.
The paper can be found here: http://www.eecs.berkeley.edu/~sangjin/static/pub/hotos2013_celias.pdf
2. Review: MapReduce and Friends
• Input
• Output
• Computation: map, filter, group by, reduce, join, …
3. Review: MapReduce and Friends
• Input
• Output
• Computation: map, filter, group by, reduce, join, …
• Observation 1: Bulk transformation of immutable data (no fine-grained updates)
4. Example 1: Sparse Operations
• k-hop reachability with iterative MapReduce
5. Example 1: Sparse Operations
• k-hop reachability with iterative MapReduce
– Diagram labels: Graph, Source node, MR, 1-hop nodes
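To make the sparsity problem concrete, here is a minimal Python sketch of k-hop reachability written as repeated map and reduce passes. The function names (`map_phase`, `reduce_phase`, `k_hop_reachable`) and the edge-list representation are assumptions made for illustration and do not come from the talk. The point to notice is that every round scans the full edge list, even when the frontier holds only a handful of nodes.

```python
from collections import defaultdict

def map_phase(edges, frontier):
    """Map: scan EVERY edge and emit (dst, 1) when src is in the frontier."""
    for src, dst in edges:
        if src in frontier:
            yield dst, 1

def reduce_phase(pairs):
    """Reduce: group emitted pairs by node; the keys form the candidate frontier."""
    grouped = defaultdict(int)
    for node, count in pairs:
        grouped[node] += count
    return set(grouped)

def k_hop_reachable(edges, source, k):
    """Return the nodes reachable from `source` within k hops, one bulk round per hop."""
    reached = {source}
    frontier = {source}
    for _ in range(k):
        # One full MapReduce-style pass over the immutable edge list.
        frontier = reduce_phase(map_phase(edges, frontier)) - reached
        if not frontier:
            break
        reached |= frontier
    return reached

# Hypothetical toy graph: a -> b -> c -> d, plus an unrelated edge x -> y.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
print(sorted(k_hop_reachable(edges, "a", 2)))  # ['a', 'b', 'c']
```

In this toy graph the edge ("x", "y") is read in every round although it can never matter, which is the cost the slide attributes to bulk processing of sparse work.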
18. Existing frameworks assume:
1. Bulk transformation of immutable data
2. Static dataflow
Our work:
1. Fine-grained operations on mutable data
2. Dynamic, data-dependent control flow
Yet we still want elastic scalability and fault tolerance
22. Data Models for Mutable Shared Memory
• Global address space: UPC, X10, Fortress…
– Too low level
23. Data Models for Mutable Shared Memory
• Global address space: UPC, X10, Fortress…
– Too low level
• Key-value tables: RAMCloud, Dynamo, Piccolo…
– Limited lookup ability
– Consistency concerns
24. Data Models for Mutable Shared Memory
• Global address space: UPC, X10, Fortress…
– Too low level
• Key-value tables: RAMCloud, Dynamo, Piccolo…
– Limited lookup ability
– Consistency concerns
• Tuplespace: Linda
– Example tuples: ('employee', 'John', 29), ('todo', 'walk'), ('todo', 'shopping')
– Flexible lookup with any attributes
– Individual tuples are immutable
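The tuplespace properties listed above (lookup by any attribute, immutable tuples) can be sketched in a few lines of Python. This is a simplified, single-threaded illustration: the class `TupleSpace`, the wildcard marker `ANY`, and the non-blocking behaviour of `rd` and `in_` are assumptions of this sketch, whereas in Linda itself `in` and `rd` block until a matching tuple appears.

```python
ANY = object()  # hypothetical wildcard marker: matches any field value

class TupleSpace:
    """Toy in-memory tuplespace with Linda-style out / rd / in operations."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Insert a new tuple into the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(pattern, tup):
        # Same arity, and every non-wildcard field must be equal.
        return len(pattern) == len(tup) and all(
            p is ANY or p == f for p, f in zip(pattern, tup)
        )

    def rd(self, *pattern):
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def in_(self, *pattern):
        """Remove and return the first matching tuple, or None."""
        tup = self.rd(*pattern)
        if tup is not None:
            self._tuples.remove(tup)
        return tup

space = TupleSpace()
space.out("employee", "John", 29)
space.out("todo", "walk")
space.out("todo", "shopping")

# Lookup on any attribute: here only the second field is fixed.
print(space.rd(ANY, "John", ANY))  # ('employee', 'John', 29)
print(space.in_("todo", ANY))      # ('todo', 'walk')

# Tuples are immutable, so an "update" removes the old tuple and inserts a new one.
_, name, age = space.in_("employee", "John", ANY)
space.out("employee", name, age + 1)
```

Unlike a key-value table, no field is privileged as the key, so the same tuple can be found through its tag, its name, or its age.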
25. Programming model = data model + computation model
• Linda = Tuplespace + Linda processes
26. Linda Processes
• Process A: in(…) … out(…) …
• Process B: … out(…) … out(…) …
• Process C: … in(…) … in(…) …
27. Linda Processes
• Process A: in(…) … out(…) …
• Process B: … out(…) … out(…) …
• Process C: … in(…) … in(…) …
• ☹ No automatic scaling
• ☹ No fault tolerance
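The process picture above can be imitated with Python threads that coordinate only through blocking `in` and `out` calls. This is a sketch under stated assumptions: `BlockingTupleSpace`, the `"job"` and `"result"` tags, and the use of `None` as a wildcard are hypothetical names chosen here, and threads stand in for Linda processes.

```python
import threading

class BlockingTupleSpace:
    """Toy tuplespace whose in_() blocks until a matching tuple is available."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *fields):
        with self._cond:
            self._tuples.append(tuple(fields))
            self._cond.notify_all()  # wake any process waiting in in_()

    def in_(self, *pattern):
        """Remove and return a matching tuple; None in the pattern is a wildcard."""
        with self._cond:
            while True:
                for tup in self._tuples:
                    if len(tup) == len(pattern) and all(
                        p is None or p == f for p, f in zip(pattern, tup)
                    ):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()

def process_b(space):
    # Producer: out(...) ... out(...)
    space.out("job", 3)
    space.out("job", 4)

def process_a(space):
    # Worker: in(...) ... out(...), repeated for each job
    for _ in range(2):
        _, n = space.in_("job", None)
        # If this process died here, the consumed job tuple would simply be lost.
        space.out("result", n * n)

def process_c(space, results):
    # Consumer: in(...) ... in(...)
    for _ in range(2):
        _, value = space.in_("result", None)
        results.append(value)

space = BlockingTupleSpace()
results = []
threads = [
    threading.Thread(target=process_c, args=(space, results)),
    threading.Thread(target=process_a, args=(space,)),
    threading.Thread(target=process_b, args=(space,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [9, 16]
```

Both drawbacks from the slide are visible in the sketch: the number of workers is fixed by the programmer, and a worker that fails between its `in_` and its `out` takes the job tuple with it.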
28. Programming model = data model + computation model
• Linda = Tuplespace + Linda processes
• Celias = Tuplespace + microtasks
38. Two functions: add() and multiply()
• E × F
• ☺ Automatic scaling
• ☺ Fault tolerance
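The slides that define microtasks are not part of this transcript, so the following Python sketch is only a guess at how a product E × F might decompose into two small functions that fire on matching tuples. It is not the Celias API: the tuple layouts `("E", i, k, v)`, `("F", k, j, v)` and `("P", i, j, v)`, the sequential driver `run`, and the matching rules are all assumptions made for illustration.

```python
from itertools import combinations

def multiply(e, f):
    """Fires on one entry of E and one entry of F whose inner indices agree."""
    _, i, _, x = e
    _, _, j, y = f
    return ("P", i, j, x * y)  # partial product for output cell (i, j)

def add(p, q):
    """Fires on two partial products that belong to the same output cell."""
    _, i, j, x = p
    _, _, _, y = q
    return ("P", i, j, x + y)

def run(E, F):
    """Sequentially simulate the firings; E and F are sparse dicts {(row, col): value}."""
    e_tuples = [("E", i, k, v) for (i, k), v in E.items()]
    f_tuples = [("F", k, j, v) for (k, j), v in F.items()]

    # multiply: E and F tuples are only read, so each matching pair fires once.
    partials = [multiply(e, f) for e in e_tuples for f in f_tuples if e[2] == f[1]]

    # add: consume two partials for the same cell and emit their sum,
    # repeating until no two tuples match any more.
    while True:
        match = next(
            ((p, q) for p, q in combinations(partials, 2) if p[1:3] == q[1:3]),
            None,
        )
        if match is None:
            break
        p, q = match
        partials.remove(p)
        partials.remove(q)
        partials.append(add(p, q))

    return {(i, j): v for _, i, j, v in partials}

# Hypothetical 1x2 times 2x1 example: 1*3 + 2*4 = 11.
print(run({(0, 0): 1, (0, 1): 2}, {(0, 0): 3, (1, 0): 4}))  # {(0, 0): 11}
```

Each firing touches only two tuples and produces one, so in this toy the firings are independent of one another. That independence is presumably what would let a runtime spread them across machines and re-run a failed one, but the sketch itself does neither.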
39. More Examples in the Paper…
• MapReduce
– Celias is Turing-complete MapReduce-complete!
– without any artificial sync. barriers
• Single-source shortest path
– Pregel-style graph processing
• Quicksort
– Recursive control flow
40. Summary
• MapReduce-like frameworks are not suitable for algorithms with:
– Sparse/incremental/fine-grained computation
– Dynamic dataflow
• Celias comes to our rescue, yet it is also
– automatically scalable
– fault tolerant
41. Open Questions
• Microtask abstraction: good enough? went too far?
• Feasibility of an efficient implementation
– Reliable tuplespace
– Signature matching
– Microtask transactions
• … what is a killer app of Celias?
• <Your questions here>