Large-scale computation without sacrificing expressiveness

Sangjin Han, Sylvia Ratnasamy
UC Berkeley
  
Review: MapReduce and Friends

Input → Computation → Output
Computation: map, filter, group by, reduce, join, …

Observation 1: Bulk transformation of immutable data (no fine-grained updates)
  
Example 1: Sparse Operations

•  k-hop reachability with iterative MapReduce

(Graph, Source node) → MR → 1-hop nodes
(Graph, 1-hop nodes) → MR → 2-hop nodes
(Graph, 2-hop nodes) → MR → …

[Figure: number of processed edges (millions, 0 to 20) per iteration (1 to 22), comparing Iterative MapReduce with Optimal. Internet router topology graph (1.7M nodes, 22.2M edges).]
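
One way to read the gap between the two curves, as a minimal sketch: a bulk pass re-reads every edge in each round, while a frontier-based traversal touches only the edges leaving newly reached nodes. The function names, the tiny edge list, and the reading of "Optimal" as frontier-only work are assumptions for illustration; they are not taken from the slides.

```python
from collections import defaultdict


def khop_full_scan(edges, source, k):
    """Each round scans every edge, as a bulk pass over the whole graph would."""
    reached = {source}
    processed = 0
    for _ in range(k):
        new_nodes = set()
        for u, v in edges:  # the whole edge list is read every round
            processed += 1
            if u in reached and v not in reached:
                new_nodes.add(v)
        if not new_nodes:
            break
        reached |= new_nodes
    return reached, processed


def khop_frontier(edges, source, k):
    """Each round touches only edges leaving the nodes reached in the last round."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
    reached = {source}
    frontier = {source}
    processed = 0
    for _ in range(k):
        next_frontier = set()
        for u in frontier:
            for v in adjacency[u]:
                processed += 1
                if v not in reached:
                    next_frontier.add(v)
        if not next_frontier:
            break
        reached |= next_frontier
        frontier = next_frontier
    return reached, processed


# Hypothetical toy graph: a chain 0-1-2-3-4 plus one extra edge 0-5.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 5)]
```

Both functions return the same reachable set for a given k; only the number of edges they touch differs, which is the quantity plotted on the slide.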
  
Review: MapReduce and Friends (cont’d)

[Diagram: a dataflow of Filter, Join, Map, Union, and Filter operators with a "Converged?" check]

Observation 2: Static dataflow (no data-dependent control flow)
  
Example 2: Irregular parallelism

•  Parallel SAT solver

E = (p ∨ !q) ∧ (!p ∨ r ∨ s) ∧ (q ∨ !s ∨ !t) ∧ (!p ∨ s) ∧ …

[Diagram: search tree branching on p = T/F, then on q = T/F under each p branch, then on r = T/F under a single branch]
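
A minimal sketch of why this search is irregular: each branch of the tree is an independent unit of work, but branches end at different depths depending on the data. The sketch uses only the four clauses visible on the slide (the elided remainder is left out), and the names `CLAUSES`, `ORDER`, `status`, and `explore` are hypothetical.

```python
# Each clause is a list of (variable, wanted_value) literals.
# Only the four clauses visible on the slide are encoded here.
CLAUSES = [
    [("p", True), ("q", False)],
    [("p", False), ("r", True), ("s", True)],
    [("q", True), ("s", False), ("t", False)],
    [("p", False), ("s", True)],
]
ORDER = ["p", "q", "r", "s", "t"]  # assumed branching order


def status(clause, assignment):
    """Return True if satisfied, False if falsified, None if still undecided."""
    undecided = False
    for var, wanted in clause:
        if var not in assignment:
            undecided = True
        elif assignment[var] == wanted:
            return True
    return None if undecided else False


def explore(assignment, depth, leaves):
    """Branch on the next variable; every call could run as a separate task."""
    states = [status(clause, assignment) for clause in CLAUSES]
    if any(state is False for state in states):
        leaves.append((depth, "conflict"))  # pruned: some clause is falsified
        return
    if all(state is True for state in states):
        leaves.append((depth, "sat"))  # every clause already satisfied
        return
    var = ORDER[depth]
    for value in (True, False):
        explore({**assignment, var: value}, depth + 1, leaves)


leaves = []
explore({}, 0, leaves)
```

The recorded leaf depths vary from branch to branch, so the amount of work under each subtree cannot be known before the search runs.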
  
MapReduce-like frameworks assume:
1.  Bulk transformation of immutable data
2.  Static dataflow

Our work:
1.  Fine-grained operations on mutable data
2.  Dynamic, data-dependent control flow

Yet we still want elastic scalability and fault tolerance
  
CELIAS PROGRAMMING MODEL
Putting a small twist on Linda
  
Programming model = data model + computation model
  
Data Models for Mutable Shared Memory

•  Global address space: UPC, X10, Fortress…
   –  Too low level
•  Key-value tables: RAMCloud, Dynamo, Piccolo…
   –  Limited lookup ability
   –  Consistency concerns
•  Tuplespace: Linda
   –  Example tuples: (‘employee’, ‘John’, 29), (‘todo’, ‘walk’), (‘todo’, ‘shopping’)
   –  Flexible lookup with any attributes
   –  Individual tuples are immutable
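
To illustrate "flexible lookup with any attributes", here is a minimal sketch of template matching over the three example tuples. The `ANY` wildcard marker and the `matches` and `read_all` helpers are hypothetical names, not Linda's actual syntax.

```python
ANY = object()  # hypothetical wildcard marker for a template field


def matches(template, tup):
    """A template matches a tuple of equal length whose fixed fields are equal."""
    return len(template) == len(tup) and all(
        field is ANY or field == value for field, value in zip(template, tup)
    )


def read_all(space, template):
    """Return every tuple in the space that matches the template."""
    return [tup for tup in space if matches(template, tup)]


SPACE = [("employee", "John", 29), ("todo", "walk"), ("todo", "shopping")]

todos = read_all(SPACE, ("todo", ANY))      # look up by the first field
aged_29 = read_all(SPACE, (ANY, ANY, 29))   # look up by the last field
```

Unlike a key-value table, no field is singled out as the key: any position can be fixed or left open in the template.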
  
Programming model = data model + computation model
Linda = Tuplespace + Linda processes
  
Linda Processes

Process A:  in(…)  …  out(…)  …
Process B:  …  out(…)  …  out(…)  …
Process C:  …  in(…)  …  in(…)  …

☹ No automatic scaling
☹ No fault tolerance
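
A minimal sketch of two long-lived Linda-style processes, assuming the usual semantics where out() adds a tuple and in() blocks until a matching tuple can be removed. The `TupleSpace` class, the `in_` method name (`in` is a Python keyword), and the use of `None` as a wildcard are assumptions for illustration.

```python
import threading


class TupleSpace:
    """Toy in-memory tuplespace with blocking removal."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Add a tuple and wake any process waiting in in_()."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def in_(self, template):
        """Block until a tuple matches the template, then remove and return it."""
        with self._cond:
            while True:
                for tup in self._tuples:
                    if len(tup) == len(template) and all(
                        t is None or t == v for t, v in zip(template, tup)
                    ):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()


space = TupleSpace()
results = []


def consumer():
    # The loop counter is private process state, invisible to the tuplespace.
    for _ in range(2):
        results.append(space.in_(("job", None)))


def producer():
    space.out(("job", 1))
    space.out(("job", 2))


threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The number of processes is fixed by the program, and each one carries private state across its in() and out() calls, which is one plausible reading of why the slide marks this model as lacking automatic scaling and fault tolerance.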
  
Programming model = data model + computation model
Linda = Tuplespace + Linda processes
Celias = Tuplespace + microtasks
  
Microtasks

Tuplespace: (‘hello’, 5), (‘world’, 2), (‘hello’, 7), …

Function wordcount()
Signature: (?word, ?cnt1), (?word, ?cnt2)
Code:
    sum := cnt1 + cnt2
    emit (word, sum)

When a signature matches:
1. microtask launch (word = ‘hello’, cnt1 = 5, cnt2 = 7)
2. code execution (5 + 7 = 12)
3. atomic replacement (the tuplespace now holds (‘world’, 2), (‘hello’, 12), …)
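
The three steps above can be sketched as a small sequential loop. This is only an illustration under stated assumptions: the `find_match` and `run` helpers, the pairwise scan order, and the single-threaded driver are hypothetical and say nothing about how a Celias runtime would actually schedule microtasks.

```python
def find_match(space):
    """Signature (?word, ?cnt1), (?word, ?cnt2): two distinct tuples, same word."""
    for i in range(len(space)):
        for j in range(i + 1, len(space)):
            if space[i][0] == space[j][0]:
                return i, j
    return None


def wordcount(word, cnt1, cnt2):
    """Microtask body: sum := cnt1 + cnt2; emit (word, sum)."""
    return [(word, cnt1 + cnt2)]


def run(space):
    """Launch microtasks until no pair of tuples matches the signature."""
    space = list(space)
    while (match := find_match(space)) is not None:
        i, j = match
        word, cnt1 = space[i]          # 1. launch: bind the signature variables
        _, cnt2 = space[j]
        emitted = wordcount(word, cnt1, cnt2)  # 2. execute the code
        # 3. atomic replacement: drop both inputs, add the emitted tuples
        space = [t for k, t in enumerate(space) if k not in (i, j)] + emitted
    return space


final = run([("hello", 5), ("world", 2), ("hello", 7)])
```

Starting from the slide's three tuples, the loop fires wordcount() once and then stops, because no two remaining tuples share a word.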
  
Two functions: add() and multiply()

(A + B) × (C + D)
→ E × F

☺ Automatic scaling
☺ Fault tolerance
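
A minimal sketch of how (A + B) × (C + D) could unfold as microtasks. The tuple encoding ("add", "mul", and "val" tags with result names E, F, G), the sample values, and the `step` and `run` helpers are all assumptions; the slides do not show the actual signatures.

```python
def step(space):
    """Fire the microtasks that match right now; return (new_space, fired_count)."""
    adds = [t for t in space if t[0] == "add"]
    if adds:
        # The add() tasks share no inputs, so a runtime could run them in parallel.
        rest = [t for t in space if t[0] != "add"]
        results = [("val", name, x + y) for _, name, x, y in adds]
        return rest + results, len(adds)
    values = {t[1]: t[2] for t in space if t[0] == "val"}
    for task in space:
        if task[0] == "mul" and task[2] in values and task[3] in values:
            _, name, left, right = task
            consumed = {task, ("val", left, values[left]), ("val", right, values[right])}
            rest = [t for t in space if t not in consumed]
            return rest + [("val", name, values[left] * values[right])], 1
    return space, 0


def run(space):
    """Apply step() until nothing matches; trace records tasks fired per round."""
    trace = []
    while True:
        space, fired = step(space)
        if fired == 0:
            return space, trace
        trace.append(fired)


# Hypothetical values: A=1, B=2, C=3, D=4, so E=3, F=7, and E × F = 21.
SPACE = [("add", "E", 1, 2), ("add", "F", 3, 4), ("mul", "G", "E", "F")]
final, trace = run(SPACE)
```

In this reading, both add() tasks are ready in the first round and multiply() becomes ready only once E and F exist. Because inputs stay in the tuplespace until the atomic replacement, a task that fails midway could presumably be launched again from the same tuples.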
  
More Examples in the Paper…

•  MapReduce
   –  Celias is Turing-complete MapReduce-complete!
   –  without any artificial sync. barriers
•  Single-source shortest path
   –  Pregel-style graph processing
•  Quicksort
   –  Recursive control flow
  
Summary

•  MapReduce-like frameworks are not suitable for algorithms with:
   –  Sparse/incremental/fine-grained computation
   –  Dynamic dataflow
•  Celias comes to our rescue, yet it is also
   –  automatically scalable
   –  fault tolerant
  
Open Questions

•  Microtask abstraction: good enough? went too far?
•  Feasibility of an efficient implementation
   –  Reliable tuplespace
   –  Signature matching
   –  Microtask transactions
•  … what is a killer app of Celias?
•  <Your questions here>
  
