
Composing and Executing Parallel Data-flow Graphs with Shell Pipes

  1. Composing and Executing Parallel Data-flow Graphs with Shell Pipes. Edward Walker (TACC), Weijia Xu (TACC), Vinoth Chandar (Oracle Corp)
  2. Agenda • Motivation • Shell language extensions • Implementation • Experimental evaluation • Conclusions
  3. Motivation • Distributed memory clusters are becoming pervasive in industry and academia • Shells are the default login environment on these systems • Shell pipes are commonly used for composing extensible Unix commands • There has been no change to the syntax/semantics of shell pipes since their invention over 30 years ago • There is a growing need to compose massively parallel jobs quickly, using existing software
  4. Extending Shells for Parallel Computing • Build a simple, powerful coordination layer at the shell • The coordination layer transparently manages the parallelism in the workflow • The user specifies the parallel computation as a data-flow graph using extensions to the shell • Provides the ability to combine different tools and build interesting parallel programs quickly
  5. Shell pipe extensions • Pipeline fork: A | B on n procs • Pipeline join: A on n procs | B • Pipeline cycles: (++ n A) • Pipeline key-value aggregation: A | B on keys
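     A minimal usage sketch of the four forms above. It assumes the paper's extended shell; gen_data, filter, and merge are hypothetical placeholder commands, and only the "on ... procs" / "on keys" clauses and the (++ n A) form are the proposed extensions.

     # Hypothetical commands; requires the extended shell described in these slides.
     gen_data | filter on 8 procs      # pipeline fork: fan stdin out to 8 parallel filter tasks
     filter on 8 procs | merge         # pipeline join: merge the outputs of 8 filter tasks
     (++ 4 filter)                     # pipeline cycle form from the slide, with n = 4
     gen_data | merge on keys          # key-value aggregation: tuples routed to merge by key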
  6. Parallel shell tasks extensions
     > function foo() { echo "hello world"; }
     > foo on all procs      # run foo() on all CPUs
     > foo on all nodes      # run foo() on all nodes
     > foo on 10:2 procs     # stride: 10 tasks, 2 tasks on each node
     > foo on 10:2:2 procs   # span: 10 tasks, 2 tasks on alternate nodes
  7. Composing data-flow graphs, Example 1 (diagram: A feeds two parallel B tasks, which run B1 and B2 respectively and join into C):
     function B1() { :; }    # placeholder body
     function B2() { :; }    # placeholder body
     function B() {
       if (( $_ASPECT_TASKID == 0 )); then
         B1
       else
         B2
       fi
     }
     A | B on 2 procs | C
  8. Composing data-flow graphs, Example 2 (diagram: map tasks emit key-value tuples into a distributed hash table (DHT); reduce tasks consume them grouped by key):
     function map() {
       emit_tuple -k key -v value
     }
     function reduce() {
       consume_tuple -k key -v value
       num=${#value[@]}
       for (( i = 0; i < $num; i++ )); do
         : # process key=$key, value=${value[$i]}
       done
     }
     map on all procs | reduce on keys
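     A hedged word-count sketch using the extensions above. It assumes the paper's extended shell and its emit_tuple/consume_tuple commands as shown on the slide; words.txt is a hypothetical input file staged on every node.

     # Word count with the slide's map/reduce extensions (assumptions noted above).
     function count_map() {
       while read -r word; do
         emit_tuple -k "$word" -v 1          # one tuple per word occurrence
       done < words.txt
     }
     function count_reduce() {
       consume_tuple -k key -v value         # value arrives as an array of 1s for this key
       echo "$key ${#value[@]}"              # print the word and its occurrence count
     }
     count_map on all procs | count_reduce on keys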
  9. BASH Implementation
  10. Startup Overlay • A script may have many instances requiring startup of parallel tasks • Motivation for the overlay: fast startup of parallel shell workers, and graceful handling of node failures • Two-level hierarchy: sectors and proxies • Overlay node addressing (diagram: an 8-bit field, bits 7 to 0, in which the compute node ID is composed of a sector id and a proxy id)
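     A hedged illustration of decoding the two-level address; the slide does not say how many bits go to the sector id versus the proxy id, so the 4/4 split below is an assumption.

     # Assumed split of an 8-bit compute node ID: high 4 bits = sector id,
     # low 4 bits = proxy id (the actual split is not given on the slide).
     node_id=0x5A
     sector_id=$(( (node_id >> 4) & 0xF ))
     proxy_id=$((  node_id        & 0xF ))
     echo "node=$((node_id)) sector=$sector_id proxy=$proxy_id"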
  11. Fault-Tolerance • Proxy nodes monitor peers within their sector, and sector heads monitor peer sectors • Node 0 maintains a list of the available nodes in the overlay in a master_node file (diagram: two overlay sectors of four compute nodes each, every node running a proxy and exec processes, with the master_node file held by node 0)
  12. Starting shell workers with the startup overlay
  13. Steps 1 and 2: Bash spawns the agent; the agent queries the master_node file and spawns the node I/O multiplexor (diagram: the BASH process, agent, and node I/O MUX on the submitting node, alongside the two overlay sectors)
  14. Step 3: The agent invokes the overlay to spawn a CPU I/O multiplexor on each node (diagram: step 3 fanning out through the overlay sectors to the compute nodes)
  15. Step 4: The CPU I/O multiplexor spawns a shell worker per CPU on the node (diagram: step 4, one shell worker started for each CPU)
  16. Step 5: The CPU I/O multiplexor calls back to the node I/O multiplexor (diagram: step 5, the callback path from the CPU I/O MUX to the node I/O MUX)
  17. Implementation of pipeline fork
  18. Step 1: Process B's stdin is piped into a stdin_file (diagram: for A | B on N procs, BASH pipes A's output to the aspect-agent, whose stdin reader writes stdin_file)
  19. Step 2: The command dispatcher constructs a command file for each task, each containing cat stdin_file | B (diagram: the dispatcher inside the aspect-agent writing the per-task command files)
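     A hedged sketch of generating those per-task command files; the task count N and the cmd.$t file naming are assumptions, while the command body cat stdin_file | B is taken from the slide.

     # Write one command file per task (N and the cmd.* naming are assumed).
     N=4
     for (( t = 0; t < N; t++ )); do
       printf 'cat stdin_file | B\n' > "cmd.$t"
     done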
  20. Steps 3, 4 and 5: The command files are queued to the node I/O multiplexors, executed as B in the shell workers on each compute node, and the results are marshaled back to the shell through the I/O flushers (diagram: the queue, node MUXes, shell workers running B, and the stdout/control path back to BASH)
  21. Step 6: Command files are replayed on failure (diagram: a replayer re-queues failed command files, which can also run in shell workers on the local compute node)
  22. Implementation of key-value aggregation
  23. Step 1: For A | B on keys, the aspect-agent's key dispatcher inspects and hashes each key (diagram: BASH pipes A's output to the key dispatcher inside the aspect-agent)
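     A hedged sketch of hashing a key to a node index; cksum and the modulo scheme below are illustrative assumptions, not necessarily the hash function the agent actually uses.

     # Map a key string to one of NUM_NODES compute nodes (illustrative hash only).
     NUM_NODES=8
     key="some_key"
     hash=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
     node=$(( hash % NUM_NODES ))
     echo "key '$key' -> node $node"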
  24. Step 2: Each key-value pair is routed to a compute node based on the key hash and stored in that node's hash table (diagram: the node MUX routing pairs into per-node gdbm hash tables, which together form a distributed hash table)
  25. Step 3: Each node constructs command files that pipe the key-value entries from its local hash table into process B (diagram: emit_tuple feeding B from each node's gdbm table)
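     A hedged sketch of what such per-key command files might look like; how emit_tuple retrieves the stored value from the gdbm table is not shown, so the explicit -k/-v invocation, the stand-in table, and the file naming below are assumptions.

     # For every key held locally, write a command file that pipes that entry into B.
     declare -A table=( [alpha]=1 [beta]=2 )    # stand-in for the local gdbm hash table
     for key in "${!table[@]}"; do
       printf 'emit_tuple -k %s -v %s | B\n' "$key" "${table[$key]}" > "cmd.key.$key"
     done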
  26. Step 4: Results from the command file executions are marshaled back to the shell (diagram: output from the B processes flowing through the node MUX and I/O MUX back to BASH)
  27. Experimental Evaluation
  28. Startup overlay performance (when compared to the default SSH mechanism)
  29. Syntactic benchmark I: performance of pipeline join
  30. Syntactic benchmark II: performance of key-value aggregation
  31. TeraSort benchmark: Parallel bucket sort • Step 1: spawn the data generator in parallel on each compute node, partitioning data across N nodes; task T keeps a record if its first 2 bytes fall in the range [ 2^16 * T / N , 2^16 * (T + 1) / N ) • Step 2: perform the sort on the local data on each node • Step 3: merge the results onto the global file system
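     A hedged sketch of the bucket test implied by that range; reading the first 2 bytes with od (host byte order), the task count N, and record.dat are assumptions.

     # Decide which task T owns a record, given its first 2 bytes as a 16-bit integer.
     N=16
     key16=$(od -An -tu2 -N2 record.dat | tr -d ' ')   # first 2 bytes, 0..65535 (host byte order)
     T=$(( key16 * N / 65536 ))                        # task whose [2^16*T/N, 2^16*(T+1)/N) range contains key16
     echo "record with key $key16 belongs to task $T"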
  32. TeraSort benchmark: Sorting rate
  33. Related Work • Ptolemy: embedded system design • Yahoo Pipes: web content filtering • Hadoop: Java implementation of MapReduce • Dryad: distributed DAG data-flow computation
  34. Conclusion • A debugger would be extremely helpful; we are working on a bashdb implementation • A run-time simulator would help predict performance based on the characteristics of a cluster • We are still considering how to incorporate our extensions for named pipes (i.e. mkfifo)
  35. Questions?
