Cascalog workshop

  1. Cascalog Workshop
  2. Example query
  3. Execution: (1) pre-aggregation, (2) aggregation, (3) post-aggregation
  4. Variable dependencies
  5. Pre-aggregation
     • Start from generator variables
     • Resolve as many variables as possible using:
       • Joins
       • Functions
     • Use as many filters as possible
     • Join all sources into one set of tuples
  6. Aggregation
     • Group by resolved output variables
     • Apply all aggregators to each group
  7. Post-aggregation
     • Resolve the rest of the variables
     • Apply rest of filters
  8. Example query
  9. Query planner: start with generators
  10. Query planner: add functions and filters until fixed point
      • [?person2 ?age2 ?double-age2]
  11. Query planner: do a join
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
  12. Query planner: add functions and filters until fixed point
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
  13. Query planner: do a join
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
  14. Query planner: add functions and filters until fixed point
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
  15. Query planner: group by already satisfied output vars
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
  16. Query planner: execute aggregators on each group
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
      • [?delta ?count]
  17. Query planner: add functions and filters until fixed point
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
      • [?delta ?count]
  18. Query planner: project fields to [?delta ?count]
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
      • [?delta ?count]
  19. Cascading pipes
      • Each: can occur in Map or Reduce
      • GroupBy: causes a Reduce step
      • Every: one or more follow GroupBy
      • CoGroup: join implementation, causes a Reduce step
  20. To Cascading
  21. To Cascading: Each
      • [?person2 ?age2 ?double-age2]
  22. To Cascading: CoGroup
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
  23. To Cascading: CoGroup
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
  24. To Cascading: Each, Each
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
  25. To Cascading: GroupBy
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
  26. To Cascading: Every (execute aggregators on each group)
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
      • [?delta ?count]
  27. To Cascading: Each
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
      • [?delta ?count]
  28. To Cascading: Each (project fields to [?delta ?count])
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
      • [?delta ?count]
  29. To MapReduce: Job 1
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
      • [?delta ?count]
      • Project fields to [?delta ?count]
  30. To MapReduce: Job 2
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
      • [?delta ?count]
      • Project fields to [?delta ?count]
  31. To MapReduce: Job 3
      • [?person2 ?age2 ?double-age2]
      • [?person1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2]
      • [?person1 ?age1 ?person2 ?age2 ?double-age2 ?delta]
      • Group by ?delta
      • [?delta ?count]
      • Project fields to [?delta ?count]
  32. defmapop: appends fields to tuple
      • [A1, B1, C1] → [A1, B1, C1, D1, E1]
      • [A2, B2, C2] → [A2, B2, C2, D2, E2]
      • [A3, B3, C3] → [A3, B3, C3, D3, E3]
  33. deffilterop
      • [A1, B1, C1] → true → [A1, B1, C1]
      • [A2, B2, C2] → false → (no output)
      • [A3, B3, C3] → true → [A3, B3, C3]
  34. defmapcatop: Map, then Concat
      • [“a red dog”] → Map → [ [“a red dog”, “a”] [“a red dog”, “red”] [“a red dog”, “dog”] ]
      • [“ ”] → Map → []
      • [“hello”] → Map → [ [“hello”, “hello”] ]
      • Concat → [“a red dog”, “a”] [“a red dog”, “red”] [“a red dog”, “dog”] [“hello”, “hello”]
  35. Aggregators: regular aggregators, all data goes to reducers
      • Diagram: keyed tuples such as [“key1”, 1], [“key2”, 3] and [“key3”, 4] flow unchanged from Map Task 1 and Map Task 2 to Reduce Task 1 and Reduce Task 2
  36. defparallelagg: parallel aggregators, partial aggregation done in mappers
      • Stages: Init, Combine (Map), Combine (Reduce)
      • Diagram: inputs such as [“nathan”], [“alice”] and [“sally”] in Map Task 1 and Map Task 2 become partial counts such as [“nathan”, 1] and [“nathan”, 2], which reach Reduce Task 1 and Reduce Task 2 as [“nathan”, 3], [“alice”, 1] and [“sally”, 1]
  37. combine
      • Inputs: [1] [2] [3] and [3] [4] [5]
      • Output: [1] [2] [3] [3] [4] [5]
  38. union
      • Inputs: [1] [2] [3] and [3] [4] [5]
      • Output: [1] [2] [3] [4] [5]
  39. ElephantDB: generation of domain of data
      • Key/value pairs → pre-shard and index in MapReduce → Shard 0 to Shard 5 on the distributed filesystem
  40. ElephantDB: serving domain of data
      • Diagram: DFS, three ElephantDB Servers, Shard 0 to Shard 5
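
The three execution phases on slides 3 to 7 (pre-aggregation, aggregation, post-aggregation) can be sketched in plain Python. This is only an illustration of the flow, not Cascalog or Cascading code. The example query itself is not part of the transcript, so the sample data, the two sources `AGE` and `FOLLOWS`, and the definition of `?delta` as `?age2 - ?age1` are all assumptions; only the field sets and the final `[?delta ?count]` projection come from the slides.

```python
from collections import Counter

# Hypothetical sources; names and ages are invented for illustration.
AGE = [("alice", 28), ("bob", 33), ("chris", 38), ("david", 25)]
FOLLOWS = [("alice", "bob"), ("bob", "chris"), ("david", "alice"), ("david", "bob")]


def pre_aggregation(age, follows):
    """Resolve variables with functions and joins into one set of tuples."""
    ages = dict(age)
    # Function on a generator: [?person2 ?age2 ?double-age2]
    doubled = [(p2, a2, 2 * a2) for p2, a2 in age]
    # Join on ?person2: [?person1 ?person2 ?age2 ?double-age2]
    joined = [(p1, p2, a2, d2)
              for p1, followed in follows
              for p2, a2, d2 in doubled
              if followed == p2]
    # Join on ?person1: [?person1 ?age1 ?person2 ?age2 ?double-age2]
    with_age1 = [(p1, ages[p1], p2, a2, d2)
                 for p1, p2, a2, d2 in joined
                 if p1 in ages]
    # Function adding ?delta (assumed here to be ?age2 - ?age1).
    return [(p1, a1, p2, a2, d2, a2 - a1) for p1, a1, p2, a2, d2 in with_age1]


def aggregation(tuples):
    """Group by the resolved output variable ?delta and count each group."""
    return Counter(t[5] for t in tuples)


def post_aggregation(counts):
    """Project to [?delta ?count]; any remaining filters would run here."""
    return sorted(counts.items())


def run_query():
    return post_aggregation(aggregation(pre_aggregation(AGE, FOLLOWS)))
```

Calling `run_query()` on the invented data yields one `(delta, count)` pair per distinct age difference. The intermediate lists mirror the field sets shown on the query planner slides.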
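
The behaviour shown on the defmapop, deffilterop and defmapcatop slides (32 to 34) can be mimicked with three small Python helpers. The helper names `map_op`, `filter_op` and `mapcat_op` are hypothetical and are not part of Cascalog; the sketch only reproduces the tuple-in, tuple-out behaviour drawn on the slides.

```python
def map_op(tuples, fn):
    """Like the defmapop slide: append the fields returned by fn to each tuple."""
    return [list(t) + list(fn(*t)) for t in tuples]


def filter_op(tuples, pred):
    """Like the deffilterop slide: keep only tuples for which pred is true."""
    return [list(t) for t in tuples if pred(*t)]


def mapcat_op(tuples, fn):
    """Like the defmapcatop slide: fn returns a sequence of values per tuple.

    Each value becomes its own output tuple (map), and all output tuples
    are flattened into one list (concat). An empty sequence emits nothing.
    """
    return [list(t) + [value] for t in tuples for value in fn(*t)]


def split_words(sentence):
    # str.split() with no argument drops whitespace-only input entirely.
    return sentence.split()
```

For instance, `mapcat_op([["a red dog"], [" "], ["hello"]], split_words)` emits three tuples for the first input, none for the second, and one for the third, matching slide 34.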
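
The difference between regular and parallel aggregators on slides 35 and 36 comes down to where partial results are merged. A minimal Python sketch of a parallel count follows, assuming an init step that turns each input into a partial value and an associative combine step that runs first inside each map task and then again in the reducer. The function names and the sample names are illustrative and do not reproduce the defparallelagg syntax.

```python
def init(_name):
    """Init step: every input tuple starts as a partial count of 1."""
    return 1


def merge(a, b):
    """Combine step: must be associative so it can run in mappers and reducers."""
    return a + b


def map_task(names):
    """Partial aggregation inside one map task (init, then combine per key)."""
    partial = {}
    for name in names:
        value = init(name)
        partial[name] = merge(partial[name], value) if name in partial else value
    return partial


def reduce_phase(partials):
    """Final combine over the partial results sent by all map tasks."""
    total = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = merge(total[key], value) if key in total else value
    return total


def parallel_count(map_inputs):
    return reduce_phase(map_task(names) for names in map_inputs)
```

With a regular aggregator every input tuple would travel to a reducer. Here each map task sends at most one partial value per key, which is the saving the slide describes.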
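
Slides 37 and 38 contrast combine and union using the same two inputs. Going by the outputs shown there, combine keeps duplicate tuples while union removes them. The Python functions below are a sketch of that behaviour only and share nothing but the names with the Cascalog operations.

```python
def combine(*sources):
    """Concatenate all sources, keeping duplicate tuples."""
    return [t for source in sources for t in source]


def union(*sources):
    """Concatenate all sources, keeping only the first copy of each tuple."""
    seen = set()
    result = []
    for t in combine(*sources):
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result
```

Applied to `[(1,), (2,), (3,)]` and `[(3,), (4,), (5,)]`, `combine` returns six tuples with `(3,)` twice, and `union` returns five.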
