Diese Präsentation wurde erfolgreich gemeldet.
Die SlideShare-Präsentation wird heruntergeladen. ×

Ad

Cascading
 Nathan Marz
  BackType

Ad

What is Cascading?

Cascading is a Java library that makes development of
   complex Hadoop MapReduce workflows easy

Ad

Why Hadoop?


• Process large amounts of data in a scalable,
  fault-tolerant way

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Ad

Wird geladen in …3
×

Hier ansehen

1 von 16 Anzeige
1 von 16 Anzeige
Anzeige

Weitere Verwandte Inhalte

Anzeige
Anzeige

Cascading

  1. 1. Cascading Nathan Marz BackType
  2. 2. What is Cascading? Cascading is a Java library that makes development of complex Hadoop MapReduce workflows easy
  3. 3. Why Hadoop? • Process large amounts of data in a scalable, fault-tolerant way
  4. 4. Why Cascading? Tool How you feel Hadoop MapReduce Cascading
  5. 5. Tuples Cascading represents all data as “Tuples” (“the man sat” , 25) (“hello dolly” , 42) (“say hello” ,1 ) (“the woman sat”, 10)
  6. 6. Tuples Tuples are named, ordered fields [“sentence”, “value”] (“the man sat” , 25) (“hello dolly” , 42) (“say hello” ,1 ) (“the woman sat”, 10)
  7. 7. Flow A flow is a sequence of manipulations on pipes of tuple streams • Flow compiles to one or more MapReduce jobs • Inputs and outputs called “Taps”. • Each Tap produces or receives a pipe of tuples with the same format • Multiple inputs, multiple outputs
  8. 8. Example [“sentence”, “value”] [“word”, “sum”] Get the sum of the values for each word
  9. 9. Example [“sentence”, “value”] Split(“sentence”) -> “word” [“word”, “value”] GroupBy(“word”) [“word”, list<[“value”]>] Sum(“value”) -> “sum” [“word”, “sum”]
  10. 10. Example Split(“sentence”) -> “word” [“sentence”, “value”] [“word”, “value”] (“the” , 25) (“the man sat” , 25) (“man” , 25) (“hello dolly” , 42) (“sat” , 25) (“say hello” ,1 ) (“hello” , 42) (“the woman sat”, 10) (“dolly” , 42) (“say” ,1 ) (“hello” , 1 ) (“the” , 10) (“woman” , 10) (“sat” , 10)
  11. 11. Example GroupBy(“word”) [“word”, “value”] [“word”, list<[“value”]>] (“the” , 25) (“man” , 25) (“the” , [25, 10]) (“sat” , 25) (“man” , [25] ) (“hello” , 42) (“sat” , [25, 10]) (“dolly” , 42) (“hello” , [42, 1] ) (“say” ,1 ) (“dolly” , [42] ) (“hello” , 1 ) (“say” , [1] ) (“the” , 10) (“woman” , [10] ) (“woman” , 10) (“sat” , 10)
  12. 12. Example Sum(“value”) -> “sum” [“word”, list<[“value”]>] [“word”, “sum”] (“the” , [25, 10]) (“the” , 35) (“man” , [25] ) (“man” , 25) (“sat” , [25, 10]) (“sat” , 35) (“hello” , [42, 1] ) (“hello” , 43) (“dolly” , [42] ) (“dolly” , 42) (“say” , [1] ) (“say” ,1 ) (“woman” , [10] ) (“woman” , 10)
  13. 13. More functionality • Inner and outer joins natively supported • Seamlessly branch and merge pipes of tuples • Integrate diverse data sources
  14. 14. Why not Pig? • Pig is a custom language for writing MapReduce workflows • Because it’s a custom language, intermixing “plain logic” in between flows is painful • Not nearly as flexible as Cascading for custom needs
  15. 15. Learn more • Tutorial: http://blog.rapleaf.com/dev/?p=33 • Website: http://www.cascading.org
  16. 16. Questions?

Hinweis der Redaktion

















×