Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data analysis with Cascalog, Nathan Marz, BackType

  1. Cascalog
     Powerful and easy-to-use data analysis tool for Hadoop
     - Nathan Marz, BackType
  2. About Me
     - Tech Lead at BackType
     - Have been working on many-terabyte-scale systems for two years
       - ETL workflows
       - Data warehouses
  3. Presentation Overview
     - High-level introduction to Cascalog
     - Demo
     - Cascalog at BackType
  4. What is Cascalog?
     - Query language for Hadoop
     - Queries are written as regular Clojure code
     - Alternative to Pig and Hive
  5. What is Clojure?
     - Functional language that compiles to Java bytecode
     - Lisp-based
     - First-class integration with Java
  6. Features
     - Inner and outer joins
     - Aggregators
     - Functions
     - Subqueries
     - Sorting
     - Arbitrary inputs and outputs
  7. What sets Cascalog apart?
  8. What sets Cascalog apart?
     Fully integrated in a general-purpose programming language
  9. What sets Cascalog apart?
     Full power of Clojure available at all times
  10. What sets Cascalog apart?
      Full power of Clojure available at all times
  11. What sets Cascalog apart?
      - Custom operations
        - No UDF interface
        - Just Clojure functions
  12. What sets Cascalog apart?
      - Dynamic queries
        - Write functions that return queries
        - Manipulate queries as first-class entities in the language
  13. What sets Cascalog apart?
      - Use Cascalog side by side with other code
        - Appends and distributed copies
        - Consolidation
        - Application logic
  14. Easy Experimentation
      - Ships with a test dataset that can be queried locally (the “playground”)
      - 5 minutes to set up Hadoop, Clojure, and Cascalog locally; see the README
  15. Demo time!
  16. Cascalog at BackType
      - BackType collects data about conversations around the web
        - Tweets
        - Blog comments
        - Social news
        - People
  17. Cascalog at BackType
      - Cascalog is used to:
        - Identify influencers
        - Determine the number of people exposed to URLs on Twitter
        - Identify “interesting tweets”
        - Study social engagement of domains over time
        - Etc., etc.
  18. Cascalog at BackType
      - Input and output
        - Cascalog reads from MySQL databases
        - Cascalog writes to Cassandra
  19. Cascalog at BackType
      - Rapid development
        - Local playground dataset for development
        - Develop queries in the REPL
  20. Cascalog Roadmap
      - Optimized joins:
        - Replicated joins
        - Bloom joins
      - Negations
      - Recursion
  21. Questions?
      - Project page: http://www.github.com/nathanmarz/cascalog
      - Tutorial: http://nathanmarz.com/blog/introducing-cascalog
      - Follow me on Twitter: @nathanmarz
  22. Clojure and Cascalog
      - Provided by Clojure:
        - Module system
        - Dynamic queries
        - Custom operations
        - Interactive REPL
  23. Cascading and Cascalog
      - Provided by Cascading:
        - Tuple abstraction and tuple manipulation
        - Workflow to MapReduce translation
        - Read from and write to anywhere with Taps
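Slide 6 lists inner and outer joins and aggregators among Cascalog's features. Real Cascalog queries are Clojure code, and the transcript does not show their syntax, so what follows is only a Python analogue of the semantics. It assumes tiny hypothetical in-memory relations (`age`, `gender`) and hypothetical helpers (`inner_join`, `left_outer_join`, `count_by`). In Cascalog a join happens implicitly when two predicates share a variable such as `?person`; here it is spelled out as a hash join on the first field.

```python
from collections import defaultdict

# Hypothetical sample relations (NOT Cascalog's playground data):
# each is a list of (person, value) tuples, like a two-field predicate.
age = [("alice", 28), ("bob", 33), ("chris", 40)]
gender = [("alice", "f"), ("bob", "m"), ("dan", "m")]


def _index(relation):
    """Group the second field of each tuple by its first field (the join key)."""
    index = defaultdict(list)
    for key, value in relation:
        index[key].append(value)
    return index


def inner_join(left, right):
    """Keep only keys present on both sides, like two predicates sharing ?person."""
    index = _index(right)
    return [(key, lval, rval) for key, lval in left for rval in index.get(key, [])]


def left_outer_join(left, right):
    """Keep every left row; unmatched rows get None for the right-hand field."""
    index = _index(right)
    return [(key, lval, rval) for key, lval in left for rval in index.get(key, [None])]


def count_by(rows, key_fn):
    """A simple aggregator: number of rows per group key."""
    counts = defaultdict(int)
    for row in rows:
        counts[key_fn(row)] += 1
    return dict(counts)


print(inner_join(age, gender))       # [('alice', 28, 'f'), ('bob', 33, 'm')]
print(left_outer_join(age, gender))  # [('alice', 28, 'f'), ('bob', 33, 'm'), ('chris', 40, None)]
print(count_by(left_outer_join(age, gender), lambda row: row[2]))  # {'f': 1, 'm': 1, None: 1}
```

The outer join keeps `chris`, who has no `gender` row, with `None` in place of the missing field; the inner join drops him, and `dan` (no `age` row) appears in neither result.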
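Slide 11 says Cascalog has no UDF interface: custom operations are just Clojure functions. As a rough Python illustration of that design choice, the sketch assumes a hypothetical `run_steps` runner (not a Cascalog API) in which any ordinary callable can serve as a map or filter step, with no base class to extend and nothing to register.

```python
# Hypothetical mini query runner (not Cascalog's API): steps are ("filter", fn)
# or ("map", fn), where fn is any ordinary Python callable over the row's fields.
def run_steps(rows, *steps):
    """Apply steps in order. A "map" step appends fn(*row) as a new field;
    a "filter" step keeps only rows for which fn(*row) is truthy."""
    out = [tuple(row) for row in rows]
    for kind, fn in steps:
        if kind == "map":
            out = [row + (fn(*row),) for row in out]
        elif kind == "filter":
            out = [row for row in out if fn(*row)]
        else:
            raise ValueError(f"unknown step kind: {kind!r}")
    return out


# Plain functions: no base class, no registration, reusable elsewhere in the program.
def is_adult(name, age):
    return age >= 18


def decade(name, age):
    return age // 10 * 10


people = [("alice", 28), ("bob", 15), ("chris", 40)]
print(run_steps(people, ("filter", is_adult), ("map", decade)))
# [('alice', 28, 20), ('chris', 40, 40)]
```

Because the operations are ordinary functions, the same `is_adult` can be unit-tested or called directly, and a lambda works just as well as a named function.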
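Slide 12 describes queries as first-class values that ordinary functions can build and return. A minimal Python sketch of that idea follows, assuming a hypothetical immutable `Query` class and a hypothetical `age_range_query` builder; in Cascalog itself the equivalent would be a Clojure function that returns a query.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

Row = Tuple[Any, ...]


@dataclass(frozen=True)
class Query:
    """Hypothetical query-as-value: source rows plus a tuple of row filters.

    Nothing runs until execute() is called, so a Query can be stored,
    passed to functions, and extended like any other object.
    """
    source: Tuple[Row, ...]
    filters: Tuple[Callable[[Row], bool], ...] = ()

    def where(self, predicate: Callable[[Row], bool]) -> "Query":
        # Return a new Query; the receiver is left unchanged.
        return Query(self.source, self.filters + (predicate,))

    def execute(self) -> list:
        return [row for row in self.source if all(f(row) for f in self.filters)]


def age_range_query(people, low=None, high=None) -> Query:
    """A function that builds a query from runtime parameters (hypothetical)."""
    query = Query(tuple(people))
    if low is not None:
        query = query.where(lambda row: row[1] >= low)
    if high is not None:
        query = query.where(lambda row: row[1] < high)
    return query


people = [("alice", 28), ("bob", 33), ("chris", 40)]
print(age_range_query(people, low=30).execute())           # [('bob', 33), ('chris', 40)]
print(age_range_query(people, low=25, high=35).execute())  # [('alice', 28), ('bob', 33)]
```

Because `where` returns a new object, one base query can be shared and narrowed in different ways without side effects.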
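Slide 23 credits Cascading with translating a workflow into MapReduce jobs. The sketch below does not reproduce Cascading's planner; it only simulates, in plain Python, the map, shuffle, and reduce phases that a simple grouped count ultimately runs as. The names `map_reduce`, `emit_words`, and `sum_counts` are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Simulate one MapReduce job in memory: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect every value emitted for the same key.
            groups[key].append(value)
    # Reduce phase: one reducer call per key, visited in sorted key order
    # (Hadoop likewise sorts keys before handing them to a reducer).
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


def emit_words(sentence):
    for word in sentence.lower().split():
        yield word, 1


def sum_counts(word, counts):
    return sum(counts)


sentences = ["The quick fox", "the lazy dog"]
print(map_reduce(sentences, emit_words, sum_counts))
# {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

A query with joins, several aggregations, or subqueries would need a chain of such jobs; working out that chain is the part slide 23 attributes to Cascading.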
