Clojure at BackType

Presentation to a combined meetup of the Bay Area Lisp and Bay Area Clojure groups, presenting three Clojure projects at BackType:

Cascalog - Batch processing in Clojure
ElephantDB - Database written in Clojure
Storm - Distributed, fault-tolerant, reliable stream processing and RPC

  1. Clojure at BackType: How we learned to stop worrying and love the parentheses. Nathan Marz, BackType, @nathanmarz
  2. BackType: data services (APIs); social media analytics dashboard
  3. APIs: conversational graph for a URL, comment search, # tweets per URL, influence scores, top sites, trending links stream, etc.
  4. URL Profiles
  5. Site comparisons
  6. Influencer Profiles
  7. Twitter Account Analytics
  8. Topic Analysis
  9. Topic Analysis
  10. BackType’s Challenges
  11. BackType’s Challenges Complex analytics
  12. BackType’s Challenges Complex analytics on lots of data (> 30TB)
  13. BackType’s Challenges Complex analytics on lots of data (> 30TB) in realtime
  14. Clojure at BackType: Cascalog, ElephantDB, Storm
  15. Let’s build an app
  16. Let’s build an app
  17. Cascalog: layers of abstraction, from Cascalog (variables and logic) over Cascading (tuples, data workflows) down to MapReduce (key/value pairs, aggregation)
  18. Cascalog basics: the “age” dataset
  19. Cascalog basics
  20. Cascalog basics: define and execute a query
  21. Cascalog basics: define and execute a query; where to emit results
  22. Cascalog basics: define and execute a query; where to emit results; output variables
  23. Cascalog basics: define and execute a query; where to emit results; output variables; “predicates” constrain the output variables
  24. Predicates
  25. Predicates: input fields
  26. Predicates: input fields; output fields
  27. Predicates: fields can be constants or variables
  28. Predicates: fields can be constants or variables; variables are prefixed with ? or !
  29. Predicates
  30. Predicates: functions, filters, aggregators, and generators (finite sources of tuples)
  31. Example #1: generator, filter
  32. Example #2: generator, function
  33. Example #3: generator, aggregator, filter
  34. Join example
  35. Join example: triggers a join
  36. Join example
  37. Join example: joins are an implementation detail
  38. Cascalog demo!
  39. Composability: “predicate macro”
  40. Composability: using a predicate macro, and what it expands to
  41. Contrast to Pig: Pig’s AVG is 300 lines of code
  42. Let’s build an app
  43. Graph schema (diagram): nodes for Alice, Bob, Tweet 123 (content: “Data is fun!”) and Tweet 456 (content: “RT @bob Data is fun!”), connected by Reactor and Reaction edges, with properties such as Reshare: true and Gender: female
  44. ElephantDB: generation of a domain of data. Key/value pairs are pre-sharded and indexed in MapReduce, and the resulting shards (Shard 0 through Shard 5) are written to a distributed filesystem
  45. ElephantDB: serving a domain of data. Several ElephantDB servers load the shards (Shard 0 through Shard 5) from the DFS, each server serving a subset of them
  46. Storm: stream processing and distributed RPC
  47. Stream processing: automatically distributes computation; horizontally scalable; fault-tolerant; guarantees processing of messages
  48. Stream processing (diagram): a queue feeds a Storm cluster, which writes to several DBs
  49. What is a query? A view computed from raw data
  50. What is a query? Example: # tweets for a URL, computed from tweets
  51. What is a query? Example: influence score for a person, computed from tweets
  52. Computing a query: fully precompute the view from raw data into a DB, then query the DB
  53. Computing a query: do a live compute from scratch over the raw data
  54. Computing a query: precompute subviews from raw data into intermediate DBs, then compute the query from those DBs
  55. Distributed RPC: the application sends a request through a queue: “I want to know X, and return the results to me at Y”
  56. Distributed RPC (diagram): app queries go through a queue to a Storm cluster, which reads from DBs
  57. (BackType is hiring)
  58. Questions?
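
Cascalog itself is a Clojure DSL, and the slide code does not survive in this transcript. Slides 18 to 37 describe a predicate model: generators, filters, functions, aggregators, and a join triggered by reusing a variable. The minimal in-memory Python sketch below makes that model concrete. It is not Cascalog's API. The helper names (`generator`, `join`, `keep`, `bind`, `aggregate`, `emit`) and the sample `AGE`/`GENDER` data are hypothetical, and nothing here runs on Hadoop.

```python
from collections import defaultdict

# Hypothetical sample data: the slides mention an "age" dataset, but its
# contents are not in the transcript, so these tuples are made up.
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25)]
GENDER = [("alice", "f"), ("bob", "m"), ("chris", "m"), ("emily", "f")]


def generator(tuples, *variables):
    """A finite source of tuples: bind each field to a variable name."""
    return [dict(zip(variables, t)) for t in tuples]


def join(left, right):
    """Inner join on every variable the two sides share.

    Mimics how reusing a variable across generators triggers a join;
    this nested-loop strategy is just one possible implementation.
    """
    out = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[v] == r_row[v] for v in shared):
                out.append({**l_row, **r_row})
    return out


def keep(rows, pred, *inputs):
    """Filter predicate: keep rows where pred(inputs...) is truthy."""
    return [r for r in rows if pred(*(r[v] for v in inputs))]


def bind(rows, fn, inputs, output):
    """Function predicate: compute a new output variable from inputs."""
    return [{**r, output: fn(*(r[v] for v in inputs))} for r in rows]


def aggregate(rows, group_vars, agg, input_var, output):
    """Aggregator predicate: group by group_vars, reduce input_var."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[v] for v in group_vars)].append(r[input_var])
    return [{**dict(zip(group_vars, key)), output: agg(vals)}
            for key, vals in groups.items()]


def emit(rows, *out_vars):
    """Project the output variables, sorted for stable output."""
    return sorted(tuple(r[v] for v in out_vars) for r in rows)


people = generator(AGE, "?person", "?age")

# Generator + filter: people younger than 30.
young = emit(keep(people, lambda a: a < 30, "?age"), "?person", "?age")

# Generator + function: each person's age in ten years.
older = emit(bind(people, lambda a: a + 10, ["?age"], "?age-in-10"),
             "?person", "?age-in-10")

# Join (shared ?person) + aggregator: average age per gender.
joined = join(people, generator(GENDER, "?person", "?gender"))
avg_age = emit(aggregate(joined, ["?gender"],
                         lambda xs: sum(xs) / len(xs), "?age", "?avg"),
               "?gender", "?avg")

print(young)    # [('alice', 28), ('david', 25)]
print(avg_age)  # [('f', 28.0), ('m', 36.5)]
```

Reusing `?person` in both generators is what makes `join` line people up with their genders. People missing from either dataset (david, emily) drop out. A real Cascalog query would state the same thing declaratively and leave the join mechanics as an implementation detail.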

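The ElephantDB slides split the work into two steps. A batch step generates pre-sharded, indexed key/value data, and a serving step has servers load those shards. The small in-memory Python sketch below illustrates that split. It assumes a CRC32-mod-N shard function and contiguous shard-to-server assignment; the slides do not say how ElephantDB actually routes keys. `build_shards`, `Cluster`, and the sample data are hypothetical, not ElephantDB's API.

```python
import zlib

NUM_SHARDS = 6  # the slides draw six shards, numbered 0 to 5


def shard_for(key, num_shards=NUM_SHARDS):
    """Route a key to a shard. CRC32 mod N is an assumed scheme."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_shards(pairs, num_shards=NUM_SHARDS):
    """Generation step: partition key/value pairs and index each shard.

    On the slides this is a MapReduce job writing to a distributed
    filesystem; here each "indexed shard" is just an in-memory dict.
    """
    shards = [{} for _ in range(num_shards)]
    for key, value in pairs:
        shards[shard_for(key, num_shards)][key] = value
    return shards


class Cluster:
    """Serving step: each server loads a contiguous block of shards."""

    def __init__(self, shards, num_servers):
        self.num_shards = len(shards)
        # owner[i] is the index of the server that serves shard i.
        self.owner = [i * num_servers // self.num_shards
                      for i in range(self.num_shards)]
        self.servers = [{} for _ in range(num_servers)]
        for shard_id, data in enumerate(shards):
            self.servers[self.owner[shard_id]][shard_id] = data

    def get(self, key):
        """Find the key's shard, then ask the server that holds it."""
        shard_id = shard_for(key, self.num_shards)
        return self.servers[self.owner[shard_id]][shard_id].get(key)


# Hypothetical data: tweet counts per URL.
pairs = [("http://a.com", 12), ("http://b.com", 7), ("http://c.com", 3)]
shards = build_shards(pairs)
cluster = Cluster(shards, num_servers=2)
print(cluster.get("http://b.com"))  # 7
```

In this sketch the shards are built once and only read afterward, so the serving side needs no write coordination. Changing the data means generating a fresh set of shards.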
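The Distributed RPC slides describe a request that carries its own return address through a queue: "I want to know X, and return the results to me at Y". The minimal single-process Python sketch below shows that request/reply pattern. It is not Storm's DRPC API. A single worker thread stands in for the Storm cluster, and `request_queue`, `drpc_call`, and `tweet_count` are hypothetical names.

```python
import queue
import threading

# The shared request queue that the application writes to.
request_queue = queue.Queue()


def worker(compute):
    """Stand-in for the Storm cluster: answer each request at its return address."""
    while True:
        item = request_queue.get()
        if item is None:  # shutdown signal
            break
        args, reply_to = item
        reply_to.put(compute(*args))


def drpc_call(*args, timeout=5.0):
    """'I want to know X (args), and return the results to me at Y (reply_to).'"""
    reply_to = queue.Queue(maxsize=1)
    request_queue.put((args, reply_to))
    return reply_to.get(timeout=timeout)


def tweet_count(url):
    """Hypothetical query function backed by a precomputed table."""
    return {"http://a.com": 3, "http://b.com": 2}.get(url, 0)


t = threading.Thread(target=worker, args=(tweet_count,), daemon=True)
t.start()
result_a = drpc_call("http://a.com")
result_c = drpc_call("http://c.com")
request_queue.put(None)  # stop the worker
t.join()
print(result_a, result_c)  # 3 0
```

In Storm, the computation behind such a request would be parallelized across the cluster. The point of the sketch is only that the caller supplies where the answer should go.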