2. 2
About me and why Clojure 4 Big Data
●
Making software since 2005, working with Big Data since 2012
●
Work for ADITION Technologies AG
– Leading European ad-serving provider
– Part of the European tech stack VirtualMinds
– >2.5 billion events per day processed in real time
– Extra ~12 billion data points in (batch) ETL daily
– 250 TB of data in a Hadoop data lake
– Several data centers of our own
– Low latency requirements
– Written mostly in Clojure
6. 6
●
Makes you think differently, approach problems
differently, and solve them faster
●
Immutability, functions and map-reduce
●
Powerful, interactive, small, concise
●
Makes it hard to fall back to imperative style
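The points above about immutability, functions, and map-reduce can be illustrated with a minimal sketch. The event maps and field names (`:campaign`, `:type`) are hypothetical and not taken from any real system; the code only shows the general style of filtering, mapping, and reducing over immutable data.

```clojure
;; Hypothetical ad events; the field names are illustrative only.
(def events
  [{:campaign "a" :type :click}
   {:campaign "b" :type :view}
   {:campaign "a" :type :view}
   {:campaign "a" :type :click}])

(defn clicks-per-campaign
  "Counts click events per campaign without mutating the input."
  [evts]
  (->> evts
       ;; keep only the click events
       (filter #(= :click (:type %)))
       ;; map step: project each event to its campaign key
       (map :campaign)
       ;; reduce step: fold the keys into a map of counts;
       ;; each update returns a new map, the old one is untouched
       (reduce (fn [counts campaign]
                 (update counts campaign (fnil inc 0)))
               {})))

(def result (clicks-per-campaign events))
```

Because `events` is an immutable vector, it is unchanged after the call; every intermediate "update" produces a new value instead of modifying one in place.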
13. 13
Storm Pros and Cons
●
No “exactly once” guarantee
●
Fast, simple
●
Multitenancy and debugging
●
Integrations
14. 14
Trident
●
The “Cascading” of Storm
●
High level abstraction processing library on top of Storm
●
Rich API with joins, aggregations, grouping, etc.
●
Provides stateful, exactly-once processing primitives
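One common way to get exactly-once state updates on top of at-least-once delivery is to store the id of the last applied batch next to the value and to skip a batch whose id has already been applied. The sketch below shows that idea in plain Clojure. It does not use the Storm or Trident API; the names `apply-batch`, `:txid`, and `:events` are hypothetical, and it assumes batches arrive in id order and that a replayed batch has the same contents as the original.

```clojure
;; State holds the running count plus the id of the last batch applied.
(def initial-state {:count 0 :txid nil})

(defn apply-batch
  "Returns a new state with the batch counted, unless this txid
   was already applied (a replay), in which case state is returned as-is."
  [state {:keys [txid events]}]
  (if (= txid (:txid state))
    state ; replayed batch: skip it so the count stays exact
    {:count (+ (:count state) (count events))
     :txid  txid}))

;; Batch 2 is delivered twice, as it would be after a failure and replay.
(def batches
  [{:txid 1 :events [:a :b]}
   {:txid 2 :events [:c]}
   {:txid 2 :events [:c]}])

(def final-state (reduce apply-batch initial-state batches))
```

Here the duplicate delivery of batch 2 leaves the count at 3 rather than 4, which is the effect "exactly-once" refers to: the events may be processed more than once, but the state reflects each batch only once.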
23. 23
●
Cascading - a Java API for
– defining complex data flows
– integrating those flows with back-end systems
– query planner for mapping and executing logical flows onto
a computing platform
●
Cascalog – Clojure DSL for Cascading