Presentation I gave at the Bay Area Clojure Meetup Group on May 6th, 2010. I also demoed examples from the introductory tutorial:
http://nathanmarz.com/blog/introducing-cascalog/
1. Cascalog
Nathan Marz, BackType
Powerful and easy-to-use data analysis tool for Hadoop
2. About Me
Tech Lead at BackType
Have been working on many-terabyte-scale systems for two years
ETL workflows
Data warehouses
3. What is Hadoop?
Distributed Filesystem
MapReduce Framework
Scales to thousands of machines and petabytes of data
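To make the MapReduce bullet concrete, here is a minimal single-process sketch of the model in plain Clojure. It is not Hadoop code: the function names `map-phase`, `reduce-phase`, and `run-job` are hypothetical, and the "shuffle" is just a `group-by` in memory. It only shows the shape of a job: mappers emit key/value pairs, pairs are grouped by key, and a reducer folds each group.

```clojure
(require '[clojure.string :as str])

;; Map phase (hypothetical name): turn one line of text into [word 1] pairs.
(defn map-phase [line]
  (for [word (str/split (str/lower-case line) #"\s+")
        :when (seq word)]
    [word 1]))

;; Reduce phase (hypothetical name): sum the counts emitted for one word.
(defn reduce-phase [word counts]
  [word (reduce + counts)])

;; In-memory stand-in for the framework: run the mappers, group the pairs
;; by key (the "shuffle"), then run one reducer per key.
(defn run-job [lines]
  (->> lines
       (mapcat map-phase)
       (group-by first)
       (map (fn [[word pairs]] (reduce-phase word (map second pairs))))
       (into {})))

(run-job ["the quick brown fox" "the lazy dog"])
```

On a real cluster the map and reduce functions run on different machines and the grouping step moves data across the network; that coordination is what the framework provides.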
4. What is Cascalog?
Clojure-based query language for Hadoop with
Datalog-inspired syntax
Queries compile to one or more MapReduce jobs
The tool I wish I had two years ago
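As a rough illustration of the Datalog-inspired syntax, the sketch below follows the style of the introductory tutorial linked above. It assumes a REPL with Cascalog on the classpath and the bundled playground namespace, which is taken here to provide an `age` dataset of [person age] tuples; treat the dataset name and contents as assumptions if your version differs.

```clojure
;; Load the playground datasets and pull in the Cascalog API.
(use 'cascalog.playground)
(bootstrap)

;; Define and run a query that prints each person younger than 30.
;; ?person and ?age are logic variables: (age ?person ?age) binds them
;; from the dataset, and (< ?age 30) is an ordinary Clojure function
;; used as a filter.
(?<- (stdout) [?person ?age]
     (age ?person ?age)
     (< ?age 30))
```

The `?<-` operator both defines and executes the query; the first argument is the output tap and the vector names the output fields.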
6. What sets Cascalog apart?
Super simple
Full power of Clojure always available
Easy to extend with custom operations
Dynamic queries
Arbitrary inputs and outputs
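The custom-operations point can be sketched with a word count, again in the style of the introductory tutorial. This assumes Cascalog's `defmapcatop` macro, the playground's `sentence` dataset, and the `c` alias for the built-in operations namespace that `(bootstrap)` sets up; the operation name `split` is just an illustrative choice.

```clojure
(use 'cascalog.playground)
(bootstrap)

;; Custom operation: for each input sentence, emit one tuple per
;; whitespace-separated word. The body is ordinary Clojure with Java interop.
(defmapcatop split [sentence]
  (seq (.split sentence "\\s+")))

;; Use the custom operation inside a query, then aggregate with the
;; built-in count to get the number of occurrences of each word.
(?<- (stdout) [?word ?count]
     (sentence ?s)
     (split ?s :> ?word)
     (c/count ?count))
```

Because the operation is a regular Clojure definition, it can be tested at the REPL or reused across queries like any other function.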
8. Experiment with Cascalog
Ships with a test dataset that can be queried locally (the “playground”)
5 minutes to set up Hadoop, Clojure, and Cascalog locally (see README)
9. News feed generator
Ranks events in a social network for each person based on “importance” and recency
38 lines of code