5. Use cases
‣ real-time traffic/engagement analytics
‣ systems monitoring
6. Time Series Data
‣ write heavy
‣ stored temporally
‣ viewed temporally
‣ hierarchical aggregation
7. Data Model
‣ Distributed Counters (CASSANDRA-1072)
‣ each time series is a row (or rows) of counters
‣ slice over rows to get recent data
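A minimal Python sketch of this model, using an in-memory dictionary as a stand-in for a counter column family. It is not the Cassandra API; the names `rows`, `incr`, and `slice_recent` are hypothetical, and the row key and values are borrowed from the example slide that follows.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a counter column family:
# row key -> {column name (minute timestamp) -> counter value}
rows = defaultdict(lambda: defaultdict(int))

def incr(row_key, column, delta=1):
    """Add delta to one counter column of one row."""
    rows[row_key][column] += delta

def slice_recent(row_key, start, end):
    """Return (column, value) pairs with start <= column <= end, in column order.

    ISO-8601 minute strings sort lexicographically in time order,
    so a plain string comparison is enough here.
    """
    cols = rows[row_key]
    return [(c, cols[c]) for c in sorted(cols) if start <= c <= end]

incr("host:web1:load1", "2011-01-12T10:00", 5)
incr("host:web1:load1", "2011-01-12T10:01", 4)

# Read back only the most recent minute.
recent = slice_recent("host:web1:load1", "2011-01-12T10:01", "2011-01-12T10:01")
```

Keeping the timestamp in the column name is what makes "recent data" a contiguous range within a row in this sketch.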
8. Data Model
‣ An example (not exactly the way we do it):
                         2011-01-12T10:00  2011-01-12T10:01  ...
host:web1:load1          5                 4                 ...
host:web2:load1          4                 3                 ...
cluster:web:load1:sum    576               505               ...
cluster:web:load1:count  100               95                ...
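Storing a sum and a count side by side presumably lets a reader derive a cluster-wide mean at read time. A small Python sketch of that arithmetic, using only the two columns shown above (the variable names are illustrative, and the slides do not say this is how the mean is computed):

```python
# Values copied from the cluster:web:load1:sum and :count rows above.
sums = {"2011-01-12T10:00": 576, "2011-01-12T10:01": 505}
counts = {"2011-01-12T10:00": 100, "2011-01-12T10:01": 95}

# Mean load per minute; minutes with a missing or zero count are skipped.
means = {ts: sums[ts] / counts[ts] for ts in sums if counts.get(ts)}
```

For the 10:00 column this gives 576 / 100 = 5.76.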
9. Aggregation
‣ Measured every minute (or continuously)
‣ Rollup to coarser granularities
‣ More Counters! (aka, let’s do it live)
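One way to read "more counters" is that each measurement increments a counter at every granularity as it is written, so rollups need no later batch job. The sketch below assumes that reading; the granularity names, key layout, and `record` function are hypothetical, and a plain dictionary stands in for distributed counters.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical granularities: name -> strftime pattern that truncates a timestamp.
GRANULARITIES = {
    "minute": "%Y-%m-%dT%H:%M",
    "hour": "%Y-%m-%dT%H",
    "day": "%Y-%m-%d",
}

# (series, granularity, bucket) -> counter value
counters = defaultdict(int)

def record(series, when, value):
    """Increment the bucket for `when` at every granularity in one pass."""
    for name, fmt in GRANULARITIES.items():
        counters[(series, name, when.strftime(fmt))] += value

record("host:web1:load1", datetime(2011, 1, 12, 10, 0), 5)
record("host:web1:load1", datetime(2011, 1, 12, 10, 1), 4)

hourly_sum = counters[("host:web1:load1", "hour", "2011-01-12T10")]
```

For a gauge such as load, a matching count counter per bucket would also be needed to turn these sums into averages.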
11. Aggregation
‣ other dimensions besides time:
‣ clusters
‣ racks, data centers, etc.
‣ And combinations of the above
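Aggregating over combinations of dimensions can be sketched as a fan-out on write: each measurement increments a sum and a count for every non-empty subset of the host's dimensions. Everything here is an assumption made for illustration: the dimension values (`r1`, `dc1`), the comma-joined key format, and the `dimension_keys` and `record` helpers.

```python
from collections import defaultdict
from itertools import combinations

# (counter row key, minute) -> counter value
counters = defaultdict(int)

def dimension_keys(dims):
    """Yield one key prefix per non-empty combination of dimensions."""
    names = sorted(dims)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            yield ",".join(f"{n}:{dims[n]}" for n in combo)

def record(dims, metric, minute, value):
    """Fan one measurement out to a sum and a count per dimension combination."""
    for prefix in dimension_keys(dims):
        counters[(f"{prefix}:{metric}:sum", minute)] += value
        counters[(f"{prefix}:{metric}:count", minute)] += 1

# Two hosts that share the same (made-up) cluster, rack, and data center.
dims = {"cluster": "web", "rack": "r1", "dc": "dc1"}
record(dims, "load1", "2011-01-12T10:00", 5)
record(dims, "load1", "2011-01-12T10:00", 4)

cluster_sum = counters[("cluster:web:load1:sum", "2011-01-12T10:00")]
cluster_count = counters[("cluster:web:load1:count", "2011-01-12T10:00")]
```

The number of counters touched per write grows as 2^n - 1 for n dimensions, which is one concrete source of the storage cost listed under Cons.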
12. Pros / Cons
‣ Pros
‣ real-time data (average 30s between measurement and visibility)
‣ real-time aggregation
‣ flexible data retention (once counters and TTLs work together)
‣ Cons
‣ Storage-intensive
‣ Slow reads