Cassandra at Twitter - Distributed Counters

Cassandra at Twitter
(one use case)
Ryan King

Cassandra Meetup
January 12, 2011

History
‣ Started port of Tweets to Cassandra June 2009
‣ Started other Cassandra projects in 2009 (more on this later)
‣ Abandoned the Tweets port in 2010

Use cases
‣ realtime traffic/engagement analytics
‣ systems monitoring

Time Series Data
‣ write heavy
‣ stored temporally
‣ viewed temporally
‣ hierarchical aggregation

Data Model
‣ Distributed Counters (CASSANDRA-1072)
‣ each time series is a row (or rows) of counters
‣ slice over rows to get recent data

Data Model
‣ An example (not exactly the way we do it):

                          2011-01-12T10:00  2011-01-12T10:01  ...
host:web1:load1           5                 4                 ...
host:web2:load1           4                 3                 ...
cluster:web:load1:sum     576               505               ...
cluster:web:load1:count   100               95                ...
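
A minimal sketch of the layout above, using an in-memory dictionary as a stand-in for a counter column family. The `CounterStore` class and its `incr` and `slice` methods are hypothetical names chosen for illustration; they are not the Cassandra client API and, like the table, not exactly how Twitter does it.

```python
from collections import defaultdict


class CounterStore:
    """In-memory stand-in for counter rows: row key -> column name -> count."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(int))

    def incr(self, row_key, column, delta=1):
        # Counters are only ever incremented, never overwritten.
        self.rows[row_key][column] += delta

    def slice(self, row_key, start, end):
        """Return (column, value) pairs with start <= column <= end, in column order."""
        row = self.rows.get(row_key, {})
        return [(c, row[c]) for c in sorted(row) if start <= c <= end]


store = CounterStore()
store.incr("host:web1:load1", "2011-01-12T10:00", 5)
store.incr("host:web1:load1", "2011-01-12T10:01", 4)
store.incr("cluster:web:load1:sum", "2011-01-12T10:00", 5)
store.incr("cluster:web:load1:count", "2011-01-12T10:00", 1)

# ISO-8601 column names sort lexicographically, so a range over names is a range over time.
recent = store.slice("host:web1:load1", "2011-01-12T10:00", "2011-01-12T10:01")
```

Naming columns by ISO-8601 timestamp is what makes "recent data" a contiguous range within a row.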

Aggregation
‣ Measured every minute (or continuously)
‣ Rollup to coarser granularities
‣ More Counters! (aka, let’s do it live)

Aggregation
Minutes                 2011-01-12T10:00  2011-01-12T10:01  ...
host:web1:load1:sum     5                 4                 ...
host:web1:load1:count   1                 1                 ...
Hours                   2011-01-12T10     2011-01-12T11     ...
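
The live rollup idea can be sketched as follows: each measurement increments a `:sum` and a `:count` counter at every granularity at write time, so averages for coarser buckets never need a separate batch job. The `GRANULARITIES` table and the `record` and `average` functions are assumptions made for this example, and a plain dictionary stands in for the counter store.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical granularities: name -> strftime pattern used as the column name.
GRANULARITIES = {"minutes": "%Y-%m-%dT%H:%M", "hours": "%Y-%m-%dT%H"}

# (granularity, row key, column) -> counter value
counters = defaultdict(int)


def record(metric, when, value):
    """Increment the sum and count counters for every granularity."""
    for name, pattern in GRANULARITIES.items():
        column = when.strftime(pattern)
        counters[(name, metric + ":sum", column)] += value
        counters[(name, metric + ":count", column)] += 1


def average(granularity, metric, column):
    """Derive the mean for one bucket from its sum and count counters."""
    count = counters.get((granularity, metric + ":count", column), 0)
    if count == 0:
        return None
    return counters.get((granularity, metric + ":sum", column), 0) / count


record("host:web1:load1", datetime(2011, 1, 12, 10, 0), 5)
record("host:web1:load1", datetime(2011, 1, 12, 10, 1), 4)

hourly_avg = average("hours", "host:web1:load1", "2011-01-12T10")
```

Storing sum and count rather than the average itself keeps every write a pure increment, which is the only operation a counter supports.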

Aggregation
‣ other dimensions besides time:
  ‣ clusters
  ‣ racks / DCs, etc
‣ And combinations of the above
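
One way to picture aggregation across these dimensions: a single measurement fans out to one row key per dimension, plus keys for combinations. The `row_keys` function and the `rack:` and combined key formats below are hypothetical; only the `host:` and `cluster:` prefixes appear in the earlier example.

```python
def row_keys(metric, host, cluster, rack):
    """Return the row keys one measurement would increment (illustrative naming only)."""
    dims = [("host", host), ("cluster", cluster), ("rack", rack)]
    keys = [f"{kind}:{name}:{metric}" for kind, name in dims]
    # A combination of two dimensions gets its own row as well.
    keys.append(f"cluster:{cluster}:rack:{rack}:{metric}")
    return keys


keys = row_keys("load1", host="web1", cluster="web", rack="r12")
```

Each added dimension multiplies the number of counters written per measurement, which is one source of the storage cost listed under Cons.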

Pros / Cons
‣ Pros
  ‣ real-time data (average 30s between measurement and visibility)
  ‣ real-time aggregation
  ‣ flexible data retention (once counters and TTLs work together)
‣ Cons
  ‣ Storage-intensive
  ‣ Slow reads

Questions?

ryan@twitter.com
twitter.com/rk

Obligatory Plug.
twitter.com/jobs
