14. • 7 new committers added
• Dozens of contributors
• 200+ (!) people on IRC
• Hundreds of closed issues (bugs, features, etc.)
• 4 major releases; a number of stable point releases
• Graduation to TLP
22. Querying
• get(): retrieve by column name
• multiget(): by column name for a set of keys
• get_slice(): by column name, or a range of names
  • returning columns
  • returning super columns
• multiget_slice(): a subset of columns for a set of keys
• get_count(): number of columns or sub-columns
• get_range_slice(): subset of columns for a range of keys
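The read calls above can be illustrated with a toy, in-memory model. This is only a sketch of the semantics described on the slide, not the real Thrift API: the class name `ToyReadStore`, its method signatures, and the sample rows are hypothetical, super columns are omitted, and row keys are simply compared as strings.

```python
class ToyReadStore:
    """Hypothetical in-memory stand-in: row key -> {column name: value}."""

    def __init__(self, rows):
        self.rows = rows

    def get(self, key, column):
        # Retrieve one column by name.
        return self.rows[key][column]

    def multiget(self, keys, column):
        # Same column name for a set of keys; missing rows/columns are skipped.
        return {k: self.rows[k][column]
                for k in keys if column in self.rows.get(k, {})}

    def get_slice(self, key, names=None, start="", finish=""):
        # Either an explicit list of column names, or a range of names.
        row = self.rows.get(key, {})
        if names is not None:
            return {n: row[n] for n in names if n in row}
        return {n: v for n, v in sorted(row.items())
                if n >= start and (finish == "" or n <= finish)}

    def multiget_slice(self, keys, **slice_args):
        # A subset of columns for a set of keys.
        return {k: self.get_slice(k, **slice_args) for k in keys}

    def get_count(self, key):
        # Number of columns in the row.
        return len(self.rows.get(key, {}))

    def get_range_slice(self, start_key, finish_key, **slice_args):
        # A subset of columns for every key in [start_key, finish_key].
        return {k: self.get_slice(k, **slice_args)
                for k in sorted(self.rows) if start_key <= k <= finish_key}


store = ToyReadStore({
    "alice": {"age": "30", "email": "a@example.org", "name": "Alice"},
    "bob": {"email": "b@example.org", "name": "Bob"},
})

name = store.get("alice", "name")                       # "Alice"
emails = store.multiget(["alice", "bob"], "email")
middle = store.get_slice("alice", start="b", finish="f")  # only "email"
names = store.get_range_slice("a", "c", names=["name"])
```

Keeping one row as a sorted map of column names is what makes "a range of names" a natural query in this model.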
23. Updating
• insert(): add/update column (by key)
• batch_insert(): add/update multiple columns (by key)
• remove(): remove a column
• batch_mutate(): like batch_insert() but can also delete
  (new for 0.6, deprecates batch_insert())
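The difference between the write calls can be sketched with a toy, in-memory model. Everything here is an assumption made for illustration: the class name `ToyWriteStore`, the method signatures, and the tuple-based mutation format are hypothetical and are not the real Thrift API, and timestamps and tombstones are left out entirely.

```python
class ToyWriteStore:
    """Hypothetical in-memory stand-in: row key -> {column name: value}."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, column, value):
        # Add or overwrite a single column in the row for `key`.
        self.rows.setdefault(key, {})[column] = value

    def batch_insert(self, key, columns):
        # Add or overwrite several columns of one row in a single call.
        for column, value in columns.items():
            self.insert(key, column, value)

    def remove(self, key, column):
        # Remove one column; drop the row once it has no columns left.
        row = self.rows.get(key, {})
        row.pop(column, None)
        if not row:
            self.rows.pop(key, None)

    def batch_mutate(self, mutations):
        # mutations: {key: [("insert", column, value) or ("delete", column)]}
        # Unlike batch_insert(), one call can mix writes and deletes.
        for key, ops in mutations.items():
            for op in ops:
                if op[0] == "insert":
                    self.insert(key, op[1], op[2])
                elif op[0] == "delete":
                    self.remove(key, op[1])
                else:
                    raise ValueError(f"unknown mutation: {op[0]!r}")


writes = ToyWriteStore()
writes.insert("alice", "name", "Alice")
writes.batch_insert("alice", {"age": "30", "email": "a@example.org"})
writes.batch_mutate({
    "alice": [("delete", "age"), ("insert", "city", "Austin")],
    "bob": [("insert", "name", "Bob")],
})
```

After these calls, the row for "alice" holds name, email, and city, and the row for "bob" holds only name.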
31. Case 1: Digg
Digg is a social news site that allows people to discover and share
content from anywhere on the Internet by submitting stories and
links, and voting and commenting on submitted stories and links.
Ranked 98th by Alexa.com.
33. Problem
• Terabytes of data; high transaction rate (read-dominated)
• Multiple clusters; heavily sharded
• Management nightmare (high effort, error prone)
• Unsatisfied availability requirements (geographic isolation)
34. Solution
• Currently in production on "Green Badges"
• Cassandra as primary data store Real Soon Now (RSN)
• Datacenter and rack-aware replication
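The idea behind datacenter- and rack-aware replication can be sketched as follows. This is a deliberately simplified illustration, not Cassandra's actual placement strategy: the `Node` type, the `pick_replicas` function, and the four-node ring are all hypothetical. The sketch walks the token ring from the node that owns the key and prefers nodes whose (datacenter, rack) pair is not yet used.

```python
from typing import NamedTuple


class Node(NamedTuple):
    token: int
    name: str
    datacenter: str
    rack: str


def pick_replicas(ring, key_token, replication_factor):
    """Choose replicas for a key, spreading them across racks when possible."""
    ring = sorted(ring)  # NamedTuple sorts by its first field, the token
    # The primary is the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if there is none.
    start = next((i for i, n in enumerate(ring) if n.token >= key_token), 0)
    ordered = ring[start:] + ring[:start]
    replicas = [ordered[0]]

    # Pass 1: prefer nodes in a (datacenter, rack) not used so far.
    for node in ordered[1:]:
        if len(replicas) == replication_factor:
            break
        used = {(r.datacenter, r.rack) for r in replicas}
        if (node.datacenter, node.rack) not in used:
            replicas.append(node)

    # Pass 2: if there are too few distinct racks, fill up in ring order.
    for node in ordered[1:]:
        if len(replicas) == replication_factor:
            break
        if node not in replicas:
            replicas.append(node)
    return replicas


ring = [
    Node(0, "a", "dc1", "r1"),
    Node(25, "b", "dc1", "r1"),
    Node(50, "c", "dc1", "r2"),
    Node(75, "d", "dc2", "r1"),
]
chosen = [n.name for n in pick_replicas(ring, key_token=10, replication_factor=3)]
```

With this ring, a key with token 10 lands on node b, and the other two copies go to c (another rack) and d (another datacenter), so losing a single rack or datacenter does not lose every copy.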
35. Case 2: Twitter
Twitter is a social networking and microblogging service that
enables its users to send and read tweets, text-based posts of up to
140 characters.
Ranked 12th by Alexa.com.
37. MySQL
• Terabytes of data, ~1,000,000 ops/s
• Calls for heavy sharding, light replication
• Schema changes are very difficult (if possible at all)
• Manual sharding is very high effort
• Automated sharding and replication is Hard
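One reason manual sharding is so much work can be shown with a small experiment. The sketch below assumes a simple hash-modulo scheme, which the slides do not specify; `shard_for` and the synthetic keys are hypothetical. It counts how many keys would have to move when a fifth shard is added to a four-shard setup.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard with hash(key) mod N (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


keys = [f"user{i}" for i in range(10000)]

# A key must be migrated if its shard changes when going from 4 to 5 shards.
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
fraction_moved = moved / len(keys)

print(f"{fraction_moved:.0%} of keys change shard")
```

Under this scheme roughly four out of five keys end up on a different shard, so growing the cluster means migrating most of the data by hand.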
38. Case 3: Facebook
Facebook is a social networking site where users can create a
profile, add friends, and send them messages. Users can also join
groups organized by location or other points of common interest.
Ranked #2 by Alexa.com.
39. Inbox Search
• 100 TB
• 160 nodes
• 1/2 billion writes per day (2-year-old number?)
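To put these figures in perspective, they can be turned into rough per-node averages. This is back-of-the-envelope arithmetic on the three numbers from the slide only; the variable names are hypothetical, and the result ignores replication, peak load, and uneven data distribution.

```python
total_tb = 100                  # total data, from the slide
nodes = 160                     # cluster size, from the slide
writes_per_day = 500_000_000    # "1/2 billion writes per day"

seconds_per_day = 24 * 60 * 60

tb_per_node = total_tb / nodes                         # 0.625 TB per node
writes_per_second = writes_per_day / seconds_per_day   # about 5,787 per second
writes_per_node_per_second = writes_per_second / nodes # about 36 per node

print(f"{tb_per_node:.3f} TB per node")
print(f"{writes_per_second:,.0f} writes/s cluster-wide")
print(f"{writes_per_node_per_second:.1f} writes/s per node")
```

On average, then, each node holds well under a terabyte and sees a few dozen client writes per second, before replication is taken into account.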