Cassandra is a highly scalable, eventually consistent, distributed, structured column-family store with no single point of failure, initially open-sourced by Facebook and now part of the Apache Incubator. These slides are from Jonathan Ellis's OSCON 09 talk: http://en.oreilly.com/oscon2009/public/schedule/detail/7975
2. Motivation
● Scaling reads to a relational database is
hard
● Scaling writes to a relational database is
virtually impossible
● … and when you do, it usually isn't relational
anymore
3. The new face of data
● Scale out, not up
● Online load balancing, cluster growth
● Flexible schema
● Key-oriented queries
● CAP-aware
4. CAP theorem
● Pick two of Consistency, Availability,
Partition tolerance
5. Two famous papers
● Bigtable: A distributed storage system for
structured data, 2006
● Dynamo: Amazon's highly available
key-value store, 2007
6. Two approaches
● Bigtable: “How can we build a distributed
db on top of GFS?”
● Dynamo: “How can we build a distributed
hash table appropriate for the data
center?”
7. 10,000 ft summary
● Dynamo partitioning and replication
● Log-structured ColumnFamily data model
similar to Bigtable's
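Dynamo-style partitioning places nodes and keys on a hash ring, and each key is replicated to N consecutive nodes. The following is a minimal sketch under stated assumptions: the `Ring` class, the `token` helper, the node names, and the use of MD5 are illustrative choices, not Cassandra's actual code.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical hash ring: one token per node, N replicas per key."""

    def __init__(self, nodes, replicas=3):
        self.replicas = min(replicas, len(nodes))
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas_for(self, key):
        """Return the first node at or after the key's token, then its
        successors clockwise, until N nodes are collected."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replicas)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=3)
owners = ring.replicas_for("JBE")
```

Adding a node only changes ownership for keys near its token, which is what makes incremental cluster growth practical in this scheme.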
8. Cassandra highlights
● High availability
● Incremental scalability
● Eventually consistent
● Tunable tradeoffs between consistency
and latency
● Minimal administration
● No SPOF (single point of failure)
21. Remove
● Deletion marker (tombstone) necessary
to suppress data in older SSTables, until
compaction
● Read repair complicates things a little
● Eventual consistency complicates things
more
● Solution: configurable delay before
tombstone GC, after which tombstones
are not repaired
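The tombstone idea above can be sketched with plain dictionaries standing in for SSTables. This is a toy model, not Cassandra's storage engine: `TOMBSTONE`, `read`, `compact`, and the `gc_grace` parameter are hypothetical names, and each SSTable maps a key to a `(timestamp, value)` pair.

```python
TOMBSTONE = object()  # marker meaning "this entry was deleted"


def read(sstables, key):
    """Return the newest value for key, or None if the newest entry is a tombstone."""
    newest = None
    for table in sstables:
        entry = table.get(key)  # (timestamp, value) or None
        if entry is not None and (newest is None or entry[0] > newest[0]):
            newest = entry
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


def compact(sstables, now, gc_grace):
    """Merge SSTables, keeping the newest entry per key and dropping
    tombstones older than the configurable gc_grace delay."""
    merged = {}
    for table in sstables:
        for key, entry in table.items():
            if key not in merged or entry[0] > merged[key][0]:
                merged[key] = entry
    return {k: e for k, e in merged.items()
            if not (e[1] is TOMBSTONE and now - e[0] > gc_grace)}


old = {"JBE": (1, "first post")}
new = {"JBE": (5, TOMBSTONE)}
```

Without the tombstone in `new`, a read would fall through to `old` and resurrect the deleted value. The delay before garbage collection gives the marker time to reach every replica.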
23. Read path
● Any node
● Partitioner
● Wait for R responses
● Wait for N – R responses in the
background and perform read repair
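A rough sketch of the read path above, assuming the partitioner has already chosen the replica list. Replicas are modeled as dictionaries mapping a key to a `(timestamp, value)` pair, the background step runs inline for simplicity, and `newest` and `read_with_repair` are hypothetical helper names.

```python
def newest(entries):
    """Pick the (timestamp, value) pair with the highest timestamp, or None."""
    present = [e for e in entries if e is not None]
    return max(present, key=lambda e: e[0]) if present else None


def read_with_repair(replicas, key, r):
    # Foreground: answer once the first R replicas have responded.
    answer = newest(rep.get(key) for rep in replicas[:r])

    # Background (inline in this sketch): gather the remaining N - R
    # responses and overwrite any replica holding an older version.
    latest = newest(rep.get(key) for rep in replicas)
    if latest is not None:
        for rep in replicas:
            current = rep.get(key)
            if current is None or current[0] < latest[0]:
                rep[key] = latest
    return None if answer is None else answer[1]


stale = {"k": (1, "old")}
fresh = {"k": (2, "new")}
empty = {}
first = read_with_repair([stale, fresh, empty], "k", r=1)
second = read_with_repair([stale, fresh, empty], "k", r=1)
```

With R=1 the first read can return a stale value, but the repair step brings every replica up to date, so the second read sees the newer one.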
24. Cassandra read properties
● Read multiple SSTables
● Slower than writes (but still fast)
● Seeks can be mitigated with more RAM
● Scales to billions of rows
25. Consistency in a BASE world
● If W + R > N, you will have consistency
(every read set overlaps every write set)
● W=1, R=N
● W=N, R=1
● W=Q, R=Q where Q = floor(N / 2) + 1
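The arithmetic behind these settings is small enough to check directly. A minimal sketch, where `quorum` and `is_consistent` are hypothetical helper names:

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas: floor(n / 2) + 1."""
    return n // 2 + 1


def is_consistent(w: int, r: int, n: int) -> bool:
    """True when every read set must overlap every write set."""
    return w + r > n


n = 3
q = quorum(n)
settings = {
    "W=1, R=N": is_consistent(1, n, n),
    "W=N, R=1": is_consistent(n, 1, n),
    "W=Q, R=Q": is_consistent(q, q, n),
    "W=1, R=1": is_consistent(1, 1, n),
}
```

For N=3 the quorum is 2, so the first three settings satisfy W + R > N, while W=1, R=1 does not and only gives eventual consistency.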
26. vs MySQL with 50GB of data
● MySQL
● ~300ms write
● ~350ms read
● Cassandra
● ~0.12ms write
● ~15ms read
● Achtung!
35. Example: a multiuser blog
Two queries
- the most recent posts belonging to a
given blog, in reverse chronological order
- a single post and its comments, in
chronological order
36. First try
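[Diagram: one Blog row per blog; each post is a supercolumn holding the post and its comments]
JBE (blog)
  "Cassandra is teh awesome": post, comment, comment
  "BASE FTW": post, comment, comment
Evan (blog)
  "I like kittens": post, comment, comment
  "And Ruby": post, comment, comment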
<ColumnFamily
Type="Super"
CompareWith="TimeString"
CompareSubcolumnsWith="UUID"
Name="Blog"/>
37. Second try
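[Diagram: two column families; Blog has one row per blog, Comment has one row per post]
Blog
  JBE (blog): "Cassandra is teh awesome", "BASE FTW"
  Evan (blog): "I like kittens", "And Ruby"
Comment
  "Cassandra is teh awesome": comment, comment
  "BASE FTW": comment, comment
  "I like kittens": comment, comment
  "And Ruby": comment, comment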
<ColumnFamily
CompareWith="UUIDType"
Name="Blog"/>
<ColumnFamily
CompareWith="UUIDType"
Name="Comment"/>
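One reading of the second layout, sketched with nested dictionaries rather than a real Cassandra client. The integer column names stand in for time-ordered UUIDs, and the ids, the comment texts, and the helpers `recent_posts` and `post_with_comments` are all illustrative assumptions.

```python
# Blog: row key = blog owner, column name = time-ordered post id, value = post text
blog = {
    "JBE": {101: "Cassandra is teh awesome", 102: "BASE FTW"},
    "Evan": {201: "I like kittens", 202: "And Ruby"},
}

# Comment: row key = post id, column name = time-ordered comment id, value = comment text
comment = {
    101: {1: "comment 1", 2: "comment 2"},
    102: {3: "comment 3", 4: "comment 4"},
}


def recent_posts(blog_key, limit=10):
    """Query 1: most recent posts of a blog, in reverse chronological order."""
    columns = sorted(blog[blog_key].items(), reverse=True)
    return [text for _, text in columns[:limit]]


def post_with_comments(blog_key, post_id):
    """Query 2: a single post and its comments, in chronological order."""
    post = blog[blog_key][post_id]
    comments = [text for _, text in sorted(comment.get(post_id, {}).items())]
    return post, comments


latest = recent_posts("JBE", limit=2)
post, replies = post_with_comments("JBE", 101)
```

Each query touches a single row in a single column family, which fits the key-oriented access pattern the earlier slides describe.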
39. Cassandra 0.3
● Remove support
● OPP / Range queries
● Test suite
● Workarounds for JDK bugs
● Rudimentary multi-datacenter support
40. Cassandra 0.4
● Branched May 18
● Data file format change to support billions
of rows per node instead of millions
● API changes (no more colon delimiters)
● Multi-table (keyspace) support
● LRU key cache
● fsync support
● Bootstrap
● Web interface
41. Cassandra 0.5
● Bootstrap
● Load balancing
● Closely related to “bootstrap done right”
● Merkle tree repair
● Millions of columns per row
● This will require another data format change
● Multiget
● Callout support
43. More
● Eventual consistency:
http://www.allthingsdistributed.com/2008/12/
● Introduction to distributed databases by
Todd Lipcon at NoSQL 09:
http://www.vimeo.com/5145059
● Other articles/videos about Cassandra:
http://wiki.apache.org/cassandra/ArticlesAndP
● #cassandra on irc.freenode.net