2. Motivation
Scaling
How do you scale your database?
● reads
● writes
4. Influential Papers
● Bigtable: A Distributed Storage System for Structured Data, 2006
● Dynamo: Amazon's Highly Available Key-value Store, 2007
Cassandra:
● Partitioning and replication - from Dynamo
● Log-structured, column-family data model - from Bigtable
5. Cassandra Highlights
● Symmetric - all nodes are exactly the same
○ No single point of failure
○ Linearly scalable
○ Ease of administration
● High availability with multiple datacenters
● Consistency vs Latency
● Read/Write anywhere
● Flexible Schema
● Column TTL
● Distributed Counters
9. Consistency
N = Replication factor
R = Number of replicas a read blocks on (R <= N)
W = Number of replicas a write blocks on (W <= N)
Quorum = floor(N/2) + 1
When W + R > N, every read overlaps at least one replica holding the latest write, so reads are fully consistent
Examples:
● W = 1, R = N
● W = N, R = 1
● W = Quorum, R = Quorum
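The rule above can be checked mechanically. The following is a minimal sketch, not Cassandra code; the object and method names are made up for illustration.

```scala
object ConsistencyMath {
  // Quorum is a strict majority of the N replicas (integer division).
  def quorum(n: Int): Int = n / 2 + 1

  // True when the read set and the write set must share at least one replica.
  def isFullyConsistent(n: Int, w: Int, r: Int): Boolean = {
    require(n >= 1 && w >= 1 && w <= n && r >= 1 && r <= n, "need 1 <= W, R <= N")
    w + r > n
  }
}
```

With N = 3, quorum reads and quorum writes give W + R = 4 > 3, so they are consistent, while W = 1, R = 1 gives 2 <= 3 and a read may miss the latest write.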
10. Consistency Level
● Every request defines consistency level
○ Any
○ One
○ Two
○ Three
○ Quorum
○ Local Quorum
○ Each Quorum
○ All
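To make the levels concrete, here is a rough sketch of how many acknowledgements each level waits for. It is an illustration under assumptions, not Cassandra's implementation: the `blockFor` helper, the `replicasPerDc` map (datacenter name to replication factor) and the `localDc` parameter are hypothetical names.

```scala
object ConsistencyLevels {
  sealed trait Level
  case object Any extends Level
  case object One extends Level
  case object Two extends Level
  case object Three extends Level
  case object Quorum extends Level
  case object LocalQuorum extends Level
  case object EachQuorum extends Level
  case object All extends Level

  private def quorum(n: Int): Int = n / 2 + 1

  // Number of acknowledgements the coordinator waits for (assumed model).
  def blockFor(level: Level, replicasPerDc: Map[String, Int], localDc: String): Int = {
    val total = replicasPerDc.values.sum
    level match {
      case Any | One   => 1 // for Any (writes only) the ack may be a stored hint
      case Two         => 2
      case Three       => 3
      case Quorum      => quorum(total)                      // majority of all replicas
      case LocalQuorum => quorum(replicasPerDc(localDc))     // majority in the local datacenter
      case EachQuorum  => replicasPerDc.values.map(quorum).sum // majority in every datacenter
      case All         => total
    }
  }
}
```

For two datacenters with three replicas each, Quorum waits for 4 replicas, Local Quorum for 2, Each Quorum for 2 in each datacenter (4 in total), and All for 6.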
20. Cache
● There is no need to use memcached
● There is an internal configurable cache
○ Key cache
○ Row cache
21. Sorting
When you perform a get, the result is sorted
● Rows are sorted according to the partitioner
● Columns in a row are sorted according to the type of the column name
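Why the column name type matters can be seen with plain Scala collections. This sketch does not touch Cassandra; it only contrasts a text ordering with a numeric ordering of the same column names.

```scala
object ColumnOrdering {
  val names: Seq[String] = Seq("10", "9", "100")

  // Names compared as text: lexicographic order.
  val asText: Seq[String] = names.sorted

  // Names compared as numbers: numeric order.
  val asLongs: Seq[String] = names.sortBy(_.toLong)
}
```

Compared as text the order is 10, 100, 9; compared as numbers it is 9, 10, 100. Choose the column name type that matches the order you want to read in.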
22. Partitioner
● RandomPartitioner - Uses hash values as tokens. Useful for distributing the load across all nodes.
If you use it, set the nodes' tokens manually
● OrderPreservingPartitioner - You can get rows in sorted order, but at the cost of an evenly balanced cluster
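Setting tokens manually usually means spacing them evenly around the ring. The sketch below assumes the RandomPartitioner token space is 0 to 2^127 and that a key's token is the absolute value of its MD5 hash; the object and method names are hypothetical.

```scala
import java.security.MessageDigest

object RandomPartitionerSketch {
  // Assumed size of the token ring for RandomPartitioner.
  private val RingSize: BigInt = BigInt(2).pow(127)

  // Evenly spaced tokens, one per node, in ascending order.
  def balancedTokens(nodeCount: Int): Seq[BigInt] = {
    require(nodeCount > 0, "need at least one node")
    (0 until nodeCount).map(i => RingSize * i / nodeCount)
  }

  // Token of a row key: absolute value of its MD5 digest read as a signed integer.
  def tokenFor(rowKey: String): BigInt = {
    val digest = MessageDigest.getInstance("MD5").digest(rowKey.getBytes("UTF-8"))
    BigInt(digest).abs
  }

  // Index of the owning node: the first token >= the key's token, wrapping to node 0.
  // Expects tokens in ascending order, as produced by balancedTokens.
  def ownerIndex(rowKey: String, tokens: Seq[BigInt]): Int = {
    val t = tokenFor(rowKey)
    val idx = tokens.indexWhere(_ >= t)
    if (idx == -1) 0 else idx
  }
}
```

Because the hash spreads keys uniformly over the ring, evenly spaced tokens give each node roughly the same share of rows. An order-preserving partitioner gives up this property in exchange for sorted rows.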
26. Cascal - Scala client
Insert column:
session.insert("app" \ "users" \ "shimi" \ "passwd" \ "mypass")
val key = "app" \ "users" \ "shimi"
session.insert(key \ "email" \ "shimi.k@...")
Get column value:
val pass = session.get(key \ "passwd")
27. Cascal
Get multiple columns:
val row = session.list(key)
val colsInRange = session.list(key, RangePredicate("email", "passwd"))
val colsByName = session.list(key, ColumnPredicate( List("passwd", "email") ))
28. Cascal
Get multiple rows:
val family = "app" \ "users"
val rowsInRange = session.list(family, RangePredicate("dan", "shimi"))
val rowsByKey = session.list(family, KeyPredicate("dan", "shimi"))
30. Guidelines
● Keep together the data you query together
● Think about your use case and how you should fetch your data
● Don't try to normalize your data
● You can't beat the disk
● Be ready to get your hands dirty
● There is no single solution for everything. You might consider using different solutions together
31. The End
Useful links:
● Cassandra, http://cassandra.apache.org/
● Wiki, http://wiki.apache.org/cassandra/
● Cassandra mailing list
● IRC
● Bigtable, http://labs.google.com/papers/bigtable.html
● Dynamo, http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
● Cascal, https://github.com/shimi/cascal