Cassandra is a NoSQL database designed by Facebook to manage large amounts of structured data across hundreds of nodes, sustaining high write throughput on cheap commodity hardware. It provides high scalability, reliability and performance. Cassandra uses a decentralized architecture with consistent hashing to partition and replicate data across multiple nodes. It has a simple API with insert, get, and delete methods. Facebook successfully uses Cassandra to power its Inbox Search, storing over 50 TB of data on 150 nodes spread across west- and east-coast datacenters, with median read latencies of roughly 15 to 18 ms.
3. Introduction
• Cassandra is a NoSQL database.
• Originally developed by Facebook.
• Cassandra was designed to:
• Manage large amounts of structured data.
• Run on top of a system of hundreds of nodes.
• Handle high write throughput.
• Run on cheap hardware (scale out).
• Provide high scalability, reliability and performance.
4. Data Model
Source: http://www.inmensia.com/blog/20100327/desmitificando_a_cassandra.html
5. Data Model (Example)
Source: http://www.divconq.com/2010/how-to-add-and-retrieve-data-from-a-cassandra-database/
6. API (Application Programming Interface)
• Three simple methods:
• insert (table, key, rowMutation)
• get (table, key, columnName)
• delete (table, key, columnName)
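The three calls above can be illustrated with a toy, single-node, in-memory store. This is a sketch under stated assumptions, not Cassandra's implementation: the class name `ToyStore`, the nested layout (table → row key → column name → value), and modeling `rowMutation` as a plain dict of column names to values are all hypothetical choices made for illustration.

```python
from typing import Dict, Optional


class ToyStore:
    """Hypothetical single-node, in-memory stand-in for the three-call API.

    Assumed layout: table -> row key -> column name -> value.
    A rowMutation is modeled as a plain dict of column name -> value.
    """

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, Dict[str, str]]] = {}

    def insert(self, table: str, key: str, row_mutation: Dict[str, str]) -> None:
        # Apply every column in the mutation to the row, creating the row if needed.
        row = self._tables.setdefault(table, {}).setdefault(key, {})
        row.update(row_mutation)

    def get(self, table: str, key: str, column_name: str) -> Optional[str]:
        # Return the column value, or None if the table, row, or column is absent.
        return self._tables.get(table, {}).get(key, {}).get(column_name)

    def delete(self, table: str, key: str, column_name: str) -> None:
        # Remove a single column; drop the row once it has no columns left.
        row = self._tables.get(table, {}).get(key)
        if row is None:
            return
        row.pop(column_name, None)
        if not row:
            del self._tables[table][key]
```

For example, `store.insert("Inbox", "user42", {"hello": "msg1"})` followed by `store.get("Inbox", "user42", "hello")` returns `"msg1"`. Keeping all three calls keyed by (table, key) reflects the slide's point that every operation targets a single row.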
7. System architecture
• Partitioning
• Data is partitioned dynamically over a set of nodes.
• Uses consistent hashing.
• Replication
• Each data item is replicated at N hosts.
• Failure detection
• Each node can locally determine whether any other node in the system is up or down.
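Partitioning and replication can be sketched together. Following the paper's description, a key is hashed to a position on the ring, its coordinator is the first node clockwise with a larger position, and under the simplest ("Rack Unaware") policy the other N-1 replicas are the coordinator's successors on the ring. The helper `ring_position`, the class `Ring`, the use of MD5 as the hash, and the absence of virtual nodes or load balancing are assumptions of this sketch; the paper notes that Cassandra itself uses an order-preserving hash function.

```python
import bisect
import hashlib
from typing import List


def ring_position(value: str) -> int:
    # Map a string to a position on the ring. MD5 is an arbitrary choice for this
    # sketch; the paper describes an order-preserving hash function instead.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical consistent-hashing ring: one position per node, no virtual nodes."""

    def __init__(self, nodes: List[str], replication_factor: int) -> None:
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("need 1 <= replication_factor <= number of nodes")
        self.replication_factor = replication_factor
        # Nodes sorted by ring position, so walking the list is walking the ring clockwise.
        self._nodes = sorted(nodes, key=ring_position)
        self._positions = [ring_position(node) for node in self._nodes]

    def coordinator_index(self, key: str) -> int:
        # First node whose position is larger than the key's position,
        # wrapping around to the start of the ring if there is none.
        i = bisect.bisect_right(self._positions, ring_position(key))
        return i % len(self._nodes)

    def replicas(self, key: str) -> List[str]:
        # The coordinator followed by its N-1 clockwise successors.
        start = self.coordinator_index(key)
        return [self._nodes[(start + k) % len(self._nodes)]
                for k in range(self.replication_factor)]
```

Calling `Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3).replicas("user42")` yields three distinct, ring-adjacent nodes. The property that makes consistent hashing attractive shows up when a node joins: only keys whose coordinator changes are affected, and every one of them moves to the new node, while all other keys stay where they were.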
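The slide leaves open how a node decides that a peer is down. The cited paper describes a modified version of the Φ accrual failure detector, which outputs a suspicion level Φ rather than a binary verdict, and approximates heartbeat inter-arrival times with an exponential distribution. Under that assumption the probability that the next heartbeat arrives later than t after the last one is exp(-t / mean), so Φ = -log10(exp(-t / mean)) = t / (mean · ln 10). The sketch below is a simplified single-peer version; the class name, the `window` size, and the `threshold` value are hypothetical, and gossip is omitted entirely.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    """Hypothetical single-peer sketch of an accrual failure detector.

    Assumes exponentially distributed heartbeat inter-arrival times.
    Gossip and multi-peer bookkeeping are omitted.
    """

    def __init__(self, window: int = 1000, threshold: float = 5.0) -> None:
        self._intervals: Deque[float] = deque(maxlen=window)  # recent inter-arrival times
        self._last: Optional[float] = None  # timestamp of the latest heartbeat
        self.threshold = threshold  # phi at or above this means "suspected down"

    def heartbeat(self, now: float) -> None:
        # Record the (positive) gap since the previous heartbeat, then remember this arrival.
        if self._last is not None and now > self._last:
            self._intervals.append(now - self._last)
        self._last = now

    def phi(self, now: float) -> float:
        # phi = -log10(exp(-elapsed / mean)) = elapsed / (mean * ln 10)
        if self._last is None or not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        elapsed = now - self._last
        return elapsed / (mean * math.log(10))

    def is_up(self, now: float) -> bool:
        # The binary view the slide describes is a threshold on the suspicion level.
        return self.phi(now) < self.threshold
```

With heartbeats arriving once per second, Φ grows linearly with silence: two seconds without a heartbeat gives Φ ≈ 0.87, while the node is only suspected after roughly 11.5 seconds at a threshold of 5. Tuning the threshold trades detection speed against false suspicions.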
8. Performance example
• Facebook Inbox Search
• More than 50 TB
• 150 nodes
• Datacentres on the west and east coasts
• Read performance:
Latency   Search interactions   Term search
Min       7.69 ms               7.78 ms
Median    15.69 ms              18.27 ms
Max       26.13 ms              44.41 ms
9. Conclusion
• The authors successfully implemented a system that provides:
• Scalability
• High performance
• Wide applicability
10. Paper
A. Lakshman and P. Malik, “Cassandra: a
decentralized structured storage system”, ACM
SIGOPS Operating Systems Review, vol. 44, no. 2,
pp. 35-40, April 2010.