6. SQL Specialized data structures (think B-trees) Shines with complicated queries Focus on fast query & analysis Not necessarily on large datasets
14. Outline History Scaling Replication Model Data Model Tuning Write Path Read Path Client Access Practical Considerations
15. Distributed and Scalable Horizontal! All nodes are identical No master or SPOF Adding nodes is simple Automatic cluster maintenance
16. Outline History Scaling Replication Model Data Model Tuning Write Path Read Path Client Access Practical Considerations
17. Replication Replication factor How many nodes data is replicated on Consistency level Zero, One, Quorum, All Sync or async for writes Reliability of reads Read repair
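The replication and consistency knobs above interact arithmetically: a quorum is a majority of the replication factor, and quorum reads plus quorum writes always overlap on at least one replica. A minimal sketch of that arithmetic (plain Python, illustrative only, not Cassandra's actual API):

```python
# Sketch of how consistency levels map to replica counts for a given
# replication factor (RF). Illustrative only; not Cassandra's API.

def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must acknowledge an operation."""
    return {
        "ZERO": 0,              # fire and forget
        "ONE": 1,
        "QUORUM": rf // 2 + 1,  # a majority of the replicas
        "ALL": rf,
    }[level]

def overlapping(read_level: str, write_level: str, rf: int) -> bool:
    """Reads see the latest write whenever R + W > RF."""
    return (replicas_required(read_level, rf)
            + replicas_required(write_level, rf)) > rf
```

With RF=3, QUORUM is 2 replicas, so QUORUM reads combined with QUORUM writes overlap, while ONE+ONE does not.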
18. Ring Topology RF=3 Conceptual Ring One token per node Multiple ranges per node [diagram: nodes a, d, g, j on the ring]
19. Ring Topology RF=2 Conceptual Ring One token per node Multiple ranges per node [diagram: nodes a, d, g, j on the ring]
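One way to picture the token ring: each node owns the key range ending at its token, and a key's replicas are that node plus the next RF-1 nodes clockwise. A toy sketch (the node names echo the slide's diagram; the token values and 0..100 token space are invented for illustration):

```python
import bisect
import hashlib

# Toy token ring: a key hashes to a token, the first node with a token
# >= that value owns it, and replicas continue clockwise around the ring.
# Token values and the 0..100 token space are made up for illustration.
RING = [(25, "a"), (50, "d"), (75, "g"), (100, "j")]  # sorted by token
TOKENS = [t for t, _ in RING]

def token_for(key: str) -> int:
    """Hash a key onto the toy 0..100 token space."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 101

def replicas(key: str, rf: int = 3) -> list[str]:
    i = bisect.bisect_left(TOKENS, token_for(key))  # first token >= hash
    return [RING[(i + k) % len(RING)][1] for k in range(rf)]
```

This also shows why a new node's arrival only affects its immediate neighbors: inserting one token splits exactly one existing range.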
20. New Node RF=3 Token assignment Range adjustment Bootstrap Arrival only affects immediate neighbors [diagram: new node m joins nodes a, d, g, j]
21. Ring Partition RF=3 Node dies Available? Hinted handoff Achtung! Plan for this [diagram: nodes a, d, g, j]
22. Outline History Scaling Replication Model Data Model Tuning Write Path Read Path Client Access Practical Considerations
23. Schema-free Sparse-table Flexible column naming You define the sort order A row is not required to have a specific column just because another row does
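A sparse table behaves roughly like a map of maps: each row carries only the columns it was given, and columns come back in a defined sort order. A rough Python analogy (not the Thrift API or the real storage format):

```python
# Rough analogy for the sparse, schema-free data model: a table of rows,
# each row an independently named set of columns returned in sorted order.
# Pure-Python illustration; not Cassandra's actual storage model.
table: dict[str, dict[str, str]] = {}

def insert(row_key: str, column: str, value: str) -> None:
    table.setdefault(row_key, {})[column] = value

def columns(row_key: str) -> list[str]:
    # "You define the sort order": here, plain lexical ordering.
    return sorted(table.get(row_key, {}))

insert("user:1", "name", "alice")
insert("user:1", "email", "a@example.com")
insert("user:2", "last_login", "2010-05-01")  # different columns per row
```

Note that `user:2` never declares, or defaults, the columns `user:1` has: absent columns simply take no space.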
37. Inserting: Writes Commit log for durability Configurable fsync Sequential writes only Memtable – no disk access (no reads or seeks) SSTables are final (become read only) Indexes Bloom filter Raw data Bottom line: FAST!!!
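The write sequence above can be sketched end to end: append to the commit log, update the in-memory memtable, and flush full memtables to immutable sstables. A simplified model (the class, the size-based flush trigger, and the field names are illustrative, not the real implementation):

```python
# Simplified model of the write path: sequential commit-log append,
# in-memory memtable update, flush to an immutable sstable list.
# An illustration only; not Cassandra's actual code.

class ToyStore:
    def __init__(self, memtable_limit: int = 2):
        self.commit_log: list[tuple[str, str]] = []  # durability: append-only
        self.memtable: dict[str, str] = {}           # no disk reads or seeks
        self.sstables: list[dict[str, str]] = []     # flushed, read-only
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # sequential write only
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # The flushed memtable becomes a final (read-only) sstable.
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
```

Every step is either an append or an in-memory update, which is why the bottom line is "FAST".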
38. Outline History Scaling Replication Model Data Model Tuning Write Path Read Path Client Access Practical Considerations
39. Querying: Overview You need a key or keys: Single: key=‘a’ Range: key=‘a’ through ‘f’ And columns to retrieve: Slice: cols={bar through kite} By name: key=‘b’ cols={bar, cat, llama} Nothing like SQL “WHERE col=‘faz’” But secondary indices are being worked on (see CASSANDRA-749)
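Both access patterns on the slide reduce to lookups over a row's sorted columns. A sketch of slice and by-name retrieval (the row data is hypothetical; this is plain Python, not the Thrift API):

```python
# Sketch of column retrieval: a slice over a column-name range, or
# fetching specific columns by name. Hypothetical data, not the real API.
row = {"bar": 1, "cat": 2, "kite": 3, "llama": 4}  # columns sorted by name

def slice_columns(row: dict, start: str, finish: str) -> dict:
    """Return columns with start <= name <= finish (cols={bar through kite})."""
    return {c: v for c, v in sorted(row.items()) if start <= c <= finish}

def by_name(row: dict, names: list[str]) -> dict:
    """Return only the named columns (cols={bar, cat, llama})."""
    return {c: row[c] for c in names if c in row}
```

Note what is missing: there is no way to ask for rows *by value*, which is exactly the SQL `WHERE col=‘faz’` gap the slide mentions.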
40. Querying: Reads Practically lock free SSTable proliferation New in 0.6: Row cache (avoid sstable lookup, not write-through) Key cache (avoid index scan)
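The row cache short-circuits the read path: on a hit, the sstables are never consulted at all. A rough sketch of that behavior (illustrative data structures, not the real read path):

```python
# Rough sketch of the row-cache short-circuit (illustrative only):
# on a cache hit, no sstable is touched; on a miss, sstables are
# searched newest-first and the result is cached for next time.
sstables = [{"k1": "v1"}, {"k2": "v2"}]  # flushed tables, newest last
row_cache: dict[str, str] = {}

def read(key: str):
    if key in row_cache:
        return row_cache[key]           # hit: skip sstable lookup entirely
    for sstable in reversed(sstables):  # miss: scan newest-first
        if key in sstable:
            row_cache[key] = sstable[key]
            return sstable[key]
    return None
```

The "not write-through" caveat on the slide means writes do not update this cache; the sketch only populates it on reads.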
41. Outline History Scaling Replication Model Data Model Tuning Write Path Read Path Client Access Practical Considerations
42. Client API (Low Level) Fat Client Live non-storage node Reduced RPC overhead Thrift (12 language bindings!) http://incubator.apache.org/thrift/ No streaming Avro Work in progress
45. Outline History Scaling Replication Model Data Model Tuning Write Path Read Path Client Access Practical Considerations
46. Practical Considerations Partitioner: Random or Order-Preserving Range queries Provisioning: virtual or bare metal Cluster size Data model: think in terms of access patterns Giving up transactions, ad-hoc queries, arbitrary indexes and joins (you may already do this with an RDBMS!)
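The partitioner choice decides whether range queries make sense: an order-preserving partitioner keeps keys in sort order around the ring, so a key range maps to a contiguous ring range; a random (hash) partitioner scatters them. A toy comparison (the token functions are illustrative stand-ins, not the real RandomPartitioner or OrderPreservingPartitioner):

```python
import hashlib

# Toy contrast between partitioner styles. These token functions are
# illustrative stand-ins, not Cassandra's actual partitioners.

def random_token(key: str) -> int:
    """Hash-based token: uniform load, but key order is destroyed."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def order_preserving_token(key: str) -> str:
    """Identity token: key ranges map to ring ranges, enabling range scans."""
    return key

keys = ["apple", "banana", "cherry"]
# Order-preserving: sorted keys yield sorted tokens -> range scans work.
op_sorted = ([order_preserving_token(k) for k in keys]
             == sorted(order_preserving_token(k) for k in keys))
```

The trade-off is load balance: identity tokens inherit any skew in the key distribution, which is why choosing the partitioner up front is a real provisioning decision.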
48. Future Direction Vector clocks (server side conflict resolution) Alter keyspace/column families on a live cluster Compression Multi-tenant features Less memory restrictions
49. Wrapping Up Use Cassandra if you want/need High write throughput Near-linear scalability Automated replication/fault tolerance Can tolerate missing RDBMS features
32-core machines are expensive. Costs go way up when you try to scale these databases. Also: instability.
Terabytes of data, ~1,000,000 ops/second. Schema changes are difficult (if not impossible). Manual sharding takes a lot of effort. Automated sharding + replication is difficult.
100 M users, 25 TB data
Horizontal – commodity hardware, not specialized boxes
Cluster is a logical storage ring. Node placement divides the ring into ranges that represent start/stop points for keys. Automatic or manual token assignment (use another slide for that). Closer together means less responsibility and data.
Token
Bootstrapping
Hinted handoff is not designed for long failures.
RDBMSs focus on consistency, which limits scale.
No multi-key transactions
SSTable proliferation degrades performance.
Distributed. Scalable. Schema-free. Sparse table. Eventually consistent. Tunable (throughput and fault-tolerance).