1. Oh, why is my cluster's read performance terrible?
2. What it is.
• In one sentence:
– A data store that resembles:
• A hash table (k/v) that is evenly distributed across a cluster of
servers. Practically speaking, not an n-level k/v store.
• Or, an Excel spreadsheet that is chopped up and housed on different
servers.
• Basic nomenclature.
3. Where did it come from?
• Legacy:
– Google Bigtable and Amazon Dynamo.
– Open sourced by Facebook in 2009.
– They had designed it for 'Inbox search'.
4. What it is not.
• Not a general-purpose data store.
– Highly specialized use cases.
• The business use case must align with Cassandra's architecture.
– No transactions.
– No joins (make multiple round trips instead); de-normalize data.
– No stored procedures.
– No range queries across keys (by default).
– No referential integrity (no PK constraints or foreign keys).
– No locking.
– Uses timestamps to upsert data.
– Charlie must be aghast.
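Since there is no locking, conflicting writes are resolved purely by timestamp: the write with the newest timestamp wins ("last write wins"). A minimal sketch of that upsert rule, using made-up helper names rather than any real driver API:

```python
# Sketch of Cassandra-style last-write-wins upserts: every write
# carries a timestamp, and on conflict the newest timestamp wins.
# No locks are taken. (Illustrative in-memory model only.)

def upsert(store, key, column, value, timestamp):
    """Apply a write only if its timestamp beats the stored one."""
    row = store.setdefault(key, {})
    current = row.get(column)
    if current is None or timestamp > current[1]:
        row[column] = (value, timestamp)

store = {}
upsert(store, "smith", "email", "old@example.com", timestamp=100)
upsert(store, "smith", "email", "new@example.com", timestamp=200)
# A stale write with an older timestamp is silently ignored:
upsert(store, "smith", "email", "stale@example.com", timestamp=150)
print(store["smith"]["email"][0])  # -> new@example.com
```

Note that the "stale" write is dropped even though it arrived last in wall-clock order; only the supplied timestamp matters.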
5. Who uses it?
• Web-scale companies.
– Netflix, Twitter.
• Capturing clickstream data.
• User activity/gaming.
• Backing store for search tools (Lucene).
– Structured/unstructured data.
• Trend: web-scale companies moving from distributed MySQL to Cassandra.
8. Where do people use it?
• Mostly in analytic/reporting 'stacks'.
– Fire-hose (value proposition 1) vast amounts of 'log-like' data into it.
– Hopefully your data model ensures that your data is physically
clustered (value proposition 2) on read.
- Data that is physically clustered is conducive to reporting.
- Can be used 'real time', but that is not its strength.
9. Important to know right up front.
• Designed for high write rates (all activity is sequential IO).
– If improperly used, read performance will suffer.
– Always strive to minimize disk seeks on read.
• Millions of inserts should result in tens of thousands of row keys (not
millions of keys).
• Main usage pattern: high write / low read (rates). See the Netflix slide.
• Anti-pattern: (OLTP-like) millions of inserts / millions of reads (for
the main data tables).
• If your Cassandra use is kosher, you will find that IO is the bottleneck.
Need better performance? Simply add more boxes (more IO bandwidth for
your cluster).
• It's all about physically clustering your data for efficient reads.
• You have a query in mind? Then design a Cassandra table that satisfies
that query (lots of data duplication all over the place). Make sure the
query is satisfied by navigating to a single row key.
• Favors throughput over latency.
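The "design a table per query" idea above can be sketched as an in-memory model: all events for one entity land under a single row key, with columns kept sorted, so "millions of inserts" collapse into a small number of wide rows and the target query is one key lookup plus a sequential scan. (Purely illustrative; the names below are not a Cassandra API.)

```python
# Sketch of a query-first "wide row" data model: row key = symbol,
# columns = (timestamp, price) pairs kept sorted, so reading all
# prices for one symbol hits exactly one key. (In-memory model only.)
import bisect
from collections import defaultdict

rows = defaultdict(list)  # row key -> sorted list of (timestamp, price)

def insert(symbol, ts, price):
    # Keeps the row's columns physically clustered and sorted by time.
    bisect.insort(rows[symbol], (ts, price))

def read_row(symbol):
    # One "seek" to the row key, then a sequential read of its columns.
    return rows[symbol]

for ts, price in [(1, 28.01), (5, 28.03), (7, 28.03), (22, 28.01)]:
    insert("MSFT", ts, price)

print(read_row("MSFT"))
# -> [(1, 28.01), (5, 28.03), (7, 28.03), (22, 28.01)]
```

Four inserts, one row key: scale that up and millions of inserts yield only thousands of keys, which is exactly the high-write / low-read shape the slide describes.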
10. • With analytics/reporting in mind:
– Let's explore RDBMS storage inefficiency (for large queries) and
Cassandra's value proposition #2.
11. Data in an RDBMS (physical)
• Query: Select * … Where Symbol=MSFT (1k row size, 8k block size).
• The MSFT rows (Symbol, Price, Time) are scattered across the table's blocks:
– db block 1: MSFT 28.01 t1
– db block 20: MSFT 28.03 t5, MSFT 28.03 t7
– db block 1000: MSFT 28.01 t22
• Minimum IO = 24K (8k x 3 blocks visited), 3 seeks.
• Slow.
12. Data in Cassandra (physical)
• Query: Select * Where Symbol=MSFT
• One wide row, keyed by symbol:
– KEY MSFT: t1 => 28.01, t5 => 28.03, t7 => 28.03, t22 => 28.01
• Minimum IO = 8K (8k x 1 block).
- 1 seek to the KEY (+ overhead), then sequentially read the data.
- You want to be sure you are getting a lot of value per seek!
Cassandra likes "wide rows".
- Your data is physically clustered and sorted (t1, t5, …).
- Millions of inserts have resulted in thousands of keys (high write /
low read).
- Fast.
13. What it is, in depth:
• Log-structured data store (all activity is sequentially written).
• Favors availability and partition tolerance over consistency.
– Consistency is tunable, but if you tune for high consistency you
trade off performance.
– Cassandra consistency is not the same as database consistency. It is
read-your-writes consistency.
• Column oriented.
• TTL (time-to-live, to expire data).
• Compaction (coalesces data files).
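The TTL bullet above can be sketched as follows: a column written with a TTL is stamped with an expiry time and is treated as deleted once that time passes. (The helper names below are hypothetical, not a real driver API.)

```python
# Sketch of TTL (time-to-live) semantics: a column written with a
# TTL carries an expiry time; reads after that time see nothing.
# (Hypothetical helpers; illustrative in-memory model only.)

def write(row, column, value, ttl=None, now=0.0):
    # ttl is in seconds; None means the column never expires.
    expires = now + ttl if ttl is not None else None
    row[column] = (value, expires)

def read(row, column, now=0.0):
    entry = row.get(column)
    if entry is None:
        return None
    value, expires = entry
    if expires is not None and now >= expires:
        return None  # expired: the column is treated as deleted
    return value

row = {}
write(row, "session", "abc123", ttl=60, now=0)
print(read(row, "session", now=30))  # -> abc123 (still live)
print(read(row, "session", now=61))  # -> None (expired)
```

In real Cassandra, expired columns linger on disk as tombstone-like data until compaction physically coalesces the data files and drops them, which is why TTL and compaction appear together on this slide.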
14. System Properties
• Distributed / elastic scalability (value proposition 3).
• Fault tolerant: rack-aware, inter/intra-datacenter data replication
(value proposition 4).
• Peer to peer, no single point of failure (value proposition 5). Write
to / read from any node; it will act as the proxy to the cluster. No
master node.
• Durable.
15. Evenly distributes data (default)
• Consistent hashing.
• Token range: 0 to 2^127 - 1.
• Your 'key' gets assigned a token. E.g. key = smith -> token 15; place
it on the eastern node.
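A minimal sketch of that token assignment, in the spirit of Cassandra's RandomPartitioner: hash the key into the 0 to 2^127 - 1 token range, then route it to whichever node owns that segment of the ring. The four node names and the even token split below are made up for illustration.

```python
# Sketch of consistent-hashing token assignment: hash the row key
# into the token range, then find the owning node on the ring.
# (Node names and token boundaries are illustrative assumptions.)
import bisect
import hashlib

TOKEN_MAX = 2 ** 127 - 1

# Each node owns the token range up to its position (even 4-way split).
ring = [(TOKEN_MAX // 4, "north"),
        (TOKEN_MAX // 2, "east"),
        (3 * TOKEN_MAX // 4, "south"),
        (TOKEN_MAX, "west")]

def token_for(key):
    # MD5 gives 128 bits; fold into the 0 .. 2^127 - 1 token range.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (TOKEN_MAX + 1)

def node_for(key):
    t = token_for(key)
    idx = bisect.bisect_left([tok for tok, _ in ring], t)
    return ring[idx % len(ring)][1]

print(node_for("smith"), node_for("jones"))
```

Because the hash spreads keys uniformly over the token range, data lands evenly across the nodes by default, which is the slide's point; adding a node just means inserting another token boundary into the ring.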
19. ACID?
• A/I/D (in bits and bobs).
• BASE: Basically Available, Soft state, Eventual consistency.
20. Cassandra/Future
• Will slowly take on more RDBMS-like features.
– Cassandra 1.1 has row-level isolation. Previously you could read
someone else's in-flight data.
21. Reference: CAP Theorem.
• Consistency (all nodes see the same data at the same time).
• Availability (a guarantee that every request receives a response about
whether it was successful or failed).
• Partition tolerance (the system continues to operate despite arbitrary
message loss or failure of part of the system).