Cassandra Internals
Cassandra London Meetup – July 2013
Nicolas Favre-Felix
Software Engineer
@yowgi – @acunu
Nicolas Favre-Felix – Cassandra London July 2013
A lot to talk about
• Memtable
• SSTable
• Commit log
• Row Cache
• Key Cache
• Compaction
• Secondary indexes
• Bloom Filters
• Index samples
• Column indexes
• Thrift
• CQL
Four real-world problems
1. High latency in a read-heavy workload
2. High CPU usage with little activity on the cluster
3. nodetool repair taking too long to complete
4. Optimising for the highest insert throughput
Context
• Acunu professional services for Apache Cassandra
• 24x7 support for questions and emergencies
• Cluster “health check” sessions
• Cassandra Training & Workshop
“Reading takes too long”
Symptoms
• High latency observed in read operations
• Thousands of read requests per second
Staged Event-Driven Architecture (SEDA)
SEDA in Cassandra
• Stages in Cassandra have different roles
  • MutationStage for writes
  • ReadStage for reads
  • ... 10 or so in total
• Each Stage is backed by a thread pool
• Not all task queues are bounded
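The stage idea can be illustrated with a minimal Python sketch. It is not Cassandra's Java implementation: the `Stage` class, the pool sizes and the dictionary standing in for storage are all assumptions made for illustration. Each stage owns a fixed-size thread pool, and the pool's internal task queue is unbounded, so work submitted faster than it completes simply accumulates as pending tasks.

```python
from concurrent.futures import ThreadPoolExecutor


class Stage:
    """A named stage: a fixed-size thread pool fed by an unbounded task queue."""

    def __init__(self, name, threads):
        self.name = name
        # ThreadPoolExecutor queues submitted work without an upper bound.
        self._pool = ThreadPoolExecutor(max_workers=threads,
                                        thread_name_prefix=name)

    def submit(self, fn, *args):
        """Queue a task on this stage and return a Future for its result."""
        return self._pool.submit(fn, *args)

    def shutdown(self):
        self._pool.shutdown(wait=True)


# Hypothetical in-memory "storage" shared by the two stages.
store = {}
mutation_stage = Stage("MutationStage", threads=2)
read_stage = Stage("ReadStage", threads=2)

# A write goes through one stage, a read through another.
mutation_stage.submit(store.__setitem__, "k", "v").result()
value = read_stage.submit(store.get, "k").result()
print(value)

mutation_stage.shutdown()
read_stage.shutdown()
```

Because each stage has its own pool, a slow read workload backs up only the read queue, which is why per-stage pending counts are a useful diagnostic.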
ReadStage
• Not all reads are equal:
  • Some served from in-memory data structures
  • Some served from the Linux page cache
  • Some need to hit disk, possibly more than once
• Read operations can be disk-bound
• Avoid saturating disk with random reads
• Recommended pool size: 16×number_of_drives
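The sizing rule of thumb is simple arithmetic; a small sketch follows. The function name and the drive counts are illustrative assumptions, and the result is only a starting point to be checked against observed I/O.

```python
def recommended_concurrent_reads(number_of_drives, per_drive=16):
    """Rule of thumb from the slide: 16 read threads per data drive."""
    if number_of_drives < 1:
        raise ValueError("need at least one drive")
    return per_drive * number_of_drives


# Hypothetical drive counts, for illustration only.
for drives in (1, 2, 4):
    print(drives, recommended_concurrent_reads(drives))
```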
nodetool tpstats
Pool Name               Active  Pending   Completed
ReadStage                   16     3197   733819430
RequestResponseStage         0        0     3381277
MutationStage                5        0     1130984
ReadRepairStage              0        0    80095473
ReplicateOnWriteStage        0        0     4728857
GossipStage                  0        0    20252373
AntiEntropyStage             0        0        2228
MigrationStage               0        0          19
MemtablePostFlusher          0        0         839
StreamStage                  0        0          40
FlushWriter                  0        0        2349
MiscStage                    0        0           0
commitlog_archiver           0        0           0
AntiEntropySessions          0        0          11
InternalResponseStage        0        0           7
HintedHandoff                0        0        6018
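Spotting the backlog in this output can be automated. The sketch below assumes the three-column layout shown on the slide (Active, Pending, Completed); other Cassandra versions may print additional columns, in which case the parsing would need adjusting. The function names and the embedded sample are illustrative.

```python
SAMPLE = """\
Pool Name                    Active   Pending      Completed
ReadStage                        16      3197      733819430
RequestResponseStage              0         0        3381277
MutationStage                     5         0        1130984
"""


def parse_tpstats(text):
    """Return {pool_name: (active, pending, completed)} from tpstats-style text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: a pool name followed by exactly three integer columns.
        # The header row splits into five words and is skipped by this check.
        if len(fields) != 4 or not all(f.isdigit() for f in fields[1:]):
            continue
        name, active, pending, completed = fields
        stats[name] = (int(active), int(pending), int(completed))
    return stats


def backlogged(stats, max_pending=0):
    """Names of pools whose pending count exceeds max_pending."""
    return sorted(name for name, (_, pending, _) in stats.items()
                  if pending > max_pending)


stats = parse_tpstats(SAMPLE)
print(backlogged(stats))
```

On the sample, only ReadStage is reported: all 16 threads are active and 3,197 reads are waiting in the queue.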
Solution
• iostat: little I/O activity
• free: large amount of memory used to cache pages
• → Increased concurrent_reads to 32
• → Latency dropped to reasonable levels
• Recommendations:
  • Reduce the number of reads
  • Keep an eye on I/O as data grows
  • Buy more disks or RAM when falling out of cache
“Cassandra is busy doing nothing”
Context
• 2-node cluster
• Little activity on the cluster
• Very high CPU usage on the nodes
• Storing metadata on published web content
nodetool cfhistograms
• Node-local histogram stored per CF, per node
• Distribution of number of files accessed per read
• Distribution of read and write latencies
• Distribution of row sizes and column counts
• Buckets are approximate but still very useful
SSTables accessed per read
[Figure: histogram. Axes: number of reads (ticks 0 to 3,000,000) and SSTables accessed (ticks 0 to 10).]
Row size distribution (bytes)
[Figure: histogram. Axes: number of rows (ticks 0 to 5) and row size in bytes (ticks 0 to 25,000,000).]
Column count distribution
[Figure: histogram. Axes: number of rows (ticks 0 to 10) and number of columns (ticks 0 to 6,000,000).]
Read latency distribution (µsec)
[Figure: histogram. Axes: number of reads (ticks 0 to 900,000) and latency in µsec (logarithmic ticks 1, 100, 10,000, 1,000,000).]
Data model issue
• Row key was “views”
• Column names were item names, values counters
• Cassandra stored only a few massive rows
• → Reading from many SSTables
• → De-serialising large column indexes
views : post-1234: 77 post-1240: 8 post-1250: 3
CF read latency & column index
(taken from Aaron Morton’s talk at Cassandra SF 2012)
[Figure: latency in microseconds (ticks 0 to 6,000) at the 85th, 95th and 99th percentiles, for two series: “First column from 1,200” and “First column from 1,000,000”.]
Solution
• “Transpose” the table:
• Make the item name the row key
• Have a few counters per item
• Distribute the rows across the whole cluster
post-123 : views: 9078 comments: 3
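The “transpose” can be sketched with plain dictionaries standing in for rows and columns. This is an illustration of the reshaping only, not of how Cassandra stores data; the sample values and the function name are assumptions.

```python
# Original layout: one row key ("views") holding one counter column per item.
wide = {"views": {"post-1234": 77, "post-1240": 8, "post-1250": 3}}


def transpose(wide_rows):
    """Turn {counter_name: {item: value}} into {item: {counter_name: value}}."""
    narrow = {}
    for counter_name, columns in wide_rows.items():
        for item, value in columns.items():
            narrow.setdefault(item, {})[counter_name] = value
    return narrow


narrow = transpose(wide)
print(narrow)
```

After the change there is one small row per item, so the row keys (and therefore the load) can spread across the nodes instead of concentrating on the replicas of a single key.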
“nodetool repair takes ages”
Nodetool repair
• “Active Anti-Entropy” mechanism in Cassandra
• Synchronises replicas
• Running repair is important to replicate tombstones
• Should run at least once every 10 days
• Repair was taking a week to complete
Two phases
1. Contact replicas, ask for MerkleTrees
   1. They scan their local data and send a tree back
2. Compare MerkleTrees between replicas
   1. Identify differences
   2. Stream blocks of data out to other nodes
   3. Stream data in and merge locally
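MerkleTrees
• Hashes of hashes of ... data
• 2^15 = 32,768 leaf nodes
[Diagram: a binary hash tree. The top hash covers hash-0 and hash-1; hash-0 covers hash-00 and hash-01, and hash-1 covers hash-10 and hash-11. Each of the four leaf hashes covers one data block (blocks 0 to 3). The hashes are labelled “(memory)” and the data blocks “(disk)”.]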
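A minimal sketch of the idea, assuming SHA-256 as the hash, a power-of-two number of blocks, and two trees of identical shape; none of these choices are taken from Cassandra's implementation. It builds a tree of hashes over data blocks, then compares two trees from the root down, descending only into subtrees whose hashes differ.

```python
import hashlib


def _h(data):
    # SHA-256 is an assumption for this sketch.
    return hashlib.sha256(data).digest()


def build_tree(blocks):
    """Return a list of levels, leaf hashes first and the root hash last."""
    if not blocks or len(blocks) & (len(blocks) - 1):
        raise ValueError("number of blocks must be a power of two")
    level = [_h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children's hashes.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a, b):
    """Indices of leaves that differ between two trees of the same shape."""
    depth = len(a) - 1
    if a[depth][0] == b[depth][0]:
        return []  # identical roots: nothing to repair
    suspects = [0]
    for lvl in range(depth - 1, -1, -1):
        suspects = [child
                    for parent in suspects
                    for child in (2 * parent, 2 * parent + 1)
                    if a[lvl][child] != b[lvl][child]]
    return suspects


tree_a = build_tree([b"block 0", b"block 1", b"block 2", b"block 3"])
tree_b = build_tree([b"block 0", b"block 1", b"block 2 (stale)", b"block 3"])
diff = differing_leaves(tree_a, tree_b)
print(diff)
```

Only the leaves reported as different need their underlying data streamed, which is why the number of out-of-sync leaves drives the cost of the second phase.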
Cassandra logs
• MerkleTree requests and responses
• Check how long it took
• Differences found, in number of leaf nodes
• More differences mean more data to stream
• Streaming sessions starting and ending
Diagnostic
• Building MerkleTrees: 20-30 minutes
• “4,700 ranges out of sync” (~14% of 32,768)
• Streaming session to repair the range: 4.5 hours
• Much slower rate than expected
Solutions
• Increase consistency level from ONE
• Rely on read repair to decrease entropy
• Fix problem of dropped writes
• Review data model and cluster size
• Add more disks and RAM, maybe more nodes
• Investigate network issues (speed, partitions?)
• Monitor both phases of the repair process
“How can we write faster?”
Context
• Time-series data from 1 million sensors
• 40 data points (e.g. temperature, pressure...)
• Sent in one batch every 5 minutes
• 40M cols / 5 min = 133,000 cols/sec
• One node...
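The target write rate follows directly from the figures above; the short calculation below spells it out. The variable names are illustrative.

```python
sensors = 1_000_000
metrics_per_sensor = 40          # data points sent per sensor in each batch
interval_seconds = 5 * 60        # one batch every 5 minutes

columns_per_batch = sensors * metrics_per_sensor
columns_per_second = columns_per_batch / interval_seconds

print(columns_per_batch, round(columns_per_second))  # 40000000 133333
```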
Data model 1
• One row per (sensor, day)
• Metrics columns grouped by minute within the row
• Range queries between minutes A and B within a day
CREATE TABLE sensor_data (
    sensor_id text,
    day int,
    hour int,
    minute int,
    metric1 int,
    [...]
    metric40 int,
    PRIMARY KEY ((sensor_id, day), minute)
);
Data model 1
• At 12:00, insert 40 cols into row (sensor1, 2013-07-11)
• At 12:05, insert 40 cols into row (sensor1, 2013-07-11)
• These columns might not be written to the same file
• Compaction process needs to merge them together:
• Large amounts of overlap between SSTables
• Rate is around 500 KB/sec
• 30% CPU usage spent compacting; no issues with I/O
Data model 2
• One row per (sensor, day, minute)
• No range query within the day (need to enumerate)
• Compaction now reaching 7 MB/sec
• Tests show a 10-20% increase in throughput
- PRIMARY KEY ((sensor_id, day), minute);
+ PRIMARY KEY ((sensor_id, day, minute));
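With the minute folded into the partition key, a range between minutes A and B has to be expanded client-side into one key per row. A sketch of that enumeration follows, assuming `minute` means minute of the day (0 to 1439) and that `day` is stored as an integer such as 20130711; the function name and the `step` parameter are hypothetical.

```python
def partition_keys(sensor_id, day, start_minute, end_minute, step=1):
    """List the (sensor_id, day, minute) keys covering [start_minute, end_minute]."""
    if not 0 <= start_minute <= end_minute < 24 * 60:
        raise ValueError("minutes must satisfy 0 <= start <= end < 1440")
    if step < 1:
        raise ValueError("step must be at least 1")
    return [(sensor_id, day, minute)
            for minute in range(start_minute, end_minute + 1, step)]


# 12:00 to 12:15 with one batch every 5 minutes.
keys = partition_keys("sensor1", 20130711, 12 * 60, 12 * 60 + 15, step=5)
print(keys)
```

Each key then needs its own single-row read, which is the price paid for removing the overlap between SSTables.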
Next steps
• Workload is CPU-bound, disks are not a problem
• Larger memtables mean lower write amplification
• Managed to flush after 400k ops instead of 200k
• Track time spent in GC with jstat -gcutil
• At this rate, consider adding more nodes
Four problems, four solutions
1. Interactions between Cassandra and the hardware
2. Implications of a bad data model at the storage layer
3. Internal data structures and processes
4. Work involved in arranging data on disk
Guidelines
• Monitor Cassandra, OS, JVM, hardware
• Learn how to use nodetool
• Follow best practices in data modelling and sizing
• Keep an eye on the Cassandra logs
• Consider available resources as sharing “work”
Thank you!
Understanding Cassandra internals to solve real-world problems

  • 1. Cassandra Internals
      Cassandra London Meetup – July 2013
      Nicolas Favre-Felix, Software Engineer (@yowgi – @acunu)
  • 2. A lot to talk about
      • Memtable
      • SSTable
      • Commit log
      • Row Cache
      • Key Cache
      • Compaction
      • Secondary indexes
      • Bloom Filters
      • Index samples
      • Column indexes
      • Thrift
      • CQL
  • 3. Four real-world problems
      1. High latency in a read-heavy workload
      2. High CPU usage with little activity on the cluster
      3. nodetool repair taking too long to complete
      4. Optimising for the highest insert throughput
  • 4. Context
      • Acunu professional services for Apache Cassandra
      • 24x7 support for questions and emergencies
      • Cluster “health check” sessions
      • Cassandra Training & Workshop
  • 5. “Reading takes too long”
  • 6. Symptoms
      • High latency observed in read operations
      • Thousands of read requests per second
  • 7. Staged Event-Driven Architecture (SEDA)
  • 8. SEDA in Cassandra
      • Stages in Cassandra have different roles
          • MutationStage for writes
          • ReadStage for reads
          • ... 10 or so in total
      • Each Stage is backed by a thread pool
      • Not all task queues are bounded
  • 9. ReadStage
      • Not all reads are equal:
          • Some served from in-memory data structures
          • Some served from the Linux page cache
          • Some need to hit disk, possibly more than once
      • Read operations can be disk-bound
      • Avoid saturating disk with random reads
      • Recommended pool size: 16 × number_of_drives
  • 10. nodetool tpstats
      Pool Name               Active  Pending  Completed
      ReadStage                   16     3197  733819430
      RequestResponseStage         0        0    3381277
      MutationStage                5        0    1130984
      ReadRepairStage              0        0   80095473
      ReplicateOnWriteStage        0        0    4728857
      GossipStage                  0        0   20252373
      AntiEntropyStage             0        0       2228
      MigrationStage               0        0         19
      MemtablePostFlusher          0        0        839
      StreamStage                  0        0         40
      FlushWriter                  0        0       2349
      MiscStage                    0        0          0
      commitlog_archiver           0        0          0
      AntiEntropySessions          0        0         11
      InternalResponseStage        0        0          7
      HintedHandoff                0        0       6018
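The tpstats output above shows the read problem directly: ReadStage has all 16 threads active and 3,197 requests queued behind them. As a rough sketch of how that check could be automated, the Python below parses text in the same column layout and lists the stages with a backlog. The helper names and the `max_pending` threshold are hypothetical, not part of nodetool, and the sample text is copied from the first rows of the slide.

```python
from typing import NamedTuple


class StageStats(NamedTuple):
    name: str
    active: int
    pending: int
    completed: int


def parse_tpstats(text: str) -> list[StageStats]:
    """Parse rows of the form 'PoolName Active Pending Completed'."""
    stats = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have a pool name followed by three integer columns;
        # the header row and blank lines fail this check and are skipped.
        if len(fields) < 4 or not all(f.isdigit() for f in fields[1:4]):
            continue
        stats.append(StageStats(fields[0], *(int(f) for f in fields[1:4])))
    return stats


def backlogged(stats: list[StageStats], max_pending: int = 100) -> list[StageStats]:
    """Return stages whose pending queue exceeds max_pending (an arbitrary threshold)."""
    return [s for s in stats if s.pending > max_pending]


# First rows of the output shown on the slide.
SAMPLE = """\
Pool Name               Active  Pending  Completed
ReadStage                   16     3197  733819430
RequestResponseStage         0        0    3381277
MutationStage                5        0    1130984
"""

for stage in backlogged(parse_tpstats(SAMPLE)):
    print(f"{stage.name}: {stage.active} active, {stage.pending} pending")
```

With the sample rows, only ReadStage is reported. A sensible threshold depends on the workload, so the default of 100 is a placeholder.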
  • 11. Solution
      • iostat: little I/O activity
      • free: large amount of memory used to cache pages
      • → Increased concurrent_reads to 32
      • → Latency dropped to reasonable levels
      • Recommendations:
          • Reduce the number of reads
          • Keep an eye on I/O as data grows
          • Buy more disks or RAM when falling out of cache
  • 12. “Cassandra is busy doing nothing”
  • 13. Context
      • 2-node cluster
      • Little activity on the cluster
      • Very high CPU usage on the nodes
      • Storing metadata on published web content
  • 14. nodetool cfhistograms
      • Node-local histogram stored per CF, per node
      • Distribution of number of files accessed per read
      • Distribution of read and write latencies
      • Distribution of row sizes and column counts
      • Buckets are approximate but still very useful
  • 15. SSTables accessed per read [histogram: number of reads (0 to 3,000,000) against SSTables accessed (0 to 10)]
  • 16. Row size distribution (bytes) [histogram: number of rows (0 to 5) against row size in bytes (0 to 25,000,000)]
  • 17. Column count distribution [histogram: number of rows (0 to 10) against number of columns (0 to 6,000,000)]
  • 18. Read latency distribution (µsec) [histogram: number of reads (0 to 900,000) against latency in µsec (axis ticks at 1, 100, 10,000 and 1,000,000)]
  • 19. Data model issue
      • Row key was “views”
      • Column names were item names, values were counters
      • Cassandra stored only a few massive rows
      • → Reading from many SSTables
      • → De-serialising large column indexes
      views
          post-1234: 77
          post-1240: 8
          post-1250: 3
  • 20. CF read latency & column index (taken from Aaron Morton’s talk at Cassandra SF 2012) [chart: latency in microseconds (0 to 6,000) at the 85th, 95th and 99th percentiles, for two series: “First column from 1,200” and “First column from 1,000,000”]
  • 21. Solution
      • “Transpose” the table:
          • Make the item name the row key
          • Have a few counters per item
          • Distribute the rows across the whole cluster
      post-123
          views: 9078
          comments: 3
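The “transpose” step can be pictured with plain dictionaries. The sketch below is only an in-memory illustration in Python, not Cassandra code: the outer dictionary key stands in for the row key and the inner dictionary for the columns. The function name `transpose` is hypothetical; the sample values come from the slides.

```python
from collections import defaultdict

# Original layout: one row keyed "views", with one counter column per item.
wide_rows = {
    "views": {"post-1234": 77, "post-1240": 8, "post-1250": 3},
}


def transpose(rows: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Swap row keys and column names: each item becomes its own row."""
    out: dict[str, dict[str, int]] = defaultdict(dict)
    for counter_name, columns in rows.items():
        for item, value in columns.items():
            out[item][counter_name] = value
    return dict(out)


per_item = transpose(wide_rows)
for row_key, counters in per_item.items():
    print(row_key, counters)
```

After the swap there are many small rows instead of one massive one, so the row keys can be spread across nodes and each read touches only a handful of columns.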
  • 22. “nodetool repair takes ages”
  • 23. Nodetool repair
      • “Active Anti-Entropy” mechanism in Cassandra
      • Synchronises replicas
      • Running repair is important to replicate tombstones
      • Should run at least once every 10 days
      • Repair was taking a week to complete
  • 24. Two phases
      1. Contact replicas, ask for MerkleTrees
          1. They scan their local data and send a tree back
      2. Compare MerkleTrees between replicas
          1. Identify differences
          2. Stream blocks of data out to other nodes
          3. Stream data in and merge locally
  • 25. MerkleTrees
      [diagram: binary hash tree with a top hash over hash-0 and hash-1, which in turn cover hash-00, hash-01, hash-10 and hash-11, one per data block 0 to 3; labels: (memory), (disk)]
      • Hashes of hashes of ... data
      • 2^15 = 32,768 leaf nodes
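To make the “hashes of hashes” idea concrete, here is a minimal Merkle tree sketch in Python. It is not Cassandra's implementation: SHA-256, the function names, and the use of four in-memory byte strings as “data blocks” are all assumptions chosen for illustration. It shows why two replicas with identical top hashes need no further comparison, and how a mismatch is narrowed down to individual leaves.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(blocks: list[bytes]) -> list[list[bytes]]:
    """Return the tree as a list of levels: leaf hashes first, top hash last.

    Assumes len(blocks) is a power of two, as in the diagram.
    """
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children concatenated.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Indices of leaves that differ, descending only into mismatching subtrees."""
    if a[-1] == b[-1]:
        return []  # identical top hashes: the replicas agree
    suspects = [0]
    for depth in range(len(a) - 2, -1, -1):
        suspects = [
            child
            for parent in suspects
            for child in (2 * parent, 2 * parent + 1)
            if a[depth][child] != b[depth][child]
        ]
    return suspects


replica_a = [b"block 0", b"block 1", b"block 2", b"block 3"]
replica_b = [b"block 0", b"block 1", b"block 2 (stale)", b"block 3"]

out_of_sync = differing_leaves(build_tree(replica_a), build_tree(replica_b))
print(out_of_sync)
```

Only the data behind the leaves returned here would need to be streamed between replicas, which is why the number of differing leaf nodes is a useful figure to look for when diagnosing a slow repair.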
  • 26. Cassandra logs
      • MerkleTree requests and responses
      • Check how long it took
      • Differences found, in number of leaf nodes
      • More differences mean more data to stream
      • Streaming sessions starting and ending
  • 27. Diagnostic
      • Building MerkleTrees: 20-30 minutes
      • “4,700 ranges out of sync” (~14% of 32,768)
      • Streaming session to repair the range: 4.5 hours
      • Much slower rate than expected
  • 28. Solutions
      • Increase consistency level from ONE
      • Rely on read repair to decrease entropy
      • Fix problem of dropped writes
      • Review data model and cluster size
      • Add more disks and RAM, maybe more nodes
      • Investigate network issues (speed, partitions?)
      • Monitor both phases of the repair process
  • 29. “How can we write faster?”
  • 30. Context
      • Time-series data from 1 million sensors
      • 40 data points per sensor (e.g. temperature, pressure...)
      • Sent in one batch every 5 minutes
      • 40M cols / 5 min ≈ 133,000 cols/sec
      • One node...
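The throughput target on this slide is simple arithmetic, spelled out below in Python as a quick check. The constant names are illustrative; the numbers are the ones quoted on the slide.

```python
SENSORS = 1_000_000
METRICS_PER_SENSOR = 40
BATCH_INTERVAL_SECONDS = 5 * 60

# Every sensor sends all of its metrics once per batch interval.
columns_per_batch = SENSORS * METRICS_PER_SENSOR
columns_per_second = columns_per_batch / BATCH_INTERVAL_SECONDS

print(f"{columns_per_batch:,} columns per batch")
print(f"{columns_per_second:,.0f} columns per second")
```

This is the average rate. Because the data arrives in one batch every 5 minutes, the instantaneous rate a node sees may be much higher unless the senders spread their writes over the interval.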
  • 31. Data model 1
      • One row per (sensor, day)
      • Metrics columns grouped by minute within the row
      • Range queries between minutes A and B within a day
      CREATE TABLE sensor_data (
          sensor_id text,
          day int,
          hour int,
          minute int,
          metric1 int,
          [...]
          metric40 int,
          PRIMARY KEY ((sensor_id, day), minute)
      );
  • 32. Data model 1
      • At 12:00, insert 40 cols into row (sensor1, 2013-07-11)
      • At 12:05, insert 40 cols into row (sensor1, 2013-07-11)
      • These columns might not be written to the same file
      • Compaction process needs to merge them together:
          • Large amounts of overlap between SSTables
          • Rate is around 500 KB/sec
          • 30% CPU usage spent compacting; no issues with I/O
  • 33. Data model 2
      • One row per (sensor, day, minute)
      • No range query within the day (need to enumerate)
      • Compaction now reaching 7 MB/sec
      • Tests show a 10-20% increase in throughput
      - PRIMARY KEY ((sensor_id, day), minute)
      + PRIMARY KEY ((sensor_id, day, minute))
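With the minute moved into the partition key, a query for “minutes A to B” has to name every key explicitly. The Python sketch below shows one way a client might enumerate those keys. It rests on assumptions the slides do not state: that `day` is stored as a YYYYMMDD integer, that `minute` is the minute of the day, and that batches land exactly every 5 minutes starting at `start`. The function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterator


def partition_keys(
    sensor_id: str, start: datetime, end: datetime, step_minutes: int = 5
) -> Iterator[tuple[str, int, int]]:
    """Yield one (sensor_id, day, minute) key per batch from start to end, inclusive.

    Assumed encoding: day as YYYYMMDD, minute as minute of the day.
    """
    current = start
    while current <= end:
        day = current.year * 10_000 + current.month * 100 + current.day
        minute = current.hour * 60 + current.minute
        yield (sensor_id, day, minute)
        current += timedelta(minutes=step_minutes)


keys = list(
    partition_keys(
        "sensor1",
        datetime(2013, 7, 11, 12, 0),
        datetime(2013, 7, 11, 12, 15),
    )
)
for key in keys:
    print(key)
```

Each key then needs its own single-partition read, so a long time range turns into many small requests. That is the price paid for the higher write and compaction throughput.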
  • 34. Next steps
      • Workload is CPU-bound, disks are not a problem
      • Larger memtables mean lower write amplification
      • Managed to flush after 400k ops instead of 200k
      • Track time spent in GC with jstat -gcutil
      • At this rate, consider adding more nodes
  • 35. Four problems, four solutions
      1. Interactions between Cassandra and the hardware
      2. Implications of a bad data model at the storage layer
      3. Internal data structures and processes
      4. Work involved in arranging data on disk
  • 36. Guidelines
      • Monitor Cassandra, OS, JVM, hardware
      • Learn how to use nodetool
      • Follow best practices in data modelling and sizing
      • Keep an eye on the Cassandra logs
      • Consider available resources as sharing “work”
  • 37. Thank you!