2. Agenda
● A little about Cassandra
● Cassandra Internal Concepts
○ Read Path
○ Write Path
○ Compactions
○ JVM GC
● Quick Q’s on Metrics, Monitoring
● Monitoring Cassandra
● Category-wise Metrics with Visuals
3. Quick Overview - About Cassandra
● Apache Cassandra is a free and open-source, distributed, wide column store,
NoSQL database management system designed to handle large amounts of
data across many commodity servers, providing high availability with no single
point of failure.
● A Cassandra cluster has no special nodes i.e. the cluster has no masters, no slaves
or elected leaders.
● It provides various features:
○ Fault-tolerant
○ Scalable
○ Distributed
○ Tunable consistency
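Tunable consistency can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical and not part of any Cassandra driver. It assumes the usual quorum size of floor(RF / 2) + 1, and the rule of thumb that a read is guaranteed to overlap the latest acknowledged write when read replicas + write replicas > replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum, given the replication factor."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True if every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                    # 2
print(overlaps(q, q, rf))   # True: quorum reads + quorum writes overlap
print(overlaps(1, 1, rf))   # False: single-replica reads and writes may miss each other
```

Lower consistency levels trade this overlap guarantee for lower latency and higher availability.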
4. Let’s Take a PATH - WRITE PATH
● A write request in Cassandra takes the following path at node level:
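As a rough sketch of the commonly documented node-level sequence: the write is appended to the commit log, applied to the in-memory memtable, and the memtable is flushed to an immutable SSTable once it grows past a threshold. The `Node` class, its fields, and the threshold of three entries are hypothetical simplifications, not Cassandra's actual code or defaults.

```python
class Node:
    """Toy model of one node's write path (hypothetical, for illustration)."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only log, used for durability
        self.memtable = {}     # in-memory: key -> (timestamp, value)
        self.sstables = []     # immutable flushed tables (dicts sorted by key)
        self.flush_threshold = flush_threshold

    def write(self, key, value, timestamp):
        # 1. Append to the commit log so the write survives a crash.
        self.commit_log.append((key, value, timestamp))
        # 2. Apply the write to the memtable.
        self.memtable[key] = (timestamp, value)
        # 3. Flush when the memtable is "full" (simplified trigger).
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a new sorted, immutable SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        # Flushed data no longer needs the commit log (simplification).
        self.commit_log.clear()


node = Node()
for i, key in enumerate(["c", "a", "b", "d"]):
    node.write(key, f"value-{key}", timestamp=i)
print(len(node.sstables), list(node.memtable))  # 1 ['d']
```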
5. WRITE PATH - Cont.
● A write request in Cassandra takes the following path at cluster level:
6. Let’s Take a PATH - READ PATH
● A read request takes the following path at SSTable level:
7. Compaction is necessary!
○ Why?
■To save disk space.
■To clean up redundant data.
■To make reads faster.
○ How does Cassandra store data?
■Writes are timestamped and appended, never updated in place.
■This generates multiple SSTables over time.
○ Why is that bad?
■Disk utilization is high.
■Response time is high, because a read may have to check several SSTables.
○ Solution:
■Remove overwritten data and data marked for deletion by tombstones.
■Merge multiple SSTables into fewer, larger SSTables on each node.
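The idea behind compaction can be sketched as a last-write-wins merge. This is a minimal illustration under stated assumptions, not Cassandra's implementation: each SSTable is modelled as a dict of key -> (timestamp, value), `TOMBSTONE` and `compact` are hypothetical names, and tombstones are dropped immediately, whereas Cassandra keeps them for the table's `gc_grace_seconds` period first.

```python
TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"


def compact(sstables):
    """Merge several SSTables into one, keeping only the newest version of each key."""
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            # Keep the entry with the highest timestamp (last write wins).
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Drop keys whose newest version is a tombstone; emit keys in sorted order.
    return {k: v for k, v in sorted(merged.items()) if v[1] is not TOMBSTONE}


older = {"a": (1, "x"), "b": (1, "y")}
newer = {"a": (2, "x2"), "b": (3, TOMBSTONE), "c": (2, "z")}
print(compact([older, newer]))  # {'a': (2, 'x2'), 'c': (2, 'z')}
```

Two tables holding five entries collapse into one table holding two, which is where the disk-space and read-latency savings come from.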
8. JVM GC to the rescue!
● The Cassandra process runs on the JVM and uses both on-heap and off-heap
memory for its components.
● One situation that you definitely want to minimize:
○ the garbage collection pause, aka a stop-the-world event.
● A pause occurs when a region of memory is full and the JVM needs to make
space to continue. Its effects:
○ Operations are suspended.
○ The node can appear as down to other nodes.
○ Read/write operations wait, which increases read/write
latency.
12. ● To make that something successful in the long
run
● To detect anomalies and anticipate the future
● To optimize that something
● To improve that something
● To save some $$$$
13. “ Necessity is the mother of
Invention and to make that
Invention successful, we need
to Monitor it. ”
14. Now Let’s MONITOR Cassandra
● What can we monitor in Cassandra?
○ Performance
■Write Path
■Read Path
○ Capacity (Resources)
■Server/Node
■JVM
○ Operations
■Compactions
■Eventual Consistency
15. Performance based - Metrics
● Read and Write Latency in READ and WRITE PATH
○ The recent read latency and write latency metrics are important for making sure operations are completing
at a consistent speed.
○ Affected by:
■Replication factor
■Compactions (improve READ only)
■MemTable flush
16. Cont:
● Live SSTable count - the number of SSTables on disk, reported across all tables in all keyspaces or narrowed to a specific
keyspace and table.
○ Read latency and disk usage both tend to rise with the live SSTable count.
○ Compaction lowers the count, but it follows a configured strategy and can use a large amount of CPU.
17. Cont:
● ThreadPool
○ Each thread pool provides statistics on the number of tasks that are active, pending, and
completed.
○ Metrics:
■PendingTasks
■CompletedTasks
○ A sustained increase in pending tasks for a pool indicates when to add capacity.
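One simple way to watch for that trend is to check whether the most recent PendingTasks samples keep climbing. The sketch below is an assumption about how such a check could look, not a built-in Cassandra feature: `pending_is_growing` and its default window of five samples are hypothetical.

```python
def pending_is_growing(samples, window=5):
    """Return True if the last `window` PendingTasks samples are strictly increasing."""
    recent = samples[-window:]
    if len(recent) < window:
        return False  # not enough data to call it a trend
    return all(earlier < later for earlier, later in zip(recent, recent[1:]))


print(pending_is_growing([0, 1, 3, 6, 10, 15]))  # True: backlog keeps building
print(pending_is_growing([0, 4, 2, 5, 3, 1]))    # False: backlog drains between spikes
```

A short burst of pending tasks is normal; it is the backlog that never drains that points to a capacity problem.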
18. Cont:
● MemTable flush count
○ Tuning memtable thresholds can improve write performance by controlling how often memtables are flushed to SSTables.
○ A flush executes when the commit log space threshold or the memtable cleanup threshold has been exceeded.
○ How you tune memtable thresholds depends on your data and write load. Increase memtable thresholds
under either of these conditions:
■The write load includes a high volume of updates on a smaller set of data.
■A steady stream of continuous writes occurs.
19. Operations based: Metrics
● Compactions
○ Metrics
■PendingTasks
■CompletedTasks
○ Monitoring compaction performance is an important part of knowing when to add capacity to your
cluster.
○ Compaction affects both the READ and WRITE paths, directly or indirectly.
20. Cont:
● Eventual Consistency
tells us how consistent our data replication is across nodes.
○ HintedHandoff Metrics
keep track of data which has not been replicated to nodes yet.
■Hints_created: number of hints created.
○ All data stored in Cassandra is replicated across the cluster based on its replication
factor.
○ When the coordinator node cannot write to a replica node, it stores the data itself as a hint, then re-attempts the
write once the replica node is available again.
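The hinted handoff behaviour described here can be sketched in a few lines. This is a toy model under stated assumptions: the `Coordinator` class, its `up` flags, and the `hints_created` counter are hypothetical stand-ins, and real hints also expire after a configurable window, which is omitted.

```python
class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, replicas):
        self.replicas = replicas                 # replica name -> dict used as its storage
        self.up = {name: True for name in replicas}
        self.hints = []                          # pending (replica name, key, value) hints
        self.hints_created = 0                   # mirrors the Hints_created metric

    def write(self, key, value):
        for name, store in self.replicas.items():
            if self.up[name]:
                store[key] = value
            else:
                # Replica unreachable: keep the write locally as a hint.
                self.hints.append((name, key, value))
                self.hints_created += 1

    def replay_hints(self):
        remaining = []
        for name, key, value in self.hints:
            if self.up[name]:
                self.replicas[name][key] = value   # replica is back: deliver the write
            else:
                remaining.append((name, key, value))
        self.hints = remaining


coordinator = Coordinator({"n1": {}, "n2": {}})
coordinator.up["n2"] = False
coordinator.write("k", "v")
print(coordinator.hints_created, coordinator.replicas["n2"])  # 1 {}
coordinator.up["n2"] = True
coordinator.replay_hints()
print(coordinator.replicas["n2"])  # {'k': 'v'}
```

A steadily rising hint count suggests that replicas are frequently unreachable, so the data is less consistent across nodes than the replication factor implies.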
21. Resources based - Metrics
● JVM based resources
○ Metrics:
■JVM_gc_collection_seconds_count
■JVM_heap_memory_bytes_used
■JVM_memory_pool_bytes_used
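Counters such as JVM_gc_collection_seconds_count only ever increase, so they are usually read as a rate, while byte gauges are easier to judge as a fraction of a limit. The helpers below are a hedged sketch: `counter_rate` and `heap_used_fraction` are hypothetical names, the sample values are made up, and the maximum heap size is assumed to come from a separate metric or from the JVM settings.

```python
def counter_rate(samples):
    """Per-second increase of a monotonically increasing counter.

    samples: list of (unix_seconds, counter_value) pairs, oldest first.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) / (t1 - t0)


def heap_used_fraction(used_bytes, max_bytes):
    """Share of the heap in use; max_bytes is an assumed, separately known limit."""
    return used_bytes / max_bytes


gc_count_samples = [(0, 100), (60, 130)]   # made-up scrape results, one minute apart
print(counter_rate(gc_count_samples))      # 0.5 collections per second
print(heap_used_fraction(6 * 1024**3, 8 * 1024**3))  # 0.75
```

A rising collection rate together with a heap fraction that stays high after collections is the pattern that tends to precede long stop-the-world pauses.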