Apache Cassandra:
diagnostics and monitoring
Alex Thompson
Solution Architect - APAC
DataStax Australia
Intro
This presentation is intended as a field guide for users of Apache Cassandra.
This guide specifically covers the diagnostics and monitoring tools and methods used in conjunction with C*. It is written in pragmatic order, with the most important tools first.
Diagnostics
>nodetool tpstats
Probably the most important “at a
glance” summary of the health of a
node and the first diagnostics
command to run.
>nodetool tpstats is better described as “nodetool thread pool statistics”; it gives us a real-time measure of each thread pool in C* and its current workload.
Note: if you restart a C* instance these statistics
are cleared to zero, so you have to run it on a node
that has been up for a while to be able to diagnose
workload.
Pool Name Active Pending Completed Blocked All time blocked
MutationStage 0 0 25159974 0 0
ViewMutationStage 0 0 0 0 0
ReadStage 0 0 3231222 0 0
RequestResponseStage 0 0 36609517 0 0
ReadRepairStage 0 0 410293 0 0
CounterMutationStage 0 0 0 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 8 108 287003 0 0
MemtableReclaimMemory 0 0 444 0 0
PendingRangeCalculator 0 0 27 0 0
GossipStage 0 0 464348 0 0
SecondaryIndexManagement 0 0 13 0 0
HintsDispatcher 0 0 396 0 0
MigrationStage 0 0 25 0 0
MemtablePostFlush 0 0 1114 0 0
ValidationExecutor 0 0 321 0 0
Sampler 0 0 0 0 0
MemtableFlushWriter 0 0 444 0 0
InternalResponseStage 0 0 68544 0 0
AntiEntropyStage 0 0 1209 0 0
CacheCleanupExecutor 0 0 0 0 0
Native-Transport-Requests 0 0 35849149 0 536
Message type Dropped
READ 4
RANGE_SLICE 0
_TRACE 5095
HINT 0
MUTATION 180
COUNTER_MUTATION 0
BATCH_STORE 0
BATCH_REMOVE 0
REQUEST_RESPONSE 23
PAGED_RANGE 0
READ_REPAIR 0
>nodetool tpstats
The first thing to check is Pending work on the thread pools. This node is showing compactions getting behind; that may be OK on its own, but combined with other diagnostics it is usually an indication of an overloaded node.
>nodetool tpstats
Next up is to check All time blocked: in this case Native-Transport-Requests, which are calls to the binary CQL port (reads or writes), have been blocked because the node could not keep up. Also note the high Completed count: this node is servicing a lot of requests.
In combination with the Pending work mentioned on the prior slide this is starting to look like an overloaded node, but let’s dig deeper...
>nodetool tpstats
OK, now the nasty part: Dropped messages.
These are messages of various types that the node has received but has not been able to process due to overload. To save itself from going down, C* has gone into “emergency mode” and shed the messages. We should never see any dropped messages. Period.
Let’s go through these message types one by one...
>nodetool tpstats
So that’s 4x READ messages that were dropped: CQL SELECT statements that C* could not process because this node was overloaded.
Other nodes holding replicas would have stepped in to satisfy the query*.
*As long as the driver was correctly configured and the correct consistency level was applied to the CQL SELECT statement.
>nodetool tpstats
5095x _TRACE messages have been dropped.
This is a problem. Someone has either:
1) turned TRACE on on the server using: >nodetool settraceprobability 1
2) or, more worryingly, checked in CQL code at the application tier with TRACE ON.
TRACE puts an enormous weight on a node and should never be on in production!
>nodetool tpstats
With TRACE on on this node, all bets are off: it could be the sole cause of this node’s problems. TRACE is such a heavy-hitting process that it can cripple a node if activated on a production node, or cripple an entire cluster if activated on all nodes.
To turn it off, run on all nodes:
>nodetool settraceprobability 0
If it’s in checked-in CQL code you need to audit all app tier code to identify the offending statement/s.
>nodetool tpstats
Leaving TRACE on on a production node earns my dill award.
>nodetool tpstats
180x MUTATION messages were dropped. MUTATIONs are writes; the server has not had the headroom to perform these writes.
REQUEST_RESPONSE drops are self-explanatory.
>nodetool tpstats
What to look for:
On a typical node you should not really see thread pools building up Pending work.
Under 10 in Pending for CompactionExecutor can be OK, but when you get into larger numbers it usually indicates a problem.
As for dropped messages, you should not see any; they mean there is a real issue in peak workloads that needs to be addressed.
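These checks are easy to automate. Below is a minimal sketch (mine, not part of the original deck) that shells out to nodetool tpstats and flags any pool with Pending or Blocked work and any message type with a non-zero Dropped count; it assumes nodetool is on the PATH of the node it runs on and that the column layout matches the output shown on these slides.

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class TpstatsCheck {
    public static void main(String[] args) throws Exception {
        // Run nodetool on the local node; merge stderr into stdout for simplicity.
        Process p = new ProcessBuilder("nodetool", "tpstats").redirectErrorStream(true).start();
        boolean droppedSection = false;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.startsWith("Message type")) {   // header of the dropped-message table
                    droppedSection = true;
                    continue;
                }
                String[] c = line.trim().split("\\s+");
                if (!droppedSection && c.length == 6 && c[1].matches("\\d+")) {
                    // Pool rows: Name, Active, Pending, Completed, Blocked, All time blocked
                    long pending = Long.parseLong(c[2]);
                    long blocked = Long.parseLong(c[4]);
                    if (pending > 0 || blocked > 0) {
                        System.out.println("WARN pool " + c[0] + " pending=" + pending + " blocked=" + blocked);
                    }
                } else if (droppedSection && c.length == 2 && c[1].matches("\\d+") && Long.parseLong(c[1]) > 0) {
                    // Dropped-message rows: message type, dropped count
                    System.out.println("WARN dropped " + c[0] + "=" + c[1]);
                }
            }
        }
        p.waitFor();
    }
}

Run it periodically and alert on any WARN lines; remember the tpstats counters reset when the node restarts.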
>nodetool netstats
Aside from >nodetool tpstats, >nodetool netstats is your second go-to diagnostic for a good view of how healthy a node is.
The first thing to check is “Read Repair Statistics”. These indicate inconsistencies found in this node’s data when compared to other nodes as queries execute; again, they usually indicate that the node or cluster is under stress and may not be properly provisioned for the workload it is expected to do.
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 408271
Mismatch (Blocking): 78
Mismatch (Background): 602
Pool Name Active Pending Completed Dropped
Large messages n/a 0 12252 913
Small messages n/a 0 63614651 0
Gossip messages n/a 0 480331 0
>nodetool netstats
The specific counts we are interested in are the Mismatch values.
You can see here that compared to the number of read repairs attempted (408271) we have some minor repairs occurring: 78 blocking and 602 background.
These are small numbers, but they do indicate that at times this node is under stress.
>nodetool netstats
This is more worrying though, and quite unusual. The number of dropped large messages indicates to me that someone is doing something silly here: either attempting overly large writes or running overly large SELECTs.
As soon as I saw this I would start asking questions as to where these messages are coming from and put a stop to the misuse.
>nodetool netstats
What to look for:
Large Mismatch values indicate a node that has in the past been under severe stress and incapable of keeping up with write workloads.
Dropped Large messages probably mean that someone is performing ridiculous queries or writes against your system; find them and terminate them with extreme prejudice.
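As with tpstats, this check is easy to script. Below is a minimal sketch (mine, not from the deck) that shells out to nodetool netstats and surfaces the Mismatch counters plus any pool row with a non-zero Dropped column; it assumes nodetool is on the PATH and that the output matches the layout shown above.

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class NetstatsCheck {
    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("nodetool", "netstats").redirectErrorStream(true).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String t = line.trim();
                if (t.startsWith("Mismatch")) {
                    // Blocking / Background read repair mismatches
                    System.out.println(t);
                } else if (t.matches(".*messages\\s+\\S+\\s+\\d+\\s+\\d+\\s+[1-9]\\d*")) {
                    // Large/Small/Gossip message pools with a non-zero Dropped column
                    System.out.println("WARN dropped: " + t);
                }
            }
        }
        p.waitFor();
    }
}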
>nodetool cfstats
Rounding out the top 3 diagnostics commands is >nodetool cfstats, or more verbosely: nodetool column family statistics.
It produces a large amount of output detailing statistics for each table in your cluster; for brevity’s sake let’s take a look at one table’s output from cfstats...
Table: rollups60
SSTable count: 10
Space used (live): 1757632985
Space used (total): 1757632985
Space used by snapshots (total): 0
Off heap memory used (total): 520044
SSTable Compression Ratio: 0.5405234880604174
Number of keys (estimate): 14317
Memtable cell count: 1251073
Memtable data size: 57091879
Memtable off heap memory used: 0
Memtable switch count: 2
Local read count: 211506
Local read latency: 0.923 ms
Local write count: 18096351
Local write latency: 0.028 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 89280
Bloom filter off heap memory used: 89200
Index summary off heap memory used: 38420
Compression metadata off heap memory used: 392424
Compacted partition minimum bytes: 5723
Compacted partition maximum bytes: 2816159
Compacted partition mean bytes: 47670
Average live cells per slice (last five minutes): 2.7963433445814063
Maximum live cells per slice (last five minutes): 3
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
>nodetool cfstats
There is a lot of useful information
here, but at a glance there are a couple
of key metrics...
>nodetool cfstats
SSTable count.
The number of SSTables that make up this table on this node. This should be in the tens to possibly low hundreds; if you see it higher than that it usually means there are problems with compaction on the node, and compaction problems are usually caused by too many writes for the underlying I/O capability of the node.
>nodetool cfstats
Number of keys (estimate).
This is the estimated number of partition keys for this table on this node. If the table holds a large amount of data on this node but the key count is very low, it usually means there may be a data modelling issue... more on this later.
>nodetool cfstats
Local read count, Local write count.
Interesting on their own, but more interesting when viewed together: you can see there are a lot more writes than reads here, i.e. the workload is very heavily write-oriented.
In fact, running the calculation (18096351 / 211506) there are roughly 85 writes for every read! One caveat here is that we do not know 1) how long the node has been up and 2) whether the traffic peaks during the day, so we may have missed read traffic that would alter the ratio.
>nodetool cfstats
Local read latency, Local write latency.
You can see that the latencies here are quite good. Writes are faster than reads in C*, which is what we would expect, and with reads under 1 ms this is a good result.
If you start to see large read latencies you need to investigate whether there are large queries running or potential I/O issues on the node at the hardware level.
>nodetool cfstats
Compacted partition maximum bytes.
This is the largest amount of data found under an individual partition key on this node; in this case it is about 2.8 MB, which is good.
You really want to keep this number under 100 MB; some say 1 GB, but you would really need to know what you’re doing if you go to 1 GB.
If you see values here over a couple of hundred MB then you may have a data modelling issue.
>nodetool cfstats
Compacted partition mean bytes.
This is the average amount of data under a partition key across this table’s partitions on this node.
You really want to keep this number under 100 MB.
If you see large values here you know you have a data modelling issue.
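The two partition-size figures are worth watching automatically. Below is a minimal sketch (mine, not from the deck) that runs nodetool cfstats for one table and flags the maximum and mean compacted partition sizes against the ~100 MB guidance above; the keyspace.table argument is just an illustrative placeholder.

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class PartitionSizeCheck {
    private static final long LIMIT_BYTES = 100L * 1024 * 1024; // ~100 MB guidance

    public static void main(String[] args) throws Exception {
        String table = args.length > 0 ? args[0] : "mykeyspace.mytablename"; // placeholder name
        Process p = new ProcessBuilder("nodetool", "cfstats", table).redirectErrorStream(true).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                String t = line.trim();
                if (t.startsWith("Compacted partition maximum bytes:")
                        || t.startsWith("Compacted partition mean bytes:")) {
                    long bytes = Long.parseLong(t.substring(t.lastIndexOf(':') + 1).trim());
                    String flag = bytes > LIMIT_BYTES ? "  <-- over ~100 MB, check the data model" : "";
                    System.out.println(t + flag);
                }
            }
        }
        p.waitFor();
    }
}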
>nodetool cfstats
Average live cells per slice.
This is a measure of the amount of data you are pulling back for the average query (SELECT).
Pulling tens or hundreds of cells (values) is fine; in fact pulling back thousands of cells on average is fine if that’s what you intended to do. But if it’s not what you intended your solution to do, then you might want to look at who is running lazy SELECT * queries on your cluster!
Be aware that larger queries are going to increase read latency significantly.
>nodetool cfstats
Maximum live cells per slice.
Self-explanatory: the largest number of live cells returned by a single query in the last 5 minutes.
>nodetool cfstats
Average tombstones per slice.
Tombstones are not returned in queries, but they have to be read off disk and filtered through the JVM, so they can add significant relative overhead to a query.
If you are pulling back 1x live cell and 100 tombstones in a query, it’s going to impact your performance.
Tombstones are the result of deletes, and deletes need to be very carefully managed and modelled in C*.
>nodetool cfstats
Maximum tombstones per slice.
Self-explanatory: the largest number of tombstones seen in a single query in the last 5 minutes.
Summary so far...
That rounds out the top 3 diagnostic nodetool commands in Apache Cassandra:
● nodetool tpstats
● nodetool netstats
● nodetool cfstats
With those 3 commands you can get a very good grasp of the health of a node and its possible issues. If you then see a pattern cluster-wide you know you have a general issue (usually workload); if however you only see poor health on a single node it’s probably* time to start looking at hardware as the culprit.
*I say probably because there are circumstances where a hot partition on a single node can get hammered with requests; the times I have seen this is where someone has accidentally turned a tool against C* that focuses on a single partition (thanks security guy).
Security guy:
>system.log
On package installs it lives in:
/var/log/cassandra
What to look for (a small scanning sketch follows this list):
● Exceptions
● GC events
● Other nodes going UP and DOWN in gossip
● Dropped messages
● WARNs on large partitions / wide rows
● Tombstone warnings
● Repair session failures
● Compactions with large amounts of sstables in them
● Startup problems and warnings
● Topology warnings
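Here is a minimal scanning sketch (mine, not from the deck) that pulls out the kinds of lines listed above from system.log. The patterns are assumptions based on typical C* log output (for example GCInspector logs GC pauses and the gossiper logs nodes going UP/DOWN); adjust them for your version and log format.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class SystemLogScan {
    // Rough patterns for the checklist above; tune these for your C* version.
    private static final List<String> PATTERNS = List.of(
            "ERROR", "WARN", "Exception", "GCInspector",
            "dropped", "tombstone", "large partition", "is now DOWN", "is now UP");

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/var/log/cassandra/system.log";
        try (var lines = Files.lines(Paths.get(path))) {
            lines.filter(l -> PATTERNS.stream().anyMatch(l::contains))
                 .forEach(System.out::println);
        }
    }
}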
Monitoring Automation
JMX
Cassandra exposes its metrics via MBeans; here you see JConsole connected to a Cassandra node, listing all the MBeans available for interrogation.
These JMX MBeans can be read and invoked from Java and Python interfaces plus some commercial products.
DataStax uses these same MBeans to instrument OpsCenter.
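For example, here is a minimal sketch (mine, not from the deck) that reads one Cassandra gauge over plain JMX using only the standard javax.management API. 7199 is Cassandra’s default JMX port; the MBean name follows the org.apache.cassandra.metrics naming convention but should be treated as an assumption, so confirm the exact name in JConsole for your version and add credentials if JMX authentication is enabled.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReadCassandraGauge {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // Assumed metric name: pending compactions, as surfaced by nodetool tpstats.
            ObjectName pendingCompactions = new ObjectName(
                    "org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=CompactionExecutor,name=PendingTasks");
            // Metrics gauges expose their current reading via a single "Value" attribute.
            Object value = mbsc.getAttribute(pendingCompactions, "Value");
            System.out.println("CompactionExecutor pending tasks: " + value);
        }
    }
}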
JMX
A list of alternatives to JConsole is here: JMX Clients with Apache Cassandra
JMX
Invoking an MBean in Java
This is sample code for a simple method call against an MBean with no return value; in a useful application you would return data and present the result on screen or store it for analysis.
This code was stripped from the following link for educational and training purposes and all copyright belongs to their respective owners:
http://stackoverflow.com/questions/16583859/execute-a-method-with-jmx-without-jconsole
import javax.management.*;
import javax.management.remote.*;
import com.sun.messaging.AdminConnectionFactory;
import com.sun.messaging.jms.management.server.*;

public class InvokeOp {
    public static void main(String[] args) {
        try {
            // Create administration connection factory
            AdminConnectionFactory acf = new AdminConnectionFactory();
            // Get JMX connector, supplying user name and password
            JMXConnector jmxc = acf.createConnection("AliBaba", "sesame");
            // Get MBean server connection
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            // Create object name
            ObjectName serviceConfigName = MQObjectName.createServiceConfig("jms");
            // Invoke operation
            mbsc.invoke(serviceConfigName, ServiceOperations.PAUSE, null, null);
            // Close JMX connector
            jmxc.close();
        } catch (Exception e) {
            System.out.println("Exception occurred: " + e.toString());
            e.printStackTrace();
        }
    }
}
JMX
Invoking an MBean in Jython (Python running on the JVM).
This code was stripped from the following link for educational and training purposes and all copyright belongs to their respective owners:
https://egkatzioura.com/2014/09/22/connecting-to-jmx-through-jython/
from javax.management.remote import JMXConnector
from javax.management.remote import JMXConnectorFactory
from javax.management.remote import JMXServiceURL
from javax.management import MBeanServerConnection
from javax.management import MBeanInfo
from javax.management import ObjectName
from java.lang import String
from jarray import array
import sys

if __name__ == '__main__':
    if len(sys.argv) > 5:
        serverUrl = sys.argv[1]
        username = sys.argv[2]
        password = sys.argv[3]
        beanName = sys.argv[4]
        action = sys.argv[5]
    else:
        sys.exit(-1)
    credentials = array([username, password], String)
    environment = {JMXConnector.CREDENTIALS: credentials}
    jmxServiceUrl = JMXServiceURL('service:jmx:rmi:///jndi/rmi://' + serverUrl + ':9999/jmxrmi')
    jmxConnector = JMXConnectorFactory.connect(jmxServiceUrl, environment)
    mBeanServerConnection = jmxConnector.getMBeanServerConnection()
    objectName = ObjectName(beanName)
    mBeanServerConnection.invoke(objectName, action, None, None)
    jmxConnector.close()
JMX
Invoking an MBean from CPython using Jolokia, an agent-based HTTP bridge to JMX:
https://jolokia.org/
This approach is a little more complex as agents need to be installed on the nodes. (Note: the sample below is the Java client from the Jolokia tutorial; Python clients follow the same request pattern.)
There are some other Python JMX libraries but I have not used them so cannot vouch for them.
This code was stripped from the following link for educational and training purposes and all copyright belongs to their respective owners:
https://jolokia.org/tutorial.html
import org.jolokia.client.*;
import org.jolokia.client.request.*;
import java.util.Map;

public class JolokiaDemo {
    public static void main(String[] args) throws Exception {
        J4pClient j4pClient = new J4pClient("http://localhost:8080/jolokia");
        J4pReadRequest req = new J4pReadRequest("java.lang:type=Memory",
                                                "HeapMemoryUsage");
        J4pReadResponse resp = j4pClient.execute(req);
        Map<String, Long> vals = resp.getValue();
        long used = vals.get("used");
        long max = vals.get("max");
        int usage = (int) (used * 100 / max);
        System.out.println("Memory usage: used: " + used +
                           " / max: " + max + " = " + usage + "%");
    }
}
JMX + Node.js
jmx npm
https://www.npmjs.com/package/jmx
Can’t vouch for this one, but node.js is a great way to serve
javascript directly into a GUI, the meteor project is also an
excellent pub/sub/push system built on node.js that would
make a great C* ops GUI.
https://www.meteor.com/
This code was stripped from the following link for
educational and training purposes and all copyright
belongs to their respective owners:
https://www.npmjs.com/package/jmx
var jmx = require("jmx");

client = jmx.createClient({
  host: "localhost", // optional
  port: 3000
});
client.connect();
client.on("connect", function() {
  client.getAttribute("java.lang:type=Memory", "HeapMemoryUsage", function(data) {
    var used = data.getSync('used');
    console.log("HeapMemoryUsage used: " + used.longValue);
    // console.log(data.toString());
  });
  client.setAttribute("java.lang:type=Memory", "Verbose", true, function() {
    console.log("Memory verbose on"); // callback is optional
  });
  client.invoke("java.lang:type=Memory", "gc", [], function(data) {
    console.log("gc() done");
  });
});
JMX + Node.js
jolokia npm
https://www.npmjs.com/package/jolokia
Can’t vouch for this one, but node.js is a great way to serve
javascript directly into a GUI, the meteor project is also an
excellent pub/sub/push system built on node.js that would
make a great C* Ops GUI.
https://www.meteor.com/
This code was stripped from the following link for
educational and training purposes and all copyright
belongs to their respective owners:
https://www.npmjs.com/package/jolokia
// In Node.js or using Browserify
var Jolokia = require('jolokia');
// In browser
var Jolokia = window.Jolokia;
// Or using RequireJs
require(['./path/to/jolokia'], function(Jolokia) {
  // code below
});

var jolokia = new Jolokia({
  url: '/jmx', // use full url when in Node.js environment
  method: 'post', // force specific HTTP method
});

jolokia.list().then(function(value) {
  // do something with list of JMX domains
}, function(error) {
  // handle error
});
Thanks!
Contact us:
DataStax
Sydney, Australia
alex.thompson@datastax.com
www.datastax.com
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 

Recently uploaded (20)

SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 

Apache Cassandra - Diagnostics and monitoring

  • 1. Apache Cassandra: diagnostics and monitoring Alex Thompson Solution Architect - APAC DataStax Australia
  • 2. Intro This presentation is intended as a field guide for users of Apache Cassandra. This guide specifically covers an explanation diagnostics tools and monitoring tools and methods used in conjunction with C*, it is written in a pragmatic order with the most important tools first.
  • 4. >nodetool tpstats Probably the most important “at a glance” summary of the health of a node and the first diagnostics command to run. >nodetool tpstats is better described as “nodetool thread statistics”; it gives us a real-time measure of each thread in C* and its current workload. Note: if you restart a C* instance these statistics are cleared to zero, so you have to run it on a node that has been up for a while to be able to diagnose workload. Pool Name Active Pending Completed Blocked All time blocked MutationStage 0 0 25159974 0 0 ViewMutationStage 0 0 0 0 0 ReadStage 0 0 3231222 0 0 RequestResponseStage 0 0 36609517 0 0 ReadRepairStage 0 0 410293 0 0 CounterMutationStage 0 0 0 0 0 MiscStage 0 0 0 0 0 CompactionExecutor 8 108 287003 0 0 MemtableReclaimMemory 0 0 444 0 0 PendingRangeCalculator 0 0 27 0 0 GossipStage 0 0 464348 0 0 SecondaryIndexManagement 0 0 13 0 0 HintsDispatcher 0 0 396 0 0 MigrationStage 0 0 25 0 0 MemtablePostFlush 0 0 1114 0 0 ValidationExecutor 0 0 321 0 0 Sampler 0 0 0 0 0 MemtableFlushWriter 0 0 444 0 0 InternalResponseStage 0 0 68544 0 0 AntiEntropyStage 0 0 1209 0 0 CacheCleanupExecutor 0 0 0 0 0 Native-Transport-Requests 0 0 35849149 0 536 Message type Dropped READ 4 RANGE_SLICE 0 _TRACE 5095 HINT 0 MUTATION 180 COUNTER_MUTATION 0 BATCH_STORE 0 BATCH_REMOVE 0 REQUEST_RESPONSE 23 PAGED_RANGE 0 READ_REPAIR 0
  • 5. >nodetool tpstats First thing to check is Pending work on threads, this node is showing compactions getting behind, this may be OK but is usually an indication with other diagnostics of an overloaded node. Pool Name Active Pending Completed Blocked All time blocked MutationStage 0 0 25159974 0 0 ViewMutationStage 0 0 0 0 0 ReadStage 0 0 3231222 0 0 RequestResponseStage 0 0 36609517 0 0 ReadRepairStage 0 0 410293 0 0 CounterMutationStage 0 0 0 0 0 MiscStage 0 0 0 0 0 CompactionExecutor 8 108 287003 0 0 MemtableReclaimMemory 0 0 444 0 0 PendingRangeCalculator 0 0 27 0 0 GossipStage 0 0 464348 0 0 SecondaryIndexManagement 0 0 13 0 0 HintsDispatcher 0 0 396 0 0 MigrationStage 0 0 25 0 0 MemtablePostFlush 0 0 1114 0 0 ValidationExecutor 0 0 321 0 0 Sampler 0 0 0 0 0 MemtableFlushWriter 0 0 444 0 0 InternalResponseStage 0 0 68544 0 0 AntiEntropyStage 0 0 1209 0 0 CacheCleanupExecutor 0 0 0 0 0 Native-Transport-Requests 0 0 35849149 0 536 Message type Dropped READ 4 RANGE_SLICE 0 _TRACE 5095 HINT 0 MUTATION 180 COUNTER_MUTATION 0 BATCH_STORE 0 BATCH_REMOVE 0 REQUEST_RESPONSE 23 PAGED_RANGE 0 READ_REPAIR 0
  • 6. >nodetool tpstats Next up is to check All time blocked: in this case Native-Transport-Requests which are calls to the binary CQL port (reads or writes) that have not been completed due to overload. Also note the high Completed This node is servicing a lot of requests. In combination with Pending mentioned in the prior slide this is starting to look like an overloaded node, but let’s dig deeper... Pool Name Active Pending Completed Blocked All time blocked MutationStage 0 0 25159974 0 0 ViewMutationStage 0 0 0 0 0 ReadStage 0 0 3231222 0 0 RequestResponseStage 0 0 36609517 0 0 ReadRepairStage 0 0 410293 0 0 CounterMutationStage 0 0 0 0 0 MiscStage 0 0 0 0 0 CompactionExecutor 8 108 287003 0 0 MemtableReclaimMemory 0 0 444 0 0 PendingRangeCalculator 0 0 27 0 0 GossipStage 0 0 464348 0 0 SecondaryIndexManagement 0 0 13 0 0 HintsDispatcher 0 0 396 0 0 MigrationStage 0 0 25 0 0 MemtablePostFlush 0 0 1114 0 0 ValidationExecutor 0 0 321 0 0 Sampler 0 0 0 0 0 MemtableFlushWriter 0 0 444 0 0 InternalResponseStage 0 0 68544 0 0 AntiEntropyStage 0 0 1209 0 0 CacheCleanupExecutor 0 0 0 0 0 Native-Transport-Requests 0 0 35849149 0 536 Message type Dropped READ 4 RANGE_SLICE 0 _TRACE 5095 HINT 0 MUTATION 180 COUNTER_MUTATION 0 BATCH_STORE 0 BATCH_REMOVE 0 REQUEST_RESPONSE 23 PAGED_RANGE 0 READ_REPAIR 0
• 7. >nodetool tpstats OK, now the nasty part: Dropped messages. These are messages of various types that the node has received but has not been able to process due to overload; to save itself from going down, C* has gone into "emergency mode" and shed them. We should never see any dropped messages. Period. In this output the non-zero counts are READ 4, _TRACE 5095, MUTATION 180 and REQUEST_RESPONSE 23 (the rest of the tpstats table is the same one shown on the previous slides). Let's go through these messages one by one.
• 8. >nodetool tpstats So that's 4x READ messages that were dropped; they were CQL SELECT statements that C* could not process due to overload of this node. Other nodes with replicas would have stepped in to satisfy the query*. *As long as the driver was correctly configured and the correct consistency level was applied to the CQL SELECT statement (see the sketch below).
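As a minimal sketch of that footnote, this is how a consistency level travels with a statement in the DataStax Python driver. The contact point, keyspace and table names are placeholders rather than anything from the deck; with LOCAL_QUORUM a read dropped by one overloaded replica can still be answered by the others.

    from cassandra.cluster import Cluster
    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    cluster = Cluster(["10.0.0.1"])            # placeholder contact point
    session = cluster.connect("my_keyspace")   # placeholder keyspace

    # LOCAL_QUORUM: a majority of local replicas must answer, so a single
    # overloaded node shedding READ messages does not fail the query.
    stmt = SimpleStatement(
        "SELECT * FROM my_table WHERE id = %s",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    row = session.execute(stmt, ("some-key",)).one()
    print(row)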
• 9. >nodetool tpstats 5095x _TRACE messages have been dropped. This is a problem. Someone has either: 1) turned TRACE on on the server using: >nodetool settraceprobability 1, or 2) more worryingly, checked in CQL code at the application tier with TRACE ON. TRACE puts an enormous weight on a node and should never be on in production!
• 10. >nodetool tpstats With TRACE on on this node, all bets are off; this could be the sole cause of this node's problems. TRACE is such a heavy-hitting process that it can cripple a node if activated on a production node, or drag down an entire cluster if activated on all nodes. To turn it off, run on all nodes: >nodetool settraceprobability 0. If it's in checked-in CQL code you need to audit all app tier code to identify the offending statement/s.
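A quick way to audit and reset this across the cluster is a small loop over the nodes. This sketch assumes passwordless SSH to each node and nodetool on the remote PATH; the host list is a placeholder.

    import subprocess

    NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # placeholder host list

    for node in NODES:
        # Report the current setting, then force tracing off.
        current = subprocess.run(
            ["ssh", node, "nodetool", "gettraceprobability"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print(f"{node}: {current}")
        subprocess.run(["ssh", node, "nodetool", "settraceprobability", "0"], check=True)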
• 11. >nodetool tpstats Turning TRACE on on a production node earns my dill award.
• 12. >nodetool tpstats 180x MUTATION message drops. MUTATIONs are writes; the server has not had the headroom to perform these writes. The REQUEST_RESPONSE drops are self-explanatory.
• 13. >nodetool tpstats What to look for: on a typical node you should not really see thread pools going into Pending state. Under 10 in Pending for CompactionExecutor can be OK, but when you get into larger numbers it usually indicates a problem. As for dropped messages, you should not see any; it means there is a real issue in peak workloads that needs to be addressed. A small script that flags both conditions is sketched below.
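For example, a minimal tpstats triage script, assuming nodetool is on the PATH and the column layout shown on these slides (newer versions add columns, so treat the parsing as a sketch):

    import subprocess

    out = subprocess.run(["nodetool", "tpstats"],
                         capture_output=True, text=True, check=True).stdout

    in_dropped = False
    for line in out.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Message":          # "Message type   Dropped" header
            in_dropped = True
            continue
        if in_dropped:
            # "<MESSAGE_TYPE> <count>"
            if len(parts) >= 2 and parts[1].isdigit() and int(parts[1]) > 0:
                print(f"DROPPED {parts[0]}: {parts[1]}")
        elif len(parts) >= 6 and parts[1].isdigit():
            # "<Pool Name> <Active> <Pending> <Completed> <Blocked> <All time blocked>"
            pending, blocked_total = int(parts[2]), int(parts[5])
            if pending > 10:
                print(f"PENDING backlog on {parts[0]}: {pending}")
            if blocked_total > 0:
                print(f"ALL TIME BLOCKED on {parts[0]}: {blocked_total}")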
• 14. >nodetool netstats Aside from >nodetool tpstats, >nodetool netstats is your second go-to diagnostic and gives a good view of how healthy a node is. The first thing to check is "Read Repair Statistics"; these indicate inconsistencies in data found on this node when compared to other nodes as queries execute, and again they usually indicate that the node or cluster is under stress and may not be properly provisioned for the workload it is expected to do.
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 408271
Mismatch (Blocking): 78
Mismatch (Background): 602
Pool Name        Active  Pending  Completed  Dropped
Large messages   n/a     0        12252      913
Small messages   n/a     0        63614651   0
Gossip messages  n/a     0        480331     0
• 15. >nodetool netstats The specific counts we are interested in are the Mismatch values. You can see here that, compared to the number of read repairs attempted (408271), we have some minor repairs occurring: 78 blocking and 602 background, roughly 0.17% of attempted read repairs. These are minor numbers, but they do indicate that at times this node is under stress.
• 16. >nodetool netstats This is more worrying though, and quite unusual. The number of dropped Large messages (913) indicates to me that someone is doing something silly here and is either attempting overly large writes or querying for overly large SELECTs. As soon as I saw this I would start asking questions about where these messages are coming from and put a stop to the misuse.
• 17. >nodetool netstats What to look for: large Mismatch values indicate a node that in the past has been under severe stress and incapable of keeping up with write workloads. Dropped Large messages probably mean that someone is performing ridiculous queries or writes against your system; find them and terminate them with extreme prejudice. Both checks are easy to script, as sketched below.
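A minimal netstats check in the same spirit as the tpstats script above; the regular expressions assume the output layout shown on this slide, so treat it as a sketch:

    import re
    import subprocess

    out = subprocess.run(["nodetool", "netstats"],
                         capture_output=True, text=True, check=True).stdout

    attempted  = int(re.search(r"Attempted:\s+(\d+)", out).group(1))
    blocking   = int(re.search(r"Mismatch \(Blocking\):\s+(\d+)", out).group(1))
    background = int(re.search(r"Mismatch \(Background\):\s+(\d+)", out).group(1))
    rate = 100.0 * (blocking + background) / max(attempted, 1)
    print(f"read-repair mismatch rate: {rate:.2f}%")

    # Last column of the "Large messages" row is the dropped count.
    large = re.search(r"Large messages\s+\S+\s+\d+\s+\d+\s+(\d+)", out)
    if large and int(large.group(1)) > 0:
        print(f"dropped Large messages: {large.group(1)} -- look for oversized reads/writes")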
• 18. >nodetool cfstats Rounding out the top 3 diagnostics commands is >nodetool cfstats, or more verbosely: nodetool column family statistics. It produces a large output detailing statistics on each table in your cluster; for brevity's sake let's take a look at one table's output from cfstats.
Table: rollups60
SSTable count: 10
Space used (live): 1757632985
Space used (total): 1757632985
Space used by snapshots (total): 0
Off heap memory used (total): 520044
SSTable Compression Ratio: 0.5405234880604174
Number of keys (estimate): 14317
Memtable cell count: 1251073
Memtable data size: 57091879
Memtable off heap memory used: 0
Memtable switch count: 2
Local read count: 211506
Local read latency: 0.923 ms
Local write count: 18096351
Local write latency: 0.028 ms
Pending flushes: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used: 89280
Bloom filter off heap memory used: 89200
Index summary off heap memory used: 38420
Compression metadata off heap memory used: 392424
Compacted partition minimum bytes: 5723
Compacted partition maximum bytes: 2816159
Compacted partition mean bytes: 47670
Average live cells per slice (last five minutes): 2.7963433445814063
Maximum live cells per slice (last five minutes): 3
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
• 19. >nodetool cfstats There is a lot of useful information in this output, but at a glance there are a couple of key metrics (the following slides all refer to the same table output shown above)...
• 20. >nodetool cfstats SSTable count. The number of sstables that make up this table on this node; this should be in the 10s to possibly 100s. If you see it higher than that, it usually means there are problems with compaction on the node, and problems with compaction are usually caused by too many writes for the underlying I/O capability of the node.
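A quick cross-check when the SSTable count looks high is the compaction backlog. This sketch just shells out to nodetool compactionstats and flags a large pending count; the exact output format varies a little between versions.

    import subprocess

    out = subprocess.run(["nodetool", "compactionstats"],
                         capture_output=True, text=True, check=True).stdout
    print(out)

    for line in out.splitlines():
        if line.lower().startswith("pending tasks"):
            pending = int(line.split(":")[1].split()[0])
            if pending > 10:
                print(f"WARNING: {pending} pending compactions -- the node is not keeping up")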
• 21. >nodetool cfstats Number of keys (estimate). This is the number of partition keys for this table on this node. If this table holds a large amount of data on this node and the key count is very low, it usually means there may be a data modelling issue...more on this later.
• 22. >nodetool cfstats Local read count, Local write count. Interesting on their own, but more interesting when viewed together: you can see there are a lot more writes than reads on this cluster, that is, the workload is very heavily write oriented. In fact, running the calculation (18096351 writes / 211506 reads), there are roughly 85 writes for every read! One caveat here is that we do not know 1) how long the node has been up and 2) whether the traffic peaks during the day, so we may have missed read traffic which would alter the ratio.
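That ratio is easy to compute on the fly from cfstats itself; a small sketch with placeholder keyspace/table names:

    import re
    import subprocess

    # "my_keyspace.my_table" is a placeholder -- substitute a real table.
    out = subprocess.run(["nodetool", "cfstats", "my_keyspace.my_table"],
                         capture_output=True, text=True, check=True).stdout

    reads  = int(re.search(r"Local read count:\s+(\d+)", out).group(1))
    writes = int(re.search(r"Local write count:\s+(\d+)", out).group(1))
    print(f"writes per read: {writes / max(reads, 1):.1f}")   # 18096351 / 211506 = 85.6 on this node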
• 23. >nodetool cfstats Local read latency, Local write latency. You can see that the latencies are quite good; writes are faster than reads in C*, which is what we would expect, and with reads under 1ms this is a good result. If you start to see large read latencies you need to investigate whether there are large queries running or potential I/O issues on the node at the hardware level.
• 24. >nodetool cfstats Compacted partition maximum bytes. This is the largest amount of data under an individual partition key on this node; in this case the largest found is 2.8MB, which is good. You really want to keep this number under 100MB; some say 1GB, but you would really need to know what you're doing if you go to 1GB. If you see values here that are over a couple of hundred MB then you may have a data modelling issue.
• 25. >nodetool cfstats Compacted partition mean bytes. This is the average amount of data under all partition keys on this node. You really want to keep this number under 100MB. If you see large values here you know you have a data modelling issue. Per-table partition size percentiles can also be pulled from nodetool, as sketched below.
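For a more detailed view of partition sizes than the min/mean/max in cfstats, nodetool tablehistograms (cfhistograms on older versions) prints size percentiles for a single table; the keyspace and table names here are placeholders.

    import subprocess

    # Placeholder keyspace/table -- substitute your own.
    out = subprocess.run(["nodetool", "tablehistograms", "my_keyspace", "my_table"],
                         capture_output=True, text=True, check=True).stdout

    # Look at the "Partition Size (bytes)" column, particularly the 99% and Max rows;
    # anything creeping past ~100MB deserves a data modelling conversation.
    print(out)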
• 26. >nodetool cfstats Average live cells per slice. This is a measure of the amount of data you are pulling back for the average query (SELECT). Pulling 10s or 100s of cells (values) is fine; in fact pulling back 1000s of cells on average is fine if that's what you intended to do, but if it's not what you intended your solution to do then you might want to look at who is doing lazy SELECT * queries on your cluster! Be aware that larger queries are going to increase read latency significantly.
• 27. >nodetool cfstats Maximum live cells per slice. Self explanatory: the largest query seen in the last 5 minutes.
• 28. >nodetool cfstats Average tombstones per slice. Tombstones are not returned in queries, but they have to be read off disk and filtered through the JVM, so they can add significant relative overhead to a query. If you are pulling back 1x live cell and 100 tombstones in a query it is going to impact your performance. Tombstones are the result of deletes, and deletes need to be very carefully managed and modelled in C*.
• 29. >nodetool cfstats Maximum tombstones per slice. Self explanatory: the largest number of tombstones seen in a query in the last 5 minutes.
• 30. Summary so far... That rounds out the top 3 diagnostic nodetool commands in Apache Cassandra:
● nodetool tpstats
● nodetool netstats
● nodetool cfstats
With those 3 commands you can get a very good grasp of the health of a node and possible issues. If you then see a pattern cluster wide you know you have a general issue (usually workload); if however you only see poor health on a single node it's probably* time to start looking at hardware as the culprit. *I say probably because there are circumstances where a hot partition on a single node can get hammered with requests; the times I have seen this are where someone has accidentally turned a tool against C* that focuses on a single partition (thanks, security guy).
• 32. >system.log On package installs it lives in /var/log/cassandra. What to look for:
● Exceptions
● GC events
● Other nodes going UP and DOWN in gossip
● Dropped messages
● WARNs on large partitions / wide rows
● Tombstone warnings
● Repair session failures
● Compactions with large amounts of sstables in them
● Startup problems and warnings
● Topology warnings
A quick triage pass over the log for these markers can be scripted, as sketched below.
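A minimal log-scanning sketch: the path assumes a package install, and the patterns are common wordings of these events in the logs, so adjust them to your version.

    import re

    PATTERNS = {
        "exception / error": re.compile(r"ERROR|Exception"),
        "GC pause":          re.compile(r"GCInspector"),
        "node up/down":      re.compile(r"is now (UP|DOWN)"),
        "dropped messages":  re.compile(r"messages were dropped", re.IGNORECASE),
        "large partition":   re.compile(r"large partition", re.IGNORECASE),
        "tombstone warning": re.compile(r"tombstone cells", re.IGNORECASE),
        "repair failure":    re.compile(r"[Rr]epair.*(failed|error)"),
    }

    with open("/var/log/cassandra/system.log") as log:
        for line in log:
            for label, pattern in PATTERNS.items():
                if pattern.search(line):
                    print(f"[{label}] {line.rstrip()}")
                    break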
• 34. JMX Cassandra exposes its metrics via MBeans; here you see JConsole connected to a Cassandra node listing all the MBeans available for interrogation. These JMX MBeans can be instrumented from Java and Python interfaces plus some commercial products. DataStax uses these same MBeans to instrument OpsCenter.
• 35. JMX A list of alternatives to JConsole is here: JMX Clients with Apache Cassandra.
• 36. JMX Invoking an MBean in Java. This is sample code for a simple method call against an MBean with no return value; in a useful application you would return data and present the result on screen or store it for analysis. Note that this particular sample drives Open MQ's admin MBeans; for Cassandra you would connect a standard JMXConnector to the node's JMX port (7199 by default) instead. This code was taken from the following link for educational and training purposes and all copyright belongs to the respective owners: http://stackoverflow.com/questions/16583859/execute-a-method-with-jmx-without-jconsole

import javax.management.*;
import javax.management.remote.*;
import com.sun.messaging.AdminConnectionFactory;
import com.sun.messaging.jms.management.server.*;

public class InvokeOp {
    public static void main(String[] args) {
        try {
            // Create administration connection factory
            AdminConnectionFactory acf = new AdminConnectionFactory();
            // Get JMX connector, supplying user name and password
            JMXConnector jmxc = acf.createConnection("AliBaba", "sesame");
            // Get MBean server connection
            MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();
            // Create object name
            ObjectName serviceConfigName = MQObjectName.createServiceConfig("jms");
            // Invoke operation
            mbsc.invoke(serviceConfigName, ServiceOperations.PAUSE, null, null);
            // Close JMX connector
            jmxc.close();
        } catch (Exception e) {
            System.out.println("Exception occurred: " + e.toString());
            e.printStackTrace();
        }
    }
}
• 37. JMX Invoking an MBean in Jython (Python running on the JVM). Note the sample connects to port 9999; Cassandra's default JMX port is 7199. This code was taken from the following link for educational and training purposes and all copyright belongs to the respective owners: https://egkatzioura.com/2014/09/22/connecting-to-jmx-through-jython/

from javax.management.remote import JMXConnector
from javax.management.remote import JMXConnectorFactory
from javax.management.remote import JMXServiceURL
from javax.management import MBeanServerConnection
from javax.management import MBeanInfo
from javax.management import ObjectName
from java.lang import String
from jarray import array
import sys

if __name__ == '__main__':
    if len(sys.argv) > 5:
        serverUrl = sys.argv[1]
        username = sys.argv[2]
        password = sys.argv[3]
        beanName = sys.argv[4]
        action = sys.argv[5]
    else:
        sys.exit(-1)

    credentials = array([username, password], String)
    environment = {JMXConnector.CREDENTIALS: credentials}
    jmxServiceUrl = JMXServiceURL('service:jmx:rmi:///jndi/rmi://' + serverUrl + ':9999/jmxrmi')
    jmxConnector = JMXConnectorFactory.connect(jmxServiceUrl, environment)
    mBeanServerConnection = jmxConnector.getMBeanServerConnection()
    objectName = ObjectName(beanName)
    mBeanServerConnection.invoke(objectName, action, None, None)
    jmxConnector.close()
• 38. JMX Invoking an MBean via Jolokia (https://jolokia.org/), an agent that exposes JMX over HTTP/JSON. This approach is a little more complex as agents need to be installed on the nodes, but once the agent is in place any language that can speak HTTP, including plain CPython, can read MBeans. (The sample below is actually Jolokia's Java client; a CPython equivalent over the HTTP API is sketched after it.) There are some other Python JMX libraries, but I have not used them so cannot vouch for them. This code was taken from the following link for educational and training purposes and all copyright belongs to the respective owners: https://jolokia.org/tutorial.html

import org.jolokia.client.*;
import org.jolokia.client.request.*;
import java.util.Map;

public class JolokiaDemo {
    public static void main(String[] args) throws Exception {
        J4pClient j4pClient = new J4pClient("http://localhost:8080/jolokia");
        J4pReadRequest req = new J4pReadRequest("java.lang:type=Memory", "HeapMemoryUsage");
        J4pReadResponse resp = j4pClient.execute(req);
        Map<String, Long> vals = resp.getValue();
        long used = vals.get("used");
        long max = vals.get("max");
        int usage = (int) (used * 100 / max);
        System.out.println("Memory usage: used: " + used + " / max: " + max + " = " + usage + "%");
    }
}
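For completeness, a CPython sketch against the Jolokia HTTP API using the requests library. It assumes the Jolokia JVM agent is attached to the Cassandra process on its default port 8778; the host, port and the chosen MBeans are illustrative only.

    import requests

    BASE = "http://localhost:8778/jolokia"   # placeholder host; 8778 is the agent's default port

    # Reading a JMX attribute is a plain HTTP GET: /read/<mbean>/<attribute>
    heap = requests.get(f"{BASE}/read/java.lang:type=Memory/HeapMemoryUsage").json()["value"]
    print(f"heap used: {heap['used']} / max: {heap['max']}")

    # The same pattern reaches Cassandra's own metrics MBeans, e.g. the read request count:
    reads = requests.get(
        f"{BASE}/read/org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency/Count"
    ).json()["value"]
    print(f"read requests served: {reads}")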
• 39. JMX + Node.js jmx npm https://www.npmjs.com/package/jmx Can't vouch for this one, but Node.js is a great way to serve JavaScript directly into a GUI; the Meteor project is also an excellent pub/sub/push system built on Node.js that would make a great C* ops GUI. https://www.meteor.com/ This code was taken from the following link for educational and training purposes and all copyright belongs to the respective owners: https://www.npmjs.com/package/jmx

var jmx = require("jmx");

client = jmx.createClient({
  host: "localhost", // optional
  port: 3000
});
client.connect();
client.on("connect", function() {
  client.getAttribute("java.lang:type=Memory", "HeapMemoryUsage", function(data) {
    var used = data.getSync('used');
    console.log("HeapMemoryUsage used: " + used.longValue);
    // console.log(data.toString());
  });
  client.setAttribute("java.lang:type=Memory", "Verbose", true, function() {
    console.log("Memory verbose on"); // callback is optional
  });
  client.invoke("java.lang:type=Memory", "gc", [], function(data) {
    console.log("gc() done");
  });
});
• 40. JMX + Node.js jolokia npm https://www.npmjs.com/package/jolokia Can't vouch for this one either, but as above, Node.js is a great way to serve JavaScript directly into a GUI, and Meteor would make a great C* ops GUI. https://www.meteor.com/ This code was taken from the following link for educational and training purposes and all copyright belongs to the respective owners: https://www.npmjs.com/package/jolokia

// In Node.js or using Browserify
var Jolokia = require('jolokia');
// In browser
var Jolokia = window.Jolokia;
// Or using RequireJS
require(['./path/to/jolokia'], function(Jolokia) {
  // code below
});

var jolokia = new Jolokia({
  url: '/jmx',      // use full url when in Node.js environment
  method: 'post',   // force specific HTTP method
});

jolokia.list().then(function(value) {
  // do something with list of JMX domains
}, function(error) {
  // handle error
});