Discussion about the evolution of metrics in Cassandra from 1.0 to 3.0, how the metric changes impact operational tooling, the pros and cons of different metric representations, and how and why DataStax OpsCenter collects and stores metrics. Includes a deep dive on how DataStax OpsCenter represents and stores the different kinds of metrics to provide visibility beyond simple cluster averages, both behind the scenes and in the rendering.
About the Speaker
Chris Lohfink Software Engineer, DataStax
I am a Java, Python, and Clojure developer who has been using Cassandra in an application development and operational context for the last five years. For nearly the last two years I have been working with the OpsCenter Monitoring team at DataStax to improve the accuracy and breadth of the visualization tooling available.
43. Metrics Reservoirs
• Random sampling: what can it miss?
– Min
– Max
– Everything in the 99th percentile?
– The rarer the value, the less likely it is to be included (see the sketch below)
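A quick sketch of the problem, using the UniformReservoir from the Dropwizard Metrics library (random sampling via Vitter's Algorithm R): one extreme outlier among a million updates is very unlikely to survive in a 1028-entry sample, so the reported max is usually wrong.

import com.codahale.metrics.UniformReservoir;

public class ReservoirDemo {
    public static void main(String[] args) {
        UniformReservoir reservoir = new UniformReservoir(); // 1028 samples by default
        for (int i = 0; i < 1_000_000; i++) {
            reservoir.update(10); // typical latency
        }
        reservoir.update(5_000); // one rare, very slow request
        // The true max is 5000, but the sampled max will almost always be 10.
        System.out.println("Sampled max: " + reservoir.getSnapshot().getMax());
    }
}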
45. Metrics Reservoirs
• “Good enough” for basic ad hoc viewing, but too non-deterministic for many use cases
• Commonly resolved using replacement reservoirs (e.g., HdrHistogram); see the sketch below
– org.apache.cassandra.metrics.EstimatedHistogramReservoir
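A minimal sketch of how a replacement reservoir is plugged in, assuming the Dropwizard Metrics 3.x API; the EstimatedHistogramReservoir constructor argument shown is an assumption about Cassandra's internals, not a documented public API.

import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;
import org.apache.cassandra.metrics.EstimatedHistogramReservoir;

public class ReservoirSwap {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        // Anything implementing com.codahale.metrics.Reservoir can back a
        // Histogram, so the lossy sampling reservoir can be swapped wholesale.
        Histogram latencies = new Histogram(new EstimatedHistogramReservoir(false)); // assumed ctor
        registry.register("ReadLatency", latencies);
        latencies.update(42);
    }
}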
46. Cassandra 2.2
• CASSANDRA-5657 – upgrade metrics library (and extend it)
– Replaced the reservoir with EH (EstimatedHistogram)
• Also exposed the raw bin counts via the values operation
– Deleted deprecated metrics
• Non-EH latencies from LatencyTracker
47. Cassandra 2.2
• No recency in histograms
• Requires delta'ing the cumulative bin counts, which is beyond some simple tooling (see the sketch below)
• CASSANDRA-11752 (fixed in 2.2.8, 3.0.9, 3.8)
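A minimal sketch of the delta'ing described above: because the exposed bin counts are cumulative totals, recent activity is the difference between two snapshots taken an interval apart (how the snapshots are fetched, e.g. over JMX, is left out here).

public class BinDeltas {
    /** Subtract the previous cumulative snapshot from the current one. */
    public static long[] delta(long[] previous, long[] current) {
        long[] recent = new long[current.length];
        for (int i = 0; i < current.length; i++) {
            recent[i] = current[i] - previous[i];
        }
        return recent;
    }
}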
49. Storing the data
• We have data; now we need to store it. Approaches tend to follow:
– Store all data points
• Provide aggregations either pre-computed on ingest, via MapReduce, or at query time
– Round Robin Database
• Only store pre-computed aggregations
• The choice depends heavily on requirements
51. Round Robin Database
• Store the state required to generate the aggregations, and only store the aggregations
– Sum & Count for Average
– Current min, max
– “One pass” or “online” algorithms (see the sketch after the table below)
• Constant footprint
        60s  300s  3600s
Sum       0     0      0
Count     0     0      0
Min       0     0      0
Max       0     0      0
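A minimal sketch of the online state for one rollup window, following the Sum/Count/Min/Max scheme on the slide; a real RRD keeps one such state per window size (60s, 300s, 3600s), and this is only an illustration, not OpsCenter's implementation.

public class RollupWindow {
    private long sum = 0, count = 0;
    private long min = Long.MAX_VALUE, max = Long.MIN_VALUE;

    /** One-pass update: constant footprint no matter how many values arrive. */
    public void update(long value) {
        sum += value;
        count++;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    /** Emit {avg, min, max} for the closed interval and reset the state. */
    public long[] rollup() {
        long avg = count == 0 ? 0 : sum / count;
        long[] result = {avg, min, max};
        sum = 0;
        count = 0;
        min = Long.MAX_VALUE;
        max = Long.MIN_VALUE;
        return result;
    }
}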
52. Round Robin Database
> 10ms @ 00:00
        60s  300s  3600s
Sum      10    10     10
Count     1     1      1
Min      10    10     10
Max      10    10     10
53. Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30
        60s  300s  3600s
Sum      22    22     22
Count     2     2      2
Min      10    10     10
Max      12    12     12
54. Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30
> 14ms @ 00:59
        60s  300s  3600s
Sum      36    36     36
Count     3     3      3
Min      10    10     10
Max      14    14     14
56. Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30
> 14ms @ 00:59
> 13ms @ 01:10
        60s  300s  3600s
Sum      36    36     36
Count     3     3      3
Min      10    10     10
Max      14    14     14

Rollup written for the closed 60s window: Average 12, Min 10, Max 14
57. Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30
> 14ms @ 00:59
> 13ms @ 01:10
        60s  300s  3600s
Sum       0    36     36
Count     0     3      3
Min       0    10     10
Max       0    14     14
58. Round Robin Database
> 10ms @ 00:00
> 12ms @ 00:30
> 14ms @ 00:59
> 13ms @ 01:10
        60s  300s  3600s
Sum      13    49     49
Count     1     4      4
Min      13    10     10
Max      13    14     14
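Replaying the walkthrough above with the RollupWindow sketch from slide 51 (hypothetical code, shown only to tie the steps together):

public class WalkthroughDemo {
    public static void main(String[] args) {
        RollupWindow oneMinute = new RollupWindow();
        oneMinute.update(10); // 10ms @ 00:00
        oneMinute.update(12); // 12ms @ 00:30
        oneMinute.update(14); // 14ms @ 00:59
        long[] closed = oneMinute.rollup(); // window closes: {12, 10, 14}
        System.out.println("avg=" + closed[0] + " min=" + closed[1] + " max=" + closed[2]);
        oneMinute.update(13); // 13ms @ 01:10 lands in the fresh window
    }
}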
59. Max is a lie
• The issue with the deprecated LatencyTracker metrics is that the 1-minute interval does not include a min/max, so we cannot compute a true min/max: the rollups' min/max will be the minimum and maximum average
60. Histograms to the rescue (again)
• The histograms of the data do not have this issue, but storage is more complex. Some options include:
– Store each bin of the histogram as a metric
– Store the percentiles/min/max each as its own metric
– Store the raw long[90] (possibly compressed)
61. Histogram Storage Size
• Some things to note:
– “Normal” clusters have over 100 tables
– Each table has six histograms we want to record:
• Read latency
• Write latency
• Tombstones scanned
• Cells scanned
• Partition cell size
• Partition cell count
62. Histogram Storage
Because we store the extra histograms, we have a minimum of 600 histograms per minute, with upper bounds seen at over 24,000.
• Storing one metric per bin means 54,000 metrics (expensive to store, expensive to read)
• Storing raw histograms is 600 metrics
• Storing min, max, 50th, 90th, and 99th is 3,000 metrics
– Additional problems with this:
• Can't compute the 10th, 95th, 99.99th, etc.
• Aggregations
67. Raw Histogram Storage
• Storing raw histograms (160 longs by default) takes a minimum of ~1.2KB per rollup, a hard sell (see the arithmetic sketch below):
– ~768KB per minute (600 histograms)
– ~7.7GB for the 7-day TTL we want to keep our 1-minute rollups at
– ~77GB with 10 nodes
– ~2.3TB on a 10-node cluster with 3k tables
– Expired data isn't immediately purged, so disk usage can be much worse
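Working through the arithmetic above as a sanity check (rounded figures; not OpsCenter code):

public class StorageMath {
    public static void main(String[] args) {
        long perHistogram = 160 * 8L;            // 160 longs -> 1,280 bytes (~1.2KB)
        long perMinute = perHistogram * 600;     // 600 histograms -> ~768KB per minute
        long perWeek = perMinute * 60 * 24 * 7;  // 7-day TTL -> ~7.7GB per node
        long tenNodes = perWeek * 10;            // ~77GB on a 10-node cluster
        System.out.printf("%,d B/histogram, %,d B/minute, %,d B/week, %,d B total%n",
                perHistogram, perMinute, perWeek, tenNodes);
    }
}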
68. Raw Histogram Storage
• Goal: we wanted this to be comparable to the other min/max/avg metric storage (12 bytes each)
– 700MB on the expected 10-node cluster
– 2GB on the extreme 10-node cluster
• Enter compression
69. Compressing Histograms
• The overhead of typical compression makes it a non-starter.
– Headers alone (e.g., 10 bytes for gzip) nearly exceed the length used by the existing rollup storage (~12 bytes per metric)
• Instead we opt to leverage known context to reduce the size of the data, along with some universal encoding.
70. Compressing Histograms
• Instead of storing every bin, only store the value of each bin with a value > 0, since most bins will have no data (e.g., it is very unlikely for a read latency to land between 1 and 10 microseconds, which covers the first 10 bins)
• Write the count of offset/count pairs
• Use a varint for each bin count
– To keep the varints as small as possible, we sort the offset/count pairs by the count and represent the counts as a delta sequence (see the sketch below)
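A minimal sketch of the sparse encoding described above; the LEB128-style varint and the choice to varint-encode the offsets as well are assumptions for illustration, not OpsCenter's actual wire format.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SparseHistogramCodec {

    /** Encode only the non-empty bins of a histogram as (offset, count) pairs. */
    public static byte[] encode(long[] bins) throws IOException {
        // Collect the offsets of bins that actually hold data.
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < bins.length; i++) {
            if (bins[i] > 0) offsets.add(i);
        }
        // Sort the pairs by count so the delta sequence of counts stays small.
        offsets.sort((a, b) -> Long.compare(bins[a], bins[b]));

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeVarInt(out, offsets.size());                   // number of offset/count pairs
        long previousCount = 0;
        for (int offset : offsets) {
            writeVarInt(out, offset);                       // bin offset
            writeVarInt(out, bins[offset] - previousCount); // delta-encoded count
            previousCount = bins[offset];
        }
        return bytes.toByteArray();
    }

    /** Unsigned LEB128-style varint: 7 bits per byte, high bit = continuation. */
    static void writeVarInt(DataOutputStream out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.writeByte((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.writeByte((int) value);
    }
}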
82. Note on HdrHistogram
• Comes up every couple of months
• Very awesome histogram, and a popular replacement for the Metrics reservoir
– More powerful and general-purpose than EH
– Only slightly slower for all it offers
• An issue comes up with storage:
– Logged HdrHistograms are ~31KB each (30,000x more than our average use)
– Compressed version: ~1KB each
– Perfect for many, many people when tracking one or two metrics; it gets painful when tracking hundreds or thousands (see the sketch below)
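A small sketch using the org.HdrHistogram library to record some values and measure the compressed footprint discussed above; the exact size will vary with the configured value range, precision, and data.

import java.nio.ByteBuffer;
import org.HdrHistogram.Histogram;

public class HdrSizeDemo {
    public static void main(String[] args) {
        // Track values from 1 up to one hour in microseconds, at 3 significant digits.
        Histogram histogram = new Histogram(3_600_000_000L, 3);
        for (long i = 0; i < 10_000; i++) {
            histogram.recordValue(1_000 + (i % 500)); // some sample latencies
        }
        ByteBuffer buffer = ByteBuffer.allocate(histogram.getNeededByteBufferCapacity());
        int compressedBytes = histogram.encodeIntoCompressedByteBuffer(buffer);
        System.out.println("Compressed size: " + compressedBytes + " bytes");
    }
}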