- Repair is a maintenance operation that restores consistency in Cassandra by comparing and syncing data across nodes. It is needed due to eventual consistency and to ensure safe deletes.
- Traditional full repair reads and compares all data partitions, while incremental repair only repairs data that has changed since the last repair.
- Automated repair tools like Spotify's Cassandra Reaper help orchestrate repairs across large clusters to limit their impact on performance and availability. Future improvements may further reduce the need to manually manage repairs.
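The two modes differ mainly in which data a repair session reads. Below is a minimal sketch built on a simplified model where each SSTable carries a `repaired_at` marker. Cassandra keeps a comparable `repairedAt` value in SSTable metadata, with 0 meaning unrepaired. `SSTable` and `sstables_to_validate` are hypothetical names. Real incremental repair also anticompacts and marks data as repaired after streaming, and the sketch leaves that out.

```python
from dataclasses import dataclass

# Hypothetical simplified model: 0 means "never marked as repaired",
# mirroring the repairedAt value Cassandra keeps in SSTable metadata.
UNREPAIRED = 0


@dataclass
class SSTable:
    name: str
    repaired_at: int  # time the SSTable was marked repaired, or UNREPAIRED


def sstables_to_validate(sstables, incremental):
    """Return the SSTables a repair session reads to build its Merkle trees."""
    if not incremental:
        # Full repair: read and compare every SSTable, repaired or not.
        return list(sstables)
    # Incremental repair: skip data already marked as repaired.
    return [s for s in sstables if s.repaired_at == UNREPAIRED]


tables = [SSTable("a", UNREPAIRED), SSTable("b", 1474000000000), SSTable("c", UNREPAIRED)]
print([s.name for s in sstables_to_validate(tables, incremental=False)])  # ['a', 'b', 'c']
print([s.name for s in sstables_to_validate(tables, incremental=True)])   # ['a', 'c']
```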
2. CASSANDRA SUMMIT - SEPTEMBER 2016
Alexander Dejanovski
@alexanderdeja
Consultant
www.thelastpickle.com
DataStax MVP for Apache Cassandra
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
3. About The Last Pickle
We help people deliver and improve Apache
Cassandra based solutions.
With staff in 5 countries and over 50 years
of combined experience in Apache Cassandra.
6. What and why?
Full repair
Incremental repair
How to make it work
Automated repairs
7. What is repair?
A maintenance operation that (briefly)
restores strong consistency throughout the
cluster
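Repair compares replicas by building Merkle trees (hash trees) over token ranges, as the `Validator.java` error later in this deck shows. It then streams only the ranges whose hashes differ. The toy sketch below illustrates the idea. SHA-256 leaves, a fixed list of ranges and the function names are all assumptions. Cassandra's own trees differ in hashing and depth.

```python
import hashlib


def leaf_hashes(partitions, ranges):
    """Hash the partitions of one replica that fall into each (start, end] token range."""
    leaves = []
    for start, end in ranges:
        h = hashlib.sha256()
        for token, value in sorted(partitions.items()):
            if start < token <= end:
                h.update(f"{token}:{value}".encode())
        leaves.append(h.digest())
    return leaves


def build_tree(leaves):
    """Build the tree bottom-up: levels[0] are the leaves, levels[-1][0] is the root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hashlib.sha256(b"".join(prev[i:i + 2])).digest()
                       for i in range(0, len(prev), 2)])
    return levels


def mismatched_ranges(tree_a, tree_b, ranges):
    """Descend from the root only into subtrees whose hashes differ."""
    out = []

    def walk(level, idx):
        if tree_a[level][idx] == tree_b[level][idx]:
            return  # identical subtree: nothing to stream
        if level == 0:
            out.append(ranges[idx])
            return
        for child in (2 * idx, 2 * idx + 1):
            if child < len(tree_a[level - 1]):
                walk(level - 1, child)

    walk(len(tree_a) - 1, 0)
    return out


ranges = [(0, 25), (25, 50), (50, 75), (75, 100)]
replica_a = {10: "x", 30: "y", 60: "z"}
replica_b = {10: "x", 30: "stale", 60: "z"}  # token 30 is out of date on this replica
tree_a = build_tree(leaf_hashes(replica_a, ranges))
tree_b = build_tree(leaf_hashes(replica_b, ranges))
print(mismatched_ranges(tree_a, tree_b, ranges))  # [(25, 50)]
```

Only the one leaf range holding the stale partition is reported, so only that range would need streaming between the two replicas.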
8. Why do we need repair?
- Eventual consistency
- Downtime / failure recovery
- Safe deletes
9. Tombstones need repair too
Missing tombstones can lead to zombie data
(repair every token range at least once within gc_grace_seconds)
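The following sketch gives a quick way to sanity-check a repair schedule against that rule. It assumes the worst case described in the docstring: one repair cycle starts every `repair_interval` and lasts `repair_duration`. `repair_fits_gc_grace` is a hypothetical helper. The 864000-second default only applies to tables that do not override `gc_grace_seconds`.

```python
from datetime import timedelta

# Cassandra's default gc_grace_seconds table option: 864000 s = 10 days.
DEFAULT_GC_GRACE_SECONDS = 864_000


def repair_fits_gc_grace(repair_interval, repair_duration,
                         gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Check that no token range can go unrepaired for gc_grace_seconds or longer.

    Assumed worst case: a range is repaired at the very start of one repair cycle
    and at the very end of the next one, so the gap is interval + duration.
    """
    worst_gap = repair_interval + repair_duration
    return worst_gap.total_seconds() < gc_grace_seconds


# Weekly repairs that take 2 days: worst gap of 9 days, within the 10-day default.
print(repair_fits_gc_grace(timedelta(days=7), timedelta(days=2)))  # True
# Weekly repairs that take 4 days: worst gap of 11 days, tombstones may be purged first.
print(repair_fits_gc_grace(timedelta(days=7), timedelta(days=4)))  # False
```

As nodes grow denser and repair gets slower, `repair_duration` is the term that silently eats the margin.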
33. The early days of your cluster
Node density is low,
repair works just fine
however you run it.
34. The early days of your cluster
So maybe, like I did,
you run "nodetool repair"
on all nodes… at the same
time
35. The (not so) early days of your cluster
As nodes get higher
in density, repair takes
longer… and longer…
36. The (not so) early days of your cluster
… and latencies rise
as repair is a CPU- and
I/O-intensive operation
37. Your cluster is a grown-up now
… until it breaks your
cluster
39. How can it break?
Load gets too high
You don’t meet your latency SLA anymore
42. How can it break?
Load gets too high
Streams get stuck
and out of nowhere, all nodes start to eat
all your CPU doing nothing
43. The fun part?
You need to run repair
to recover from
the repair outage !
44. The cluster keeps growing
And you realize orchestration is needed
to stop blowing up your cluster
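Orchestration comes down to two things: repair small token ranges, and never put the same node under two repairs at once. Reaper splits each repair into token-range segments. The sketch below is not its code. It is a minimal illustration that assumes the Murmur3Partitioner token bounds, a single token per node (no vnodes), and a hypothetical replica placement passed in by the caller.

```python
MIN_TOKEN = -2**63      # lowest Murmur3Partitioner token
MAX_TOKEN = 2**63 - 1   # highest Murmur3Partitioner token


def split_ring(n_segments):
    """Split the whole token ring into n contiguous (start, end] segments."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // n_segments for i in range(n_segments + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def schedule_rounds(segments_with_replicas):
    """Greedily pack segments into rounds so that no node repairs two segments at once."""
    rounds = []
    for segment, replicas in segments_with_replicas:
        for rnd in rounds:
            busy = set().union(*(nodes for _, nodes in rnd))
            if busy.isdisjoint(replicas):
                rnd.append((segment, replicas))
                break
        else:  # conflicts with every existing round: open a new one
            rounds.append([(segment, replicas)])
    return rounds


# Hypothetical 6-node cluster, RF=3, no vnodes: segment i lives on nodes i, i+1, i+2.
segments = split_ring(6)
placement = [(seg, {i % 6, (i + 1) % 6, (i + 2) % 6}) for i, seg in enumerate(segments)]
for n, rnd in enumerate(schedule_rounds(placement)):
    print(f"round {n}: {[sorted(nodes) for _, nodes in rnd]}")  # 3 rounds, 2 segments each
```

Compared with "nodetool repair on all nodes at the same time", each node here takes part in at most one repair per round, which bounds the extra CPU and I/O load.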
62. Incremental repair caveats
Carefully prepare your switch to
incremental repair,
i.e. do not run "nodetool repair -inc"
straight away…
67. Incremental repair caveats
Validator.java:261 - Failed creating a merkle tree for [repair #e4c782d0-11fc-11e6-b616-51a3849870bb on table_v2/table_attributes, [(8835460833482333317,8838777311566358575], (-7300486781514672850,-7298192396576668423], (-959298474675167225,-959177964106074209]]], /10.10.10.33 (see log for details)
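When a validation fails like this, the message at least tells you which token ranges were affected. The small sketch below pulls the keyspace, table and ranges out of such a message, assuming it keeps the exact shape shown above. The regexes and `parse_failed_validation` are hypothetical and not part of Cassandra. The resulting ranges could then be fed to a subrange repair, for instance with `nodetool repair`'s `-st`/`-et` (start/end token) options.

```python
import re

# Pattern for the "(start,end]" token ranges printed in the repair error.
RANGE_RE = re.compile(r"\((-?\d+),(-?\d+)\]")
# Pattern for the "keyspace/table" the failed session was working on.
TABLE_RE = re.compile(r"\bon (\w+)/(\w+)")


def parse_failed_validation(message):
    """Return (keyspace, table, [(start, end), ...]) from a Validator error message."""
    table_match = TABLE_RE.search(message)
    keyspace, table = table_match.groups() if table_match else (None, None)
    ranges = [(int(start), int(end)) for start, end in RANGE_RE.findall(message)]
    return keyspace, table, ranges


log = ("Failed creating a merkle tree for [repair #e4c782d0-11fc-11e6-b616-51a3849870bb "
       "on table_v2/table_attributes, [(8835460833482333317,8838777311566358575], "
       "(-7300486781514672850,-7298192396576668423], "
       "(-959298474675167225,-959177964106074209]]], /10.10.10.33 (see log for details)")
keyspace, table, ranges = parse_failed_validation(log)
print(keyspace, table)  # table_v2 table_attributes
for start, end in ranges:
    print(start, end)
```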
72. Incremental repair caveats
Do not use -pr with incremental repair
Useless: incremental repair already repairs data only once
Expensive: anticompaction overhead
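Here is a toy model of that overhead on a 3-node cluster with RF=3, where every node holds all three token ranges in a single SSTable. It assumes that anticompaction rewrites an SSTable in full when the SSTable only partly overlaps the repaired ranges. A fully covered SSTable is just marked as repaired. The function names are hypothetical. Real SSTables are matched by their token bounds, not partition by partition.

```python
def anticompact(partitions, repaired_ranges):
    """Split one SSTable's partitions into (repaired, unrepaired) by token range."""
    def is_repaired(token):
        return any(start < token <= end for start, end in repaired_ranges)

    repaired = {t: v for t, v in partitions.items() if is_repaired(t)}
    unrepaired = {t: v for t, v in partitions.items() if not is_repaired(t)}
    return repaired, unrepaired


def partitions_rewritten(partitions, sessions):
    """Count partitions rewritten when each repair session anticompacts the SSTable.

    Assumed model: an SSTable partly covered by a session's ranges is rewritten in
    full (split in two); one fully covered is only marked as repaired, at no rewrite cost.
    """
    rewritten = 0
    unrepaired = dict(partitions)
    for ranges in sessions:
        repaired, remaining = anticompact(unrepaired, ranges)
        if repaired and remaining:  # partial overlap: read and rewrite everything
            rewritten += len(unrepaired)
        unrepaired = remaining
    return rewritten


# One SSTable holding all three ranges of a 3-node, RF=3 cluster.
sstable = {t: f"row{t}" for t in range(10, 100, 10)}  # tokens 10, 20, ..., 90
r1, r2, r3 = (0, 30), (30, 60), (60, 90)

# With -pr, each node repairs its primary range: three sessions, one range each.
print(partitions_rewritten(sstable, [[r1], [r2], [r3]]))  # 15
# Without -pr, a single session covers all three ranges at once.
print(partitions_rewritten(sstable, [[r1, r2, r3]]))  # 0
```

With -pr, this model rewrites the still-unrepaired data once per session, and that repeated rewriting is the anticompaction overhead the slide warns about.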
74. Incremental repair will not…
Prevent you from having to run full repair
75. Reaper does not support incremental repair
But this fork does:
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-that-works
76. Reaper does not support incremental repair
And this one embeds the modified UI:
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui
77. Reaper does not support incremental repair
Not enough time to write down those URLs?
github.com/adejanovski