Understanding AntiEntropy in Cassandra

When Bad Things Happen to Good Data
Understanding Anti-Entropy in Cassandra
#cassandra13
Jason Brown
@jasobrown jasedbrown@gmail.com

About me
• Senior Software Engineer, Netflix
• Apache Cassandra committer
• E-commerce Architect, Major League Baseball Advanced Media
• Wireless Developer (J2ME and BREW)

Maintaining consistent state is hard in a distributed system
The CAP theorem is working against you

Inconsistencies creep in
• Node is down
• Network partition
• Dropped mutations
• Process crash before flush
• File corruption

Anti-Entropy Overview
• Write time
  • Tunable consistency
  • Atomic batches
  • Hinted handoff
• Read time
  • Consistent reads
  • Read repair
• Maintenance time
  • Node repair

C* Write Basics
• Determine all replica nodes, in all DCs
• Send to all replicas in the local DC
• Send to one replica in each remote DC
  • It forwards the write to its peers in that DC
• All replicas respond back to the coordinator
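The fan-out above can be sketched in Python. This is a minimal illustration, not Cassandra's code: replicas are modeled as hypothetical `(endpoint, dc)` tuples, and `plan_write` is an invented helper that returns which endpoints the coordinator contacts directly and which remote-DC replica forwards to which peers.

```python
from collections import defaultdict


def plan_write(replicas, local_dc):
    """Decide which replicas a coordinator contacts directly for one write.

    replicas: list of (endpoint, dc) tuples (hypothetical representation).
    Returns (direct, forwards): `direct` lists endpoints the coordinator sends
    to itself; `forwards` maps each remote-DC forwarder to the peers it relays to.
    """
    by_dc = defaultdict(list)
    for endpoint, dc in replicas:
        by_dc[dc].append(endpoint)

    # Every replica in the local DC receives the write directly.
    direct = list(by_dc.get(local_dc, []))
    forwards = {}
    for dc, endpoints in by_dc.items():
        if dc == local_dc:
            continue
        # One replica per remote DC gets the write and forwards it to its peers,
        # so the mutation crosses the inter-DC link once per remote DC.
        forwarder, *peers = endpoints
        direct.append(forwarder)
        forwards[forwarder] = peers
    return direct, forwards


replicas = [("a1", "us-east"), ("a2", "us-east"), ("a3", "us-east"),
            ("b1", "eu-west"), ("b2", "eu-west"), ("b3", "eu-west")]
direct, forwards = plan_write(replicas, "us-east")
print(direct)    # ['a1', 'a2', 'a3', 'b1']
print(forwards)  # {'b1': ['b2', 'b3']}
```

With three replicas in each of two DCs, the coordinator sends four messages instead of six; b1 relays to b2 and b3, and all six replicas still acknowledge to the coordinator.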

Writes – request path

Writes – response path

Tunable consistency
The coordinator blocks until the specified number of replicas respond
Consistency levels:
• ANY
• ONE / TWO / THREE
• QUORUM
• LOCAL_QUORUM
• EACH_QUORUM
• ALL
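A rough way to see what "blocks for" means is to compute how many acknowledgements each level requires. The sketch assumes a per-DC replication-factor map (as with NetworkTopologyStrategy); `block_for` and its arguments are hypothetical names, and plain QUORUM is included alongside the DC-aware levels.

```python
def block_for(consistency_level, rf_by_dc, local_dc):
    """Number of replica acknowledgements the coordinator waits for.

    rf_by_dc maps datacenter name -> replication factor in that DC.
    EACH_QUORUM returns a per-DC dict, since a quorum is needed in every DC.
    """
    total_rf = sum(rf_by_dc.values())
    fixed = {"ANY": 1, "ONE": 1, "TWO": 2, "THREE": 3}
    if consistency_level in fixed:
        # ANY is special: a stored hint also counts as the one acknowledgement.
        return fixed[consistency_level]
    if consistency_level == "QUORUM":
        return total_rf // 2 + 1  # majority of all replicas, in any DC
    if consistency_level == "LOCAL_QUORUM":
        return rf_by_dc[local_dc] // 2 + 1  # majority within the coordinator's DC
    if consistency_level == "EACH_QUORUM":
        return {dc: rf // 2 + 1 for dc, rf in rf_by_dc.items()}
    if consistency_level == "ALL":
        return total_rf
    raise ValueError(f"unknown consistency level: {consistency_level}")


rf = {"us-east": 3, "eu-west": 3}
print(block_for("QUORUM", rf, "us-east"))        # 4
print(block_for("LOCAL_QUORUM", rf, "us-east"))  # 2
```

With three replicas in each of two DCs, QUORUM needs 4 acknowledgements from anywhere, LOCAL_QUORUM needs 2 in the coordinator's DC, and EACH_QUORUM needs 2 in every DC.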

Hinted Handoff
Save a copy of the write for down nodes, and replay it later
Hint = target replica ID + mutation data

Hinted Handoff - storing
• On the coordinator, store a hint for replicas that are not up
• Also, if a replica doesn’t respond within write_request_timeout_in_ms, store a hint
• max_hint_window_in_ms – maximum time a node will keep creating hints for a dead node
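The storing rules can be written as a small decision function. This is a sketch under stated assumptions: `Hint` mirrors the "target replica ID + mutation data" definition, while `maybe_hint` and its inputs (liveness, whether the replica acked within `write_request_timeout_in_ms`, how long it has been down) are hypothetical names, not Cassandra's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    target_id: str   # target replica ID
    mutation: bytes  # serialized mutation data


def maybe_hint(target_id, mutation, is_alive, acked, down_for_ms, max_hint_window_ms):
    """Return a Hint if the coordinator should save one for this replica, else None.

    is_alive: whether the replica is currently considered up.
    acked: whether it acknowledged within write_request_timeout_in_ms.
    down_for_ms: how long it has been down (0 if up).
    """
    if is_alive and acked:
        return None  # the write landed; nothing to replay
    if not is_alive and down_for_ms > max_hint_window_ms:
        return None  # outside the hint window; only repair can fix this replica now
    # Down within the window, or up but timed out: keep a copy for later replay.
    return Hint(target_id, mutation)
```

Once a node has been down longer than the window, hints stop accumulating, which is one reason node repair is still needed.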

Hinted Handoff - replay
• Try to send stored hints to their target nodes
• Runs every ten minutes
• Multithreaded (c* 1.2)
• Throttleable (KB per second)
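Replay can be sketched as a loop that skips dead targets, drops hints once delivered, and sleeps between sends to respect a KB-per-second throttle. The `is_alive` and `send` callbacks are hypothetical, and the single-threaded loop is a simplification; c* 1.2 delivers to several targets in parallel.

```python
import time
from collections import namedtuple

Hint = namedtuple("Hint", ["target_id", "mutation"])  # mutation: serialized bytes


def replay_hints(hints, is_alive, send, throttle_kb_per_sec, sleep=time.sleep):
    """Try to deliver stored hints; return the ones that must be kept for the next run.

    is_alive(target_id) -> bool and send(hint) -> bool are hypothetical callbacks.
    A throttle of 0 disables throttling in this sketch.
    """
    remaining = []
    bytes_per_sec = throttle_kb_per_sec * 1024
    for hint in hints:
        if not is_alive(hint.target_id) or not send(hint):
            remaining.append(hint)  # target down or delivery failed: retry later
            continue
        # Delivered, so the hint is dropped; pause to stay under the throttle.
        if bytes_per_sec > 0:
            sleep(len(hint.mutation) / bytes_per_sec)
    return remaining
```

Injecting `sleep` keeps the throttle testable without real waiting.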

Hinted Handoff – down node

Hinted Handoff – replay

What if coordinator dies?

Atomic Batches
• Coordinator stores the incoming mutation on two peers in the same DC
• Deletes the batch from those peers on successful completion
• Peers will replay the batch if it is not deleted
• Replay runs every 60 seconds
• With c* 1.2, all mutations use atomic batches
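The batchlog flow, including what happens when the coordinator dies before cleaning up, can be modeled in memory. `BatchlogPeer`, `write_atomic_batch`, and the `timeout_s` replay threshold are hypothetical; in C* the replay check is the periodic (60-second) task described above.

```python
import uuid


class BatchlogPeer:
    """In-memory stand-in for a peer node's batchlog (hypothetical)."""

    def __init__(self):
        self.entries = {}  # batch_id -> (written_at, mutations)

    def store(self, batch_id, mutations, now):
        self.entries[batch_id] = (now, list(mutations))

    def delete(self, batch_id):
        self.entries.pop(batch_id, None)

    def replay_due(self, apply, now, timeout_s):
        """Replay, then drop, batches whose coordinator never deleted them."""
        for batch_id, (written_at, mutations) in list(self.entries.items()):
            if now - written_at >= timeout_s:
                for m in mutations:
                    apply(m)
                del self.entries[batch_id]


def write_atomic_batch(mutations, same_dc_peers, apply, now):
    batch_id = uuid.uuid4()
    # 1. Persist the whole batch on two peers in the same DC before applying it.
    for peer in same_dc_peers[:2]:
        peer.store(batch_id, mutations, now)
    # 2. Apply each mutation to its replicas.
    for m in mutations:
        apply(m)
    # 3. Success: delete the batch from the peers so it is never replayed.
    for peer in same_dc_peers[:2]:
        peer.delete(batch_id)
    return batch_id
```

If the coordinator dies after step 1 but before step 3, both peers may replay the batch. Applying a C* mutation twice is harmless because each column carries its own write timestamp.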

Cassandra reads - setup
• Determine replicas to invoke
  • based on consistency level vs. read repair
• The first (data) node responds with the full data set; the others send digests
• Coordinator waits for as many nodes as the consistency level requires to respond
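The data-versus-digest split can be sketched as follows, assuming the replica list is already sorted closest-first (in C* a snitch provides that ordering). `plan_read` is a hypothetical helper: without read repair only as many replicas as the consistency level needs are contacted; with read repair every replica is asked, but the coordinator still waits only for the consistency-level count.

```python
def plan_read(replicas_by_proximity, block_for, read_repair):
    """Decide which replicas get a data request vs. a digest request.

    replicas_by_proximity: replica names sorted closest-first (assumed).
    block_for: responses required by the consistency level.
    read_repair: whether this read also triggers read repair.
    Returns (data_node, digest_nodes, wait_for).
    """
    if read_repair:
        contacted = list(replicas_by_proximity)  # every replica is asked
    else:
        contacted = list(replicas_by_proximity[:block_for])  # only as many as needed
    # The closest replica returns the full data set; the rest return digests.
    data_node, *digest_nodes = contacted
    return data_node, digest_nodes, block_for
```

Sending full data from only one replica keeps the read cheap; the digests are enough to detect disagreement.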

LOCAL_QUORUM read

Consistent reads
• Compare digests
• If any mismatches
  • re-request the full data set from the same nodes
  • compare full data sets, send updates
  • block until out-of-date replicas respond successfully
• Return merged data set to client
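Digest comparison and merging can be sketched with rows modeled as `{column: (value, timestamp)}` dicts. Assumptions: MD5 over an ad hoc encoding stands in for the digest, full rows from every replica are already in hand (i.e. after the re-request step), and conflicts resolve last-write-wins by timestamp. `digest` and `reconcile` are hypothetical names.

```python
import hashlib


def digest(row):
    """MD5 over a canonical encoding of {column: (value, timestamp)} (encoding assumed)."""
    h = hashlib.md5()
    for col in sorted(row):
        value, ts = row[col]
        h.update(f"{col}={value}@{ts};".encode())
    return h.hexdigest()


def reconcile(responses):
    """responses: {replica: row}. Returns (merged_row, {replica: columns_to_send})."""
    digests = {replica: digest(row) for replica, row in responses.items()}
    if len(set(digests.values())) == 1:
        # All digests match: any copy is correct and nothing needs repair.
        return next(iter(responses.values())), {}
    merged = {}
    for row in responses.values():
        for col, (value, ts) in row.items():
            # Last write wins: keep the version with the highest timestamp.
            if col not in merged or ts > merged[col][1]:
                merged[col] = (value, ts)
    repairs = {}
    for replica, row in responses.items():
        stale = {c: v for c, v in merged.items() if row.get(c) != v}
        if stale:
            repairs[replica] = stale  # only the out-of-date columns are sent
    return merged, repairs


responses = {
    "r1": {"name": ("alice", 10), "city": ("nyc", 10)},
    "r2": {"name": ("alice", 10), "city": ("sf", 20)},
}
merged, repairs = reconcile(responses)
print(repairs)  # {'r1': {'city': ('sf', 20)}}
```

Here r1 holds a stale `city`, so it is the only replica that receives an update.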

Read repair
• Synchronizes the client-requested data amongst all replicas
• Piggy-backs on normal reads, but waits for all replicas to respond (asynchronously)
• Compares the digests and follows the same algorithm as a consistent read

Read Repair
Green lines = LOCAL_QUORUM nodes
Blue lines = nodes for read repair

Read repair configuration
• Set per column family
• Chance (percentage) of all reads to the CF that trigger read repair
• Local DC (dclocal_read_repair_chance) vs. global (read_repair_chance)
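Per-read selection between global and local-DC read repair can be sketched with a single random draw, assuming the column-family properties `read_repair_chance` and `dclocal_read_repair_chance` are expressed as fractions between 0 and 1. The function name and return strings are hypothetical.

```python
import random


def read_repair_decision(read_repair_chance, dclocal_read_repair_chance, rng=random.random):
    """Pick the read-repair scope for one read: 'GLOBAL', 'DC_LOCAL', or 'NONE'.

    One random draw is compared against the global chance first,
    then against the local-DC chance.
    """
    chance = rng()
    if read_repair_chance > chance:
        return "GLOBAL"    # repair across replicas in every DC
    if dclocal_read_repair_chance > chance:
        return "DC_LOCAL"  # repair only replicas in the coordinator's DC
    return "NONE"          # plain read at the requested consistency level
```

Checking the global chance first means a read that qualifies for both gets the wider, global repair.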

Read repair fixes data that is actually requested,
…but what about data that isn’t requested?

Node repair - introduction
• Repairs inconsistencies across all replicas for a given range
• nodetool repair
  • repairs the ranges the node contains
  • one or more column families (within the same keyspace)
  • can choose local datacenter only (c* 1.2)

Node Repair - cautions
• Should be part of standard c* operations
  • Especially if you delete data: repair every node within gc_grace_seconds, so tombstones reach all replicas before they are purged
• Repair is IO and CPU intensive

Node Repair – details, 1
• Determine peer nodes with matching ranges
• Triggers a major (validation) compaction on peer nodes
  • read and generate a hash for every row in the CF
  • add result to a Merkle tree
  • return tree to initiator
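A validation compaction's Merkle tree can be illustrated over a toy token range. This is a simplified sketch: tokens are small integers, SHA-256 stands in for whatever hash C* actually uses, and `build_merkle` returns nested tuples `(hash, lo, hi, left, right)`. None of these names come from Cassandra.

```python
import hashlib


def row_hash(token, value):
    """Hash of one row (the encoding here is an assumption)."""
    return hashlib.sha256(f"{token}:{value}".encode()).digest()


def build_merkle(rows, lo, hi, depth):
    """Build a Merkle tree over integer tokens in [lo, hi).

    rows: {token: value}. Each leaf hashes the rows whose token falls in its
    sub-range; each inner node hashes the concatenation of its children's hashes.
    Leaves have left = right = None.
    """
    if depth == 0:
        h = hashlib.sha256()
        for token in sorted(t for t in rows if lo <= t < hi):
            h.update(row_hash(token, rows[token]))
        return (h.digest(), lo, hi, None, None)
    mid = (lo + hi) // 2
    left = build_merkle(rows, lo, mid, depth - 1)
    right = build_merkle(rows, mid, hi, depth - 1)
    return (hashlib.sha256(left[0] + right[0]).digest(), lo, hi, left, right)


rows_a = {5: "v1", 30: "v2", 70: "v3"}
rows_b = {5: "v1", 30: "v2", 70: "CHANGED"}
tree_a = build_merkle(rows_a, 0, 100, depth=2)
tree_b = build_merkle(rows_b, 0, 100, depth=2)
```

Only the path from the leaf covering token 70 up to the root changes, so the left half `[0, 50)` still hashes identically on both replicas.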

Node Repair – details, 2
• Initiator awaits trees from participating nodes
• Compares every tree to every other tree
• If any differences are detected, the differing nodes exchange the conflicting range(s)
• Streamed data is written out as new, local SSTables
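The initiator's comparison step can be sketched on trees represented as nested tuples `(hash, lo, hi, left, right)`, with `None` children at the leaves. Matching hashes let whole subtrees be skipped, and every pair of trees is compared. `diff_ranges` and `plan_streaming` are hypothetical helpers, and the example trees use short placeholder strings as hashes.

```python
from itertools import combinations


def diff_ranges(a, b):
    """Leaf ranges (lo, hi) whose hashes differ between two same-shaped trees."""
    if a[0] == b[0]:
        return []  # identical subtree: nothing below needs checking
    if a[3] is None:
        return [(a[1], a[2])]  # differing leaf: this whole range must be exchanged
    return diff_ranges(a[3], b[3]) + diff_ranges(a[4], b[4])


def plan_streaming(trees):
    """trees: {node: tree}. Compare every tree to every other tree and return
    {(node_a, node_b): [ranges to exchange]} for the pairs that differ."""
    plan = {}
    for (na, ta), (nb, tb) in combinations(trees.items(), 2):
        ranges = diff_ranges(ta, tb)
        if ranges:
            plan[(na, nb)] = ranges
    return plan


def leaf(h, lo, hi):
    return (h, lo, hi, None, None)


t_a = ("root1", 0, 100, leaf("x", 0, 50), leaf("y", 50, 100))
t_b = ("root2", 0, 100, leaf("x", 0, 50), leaf("z", 50, 100))
t_c = ("root1", 0, 100, leaf("x", 0, 50), leaf("y", 50, 100))
print(plan_streaming({"A": t_a, "B": t_b, "C": t_c}))
# {('A', 'B'): [(50, 100)], ('B', 'C'): [(50, 100)]}
```

Only range `[50, 100)` is exchanged, and only between B and its peers; A and C agree, so they skip each other entirely.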

Read Repair – example

Anti-Entropy – Wrap Up
• The CAP theorem lives: tradeoffs must be understood and made
• C* contains processes to make diverging data sets consistent
• Tunable controls exist at write and read time, as well as on demand

Thank you!
Q & A time
@jasobrown
