1. When Bad Things Happen to Good Data
Understanding Anti-Entropy in Cassandra
#cassandra13
Jason Brown
@jasobrown jasedbrown@gmail.com
2. About me
• Senior Software Engineer, Netflix
• Apache Cassandra committer
• E-commerce Architect, Major League Baseball Advanced Media
• Wireless Developer (J2ME and BREW)
4. Inconsistencies creep in
• Node is down
• Network partition
• Dropped mutations
• Process crash before flush
• File corruption
7. C* Write Basics
• Determine all replica nodes, in all DCs
• Send to all replicas in the local DC
• Send to one replica in each remote DC
  • It forwards the write to the other replicas in its DC
• All replicas respond back to the coordinator
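The write fan-out above can be sketched as follows. This is a minimal model that assumes the coordinator already knows each replica's datacenter. `plan_write` and its tuple layout are hypothetical names, not Cassandra internals. Each message is a target node plus the list of DC peers that node should forward the write to.

```python
from collections import defaultdict


def plan_write(replicas, local_dc):
    """Sketch of the coordinator's write fan-out.

    replicas: list of (node, dc) pairs for every replica of the row, all DCs.
    Returns a list of (target, forward_to) messages the coordinator sends.
    """
    by_dc = defaultdict(list)
    for node, dc in replicas:
        by_dc[dc].append(node)

    messages = []
    # Local DC: send the write directly to every replica.
    for node in by_dc.get(local_dc, []):
        messages.append((node, []))
    # Remote DCs: send once per DC; that replica forwards to its DC peers.
    for dc, nodes in by_dc.items():
        if dc != local_dc:
            messages.append((nodes[0], nodes[1:]))
    return messages


replicas = [("a1", "dc1"), ("a2", "dc1"), ("a3", "dc1"),
            ("b1", "dc2"), ("b2", "dc2"), ("b3", "dc2")]
for target, forward_to in plan_write(replicas, "dc1"):
    print(target, "forwards to", forward_to)
# Every replica, local or forwarded-to, acknowledges the coordinator directly.
expected_acks = len(replicas)
```

Here `a1` through `a3` receive the write directly, and `b1` forwards it to `b2` and `b3`. The coordinator still expects an acknowledgement from all six replicas.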
10. Tunable consistency
The coordinator blocks until the specified number of replicas respond.
Consistency levels:
• ANY
• ONE / TWO / THREE
• QUORUM
• LOCAL_QUORUM
• EACH_QUORUM
• ALL
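The sketch below makes "blocks for a specified count" concrete. It shows a check a coordinator could run as acknowledgements arrive. The function names and the per-DC dictionaries are assumptions for illustration. The quorum arithmetic (a majority, `rf // 2 + 1`) is the standard definition. QUORUM counts replicas across all DCs, while LOCAL_QUORUM and EACH_QUORUM count within DCs.

```python
FIXED_COUNTS = {"ONE": 1, "TWO": 2, "THREE": 3}


def quorum(rf):
    """A majority of rf replicas."""
    return rf // 2 + 1


def is_satisfied(cl, acks_by_dc, rf_by_dc, local_dc, hint_stored=False):
    """Can the coordinator stop blocking and answer the client?

    acks_by_dc: replica acknowledgements received so far, per DC.
    rf_by_dc:   replication factor per DC.
    """
    total_acks = sum(acks_by_dc.values())
    total_rf = sum(rf_by_dc.values())
    if cl == "ANY":
        # Writes only: a stored hint counts, even if no replica acknowledged.
        return total_acks >= 1 or hint_stored
    if cl in FIXED_COUNTS:
        return total_acks >= FIXED_COUNTS[cl]
    if cl == "QUORUM":
        return total_acks >= quorum(total_rf)
    if cl == "LOCAL_QUORUM":
        return acks_by_dc.get(local_dc, 0) >= quorum(rf_by_dc[local_dc])
    if cl == "EACH_QUORUM":
        # A quorum is required in every DC, not just in total.
        return all(acks_by_dc.get(dc, 0) >= quorum(rf) for dc, rf in rf_by_dc.items())
    if cl == "ALL":
        return total_acks >= total_rf
    raise ValueError(f"unknown consistency level: {cl}")


rf = {"dc1": 3, "dc2": 3}
print(is_satisfied("LOCAL_QUORUM", {"dc1": 2}, rf, "dc1"))  # True
print(is_satisfied("EACH_QUORUM", {"dc1": 2}, rf, "dc1"))   # False
```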
11. Hinted Handoff
Save a copy of the write for down nodes and replay it later
Hint = target replica ID + mutation data
12. Hinted Handoff - storing
• On the coordinator, store a hint for each replica that is down
• Also, if a replica doesn't respond within write_request_timeout_in_ms, store a hint
• max_hint_window_in_ms: the maximum time a node will keep creating hints for a dead node
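The following is a minimal sketch of the coordinator's hint-storing decision. It assumes the coordinator already knows three things: whether each replica is up, how long a dead replica has been down, and whether a live replica timed out. `Hint`, `should_hint`, and `collect_hints` are hypothetical names. Only `write_request_timeout_in_ms` and `max_hint_window_in_ms` come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Hint:
    """Hint = target replica ID + mutation data."""
    target_id: str
    mutation: bytes


def should_hint(is_up, down_for_ms, timed_out, max_hint_window_in_ms):
    """Decide whether the coordinator keeps a hint for one replica."""
    if not is_up:
        # Dead replica: only hint while it has been down no longer than the window.
        return down_for_ms <= max_hint_window_in_ms
    # Live replica that missed write_request_timeout_in_ms also gets a hint.
    return timed_out


def collect_hints(replica_states, mutation, max_hint_window_in_ms):
    """replica_states: node -> (is_up, down_for_ms, timed_out)."""
    return [
        Hint(node, mutation)
        for node, (is_up, down_for_ms, timed_out) in replica_states.items()
        if should_hint(is_up, down_for_ms, timed_out, max_hint_window_in_ms)
    ]
```

A replica that has been dead longer than the window gets no new hints. It must be brought back in sync by repair instead.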
13. Hinted Handoff - replay
• Try to send stored hints to their target nodes
• Runs every ten minutes
• Multithreaded (c* 1.2)
• Throttleable (KB per second)
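One replay pass with a KB-per-second throttle might look like the sketch below. It assumes hints are plain `(target, payload)` pairs and that a `send` callable reports success. The function and parameter names are hypothetical, and the pause-after-delivery throttle is one simple way to cap throughput, not necessarily Cassandra's. The `sleep` argument is injectable so the pacing can be observed without waiting.

```python
import time


def replay_hints(hints, send, throttle_kb_per_sec, sleep=time.sleep):
    """One scheduled replay pass over stored hints.

    hints: list of (target, payload) pairs, payload as bytes.
    send:  callable(target, payload) -> True if the target accepted the hint.
    Returns (delivered_count, hints_still_pending).
    """
    delivered = 0
    pending = []
    for target, payload in hints:
        if send(target, payload):
            delivered += 1
            if throttle_kb_per_sec > 0:
                # Pause after each delivery so throughput stays under the KB/s limit.
                sleep(len(payload) / 1024 / throttle_kb_per_sec)
        else:
            pending.append((target, payload))  # keep it for the next scheduled run
    return delivered, pending
```

This sketch is single-threaded. The slides note that 1.2 delivers hints with multiple threads, which would mean running passes like this for several target nodes in parallel.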
17. Atomic Batches
• Coordinator writes the incoming batch to two peers in the same DC
• Deletes the batch from those peers on successful completion
• Peers will replay the batch if it is not deleted
• Replay runs every 60 seconds
• With c* 1.2, all mutations use atomic batches
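Below is an in-memory sketch of the batchlog flow. It covers logging to two peers, deleting on success, and replaying whatever was never deleted. `BatchlogPeer`, `coordinate_batch`, and the `min_age_s` grace period are hypothetical. The grace period stands in for however long a peer waits before assuming the coordinator failed. The caller is assumed to invoke `replay_stale` on the 60-second schedule.

```python
import uuid


class BatchlogPeer:
    """One of the two same-DC peers that hold a copy of a batch."""

    def __init__(self):
        self.entries = {}  # batch_id -> (written_at_seconds, mutations)

    def store(self, batch_id, mutations, now):
        self.entries[batch_id] = (now, list(mutations))

    def delete(self, batch_id):
        self.entries.pop(batch_id, None)

    def replay_stale(self, apply, now, min_age_s):
        """Periodic task: replay batches whose coordinator never deleted them.

        min_age_s (hypothetical) keeps in-flight batches from being replayed
        while the coordinator may still be applying them.
        """
        replayed = []
        for batch_id, (written_at, mutations) in list(self.entries.items()):
            if now - written_at >= min_age_s:
                for m in mutations:
                    apply(m)
                del self.entries[batch_id]
                replayed.append(batch_id)
        return replayed


def coordinate_batch(mutations, peers, apply, now):
    """Coordinator side: log to two peers, apply, then delete the log entry."""
    batch_id = uuid.uuid4()
    logged_to = peers[:2]
    for peer in logged_to:
        peer.store(batch_id, mutations, now)
    for m in mutations:
        apply(m)  # normal write path to the replicas
    for peer in logged_to:
        peer.delete(batch_id)  # success: the copies are no longer needed
    return batch_id
```

If the coordinator crashes between `store` and `delete`, the entry survives on the peers, and the next periodic pass applies the batch.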
19. Cassandra reads - setup
• Determine replicas to invoke
  • consistency level vs. read repair
• The first (data) node responds with the full data set; the others send digests
• Coordinator waits for consistency_level nodes to respond
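Here is a sketch of the read setup. It assumes the replica list is already sorted by proximity (Cassandra's snitch does this) and that the read-repair decision was made beforehand. `plan_read` is a hypothetical helper. It returns the replica asked for full data, the replicas asked only for digests, and how many responses the coordinator waits for.

```python
def plan_read(replicas, cl_count, read_repair):
    """Pick which replicas a read touches and how.

    replicas:    all replicas for the key, closest first.
    cl_count:    number of replicas the consistency level requires.
    read_repair: if True, every replica is contacted so all can be compared.
    Returns (data_replica, digest_replicas, block_for).
    """
    if cl_count > len(replicas):
        raise ValueError("not enough replicas for this consistency level")
    targets = replicas if read_repair else replicas[:cl_count]
    data_replica, digest_replicas = targets[0], list(targets[1:])
    # The coordinator answers once cl_count replicas respond, even if more were asked.
    return data_replica, digest_replicas, cl_count
```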
21. Consistent reads
• Compare digests
• If any mismatches:
  • re-request the full data set from the same nodes
  • compare the full data sets and send updates to out-of-date replicas
  • block until the out-of-date replicas respond successfully
• Return the merged data set to the client
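The digest comparison and merge can be sketched as follows. Rows are modelled as `{column: (value, timestamp)}`, and MD5 is used for digests. Both the row model and the hash choice are assumptions, and all names are hypothetical. For brevity, the function receives full rows from every replica and computes the digests itself. A real coordinator holds only digests until a mismatch triggers the full-data re-request. Conflicts resolve by newest timestamp. Cassandra's tie-breaking on equal timestamps is not modelled.

```python
import hashlib


def digest(row):
    """Hash of a row's columns, values, and timestamps, in column order."""
    h = hashlib.md5()
    for col in sorted(row):
        value, ts = row[col]
        h.update(f"{col}={value}@{ts};".encode())
    return h.hexdigest()


def resolve(responses):
    """responses: replica -> row. Returns (merged_row, repairs_per_replica)."""
    if len({digest(row) for row in responses.values()}) == 1:
        # All digests match: any response is the answer, nothing to repair.
        return next(iter(responses.values())), {}
    # Mismatch: merge the full data sets; the newest timestamp wins per column.
    merged = {}
    for row in responses.values():
        for col, (value, ts) in row.items():
            if col not in merged or ts > merged[col][1]:
                merged[col] = (value, ts)
    # Updates to send: columns each replica is missing or holds an older version of.
    repairs = {}
    for replica, row in responses.items():
        diff = {c: v for c, v in merged.items() if row.get(c) != v}
        if diff:
            repairs[replica] = diff
    return merged, repairs
```

The coordinator would send each entry in `repairs` to its replica, wait for those writes to succeed, and then return `merged` to the client.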
22. Read repair
• Synchronizes the client-requested data among all replicas
• Piggy-backs on normal reads, but waits asynchronously for all replicas to respond
• Compares the digests and follows the same algorithm as a consistent read
24. Read repair configuration
• Configured per column family
• Percentage of all reads to the CF
• Local DC vs. global
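In Cassandra 1.2 the two per-column-family settings are, as far as we know, `read_repair_chance` (global) and `dclocal_read_repair_chance` (local DC). The sketch below makes a single random draw per read and compares it against both settings. That is our understanding of the coordinator's behaviour, so treat the function as a hypothetical illustration. `rng` is injectable for testing.

```python
import random


def read_repair_decision(read_repair_chance, dclocal_read_repair_chance,
                         rng=random.random):
    """Decide, per read, how widely to read-repair (one random draw in [0, 1))."""
    r = rng()
    if r < read_repair_chance:
        return "GLOBAL"    # contact all replicas in all DCs
    if r < dclocal_read_repair_chance:
        return "DC_LOCAL"  # contact all replicas in the coordinator's DC
    return "NONE"          # contact only the replicas the consistency level needs
```

For example, `read_repair_chance=0.1` repairs roughly 10% of reads to the CF across all DCs. Setting it to 0 while `dclocal_read_repair_chance` is nonzero keeps the repair traffic inside the local DC.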
25. Read repair fixes data that is actually requested,
…but what about data that isn't requested?
26. Node repair - introduction
• Repairs inconsistencies across all replicas for a given range
• nodetool repair
  • repairs the ranges the node holds
  • one or more column families (within the same keyspace)
  • can be limited to the local datacenter (c* 1.2)
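The sketch below shows which ranges "the ranges the node holds" covers. It assumes SimpleStrategy-style placement with one token per node and no vnodes. Under that placement, a node replicates its own primary range plus the primary ranges of its `rf - 1` predecessors on the ring. Repair must reconcile all of these ranges with the other replicas of each. The helper names are hypothetical.

```python
def ranges_for_node(tokens, node_index, rf):
    """tokens: sorted ring tokens, one per node.

    Returns the (start, end] ranges the node replicates: its own primary
    range first, then those of its predecessors on the ring.
    """
    n = len(tokens)
    ranges = []
    for k in range(min(rf, n)):
        i = (node_index - k) % n          # owner of the k-th range held here
        ranges.append((tokens[(i - 1) % n], tokens[i]))
    return ranges


def replicas_for_range(n_nodes, range_index, rf):
    """Nodes holding range i: its owner plus the next rf - 1 nodes clockwise."""
    return [(range_index + k) % n_nodes for k in range(min(rf, n_nodes))]


tokens = [0, 25, 50, 75]
print(ranges_for_node(tokens, 1, rf=3))  # [(0, 25), (75, 0), (50, 75)]
```

The `(75, 0)` entry is the range that wraps around the end of the ring.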
27. Node Repair - cautions
• Should be part of standard c* operations
• Especially if you delete data: repair every node within gc_grace_seconds, or deleted data can reappear
• Repair is IO- and CPU-intensive
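Deletes are the reason for that schedule. A tombstone becomes purgeable once `gc_grace_seconds` has passed (default 864000, i.e. 10 days). A replica that missed the delete and is not repaired before then can hand the deleted data back. Below is a sketch of an operational check based on that rule. `repair_overdue` and its `safety_margin` are hypothetical; the margin is a buffer so that a long-running repair still finishes inside the window.

```python
DEFAULT_GC_GRACE_SECONDS = 864_000  # 10 days, Cassandra's default


def repair_overdue(last_repair_ts, now_ts,
                   gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS, safety_margin=0.8):
    """True if a node should be repaired now to stay inside gc_grace_seconds.

    Timestamps are in seconds. safety_margin (hypothetical) triggers repair
    at 80% of the window by default.
    """
    return now_ts - last_repair_ts >= gc_grace_seconds * safety_margin
```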
28. Node Repair - details, 1
• Determine peer nodes with matching ranges
• Trigger a major (validation) compaction on the peer nodes
  • read and generate a hash for every row in the CF
  • add the result to a Merkle tree
  • return the tree to the initiator
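The sketch below builds a simplified Merkle tree of the kind described above. Each row is hashed into the leaf that covers its token range, and parents hash their two children up to a single root. Rows are modelled as `{token: serialized_bytes}` with integer tokens in `[0, token_space)`. SHA-256 and the fixed depth are assumptions. Cassandra's own `MerkleTree` differs in details such as the hash function, how row hashes combine, and tree sizing.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle_tree(rows, depth, token_space=2**32):
    """rows: dict token -> serialized row bytes.

    Returns a list of levels, leaves first and root last; level 0 has
    2**depth leaves, each covering an equal slice of the token space.
    """
    n_leaves = 2 ** depth
    width = token_space // n_leaves
    leaves = [b""] * n_leaves               # an empty range keeps an empty leaf
    for token in sorted(rows):              # validation reads rows in token order
        idx = token // width
        leaves[idx] = h(leaves[idx] + h(rows[token]))  # fold row hash into its leaf
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels
```

Two replicas with identical data produce identical roots. A single differing row changes one leaf and every hash on its path to the root.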
29. Node Repair - details, 2
• Initiator awaits trees from participating nodes
• Compares every tree to every other tree
• If any differences are detected, the differing nodes exchange the conflicting range(s)
• Received data is written out as new, local SSTables
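The initiator's comparison can be sketched independently of how the trees were built. Each tree is a list of levels (leaves first, root last) of any comparable hash values. The walk descends only into subtrees whose hashes differ, so identical regions are skipped. The result is the set of leaf ranges the two nodes must exchange. `differing_leaves` and `ranges_to_sync` are hypothetical names.

```python
from itertools import combinations


def differing_leaves(tree_a, tree_b):
    """Leaf indices whose hashes differ, found by walking down from the root."""
    top = len(tree_a) - 1
    frontier = [0] if tree_a[top][0] != tree_b[top][0] else []
    for level in range(top, 0, -1):
        children = []
        for i in frontier:
            for c in (2 * i, 2 * i + 1):
                if tree_a[level - 1][c] != tree_b[level - 1][c]:
                    children.append(c)
        frontier = children
    return frontier  # token ranges covered by these leaves need streaming


def ranges_to_sync(trees):
    """trees: node -> tree. Compare every tree with every other tree."""
    out = {}
    for a, b in combinations(sorted(trees), 2):
        diff = differing_leaves(trees[a], trees[b])
        if diff:
            out[(a, b)] = diff
    return out


t1 = [["a", "b", "c", "d"], ["ab", "cd"], ["abcd"]]
t2 = [["a", "b", "X", "d"], ["ab", "Xd"], ["abXd"]]
print(ranges_to_sync({"n1": t1, "n2": t2, "n3": t1}))
```

Here `n2` disagrees with both `n1` and `n3` on leaf 2 only. Just that range would be streamed, and the receiving side writes it out as new SSTables.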
35. Anti-Entropy - Wrap Up
• CAP theorem lives: tradeoffs must be understood and made
• C* contains processes to make diverging data sets consistent
• Tunable controls exist at write and read time, as well as on demand