Most Cassandra deployments take advantage of its exceptional performance and ability to handle massive data sets. At PagerDuty, we use Cassandra for entirely different reasons: to reliably manage mutable application state and to meet our durability requirements even in the face of full data center outages. We achieve this by deploying Cassandra clusters with hosts in multiple WAN-separated data centers, configured with per-data-center replica placement requirements, and with significant application-level support to use Cassandra as a consistent datastore. Over several years of experience with this approach, we've learned how to accommodate the impact of WAN network latency on Cassandra queries, how to scale horizontally while maintaining our placement invariants, why nodes in different data centers experience asymmetric load, and more. This talk goes over our workload and design goals, details the resulting Cassandra system design, and explains a number of our unintuitive operational learnings about this novel Cassandra usage paradigm.
PagerDuty: some history
•Monolithic Ruby on Rails + MySQL
•Hosted in AWS us-east-1
•AWS outages in 2010 and 2011
•…including correlated multi-AZ failures
•PagerDuty was heavily impacted
•Needed resiliency to this failure mode
Design goals
•Continuity during a DC drop (AZ or Region)
•No operator intervention
•Can’t lose data
•Can’t delay data (shelf life)
•Timely notifications, always
•Measured in tens of seconds
Design decisions
•Masterless: peer-based & clustered
•Can’t tolerate staleness: synchronous WAN replication
•Manage state: consistent reads
•Opted to use Cassandra
•…despite many of Cassandra’s features not being relevant
How Cassandra is often used
•Massive throughput
•Lots of data
•Horizontally scalable
•Eventually consistent
•High write:read ratio
•High performance individual operations
Quorum consistency systems
•Each item replicated N times
•Writes: require W of N replicas
•Reads: require R of N replicas
•W + R <= N: read can miss a write
•W + R > N: read can’t miss a write
[Diagram: overlapping write and read quorum sets]
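The overlap rule is easy to check by brute force. Below is a minimal sketch in plain Python (the replica counts are the ones from this talk; nothing here is PagerDuty code): any write set of W replicas and any read set of R replicas must share a member when W + R > N.

```python
from itertools import combinations

def overlap_guaranteed(n, w, r):
    """True if every W-subset and every R-subset of N replicas intersect."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))

# W + R > N: every read quorum intersects every write quorum.
assert overlap_guaranteed(n=5, w=3, r=3)      # 3 + 3 > 5 -> consistent reads
# W + R <= N: disjoint quorums exist, so a read can miss a write.
assert not overlap_guaranteed(n=5, w=2, r=3)  # 2 + 3 <= 5 -> stale reads possible
```

For N=5, any two 3-replica subsets share at least one node, which is what lets a quorum read always see the latest acknowledged write.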
Cassandra setup
•Replication factor: N=5
•Three DCs
•DC-aware placement strategy
•W=3: all writes hit multiple DCs
•R=3: all reads hit multiple DCs
•3 + 3 > 5: consistent reads
[Diagram: Cass 1–Cass 5 spread across DC-A, DC-B, and DC-C]
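A sketch of what this setup could look like with the DataStax Python driver. The keyspace and table names, the contact point, and the exact 2/2/1 per-DC split are illustrative assumptions; the slides only specify N=5 spread across three DCs with a DC-aware placement strategy.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["cass1.dc-a.example.com"])  # hypothetical contact point
session = cluster.connect()

# N=5 spread across three DCs via DC-aware placement.
# The 2/2/1 split is an assumption; only N=5 over 3 DCs is given.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS pd_state
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC-A': 2, 'DC-B': 2, 'DC-C': 1
    }
""")

# QUORUM with RF=5 resolves to 3 of 5 replicas, i.e. W=3 and R=3.
write = SimpleStatement(
    "INSERT INTO pd_state.items (id, state) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)
read = SimpleStatement(
    "SELECT state FROM pd_state.items WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM)
```

With RF=5, `ConsistencyLevel.QUORUM` is floor(5/2) + 1 = 3, which is exactly the W=3/R=3 configuration above.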
Data layer summary
•Data safe against DC failure
•Consistent reads (of acknowledged writes)
•Expensive multi-DC writes & reads
•Managing state: No ACID transactions!
•Enforce “transactions” in the application layer
Application layer: “transactions”
•Sequence of logic and Cassandra operations
•Implement sequence as idempotent
•Failure is not an option
•Enforce transaction ordering
•Expect (some) (transient) inconsistencies
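A hedged sketch of that idea (the schema, names, and retry policy are hypothetical, not PagerDuty's actual code): every step tolerates being re-run, so a crashed or retried "transaction" converges to the same end state.

```python
import time

def deliver(notification_id, contact):
    """Hypothetical downstream send, deduplicated by notification_id."""

def run_transaction(session, notification_id, contact, max_attempts=5):
    """Idempotent sequence of logic and Cassandra ops; safe to re-run from the top."""
    for attempt in range(max_attempts):
        try:
            # Step 1: record intent. The key is deterministic, so re-running
            # this insert on retry just overwrites the same row.
            session.execute(
                "INSERT INTO notifications (id, contact, state) "
                "VALUES (%s, %s, 'pending')",
                (notification_id, contact))
            # Step 2: the side effect is keyed by notification_id, so a
            # retried delivery can be deduplicated downstream.
            deliver(notification_id, contact)
            # Step 3: mark done. Writing 'done' twice is harmless.
            session.execute(
                "UPDATE notifications SET state = 'done' WHERE id = %s",
                (notification_id,))
            return
        except Exception:
            time.sleep(2 ** attempt)  # back off, then re-run the whole sequence
    raise RuntimeError("transaction did not converge")  # page a human
```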
What about the network?
[Diagram: the five nodes across DC-A, DC-B, and DC-C, with uneven inter-DC latencies: two 24 ms links and one 3 ms link]
•Network diversity limits DC choices
•Result? Uneven network latencies
…and how you should think of the network
[Diagram: simplified latency model: DC-A to DC-B 24 ms, DC-A to DC-C 24 ms, DC-B to DC-C 3 ms]
Read and write performance
•R = W = 3 means every operation hits replicas in two DCs (by design)
•Reads coordinated from DC-B or DC-C nodes take > 3 ms
•Reads coordinated from DC-A nodes take > 24 ms
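One way to keep coordination local is the driver's DC-aware load balancing. Below is a sketch with the DataStax Python driver; the host names are made up, and the `load_balancing_policy` constructor argument reflects the classic (pre-execution-profile) driver API.

```python
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# An app host in DC-B coordinates through DC-B Cassandra nodes; its QUORUM
# round-trips then pay the 3 ms DC-B<->DC-C hop rather than 24 ms to DC-A.
cluster = Cluster(
    ["cass3.dc-b.example.com", "cass4.dc-b.example.com"],  # hypothetical hosts
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="DC-B")))
session = cluster.connect("pd_state")
```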
Writes: per-node volume
•N=5: the coordinator sends every write to all five replicas, not just the W that ack
•All replicas therefore experience the same per-node write load
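A toy model of why this is so (plain Python, not Cassandra internals): the coordinator fans each write out to all N replicas and merely waits for W acknowledgements, so "acknowledged at W=3" still means all five replicas did the write.

```python
import concurrent.futures

class Replica:
    """Toy replica: applying a write always succeeds."""
    def __init__(self, name):
        self.name = name
        self.log = []

    def apply(self, payload):
        self.log.append(payload)  # every replica does the same write work
        return self.name

def coordinate_write(replicas, payload, w=3):
    """Fan the write out to all N replicas; ack the client after W responses."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(r.apply, payload) for r in replicas]  # all N, not W
    acks = 0
    for future in concurrent.futures.as_completed(futures):
        future.result()
        acks += 1
        if acks >= w:              # client sees success at W=3 of N=5...
            break
    pool.shutdown(wait=False)      # ...the remaining replicas still apply the write
    return True

replicas = [Replica(f"cass{i}") for i in range(1, 6)]  # N=5
coordinate_write(replicas, {"id": 1, "state": "triggered"})
# The client was acked at W=3, but all five replicas end up with the write.
```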
What about scaling out?
• Asymmetrical per-node read volumes
• So each DC has different CPU and disk IO needs
• Different node size?
• Different per-DC node count?
• What about DC degradation or loss?
• End up with same-sized nodes
Seamless data center migration (August 2015)
• Moved DC-C fleet from one provider to another
• Remove old node; add new node
• No application-level migration needed
• Zero customer impact
DC-A to DC-B fiber cut (September 2015)
• DC-A to DC-B network latency went from 24 ms to 200 ms for 48 hours
• All Cassandra ops then took ~24 ms
What have we learned?
• WAN-spanning synchronous replication is a thing
• Data layer consistent reads are practical
• Application layer consequences for managing state
• Network topology affects:
  • Request performance
  • Per-node load
• Trade off latency for reliability