Most Cassandra deployments take advantage of its exceptional performance and ability to handle massive data sets. At PagerDuty, we use Cassandra for entirely different reasons: to reliably manage mutable application state and to meet our durability requirements even in the face of full data center outages. We achieve this by deploying Cassandra clusters with hosts in multiple WAN-separated data centers, configured with per-data-center replica placement requirements, and with significant application-level support to use Cassandra as a consistent datastore. Over several years of experience with this approach, we've learned how to accommodate the impact of WAN network latency on Cassandra queries, how to scale horizontally while maintaining our placement invariants, why nodes in different data centers experience asymmetric load, and more. This talk goes over our workload and design goals, details the resulting Cassandra system design, and explains a number of our unintuitive operational learnings about this novel Cassandra usage paradigm.
PagerDuty: some history
•Monolithic Ruby on Rails + MySQL
•Hosted in AWS us-east-1
•AWS outages in 2010 and 2011
•…including correlated multi-AZ failures
•PagerDuty was heavily impacted
•Needed resiliency to this failure mode
Design goals
•Continuity during a DC drop (AZ or Region)
•No operator intervention
•Can’t lose data
•Can’t delay data (shelf life)
•Timely notifications, always
•Measured in tens of seconds
Design decisions
•Masterless: peer-based & clustered
•Can’t tolerate staleness: synchronous WAN replication
•Manage state: consistent reads
•Opted to use Cassandra
•…despite many of Cassandra’s features not being relevant
How Cassandra is often used
•Massive throughput
•Lots of data
•Horizontally scalable
•Eventually consistent
•High write:read ratio
•High performance individual operations
Quorum consistency systems
•Each item replicated N times
•Writes: require W of N replicas
•Reads: require R of N replicas
•W + R <= N: read can miss a write
•W + R > N: read can’t miss a write
[Diagram: overlapping write and read quorum sets]
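The overlap rule is easy to check by brute force. Below is a minimal sketch in plain Python (the replica counts are the ones from this talk; nothing here is PagerDuty code): any write set of W replicas and any read set of R replicas must share a member when W + R > N.

```python
from itertools import combinations

def overlap_guaranteed(n, w, r):
    """True if every W-subset and every R-subset of N replicas intersect."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))

# W + R > N: every read quorum intersects every write quorum.
assert overlap_guaranteed(n=5, w=3, r=3)      # 3 + 3 > 5 -> consistent reads
# W + R <= N: disjoint quorums exist, so a read can miss a write.
assert not overlap_guaranteed(n=5, w=2, r=3)  # 2 + 3 <= 5 -> stale reads possible
```

For N=5, any two 3-replica subsets share at least one node, which is what lets a quorum read always see the latest acknowledged write.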
Cassandra setup
•Replication factor: N=5
•Three DCs
•DC-aware placement strategy
•W=3: all writes hit multiple DCs
•R=3: all reads hit multiple DCs
•3 + 3 > 5: consistent reads
[Diagram: Cass 1–Cass 5 spread across DC-A, DC-B, and DC-C]
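A sketch of what this setup could look like with the DataStax Python driver. The keyspace and table names, the contact point, and the exact 2/2/1 per-DC split are illustrative assumptions; the slides only specify N=5 spread across three DCs with a DC-aware placement strategy.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["cass1.dc-a.example.com"])  # hypothetical contact point
session = cluster.connect()

# N=5 spread across three DCs via DC-aware placement.
# The 2/2/1 split is an assumption; only N=5 over 3 DCs is given.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS pd_state
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'DC-A': 2, 'DC-B': 2, 'DC-C': 1
    }
""")

# QUORUM with RF=5 resolves to 3 of 5 replicas, i.e. W=3 and R=3.
write = SimpleStatement(
    "INSERT INTO pd_state.items (id, state) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)
read = SimpleStatement(
    "SELECT state FROM pd_state.items WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM)
```

With RF=5, `ConsistencyLevel.QUORUM` is floor(5/2) + 1 = 3, which is exactly the W=3/R=3 configuration above.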
Data layer summary
•Data safe against DC failure
•Consistent reads (of acknowledged writes)
•Expensive multi-DC writes & reads
•Managing state: No ACID transactions!
•Enforce “transactions” in the application layer
Application layer: “transactions”
•Sequence of logic and Cassandra operations
•Implement sequence as idempotent
•Failure is not an option
•Enforce transaction ordering
•Expect (some) (transient) inconsistencies
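A hedged sketch of that idea (the schema, names, and retry policy are hypothetical, not PagerDuty's actual code): every step tolerates being re-run, so a crashed or retried "transaction" converges to the same end state.

```python
import time

def deliver(notification_id, contact):
    """Hypothetical downstream send, deduplicated by notification_id."""

def run_transaction(session, notification_id, contact, max_attempts=5):
    """Idempotent sequence of logic and Cassandra ops; safe to re-run from the top."""
    for attempt in range(max_attempts):
        try:
            # Step 1: record intent. The key is deterministic, so re-running
            # this insert on retry just overwrites the same row.
            session.execute(
                "INSERT INTO notifications (id, contact, state) "
                "VALUES (%s, %s, 'pending')",
                (notification_id, contact))
            # Step 2: the side effect is keyed by notification_id, so a
            # retried delivery can be deduplicated downstream.
            deliver(notification_id, contact)
            # Step 3: mark done. Writing 'done' twice is harmless.
            session.execute(
                "UPDATE notifications SET state = 'done' WHERE id = %s",
                (notification_id,))
            return
        except Exception:
            time.sleep(2 ** attempt)  # back off, then re-run the whole sequence
    raise RuntimeError("transaction did not converge")  # page a human
```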
What about the network?
[Diagram: the five nodes across DC-A, DC-B, and DC-C, with uneven inter-DC latencies: two 24 ms links and one 3 ms link]
•Network diversity limits DC choices
•Result? Uneven network latencies
…and how you should think of the network
[Diagram: simplified latency model: DC-A to DC-B 24 ms, DC-A to DC-C 24 ms, DC-B to DC-C 3 ms]
Read and write performance
•R = W = 3 means every operation hits replicas in two DCs (by design)
•Reads coordinated from DC-B or DC-C nodes take > 3 ms
•Reads coordinated from DC-A nodes take > 24 ms
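One way to keep coordination local is the driver's DC-aware load balancing. Below is a sketch with the DataStax Python driver; the host names are made up, and the `load_balancing_policy` constructor argument reflects the classic (pre-execution-profile) driver API.

```python
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# An app host in DC-B coordinates through DC-B Cassandra nodes; its QUORUM
# round-trips then pay the 3 ms DC-B<->DC-C hop rather than 24 ms to DC-A.
cluster = Cluster(
    ["cass3.dc-b.example.com", "cass4.dc-b.example.com"],  # hypothetical hosts
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="DC-B")))
session = cluster.connect("pd_state")
```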
Writes: per-node volume
•N=5: the coordinator sends every write to all five replicas, not just the W that ack
•All replicas therefore experience the same per-node write load
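A toy model of why this is so (plain Python, not Cassandra internals): the coordinator fans each write out to all N replicas and merely waits for W acknowledgements, so "acknowledged at W=3" still means all five replicas did the write.

```python
import concurrent.futures

class Replica:
    """Toy replica: applying a write always succeeds."""
    def __init__(self, name):
        self.name = name
        self.log = []

    def apply(self, payload):
        self.log.append(payload)  # every replica does the same write work
        return self.name

def coordinate_write(replicas, payload, w=3):
    """Fan the write out to all N replicas; ack the client after W responses."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(r.apply, payload) for r in replicas]  # all N, not W
    acks = 0
    for future in concurrent.futures.as_completed(futures):
        future.result()
        acks += 1
        if acks >= w:              # client sees success at W=3 of N=5...
            break
    pool.shutdown(wait=False)      # ...the remaining replicas still apply the write
    return True

replicas = [Replica(f"cass{i}") for i in range(1, 6)]  # N=5
coordinate_write(replicas, {"id": 1, "state": "triggered"})
# The client was acked at W=3, but all five replicas end up with the write.
```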
What about scaling out?
• Asymmetrical per-node read volumes
• So each DC has different CPU and disk IO needs
• Different node size?
• Different per-DC node count?
• What about DC degradation or loss?
• End up with same-sized nodes
Seamless data center migration (August 2015)
• Moved DC-C fleet from one provider to another
• Remove old node; add new node
• No application-level migration needed
• Zero customer impact
DC-A to DC-B fiber cut (September 2015)
• DC-A to DC-B network latency went from 24 ms to 200 ms for 48 hours
• All Cassandra ops then took ~24 ms
What have we learned?
• WAN-spanning synchronous replication is a thing
• Data layer consistent reads are practical
• Application layer consequences for managing state
• Network topology affects:
  • Request performance
  • Per-node load
• Trade off latency for reliability