61. Analyser
Push new count to each front end: write fan out
[Diagram: analyser pushing the new count to three front ends]
62.
63.
64. Sharding
Fan out: writes, reads
Replication/Caching
Shard selection: random, by user, by document
Consistency model: eventual, shard local
Sync -> Async
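The shard-selection strategies above (random, by user, by document) can be sketched as routing functions. This is a hypothetical sketch; `NUM_SHARDS` and the key names are assumptions:

```python
import hashlib
import random

NUM_SHARDS = 1000  # assumed cluster size

def shard_random() -> int:
    # Random selection: fine when any shard can take the write (e.g. a counter).
    return random.randrange(NUM_SHARDS)

def shard_by_key(key: str) -> int:
    # Stable hash so the same user or document always maps to the same shard.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# By user: inbox, contacts, labels live on the owner's shard.
shard_for_user = shard_by_key("user:alice")
# By document: shared documents live on the document's shard.
shard_for_doc = shard_by_key("doc:budget")
```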
Questions?
tirsen@google.com
Editor's notes
Hello
All generalizations are lies but there is a shift in what type of web apps we’re building today.
Moving from a read-mostly to a read-write web. The old web displayed content to the user and had limited points of interaction (shopping carts, TODO etc.). The new web invites us to interact everywhere. This means that in general we're going to see a larger proportion of writes relative to reads.
We're also moving from web apps that are largely private (TODO improve this argument) to social apps with a lot of "cross-user" behavior, which as we will see is notoriously hard to scale.
And lastly, a lot of the software we previously ran on our "PC" now runs in datacenters hosted by another company.
You may be aware of...
Image - edge cache
Search engine - read-only
Visitors today - not real number
Read -> Increase -> Write
Writes bottleneck
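The read -> increase -> write cycle above is the classic contended-counter pattern; a minimal sketch (names are illustrative) of why writes become the bottleneck: every writer serializes on the same lock.

```python
import threading

counter = 0
lock = threading.Lock()  # every writer contends on this one lock

def record_visit():
    global counter
    with lock:            # read -> increase -> write, fully serialized
        current = counter
        counter = current + 1

threads = [threading.Thread(target=record_visit) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 100: correct, but all writes queued behind one lock
```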
Split
How do we implement this?
First write
Select by - random fine
Inbox, contacts, labels - by user
IM conversation - shared between two users
Doclist - by user
Documents shared - by document
Sum - need fan out
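The sum that needs fan out can be sketched as a sharded counter: each write lands on a single shard, while a read must visit every shard and add up the partial counts (a hypothetical sketch with a small shard count):

```python
import random

NUM_SHARDS = 8  # small for illustration

shards = [0] * NUM_SHARDS

def write(increment: int = 1):
    # A write touches exactly one shard, so writes scale by adding shards.
    shards[random.randrange(NUM_SHARDS)] += increment

def read() -> int:
    # A read must fan out to all shards and sum the partial counts.
    return sum(shards)

for _ in range(1000):
    write()
print(read())  # 1000
```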
Side story
Serial fan out
Latency = O(n) with fan out size
Parallel fan out
Latency = O(1) with fan out size
Threads no good: 1,000 req/s with 1 s latency, times a 1,000-shard fan out = need 1 million threads
Async I/O and RPC library
Futures
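A serial fan out makes latency grow O(n) with fan-out size, and a thread per outstanding call is out of the question at this scale; async I/O with futures issues all shard calls concurrently, so latency stays roughly O(1). A sketch using asyncio, with the per-shard RPC simulated by a sleep:

```python
import asyncio

async def fetch_count(shard: int) -> int:
    # Stand-in for an async RPC to one shard.
    await asyncio.sleep(0.01)  # simulated network latency
    return shard               # dummy partial count

async def fan_out(num_shards: int) -> int:
    # Issue every shard RPC concurrently; total latency is about one
    # RPC's worth, not num_shards of them.
    futures = [fetch_count(s) for s in range(num_shards)]
    partials = await asyncio.gather(*futures)
    return sum(partials)

print(asyncio.run(fan_out(100)))  # sum of 0..99 = 4950
```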
Write - your shard
Fan out - by follower
Writes scale by adding new shards
Reads - still hit all shards
Reads faster and no need for locks
Still limit to scale
We’ll get to that but not our biggest problem...
TADA! Let’s have a look at our availability
Availability as we grow
99% uptime per machine
“What’s the probability all shards are down?”
Writes look good
1000 nines uptime
“What’s the probability no shard is down?”
50 - 60% - retry
500 - 0.6% - ouch
1000 - 0.004% - impossible
Probability -> inevitability
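These figures follow from 99% uptime per machine: the chance that no shard is down is 0.99^n. A quick check reproduces them (roughly 60%, 0.6%, and 0.004%):

```python
# Per-machine uptime of 99% means the chance that *no* shard is down
# across n independent shards is 0.99 ** n.
for n in (50, 500, 1000):
    p_all_up = 0.99 ** n
    print(f"{n} shards: {p_all_up:.4%} chance no shard is down")
```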
Write -> Replica
Read across 5 replicas
Same curve much higher up
Same as GFS and BigTable
Would work
Each shard has little data so...
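Replication lifts the same curve much higher up: with 5 replicas a shard is down only when all 5 machines are down at once, so per-shard availability becomes 1 - 0.01^5. A quick check (replica count and machine uptime are the figures assumed above):

```python
p_machine_down = 0.01                      # 99% uptime per machine
replicas = 5
p_shard_down = p_machine_down ** replicas  # all 5 replicas down at once

# Same curve as before, but per-shard downtime is now 10^-10.
for n in (50, 500, 1000):
    p_no_shard_down = (1 - p_shard_down) ** n
    print(f"{n} shards: {p_no_shard_down:.10%} chance no shard is down")
```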
Cache at each shard
Each shard - return full sum
Disregard cache refresh
Same uptime as writes
Cache refresh error rate limits scale (very high!)
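Caching at each shard can be sketched like this: a background refresh fans out and stores the full sum locally, so a serving read hits only one shard and needs no locks (a hypothetical sketch; the refresh is synchronous here for brevity):

```python
NUM_SHARDS = 4
counts = [10, 20, 30, 40]        # each shard's own partial count
cached_total = [0] * NUM_SHARDS  # each shard's cached full sum

def refresh(shard: int):
    # Background job: fan out to all shards and cache the full sum locally.
    # If this fails we keep serving the previous cached value.
    cached_total[shard] = sum(counts)

def read(shard: int) -> int:
    # A serving read hits a single shard and returns its cached total:
    # no fan out, so read availability matches write availability.
    return cached_total[shard]

for s in range(NUM_SHARDS):
    refresh(s)
print(read(2))  # 100
```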
Twitter availability - cheap shot
People blame Rails
Caching was their problem
A series of writes
One read
Which write do we get?
Simple definition: This is “consistent”
“Eventual” consistency - stop writing and you read last write
Never stop writing
Our consistency
Multiple shards
We never assumed the value returned reflects a consistent system-wide state
Has interesting property -> example
We get a request
Cache 299
Cache 52
Get some requests
52 -> 56
Get some requests
299 -> 302
Detail on one request
Bump up 200 -> 201
Get back 552
Cached: 299, 52
Look at that: it’s the last write!
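The walkthrough above can be replayed in code: our own shard's fresh count plus the other shards' cached counts gives 552, and our own write is included immediately. The numbers are the ones from the slides.

```python
own_count = 200            # our shard's live count
cached_others = [299, 52]  # cached counts from the other shards

# A request arrives: bump our own shard first, so we always see our own write.
own_count += 1             # 200 -> 201

total = own_count + sum(cached_others)
print(total)  # 552: and it happens to be the last write
```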
Consistent in our own shard
“Shard local” consistency model
Same as AppEngine’s data store
Write message -> see it immediately
Others trickle in as those caches refresh
A shopping cart can’t be inconsistent
Withdrawing money from PayPal can’t be inconsistent
TADA! Let’s have a look at our availability
“Completely different”
Front end logs
Analysers sum up
Push to front end
“Not so different” - two new tricks
Detach from fulfilling request
Append cheaper
Shard logs
Shard analysers
Write fan out
Same availability as read fan out but not critical
At front end failure - skip and it serves out of date value
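The write fan out from analysers to front ends is not on the critical path: if a front end is unreachable, the analyser simply skips it and that front end keeps serving its previous, out-of-date value. A sketch with the push transport simulated:

```python
front_end_counts = {"fe1": 500, "fe2": 500, "fe3": 500}
down = {"fe2"}  # simulate one unreachable front end

def push_new_count(new_count: int):
    for fe in front_end_counts:
        if fe in down:
            # Skip the failed front end; it serves its stale value until
            # a later push succeeds. No user request blocks on this.
            continue
        front_end_counts[fe] = new_count

push_new_count(552)
print(front_end_counts)  # fe2 keeps serving the stale 500
```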
The “bought together”, “searched for” and so forth sections of the Amazon webpages are of course the results of massive log analysis. These don’t happen as part of the request but are rather executed in the backend and pushed to the shards serving those product pages.