2. What can go wrong?
• Network breaks in transit
• Server crashes while processing
• Server fails after processing a write but before replication
• Server processes, crashes, and then a
conflicting write happens elsewhere
• All copies burn in a fire
• 20 years later, no one remembers how to read it
7. Single Server – How a Write Works
• Client sends a write operation to server
• Received by server’s TCP stack
• MongoDB process queues write
• Write happens in memory
• Depending on what Write Concern asks for
– Respond immediately
– Wait for data to be journaled, then respond
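The write path above can be sketched as a toy model. This is not MongoDB code: `ToyServer`, its fields, and the `wait_for_journal` flag are hypothetical names used only to show the difference between responding immediately and waiting for the journal.

```python
class ToyServer:
    """Toy model of a single server: in-memory data plus an on-disk journal."""

    def __init__(self):
        self.memory = {}    # lost on a crash
        self.journal = []   # survives a crash
        self.pending = []   # applied in memory, not yet journaled

    def write(self, key, value, wait_for_journal=False):
        # The write always happens in memory first.
        self.memory[key] = value
        self.pending.append((key, value))
        if wait_for_journal:
            # Only respond once the write is in the journal.
            self.flush_journal()
        return "acknowledged"

    def flush_journal(self):
        self.journal.extend(self.pending)
        self.pending.clear()

    def crash_and_recover(self):
        # A crash wipes memory; recovery replays the journal.
        self.memory = {}
        self.pending = []
        for key, value in self.journal:
            self.memory[key] = value


server = ToyServer()
server.write("a", 1, wait_for_journal=True)  # journaled before the response
server.write("b", 2)                         # acknowledged immediately
server.crash_and_recover()
print(server.memory)  # {'a': 1}
```

Both writes were acknowledged, but only the journaled one is still there after the crash.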
8. Single Server – What can go wrong
• Network can go down once message hits other side
– Client doesn’t know what happened without going back and checking
• Write could fail for logical reason (unique key exception)
• Server could crash before the write is journaled
– Write is lost
• Server could crash after the write is journaled
– When server is recovered, write is replayed and safe
• Hard drive can crash irrecoverably
• Data center could lose power for a long period of time
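One way to picture "going back and checking" is a client that generates its own id, then looks for that id before retrying. The sketch below is an assumption-laden illustration: `FlakyStore` and `safe_insert` are hypothetical, and the store simply drops its first acknowledgement to mimic the network failing after the message reached the other side.

```python
import uuid


class FlakyStore:
    """Hypothetical store whose first acknowledgement is lost in transit."""

    def __init__(self):
        self.docs = {}
        self.drop_next_ack = True

    def insert(self, doc_id, doc):
        if doc_id in self.docs:
            # Logical failure (unique key), not a network failure.
            raise KeyError("duplicate key")
        self.docs[doc_id] = doc
        if self.drop_next_ack:
            self.drop_next_ack = False
            # The write was applied, but the client never hears about it.
            raise ConnectionError("connection lost after the write arrived")

    def find(self, doc_id):
        return self.docs.get(doc_id)


def safe_insert(store, doc):
    doc_id = str(uuid.uuid4())  # client-generated id makes the check possible
    try:
        store.insert(doc_id, doc)
    except ConnectionError:
        # Outcome unknown: go back and check before retrying.
        if store.find(doc_id) is None:
            store.insert(doc_id, doc)
    return doc_id


store = FlakyStore()
doc_id = safe_insert(store, {"title": "hello"})
print(len(store.docs))  # 1
```

Without the check, a blind retry would hit the unique key exception and the client would still not know which attempt succeeded.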
10. Replica Set - Reminders
• N nodes
• Each node has a full copy of the data
• Replication is asynchronous
11. Replica Set - Acknowledgements
• “w” : how many servers must apply write before acknowledged
• w=2 : do not acknowledge until write is on two servers
– If primary fails, election guarantees new primary has all writes acknowledged with w=2
• w=majority : do not acknowledge until write is on a majority of nodes in a replica set
– If any primary is elected automatically, all writes acknowledged with w=majority will be on primary.
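The w=majority guarantee rests on a counting argument: any two majorities of the same set share at least one node. The brute-force check below is a simplified sketch, not the real election protocol. It assumes that an election needs a majority of nodes and that the new primary is the most up-to-date node among them; `write_always_survives` is a hypothetical helper.

```python
from itertools import combinations


def majority(n):
    return n // 2 + 1


def write_always_survives(n, w):
    """True if a write held by any w nodes is seen by every possible electing majority."""
    nodes = range(n)
    for holders in combinations(nodes, w):
        for voters in combinations(nodes, majority(n)):
            if not set(holders) & set(voters):
                # Some majority could elect a primary without this write.
                return False
    return True


results = {}
for n in (3, 5):
    for w in sorted({1, 2, majority(n)}):
        results[(n, w)] = write_always_survives(n, w)
        print(f"nodes={n} w={w} survives any election: {results[(n, w)]}")
```

In this toy model w=2 is already a majority in a three-node set. In a five-node set w=2 still covers the loss of the primary alone, since one other holder remains, but only w=majority passes the stricter "any election" test.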
12. Good, but not enough…
What if I lose an entire data center?
13. Replica Set - tags
• A node can have a set of tags
– region=us-east
– color=blue
• Operator configures write level
– Critical – has to be in 3 regions
– Important – has to be in 2 regions
• w=critical
– Do not acknowledge write until it’s in 3 data centers
– Losing an entire data center causes no data loss
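A tag-based write level can be read as "the write must reach nodes covering at least N distinct values of a tag". The sketch below evaluates that rule in plain Python. The node names and every region other than us-east are made up for illustration, and `satisfies` is a hypothetical helper rather than anything the server exposes.

```python
# Tags per node; names and most regions are illustrative assumptions.
NODE_TAGS = {
    "node1": {"region": "us-east", "color": "blue"},
    "node2": {"region": "us-east", "color": "green"},
    "node3": {"region": "us-west", "color": "blue"},
    "node4": {"region": "eu-west", "color": "green"},
}

# Operator-defined write levels: tag name -> number of distinct values required.
WRITE_LEVELS = {
    "critical": {"region": 3},
    "important": {"region": 2},
}


def satisfies(level, applied_nodes):
    """True if the nodes that applied the write cover enough distinct tag values."""
    for tag, needed in WRITE_LEVELS[level].items():
        distinct = {NODE_TAGS[n][tag] for n in applied_nodes if tag in NODE_TAGS[n]}
        if len(distinct) < needed:
            return False
    return True


print(satisfies("important", ["node1", "node3"]))           # True: 2 regions
print(satisfies("critical", ["node1", "node2", "node3"]))   # False: still 2 regions
print(satisfies("critical", ["node1", "node3", "node4"]))   # True: 3 regions
```

Note that three nodes are not enough for "critical" if two of them sit in the same region; the level counts regions, not servers.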
14. What about sharding?
• Same rules apply
• Given a series of writes, they may go to different shards
– A w=majority at the end means all writes on that socket are acknowledged by a majority of the relevant replica set
• Config servers have no impact on fault tolerance/durability, only on admin uptime (or real uptime in a disaster)
16. Personal Blog
• Single server
• No replication
• Hourly backups
• If server crashes
– Down until back up
– All acknowledged writes are safe
• If server is destroyed
– Have to recover from backup
– Lose up to 1 hour of writes
17. Departmental App
• Single replica set
• 3 nodes in a single data center
• If any single node goes down
– System is still readable/writeable
– Writes done with w=2 are safe
• If 2 nodes go down at the same time
– Only writes with w=3 are safe (bad idea)
– No primary, last node is read-only
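The two failure cases above follow from simple counting, sketched here under the assumption that a primary needs a majority of nodes up and that a write is safe when at least one node holding it is still up. `after_failures` is a hypothetical helper.

```python
def majority(n):
    return n // 2 + 1


def after_failures(n, down):
    """Summarize a replica set of n nodes after `down` of them fail at once."""
    up = n - down
    return {
        # A primary can only exist while a majority of nodes is up.
        "has_primary": up >= majority(n),
        # A write on w nodes keeps at least one live copy only if w > down.
        "min_safe_w": down + 1,
    }


for down in (1, 2):
    print(down, after_failures(3, down))
# 1 {'has_primary': True, 'min_safe_w': 2}
# 2 {'has_primary': False, 'min_safe_w': 3}
```

With three nodes, w=3 is the only level that survives a double failure, and it also means any single node outage blocks acknowledgement, which is why the slide calls it a bad idea.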
18. Core User Database
• Single replica set
• 3 data centers
– Primary data center: 3 nodes (p=2)
– 2 alternates with 2 nodes each (p=1)
• Different types of operations
– Password change (w=majority)
– Adds a “like” (w=2)
– Login count (w=1)
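Choosing a write level per operation can be kept in one place in application code. The mapping below is a sketch: the operation names and the `write_concern_for` helper are hypothetical, and the fallback of w=2 is an assumed application default, not something the slides prescribe.

```python
# Assumed application-wide default for operations not listed below.
DEFAULT_WRITE_CONCERN = {"w": 2}

# Write level per operation type, taken from the slide above.
WRITE_CONCERN_BY_OPERATION = {
    "password_change": {"w": "majority"},
    "like": {"w": 2},
    "login_count": {"w": 1},
}


def write_concern_for(operation):
    """Return the write concern for an operation, falling back to the default."""
    return WRITE_CONCERN_BY_OPERATION.get(operation, DEFAULT_WRITE_CONCERN)


print(write_concern_for("password_change"))  # {'w': 'majority'}
print(write_concern_for("login_count"))      # {'w': 1}
print(write_concern_for("profile_update"))   # {'w': 2}
```

Keeping the table explicit makes each deviation from the default visible and easy to review.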
19. Core User Database – cont’d
• Lose any single server
– Can only lose a login count
• Lose any 2 servers
– Could lose a “like” if you are unlucky
• Lose a data center
– Still have a majority
– All password changes are safe
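The data center case can be checked with the node counts from the layout above (3 + 2 + 2 = 7 nodes, so a majority is 4). The sketch below is plain arithmetic; the data center names and the `after_losing` helper are hypothetical.

```python
# Nodes per data center; the names are illustrative.
DATA_CENTERS = {"primary": 3, "alternate_1": 2, "alternate_2": 2}

TOTAL = sum(DATA_CENTERS.values())  # 7
MAJORITY = TOTAL // 2 + 1           # 4


def after_losing(dc):
    """What remains if one whole data center disappears."""
    lost = DATA_CENTERS[dc]
    return {
        # The remaining nodes can still elect a primary.
        "still_has_majority": TOTAL - lost >= MAJORITY,
        # Worst case: every lost node held the w=majority write.
        "min_copies_of_majority_write": MAJORITY - lost,
    }


summary = {dc: after_losing(dc) for dc in DATA_CENTERS}
for dc, info in summary.items():
    print(dc, info)
```

Even in the worst case, losing the three-node primary data center, one copy of every w=majority write remains and the four surviving nodes still form a majority.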
21. When to give a choice?
• Give choice over semantics
– Developers and Operators know their needs
• Tuning parameters are dangerous
– System should be smart enough to avoid thousands of
knobs
• Defaults should be
– Intuitive and sensible
– Changing is hard
– Always changing a little
23. Already have them in different architecture components
• Caching
• Worker queues
• Asynchronous replication
• Synchronous replication
• Two-phase commit
24. MongoDB gives you the choice of durability semantics from many systems in one.
• Control per write
• One source of truth in architecture
25. What should you do?
• Pick a default write level for your app
• Only deviate with good reason
• Test disaster scenarios so you know what’s going to happen
The technical information in this talk is probably spread out across many different talks. The goal here is to put it all in the same context. The key things are:
- understanding how things technically work
- understanding the choices you have
- understanding the consequences of those decisions
Lots of words: durability, journaling, replication, fault tolerance, distributed, commit
DR, manual failover, restore from a backup, hexedit
It doesn’t matter if it’s theoretically possible to get the data; if a user can’t see it, it’s worthless.
In any DB, there is no way to definitively know.
Take one step back and talk about the philosophy of Mongo. Kernel engineers (and users) often complain about the lack of tuning. But they also complain about too many choices. Have to balance.