15. Why do we need Replication?
• Failover
• Backups
• Secondary Batch Jobs
• High Availability
16. Outages
• Planned
– Hardware upgrade
– OS or file-system tuning
– Software upgrade
– Relocation of data to new file-system / storage
• Unplanned
– Human Error
– Hardware Failure
– Data Center / Region Outage
– Application Corruption
17. Replica Sets
• Data Protection
– Multiple copies of data
– Data spread across data centers, availability zones (AZs), etc.
• High Availability
– Automated Failover
– Automated Recovery
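Automated failover and recovery can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not MongoDB's election protocol: the `ReplicaSet` and `Member` classes are hypothetical, replication is modeled as every healthy member applying each write immediately, and the new primary is simply the alphabetically first healthy member, whereas real replica set elections also weigh member priority and how up to date each member is.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Member:
    name: str
    healthy: bool = True
    data: Dict[str, str] = field(default_factory=dict)  # this member's copy of the data


class ReplicaSet:
    """Toy replica set (hypothetical model): one primary, the others are secondaries."""

    def __init__(self, names: List[str]) -> None:
        self.members = {n: Member(n) for n in names}
        self.primary: Optional[str] = names[0]

    def failover(self) -> None:
        healthy = sorted(m.name for m in self.members.values() if m.healthy)
        # Elect a new primary only while a strict majority of members is reachable.
        if 2 * len(healthy) > len(self.members):
            self.primary = healthy[0]  # simplification: first healthy name wins
        else:
            self.primary = None

    def write(self, key: str, value: str) -> None:
        # Automated failover: a dead primary triggers an election before the write.
        if self.primary is None or not self.members[self.primary].healthy:
            self.failover()
        if self.primary is None:
            raise RuntimeError("no primary: a majority of members is unavailable")
        # Multiple copies of the data: every healthy member applies the write.
        for member in self.members.values():
            if member.healthy:
                member.data[key] = value

    def recover(self, name: str) -> None:
        # Automated recovery: a returning member resyncs from the current primary.
        member = self.members[name]
        member.healthy = True
        if self.primary is not None:
            member.data = dict(self.members[self.primary].data)
```

For example, with members "a", "b" and "c", marking "a" unhealthy and then writing makes "b" the primary; calling `recover("a")` afterwards copies b's data back to a. Losing two of the three members leaves no majority, so writes are refused rather than risking two primaries.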
23. Sharding
• Data Location Transparent to Code
• Data Distribution is Automatic
– as well as redistribution
• Aggregates System Resources Horizontally
• No Application Code Changes!
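A minimal sketch of how a query router can keep data location transparent to application code and redistribute data automatically, assuming hashed placement on a shard key. The `Router` class, the `user_id` shard key, and placing documents by `zlib.crc32` of the key modulo the shard count are illustrative assumptions, not MongoDB's implementation.

```python
import zlib
from typing import Any, Dict, List, Optional


class Router:
    """Toy router (hypothetical): callers use insert/find and never choose a shard."""

    def __init__(self, num_shards: int, shard_key: str) -> None:
        self.shard_key = shard_key
        self.shards: List[Dict[Any, dict]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key_value: Any) -> int:
        # Assumed hashed placement: a stable hash of the shard key picks the shard.
        return zlib.crc32(str(key_value).encode()) % len(self.shards)

    def insert(self, doc: dict) -> None:
        key_value = doc[self.shard_key]
        self.shards[self._shard_for(key_value)][key_value] = doc

    def find(self, key_value: Any) -> Optional[dict]:
        return self.shards[self._shard_for(key_value)].get(key_value)

    def add_shard(self) -> None:
        # Automatic redistribution: collect every document and re-place it
        # across the enlarged cluster; callers of find() see no difference.
        docs = [d for shard in self.shards for d in shard.values()]
        self.shards.append({})
        for shard in self.shards:
            shard.clear()
        for d in docs:
            self.insert(d)
```

Rehashing modulo the shard count moves most documents whenever a shard is added. MongoDB instead divides the shard-key space into chunks and a balancer migrates whole chunks between shards, which limits how much data has to move.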
32. Consistency
• Eventual Consistency
– Allow updates when a system has been partitioned
– Resolve conflicts later
– Examples: Cassandra, CouchDB
• Immediate Consistency
– Single Master
– Avoids conflicts
– Example: MongoDB
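The two models can be contrasted in a small sketch. The class names and the caller-supplied integer timestamps are hypothetical. Last-write-wins is used here as one common way to "resolve conflicts later"; real eventually consistent stores offer other strategies (CouchDB, for example, keeps the losing revisions available so the application can resolve them).

```python
from typing import Dict, Tuple

Versioned = Tuple[int, str]  # (logical timestamp supplied by the caller, value)


class EventualReplica:
    """Accepts writes even while partitioned; conflicts are resolved on merge."""

    def __init__(self) -> None:
        self.data: Dict[str, Versioned] = {}

    def write(self, key: str, value: str, ts: int) -> None:
        self.data[key] = (ts, value)

    def merge(self, other: "EventualReplica") -> None:
        # Last-write-wins: keep the version with the higher timestamp.
        # Tuples compare the value on a timestamp tie, so both sides pick the same winner.
        for key, version in other.data.items():
            if key not in self.data or version > self.data[key]:
                self.data[key] = version


class SingleMasterReplica:
    """Only the master accepts writes, so conflicting writes never arise."""

    def __init__(self, is_master: bool) -> None:
        self.is_master = is_master
        self.data: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        if not self.is_master:
            raise PermissionError("writes must be sent to the master")
        self.data[key] = value
```

Two `EventualReplica` objects that accept different values for the same key during a partition converge on the later one once they merge in both directions; the earlier write is silently discarded. With a single master, the second write is rejected up front instead, trading write availability during a partition for never having to resolve a conflict.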
33. Durability
• For how long is my data available?
• When do I know my data is safe?
• Where is it safe?
• MongoDB write options (write concerns):
– Fire and Forget
– Get Last Error
– Journal Sync
– Replica Safe
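The four options differ in when the client receives its acknowledgement, and that determines which failures an acknowledged write is guaranteed to survive. The sketch below is a hypothetical model: the `Durability` enum, the stage names and the failure names are illustrative, and treating the levels as cumulative is a simplification (in MongoDB, journaling and replica acknowledgement are separate settings that can be combined). In current drivers these options roughly correspond to the write concerns `w: 0`, `w: 1`, `j: true` and `w: 2` (or `"majority"`).

```python
from enum import Enum
from typing import List

# Stages a write passes through, in order (hypothetical, simplified model).
STAGES = ["sent", "applied_on_primary", "journaled", "replicated"]


class Durability(Enum):
    FIRE_AND_FORGET = 0  # client returns as soon as the request is sent
    GET_LAST_ERROR = 1   # wait until the primary has applied the write in memory
    JOURNAL_SYNC = 2     # wait until the write is in the primary's on-disk journal
    REPLICA_SAFE = 3     # wait until a secondary also holds the write


def stages_before_ack(level: Durability) -> List[str]:
    """Stages that must complete before the client is told the write succeeded."""
    return STAGES[: level.value + 1]


def survives(level: Durability, failure: str) -> bool:
    """Is an acknowledged write guaranteed to survive the given failure?"""
    done = stages_before_ack(level)
    if failure == "process_crash":  # primary restarts; its in-memory state is lost
        return "journaled" in done
    if failure == "server_lost":    # the primary's disk is gone as well
        return "replicated" in done
    raise ValueError(f"unknown failure: {failure}")
```

Read against the slide's questions: the stage list answers "when do I know my data is safe", and the failure each level survives answers "where is it safe". Each step up adds latency to every write in exchange for surviving a broader class of failures.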
40. JSON & Scale Out
• Embedding removes the need for:
– Distributed Joins
– Two-Phase Commit
• Enables data to be distributed across many nodes without the penalty of cross-node joins or transactions
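Why embedding helps can be shown by counting how many nodes a single order touches under each layout. This is a minimal sketch: the order and line-item records, `NUM_NODES`, and `node_for` are hypothetical, and `node_for` deliberately uses a simple character-sum hash so that the placement can be checked by hand.

```python
from typing import Set

NUM_NODES = 4


def node_for(key: str) -> int:
    # Hypothetical placement: sum of character codes modulo the node count.
    return sum(ord(c) for c in key) % NUM_NODES


# Normalized layout: an order row plus separate line-item rows, each placed by its own id.
order_row = {"_id": "order-1", "customer": "alice"}
item_rows = [
    {"_id": f"item-{i}", "order_id": "order-1", "sku": sku}
    for i, sku in enumerate(["pen", "ink", "pad"])
]

# Embedded layout: the line items live inside the order document itself.
order_doc = {
    "_id": "order-1",
    "customer": "alice",
    "items": [{"sku": "pen"}, {"sku": "ink"}, {"sku": "pad"}],
}


def nodes_touched_normalized() -> Set[int]:
    # Reading or updating the whole order visits every node holding one of its rows.
    return {node_for(order_row["_id"])} | {node_for(r["_id"]) for r in item_rows}


def nodes_touched_embedded() -> Set[int]:
    # The whole order is one document, so exactly one node is involved.
    return {node_for(order_doc["_id"])}
```

Here the normalized order is spread over three nodes, so assembling it needs a distributed join, and changing the order and its items together would need a cross-node protocol such as two-phase commit. The embedded order lives on a single node, where reading it is one lookup and updating it is a single-document write.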