SlideShare ist ein Scribd-Unternehmen logo
1 von 26
CTO, 10gen
Eliot Horowitz
#MongoDBDays
How to keep your data safe
in MongoDB
What can go wrong?
• Network breaks in transit
• Server crashes while processing
• Server blows up after processing a write before
replication
• Server processes, crashes, and then a
conflicting write happens elsewhere
• All copies burn in a fire
• 20 years later, no one remembers how to read it
What is data safety?
Version 1
Probability that a given
write or piece of data is
accessible given human
intervention and infinite
time.
Version 2
Probability that a given
write or piece of data is
visible in a query.
Single Server – How a Write
Works
• Client sends a write operation to server
• Received by server’s tcp stack
• MongoDB process queues write
• Write happens in memory
• Depending on what Write Concern asks for
– Respond immediately
– Wait for data to be journaled, then respond
Single Server – What can go
wrong
• Network can go down once message hits other side
• Client doesn’t know what happens without going back and checking
• Write could fail for logical reason (unique key exception)
• Server could crash before journaled
• Write is lost journaled
• Server could crash after journaled
• When server is recovered, write is replayed and safe
• Hard drive can crash irrecoverably
• Data center could lose power for large period of time
Any single server will fail
Replica Sets
Replica Set - Reminders
• N nodes
• Each node has a fully copy of the data
• Replication is asynchronous
Replica Set -
Acknowledgements
• “w” : how many servers must apply write before
acknowledged
• w=2 : do not acknowledge until write is on two
servers
– If primary fails, election guaranteesnew primary has all writes
acknowledge w=2
• w=majority : do not acknowledge until writes is on a
majority of nodes in a replica set
– If any primaryis elected automatically,all writes acknowledged with
w=majoritywill be on primary.
Good, but not enough…
What if I lose an entire
data center?
Replica Set - tags
• A node can have a set of tags
– region=us-east
– color=blue
• Operator configures write level
– Critical– has to be in 3 regions
– Important – has to be in 2 regions
• w=critical
– Do not acknowledge write until its in 3 data centers
– Losing an entire data center causes no data loss
What about sharding?
• Same rules apply
• Given a series of writes, they may go to different
shards
– Aw=majority at the end means all writes on that socket
are acknowledge by a majority of the relevant replica set
• Config servers have no impact on fault
tolerance/durability, only on admin uptime (or
real uptime in a disaster)
Examples
Personal Blog
• Single server
• No replication
• Hourly backups
• If server crashes
– Down until back up
– All acknowledge writes safe
• If server is destroyed
– Have to recover from backup
– Lose up to 1 hour of writes
Departmental App
• Single replica set
• 3 nodes in a single server
• If any single node goes down
– System is still readable/writeable
– Writes done with w=2 are safe
• If 2 nodes go down at the same time
– Only writes with w=3 are safe (bad idea)
– No primary, last node is read-only
Core User Database
• Single replica set
• 3 data centers
– Primary data center: 3 node (p=2)
– 2 alternates with 2 nodes each (p=1)
• Different types of operations
– Password change (w=majority)
– Adds a “like” (w=2)
– Login count (w=1)
Core User Database – cont’d
• Lose any single server
– Can only lose a login count
• Lose any 2 servers
– Could lose a “like” if you are unlucky
• Lose a data center
– Still have a majority
– All password changes are safe
Choice is a double
edged sword
When to give a choice?
• Give choice over semantics
– Developers and Operators know their needs
• Tuning parameters are dangerous
– System should be smart enough to avoid thousands of
knobs
• Defaults should be
– Intuitive and sensible
– Changing is hard
– Always changing a little
Semantics Knobs – too
many?
Already have them in different
architecture components
• Caching
• Worker queues
• Asynchronous replication
• Synchronous replication
• Two-phase commit
MongoDB gives you the choice of
durability semantics from many
systems in one.
• Control per write
• One source of truth in architecture
What should you do?
• Pick a default write level for your app
• Only deviate with good reason
• Test disaster scenario so you know what’s
going to happen
CTO, 10gen
Eliot Horowitz
#MongoDBDays
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

5 things you need to know about the Scala compiler
5 things you need to know about the Scala compiler5 things you need to know about the Scala compiler
5 things you need to know about the Scala compilerIulian Dragos
 
Taming Pythons with ZooKeeper (Pyconfi edition)
Taming Pythons with ZooKeeper (Pyconfi edition)Taming Pythons with ZooKeeper (Pyconfi edition)
Taming Pythons with ZooKeeper (Pyconfi edition)Jyrki Pulliainen
 
Db spof(mssql, my sql)
Db spof(mssql, my sql)Db spof(mssql, my sql)
Db spof(mssql, my sql)재원 최
 
Making Symfony Services async with RabbitMq (and more Symfony)
Making Symfony Services async with RabbitMq (and more Symfony)Making Symfony Services async with RabbitMq (and more Symfony)
Making Symfony Services async with RabbitMq (and more Symfony)Gaetano Giunta
 
Day 9 - PostgreSQL Application Architecture
Day 9 - PostgreSQL Application ArchitectureDay 9 - PostgreSQL Application Architecture
Day 9 - PostgreSQL Application ArchitectureBarry Jones
 
Asynchronous programming using CompletableFutures in Java
Asynchronous programming using CompletableFutures in JavaAsynchronous programming using CompletableFutures in Java
Asynchronous programming using CompletableFutures in JavaOresztész Margaritisz
 
Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015fukamachi
 
OpenWhisk Go/Swift/Binaries Runtime
OpenWhisk Go/Swift/Binaries RuntimeOpenWhisk Go/Swift/Binaries Runtime
OpenWhisk Go/Swift/Binaries RuntimeMichele Sciabarrà
 
How Yelp does Service Discovery
How Yelp does Service DiscoveryHow Yelp does Service Discovery
How Yelp does Service DiscoveryJohn Billings
 
Clack: glue for web apps
Clack: glue for web appsClack: glue for web apps
Clack: glue for web appsfukamachi
 
Real time operating systems (rtos) concepts 7
Real time operating systems (rtos) concepts 7Real time operating systems (rtos) concepts 7
Real time operating systems (rtos) concepts 7Abu Bakr Ramadan
 
Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!MongoDB
 
Real time operating systems (rtos) concepts 3
Real time operating systems (rtos) concepts 3Real time operating systems (rtos) concepts 3
Real time operating systems (rtos) concepts 3Abu Bakr Ramadan
 
Woo: Writing a fast web server
Woo: Writing a fast web serverWoo: Writing a fast web server
Woo: Writing a fast web serverfukamachi
 
Cutting the pipe
Cutting the pipeCutting the pipe
Cutting the pipeYoungGi Kim
 
Xen_and_Rails_deployment
Xen_and_Rails_deploymentXen_and_Rails_deployment
Xen_and_Rails_deploymentAbhishek Singh
 

Was ist angesagt? (18)

5 things you need to know about the Scala compiler
5 things you need to know about the Scala compiler5 things you need to know about the Scala compiler
5 things you need to know about the Scala compiler
 
Taming Pythons with ZooKeeper (Pyconfi edition)
Taming Pythons with ZooKeeper (Pyconfi edition)Taming Pythons with ZooKeeper (Pyconfi edition)
Taming Pythons with ZooKeeper (Pyconfi edition)
 
Db spof(mssql, my sql)
Db spof(mssql, my sql)Db spof(mssql, my sql)
Db spof(mssql, my sql)
 
Making Symfony Services async with RabbitMq (and more Symfony)
Making Symfony Services async with RabbitMq (and more Symfony)Making Symfony Services async with RabbitMq (and more Symfony)
Making Symfony Services async with RabbitMq (and more Symfony)
 
Day 9 - PostgreSQL Application Architecture
Day 9 - PostgreSQL Application ArchitectureDay 9 - PostgreSQL Application Architecture
Day 9 - PostgreSQL Application Architecture
 
Asynchronous programming using CompletableFutures in Java
Asynchronous programming using CompletableFutures in JavaAsynchronous programming using CompletableFutures in Java
Asynchronous programming using CompletableFutures in Java
 
Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015Woo: Writing a fast web server @ ELS2015
Woo: Writing a fast web server @ ELS2015
 
Apache pulsar
Apache pulsarApache pulsar
Apache pulsar
 
OpenWhisk Go/Swift/Binaries Runtime
OpenWhisk Go/Swift/Binaries RuntimeOpenWhisk Go/Swift/Binaries Runtime
OpenWhisk Go/Swift/Binaries Runtime
 
How Yelp does Service Discovery
How Yelp does Service DiscoveryHow Yelp does Service Discovery
How Yelp does Service Discovery
 
Clack: glue for web apps
Clack: glue for web appsClack: glue for web apps
Clack: glue for web apps
 
Real time operating systems (rtos) concepts 7
Real time operating systems (rtos) concepts 7Real time operating systems (rtos) concepts 7
Real time operating systems (rtos) concepts 7
 
Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!Mobile 3: Launch Like a Boss!
Mobile 3: Launch Like a Boss!
 
Real time operating systems (rtos) concepts 3
Real time operating systems (rtos) concepts 3Real time operating systems (rtos) concepts 3
Real time operating systems (rtos) concepts 3
 
Shootout at the AWS Corral
Shootout at the AWS CorralShootout at the AWS Corral
Shootout at the AWS Corral
 
Woo: Writing a fast web server
Woo: Writing a fast web serverWoo: Writing a fast web server
Woo: Writing a fast web server
 
Cutting the pipe
Cutting the pipeCutting the pipe
Cutting the pipe
 
Xen_and_Rails_deployment
Xen_and_Rails_deploymentXen_and_Rails_deployment
Xen_and_Rails_deployment
 

Ähnlich wie How to Keep Your Data Safe in MongoDB

MongoDB World 2018: Active-Active Application Architectures: Become a MongoDB...
MongoDB World 2018: Active-Active Application Architectures: Become a MongoDB...MongoDB World 2018: Active-Active Application Architectures: Become a MongoDB...
MongoDB World 2018: Active-Active Application Architectures: Become a MongoDB...MongoDB
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerkuchinskaya
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInLinkedIn
 
Retaining Goodput with Query Rate Limiting
Retaining Goodput with Query Rate LimitingRetaining Goodput with Query Rate Limiting
Retaining Goodput with Query Rate LimitingScyllaDB
 
Architecting for the cloud elasticity security
Architecting for the cloud elasticity securityArchitecting for the cloud elasticity security
Architecting for the cloud elasticity securityLen Bass
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Clusterpercona2013
 
HBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and MoreHBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and MoreCloudera, Inc.
 
Storm presentation
Storm presentationStorm presentation
Storm presentationShyam Raj
 
Spark war stories taboola
Spark war stories taboolaSpark war stories taboola
Spark war stories taboolatsliwowicz
 
18 philbe replication stanford99
18 philbe replication stanford9918 philbe replication stanford99
18 philbe replication stanford99ashish61_scs
 
CSA unit5.pptx
CSA unit5.pptxCSA unit5.pptx
CSA unit5.pptxAbcvDef
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanSakari Keskitalo
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanSakari Keskitalo
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP PerformanceBIOVIA
 
Doc 2011101412020074
Doc 2011101412020074Doc 2011101412020074
Doc 2011101412020074Rhythm Sun
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment StrategyMongoDB
 

Ähnlich wie How to Keep Your Data Safe in MongoDB (20)

Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
MongoDB World 2018: Active-Active Application Architectures: Become a MongoDB...
MongoDB World 2018: Active-Active Application Architectures: Become a MongoDB...MongoDB World 2018: Active-Active Application Architectures: Become a MongoDB...
MongoDB World 2018: Active-Active Application Architectures: Become a MongoDB...
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemaker
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Retaining Goodput with Query Rate Limiting
Retaining Goodput with Query Rate LimitingRetaining Goodput with Query Rate Limiting
Retaining Goodput with Query Rate Limiting
 
Architecting for the cloud elasticity security
Architecting for the cloud elasticity securityArchitecting for the cloud elasticity security
Architecting for the cloud elasticity security
 
Migrating to XtraDB Cluster
Migrating to XtraDB ClusterMigrating to XtraDB Cluster
Migrating to XtraDB Cluster
 
HBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and MoreHBaseCon 2013: How to Get the MTTR Below 1 Minute and More
HBaseCon 2013: How to Get the MTTR Below 1 Minute and More
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
 
Spark war stories taboola
Spark war stories taboolaSpark war stories taboola
Spark war stories taboola
 
18 philbe replication stanford99
18 philbe replication stanford9918 philbe replication stanford99
18 philbe replication stanford99
 
CQRS
CQRSCQRS
CQRS
 
CSA unit5.pptx
CSA unit5.pptxCSA unit5.pptx
CSA unit5.pptx
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
10 replication
10 replication10 replication
10 replication
 
Using galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wanUsing galera replication to create geo distributed clusters on the wan
Using galera replication to create geo distributed clusters on the wan
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
 
Doc 2011101412020074
Doc 2011101412020074Doc 2011101412020074
Doc 2011101412020074
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
 

Mehr von MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

Mehr von MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

How to Keep Your Data Safe in MongoDB

  • 1. CTO, 10gen Eliot Horowitz #MongoDBDays How to keep your data safe in MongoDB
  • 2. What can go wrong? • Network breaks in transit • Server crashes while processing • Server blows up after processing a write before replication • Server processes, crashes, and then a conflicting write happens elsewhere • All copies burn in a fire • 20 years later, no one remembers how to read it
  • 3.
  • 4. What is data safety?
  • 5. Version 1 Probability that a given write or piece of data is accessible given human intervention and infinite time.
  • 6. Version 2 Probability that a given write or piece of data is visible in a query.
  • 7. Single Server – How a Write Works • Client sends a write operation to server • Received by server’s tcp stack • MongoDB process queues write • Write happens in memory • Depending on what Write Concern asks for – Respond immediately – Wait for data to be journaled, then respond
  • 8. Single Server – What can go wrong • Network can go down once message hits other side • Client doesn’t know what happens without going back and checking • Write could fail for logical reason (unique key exception) • Server could crash before journaled • Write is lost journaled • Server could crash after journaled • When server is recovered, write is replayed and safe • Hard drive can crash irrecoverably • Data center could lose power for large period of time
  • 9. Any single server will fail Replica Sets
  • 10. Replica Set - Reminders • N nodes • Each node has a fully copy of the data • Replication is asynchronous
  • 11. Replica Set - Acknowledgements • “w” : how many servers must apply write before acknowledged • w=2 : do not acknowledge until write is on two servers – If primary fails, election guaranteesnew primary has all writes acknowledge w=2 • w=majority : do not acknowledge until writes is on a majority of nodes in a replica set – If any primaryis elected automatically,all writes acknowledged with w=majoritywill be on primary.
  • 12. Good, but not enough… What if I lose an entire data center?
  • 13. Replica Set - tags • A node can have a set of tags – region=us-east – color=blue • Operator configures write level – Critical– has to be in 3 regions – Important – has to be in 2 regions • w=critical – Do not acknowledge write until its in 3 data centers – Losing an entire data center causes no data loss
  • 14. What about sharding? • Same rules apply • Given a series of writes, they may go to different shards – Aw=majority at the end means all writes on that socket are acknowledge by a majority of the relevant replica set • Config servers have no impact on fault tolerance/durability, only on admin uptime (or real uptime in a disaster)
  • 16. Personal Blog • Single server • No replication • Hourly backups • If server crashes – Down until back up – All acknowledge writes safe • If server is destroyed – Have to recover from backup – Lose up to 1 hour of writes
  • 17. Departmental App • Single replica set • 3 nodes in a single server • If any single node goes down – System is still readable/writeable – Writes done with w=2 are safe • If 2 nodes go down at the same time – Only writes with w=3 are safe (bad idea) – No primary, last node is read-only
  • 18. Core User Database • Single replica set • 3 data centers – Primary data center: 3 node (p=2) – 2 alternates with 2 nodes each (p=1) • Different types of operations – Password change (w=majority) – Adds a “like” (w=2) – Login count (w=1)
  • 19. Core User Database – cont’d • Lose any single server – Can only lose a login count • Lose any 2 servers – Could lose a “like” if you are unlucky • Lose a data center – Still have a majority – All password changes are safe
  • 20. Choice is a double edged sword
  • 21. When to give a choice? • Give choice over semantics – Developers and Operators know their needs • Tuning parameters are dangerous – System should be smart enough to avoid thousands of knobs • Defaults should be – Intuitive and sensible – Changing is hard – Always changing a little
  • 22. Semantics Knobs – too many?
  • 23. Already have them in different architecture components • Caching • Worker queues • Asynchronous replication • Synchronous replication • Two-phase commit
  • 24. MongoDB gives you the choice of durability semantics from many systems in one. • Control per write • One source of truth in architecture
  • 25. What should you do? • Pick a default write level for your app • Only deviate with good reason • Test disaster scenario so you know what’s going to happen

Hinweis der Redaktion

  1. The technical information in this talk is probably spread out across many different talks.The goal here is to put it in all in the same context.The key things are: - understanding how things technically work - understanding the choices you have - understanding the consequences of those decisions
  2. Lots of words: durability, journaling, replication, fault tolerance, distributed, commit
  3. DR, manual failover, restore from a backup, hexedit
  4. It doesn’t matter if its theoretically possible to get the data, if a user can’t see it, its worthless.
  5. In any db, no way to definitively know
  6. Take one step back and talk about philosophy of mongo.Kernel engineers (and users) often complain about the lack of tuningBut they also complain about too many choicesHave to balance.