www.mongohq.com Scaling Checklist for MongoDB
Scaling Checklist for MongoDB
100GB & Beyond
MongoHQ
www.mongohq.com | @mongohq
MongoHQ is a fully-managed platform used by
developers to deploy, host and scale open-source
databases.
Chris Winslett
chris@mongohq.com
I’ve spoken at a number of MongoDB conferences on
optimizing queries. I’ve been with MongoHQ for two
years – prior to that I built applications for the education
and technical sectors.
TL;DR
• 100GB of data is relatively big data
• MongoDB has comparative advantages
• MongoDB has absolute constraints
• Know the MongoDB gauges
• Surpassing 100GB requires:
– Understanding absolute constraints.
– Knowledge of application’s data consumption
– Optimization of data consumption to comparative
advantages
Audience Survey
What is your data size? Choose the biggest
bucket.
A. < 10GB
B. < 50GB
C. < 75GB
D. < 100GB
E. > 100 GB
100 GB Checklist
1. Identify your data behavior
2. Use MongoDB for comparative advantages
3. Know the MongoDB indexing constraints
4. Refactor schema to simplify queries
5. Remove data that does not fit MongoDB
6. Separate hot and cold data
7. Stop using `mongodump`
8. Check your gauges
9. Avoid queries causing page faults
10. Track and monitor slow queries
11. Buying time with hardware
Identify your data behavior
1. Small v. Large – type of data
2. Fast v. Slow – behavior of data
3. Complex v. Simple – type of queries
4. Known v. Unknown – behavior of queries
5. Queuing v. Application data
This can happen at the planning, staging, or production phase.
Patterns of your Data
[Figure: 2x2 grid with data size (Small, Large) on one axis and data speed (Fast, Slow) on the other.]
Modern applications have all patterns
[Figure: the Small/Large by Fast/Slow grid populated with: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background.]
Where doesn’t MongoDB excel?
[Figure: the Small/Large by Fast/Slow grid of data categories: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background.]
4th dimension is time
[Figure: main application collections on the Small/Large by Fast/Slow grid, split into today’s data and last week’s data.]
Data-types to avoid with MongoDB
[Figure: the Small/Large by Fast/Slow grid of data categories: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background.]
What type of queries do you have?
[Figure: 2x2 grid with query predictability (Unknown, Known) on one axis and query complexity (Simple, Complex) on the other.]
Modern applications have all types of queries
[Figure: the Unknown/Known by Simple/Complex grid populated with: data discovery; application search; key value; single range query; user generated search; internal metrics; multi-range query.]
Queries to Avoid with MongoDB
[Figure: the Unknown/Known by Simple/Complex grid of query types: data discovery; application search; key value; single range query; user generated search; internal metrics; multi-range query.]
4th Dimension is Time
[Figure: the Unknown/Known by Simple/Complex grid showing the real-time core of the application, with today’s data and last week’s data.]
Queries and MongoDB
[Figure: the Unknown/Known by Simple/Complex grid divided among MongoDB, SQL, and Elasticsearch (Elasticsearch appears twice).]
MongoDB’s Technical Comparative
Advantage
• Expressive data structure allows simplification of
complex data relationships
• Create simple, known queries and return
expressive relationships
• On-the-fly addition of attributes / columns
• Total Cost of Ownership*
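MongoDB Indexing Constraints
• Plan on one index per query (each $or clause can use its own index, and MongoDB 2.6+ can intersect indexes, but do not rely on it)
• Only one range operator can be used efficiently per index
• The range field should be the last field in the index
• Know how to use the right side of indexes (the trailing fields of a compound index)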
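As a rough illustration of the range-operator constraints above, the sketch below orders a compound index so that equality fields come first and the single range field comes last. The `compoundIndexSpec` helper, the `messages` collection fields, and the example query are hypothetical and not part of the talk.

```javascript
// Hypothetical helper: build a compound index spec with equality fields first
// and the single range field last.
function compoundIndexSpec(equalityFields, rangeField) {
  const spec = {};
  for (const field of equalityFields) {
    spec[field] = 1; // equality matches go on the left side of the index
  }
  if (rangeField) {
    spec[rangeField] = 1; // the one range operator goes on the rightmost field
  }
  return spec;
}

// Assumed example query: unread messages for one recipient since a given date.
const query = {
  recipient_id: 42,
  read: false,
  created_at: { $gte: new Date("2013-01-01") },
};
const indexSpec = compoundIndexSpec(["recipient_id", "read"], "created_at");

// In the mongo shell this would be applied with something like:
//   db.messages.createIndex(indexSpec)
//   db.messages.find(query).explain()
console.log(JSON.stringify(indexSpec));
```

Running `explain()` on the real query is the way to confirm that the index is actually chosen.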
What does it mean to optimize?
Scaling to 100GB involves moving queries from complex to simple and unknown to known.
[Figure: the Unknown/Known by Simple/Complex grid with arrows from two "Start" points toward one "Finish".]
Example of simplifying a query
Goal: find the most recent messages for a person’s message stream.
Naïve query:
db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})
Second attempt:
db.messages.find({participant_ids: <id>}).sort({_id: -1})
Best approach:
db.users.find({_id: <id>})
Naïve Query
Document:
{
  _id: <id>,
  message: "Wow, this pizza is good!",
  sender_id: <user_id>,
  recipient_id: <user_id>
}
Query:
db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})
Second Attempt
Document:
{
  _id: <id>,
  message: "Wow, this pizza is good!",
  sender_id: <sender_id>,
  recipient_id: <recipient_id>,
  participant_ids: [<sender_id>, <recipient_id>]
}
Query:
db.messages.find({participant_ids: <id>}).sort({_id: -1})
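One way to maintain `participant_ids` is to derive it at write time, so that a single multikey index on the array can serve both the match and the sort. This is only a sketch: the `withParticipants` helper, the sample values, and the index shown in the comments are assumptions, not taken from the slides.

```javascript
// Hypothetical write-path helper: copy sender and recipient into one array
// field so a single index can answer "messages involving this user".
function withParticipants(message) {
  return {
    ...message,
    participant_ids: [message.sender_id, message.recipient_id],
  };
}

const doc = withParticipants({
  _id: 1001,
  message: "Wow, this pizza is good!",
  sender_id: 7,
  recipient_id: 9,
});

// Assumed index for the query on this slide (multikey on the array, then _id
// so the sort can use the same index):
//   db.messages.createIndex({ participant_ids: 1, _id: -1 })
//   db.messages.find({ participant_ids: 7 }).sort({ _id: -1 })
const participantIndex = { participant_ids: 1, _id: -1 };
console.log(JSON.stringify(doc.participant_ids), JSON.stringify(participantIndex));
```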
Best approach
Document:
{
  _id: <id>,
  name: "Clark Kent",
  recent_messages: [
    <…50 denormalized messages…>
  ]
}
Hint: use $push with $sort and $slice to keep the last 50.
Query:
db.users.find({_id: <id>})
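The hint could be realized with an update like the one sketched below, which pushes each new message onto `recent_messages`, sorts the array, and trims it to 50 entries. The helper names are hypothetical, and `applyPush` is only an in-memory imitation of what the server would do, included so the sketch runs without a database.

```javascript
// Build the update document for db.users.update({_id: userId}, update).
// $each, $sort and $slice are modifiers of $push; a negative $slice keeps the
// last N elements after sorting.
function recentMessagesUpdate(message, limit = 50) {
  return {
    $push: {
      recent_messages: {
        $each: [message],
        $sort: { _id: 1 }, // oldest first ...
        $slice: -limit,    // ... then keep only the newest `limit`
      },
    },
  };
}

// In-memory imitation of that $push, for illustration only.
function applyPush(user, update) {
  const spec = update.$push.recent_messages;
  const merged = [...(user.recent_messages || []), ...spec.$each];
  merged.sort((a, b) => a._id - b._id);
  return { ...user, recent_messages: merged.slice(spec.$slice) };
}

let user = { _id: 1, recent_messages: [] };
for (let i = 1; i <= 60; i++) {
  user = applyPush(user, recentMessagesUpdate({ _id: i, message: `msg ${i}` }));
}
console.log(user.recent_messages.length); // 50
```

The trade-off is a slightly heavier write in exchange for a read that fetches a single document by `_id`.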
How did we optimize?
We took a known, complex query and made it simple.
[Figure: the Unknown/Known by Simple/Complex grid with an arrow from "Start" to "Finish".]
Methods for Simplifying Queries
• Bucket values
• Create summary attributes
• Pre-compute values
• Use expressive document structures
• Sort and filter at the application level
• Create summary documents
• Divide and measure (more on this later)
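To make bucketing, pre-computing, and summary documents concrete, the sketch below folds raw events into one summary document per page per day, so a dashboard can read a single document instead of aggregating many. The collection name, field names, and helpers are all hypothetical.

```javascript
// Bucket a timestamp to its UTC day, e.g. "2013-06-01".
function dayBucket(date) {
  return date.toISOString().slice(0, 10);
}

// Build the filter and update for an upsert into a summary collection, e.g.
//   db.daily_page_stats.update(filter, update, { upsert: true })
function summaryUpsert(event) {
  const hour = event.at.getUTCHours();
  return {
    filter: { _id: `${event.page}:${dayBucket(event.at)}` },
    update: {
      $inc: {
        total_views: 1,        // pre-computed daily total
        [`hourly.${hour}`]: 1, // bucketed hourly counter
      },
    },
  };
}

const op = summaryUpsert({ page: "home", at: new Date("2013-06-01T14:30:00Z") });
console.log(JSON.stringify(op));
```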
Remove “unrefactorable” data
[Figure: the Small/Large by Fast/Slow grid of data categories (main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background), with one region labeled Redis.]
Move up and right, or find another tool
[Figure: the Unknown/Known by Simple/Complex grid showing MongoDB alongside data discovery, application search, user generated search, and multi-range query.]
4th Dimension is Time
[Figure: the Unknown/Known by Simple/Complex grid showing the real-time core of the application, with today’s data (fast) and last week’s data (slower).]
Separate Data with Cross Purposes
• If today’s data must be fast, and last week’s data can be slow:
– Roll off today’s data using TTL collections
– Use another database for last week’s data
– Use high-RAM-ratio, SSD-backed machines for today’s data
– Use cheaper hardware for last week’s data
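A TTL index is one way to age data out of the hot collection. The sketch below builds the index arguments for a one-day window; the `events_today` collection and `created_at` field are assumed names, and the in-memory `isExpired` check only approximates the server's behavior, since the TTL task runs periodically and deletion is not immediate.

```javascript
const ONE_DAY_SECONDS = 24 * 60 * 60;

// Arguments for a TTL index, e.g. in the mongo shell:
//   db.events_today.createIndex(ttl.keys, ttl.options)
const ttl = {
  keys: { created_at: 1 }, // the indexed field must hold a date
  options: { expireAfterSeconds: ONE_DAY_SECONDS },
};

// Illustration only: would a document be eligible for removal at `now`?
function isExpired(doc, now, ttlSeconds = ONE_DAY_SECONDS) {
  return now.getTime() - doc.created_at.getTime() >= ttlSeconds * 1000;
}

const now = new Date("2013-06-02T12:00:00Z");
console.log(isExpired({ created_at: new Date("2013-06-01T11:00:00Z") }, now)); // true
console.log(isExpired({ created_at: new Date("2013-06-02T09:00:00Z") }, now)); // false
```

Anything that must outlive the window has to be copied to the colder store before it expires.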
MongoDB Doesn’t have Joins
Data doesn’t have to be adjacent.
Divide, measure, conquer.
Stop Using `mongodump`
`mongodump` is a long-running table scan that exports all documents. This pushes the working set out of RAM and causes performance issues.
Self-hosting: use MongoDB MMS Backup
As-a-service: ask your vendor about backup alternatives
Configure MMS Now!
Avoid Page Faults like the Plague
[Bar chart: MongoDB operations per second (axis 0 to 8000) for workloads with 50%, 1%, and 0% table scans.]
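One gauge worth watching here is the page fault rate. As a sketch, assuming two `db.serverStatus()` samples taken some seconds apart and that the deployment reports `extra_info.page_faults` (an assumption, since availability depends on platform and storage engine), the rate can be derived as follows. The sample values are made up.

```javascript
// Page faults per second between two serverStatus-like samples.
function pageFaultsPerSecond(earlier, later) {
  const faults = later.extra_info.page_faults - earlier.extra_info.page_faults;
  const seconds = (later.localTime.getTime() - earlier.localTime.getTime()) / 1000;
  if (seconds <= 0) {
    throw new RangeError("samples must be taken at increasing times");
  }
  return faults / seconds;
}

// Hypothetical samples, shaped like the relevant parts of db.serverStatus().
const first = {
  localTime: new Date("2013-06-01T12:00:00Z"),
  extra_info: { page_faults: 1200 },
};
const second = {
  localTime: new Date("2013-06-01T12:00:10Z"),
  extra_info: { page_faults: 1500 },
};
console.log(pageFaultsPerSecond(first, second)); // 30
```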
What type of queries cause page faults?
[Figure: the Unknown/Known by Simple/Complex grid showing MongoDB alongside data discovery, application search, user generated search, and multi-range query.]
Track & Remove Slow Queries
• system.profile collection
• MongoDB professor
• Dex
• MongoHQ Slow Query Tracker and Profiler
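For the first option, the profiler can be switched on from the shell and its output queried like any other collection. The sketch below only builds the query and summarizes hypothetical sample entries in memory; the 100 ms threshold and the sample data are assumed values.

```javascript
const SLOW_MS = 100; // assumed threshold

// Shell equivalents:
//   db.setProfilingLevel(1, SLOW_MS)   // log operations slower than SLOW_MS
//   db.system.profile.find(slowOpsQuery(SLOW_MS)).sort({ ts: -1 }).limit(10)
function slowOpsQuery(thresholdMs) {
  return { millis: { $gt: thresholdMs } };
}

// Worst observed duration per namespace among slow entries, slowest first.
function slowestByNamespace(entries, thresholdMs) {
  const worst = {};
  for (const e of entries) {
    if (e.millis > thresholdMs) {
      worst[e.ns] = Math.max(worst[e.ns] || 0, e.millis);
    }
  }
  return Object.entries(worst).sort((a, b) => b[1] - a[1]);
}

// Hypothetical entries shaped like system.profile documents.
const sample = [
  { ns: "app.messages", millis: 850 },
  { ns: "app.users", millis: 12 },
  { ns: "app.messages", millis: 240 },
  { ns: "app.events", millis: 1300 },
];
console.log(slowestByNamespace(sample, SLOW_MS));
```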
Buying time with hardware has a limited life
• Don’t get addicted to buying more hardware.
• Before any purchasing decision, always
– consider optimization
– investigate separating or paring down data
Thank you!
For any questions:
chris@mongohq.com
www.mongohq.com
@mongohq

Partner Webinar: The Scaling Checklist for MongoDB - 100GB and beyond

  • 1. www.mongohq.com Scaling Checklist for MongoDB Scaling Checklist for MongoDB 100GB & Beyond
  • 2. www.mongohq.com Scaling Checklist for MongoDB MongoHQ www.mongohq.com | @mongohq MongoHQ is a fully-managed platform used by developers to deploy, host and scale open-source databases. Chris Winslett chris@mongohq.com I’ve spoken at a number of MongoDB conferences on optimizing queries. I’ve been with MongoHQ for two years – prior to that I built applications for the education and technical sectors.
  • 3. www.mongohq.com Scaling Checklist for MongoDB TL;DR • 100GB of data is relatively big data • MongoDB has comparative advantages • MongoDB has absolute constraints • Know the MongoDB gauges • Surpassing 100GB requires: – Understanding absolute constraints. – Knowledge of application’s data consumption – Optimization of data consumption to comparative advantages
  • 4. www.mongohq.com Scaling Checklist for MongoDB Audience Survey What is your data size? Choose the biggest bucket. A. < 10GB B. < 50GB C. < 75GB D. < 100GB E. > 100 GB
  • 5. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 6. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 7. www.mongohq.com Scaling Checklist for MongoDB Identify your data behavior 1. Small v. Large – type of data 2. Fast v. Slow – behavior of data 3. Complex v. Simple – type of queries 4. Known v. Unknown – behavior of queries 5. Queuing v. Application data This can happen at planning, staging, or production phase.
  • 8. www.mongohq.com Scaling Checklist for MongoDB Patterns of your Data Small Large Fast Slow
  • 9. www.mongohq.com Scaling Checklist for MongoDB Small Large Fast Slow Modern applications have all patterns Main application collections Application Metadata Secondary Application Collections Internal metrics Event logs and event data Queues, OLTP, Messages Rendered in background
  • 10. www.mongohq.com Scaling Checklist for MongoDB Small Large Fast Slow Where doesn’t MongoDB excel? Main application collections Application Metadata Secondary Application Collections Internal metrics Event logs and event data Queues, OLTP, Messages Rendered in background
  • 11. www.mongohq.com Scaling Checklist for MongoDB 4th dimension is time Main application collections Today’s Data Last week’s data Small Large Fast Slow
  • 12. www.mongohq.com Scaling Checklist for MongoDB Data-types to avoid with MongoDB Main application collections Application Metadata Secondary Application Collections Internal metrics Event logs and event data Queues, OLTP, Messages Small Large Fast Slow Rendered in background
  • 13. www.mongohq.com Scaling Checklist for MongoDB What type of queries do you have? Unknown Known Simple Complex
  • 14. www.mongohq.com Scaling Checklist for MongoDB Unknown Known Simple Complex Modern applications have all types of queries Data discovery Application search Key value Single Range Query User generated search Internal metrics Multi- Range Query
  • 15. www.mongohq.com Scaling Checklist for MongoDB Unknown Known Simple Complex Queries to Avoid with MongoDB Data discovery Application search Key value Single Range Query User generated search Internal metrics Multi- Range Query
  • 16. www.mongohq.com Scaling Checklist for MongoDB Unknown Known Simple Complex 4th Dimension is Time Real-time core of application Today’s Data Last week’s data
  • 17. www.mongohq.com Scaling Checklist for MongoDB MongoDB Queries and MongoDB Elastic Search SQL Elastic Search Unknown Known Simple Complex
  • 18. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 19. www.mongohq.com Scaling Checklist for MongoDB MongoDB’s Technical Comparative Advantage • Expressive data structure allows simplification of complex data relationships • Create simple, known queries and return expressive relationships • On-the-fly addition of attributes / columns • Total Cost of Ownership*
  • 20. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 21. www.mongohq.com Scaling Checklist for MongoDB MongoDB Indexing Constraints • Only one index can be used per query • Only one range operator can be used per index • Range operator must be the last field on index • Know how to use the right side of indexes
  • 22. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 23. www.mongohq.com Scaling Checklist for MongoDB What does it mean to optimize? Unknown Known Simple Complex Scaling to 100GB involves moving queries from complex to simple and unknown to known Start Finish Start
  • 24. www.mongohq.com Scaling Checklist for MongoDB Example of simplifying a query. Find the most recent messages for a person’s message stream. Naïve query: `db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})` Second attempt: `db.messages.find({participant_ids: <id>}).sort({_id: -1})` Best approach: `db.users.find({_id: <id>})`
  • 25. www.mongohq.com Scaling Checklist for MongoDB Naïve Query. Document: `{ _id: <id>, message: "Wow, this pizza is good!", sender_id: <user_id>, recipient_id: <user_id> }` Query: `db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})`
  • 26. www.mongohq.com Scaling Checklist for MongoDB Second Attempt. Document: `{ _id: <id>, message: "Wow, this pizza is good!", sender_id: <sender_id>, recipient_id: <recipient_id>, participant_ids: [<sender_id>,<recipient_id>] }` Query: `db.messages.find({participant_ids: <id>}).sort({_id: -1})`
  • 27. www.mongohq.com Scaling Checklist for MongoDB Best approach. Document: `{ _id: <id>, name: "Clarke Kent", recent_messages: [ <…50 denormalized messages…> ] }` Query: `db.users.find({_id: <id>})` Hint: use the $push, $sort, $slice for the last 50
  • 28. www.mongohq.com Scaling Checklist for MongoDB How did we optimize? Unknown Known Simple Complex We took a known, complex query and made it simple. Finish Start
  • 29. www.mongohq.com Scaling Checklist for MongoDB Methods for Simplifying Queries • Bucket values • Create summary attributes • Pre-compute values • Use expressive documents structures • Sort and filter at the application level • Create summary documents • Divide and measure (more on this later)
  • 30. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 31. www.mongohq.com Scaling Checklist for MongoDB Small Large Fast Slow Remove “unrefactorable” data Main application collections Application Metadata Secondary Application Collections Internal metrics Event logs and event data Queues, OLTP, Messages Rendered in background Redis
  • 32. www.mongohq.com Scaling Checklist for MongoDB MongoDB Move up and right, or find another tool Unknown Known Simple Complex Data discovery Application search User generated search Multi- Range Query
  • 33. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 34. www.mongohq.com Scaling Checklist for MongoDB Unknown Known Simple Complex 4th Dimension is Time Real-time core of application Today’s Data (fast) Last week’s data (slower)
  • 35. www.mongohq.com Scaling Checklist for MongoDB Separate Data with Cross Purposes • If today’s data must be fast, and last week’s data can be slow: – Rollout today’s data using TTL collections – Use another database for last week’s data – Use high-RAM ratio and SSD backed machines for today’s data – Use cheaper hardware for last week’s data
  • 36. www.mongohq.com Scaling Checklist for MongoDB MongoDB Doesn’t have Joins Data doesn’t have to be adjacent. Divide, measure, conquer.
  • 37. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 38. www.mongohq.com Scaling Checklist for MongoDB Stop Using `mongodump` `mongodump` is a long-running table scan that exports all documents. This disrupts RAM and causes performance issues. Self-hosting: use the MongoDB MMS and Backup. As-a-service: ask your vendor about backup alternatives.
  • 39. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 40. www.mongohq.com Scaling Checklist for MongoDB Configure MMS Now!
  • 41. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 42. www.mongohq.com Scaling Checklist for MongoDB Avoid Page Faults like the Plague 0 1000 2000 3000 4000 5000 6000 7000 8000 50% Table Scans 1% Table Scans 0% Table Scans MongoDB Operations / Second
  • 43. www.mongohq.com Scaling Checklist for MongoDB MongoDB What type of queries cause page faults? Unknown Known Simple Complex Data discovery Application search User generated search Multi- Range Query
  • 44. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 45. www.mongohq.com Scaling Checklist for MongoDB Track & Remove Slow Queries • system.profile collection – link • MongoDB professor – link • Dex – link • MongoHQ Slow Query Tracker and Profiler - link
  • 46. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 47. www.mongohq.com Scaling Checklist for MongoDB Buying time with hardware has a limited life • Don’t get addicted to buying more hardware. • Before any purchasing decision, always – consider optimization – investigate separating, paring data
  • 48. www.mongohq.com Scaling Checklist for MongoDB 100 GB Checklist 1. Identify your data behavior 2. Use MongoDB for comparative advantages 3. Know the MongoDB indexing constraints 4. Refactor schema to simplify queries 5. Remove data that does not fit MongoDB 6. Separate hot and cold data 7. Stop using `mongodump` 8. Check your gauges 9. Avoid queries causing page faults 10. Track and monitor slow queries 11. Buying time with hardware
  • 49. www.mongohq.com Scaling Checklist for MongoDB Thank you! For any questions: chris@mongohq.com www.mongohq.com @mongohq
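
The hint on slide 27 (use $push with $sort and $slice to keep the last 50 messages) can be illustrated without a database. The following is a minimal sketch in plain JavaScript that models what such an update would do to a user document. The `sent_at` field, the `RECENT_LIMIT` constant, and the `pushRecentMessage` helper are assumptions made for illustration and do not come from the slides. The shell command in the comment shows the shape a real update might take.

```javascript
// Plain-JavaScript model of a capped, denormalized "recent_messages" array.
// Against a real server, a comparable effect could come from an update like:
//   db.users.update(
//     { _id: userId },
//     { $push: { recent_messages: {
//         $each: [msg], $sort: { sent_at: 1 }, $slice: -50 } } }
//   )
// `sent_at`, RECENT_LIMIT and pushRecentMessage are hypothetical names.

const RECENT_LIMIT = 50;

// Append a message, keep the array ordered by sent_at (oldest first),
// and trim it to the newest `limit` entries.
function pushRecentMessage(user, message, limit = RECENT_LIMIT) {
  const merged = user.recent_messages.concat([message]);
  merged.sort((a, b) => a.sent_at - b.sent_at);
  user.recent_messages = merged.slice(-limit);
  return user;
}

// Usage: 60 messages arrive, only the newest 50 stay on the user document.
const user = { _id: 1, name: "Example User", recent_messages: [] };
for (let i = 1; i <= 60; i++) {
  pushRecentMessage(user, { sent_at: i, message: `message ${i}` });
}

console.log(user.recent_messages.length);     // 50
console.log(user.recent_messages[0].sent_at); // 11
```

With the array maintained at write time, reading a person's message stream becomes the single key lookup from the "best approach" slide, `db.users.find({_id: <id>})`, at the cost of slightly heavier writes.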

Editor's notes

  1. 4 years of experience. Run 50,000 total MongoDB databases. Run multi-terabyte sharded environments. We have a philosophy of optimize, then shard. Our real enjoyment is seeing a company grow, use our platform, and find value with our platform.
  2. If your company is creating data, and you and your customers have created 100GB of data, that is a fast-growing business. In some cases, 10GB is a good, growing business. Unless you are digesting Twitter’s firehose or scraping the web and consuming someone else’s data, there is something special about 100GB of data. 10GB and growing is a good as-a-service business on a good clip. As you approach 50GB of data, your next hurdle is 100GB. We are thinking about building applications that plan for 100GB of data.
  3. Quick survey to see what type of audience we have. If you would, just respond manually with your dataSize, or with your letter, in the chat. We will total those values in a moment.
  4. Let’s get started looking at the checklist. Reasonable number. The first 3 build the case for how you should think about MongoDB; please bear with me through these sections, I am framing the discussion. 1 – 3 will be longer, but will lend a theme to later examples. 4 – 6 are a set of techniques for improving performance. 7 – 10 are some commandments. 11 leaves you with some words of wisdom.
  5. Let’s start off by identifying your data’s behavior.
  6. I’ve put together two different sets of axes for us to look at: data and queries. Data type: small v. large. Data type: fast v. slow. Query type: complex v. simple. Query type: known v. unknown. Before engineers check out from this talk: this exercise is quick, easy and important for mapping your data usage. It will help you understand the different types of data that compose your application. It will also help when searching for the best tools for the job. I am proposing two sets of axes here, but I am sure there are more. After discussing with customers, these two charts are a good starting point, and offer a good way to think of data growth.
  7. Here is our first axis. Fast and slow on the y axis. Small and large on the x axis. My question for you is: what type of data do you have? Do you have fast and large data? Do you have fast and small data? Slow and large data? What type of data do you have? Increasingly we find
  8. Modern applications have all data types
  9. Data’s characteristics are not static. Over time, your data type is changing. In the chart, we are showing aging data move from fast and small to larger and slower. When discussing the same set of data, we have to discuss the assumption of time. Two engineers can talk about the same collection of data. One engineer is thinking of last week’s data. Another engineer is thinking of this week’s data, and they are arguing different use cases.
  10. Green is good. MongoDB excels at use cases with many types of data except queues, messaging, and OLTP. If you have small / fast data, MongoDB is not the tool for that. Notice, I don’t go to the end of the fast, large, or small axes. I recognize there are edge cases past the capabilities of MongoDB performance.
  11. The next set of axes we are introducing is query type. Previously, we discussed data type and answered “How does your data behave?” Now, we are answering “How do you retrieve your data?” These axes are not as intuitive as our prior axes. Simple query: single value. Complex query: multiple values, multiple conditions, multiple ranges. Known query: you precisely know the arguments you are querying. Unknown query: you do not yet know the arguments you are querying.
  12. As with earlier, modern applications have all types of queries. The key positions on this spectrum are upper right, “Simple and known”, and lower left, “Complex and unknown”. Simple and known represents many modern NoSQL databases: key-value stores. Unknown and complex is what I term “data discovery”: the data has not turned into information, and I want to go through the process of turning raw data into actionable information. It typically represents very complex queries, and an unknown end state of the queries. On the rest of the spectrum, we have internal metrics, which are often very structured datasets, and single range queries. Across from each other, we have application search and user search. Applications typically have a search mechanism.
  13. Which queries are off limits in MongoDB – data discovery.
  14. As with data types, queries required of data also change over time. Today’s data is suitable for fast application usage. Last week’s data requires analysis – data discovery. These are important notions when working with increasingly larger data – we recognize that all data created similarly will have different requirements during its life. Recognize these nuances, and adapt.
  15. MongoDB dominates the “simple / known” quadrant. We
  16. If you’ve used MongoDB: JSON-like data. Expressive documents on complex relationships. With creativity you can create simple, known queries and return these complex relationships. On-the-fly addition of attributes / columns.
  17. MongoDB uses B-tree indexes. The indexing constraints are: absolute, simple. List the indexing constraints. No intersections of indexes. Operators are $or, $and, $sort, $nin, $in, $ne, $gte, $lte. Any violation of these constraints will lead to table scans.
  18. Hopefully, with 1 – 3, I’ve built the case for simplicity. And, now we will answer the question: “what does it mean to optimize for 100 GB of data?”
  19. Scaling your query and database interactions will move you toward the simple and known quadrant. Approaching an interaction that is similar to a key-value store.
  20. Imagine a messaging system that captures messages between two parties. For any person, you want to find the most recent messages for that person. Naïve example: use the $or operator. Of course, we learned that $sort does not work with $or; this will cause table scans. Second attempt: query on participants. Best approach: use NoSQL for what it does best.
  21. Here is a view of aging data. How does data become “cold”? How does your data become “hot”? What data needs to be fast? What data can be slow?
  22. Keep fast data fast, and keep cold data separate. Over time, we’ve stated our data becomes: larger, slower, in need of complex queries, filtered on unknown conditions. Keep that data separate from today’s data.
  23. Backups are important – don’t use `mongodump` to do it (unless you have a 3rd member you want to run them against)
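
The indexing constraints from slide 21 and note 17 (one index per query, at most one range operator, range field last) can be turned into a quick check on a query shape. The following is a minimal sketch in plain JavaScript that encodes those rules exactly as the talk states them. It is not a model of the real MongoDB query planner, and the `fitsIndex` helper and the field names are hypothetical.

```javascript
// Check a query shape against one compound index, using the talk's rules:
//   - at most one range field per query
//   - equality fields must fill the leading positions of the index
//   - the range field, if any, must come right after the equality fields
// fitsIndex and all field names below are illustrative assumptions.
function fitsIndex(indexFields, equalityFields, rangeFields) {
  if (rangeFields.length > 1) return false; // only one range operator
  const needed = equalityFields.length + rangeFields.length;
  if (needed === 0 || needed > indexFields.length) return false;

  // The first N index fields must be exactly the N equality fields.
  const prefix = indexFields.slice(0, equalityFields.length);
  if (!equalityFields.every((field) => prefix.includes(field))) return false;

  // The range field must sit immediately after the equality prefix.
  if (rangeFields.length === 0) return true;
  return indexFields[equalityFields.length] === rangeFields[0];
}

const index = ["user_id", "created_at"];

// Equality on user_id, range on created_at: fits the index.
console.log(fitsIndex(index, ["user_id"], ["created_at"])); // true

// Range only, on a field that is not first in the index: does not fit.
console.log(fitsIndex(index, [], ["created_at"])); // false

// Two range fields: violates the one-range rule.
console.log(fitsIndex(index, ["user_id"], ["created_at", "score"])); // false
```

A check like this is only a planning aid. For a real query, the output of `explain()` on the server is what shows whether an index was used.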