How to Achieve Scale with 
MongoDB 
Jake Angerman 
Sr. Solutions Architect, MongoDB
Today’s Webinar Agenda
Achieve Scale
1. Optimization Tips
   – Schema Design
   – Indexes
   – Monitoring your Workload
2. Scale Vertically
3. Horizontal Scaling
Optimization Tips to 
Scale Your App
Premature Optimization 
• "There is no doubt that the grail of efficiency leads to abuse.
Programmers waste enormous amounts of time thinking about,
or worrying about, the speed of noncritical parts of their
programs, and these attempts at efficiency actually have a strong
negative impact when debugging and maintenance are
considered. We should forget about small efficiencies, say about
97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%."
- Donald Knuth, 1974
Schema Design 
• Document Model 
• Dynamic Schema 
• Collections 
{ "customer_id" : 123,
  "first_name" : "John",
  "last_name" : "Smith",
  "address" : {
    "street": "123 Main Street",
    "city": "Houston",
    "state": "TX",
    "zip_code": "77027"
  },
  policies: [ {
      policy_number : 13,
      description: "short term",
      deductible: 500
    },
    { policy_number : 14,
      description: "dental",
      visits: […]
    } ]
}
The Importance of Schema Design 
• MongoDB schemas are designed the opposite way from relational
schemas!
• Relational Schema: 
– normalize data 
– write complex queries to join the data 
– let the query planner figure out how to make queries efficient 
• MongoDB Schema: 
– denormalize the data 
– create a (potentially complex) schema with prior knowledge of your 
actual (not just predicted) query patterns 
– write simple queries
Real World Example: Optimizing Schema for Scale
Product catalog schema for retailer selling in 20 countries 
{ 
_id: 375, 
en_US: { name: …, description: …, <etc…> }, 
en_GB: { name: …, description: …, <etc…> }, 
fr_FR: { name: …, description: …, <etc…> }, 
fr_CA: { name: …, description: …, <etc…> }, 
de_DE: …, 
de_CH: …, 
<… and so on for other locales …> 
}
What's good about this schema? 
• Each document contains all the data about the 
product across all possible locales. 
• It is the most efficient way to retrieve all translations of 
a product in a single query (English, French, German, 
etc).
But that's not how the data was accessed 
db.catalog.find( { _id: 375 }, { en_US: true } ); 
db.catalog.find( { _id: 375 }, { fr_FR: true } ); 
db.catalog.find( { _id: 375 }, { de_DE: true } ); 
… and so forth for other locales 
The data model did not fit the access pattern.
Why is this inefficient? 
Data in RED are being used. Data in BLUE take up memory but are not in demand.
{ 
_id: 375, 
en_US: { name: …, description: …, <etc…> }, 
en_GB: { name: …, description: …, <etc…> }, 
fr_FR: { name: …, description: …, <etc…> }, 
fr_CA: { name: …, description: …, <etc…> }, 
de_DE: …, 
de_CH: …, 
<… and so on for other locales …> 
} 
{ 
_id: 42, 
en_US: { name: …, description: …, <etc…> }, 
en_GB: { name: …, description: …, <etc…> }, 
fr_FR: { name: …, description: …, <etc…> }, 
fr_CA: { name: …, description: …, <etc…> }, 
de_DE: …, 
de_CH: …, 
<… and so on for other locales …> 
}
Consequences of the schema 
• Each document contained 20x more data than the 
common use case requires 
• Disk IO was too high for the relatively modest query 
load on the dataset 
• MongoDB lets you request a subset of a document's 
contents via projection… 
• … but the entire document must be loaded into RAM 
to service the request
Consequences of the schema redesign 
• Queries induced minimal memory overhead 
• 20x as many distinct products fit in RAM at once 
• Disk IO utilization reduced 
• Application latency reduced 
{ 
_id: "375-en_GB", 
name: …, 
description: …, 
<… the rest of the document …> 
}
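The redesign above can be sketched in plain JavaScript. This is a minimal illustration, not the retailer's actual migration code: the helper name splitByLocale and the sample field values are assumptions, and only the "<product id>-<locale>" _id format comes from the slide.

```javascript
// Hypothetical helper: split one all-locales product document into
// one small document per locale, keyed as "<productId>-<locale>".
function splitByLocale(product) {
  const { _id, ...locales } = product; // everything except _id is a locale
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`,
    ...fields, // name, description, etc. move to the top level
  }));
}

// Sample data (values are made up for illustration).
const product = {
  _id: 375,
  en_US: { name: "Widget", description: "A widget" },
  en_GB: { name: "Widget", description: "A widget, British spelling" },
  fr_FR: { name: "Gadget", description: "Un gadget" },
};

const perLocaleDocs = splitByLocale(product);
console.log(perLocaleDocs.map((d) => d._id));
```

With documents shaped like this, a locale-specific read becomes a lookup on _id alone (for example { _id: "375-en_GB" }), so only the data for that locale has to be in memory.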
Schema Design Patterns 
• Pattern: pre-computing interesting quantities, ideally with each 
write operation 
• Pattern: putting unrelated items in different collections to take 
advantage of indexing 
• Anti-pattern: appending to arrays ad infinitum 
• Anti-pattern: importing relational schemas directly into 
MongoDB
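As a rough sketch of the first pattern (pre-computing with each write), the snippet below builds a $inc update that could accompany every new review, so that a count and a running sum are always ready to read. The field names reviewCount and ratingSum and the applyInc stand-in are assumptions made for illustration; applyInc only mimics $inc in memory so the example runs without a database.

```javascript
// Build the update document to send along with each new review write.
// Field names (reviewCount, ratingSum) are hypothetical.
function buildSummaryUpdate(rating) {
  return { $inc: { reviewCount: 1, ratingSum: rating } };
}

// In-memory stand-in for how $inc changes a stored document
// (a missing field is treated as 0, as $inc does).
function applyInc(doc, update) {
  const result = { ...doc };
  for (const [field, delta] of Object.entries(update.$inc)) {
    result[field] = (result[field] || 0) + delta;
  }
  return result;
}

let summary = { _id: 375 };
for (const rating of [5, 4, 3]) {
  summary = applyInc(summary, buildSummaryUpdate(rating));
}

// Reads use the pre-computed values instead of scanning every review.
const averageRating = summary.ratingSum / summary.reviewCount;
console.log(summary, averageRating);
```

Keeping counters on a summary document, rather than pushing every review into an array on it, also stays clear of the unbounded-array anti-pattern listed above.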
Schema Design Tips 
• Avoid inherently slow operations 
– Updates of unindexed arrays of several thousand elements 
– Updates of indexed arrays of several hundred elements 
– Document moves 
• Arrays are great, but know how to use them
Schema Design resources 
• Blog series, "6 rules of thumb" 
– Part 1: http://goo.gl/TFJ3dr 
– Part 2: http://goo.gl/qTdGhP 
– Part 3: http://goo.gl/JFO1pI
Indexing 
• Indexes are tree-structured sets of references to your 
documents 
• Indexes are the single biggest tunable performance factor in 
the database 
• Indexing and schema design go hand in hand
Indexing Mistakes 
• Failing to build necessary indexes 
• Building unnecessary indexes 
• Running ad-hoc queries in production
Indexing Fixes 
• Failing to build necessary indexes 
– Run .explain(), examine slow query log, mtools, system.profile 
collection 
• Building unnecessary indexes 
– Talk to your application developers about usage 
• Running ad-hoc queries in production 
– Use a staging environment, use secondaries
mongod log files 
Sun Jun 29 06:35:37.646 [conn2] query 
test.docs query: { parent.company: 
"22794", parent.employeeId: "83881" } 
ntoreturn:1 ntoskip:0 nscanned:806381 
keyUpdates:0 numYields: 5 
locks(micros) r:2145254 nreturned:0 
reslen:20 1156ms
mongod log files: anatomy of a log line
• date and time: Sun Jun 29 06:35:37.646
• thread: [conn2]
• operation: query
• n… counters: ntoreturn:1 ntoskip:0 nscanned:806381 nreturned:0
• number of yields: numYields: 5
• lock times: locks(micros) r:2145254
• duration: 1156ms
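To make the parts of the log line concrete, here is a small sketch that pulls the main counters out of the sample line with regular expressions. The patterns are an assumption based only on this one line's layout, not an official grammar for mongod logs; for real analysis a purpose-built tool is the safer choice.

```javascript
// Extract a few fields from a mongod slow-query log line in the format shown above.
// The regular expressions are illustrative assumptions, not an official log grammar.
function parseLogLine(line) {
  const counter = (name) => {
    const m = line.match(new RegExp(name + ":\\s*(\\d+)"));
    return m ? Number(m[1]) : null;
  };
  const thread = line.match(/\[(\w+)\]/);      // e.g. [conn2]
  const duration = line.match(/(\d+)ms\s*$/);  // trailing duration in ms
  return {
    thread: thread ? thread[1] : null,
    nscanned: counter("nscanned"),
    nreturned: counter("nreturned"),
    numYields: counter("numYields"),
    durationMs: duration ? Number(duration[1]) : null,
  };
}

const line =
  'Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { parent.company: ' +
  '"22794", parent.employeeId: "83881" } ntoreturn:1 ntoskip:0 nscanned:806381 ' +
  "keyUpdates:0 numYields: 5 locks(micros) r:2145254 nreturned:0 reslen:20 1156ms";

const parsed = parseLogLine(line);
console.log(parsed);
```

The combination to look for is a large nscanned next to a small nreturned: here 806381 entries were scanned to return 0 documents in 1156 ms.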
You need a tool when doing log file analysis
mtools 
• http://github.com/rueckstiess/mtools 
• log file analysis for poorly performing queries 
– Show me queries that took more than 1000 ms from 6 am to 6 pm:
– mlogfilter mongodb.log --from 06:00 --to 18:00 --slow 1000 > mongodb-filtered.log
Graphing with mtools 
% mplotqueries --type histogram --group namespace --bucketSize 3600
Real World Example: Indexing for Scale 
Sun Jun 29 06:35:37.646 [conn2] query 
test.docs query: { parent.company: 
"22794", parent.employeeId: "83881" } 
ntoreturn:1 ntoskip:0 nscanned:806381 
keyUpdates:0 numYields: 5 
locks(micros) r:2145254 nreturned:0 
reslen:20 1156ms
Document schema 
{ 
_id: ObjectId("53b9ab7e939f1e229b4f574c"), 
firstName: "Alice", 
lastName: "Smith", 
parent: { 
company: 22794, 
employeeId: 83881 
} 
}
But there's an index!?! 
db.system.indexes.find().toArray() 
[{ 
"v" : 1, 
"key" : { 
"company" : 1, 
"employeeId" : 1 
}, 
"ns" : "test.docs", 
"name" : "company_1_employeeId_1" 
}]
This isn't the index you're looking for.
Did you see the problem? 
{ 
_id: ObjectId("53b9ab7e939f1e229b4f574c"), 
firstName: "Alice", 
lastName: "Smith", 
parent: { 
company: 22794, 
employeeId: 83881 
} 
}
The index was created incorrectly 
db.system.indexes.find().toArray() 
[{ 
"v" : 1, 
"key" : { 
"parent.company" : 1, 
"parent.employeeId" : 1 
}, 
"ns" : "test.docs", 
"name" : "parent.company_1_parent.employeeId_1"
}]
Subdocument needed
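The mismatch can be checked mechanically. The sketch below is a deliberately simplified model (it ignores sorts, ranges, and the real query planner): it counts how many leading fields of an index key appear in the query. The function name usablePrefixLength is hypothetical; the field names and index keys are the ones from the slides.

```javascript
// Count how many leading index fields are present in the query.
// Simplified model of the leftmost-prefix rule; not the real query planner.
function usablePrefixLength(queryFields, indexKey) {
  const wanted = new Set(queryFields);
  let matched = 0;
  for (const field of Object.keys(indexKey)) {
    if (!wanted.has(field)) break; // the prefix ends at the first missing field
    matched += 1;
  }
  return matched;
}

const queryFields = ["parent.company", "parent.employeeId"];
const wrongIndex = { company: 1, employeeId: 1 };
const rightIndex = { "parent.company": 1, "parent.employeeId": 1 };

console.log(usablePrefixLength(queryFields, wrongIndex));           // 0
console.log(usablePrefixLength(queryFields, rightIndex));           // 2
console.log(usablePrefixLength(["parent.employeeId"], rightIndex)); // 0
```

The first result is 0 because "company" and "parent.company" are different field paths, which is why the query in the log fell back to scanning. The last result is 0 because a query on the second field alone does not start at the leftmost field of the index.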
Indexing Strategies 
• Create indexes that support your queries! 
• Create highly selective indexes 
• Eliminate duplicate indexes with a compound index, if possible 
– db.collection.ensureIndex({A:1, B:1, C:1}) 
– allows queries using leftmost prefix 
• Order compound index fields thusly: equality, sort, then range 
– see http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/ 
• Create indexes that support covered queries 
• Prevent collection scans in pre-production environments 
– mongod --notablescan 
– db.getSiblingDB("admin").runCommand( { setParameter: 1, notablescan: 1 } )
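The equality, sort, range ordering can be sketched as a small helper that assembles a compound index key in that order. The helper name buildCompoundKey and the fields status, createdAt, and price are hypothetical; the resulting object is the kind of key specification you would pass when creating the index.

```javascript
// Assemble a compound index key in equality, sort, range order.
// Helper and field names are hypothetical, for illustration only.
function buildCompoundKey({ equality = [], sort = {}, range = [] }) {
  const key = {};
  for (const field of equality) key[field] = 1;          // exact-match fields first
  for (const [field, direction] of Object.entries(sort)) {
    key[field] = direction;                              // then sort fields, with direction
  }
  for (const field of range) key[field] = 1;             // range-filtered fields last
  return key;
}

const key = buildCompoundKey({
  equality: ["status"],
  sort: { createdAt: -1 },
  range: ["price"],
});

console.log(JSON.stringify(key)); // {"status":1,"createdAt":-1,"price":1}
```

Whether this ordering is the best choice for a given workload should still be confirmed with .explain() against real data.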
Monitoring Your Workload 
• Log files, iostat, mtools, mongotop are for debugging 
• MongoDB Management Service (MMS) can do metrics 
collection and reporting
What can MMS do?
Database Metrics
Hardware statistics (CPU, disk)
MMS Monitoring Setup
Cloud Version of MMS 
1. Go to http://mms.mongodb.com 
2. Create an account 
3. Install one agent in your datacenter 
4. Add hosts from the web interface 
5. Enjoy!
Today’s Webinar Agenda
Achieve Scale
1. Optimization Tips
2. Scale Vertically
   – Hardware Considerations
3. Horizontal Scaling
Vertical Scaling 
Factors: 
– RAM 
– Disk 
– CPU 
– Network 
[Diagram: Replica Sets (each with a Primary and two Secondaries), contrasting Vertical Scaling with Horizontal Scaling]
Working Set Exceeds Physical Memory
RAM - Measure your working set and index sizes
• db.serverStatus({workingSet:1}).workingSet 
{ "computationTimeMicros": 2751, 
"note": "thisIsAnEstimate", 
"overSeconds": 1084, 
"pagesInMemory": 2041 
} 
• db.stats().indexSize 
2032880640 
• In this example, 
(2041 * 4096) + 2032880640 = 2041240576 bytes 
= 1.9 GB 
• Note: this is a subset of the virtual memory used by mongod
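The arithmetic in this example can be reproduced with a few lines of JavaScript. The 4096-byte page size is the figure used on the slide; the function name estimateRamBytes is a hypothetical helper, and the two inputs are the pagesInMemory and indexSize values shown above.

```javascript
// Reproduce the slide's estimate: working-set pages plus total index size.
const PAGE_SIZE_BYTES = 4096; // page size used in the slide's arithmetic

// Hypothetical helper combining the two measurements.
function estimateRamBytes(pagesInMemory, indexSizeBytes) {
  return pagesInMemory * PAGE_SIZE_BYTES + indexSizeBytes;
}

const bytes = estimateRamBytes(2041, 2032880640);
const gib = bytes / 1024 ** 3; // convert bytes to units of 2^30 bytes

console.log(bytes);          // 2041240576
console.log(gib.toFixed(1)); // "1.9"
```

In this example almost all of the total comes from the indexes; the working-set pages contribute only about 8 MB.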
Real World Example: Vertical Scaling 
• System that tracked status information for entities in the 
business 
• State changes happen in batches; sometimes 10% of entities 
get updated, sometimes 100% get updated
Initial Architecture 
Sharded cluster with 4 shards using spinning disks 
Application / mongos 
mongod
Adding shards to scale horizontally 
• Application was a success! Business entities grew by a factor of 5
• Cluster capacity multiplied by 5, but so did the TCO 
Application / mongos 
mongod 
…16 more shards…
More success means more shards 
• 10x growth means … 200 shards
• Horizontal scaling with sharding is linear scaling, but an order
of magnitude was needed
• Bulk updates of random documents are limited by the speed of
the disks
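The shard counts in this example follow from simple linear scaling, sketched below. It assumes each shard handles a fixed share of the load, which is a simplification; the function name shardsNeeded is hypothetical, and the starting figure of 4 shards comes from the initial architecture.

```javascript
// Linear scaling: shard count grows in proportion to the workload.
// Assumes constant capacity per shard (a simplification).
function shardsNeeded(currentShards, growthFactor) {
  return Math.ceil(currentShards * growthFactor);
}

const initialShards = 4;
const afterFiveX = shardsNeeded(initialShards, 5); // 20 (the original 4 plus 16 more)
const afterTenX = shardsNeeded(afterFiveX, 10);    // 200

console.log(afterFiveX, afterTenX);
```

Because cost grows at the same rate as the shard count, a further 10x of growth on spinning disks would have meant roughly 10x the hardware again.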
Final architecture 
• Scaling the random IOPS with SSDs was a vertical scaling 
approach 
Application / mongos 
mongod SSD
Before you add hardware… 
• Make sure you are solving the right scaling problem 
• Remedy schema and index problems first 
– schema and index problems can look like hardware problems 
• Tune the Operating System 
– ulimits, swap, NUMA, NOOP scheduler with hypervisors 
• Tune the IO subsystem 
– ext4 or XFS vs SAN, RAID10, readahead, noatime 
• See MongoDB "production notes" page 
• Heed logfile startup warnings
Today’s Webinar Agenda
Achieve Scale
1. Optimization Tips
2. Scale Vertically
3. Horizontal Scaling
   – The Basics of Sharding
The basics of Horizontal Scaling (aka Sharding)
The Basics of Sharding
Rule of Thumb 
To make good decisions about 
MongoDB implementations, you 
must understand MongoDB and your 
applications and the workload your 
applications generate and your 
business requirements.
Summary 
• Don't throw hardware at the problem until you examine all 
other possibilities (schema, indexes, OS, IO subsystem) 
• Know what is considered "normal" performance by monitoring 
• Horizontal scaling in MongoDB is implemented with sharding, 
but you must understand schema design and indexing before 
you shard 
Sharding a sub-optimally designed 
database will not make it performant
Today’s Webinar Agenda
Achieve Scale
1. Optimization Tips
   – Schema Design
   – Indexes
   – Monitoring your Workload
2. Scale Vertically
3. Horizontal Scaling
   – The Basics of Sharding
Limited Time: Get Expert Advice for Free 
If you’re thinking about 
scaling, why reinvent the 
wheel? 
Our experts can collaborate 
with you to provide detailed 
guidance. 
Sign Up For a Free One Hour 
Consult: 
http://bit.ly/1rkXcfN
Questions? 
Stay tuned after the webinar and take our survey 
for your chance to win MongoDB schwag.
Thank You 
Jake Angerman 
Sr. Solutions Architect, MongoDB

2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

How to Achieve Scale with MongoDB

  • 1. How to Achieve Scale with MongoDB Jake Angerman Sr. Solutions Architect, MongoDB
  • 2. Today’s Webinar Agenda: Achieve Scale. 1. Optimization Tips (Schema Design, Indexes, Monitoring your Workload); 2. Scale Vertically; 3. Horizontal Scaling
  • 3. Optimization Tips to Scale Your App
  • 4. Premature Optimization • "There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." - Donald Knuth, 1974
  • 7. Schema Design • Document Model • Dynamic Schema • Collections { "customer_id" : 123, "first_name" : "John", "last_name" : "Smith", "address" : { "street": "123 Main Street", "city": "Houston", "state": "TX", "zip_code": "77027" }, policies: [ { policy_number : 13, description: "short term", deductible: 500 }, { policy_number : 14, description: "dental", visits: […] } ] }
  • 8. The Importance of Schema Design • MongoDB schemas are designed in the opposite way from relational schemas! • Relational Schema: – normalize data – write complex queries to join the data – let the query planner figure out how to make queries efficient • MongoDB Schema: – denormalize the data – create a (potentially complex) schema with prior knowledge of your actual (not just predicted) query patterns – write simple queries
  • 9. Real World Example: Optimizing Schema for Scale Product catalog schema for retailer selling in 20 countries { _id: 375, en_US: { name: …, description: …, <etc…> }, en_GB: { name: …, description: …, <etc…> }, fr_FR: { name: …, description: …, <etc…> }, fr_CA: { name: …, description: …, <etc…> }, de_DE: …, de_CH: …, <… and so on for other locales …> }
  • 10. What's good about this schema? • Each document contains all the data about the product across all possible locales. • It is the most efficient way to retrieve all translations of a product in a single query (English, French, German, etc).
  • 11. But that's not how the data was accessed: db.catalog.find( { _id: 375 }, { en_US: true } ); db.catalog.find( { _id: 375 }, { fr_FR: true } ); db.catalog.find( { _id: 375 }, { de_DE: true } ); … and so forth for other locales. The data model did not fit the access pattern.
  • 12. Why is this inefficient? Only the requested locale of each document (shown in RED on the slide) is being used. The remaining locales (shown in BLUE) take up memory but are not in demand. { _id: 375, en_US: { name: …, description: …, <etc…> }, en_GB: { name: …, description: …, <etc…> }, fr_FR: { name: …, description: …, <etc…> }, fr_CA: { name: …, description: …, <etc…> }, de_DE: …, de_CH: …, <… and so on for other locales …> } { _id: 42, en_US: { name: …, description: …, <etc…> }, en_GB: { name: …, description: …, <etc…> }, fr_FR: { name: …, description: …, <etc…> }, fr_CA: { name: …, description: …, <etc…> }, de_DE: …, de_CH: …, <… and so on for other locales …> }
  • 13. Consequences of the schema • Each document contained 20x more data than the common use case requires • Disk IO was too high for the relatively modest query load on the dataset • MongoDB lets you request a subset of a document's contents via projection… • … but the entire document must be loaded into RAM to service the request
  • 14. Consequences of the schema redesign • Queries induced minimal memory overhead • 20x as many distinct products fit in RAM at once • Disk IO utilization reduced • Application latency reduced { _id: "375-en_GB", name: …, description: …, <… the rest of the document …> }
  • 15. Schema Design Patterns • Pattern: pre-computing interesting quantities, ideally with each write operation • Pattern: putting unrelated items in different collections to take advantage of indexing • Anti-pattern: appending to arrays ad infinitum • Anti-pattern: importing relational schemas directly into MongoDB
  • 16. Schema Design Tips • Avoid inherently slow operations – Updates of unindexed arrays of several thousand elements – Updates of indexed arrays of several hundred elements – Document moves • Arrays are great, but know how to use them
  • 17. Schema Design resources • Blog series, "6 rules of thumb" – Part 1: http://goo.gl/TFJ3dr – Part 2: http://goo.gl/qTdGhP – Part 3: http://goo.gl/JFO1pI
  • 18. Indexing • Indexes are tree-structured sets of references to your documents • Indexes are the single biggest tunable performance factor in the database • Indexing and schema design go hand in hand
  • 19. Indexing Mistakes • Failing to build necessary indexes • Building unnecessary indexes • Running ad-hoc queries in production
  • 20. Indexing Fixes • Failing to build necessary indexes – Run .explain(), examine slow query log, mtools, system.profile collection • Building unnecessary indexes – Talk to your application developers about usage • Running ad-hoc queries in production – Use a staging environment, use secondaries
  • 21. mongod log files Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { parent.company: "22794", parent.employeeId: "83881" } ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 numYields: 5 locks(micros) r:2145254 nreturned:0 reslen:20 1156ms
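  • 22. mongod log files, annotated: Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { parent.company: "22794", parent.employeeId: "83881" } ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 numYields: 5 locks(micros) r:2145254 nreturned:0 reslen:20 1156ms. Annotations on the slide: date and time (Sun Jun 29 06:35:37.646), thread ([conn2]), operation (query), n… counters (ntoreturn, ntoskip, nscanned, nreturned), number of yields (numYields), lock times (locks(micros) r:2145254), duration (1156ms)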
  • 23. You need a tool when doing log file analysis
  • 24. mtools • http://github.com/rueckstiess/mtools • log file analysis for poorly performing queries – Show me queries that took more than 1000 ms from 6 am to 6 pm: – mlogfilter mongodb.log --from 06:00 --to 18:00 --slow 1000 > mongodb-filtered.log
  • 25. Graphing with mtools % mplotqueries --type histogram --group namespace --bucketSize 3600
  • 26. Real World Example: Indexing for Scale Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { parent.company: "22794", parent.employeeId: "83881" } ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 numYields: 5 locks(micros) r:2145254 nreturned:0 reslen:20 1156ms
  • 27. Document schema { _id: ObjectId("53b9ab7e939f1e229b4f574c"), firstName: "Alice", lastName: "Smith", parent: { company: 22794, employeeId: 83881 } }
  • 28. But there's an index!?! db.system.indexes.find().toArray() [{ "v" : 1, "key" : { "company" : 1, "employeeId" : 1 }, "ns" : "test.docs", "name" : "company_1_employeeId_1" }]
  • 29. But there's an index!?! db.system.indexes.find().toArray() [{ "v" : 1, "key" : { "company" : 1, "employeeId" : 1 }, "ns" : "test.docs", "name" : "company_1_employeeId_1" }] This isn't the index you're looking for.
  • 30. Did you see the problem? { _id: ObjectId("53b9ab7e939f1e229b4f574c"), firstName: "Alice", lastName: "Smith", parent: { company: 22794, employeeId: 83881 } }
  • 31. The index was created incorrectly; the subdocument path was needed: db.system.indexes.find().toArray() [{ "v" : 1, "key" : { "parent.company" : 1, "parent.employeeId" : 1 }, "ns" : "test.docs", "name" : "parent.company_1_parent.employeeId_1" }]
  • 32. Indexing Strategies • Create indexes that support your queries! • Create highly selective indexes • Eliminate duplicate indexes with a compound index, if possible – db.collection.ensureIndex({A:1, B:1, C:1}) – allows queries using leftmost prefix • Order compound index fields as follows: equality, sort, then range – see http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/ • Create indexes that support covered queries • Prevent collection scans in pre-production environments – mongod --notablescan – db.getSiblingDB("admin").runCommand( { setParameter: 1, notablescan: 1 } )
  • 33. Monitoring Your Workload • Log files, iostat, mtools, mongotop are for debugging • MongoDB Management Service (MMS) can do metrics collection and reporting
  • 38. Cloud Version of MMS 1. Go to http://mms.mongodb.com 2. Create an account 3. Install one agent in your datacenter 4. Add hosts from the web interface 5. Enjoy!
  • 39. Today’s Webinar Agenda: Achieve Scale. 1. Optimization Tips; 2. Scale Vertically (Hardware Considerations); 3. Horizontal Scaling
  • 40. Vertical Scaling Factors: – RAM – Disk – CPU – Network. Horizontal Scaling: [diagram of two replica sets, each with one primary and two secondaries]
  • 41. Working Set Exceeds Physical Memory
  • 42. RAM - Measure your working set and index sizes • db.serverStatus({workingSet:1}).workingSet { "computationTimeMicros": 2751, "note": "thisIsAnEstimate", "overSeconds": 1084, "pagesInMemory": 2041 } • db.stats().indexSize 2032880640 • In this example, (2041 * 4096) + 2032880640 = 2041240576 bytes = 1.9 GB • Note: this is a subset of the virtual memory used by mongod
  • 43. Real World Example: Vertical Scaling • System that tracked status information for entities in the business • State changes happen in batches; sometimes 10% of entities get updated, sometimes 100% get updated
  • 44. Initial Architecture: sharded cluster with 4 shards using spinning disks [diagram: application / mongos in front of the mongod shards]
  • 45. Adding shards to scale horizontally • Application was a success! Business entities grew by a factor of 5 • Cluster capacity multiplied by 5, but so did the TCO [diagram: application / mongos, mongod shards, …16 more shards…]
  • 46. More success means more shards • 10x growth means … 200 shards • Horizontal scaling with sharding is linear scaling, but an order of magnitude was needed • Bulk updates of random documents approach the speed of the disks
  • 47. Final architecture • Scaling the random IOPS with SSDs was a vertical scaling approach [diagram: application / mongos, mongod, SSD]
  • 48. Before you add hardware… • Make sure you are solving the right scaling problem • Remedy schema and index problems first – schema and index problems can look like hardware problems • Tune the Operating System – ulimits, swap, NUMA, NOOP scheduler with hypervisors • Tune the IO subsystem – ext4 or XFS vs SAN, RAID10, readahead, noatime • See MongoDB "production notes" page • Heed logfile startup warnings
  • 49. Today’s Webinar Agenda: Achieve Scale. 1. Optimization Tips; 2. Scale Vertically; 3. Horizontal Scaling (The Basics of Sharding)
  • 50. The basics of Horizontal Scaling
  • 51. The basics of Horizontal Scaling (aka Sharding)
  • 52. The Basics of Sharding
  • 53. Rule of Thumb: To make good decisions about MongoDB implementations, you must understand MongoDB, your applications, the workload your applications generate, and your business requirements.
  • 54. Summary • Don't throw hardware at the problem until you examine all other possibilities (schema, indexes, OS, IO subsystem) • Know what is considered "normal" performance by monitoring • Horizontal scaling in MongoDB is implemented with sharding, but you must understand schema design and indexing before you shard • Sharding a sub-optimally designed database will not make it performant
  • 55. Today’s Webinar Agenda: Achieve Scale. 1. Optimization Tips (Schema Design, Indexes, Monitoring your Workload); 2. Scale Vertically; 3. Horizontal Scaling (The Basics of Sharding)
  • 56. Limited Time: Get Expert Advice for Free If you’re thinking about scaling, why reinvent the wheel? Our experts can collaborate with you to provide detailed guidance. Sign Up For a Free One Hour Consult: http://bit.ly/1rkXcfN
  • 57. Questions? Stay tuned after the webinar and take our survey for your chance to win MongoDB schwag.
  • 58. Thank You Jake Angerman Sr. Solutions Architect, MongoDB
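
The schema redesign on slides 9 to 14 can be sketched in plain JavaScript. This is a minimal sketch, not the retailer's actual migration: the helper name `splitByLocale` and the sample product fields are invented for illustration, and only the `"375-en_GB"` style of `_id` comes from slide 14.

```javascript
// Hypothetical helper: split one all-locales product document (slide 9)
// into one small document per locale (slide 14).
function splitByLocale(product) {
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`, // e.g. "375-en_GB", the format shown on slide 14
    ...fields,
  }));
}

// Sample data invented for illustration only.
const product = {
  _id: 375,
  en_US: { name: "Sweater", description: "A warm sweater" },
  en_GB: { name: "Jumper", description: "A warm jumper" },
  fr_FR: { name: "Pull", description: "Un pull chaud" },
};

const perLocaleDocs = splitByLocale(product);
console.log(perLocaleDocs);
```

With one document per locale, the lookups on slide 11 would become something like `db.catalog.find({ _id: "375-en_GB" })`, so only the requested locale has to be loaded into RAM.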

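
The log line on slides 21 and 22 can also be picked apart without a dedicated tool. The following is a rough sketch in plain JavaScript, assuming the legacy text log format shown on the slide; `parseSlowQuery` is a hypothetical helper, not part of MongoDB or mtools.

```javascript
// Log line copied from slide 21 (legacy mongod text log format).
const line =
  'Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { parent.company: "22794", ' +
  'parent.employeeId: "83881" } ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 ' +
  'numYields: 5 locks(micros) r:2145254 nreturned:0 reslen:20 1156ms';

// Hypothetical helper: extract a few counters and the trailing duration.
function parseSlowQuery(logLine) {
  const counter = (name) => {
    const m = logLine.match(new RegExp(`\\b${name}:\\s*(\\d+)`));
    return m ? Number(m[1]) : null;
  };
  const duration = logLine.match(/(\d+)ms$/);
  return {
    nscanned: counter("nscanned"),
    nreturned: counter("nreturned"),
    numYields: counter("numYields"),
    millis: duration ? Number(duration[1]) : null,
  };
}

const stats = parseSlowQuery(line);
console.log(stats);
```

A large `nscanned` next to `nreturned: 0`, as in this line, is the pattern that led to the index investigation on slides 26 to 31.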
Editor's notes

  1. trap: concern about correctness overrides optimization at scale
  2. importing a relational schema directly into MongoDB is an anti-pattern!
  3. different parts of the world are awake and shopping at a given time
  4. Anti-pattern: embedding highly volatile data in an array
  5. these may look like performance tips instead of schema design tips; a sub-optimal query might be $unwind followed by $match instead of projection
  6. the slow query threshold is 100ms by default
  7. shard key aside
  8. Indexes should be contained in working set.
  9. In this case I had a 50GB database but only ~2GB were needed in RAM
  10. this applies to both vertical and horizontal scaling
  11. The order presented is the order you should analyze
  12. www.mongodb.com/lp/contact/scaling-101
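
Notes 8 and 9 refer to the RAM estimate on slide 42. As a worked check of that arithmetic, here is a small sketch in plain JavaScript; the numbers are copied from the slide, while the function and variable names are made up for this example.

```javascript
// Numbers copied from slide 42.
const PAGE_SIZE_BYTES = 4096;      // page size used in the slide's arithmetic
const pagesInMemory = 2041;        // from db.serverStatus({workingSet:1}).workingSet
const indexSizeBytes = 2032880640; // from db.stats().indexSize

// Hypothetical helper: working-set pages plus index size, in bytes.
function estimateRamBytes(pages, indexBytes, pageSize = PAGE_SIZE_BYTES) {
  return pages * pageSize + indexBytes;
}

const totalBytes = estimateRamBytes(pagesInMemory, indexSizeBytes);
const totalGiB = totalBytes / 2 ** 30;

console.log(totalBytes);          // 2041240576
console.log(totalGiB.toFixed(1)); // "1.9"
```

The result matches the roughly 2 GB mentioned in note 9 for a 50GB database.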