MongoDB
Memory Management
Demystified
Alon Horev
MongoDB World 2014
Meta
Alon Horev
@alonhorev
Why should you care?
MongoDB
Memory Mapped Files
Page Cache
Storage
Storage
                     RAM      SSD      HDD
Throughput (MB/s)    6400     650      1-160
Price per GB         $5.50    $0.50    $0.05
Hardware Configuration
MongoDB
Memory Mapped Files
Page Cache
Storage
Page Cache
Process
User Space Kernel Space
Process
read(fd, *buffer, count)
Page Cache
System
call
Page Cache – Read
Example File
Page 1 Page 2 Page 3
File descriptor
At 2,000
End
at 10,000
offset + count -> pages
Page in cache?
No: read from disk and store in cache
Yes: read from cache and copy to *buffer
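As a rough illustration of the "offset + count -> pages" step, the sketch below works out which pages a read touches. The 4,096-byte page size is an assumption (it is common on Linux, but not universal), and `pages_for_read` is a hypothetical helper, not a kernel or MongoDB function.

```python
PAGE_SIZE = 4096  # assumed page size; check your own system


def pages_for_read(offset, count, page_size=PAGE_SIZE):
    """Return the zero-based indexes of the pages touched by reading
    `count` bytes starting at byte `offset`."""
    if count <= 0:
        return []
    first = offset // page_size
    last = (offset + count - 1) // page_size
    return list(range(first, last + 1))


# The slide's example: file position at byte 2,000, file ends at byte 10,000.
print(pages_for_read(2000, 8000))  # [0, 1, 2], i.e. Page 1, Page 2 and Page 3
```

Each of those pages is then looked up in the page cache, and only the missing ones have to come from disk.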
Page Cache
Process
write(fd, *buffer, count)
System
call
Page Cache – Write
Update page
And mark as dirty
After X seconds
flush to disk
Page Reclamation
LRU – Least Recently Used
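To make the LRU idea concrete, here is a toy cache that evicts the least recently used page when it runs out of room. It is only a sketch: the real kernel uses an approximation of LRU, and `TinyPageCache` and its `read_page_from_disk` callback are hypothetical names.

```python
from collections import OrderedDict


class TinyPageCache:
    """Toy LRU cache keyed by page number (not how the kernel is implemented)."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk  # hypothetical callback
        self.pages = OrderedDict()  # least recently used entry comes first
        self.faults = 0

    def get(self, page_no):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)  # mark as most recently used
            return self.pages[page_no]
        self.faults += 1  # cache miss: the page has to come from disk
        data = self.read_page_from_disk(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # reclaim the least recently used page
        return data


cache = TinyPageCache(capacity=2, read_page_from_disk=lambda n: f"page-{n}")
for n in (1, 2, 1, 3, 2):
    cache.get(n)
print(cache.faults, list(cache.pages))  # 4 [3, 2]
```

Page 2 is evicted when page 3 arrives because page 1 was touched more recently, so the final access to page 2 faults again.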
$ free -g
             total   used   free   cached
Mem:            64     61      3       55
-/+ buffers/cache:      5     58
Swap:           16      0     16
Free
MongoDB
Memory Mapped Files
Page Cache
Storage
Memory Mapped Files
Process File
2000
1000
4000
5000
mmap
Process B
File
Process A
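A small sketch of the same mechanism using Python's standard `mmap` module: a scratch file is mapped into memory, modified through plain memory access, and then read back with an ordinary `read()`. The scratch file and the offsets are made up for illustration; this is not MongoDB's code.

```python
import mmap
import os
import tempfile

# Create a small scratch file to map (a stand-in for a database file).
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\x00" * 8192)
    with mmap.mmap(fd, 0) as mapped:    # length 0 maps the whole file
        mapped[2000:2005] = b"hello"    # a plain memory write...
        mapped.flush()                  # ...written back to the file
        in_memory = bytes(mapped[2000:2005])
    os.lseek(fd, 2000, os.SEEK_SET)
    on_disk = os.read(fd, 5)            # an ordinary read() sees the same bytes
finally:
    os.close(fd)
    os.remove(path)

print(in_memory, on_disk)  # b'hello' b'hello'
```

The process never calls `write()` on the file; the operating system is responsible for carrying the modified page back to storage.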
MongoDB
Memory Mapped Files
Page Cache
Storage
MongoDB
Maps everything: documents, indexes, journal
Running top:
Challenges
No control over what is saved in memory
Warm-up
Expensive queries
Mitigation Plan
Protect MongoDB with an API
Enforce index usage
Pass a query timeout (from 2.6)
Example of a simple API
def find_samples(start_time, end_time):
    # `samples` is the collection object the API wraps
    return samples.find({'time': {'$gte': start_time,
                                  '$lt': end_time}})
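One way the wrapper above could also cover the other two mitigations (enforced index usage and a query timeout) is sketched below. It assumes `samples` is a PyMongo collection passed in by the caller and that an ascending index on `time` exists; the 2,000 ms default is a hypothetical value, not a recommendation from the talk.

```python
DEFAULT_TIMEOUT_MS = 2000  # hypothetical budget; tune it for your workload


def find_samples(samples, start_time, end_time, timeout_ms=DEFAULT_TIMEOUT_MS):
    """Range query on `time` that must use the time index and gives up
    on the server after `timeout_ms` milliseconds."""
    query = {'time': {'$gte': start_time, '$lt': end_time}}
    return (samples.find(query)
                   .hint([('time', 1)])       # assumes an ascending index on `time`
                   .max_time_ms(timeout_ms))  # server-side time limit (MongoDB 2.6+)
```

Because callers can only reach the database through functions like this one, an unindexed or unbounded query cannot evict the working set by accident.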
Challenges
Lack of Inter-process prioritization
Mitigation: isolate mongo
Estimate required memory
How big is the working set?
Working Set
Contains:
Documents
Indexes
Padding (!)
Doc 1 Doc 2 Doc 3
0 4k
Padding
Working Set Analysis
Planning
Monitoring
Planning
db.samples.stats()
dataSize
indexSizes
Cold Warm Hot
Month
Last 2 weeks 1 week 1 week
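A back-of-the-envelope version of this planning step might look like the sketch below. All numbers are invented, `estimate_working_set_gb` is a hypothetical helper, and the estimate ignores padding, so treat the result as a lower bound rather than a sizing rule.

```python
def estimate_working_set_gb(data_size_gb, index_sizes_gb,
                            hot_data_fraction, hot_index_fraction=1.0):
    """Rough working-set estimate: the hot share of the documents
    plus the hot share of the indexes."""
    hot_data = data_size_gb * hot_data_fraction
    hot_indexes = sum(index_sizes_gb.values()) * hot_index_fraction
    return hot_data + hot_indexes


# Made-up figures in the spirit of db.samples.stats(): a month of data,
# of which only the most recent week (about a quarter) is hot.
estimate = estimate_working_set_gb(
    data_size_gb=200.0,
    index_sizes_gb={'_id_': 8.0, 'time_1': 6.0},
    hot_data_fraction=0.25,
)
print(f"{estimate:.1f} GB")  # 64.0 GB
```

Here the indexes are assumed to be fully hot; lowering `hot_index_fraction` models an index where only the recent end is accessed.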
Monitoring
Online
top, iostat
db.currentOp(), mongostat, mongomem
Offline
Profiling collection
MMS/Graphite
Mongomem
Top collections:
local.oplog.rs 11218 / 49865 MB (22.496883%) [25 extents]
samples.quarter 3661 / 219714 MB (1.666450%) [128 extents]
samples.hour 1629 / 10921 MB (14.924107%) [26 extents]
Total resident pages: 16508 / 280500 MB (5.885%)
Mongomem
Procedure:
Stop the database
Clear the page cache:
echo 1 > /proc/sys/vm/drop_caches
Start the database
Run queries that should return fast
Run mongomem!
What to monitor?
Thrashing
Page faults
Disk utilization
Symptoms
Queued queries
High locking ratios
iostat
$ iostat -xm 1 /dev/sda
Device:     r/s     w/s   rMB/s   wMB/s   %util
sda      570.00    0.00   31.28    0.00  100.00
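If you want to alert on this rather than eyeball it, a line of iostat output can be parsed with a few lines of Python. The sketch below works on the trimmed columns shown on the slide (real `iostat -x` output has more columns), and the 95% threshold is an assumption.

```python
def parse_iostat_row(header, row):
    """Pair the column names of an iostat header with the values of one device row."""
    names = header.split()
    values = row.split()
    device = values[0]
    metrics = {name: float(value) for name, value in zip(names[1:], values[1:])}
    return device, metrics


header = "Device: r/s w/s rMB/s wMB/s %util"
row = "sda 570.00 0.00 31.28 0.00 100.00"

device, metrics = parse_iostat_row(header, row)
saturated = metrics["%util"] >= 95.0  # hypothetical alert threshold
print(device, metrics["r/s"], metrics["%util"], saturated)  # sda 570.0 100.0 True
```

A single saturated sample means little; it is the disk staying at 100% over time that points to a working set larger than RAM.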
mongostat
Uses db.serverStatus()
Metrics per second:
Page faults
Queued reads (qr)
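The speaker notes suggest that page faults above one or two hundred per second, or more than a hundred queued reads, are a concern only when they persist. A minimal sketch of that "sustained" check follows; the readings are invented and the run lengths are assumptions.

```python
def sustained(samples, threshold, min_run):
    """True if `samples` contains at least `min_run` consecutive values
    above `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_run:
            return True
    return False


# Made-up per-second readings of the kind mongostat prints.
faults_per_sec = [12, 340, 15, 220, 260, 310, 280, 9]
queued_reads = [0, 3, 1, 120, 150, 130, 110, 2]

print(sustained(faults_per_sec, threshold=200, min_run=4))  # True
print(sustained(queued_reads, threshold=100, min_run=5))    # False
```

The isolated spike of 340 faults is ignored, while the four-second run above 200 is flagged.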
Offline monitoring
MMS/Graphite
Mandatory!
Optimization
Smaller = faster!
Less memory
Higher disk throughput
Schema
Shorten keys
firstName -> first -> f
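To see why key names matter: BSON stores every field name inside every document as a NUL-terminated string, so each character removed from a name is saved once per document. The sketch below counts only those name bytes; the field names other than `firstName` and the document count are made up.

```python
def key_bytes(doc):
    """Bytes spent on top-level field names: the UTF-8 name plus
    the trailing NUL that BSON stores after it."""
    return sum(len(name.encode("utf-8")) + 1 for name in doc)


long_doc = {"firstName": "Ada", "lastName": "Lovelace", "creationTime": 0}
short_doc = {"f": "Ada", "l": "Lovelace", "c": 0}

saved_per_doc = key_bytes(long_doc) - key_bytes(short_doc)
print(saved_per_doc)  # 26

# Scaled to a hypothetical collection of 100 million documents, in GB:
print(saved_per_doc * 100_000_000 / 1e9)
```

The saving is relative to document size, so it matters most for collections of many small documents.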
Size vs. count
Optimizing indices
Unused indices
Sparse
Indices should fit in memory
A
Index on name:
Older Newer
Index on creation_time:
Z
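The difference between the two access patterns can be approximated by modeling an index as a sorted array split into fixed-size pages and counting the pages that queries touch. This is a simplification of a real B-tree, and the entry counts and page fan-out below are invented.

```python
import math


def index_pages_touched(accessed_positions, entries_per_page):
    """Distinct index pages hit, for an index modeled as a sorted array
    split into fixed-size pages."""
    return len({pos // entries_per_page for pos in accessed_positions})


total_entries = 1_000_000
entries_per_page = 100  # hypothetical fan-out
total_pages = math.ceil(total_entries / entries_per_page)

# Index on creation_time: queries hit only the newest 1% of entries.
recent = range(total_entries - 10_000, total_entries)
# Index on name: queries are spread over the whole key space.
spread = range(0, total_entries, 50)

print(index_pages_touched(recent, entries_per_page), "of", total_pages)  # 100 of 10000
print(index_pages_touched(spread, entries_per_page), "of", total_pages)  # 10000 of 10000
```

Under these assumptions the time-ordered index needs about 1% of its pages in memory, while the name index needs all of them.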
Summary
How it works
Challenges
Monitor
Optimize


Editor's Notes

  1. Why should you care about memory management? Memory management has a huge impact on performance and costs. This matters to both developers and DBAs: as a developer you can optimize the schema and queries for better memory usage, and as a DBA you can monitor and predict performance issues related to memory usage. I’m pretty sure every MongoDB administrator has asked themselves at least once: how much memory do I really need? Before we dive in I want to tell you a little secret: MongoDB doesn’t actually manage memory. It leaves that responsibility to the operating system.
  2. Within the operating system there’s a stack of components which MongoDB depends on to manage memory. Each component relies on the component below it. This talk is structured around this stack of components. We’ll start from the low-level components, which are storage devices: disks and RAM. We’ll continue with the page cache and memory mapped files, which are part of the operating system’s kernel. And we’ll finish off with MongoDB’s usage of these mechanisms.
  3. Let’s talk about storage.
  4. There are different types of storage devices with different characteristics. We’ll review hard disk drives, solid state drives and RAM. Let’s start by breaking these into categories: HDDs and SSDs are persistent and RAM isn’t, but RAM is really fast. That’s why every computer has both types of storage: one persistent (an HDD or an SSD) and one volatile (RAM).
  5. Now let’s compare throughput. As I said before, RAM is fast: it can go as fast as 6,400 MB/s for reads and writes. SSDs are about 10 times slower than RAM; modern SSDs can reach a read rate of 650 MB/s and a little less for writes. HDDs are much slower, ranging from 1 MB/s to 160 MB/s for reads and writes. There’s such variance in HDD speed because throughput is highly affected by access patterns. Specifically with HDDs, random access is much slower than sequential access, because an HDD contains a mechanical arm that needs to move on almost every random access. Sadly for us, databases do a lot of random I/O, which means that if you’re running a query on data that’s not in memory, and it therefore has to be read from disk, you’re seeing a penalty of about two orders of magnitude on response times. The next characteristic is price. To make the comparison easier we’ll compare the price per GB. It’s not surprising that there’s a correlation between price and throughput: the more you pay for each GB, the better the throughput you get. So hard drives are really cheap at 5 cents per GB, SSDs are 10 times more expensive than that and RAM is about 100 times more expensive. This slide reveals the tradeoffs between price, capacity and performance, which are key factors in choosing the right hardware configuration.
  6. Is this information sufficient to choose the optimal hardware configuration? I don’t think so: your application’s requirements are also part of the equation. For example, if your application is an archive that saves huge amounts of data that is rarely accessed, you can go for a large HDD and save a lot of money. Later on we’ll see how you can take measurements of things like RAM and capacity, and then you’ll be able to determine what kind of hardware configuration you need.
  7. Before looking at additional tools I want to answer a simple question: how do we know when something is wrong? What do we need to monitor? And since we’re talking about memory, how do we know we don’t have enough of it? Well, the phenomenon of not having enough memory is called thrashing. When the OS is thrashing, it’s because an application is constantly accessing pages that are not in memory, so the OS is busy handling the page faults and reading the pages from disk. So the first thing to monitor is page faults, and since it’s hard to tell how many page faults are too many, you should also look at disk utilization. A lot of other things can go wrong, like many queries being queued and high locking ratios, but these are just symptoms.
  8. I usually use iostat for looking at disk utilization. Here’s an example output of the command: the rightmost column shows the disk utilization and reveals a disk that is busy 100% of the time. The second column shows the disk serving 570 reads per second and the third column shows the number of writes per second, which is zero. If this is happening constantly, the working set does not fit in memory. Along with iostat, I frequently use mongostat.
  9. mongostat comes packaged with MongoDB and uses the underlying serverStatus command. It displays a bunch of interesting metrics like the number of page faults and queued reads. It’s pretty hard to say how many page faults are too many, but more than one or two hundred page faults per second indicates that a lot of data is being read from disk. If this happens over long periods of time it could be an indication the working set does not fit in RAM. If the number of queued reads is larger than a hundred over long periods of time it could also be an indication the working set doesn’t fit in RAM. It’s often important to look at these parameters over time in order to determine if there’s a sudden spike or a repeating problem. This brings me to offline monitoring.
  10. Tools like MMS or Graphite can show you these important metrics over time. Using one of these tools is mandatory for a production system. I cannot tell you how useful they are. Whenever we get a ticket about a performance problem we put our Sherlock hats on and start an investigation. We look at metrics related to our application but also at a lot of metrics related to mongo and how they change over time: we look at the number of queries, the number of documents in collections and tens of other metrics. I’d like to show you an example workflow of a ticket. It was a beautiful morning, 10 A.M., when I got an automated email that one of our shards was misbehaving: it had more than 300 queries just waiting in queue.
  11. I immediately opened Graphite. This is a screenshot of the number of page faults in green and the number of queued readers in blue. By looking at the history you can spot two trends: 1. First, there’s a spike of high load every hour. This is actually normal since we’re doing hourly aggregations of our data. 2. The second trend is a massive rise in page faults and queued queries at exactly 20:00. At this point there’s an impact on users as a lot of queries take a very long time. Why is this happening? Has the working set outgrown memory?
  12. Let’s look at another screenshot of the same time frame. This time we look at other metrics: the number of queries in blue, the number of updates in green and the disk utilization in red. Remember that disk utilization is measured as a percentage, so even though the graph is lower than the others we can still see that at 20:00 the disk was constantly utilized at 100%. When looking at the updates vs. queries it’s obvious that a huge amount of updates is hurting the query performance. We were busy writing to disk. In this case an application change was the root cause of the problem: the application simply started updating a lot more documents. We were able to trace it to the application and later on changed our schema to reduce the document size and the load on disk. This brings me to the next topic, which is optimization.
  13. When optimizing memory usage the main target is to reduce the amount of memory your application requires. The smaller the collections and documents are, the faster the queries will be, not just in terms of memory but also disk: if documents are smaller, less disk access is required to read them. There are several optimizations you can do when it comes to schema. First, shorten the keys. We started with long names like firstName, then shortened them to a single word or acronym, and finally used one or two letters, since it had a huge impact on the size of our data. By shortening the keys we reduced the size of our data by more than 50%. There is a huge downside to doing this because it obscures the data, but fortunately we have an API that hides this ugly implementation detail, so it doesn’t have an impact on our users. Another thing to consider is the tradeoff between the number of documents and their size: in many use cases it’s more efficient to store a small number of large documents than a large number of small ones. The next thing you can optimize is indices.
  14. The first thing you should know is that unused indices are still accessed whenever documents are inserted, updated or deleted. Try to identify those and remove them. Use sparse indices when only some of the documents will have the indexed attribute, as they use less space. The last thing I want to talk about is how much of the index is located in memory. The answer is: it depends. If the entire index is accessed by queries then the entire index should be located in memory. If only a single part of the index is used, only that part has to fit in memory. Let’s look at a few examples to emphasize the difference. You can imagine an index as a segment of memory; the red marks are locations frequently accessed by queries. The first example is an index on a date field called creation_time. Each inserted document has a larger value than all previous ones, so the right-most part of the index is updated. In many such indexes only the recent history is accessed often, so only the right-most part of the index will be located in memory. The second example is an index on a person’s name. The index accesses will probably be distributed evenly across the entire index, so most of it will be located in memory.
  15. So let’s summarize what we’ve learned: 1. We’ve seen how memory management works. We started from the disk and RAM, then went up the stack to the page cache, whose sole purpose is to improve read and write performance by using memory. We continued to memory mapped files, which translate memory accesses like reads and writes to file reads and writes. And we finished with MongoDB’s usage of these mechanisms. 2. We’ve talked about the challenges this strategy presents, like predicting and measuring the size of the working set. 3. We then talked about monitoring, which is something you have to do if you have a DB running in production. 4. We finished with schema and index optimizations, which are crucial for cutting costs and improving performance. I hope you enjoyed my talk and thanks for having me.