MongoDB Memory Management Demystified

MongoDB

MongoDB
Memory Mapped Files
Page Cache
Storage

                     RAM      SSD      HDD
Price per GB         $5.5     $0.5     $0.05
Throughput (MB/s)    6400     650      1-160

Page Cache – Read
Process (user space) calls read(fd, *buffer, count), a system call into kernel space
Example file: Page 1, Page 2, Page 3
File descriptor at 2,000; end at 10,000
offset + count -> pages
Page in cache?
No: read from disk and store in cache
Yes: read from cache and copy to *buffer
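
The read path above can be sketched as a toy model. This is not kernel code: the 4,096-byte page size, the `pages_for_read` helper and the `ToyPageCache` class are assumptions made for illustration only.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def pages_for_read(offset, count, page_size=PAGE_SIZE):
    """Return the page numbers touched by reading `count` bytes at `offset`."""
    if count <= 0:
        return []
    first = offset // page_size
    last = (offset + count - 1) // page_size
    return list(range(first, last + 1))


class ToyPageCache:
    """Toy read cache: a page is fetched from 'disk' on a miss, then reused."""

    def __init__(self, disk_bytes, page_size=PAGE_SIZE):
        self.disk = disk_bytes      # stand-in for the file on disk
        self.page_size = page_size
        self.pages = {}             # page number -> cached bytes
        self.disk_reads = 0         # pages fetched from disk so far

    def read(self, offset, count):
        out = bytearray()
        for page_no in pages_for_read(offset, count, self.page_size):
            if page_no not in self.pages:
                # Miss: read the page from disk and store it in the cache.
                start = page_no * self.page_size
                self.pages[page_no] = self.disk[start:start + self.page_size]
                self.disk_reads += 1
            # Hit (or freshly loaded): copy from the cache.
            out += self.pages[page_no]
        skip = offset % self.page_size  # offset inside the first page
        return bytes(out[skip:skip + count])
```

With the slide's numbers (offset 2,000, end at 10,000) the read spans three pages, so the first call costs three disk reads and a repeated call costs none.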

Page Cache – Write
Process calls write(fd, *buffer, count), a system call
Update the page in the page cache and mark it as dirty
After X seconds, flush to disk
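
A minimal sketch of the write path, assuming a hypothetical `ToyWriteBackCache` class. The 30-second delay is only a placeholder for the slide's "X"; the injectable `clock` exists so the behavior can be exercised without waiting.

```python
import time


class ToyWriteBackCache:
    """Toy write-back cache: writes dirty a page, a later flush persists it."""

    def __init__(self, flush_after_seconds=30.0, clock=time.monotonic):
        self.flush_after = flush_after_seconds  # placeholder for "X seconds"
        self.clock = clock
        self.pages = {}  # page number -> bytes held in the cache
        self.dirty = {}  # page number -> time the page first became dirty
        self.disk = {}   # stand-in for persistent storage

    def write(self, page_no, data):
        # Update the cached page and mark it dirty; nothing hits disk yet.
        self.pages[page_no] = data
        self.dirty.setdefault(page_no, self.clock())

    def flush_expired(self):
        """Write back pages that have been dirty long enough; return them."""
        now = self.clock()
        expired = [p for p, t in self.dirty.items()
                   if now - t >= self.flush_after]
        for page_no in expired:
            self.disk[page_no] = self.pages[page_no]
            del self.dirty[page_no]
        return sorted(expired)
```

The point of the sketch is that `write` returns before any disk I/O happens, which is why a crash between the write and the flush can lose data unless something like a journal covers the gap.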

Page Reclamation
LRU – Least Recently Used
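
LRU reclamation can be illustrated with an ordered dictionary. The `LRUPages` class is a hypothetical teaching aid; real kernels approximate LRU rather than tracking an exact order like this.

```python
from collections import OrderedDict


class LRUPages:
    """Keep at most `capacity` pages; evict the least recently used one."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest access first, newest last

    def touch(self, page_no):
        """Access a page; return the evicted page number, or None."""
        if page_no in self.pages:
            self.pages.move_to_end(page_no)  # now the most recently used
            return None
        self.pages[page_no] = True
        if len(self.pages) > self.capacity:
            evicted, _ = self.pages.popitem(last=False)  # drop the oldest
            return evicted
        return None
```

For example, with a capacity of two, touching pages 1, 2, 1, 3 evicts page 2, because page 1 was used more recently.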

Free
$ free -g
                    total   used   free   cached
Mem:                   64     61      3       55
-/+ buffers/cache:             5     58
Swap:                  16      0     16

Memory Mapped Files
[Diagram: ranges of a process's address space mapped onto ranges of a file; labels 1000, 2000, 4000, 5000]
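
A small sketch of what a memory-mapped file looks like from a process, using Python's standard `mmap` module. The helper name `mmap_roundtrip`, the 8,192-byte file size and the offset 5000 are arbitrary choices for the example.

```python
import mmap
import os
import tempfile


def mmap_roundtrip():
    """Write through a memory mapping, then read the bytes back from the file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\x00" * 8192)          # two 4 KiB pages of zeros
        with mmap.mmap(fd, 8192) as mapped:   # map the whole file read/write
            mapped[5000:5005] = b"hello"      # plain memory write, no write() call
            mapped.flush()                    # ask the OS to write dirty pages back
        with open(path, "rb") as f:
            f.seek(5000)
            return f.read(5)
    finally:
        os.close(fd)
        os.remove(path)
```

Calling `mmap_roundtrip()` returns the five bytes that were stored through the mapping, showing that file contents are reached through ordinary memory access while the operating system decides which pages stay resident.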

MongoDB
Maps everything: documents, indexes, journal
Running top:

Challenges
No control over what is saved in memory
Warm-up
Expensive queries

Mitigation Plan
Protect MongoDB with an API
Enforce index usage
Pass a query timeout (from 2.6)
Example of a simple API:
    def find_samples(start_time, end_time):
        return samples.find({'time': {'$gte': start_time, '$lt': end_time}})
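
The mitigation points (enforce index usage, pass a query timeout) could be folded into the same function along these lines. This is a sketch, not the speaker's code: it assumes a PyMongo-style collection passed in as `samples`, an index named `time_1` on the `time` field, and a one-second limit, all of which are illustrative.

```python
def find_samples(samples, start_time, end_time,
                 index_name="time_1", timeout_ms=1000):
    """Range query on `time` that forces an index and caps server-side run time.

    `samples` is expected to behave like a PyMongo collection; `index_name`
    and `timeout_ms` are assumed defaults for this example.
    """
    query = {"time": {"$gte": start_time, "$lt": end_time}}
    cursor = samples.find(query)
    cursor = cursor.hint(index_name)        # fail rather than scan the collection
    cursor = cursor.max_time_ms(timeout_ms) # server aborts the query after this
    return cursor
```

Keeping every query behind functions like this means an accidental collection scan cannot evict the working set from memory unnoticed.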

Challenges
Lack of Inter-process prioritization
Mitigation: isolate mongo
Estimate required memory
How big is the working set?

Working Set
Contains:
Documents
Indexes
Padding (!)
[Diagram: Doc 1, Doc 2 and Doc 3 laid out within one page (0 to 4k), with padding]

Working Set Analysis
Planning
Monitoring

Planning
db.samples.stats()
dataSize
indexSizes
[Diagram: a month of data divided into Cold, Warm and Hot ranges; labels: Last 2 weeks, 1 week, 1 week]
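
A rough planning calculation, assuming the hot part of the data is a fixed fraction of the total and that all indexes should stay in memory. The function name, the `padding_factor` parameter and the uniform-access assumption are illustrative, not taken from the talk.

```python
def estimate_working_set(data_size, index_sizes, hot_fraction,
                         padding_factor=1.0):
    """Estimate working-set size in bytes.

    data_size      -- total size of the documents, in bytes
    index_sizes    -- mapping of index name to size in bytes
    hot_fraction   -- share of the data that is queried regularly (0.0 to 1.0)
    padding_factor -- assumed multiplier for padding stored around documents
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0.0 and 1.0")
    hot_data = data_size * hot_fraction * padding_factor
    return int(hot_data + sum(index_sizes.values()))
```

For instance, if one week out of a four-week retention period is hot, `hot_fraction` would be 0.25. The result is only a starting point to compare against measured residency.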

Monitoring
Online
top, iostat
db.currentOp(), mongostat, mongomem
Offline
Profiling collection
MMS/Graphite

Mongomem
Top collections:
local.oplog.rs 11218 / 49865 MB (22.496883%) [25 extents]
samples.quarter 3661 / 219714 MB (1.666450%) [128 extents]
samples.hour 1629 / 10921 MB (14.924107%) [26 extents]
Total resident pages: 16508 / 280500 MB (5.885%)

Mongomem
Procedure:
Stop the database
Clear the page cache:
echo 1 > /proc/sys/vm/drop_caches
Start the database
Run queries that should return fast
Run mongomem!

What to monitor?
Thrashing
Page faults
Disk utilization
Symptoms
Queued queries
High locking ratios

iostat
$ iostat -xm 1 /dev/sda
Device:    r/s      w/s    rMB/s   wMB/s   %util
sda        570.00   0.00   31.28   0.00    100.00

mongostat
Uses db.serverStatus()
Metrics per second:
Page faults
Queued reads (qr)
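
A per-second rate like the one mongostat reports can be derived from two `db.serverStatus()` samples. The sketch below assumes the counter lives at `extra_info.page_faults`, which depends on platform and server version, and takes the samples as plain dictionaries.

```python
def page_faults_per_second(previous, current, interval_seconds):
    """Rate of page faults between two serverStatus-style samples.

    Both samples are dicts shaped like {"extra_info": {"page_faults": N}};
    that field path is an assumption of this sketch.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    before = previous["extra_info"]["page_faults"]
    after = current["extra_info"]["page_faults"]
    return (after - before) / interval_seconds
```

A sustained high rate together with queued reads suggests the working set no longer fits in memory.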

Offline monitoring
MMS/Graphite
Mandatory!

Optimization
Smaller = faster!
Less memory
Higher disk throughput
Schema
Shorten keys
firstName -> first -> f
Size vs. count
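
The effect of shortening keys can be estimated with simple arithmetic, since BSON stores each field name inside every document. The helper below is a sketch that assumes the key appears once per document and that data is stored uncompressed.

```python
def key_savings_bytes(old_key, new_key, doc_count):
    """Bytes saved across `doc_count` documents by renaming one field."""
    per_doc = len(old_key.encode("utf-8")) - len(new_key.encode("utf-8"))
    return per_doc * doc_count
```

Renaming `firstName` to `f` saves 8 bytes per document, so a hypothetical collection of one million documents would shrink by about 8 MB for that single field.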

Optimizing indices
Unused indices
Sparse
Indices should fit in memory
Index on name: A ... Z
Index on creation_time: Older ... Newer

Summary
How it works
Challenges
Monitor
Optimize

Understanding how memory is managed with MongoDB is instrumental in maximizing database performance and hardware utilisation. This talk covers the workings of low level operating system components like the page cache and memory mapped files. We will examine the differences between RAM, SSD and hard disk drives to help you choose the right hardware configuration. Finally, we will learn how to monitor and analyze memory and disk usage using the MongoDB Management Service, linux administration commands and MongoDB commands.

MongoDB memory management demystified

Alon Horev

When Mifepristone and Misoprostol are used, 50% of patients complete in 4 to 6 hours; 75% to 80% in 12 hours; and 90% in 24 hours. We use a regimen that allows for completion without the need for surgery 99% of the time. All advanced second trimester and late term pregnancies at our Tampa clinic (17 to 24 weeks or greater) can be completed within 24 hours or less 99% of the time without the need surgery. The procedure is completed with minimal to no complications. Our Women's Health Center located in Abu Dhabi, United Arab Emirates, uses the latest medications for medical abortions (RU-486, Mifeprex, Mifegyne, Mifepristone, early options French abortion pill), Methotrexate and Cytotec (Misoprostol). The safety standards of our Abu Dhabi, United Arab Emirates Abortion Doctors remain unparalleled. They consistently maintain the lowest complication rates throughout the nation. Our Physicians and staff are always available to answer questions and care for women in one of the most difficult times in their lives. The decision to have an abortion at the Abortion Cl

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...

?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@

Modernizing Securities Finance: The cloud-native prime brokerage platform transforming capital markets. Madhu Subbu, Managing Director, Head of Securities Finance Engineering Apidays Singapore 2024: Connecting Customers, Business and Technology (April 17 & 18, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu

apidays

Accelerating FinTech Innovation: Unleashing API Economy and GenAI Vasa Krishnan, Chief Technology Officer - FinResults Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...

apidays

In this session, we will delve into strategic approaches for optimizing knowledge management within Microsoft 365, amidst the evolving landscape of Copilot. From leveraging automatic metadata classification and permission governance with SharePoint Premium, to unlocking Viva Engage for the cultivation of knowledge and communities, you will gain actionable insights to bolster your organization's knowledge-sharing initiatives. In this session, we will also explore how to facilitate solutions to enable your employees to find answers and expertise within Microsoft 365. You will leave equipped with practical techniques and a deeper understanding of how there is more to effective knowledge management than just enabling Copilot, but building actual solutions to prepare the knowledge that Copilot and your employees can use.

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

Drew Madelung

Created by Mozilla Research in 2012 and now part of Linux Foundation Europe, the Servo project is an experimental rendering engine written in Rust. It combines memory safety and concurrency to create an independent, modular, and embeddable rendering engine that adheres to web standards. Stewardship of Servo moved from Mozilla Research to the Linux Foundation in 2020, where its mission remains unchanged. After some slow years, in 2023 there has been renewed activity on the project, with a roadmap now focused on improving the engine’s CSS 2 conformance, exploring Android support, and making Servo a practical embeddable rendering engine. In this presentation, Rakhi Sharma reviews the status of the project, our recent developments in 2023, our collaboration with Tauri to make Servo an easy-to-use embeddable rendering engine, and our plans for the future to make Servo an alternative web rendering engine for the embedded devices industry. (c) Embedded Open Source Summit 2024 April 16-18, 2024 Seattle, Washington (US) https://events.linuxfoundation.org/embedded-open-source-summit/ https://ossna2024.sched.com/event/1aBNF/a-year-of-servo-reboot-where-are-we-now-rakhi-sharma-igalia

A Year of the Servo Reboot: Where Are We Now?

Igalia

The Good, the Bad and the Governed - Why is governance a dirty word? David O'Neill, Chief Operating Officer - APIContext Apidays New York 2024: The API Economy in the AI Era (April 30 & May 1, 2024) ------ Check out our conferences at https://www.apidays.global/ Do you want to sponsor or talk at one of our conferences? https://apidays.typeform.com/to/ILJeAaV8 Learn more on APIscene, the global media made by the community for the community: https://www.apiscene.io Explore the API ecosystem with the API Landscape: https://apilandscape.apiscene.io/

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...

apidays

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood

Juan lago vázquez

Automating Google Workspace (GWS) & more with Apps Script

wesley chun

Recently uploaded (20)

Artificial Intelligence Chap.5 : Uncertainty

AWS Community Day CPH - Three problems of Terraform

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

How to Troubleshoot Apps for the Modern Connected Worker

Architecting Cloud Native Applications

FWD Group - Insurer Innovation Award 2024

Powerful Google developer tools for immediate impact! (2023-24 C)

Data Cloud, More than a CDP by Matt Robison

Corporate and higher education May webinar.pptx

MINDCTI Revenue Release Quarter One 2024

presentation ICT roal in 21st century education

Manulife - Insurer Transformation Award 2024

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...

Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

A Year of the Servo Reboot: Where Are We Now?

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...

Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood

Automating Google Workspace (GWS) & more with Apps Script

MongoDB Memory Management Demystified

1. MongoDB Memory Management Demystified Alon Horev MongoDB World 2014

2. Meta Alon Horev @alonhorev

4. Why should you care?

5. MongoDB Memory Mapped Files Page Cache Storage

6. MongoDB Memory Mapped Files Page Cache Storage

7. Storage RAM SSD HDD

8. Throughput in MB Price per GB 5.5$ 0.5$ 0.05$ 6400 650 1-160

9. Hardware Configuration

10. MongoDB Memory Mapped Files Page Cache Storage

11. Page Cache Process

12. User Space Kernel Space Process read(fd, *buffer, count) Page Cache System call Page Cache – Read Example File Page 1 Page 2 Page 3 File descriptor At 2,000 End at 10,000 Page in cache? offset+count -> pages Read from disk and store in cache Read from cache and copy to *buffer No Yes

13. Disk Page Cache Process write(fd, *buffer, count) System call Page Cache – Write Update page And mark as dirty After X seconds flush to disk

14. Page Reclamation LRU – Least Recently Used

15. $ free -g total used free cached Mem: 64 61 3 55 -/+ buffers/cache: 5 58 Swap: 16 0 16 Free

16. MongoDB Memory Mapped Files Page Cache Storage

17. Memory Mapped Files Process File 2000 1000 4000 5000

18. mmap Process B File Process A

19. MongoDB Memory Mapped Files Page Cache Storage

20. MongoDB Maps everything: documents, indexes, journal Running top:

21. Challenges No control over what is saved in memory Warm-up Expensive queries

22. Mitigation Plan Protect MongoDB with an API Enforce index usage Pass a query timeout (from 2.6) Example of a simple API def find_samples(start_time, end_time): return samples.find({'time': {'$gte': start_time, '$lt': end_time}})

23. Challenges Lack of Inter-process prioritization Mitigation: isolate mongo Estimate required memory How big is the working set?

24. Working Set Contains: Documents Indexes Padding (!) Doc 1 Doc 2 Doc 3 0 4k Padding

25. Working Set Analysis Planning Monitoring

26. Planning db.samples.stats() dataSize indexSizes Cold Warm Hot Month Last 2 weeks 1 week 1 week

27. Monitoring Online top, iostat db.currentOp(), mongostat, mongomem Offline Profiling collection MMS/Graphite

28. Mongomem Top collections: local.oplog.rs 11218 / 49865 MB (22.496883%) [25 extents] samples.quarter 3661 / 219714 MB (1.666450%) [128 extents] samples.hour 1629 / 10921 MB (14.924107%) [26 extents] Total resident pages: 16508 / 280500 MB (5.885%)

29. Mongomem Procedure: Stop the database Clear the page cache: echo 1 > /proc/sys/vm/drop_caches Start the database Run queries that should return fast Run mongomem!

30. What to monitor? Thrashing Page faults Disk utilization Symptoms Queued queries High locking ratios

31. iostat $ iostat -xm 1 /dev/sda Device: r/s w/s rMB/s wMB/s %util sda 570.00 0.00 31.28 0.00 100.00

32. mongostat Uses db.serverStatus() Metrics per second: Page faults Queued reads (qr)

33. Offline monitoring MMS/Graphite Mandatory!

34.

35.

36. Optimization Smaller = faster! Less memory Higher disk throughput Schema Shorten keys firstName -> first -> f Size vs. count

37. Optimizing indices Unused indices Sparse Indices should fit in memory A Index on name: Older Newer Index on creation_time: Z

38. Summary How it works Challenges Monitor Optimize

Editor's Notes

Why should you care about memory management? Memory management has a huge impact on performance and costs. This matters to both developers and DBAs: as a developer you can optimize the schema and queries for better memory usage, and as a DBA you can monitor and predict performance issues related to memory usage. I’m pretty sure every MongoDB administrator has asked themselves at least once: how much memory do I really need? Before we dive in I want to tell you a little secret: MongoDB doesn’t actually manage memory. It leaves that responsibility to the operating system.
Within the operating system there’s a stack of components that MongoDB depends on to manage memory. Each component relies on the component below it. This talk is structured around this stack. We’ll start with the low-level components, the storage devices: disks and RAM. We’ll continue with the page cache and memory-mapped files, which are part of the operating system’s kernel. And we’ll finish with MongoDB’s usage of these mechanisms.
Let’s talk about storage.
There are different types of storage devices with different characteristics. We’ll review hard disk drives, solid state drives and RAM. (!) Let’s start by breaking these into categories: HDDs and SSDs are persistent and RAM isn’t, but RAM is really fast. That’s why every computer has both types of storage: one persistent (an HDD or an SSD) and one volatile (RAM).
Now let’s compare throughput. As I said before, RAM is fast: it can go as fast as 6400 MB/s for reads and writes. SSDs are about 10 times slower than RAM; modern SSDs can reach a read rate of 650 MB/s and a little less for writes. HDDs are much slower, ranging from 1 MB/s to 160 MB/s for reads and writes. The reason there’s such variance in HDD speed is that throughput is highly affected by access patterns. Specifically with HDDs, random access is much slower than sequential access, because an HDD contains a mechanical arm that needs to move on almost every random access. Sadly for us, databases do a lot of random I/O, which means that if you’re running a query on data that’s not in memory and has to be read from disk, you’re seeing a penalty of about two orders of magnitude on response times. The next characteristic is price. (!) To make the comparison easier we’ll compare the price per GB. It’s not surprising that there’s a correlation between price and throughput: the more you pay for each GB, the better the throughput you get. So hard drives are really cheap at 5 cents per GB, SSDs are 10 times more expensive than HDDs and RAM is 100 times more expensive than HDDs. This slide reveals the tradeoffs between price, capacity and performance, which are key factors in choosing the right hardware configuration.
Is this information sufficient to choose the optimal hardware configuration? I think it’s not; your application’s requirements are also part of the equation. For example, if your application is an archive that saves huge amounts of data that is rarely accessed, you can go for a large HDD and save a lot of money. Later on we’ll see how you can take measurements of things like RAM and capacity, and then you’ll be able to determine what kind of hardware configuration you need.
Before looking at additional tools I want to answer a simple question: how do we know when something is wrong? What do we need to monitor? And since we’re talking about memory, how do we know we don’t have enough of it? Well, the phenomenon of not having enough memory is called thrashing. When the OS is thrashing, it’s because an application is constantly accessing pages that are not in memory, so the OS is busy handling the page faults and reading the pages from disk. So the first thing to monitor is page faults, and since it’s hard to tell how many page faults are too many, you should also look at disk utilization. There are a lot of other things that go wrong, like a lot of queries being queued and high locking ratios, but these are just symptoms.
I usually use iostat for looking at disk utilization. Here’s an example output of the command. The rightmost column shows the disk utilization and reveals a disk that is busy 100% of the time. The second column shows that the disk serves 570 reads per second and the third column shows the number of writes per second, which is zero. If this is happening constantly, the working set does not fit in memory. Along with iostat, I frequently use mongostat.
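To make the column reading concrete, here is a minimal Python sketch that parses the sample line from the slide and flags a saturated disk. The function names and the 95% threshold are assumptions made for illustration, and real `iostat -x` output has more columns than the trimmed set shown on the slide.

```python
def parse_iostat_line(header: str, line: str) -> dict:
    """Pair column names from the header row with values from a device row."""
    names = header.split()
    values = line.split()
    row = {"device": values[0]}
    # The header's first token is "Device:", so data columns start at index 1.
    for name, value in zip(names[1:], values[1:]):
        row[name] = float(value)
    return row


def disk_is_saturated(row: dict, threshold: float = 95.0) -> bool:
    """Treat the disk as saturated when %util is at or above the threshold.

    The 95% default is an illustrative assumption, not a value from the talk.
    """
    return row["%util"] >= threshold


# Header and device row copied from the slide.
header = "Device: r/s w/s rMB/s wMB/s %util"
sample = "sda 570.00 0.00 31.28 0.00 100.00"

row = parse_iostat_line(header, sample)
print(row["r/s"], row["w/s"], row["%util"], disk_is_saturated(row))
```

A single saturated sample is not conclusive on its own; as noted above, it is sustained saturation that suggests the working set does not fit in memory.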
Mongostat comes packaged with MongoDB and uses the underlying serverStatus command. It displays a bunch of interesting metrics like the number of page faults and queued reads. It’s pretty hard to say how many page faults are too many, but more than one or two hundred page faults per second indicates that a lot of data is being read from disk. If this happens over long periods of time it could be an indication that the working set does not fit in RAM. If the number of queued reads is larger than a hundred over long periods of time, it could also be an indication that the working set doesn’t fit in RAM. It’s often important to look at these parameters over time in order to determine whether there’s a sudden spike or a repeating problem. This brings me to offline monitoring.
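The per-second numbers can be derived from two serverStatus snapshots taken some seconds apart. The sketch below is a rough illustration under a few assumptions: the snapshots are plain dictionaries with made-up values, the field paths `extra_info.page_faults` and `globalLock.currentQueue.readers` are taken to be where serverStatus reports these counters, and the thresholds are simply the rules of thumb from the paragraph above. With a live server, the snapshots could come from something like PyMongo's `client.admin.command("serverStatus")`.

```python
def per_second_rates(before: dict, after: dict, interval_s: float) -> dict:
    """Turn two serverStatus-style snapshots into mongostat-like metrics."""
    # page_faults is a cumulative counter, so the rate is the delta over time.
    faults_delta = (
        after["extra_info"]["page_faults"] - before["extra_info"]["page_faults"]
    )
    return {
        "page_faults_per_s": faults_delta / interval_s,
        # Queued readers is a point-in-time gauge, so take the latest value.
        "queued_reads": after["globalLock"]["currentQueue"]["readers"],
    }


def looks_like_thrashing(
    rates: dict, max_faults_per_s: float = 200.0, max_queued_reads: int = 100
) -> bool:
    """Apply the talk's rules of thumb to a single sample."""
    return (
        rates["page_faults_per_s"] > max_faults_per_s
        or rates["queued_reads"] > max_queued_reads
    )


# Hypothetical snapshots taken 10 seconds apart.
before = {
    "extra_info": {"page_faults": 10_000},
    "globalLock": {"currentQueue": {"readers": 3}},
}
after = {
    "extra_info": {"page_faults": 13_500},
    "globalLock": {"currentQueue": {"readers": 120}},
}

rates = per_second_rates(before, after, interval_s=10.0)
print(rates, looks_like_thrashing(rates))
```

One sample crossing a threshold only says that something is happening right now; the signal that matters is the same condition persisting over long periods.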
Tools like MMS or Graphite can show you these important metrics over time. Using one of these tools is mandatory for a production system. I cannot tell you how useful they are. Whenever we get a ticket about a performance problem we put our Sherlock hats on and start an investigation. We look at metrics related to our application, but also at a lot of metrics related to mongo and how they change over time: the number of queries, the number of documents in collections and tens of other metrics. I’d like to show you an example workflow of a ticket. It was a beautiful morning, 10 A.M., when I got an automated email that one of our shards was misbehaving: it had more than 300 queries just waiting in queue.
I immediately opened Graphite. This is a screenshot of the number of page faults in green and the number of queued readers in blue. By looking at the history you can spot two trends: 1. First, there’s a spike of high load every hour. This is actually normal since we’re doing hourly aggregations of our data. 2. The second trend is a massive rise in page faults and queued queries at exactly 20:00. At this point there’s an impact on users, as a lot of queries take a very long time. Why is this happening? Has the working set outgrown memory?
Let’s look at another screenshot of the same time frame. This time we look at other metrics: the number of queries in blue, the number of updates in green and the disk utilization in red. Remember that disk utilization is measured as a percentage, so even though the graph is lower than the others we can still see that at 20:00 the disk was constantly utilized at 100%. When looking at the updates vs. queries it’s obvious that a huge number of updates is hurting the query performance. We were busy writing to disk. In this case an application change was the root cause of the problem: the application simply started updating a lot more documents. We were able to trace it to the application and later on changed our schema to reduce the document size and the load on disk. This brings me to the next topic, which is optimization.
When optimizing memory usage the main target is to reduce the amount of memory your application requires. The smaller the collections and documents are, the faster the queries will be. This is not just in terms of memory but also disk: if documents are smaller, less disk access is required to read them. There are several optimizations you can do when it comes to schema. First, shorten the keys. We started with long names like firstName, then shortened them to a single word or acronym, and finally used one or two letters, since it had a huge impact on the size of our data. By shortening the keys we reduced the size of our data by more than 50%. There is a huge downside to doing this because it obscures the data, but fortunately we have an API that hides this ugly implementation detail, so it doesn’t have an impact on our users. Another thing to consider is the tradeoff between the number of documents and their size: in many use cases it’s more efficient to store a smaller number of large documents than a large number of small ones. The next thing you can optimize is indices.
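As a rough illustration of how an API layer can hide shortened keys, here is a small Python sketch. The mapping, the field names other than firstName, and the helper names are hypothetical, and JSON length is used only as a stand-in for the real BSON document size.

```python
import json

# Hypothetical mapping from readable field names to short stored keys.
FIELD_TO_KEY = {"firstName": "f", "lastName": "l", "creationTime": "c"}
KEY_TO_FIELD = {short: long for long, short in FIELD_TO_KEY.items()}


def to_storage(doc: dict) -> dict:
    """Rename readable fields to short keys before writing to the database."""
    return {FIELD_TO_KEY.get(name, name): value for name, value in doc.items()}


def from_storage(doc: dict) -> dict:
    """Restore readable field names after reading from the database."""
    return {KEY_TO_FIELD.get(key, key): value for key, value in doc.items()}


user = {"firstName": "Ada", "lastName": "Lovelace", "creationTime": 1403568000}
stored = to_storage(user)

print(stored)
# JSON length is only a proxy for the stored size, used here for comparison.
print(len(json.dumps(user)), "->", len(json.dumps(stored)))
```

Callers only ever see the readable names, so the short keys stay an implementation detail of the storage layer. The actual saving depends on how large the values are relative to the keys.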
The first thing you should know is that unused indices are still accessed whenever documents are inserted, updated or deleted. Try to identify those and remove them. (!) Use sparse indices when only some of the documents will have the indexed attribute, as they use less space. (!) The last thing I want to talk about is how much of the index is located in memory. The answer is: it depends. If the entire index is accessed by queries then the entire index should be located in memory. If only a part of the index is used, only that part has to fit in memory. Let’s look at a few examples to emphasize the difference. You can imagine an index as a segment of memory; the red marks are locations frequently accessed by queries. (!) The first example is an index on a date field called creation_time. Each inserted document has a value larger than all previous ones, so the rightmost part of the index is updated. In many such indexes only the recent history is accessed often, so only the rightmost part of the index will be located in memory. (!) The second example is an index on a person’s name. The index accesses will probably be distributed evenly across the entire index, so most of it will be located in memory.
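The difference between the two access patterns can be seen in a toy simulation. This is only a sketch under simplifying assumptions: the index is modelled as a sorted array split into equal pages, "recent" means the last 5% of keys, and name lookups are uniformly random. None of these numbers come from the talk.

```python
import random


def pages_touched(positions: list, index_size: int, page_count: int) -> float:
    """Fraction of index pages hit by accesses at the given sorted positions."""
    keys_per_page = index_size // page_count
    touched = {min(pos // keys_per_page, page_count - 1) for pos in positions}
    return len(touched) / page_count


rng = random.Random(42)  # fixed seed so the run is repeatable
INDEX_SIZE = 100_000     # number of keys in the hypothetical index
PAGES = 100              # number of equal-sized pages it is split into
ACCESSES = 1_000

# creation_time index: queries mostly hit the most recent keys (right edge).
recent_start = int(INDEX_SIZE * 0.95)
time_positions = [rng.randrange(recent_start, INDEX_SIZE) for _ in range(ACCESSES)]

# name index: lookups land anywhere in the sorted key space.
name_positions = [rng.randrange(INDEX_SIZE) for _ in range(ACCESSES)]

time_fraction = pages_touched(time_positions, INDEX_SIZE, PAGES)
name_fraction = pages_touched(name_positions, INDEX_SIZE, PAGES)
print(f"creation_time index pages touched: {time_fraction:.0%}")
print(f"name index pages touched: {name_fraction:.0%}")
```

In this model the creation_time accesses can never touch more than the last few pages, while the name accesses end up touching nearly every page, which is why the second kind of index needs to fit in memory almost entirely.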
So let’s summarize what we’ve learned: 1. We’ve seen how memory management works. We started from the disk and RAM, then went up the stack to the page cache, whose sole purpose is to improve read and write performance by using memory. We continued to memory-mapped files, which translate memory accesses like reads and writes into file reads and writes. And we finished with MongoDB’s usage of these mechanisms. 2. We’ve talked about the challenges this strategy presents, like predicting and measuring the size of the working set. 3. We then talked about monitoring, which is something you have to do if you have a DB running in production. 4. We finished with schema and index optimizations, which are crucial for cutting costs and improving performance. I hope you enjoyed my talk and thanks for having me.
