JUNE 2014
Performance Tuning
on the Fly at CMP.LY
Michael De Lorenzo
CTO, CMP.LY Inc.
michael@cmp.ly
@mikedelorenzo
Agenda
• CMP.LY and CommandPost
• What is MongoDB Management Service?
• Performance Tuning
• MongoDB Issues we’ve faced
• Slow response times and delayed writes
• Unindexed queries
• Rising Replication Lag + Falling oplog Window
• Keep your deployment healthy with MMS
• Using MMS Alerts
• Using MMS Backups
CMP.LY: A venture-funded NYC startup that offers proprietary social media monitoring,
measurement, insight and compliance solutions for Fortune 100 companies.
CommandPost: A Monitoring, Measurement & Insights (MMI) tool for managed social
communications.
Use CommandPost to:
• Track and measure across platforms in real time
• Identify and attribute high-value engagement
• Analyze and segment engaged audience
• Optimize content and engagement strategies
• Address compliance needs
What is MongoDB
Management Service?
MongoDB Management Service
• Free MongoDB Monitoring
• MongoDB Backup in the Cloud
• Free cloud service, or available to run on-prem with Standard or
Enterprise subscriptions
• Automation coming soon (FTW!)
Ops
Makes MongoDB easier to use and
manage
Who Is MMS for?
• Developers
• Ops Team
• MongoDB Technical Service Team
Performance Tuning
How To Do Performance Tuning?
• Assess the problem and establish acceptable behavior.
• Measure the performance before modification.
• Identify the bottleneck.
• Remove the bottleneck.
• Measure performance after modification to confirm.
• Keep it or revert it and repeat.
Adapted from [http://en.wikipedia.org/wiki/Performance_tuning]
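The measure, change, measure, then keep-or-revert loop above can be sketched in a few lines. This is only an illustration, not CMP.LY's tooling: `timeIt`, `shouldKeep` and the 10% improvement threshold are hypothetical names and values chosen for the example.

```javascript
// Median wall-clock time (in ms) of `fn` over `runs` executions.
function timeIt(fn, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    fn();
    samples.push(Date.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Keep a change only if it beats the baseline by a minimum fraction
// (hypothetical default: at least 10% faster); otherwise revert it.
function shouldKeep(beforeMs, afterMs, minGain = 0.10) {
  return afterMs <= beforeMs * (1 - minGain);
}

console.log(shouldKeep(100, 80)); // true: 20% faster, keep the change
console.log(shouldKeep(100, 95)); // false: only 5% faster, revert and repeat
```

Using the median rather than the mean keeps one unusually slow run from deciding the outcome.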
What We’ve Faced
Issues We’ve Faced
• Concurrency Issues
• Slow response times and delayed writes
• Querying without indexes
• Slow reads, timeouts
• Increasing Replication Lag + Plummeting oplog Window
Concurrency
Slow responses and delayed writes
Concurrency
• What is it?
• How did it affect us?
• How did MMS help identify it?
• How did we diagnose the issue in our app and fix it?
• Today
Concurrency in MongoDB
• MongoDB uses a readers-writer lock
• Many read operations can share a read lock at the same time
• A write lock is exclusive: a single write operation holds it
• No other read or write operation can share the lock while it is held
• Locks are “writer-greedy”
How Did This Affect Us?
• Slow API response times due to slow database operations
• Delayed writes
• Backed up queues
MMS: Identify Concurrency Issues
Lock % Greater than 100%?!?!?
• Global lock percentage is a derived metric:
% of time in global lock (small number) + % of time locked by the hottest (“most locked”) database
• Because the data is sampled and combined, values over 100% are possible.
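A small sketch of how the derived metric can exceed 100%, assuming the two components are simply added as described above. The function name and the sample percentages are hypothetical, not real MMS data.

```javascript
// Derived lock percentage: global lock % plus the lock % of the
// most-locked ("hottest") database.
function derivedLockPercent(globalLockPct, perDbLockPct) {
  const hottest = Math.max(0, ...Object.values(perDbLockPct));
  return globalLockPct + hottest;
}

// Hypothetical sampled values for two databases.
const sample = derivedLockPercent(4, { content: 98, campaigns: 35 });
console.log(sample); // 102
```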
Diagnosis
• Identified the write-heavy collections in our applications
• Used application logs to identify slow API responses
• Analyzed MongoDB logs to identify slow database queries
Our Remedies
• Schema changes
• Message queues
• Multiple databases
• Sharding
Schema Changes
• Changed our schema
• Allowed for atomic updates
• Customized documents’ _id attribute
• Leveraged existing index on _id attribute
Normalized Schema
// Social Content Collection
{
  _id: "12345",
  _type: "tweet",
  text: "Welcome to #MongoDBWorld!",
  twitter_user: "mongodb"
}
// Campaign Collection
{
  _id: "mongodbworld_campaign",
  name: "MongoDB World"
}
// Campaign Content Collection (joins content + campaigns)
{
  campaign_id: "mongodbworld_campaign",
  content_id: "12345"
}
Denormalized Schema
// Social Content Collection
{
  "_id": "tweet_123456789",
  "text": "Welcome to #MongoDBWorld!",
  "twitter_user": "mongodb",
  "campaigns": ["mongodbworld_campaign"]
}
Modeling for Atomic Operations
Document
{
  "_id": "tweet_123456789",
  "text": "Welcome to #MongoDBWorld!",
  "twitter_user": "mongodb",
  "campaigns": []
}
Update Operation
db.social_content.update(
  { _id: "tweet_123456789" },
  { $addToSet: { campaigns: "mongodbworld_campaign" } }
);
Result
WriteResult({ "nMatched": 1, "nUpserted": 0, "nModified": 1 })
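To show why this update is safe to repeat, here is a plain-JavaScript imitation of what $addToSet does to a single document. It is a simplified stand-in that runs without a database, not MongoDB's implementation; the helper name `addToSet` is hypothetical, and the returned counters only mirror the shape of the WriteResult above.

```javascript
// Simplified imitation of $addToSet on one in-memory document.
// Adds `value` to the array at `field` only if it is not already there.
function addToSet(doc, field, value) {
  if (doc[field] === undefined) doc[field] = [];
  if (doc[field].includes(value)) {
    return { nMatched: 1, nUpserted: 0, nModified: 0 };
  }
  doc[field].push(value);
  return { nMatched: 1, nUpserted: 0, nModified: 1 };
}

const tweet = { _id: "tweet_123456789", campaigns: [] };
const first = addToSet(tweet, "campaigns", "mongodbworld_campaign");
const second = addToSet(tweet, "campaigns", "mongodbworld_campaign");
console.log(first.nModified, second.nModified, tweet.campaigns.length); // 1 0 1
```

Because the second call changes nothing, a retried or duplicated message cannot produce a duplicate campaign entry.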
Message Queues
• Controlled writes to specific collections using Pub/Sub
• We chose Amazon SQS
• Other options include Redis, Beanstalkd, IronMQ or any other message queue
• Created a consistent flow of writes instead of bursts
• Reduced the length and frequency of write locks by controlling the flow and speed of writes
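The idea of turning a burst into a steady flow can be sketched as a consumer that drains a backlog in small fixed-size batches. This is an in-memory illustration only: `drainInBatches` and `writeBatch` are hypothetical names, the array stands in for a real queue such as SQS, and a real consumer would also space the batches out in time.

```javascript
// Drain a backlog in fixed-size batches, one batch per tick.
// `writeBatch` stands in for the real database write.
function drainInBatches(queue, batchSize, writeBatch) {
  let ticks = 0;
  while (queue.length > 0) {
    writeBatch(queue.splice(0, batchSize));
    ticks++;
  }
  return ticks;
}

// A burst of 10 pending writes, applied at most 4 at a time.
const burst = Array.from({ length: 10 }, (_, i) => ({ id: i }));
const batchSizes = [];
const ticks = drainInBatches(burst, 4, (batch) => batchSizes.push(batch.length));
console.log(ticks, batchSizes); // 3 [ 4, 4, 2 ]
```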
Using Multiple Databases
• As of version 2.2, MongoDB implements locks at per-database granularity for
most read and write operations
• Planned to move to the document level in version 2.8
• Moved write-heavy collections to new (separate) databases
Using Sharding
• Improves concurrency by distributing databases across multiple mongod
instances
• Locks are per-mongod instance
Lock %: Today
Queries without Indexes
Slow responses and timeouts
Indexing
• What is it?
• How did it affect us?
• How did MMS help identify it?
• How did we diagnose the issue in our app and fix it?
• Today
Indexing with MongoDB
• Indexes support efficient execution of queries
• Without indexes, MongoDB must scan every document in a collection
• Example
Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0
nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 locks(micros)
r:46877422 nreturned:16 reslen:6948 38172ms
38 seconds! Scanned 17k documents, returned 16
• Create indexes to support all queries, especially common and user-facing ones
• Collection scans can push the entire working set out of RAM
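The numbers quoted from the log line above can be pulled out mechanically. The sketch below parses that exact line with regular expressions; `summarize` is a hypothetical helper, and the patterns assume this older log layout, where the duration in ms is the last token.

```javascript
// The slow-query log line from the example, joined onto one line.
const line =
  "Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 " +
  "nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 locks(micros) " +
  "r:46877422 nreturned:16 reslen:6948 38172ms";

// Extract the three numbers that matter when looking for a missing index.
function summarize(logLine) {
  const nscanned = Number(/nscanned:(\d+)/.exec(logLine)[1]);
  const nreturned = Number(/nreturned:(\d+)/.exec(logLine)[1]);
  const millis = Number(/(\d+)ms$/.exec(logLine)[1]);
  return { nscanned, nreturned, millis, scannedPerReturned: nscanned / nreturned };
}

const summary = summarize(line);
console.log(summary.scannedPerReturned); // 1048.6875
```

A scanned-to-returned ratio far above 1, as here, is a strong hint that the query is not using a suitable index.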
How Did this Affect Us?
• Our web apps became slow
• Queries began to time out
• Longer operations mean longer lock times
MMS: Identifying Indexing Issues
Page Faults
• The number of times that MongoDB requires data not located in physical memory
and must read from virtual memory.
Diagnosis
• Log Analysis
• Use mtools
A collection of scripts to parse and visualize MongoDB log files developed by
MongoDB Engineer Thomas Rueckstiess.
• mlogfilter
• filter logs for slow queries, collection scans, etc.
• mplotqueries
• graph query response times and volumes
• https://github.com/rueckstiess/mtools
Diagnosis
• Monitoring application logs
• Enabling ‘notablescan’ option in development and testing versions of apps
• MongoDB profiling
The MongoDB Profiler
• Collects fine-grained data about MongoDB write operations, cursors and database
commands on a running mongod instance.
• The default slow-operation threshold (slowOpThresholdMs) is 100 ms; it can be changed from the mongo shell
• When enabled, profiling has a minor effect on performance
Our Remedies
• Add indexes!
• Make sure queries are covered
• Utilize the projection specification to limit fields (data) returned
Adding Indexes
• Improved performance for common queries
• Alleviated the need to go to disk for many operations
Projection Specification
Controls the amount of data that needs to be (de-)serialized for use in your app
• We used it to limit data returned in embedded documents and arrays
db.content.find(
  { tweet_id: '12345678' },
  { text: 1, screen_name: 1 }
);
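To make the effect of a projection concrete, the sketch below imitates an inclusion projection on an in-memory document. It is a simplified stand-in, not MongoDB's implementation: `project` is a hypothetical helper, and the stored document, including the `raw_payload` field, is made up for the example.

```javascript
// Imitate an inclusion projection: keep _id plus the fields marked with 1.
function project(doc, projection) {
  const out = { _id: doc._id };
  for (const field of Object.keys(projection)) {
    if (projection[field] === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

// Hypothetical stored document with one large field the app does not need.
const stored = {
  _id: "tweet_123456789",
  tweet_id: "12345678",
  text: "Welcome to #MongoDBWorld!",
  screen_name: "mongodb",
  campaigns: ["mongodbworld_campaign"],
  raw_payload: "x".repeat(2000),
};

const slim = project(stored, { text: 1, screen_name: 1 });
console.log(Object.keys(slim)); // [ '_id', 'text', 'screen_name' ]
console.log(JSON.stringify(slim).length < JSON.stringify(stored).length); // true
```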
Page Faults: Today
Rising Replication Lag +
Falling oplog Window
Replication
• What is it?
• How did it affect us?
• How did MMS help identify it?
• How did we diagnose the issue in our app?
• How did we fix it?
• Today
What is Replication?
• A replica set is a group of mongod
processes that maintain the same data
set.
• Replica sets provide redundancy and
high availability, and are the basis for all
production deployments
What Is the Oplog?
• A special capped collection that keeps a rolling record of all operations that
modify the data stored in your databases.
• Operations are first applied on the primary and then recorded to its oplog.
• Secondary members then copy and apply these operations in an asynchronous
process.
What is Replication Lag?
• A delay between an operation on the primary and the application of that
operation from the oplog to the secondary.
• Effects of excessive lag
• “Lagged” members ineligible to quickly become primary
• Increases the possibility that distributed read operations will be inconsistent.
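Lag as defined above can be estimated by comparing the last applied operation time on each secondary with the primary's. The sketch below does this for an object shaped like the output of rs.status(), using its `members`, `name`, `stateStr` and `optimeDate` fields. The helper name, host names and timestamps are hypothetical.

```javascript
// Lag in seconds of each secondary behind the primary, given an object
// shaped like the output of rs.status().
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Hypothetical two-member status: the secondary is 30 seconds behind.
const status = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2014-06-01T12:00:30Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2014-06-01T12:00:00Z") },
  ],
};
const lags = replicationLagSeconds(status);
console.log(lags[0].name, lags[0].lagSeconds); // db2:27017 30
```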
How did this affect us?
• Degraded overall health of our production deployment.
• Distributed reads are no longer eventually consistent.
• Unable to bring new secondary members online.
• Caused MMS Backups to do full re-syncs.
Identifying Replication Lag Issues
with MMS
The Replication Lag chart displays the lag for your deployment
Diagnosis
• Possible causes of replication lag include network latency, disk throughput,
concurrency and/or the lack of an appropriate write concern
• Size of operations to be replicated
• Confirmed Non-Issues for us
• Network latency
• Disk throughput
• Possible Issues for us
• Concurrency/write concern
• Size of operations is an issue because the entire document is written to the oplog
Concurrency/Write Concern
• Our applications apply many updates very quickly
• All operations need to be replicated to secondary members
• We use the default write concern: Acknowledged (w:1)
• The mongod confirms receipt of the write operation
• Allows clients to catch network, duplicate key and other errors
Concurrency Wasn’t the Issue
Lock Percentage
Operation Size Was the Issue
Collection A (most active)
Total Updates: 3,373
Total Size of updates: 6.5 GB
Activity accounted for nearly 87% of total traffic
Collection B (next most active)
Total Updates: 85,423
Total Size of updates: 740 MB
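Dividing each total by its update count shows how different the two collections are per operation. The arithmetic below uses only the figures above and assumes binary units (1 GB = 1024 MB, 1 MB = 1024 KB); the function name is hypothetical.

```javascript
// Average size of one update in KB, from a total in MB and an update count.
// Assumes 1 MB = 1024 KB.
function avgUpdateKB(totalMB, updates) {
  return (totalMB * 1024) / updates;
}

const collectionA = avgUpdateKB(6.5 * 1024, 3373); // roughly 2 MB per update
const collectionB = avgUpdateKB(740, 85423);       // roughly 9 KB per update
console.log(collectionA.toFixed(0), collectionB.toFixed(1));
console.log((collectionA / collectionB).toFixed(0)); // A's updates are over 200 times larger
```

So Collection A issued far fewer updates, but each one put about 2 MB into the oplog.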
Fast Growing oplog causes issues
Replication oplog Window – approximate hours available in the primary’s oplog
How We Fixed It
• Changed our schema
• Changed the types of updates that were made to documents
• Both allowed us to utilize atomic operations
• Led to smaller updates
• Smaller updates == less oplog space used
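The link between update size and oplog window is simple division: a fixed-size oplog holds as many hours of history as its size divided by the hourly volume written to it. The oplog size and hourly rates below are hypothetical, chosen only to show the effect.

```javascript
// Approximate oplog window: hours of writes a fixed-size oplog can hold.
function oplogWindowHours(oplogSizeMB, oplogMBPerHour) {
  return oplogSizeMB / oplogMBPerHour;
}

// Hypothetical 10 GB oplog, before and after shrinking the updates.
const before = oplogWindowHours(10240, 2048); // large whole-document updates
const after = oplogWindowHours(10240, 128);   // small atomic updates
console.log(before, after); // 5 80
```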
Replication Lag: Today
oplog Window: Today
Keeping Your Deployment
Healthy
MMS Alerts
Watch for Warnings
• Be warned if:
• You are running outdated versions
• You have startup warnings
• A mongod is publicly visible
• Pay attention to these warnings
MMS Backups
• Engineered by MongoDB
• Continuous backup with point-in-time recovery
• Fully managed backups
Using MMS Backups
• Seeding new secondaries
• Repairing replica set members
• Development and testing databases
• Restores are free!
Summary
• Know what’s expected and “normal” in your systems
• Know when and what changes in your systems
• Utilize MMS alerts, visualizations and warnings to keep things running smoothly
Questions?
Michael De Lorenzo
CTO, CMP.LY Inc.
michael@cmp.ly
@mikedelorenzo

Weitere ähnliche Inhalte

Andere mochten auch

Bal Vikas Sangh
Bal Vikas SanghBal Vikas Sangh
Bal Vikas Sanghdivya0021
 
Our Lady Of Perpetual Succour High School
Our Lady Of Perpetual Succour High SchoolOur Lady Of Perpetual Succour High School
Our Lady Of Perpetual Succour High Schooldivya0021
 
Van Bemmel interiors
Van Bemmel interiorsVan Bemmel interiors
Van Bemmel interiorsalexvanbemmel
 
Gurukul event
Gurukul eventGurukul event
Gurukul eventdivya0021
 
Catalogo elledici quaresima pasqua 2011
Catalogo elledici quaresima pasqua 2011Catalogo elledici quaresima pasqua 2011
Catalogo elledici quaresima pasqua 2011Editrice Elledici
 
Trabajo de investigación-Primero Básico
Trabajo de investigación-Primero BásicoTrabajo de investigación-Primero Básico
Trabajo de investigación-Primero BásicoFreddy Caal
 
Clasificación dos materiais
Clasificación dos materiaisClasificación dos materiais
Clasificación dos materiaisCrucifijo19
 
Gladioli nursery
Gladioli nurseryGladioli nursery
Gladioli nurserydivya0021
 
Jaro4sushant
Jaro4sushantJaro4sushant
Jaro4sushantdivya0021
 
Complex regional pain syndrome
Complex regional pain syndromeComplex regional pain syndrome
Complex regional pain syndromedivya0021
 
しみじみサーバーレス
しみじみサーバーレスしみじみサーバーレス
しみじみサーバーレスToru Makabe
 

Andere mochten auch (17)

Euro kids
Euro kidsEuro kids
Euro kids
 
Bal Vikas Sangh
Bal Vikas SanghBal Vikas Sangh
Bal Vikas Sangh
 
Ram kumar ok
Ram kumar okRam kumar ok
Ram kumar ok
 
OLPS
OLPSOLPS
OLPS
 
Our Lady Of Perpetual Succour High School
Our Lady Of Perpetual Succour High SchoolOur Lady Of Perpetual Succour High School
Our Lady Of Perpetual Succour High School
 
Van Bemmel interiors
Van Bemmel interiorsVan Bemmel interiors
Van Bemmel interiors
 
Gurukul event
Gurukul eventGurukul event
Gurukul event
 
Catalogo elledici quaresima pasqua 2011
Catalogo elledici quaresima pasqua 2011Catalogo elledici quaresima pasqua 2011
Catalogo elledici quaresima pasqua 2011
 
Trabajo de investigación-Primero Básico
Trabajo de investigación-Primero BásicoTrabajo de investigación-Primero Básico
Trabajo de investigación-Primero Básico
 
Clasificación dos materiais
Clasificación dos materiaisClasificación dos materiais
Clasificación dos materiais
 
new one
new onenew one
new one
 
Gladioli nursery
Gladioli nurseryGladioli nursery
Gladioli nursery
 
Novità Elledici
Novità  EllediciNovità  Elledici
Novità Elledici
 
Jaro4sushant
Jaro4sushantJaro4sushant
Jaro4sushant
 
this is new
this is newthis is new
this is new
 
Complex regional pain syndrome
Complex regional pain syndromeComplex regional pain syndrome
Complex regional pain syndrome
 
しみじみサーバーレス
しみじみサーバーレスしみじみサーバーレス
しみじみサーバーレス
 

Ähnlich wie Performance Tuning On the Fly at CMP.LY Using MongoDB Management Service

Silicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDBSilicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDBDaniel Coupal
 
Operations for databases: the agile/devops journey
Operations for databases: the agile/devops journeyOperations for databases: the agile/devops journey
Operations for databases: the agile/devops journeyEduardo Piairo
 
Conceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producciónConceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producciónMongoDB
 
Knowledge Processing with Big Data and Semantic Web Technologies
Knowledge Processing with Big Data and  Semantic Web TechnologiesKnowledge Processing with Big Data and  Semantic Web Technologies
Knowledge Processing with Big Data and Semantic Web TechnologiesSyed Muhammad Ali Hasnain
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to MicroservicesMahmoudZidan41
 
DockerDay 2015: From months to minutes - How GE appliances brought docker int...
DockerDay 2015: From months to minutes - How GE appliances brought docker int...DockerDay 2015: From months to minutes - How GE appliances brought docker int...
DockerDay 2015: From months to minutes - How GE appliances brought docker int...Docker-Hanoi
 
Webinar: Capacity Planning
Webinar: Capacity PlanningWebinar: Capacity Planning
Webinar: Capacity PlanningMongoDB
 
Optimize with Open Source
Optimize with Open SourceOptimize with Open Source
Optimize with Open SourceEDB
 
Running MongoDB on AWS
Running MongoDB on AWSRunning MongoDB on AWS
Running MongoDB on AWSMongoDB
 
DevOps and the Future of IT Operations
DevOps and the Future of IT OperationsDevOps and the Future of IT Operations
DevOps and the Future of IT OperationsCorrelsense
 
QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes Abdul Basit Munda
 
Understanding Microservices
Understanding Microservices Understanding Microservices
Understanding Microservices M A Hossain Tonu
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity PlanningNorberto Leite
 
Presentation meetup ElasticSearch Paris #10
Presentation meetup ElasticSearch Paris #10Presentation meetup ElasticSearch Paris #10
Presentation meetup ElasticSearch Paris #10Renaud Boutet
 
DockerCon SF 2015: From Months to Minutes
DockerCon SF 2015: From Months to MinutesDockerCon SF 2015: From Months to Minutes
DockerCon SF 2015: From Months to MinutesDocker, Inc.
 
Scaling apps for the big time
Scaling apps for the big timeScaling apps for the big time
Scaling apps for the big timeproitconsult
 
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...Amazon Web Services
 

Ähnlich wie Performance Tuning On the Fly at CMP.LY Using MongoDB Management Service (20)

Silicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDBSilicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDB
 
Operations for databases: the agile/devops journey
Operations for databases: the agile/devops journeyOperations for databases: the agile/devops journey
Operations for databases: the agile/devops journey
 
Conceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producciónConceptos básicos. Seminario web 6: Despliegue de producción
Conceptos básicos. Seminario web 6: Despliegue de producción
 
Knowledge Processing with Big Data and Semantic Web Technologies
Knowledge Processing with Big Data and  Semantic Web TechnologiesKnowledge Processing with Big Data and  Semantic Web Technologies
Knowledge Processing with Big Data and Semantic Web Technologies
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
DockerDay 2015: From months to minutes - How GE appliances brought docker int...
DockerDay 2015: From months to minutes - How GE appliances brought docker int...DockerDay 2015: From months to minutes - How GE appliances brought docker int...
DockerDay 2015: From months to minutes - How GE appliances brought docker int...
 
Webinar: Capacity Planning
Webinar: Capacity PlanningWebinar: Capacity Planning
Webinar: Capacity Planning
 
Optimize with Open Source
Optimize with Open SourceOptimize with Open Source
Optimize with Open Source
 
Running MongoDB on AWS
Running MongoDB on AWSRunning MongoDB on AWS
Running MongoDB on AWS
 
DevOps and the Future of IT Operations
DevOps and the Future of IT OperationsDevOps and the Future of IT Operations
DevOps and the Future of IT Operations
 
MWLUG 2017 SA110
MWLUG 2017 SA110MWLUG 2017 SA110
MWLUG 2017 SA110
 
QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes
 
NicetoNodeYou
NicetoNodeYouNicetoNodeYou
NicetoNodeYou
 
Understanding Microservices
Understanding Microservices Understanding Microservices
Understanding Microservices
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
Presentation meetup ElasticSearch Paris #10
Presentation meetup ElasticSearch Paris #10Presentation meetup ElasticSearch Paris #10
Presentation meetup ElasticSearch Paris #10
 
DockerCon SF 2015: From Months to Minutes
DockerCon SF 2015: From Months to MinutesDockerCon SF 2015: From Months to Minutes
DockerCon SF 2015: From Months to Minutes
 
Scaling apps for the big time
Scaling apps for the big timeScaling apps for the big time
Scaling apps for the big time
 
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
AWS re:Invent 2016: Moving Mission Critical Apps from One Region to Multi-Reg...
 
Azure Functions
Azure FunctionsAzure Functions
Azure Functions
 

Kürzlich hochgeladen

Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 

Kürzlich hochgeladen (20)

Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 

Performance Tuning On the Fly at CMP.LY Using MongoDB Management Service

  • 1. 1JUNE 2014 Performance Tuning on the Fly at CMP.LY Michael De Lorenzo CTO, CMP.LY Inc. michael@cmp.ly @mikedelorenzo
  • 2. 2JUNE 2014 Agenda • CMP.LY and CommandPost • What is MongoDB Management Service? • Performance Tuning • MongoDB Issues we’ve faced • Slow response times and delayed writes • Unindexed queries • Rising Replication Lag + Falling oplog Window • Keep your deployment healthy with MMS • Using MMS Alerts • Using MMS Backups
  • 3. 3JUNE 2014 A venture-funded NYC startup that offers proprietary social media, monitoring, measurement, insight and compliance solutions for Fortune 100 A Monitoring, Measurement & Insights (MMI) tool for managed social communications.
  • 4. 4JUNE 2014 Use CommandPost to: • Track and measure cross-platform in real-time • Identify and attribute high-value engagement • Analyze and segment engaged audience • Optimize content and engagement strategies • Address compliance needs
  • 5. 5JUNE 2014 What is MongoDB Management Service?
  • 6. 6JUNE 2014 MongoDB Management Service • Free MongoDB Monitoring • MongoDB Backup in the Cloud • Free Cloud service or Available to run On-Prem for Standard or Enterprise Subscriptions • Automation coming soon—FTW! Ops Makes MongoDB easier to use and manage
  • 7. 7JUNE 2014 Who Is MMS for? • Developers • Ops Team • MongoDB Technical Service Team
  • 9. 9JUNE 2014 How To Do Performance Tuning? • Assess the problem and establish acceptable behavior. • Measure the performance before modification. • Identify the bottleneck. • Remove the bottleneck. • Measure performance after modification to confirm. • Keep it or revert it and repeat. Adapted from [http://en.wikipedia.org/wiki/Performance_tuning]
  • 11. 11JUNE 2014 Issues We’ve Faced • Concurrency Issues • Slow response times and delayed writes • Querying without indexes • Slow reads, timeouts • Increasing Replication Lag + Plummeting oplog Window
  • 13. 13JUNE 2014 Concurrency • What is it? • How did it affect us? • How did MMS help identify it? • How did we diagnose the issue in our app and fix it? • Today
  • 14. 14JUNE 2014 Concurrency in MongoDB • MongoDB uses a readers-writer lock • Many read operations can use a read lock • If a write lock exists, a single write lock holds the lock exclusively • No other read or write operations can share the lock • Locks are “writer-greedy”
  • 15. 15JUNE 2014 How Did This Affect Us? • Slow API response times due to slow database operations • Delayed writes • Backed up queues
  • 16. 16JUNE 2014 MMS: Identify Concurrency Issues
  • 17. 17JUNE 2014 Lock % Greater than 100%?!?!? • Global lock percentage is a derived metric: % of time in global lock (small number) + % of time locked by hottest (“most locked”) database • Data is sampled and combined, it is possible to see values over 100%.
  • 18. 18JUNE 2014 Diagnosis • Identified the write-heavy collections in our applications • Used application logs to identify slow API responses • Analyzed MongoDB logs to identify slow database queries
  • 19. 19JUNE 2014 Our Remedies • Schema changes • Message queues • Multiple databases • Sharding
  • 20. 20JUNE 2014 Schema Changes • Changed our schema • Allowed for atomic updates • Customized documents’ _id attribute • Leveraged existing index on _id attribute
  • 21. 21JUNE 2014 Normalized Schema // Social Content Collection { _id: "12345", _type: "tweet", text: "Welcome to #MongoDBWorld!", twitter_user: "mongodb" } // Campaign Collection { _id: "mongodbworld_campaign", name: "MongoDB World" } // Campaign Content Collection (joins content + campaigns) { campaign_id: "mongodbworld_campaign", content_id: "12345" }
  • 22. 22JUNE 2014 Denormalized Schema // Social Content Collection { "_id": "tweet_123456789", "text": "Welcome to #MongoDBWorld!", "twitter_user": "mongodb", "campaigns": ["mongodbworld_campaign"] }
  • 23. 23JUNE 2014 Modeling for Atomic Operations Document { "_id": "tweet_123456789", "text": "Welcome to #MongoDBWorld!", "twitter_user": "mongodb", "campaigns": [ ] } Update Operation db.social_content.update( { _id: "tweet_123456789" }, { $addToSet: { campaigns: "mongodbworld_campaign" } } ); Result WriteResult({ "nMatched": 1, "nUpserted": 0, "nModified": 1 })
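Why $addToSet suits this model: applying the same update twice leaves the array unchanged, so retried or duplicated writes are harmless. The sketch below is an in-memory emulation of that behavior for illustration only. It is not MongoDB's implementation, and the helper name `addToSet` is an assumption.

```javascript
// In-memory emulation of $addToSet semantics, for illustration only.
// It returns an object with the same fields as the WriteResult on slide 23.
// Assumes doc[field] is either missing or already an array.
function addToSet(doc, field, value) {
  if (doc[field] === undefined) {
    doc[field] = [];
  }
  if (doc[field].includes(value)) {
    // Value already present: the document matches but is not modified.
    return { nMatched: 1, nUpserted: 0, nModified: 0 };
  }
  doc[field].push(value);
  return { nMatched: 1, nUpserted: 0, nModified: 1 };
}

const tweetDoc = { _id: "tweet_123456789", campaigns: [] };
const firstWrite = addToSet(tweetDoc, "campaigns", "mongodbworld_campaign");
const secondWrite = addToSet(tweetDoc, "campaigns", "mongodbworld_campaign");
console.log(firstWrite.nModified, secondWrite.nModified, tweetDoc.campaigns.length);
```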
  • 24. 24JUNE 2014 Message Queues • Controlled writes to specific collections using Pub/Sub • We chose Amazon SQS • Other options include Redis, Beanstalkd, IronMQ or any other message queue • Created consistent flow of writes versus bursts • Reduced length and frequency of write locks by controlling flow/speed of writes
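The "consistent flow of writes versus bursts" idea can be simulated without any queue service. The sketch below is not the CMP.LY implementation and does not use the SQS API. The function name, tick model and numbers are all assumptions chosen to show how a fixed drain rate flattens bursts.

```javascript
// Simulates a consumer that drains a queue at a fixed maximum rate.
// arrivals[i] = number of write requests produced during tick i.
// Returns the number of writes applied to the database in each tick.
function drainAtFixedRate(arrivals, maxWritesPerTick) {
  if (maxWritesPerTick <= 0) {
    throw new Error("maxWritesPerTick must be positive");
  }
  const applied = [];
  let backlog = 0;
  let tick = 0;
  // Keep going until all arrivals are seen and the backlog is empty.
  while (tick < arrivals.length || backlog > 0) {
    backlog += tick < arrivals.length ? arrivals[tick] : 0;
    const n = Math.min(backlog, maxWritesPerTick);
    applied.push(n);
    backlog -= n;
    tick += 1;
  }
  return applied;
}

// Two bursts of 50 and 40 writes become a steady stream of at most 15 per tick.
const burstyArrivals = [50, 0, 0, 40, 0, 0];
const smoothedWrites = drainAtFixedRate(burstyArrivals, 15);
console.log(smoothedWrites);
```

The trade-off is latency: every write is still applied, but some wait in the queue instead of competing for the write lock at the same moment.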
  • 25. 25JUNE 2014 Using Multiple Databases • As of version 2.2, MongoDB implements locks at a per database granularity for most read and write operations • Planned to be at the document level in version 2.8 • Moved write-heavy collections to new (separate) databases
  • 26. 26JUNE 2014 Using Sharding • Improves concurrency by distributing databases across multiple mongod instances • Locks are per-mongod instance
  • 28. 28JUNE 2014 Queries without Indexes Slow responses and timeouts
  • 29. 29JUNE 2014 Indexing • What is it? • How did it affect us? • How did MMS help identify it? • How did we diagnose the issue in our app and fix it? • Today
  • 30. 30JUNE 2014 Indexing with MongoDB • Support for efficient execution of queries • Without indexes, MongoDB must scan every document • Example: Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 locks(micros) r:46877422 nreturned:16 reslen:6948 38172ms • 38 seconds! Scanned ~17k documents, returned 16 • Create indexes to cover all queries, especially those that support common and user-facing operations • Collection scans can push the entire working set out of RAM
  • 31. 31JUNE 2014 How Did this Affect Us? • Our web apps became slow • Queries began to time out • Longer operations mean longer lock times
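The numbers called out on slide 30 can be pulled from the log line mechanically. The helper below is a minimal sketch (the name `parseSlowQuery` is an assumption) that only understands the fields visible in this one example line. For real log analysis the mtools scripts mentioned later in the deck are the better fit.

```javascript
// Extracts nscanned, nreturned and the duration from a slow query log line
// in the format shown on slide 30. Missing fields come back as null.
function parseSlowQuery(line) {
  const num = (re) => {
    const m = line.match(re);
    return m ? Number(m[1]) : null;
  };
  const nscanned = num(/\bnscanned:(\d+)/);
  const nreturned = num(/\bnreturned:(\d+)/);
  const millis = num(/\b(\d+)ms\s*$/);
  return {
    nscanned,
    nreturned,
    millis,
    // Documents scanned per document returned; a large value suggests a missing index.
    scannedPerReturned: nscanned !== null && nreturned ? nscanned / nreturned : null,
  };
}

const slowLogLine =
  "Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 " +
  "nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 " +
  "locks(micros) r:46877422 nreturned:16 reslen:6948 38172ms";
const parsedSlowQuery = parseSlowQuery(slowLogLine);
console.log(parsedSlowQuery);
```

For this line the query scanned more than a thousand documents for each one it returned.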
  • 32. 32JUNE 2014 MMS: Identifying Indexing Issues Page Faults • The number of times that MongoDB requires data not located in physical memory, and must read from virtual memory.
  • 33. 33JUNE 2014 Diagnosis • Log Analysis • Use mtools A collection of scripts to parse and visualize MongoDB log files developed by MongoDB Engineer Thomas Rueckstiess. • mlogfilter • filter logs for slow queries, collection scans, etc. • mplotqueries • graph query response times and volumes • https://github.com/rueckstiess/mtools
  • 34. 34JUNE 2014 Diagnosis • Monitoring application logs • Enabling ‘notablescan’ option in development and testing versions of apps • MongoDB profiling
  • 35. 35JUNE 2014 The MongoDB Profiler • Collects fine-grained data about MongoDB write operations, cursors and database commands on a running mongod instance. • Default slow operation threshold (slowms) is 100ms; it can be changed from the mongo shell • When enabled, profiling has a minor effect on performance
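A minimal sketch of turning the profiler on and reading back slow operations from the mongo shell. The wrapper names are hypothetical, and `db` is passed in as a parameter so the helpers can also be exercised against a stub. The shell methods used are `db.setProfilingLevel(level, slowms)` and a query on the `system.profile` collection.

```javascript
// db is assumed to be the mongo shell's database handle.
function profileSlowOps(db, slowMs) {
  // Level 1 records only operations slower than slowMs milliseconds.
  return db.setProfilingLevel(1, slowMs);
}

function recentSlowOps(db, slowMs, limit) {
  // Profiled operations land in the system.profile collection;
  // millis is the operation duration and ts its timestamp.
  return db.system.profile
    .find({ millis: { $gt: slowMs } })
    .sort({ ts: -1 })
    .limit(limit);
}

// In the shell, for example:
//   profileSlowOps(db, 100);
//   recentSlowOps(db, 100, 10).forEach(printjson);
```

Level 1 keeps the overhead low because only operations above the threshold are recorded.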
  • 36. 36JUNE 2014 Our Remedies • Add indexes! • Make sure queries are covered • Utilize the projection specification to limit fields (data) returned
  • 37. 37JUNE 2014 Adding Indexes • Improved performance for common queries • Alleviates the need to go to disk for many operations
  • 38. 38JUNE 2014 Projection Specification Controls the amount of data that needs to be (de-)serialized for use in your app • We used it to limit data returned in embedded documents and arrays db.content.find( { tweet_id: '12345678' }, { text: 1, screen_name: 1 });
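Indexes and projections combine into a covered query when the index contains every field the query filters on and returns. The sketch below assumes a compound index on the fields from slide 38. The helper names and the index itself are illustrative, not the indexes CMP.LY actually created. Shells from the 2014 era expose `ensureIndex` rather than `createIndex`.

```javascript
// coll is assumed to be a mongo shell collection handle such as db.content.
function createTweetSummaryIndex(coll) {
  // The index holds the filter field plus both returned fields.
  return coll.createIndex({ tweet_id: 1, text: 1, screen_name: 1 });
}

function findTweetSummary(coll, tweetId) {
  // _id is excluded because it is not part of the index above; leaving it in
  // would force MongoDB to fetch the full document.
  return coll.find({ tweet_id: tweetId }, { _id: 0, text: 1, screen_name: 1 });
}

// In the shell, for example:
//   createTweetSummaryIndex(db.content);
//   findTweetSummary(db.content, "12345678");
```

Running `explain()` on the returned cursor is one way to check whether the query was answered from the index alone; the exact output fields depend on the server version.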
  • 40. 40JUNE 2014 Rising Replication Lag + Falling oplog Window
  • 41. 41JUNE 2014 Replication • What is it? • How did it affect us? • How did MMS help identify it? • How did we diagnose the issue in our app? • How did we fix it? • Today
  • 42. 42JUNE 2014 What is Replication? • A replica set is a group of mongod processes that maintain the same data set. • Replica sets provide redundancy and high availability, and are the basis for all production deployments
  • 43. 43JUNE 2014 What Is the Oplog? • A special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. • Operations are first applied on the primary and then recorded to its oplog. • Secondary members then copy and apply these operations in an asynchronous process.
  • 44. 44JUNE 2014 What is Replication Lag? • A delay between an operation on the primary and the application of that operation from the oplog to the secondary. • Effects of excessive lag • “Lagged” members ineligible to quickly become primary • Increases the possibility that distributed read operations will be inconsistent.
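Replication lag can be estimated by comparing each secondary's last applied operation time with the primary's. The sketch below assumes an object shaped like the output of `rs.status()`, using the `members[].name`, `stateStr` and `optimeDate` fields. The function name and the sample hosts are made up.

```javascript
// status is assumed to look like the document returned by rs.status().
// Returns one entry per secondary, or null when no primary is visible.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    return null;
  }
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      // Subtracting Date objects yields milliseconds.
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Made-up sample resembling rs.status() output.
const sampleStatus = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2014-06-01T12:00:30Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2014-06-01T12:00:00Z") },
    { name: "db3:27017", stateStr: "SECONDARY", optimeDate: new Date("2014-06-01T12:00:28Z") },
  ],
};
const lagReport = replicationLagSeconds(sampleStatus);
console.log(lagReport);
```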
  • 45. 45JUNE 2014 How did this affect us? • Degraded overall health of our production deployment. • Distributed reads are no longer eventually consistent. • Unable to bring new secondary members online. • Caused MMS Backups to do full re-syncs.
  • 46. 46JUNE 2014 Identifying Replication Lag Issues with MMS The Replication Lag chart displays the lag for your deployment
  • 47. 47JUNE 2014 Diagnosis • Possible causes of replication lag include network latency, disk throughput, concurrency and/or appropriate write concern • Size of operations to be replicated • Confirmed Non-Issues for us • Network latency • Disk throughput • Possible Issues for us • Concurrency/write concern • Size of op is an issue because entire document is written to oplog
  • 48. 48JUNE 2014 Concurrency/Write Concern • Our applications apply many updates very quickly • All operations need to be replicated to secondary members • We use the default write concern, Acknowledged (w:1) • The mongod confirms receipt of the write operation • Allows clients to catch network, duplicate key and other errors
  • 49. 49JUNE 2014 Concurrency Wasn’t the Issue Lock Percentage
  • 50. 50JUNE 2014 Operation Size Was the Issue Collection A (most active) Total Updates: 3,373 Total Size of updates: 6.5 GB Activity accounted for nearly 87% of total traffic Collection B (next most active) Total Updates: 85,423 Total Size of updates: 740 MB
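The figures on slide 50 become clearer as an average size per update. The arithmetic below takes GB and MB as binary units (1024-based), which is an assumption; the slide does not say which convention was used.

```javascript
// Unit sizes are an assumption: the slide does not specify binary vs decimal.
const BYTES_PER_GB = 1024 ** 3;
const BYTES_PER_MB = 1024 ** 2;

function avgUpdateBytes(totalBytes, updateCount) {
  return totalBytes / updateCount;
}

// Collection A: 3,373 updates totalling 6.5 GB.
const avgBytesA = avgUpdateBytes(6.5 * BYTES_PER_GB, 3373);
// Collection B: 85,423 updates totalling 740 MB.
const avgBytesB = avgUpdateBytes(740 * BYTES_PER_MB, 85423);

console.log(Math.round(avgBytesA), Math.round(avgBytesB), Math.round(avgBytesA / avgBytesB));
```

Collection A averages roughly 2 MB per update against roughly 9 KB for Collection B, so each update to A costs a couple of hundred times more oplog space even though A receives far fewer updates.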
  • 51. 51JUNE 2014 Fast Growing oplog causes issues Replication oplog Window – approximate hours available in the primary’s oplog
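The oplog window is roughly the oplog's size divided by the rate at which new entries are written to it. The sketch below uses hypothetical numbers, not CMP.LY's real figures, and it approximates the idea rather than reproducing the formula MMS uses for its chart.

```javascript
// Approximate hours of history the oplog can hold at a steady write rate.
function oplogWindowHours(oplogSizeBytes, oplogBytesPerHour) {
  if (oplogBytesPerHour <= 0) {
    return Infinity; // nothing is being written, so nothing rolls off
  }
  return oplogSizeBytes / oplogBytesPerHour;
}

const GB_BYTES = 1024 ** 3;
// Hypothetical 10 GB oplog with large updates filling 2 GB per hour...
const windowBefore = oplogWindowHours(10 * GB_BYTES, 2 * GB_BYTES);
// ...versus smaller updates filling 0.25 GB per hour.
const windowAfter = oplogWindowHours(10 * GB_BYTES, 0.25 * GB_BYTES);
console.log(windowBefore, windowAfter);
```

A secondary that falls further behind than this window can no longer catch up from the oplog and needs a full resync.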
  • 52. 52JUNE 2014 How We Fixed It • Changed our schema • Changed the types of updates that were made to documents • Both allowed us to utilize atomic operations • Led to smaller updates • Smaller updates == less oplog space used
  • 55. 55JUNE 2014 Keeping Your Deployment Healthy
  • 57. 57JUNE 2014 Watch for Warnings • MMS warns you if • You are running outdated versions • You have startup warnings • A mongod is publicly visible • Pay attention to these warnings
  • 58. 58JUNE 2014 MMS Backups • Engineered by MongoDB • Continuous backup with point-in-time recovery • Fully managed backups
  • 59. 59JUNE 2014 Using MMS Backups • Seeding new secondaries • Repairing replica set members • Development and testing databases • Restores are free!
  • 60. 60JUNE 2014 Summary • Know what’s expected and “normal” in your systems • Know when and what changes in your systems • Utilize MMS alerts, visualizations and warnings to keep things running smoothly
  • 61. 61JUNE 2014 Questions? Michael De Lorenzo CTO, CMP.LY Inc. michael@cmp.ly @mikedelorenzo

Editor's Notes

  1. Free MongoDB Monitoring - mongodb specific metrics, visualization of performance, custom alerting Backup - industrial strength, point-in-time recovery, free usage tier
  2. Developers, what we’re focused on today – track bottlenecks Ops team :: great for small teams where your developers are also part of your ops team (DevOps) – monitor health of clusters, backup dbs, automate updates and add capacity MongoDB technical service team :: helps them help you Important for us because we maintain a small tech team
  3. PRO-TIP: Know what is “normal” for your system. Know what changed when something happens, what you expect normal behavior to be, and what your normal MMS metrics are
  4. readers-writer lock allows concurrent read access to the db, but exclusive access to a single write “Writer-greedy” - When both a read and write are waiting for a lock, MongoDB grants the lock to the write. The exclusivity of write locks is one of the keys to why getting our lock % under control is so important.
  5. Lock %: time spent in write lock state; it is the sum of global lock + hottest database at that time, which can make the value > 100% Our Issue: Primary database maintaining a write lock of 150-175%
  6. Global lock percentage has remained about the same Primary client-facing database has seen lock % drop
  7. Developed by a MongoDB engineer
  8. - Purple bar indicates downtime
  9. - Alerts for down hosts, down agents and more
  10. - According to Technical Services, in many cases fixing warnings will fix issues