SlideShare ist ein Scribd-Unternehmen logo
1 von 57
Technical Support Manager, North America @ 10gen
Nicholas Tang
#MongoDB
Performance Tuning and
Monitoring Using MMS
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Agenda
• What is MMS?
• Why use it?
• Setting it up and getting around
• Performance and monitoring (the fun stuff)
• Wrap up
What is MMS?
Performance Tuning and Monitoring Using MMS, Nicholas Tang
What is MMS?
The MongoDB Monitoring Service: a free
service (or software) for monitoring and
management
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Metric collection and
reporting
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Alerting
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Event Tracking
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Logs and Profile data
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Hardware stats (CPU, disk)
Performance Tuning and Monitoring Using MMS, Nicholas Tang
DB stats
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Basic user management
What’s in it for me?
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Why?
• Great high level view + detailed metrics
• Low effort, high-return
• Makes it easier for us to help you!
• Makes you more attractive, promotes bone
strength and muscle tone *
* - these last points still under review
How do I use this crazy
thing?
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Setting it up
http://mms.10gen.com/help/monitoring/tutorial/
• Setup an account
• Install the agent
• Add your hosts
• Optional: hardware stats through munin-node
• Optional: enable logging and profiling
• More info:
http://mms.10gen.com/help/monitoring/install/
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Notes
• Agent written in Python (moving to Go)
• Failover: run multiple agents (1 primary)
• Hosts: use CNAMEs, especially on AWS!
• You can use a group per env (each needs an
agent)
• Connections are over SSL
• On-Premise solution for Enterprise customers
that don’t want to use the hosted service
Performance tuning
and monitoring
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Finding the bottleneck
Source:http://www.flickr.com/photos/laenulfean/462715479/
Performance Tuning and Monitoring Using MMS, Nicholas Tang
What is performance tuning?
1. Assess the problem and establish acceptable behavior
2. Measure the current performance
3. Find the bottleneck*
4. Remove the bottleneck
5. Re-test to confirm
6. Lather, rinse, repeat
* - (This is often the hard part)
(Adapted from http://en.wikipedia.org/wiki/Performance_tuning )
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Pro-Tip: know thyself
You have to recognize normal to know when it
isn’t.
Source:http://www.flickr.com/photos/skippy/6853920/
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Some handy metrics to watch
• Memory usage
• Opcounters
• Lock %
• Queues
• Background flush average
• Replication stats
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: replication lag
Scenario:
Customer reports 150,000s of replication lag ==
almost 2 days of lag!
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: replication lag
Some common causes of replication lag:
• Secondaries underspecced vs primaries
• Access patterns between primary/ secondaries
• Insufficient bandwidth
• Foreground index builds on secondaries
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Fun fact: oplog idempotency
Operations in the oplog only affect the value once,
so they can be run multiple times safely.
Example: If you increment n from 2 to 3, n = 3 is
fine; n + 1 is not.
Frequent, large updates means a big oplog to
sync. Updates that change sets mean writing the
entire new version of the set to the oplog.
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: replication lag
• Secondaries underspecced vs primaries
• Access patterns between primary/ secondaries
• Insufficient bandwidth
• Foreground index builds on secondaries
“…when you have eliminated the impossible,whatever remains,however
improbable,must be the truth…” -- Sherlock Holmes
SirArthur Conan Doyle,The Sign of the Four
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: replication lag
Example:
• ~1500 ops per minute (opcounters)
• 0.1 MB per object (average object size, local db)
~1500 ops/min / 60 seconds * 0.1 MB/op * 8b/B
=~ 20 mbps required bandwidth
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Remember to use alerts!
Don’t wait until your secondaries fall off your oplog!
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Scenario: user-facing web application. Customer
was seeing significant performance degradation
after adding and removing an index from their
replicaset.
Their replicaset had 2 visible data-bearing nodes,
each on real hardware, with dedicated 15K RPM
disks and a significant amount of RAM.
Why were things slow?
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Opcounters: queries rose a bit but writes were
flat…
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Background flush average: went up
considerably!
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Queues: also went up considerably!
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Journal stats: went up much higher than the
ops…
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Connections: also went up…
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Background flush average: consistent until then
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Opcounters: interesting… around July 9th
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Page faults: something’s going on!
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Local DB average object size: growing!
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Now what?
Time to analyze the logs – what query or queries
were going crazy? And what sort of query would
grow in size without growing significantly in
volume?
Remember: growing disk latency (maybe caused
by page faults?) and journal/ oplog entries growing
even though inserts/ updates were flat.
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Log analysis
The best tools for analyzing MongoDB logs are
included in mtools*:
• mlogfilter (filter logs for slow queries, table scans,
etc…)
• mplotqueries (graph query response times and
volumes)
* https://github.com/rueckstiess/mtools
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Log analysis (example syntax)
Show me queries that took more than 1000 ms
from 6 am to 6 pm:
mlogfilter mongodb.log --from 06:00 --to
18:00 --slow 1000 > mongodb-filtered.log
Now, graph those queries:
mplotqueries --logscale mongodb-
filtered.log
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Filter more!
--operation
Logarithmic!
--logscale
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Sample query
Wed Jul 17 14:16:44 [conn60560] update x.y query: { e: ”[id1]"
} update: { $addToSet: { fr: ”[id2]" } } nscanned:1 nupdated:1
keyUpdates:1 locks(micros) w:889 6504ms
6.5 seconds to add a single value to a set!
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
http://docs.mongodb.org/manual/reference/operator/addTo
Set/
The $addToSet operator adds a value to an array only if the
value is not in the array already. If the value is in the array,
$addToSet returns without modifying the array. Consider the
following example:
db.collection.update( { field: value }, { $addToSet: { field: value1 } } );
Here, $addToSet appends value1 to the array stored in field,
only if value1 is not already a member of this array.
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
https://jira.mongodb.org/browse/SERVER-
8192
“IndexSpec::getKeys() finds the set of index keys for a given document
and index key spec. It's used when inserting / updating / deleting a document
to update the index entries,and also for performing in memorysorts,deduping
$or clauses and for other purposes.
Right now extracting 10k elements from a nested object field within an array
takes on the order of seconds on a decentlyfast machine.We could see how
much we can optimize the implementation.”
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
What else?!
Wed Jul 17 14:11:59 [conn56541] update x.y query: { e: ”[id1]"
} update: { $addToSet: { fr: ”[id2]" } } nscanned:1 nmoved:1
nupdated:1 keyUpdates:0 locks(micros) w:85145 11768ms
Almost 12 seconds! This time, there’s
“nmoved:1”, too. This means a document was
moved on disk – it outgrew the space allocated for
it.
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
But wait, there’s more!
Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16
ntoskip:0 nscanned:16779 scanAndOrder:1 keyUpdates:0
numYields: 906 locks(micros) r:46877422 nreturned:16
reslen:6948 38172ms
38 seconds! Scanned 17k documents, returned
16.
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
What next?
Short term fix: disable the new feature for the
heaviest users! After that:
• rework the code to avoid $addToSet
• add indexes for queries scanning collections
• use powerOf2Sizes* to reduce fragmentation/
document moves
* http://docs.mongodb.org/manual/reference/command/collMod/
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Example: slow performance
Did it work?
(Yes.)
(So far. ;) )
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Examining memory and disk
Memory: resident vs virtual vs (non-)mapped
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Examining memory and disk
Page faults and Record Stats
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Examining memory and disk
Background flush and Disk IO
(Checkout http://www.wmarrow.com/strcalc/ )
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Monitoring: watch for
warnings
MMS warns you if your systems have startup
warnings or if they are running outdated versions.
Don’t ignore these!
Wrapping up
Performance Tuning and Monitoring Using MMS, Nicholas Tang
What’s next?
• Visual update (June 3rd)
• Backup service (join the queue!)
• More UI/ UX improvements:
– Enhanced dashboards
– Improved cluster view
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Summary
• MMS is a great, free service
• Setup is easy
• Metrics are awesome, preventing failures even
more awesome
• There’s more functionality coming soon!
Performance Tuning and Monitoring Using MMS, Nicholas Tang
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Webinar: Best Practices for Upgrading to MongoDB 3.2
Webinar: Best Practices for Upgrading to MongoDB 3.2Webinar: Best Practices for Upgrading to MongoDB 3.2
Webinar: Best Practices for Upgrading to MongoDB 3.2Dana Elisabeth Groce
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014StampedeCon
 
Using Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comUsing Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comDamien Krotkine
 
PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Fi...
PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Fi...PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Fi...
PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Fi...DataStax
 
MongoDB World 2018: Enterprise Security in the Cloud
MongoDB World 2018: Enterprise Security in the CloudMongoDB World 2018: Enterprise Security in the Cloud
MongoDB World 2018: Enterprise Security in the CloudMongoDB
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC OsloDavid Pilato
 
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)John Adams
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgDavid Pilato
 
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB
 
Open Source Monitoring Tools Shootout
Open Source Monitoring Tools ShootoutOpen Source Monitoring Tools Shootout
Open Source Monitoring Tools Shootouttomdc
 
Scala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkScala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkDemi Ben-Ari
 
MongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB Europe 2016 - Deploying MongoDB on NetApp storageMongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB Europe 2016 - Deploying MongoDB on NetApp storageMongoDB
 
Performance Tuning On the Fly at CMP.LY Using MongoDB Management Service
Performance Tuning On the Fly at CMP.LY Using MongoDB Management ServicePerformance Tuning On the Fly at CMP.LY Using MongoDB Management Service
Performance Tuning On the Fly at CMP.LY Using MongoDB Management ServiceMichael De Lorenzo
 
Log everything! @DC13
Log everything! @DC13Log everything! @DC13
Log everything! @DC13DECK36
 
Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8Speedment, Inc.
 
Mongo db multidc_webinar
Mongo db multidc_webinarMongo db multidc_webinar
Mongo db multidc_webinarMongoDB
 
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationshadooparchbook
 

Was ist angesagt? (20)

Webinar: Best Practices for Upgrading to MongoDB 3.2
Webinar: Best Practices for Upgrading to MongoDB 3.2Webinar: Best Practices for Upgrading to MongoDB 3.2
Webinar: Best Practices for Upgrading to MongoDB 3.2
 
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014Storm – Streaming Data Analytics at Scale - StampedeCon 2014
Storm – Streaming Data Analytics at Scale - StampedeCon 2014
 
Using Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comUsing Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.com
 
PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Fi...
PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Fi...PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Fi...
PlayStation and Searchable Cassandra Without Solr (Dustin Pham & Alexander Fi...
 
MongoDB World 2018: Enterprise Security in the Cloud
MongoDB World 2018: Enterprise Security in the CloudMongoDB World 2018: Enterprise Security in the Cloud
MongoDB World 2018: Enterprise Security in the Cloud
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC Oslo
 
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)
Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed Luxembourg
 
STACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSISSTACK OVERFLOW DATASET ANALYSIS
STACK OVERFLOW DATASET ANALYSIS
 
Log Files
Log FilesLog Files
Log Files
 
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
 
Mumak
MumakMumak
Mumak
 
Open Source Monitoring Tools Shootout
Open Source Monitoring Tools ShootoutOpen Source Monitoring Tools Shootout
Open Source Monitoring Tools Shootout
 
Scala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache sparkScala like distributed collections - dumping time-series data with apache spark
Scala like distributed collections - dumping time-series data with apache spark
 
MongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB Europe 2016 - Deploying MongoDB on NetApp storageMongoDB Europe 2016 - Deploying MongoDB on NetApp storage
MongoDB Europe 2016 - Deploying MongoDB on NetApp storage
 
Performance Tuning On the Fly at CMP.LY Using MongoDB Management Service
Performance Tuning On the Fly at CMP.LY Using MongoDB Management ServicePerformance Tuning On the Fly at CMP.LY Using MongoDB Management Service
Performance Tuning On the Fly at CMP.LY Using MongoDB Management Service
 
Log everything! @DC13
Log everything! @DC13Log everything! @DC13
Log everything! @DC13
 
Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8
 
Mongo db multidc_webinar
Mongo db multidc_webinarMongo db multidc_webinar
Mongo db multidc_webinar
 
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
 

Ähnlich wie MongoDB performance tuning and monitoring with MMS

Performance Tuning and Monitoring Using MMS
Performance Tuning and Monitoring Using MMSPerformance Tuning and Monitoring Using MMS
Performance Tuning and Monitoring Using MMSMongoDB
 
MongoDB Performance Tuning and Monitoring
MongoDB Performance Tuning and MonitoringMongoDB Performance Tuning and Monitoring
MongoDB Performance Tuning and MonitoringMongoDB
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDBMongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDBMongoDB
 
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdfChris Hoyean Song
 
Silicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDBSilicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDBDaniel Coupal
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelDaniel Coupal
 
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...Mydbops
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
 
IBM Insight 2013 - Aetna's production experience using IBM DB2 Analytics Acce...
IBM Insight 2013 - Aetna's production experience using IBM DB2 Analytics Acce...IBM Insight 2013 - Aetna's production experience using IBM DB2 Analytics Acce...
IBM Insight 2013 - Aetna's production experience using IBM DB2 Analytics Acce...Daniel Martin
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesEd Hunter
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Codemotion
 
High Performance Mysql
High Performance MysqlHigh Performance Mysql
High Performance Mysqlliufabin 66688
 
Managing Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic OptimizingManaging Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic OptimizingDatabricks
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010ivan provalov
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUGslandelle
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at ParseTravis Redman
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at ParseMongoDB
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Demi Ben-Ari
 
Orion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning TrainingOrion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning TrainingSolarWinds
 

Ähnlich wie MongoDB performance tuning and monitoring with MMS (20)

Performance Tuning and Monitoring Using MMS
Performance Tuning and Monitoring Using MMSPerformance Tuning and Monitoring Using MMS
Performance Tuning and Monitoring Using MMS
 
MongoDB Performance Tuning and Monitoring
MongoDB Performance Tuning and MonitoringMongoDB Performance Tuning and Monitoring
MongoDB Performance Tuning and Monitoring
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDBMongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
MongoDB in Denver: How Global Healthcare Exchange is Using MongoDB
 
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
[EN] Building modern data pipeline with Snowflake + DBT + Airflow.pdf
 
Silicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDBSilicon Valley Code Camp 2014 - Advanced MongoDB
Silicon Valley Code Camp 2014 - Advanced MongoDB
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
IBM Insight 2013 - Aetna's production experience using IBM DB2 Analytics Acce...
IBM Insight 2013 - Aetna's production experience using IBM DB2 Analytics Acce...IBM Insight 2013 - Aetna's production experience using IBM DB2 Analytics Acce...
IBM Insight 2013 - Aetna's production experience using IBM DB2 Analytics Acce...
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
 
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
Search on the fly: how to lighten your Big Data - Simona Russo, Auro Rolle - ...
 
High Performance Mysql
High Performance MysqlHigh Performance Mysql
High Performance Mysql
 
Managing Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic OptimizingManaging Apache Spark Workload and Automatic Optimizing
Managing Apache Spark Workload and Automatic Optimizing
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
 
Gatling - Bordeaux JUG
Gatling - Bordeaux JUGGatling - Bordeaux JUG
Gatling - Bordeaux JUG
 
Benchmarking at Parse
Benchmarking at ParseBenchmarking at Parse
Benchmarking at Parse
 
Advanced Benchmarking at Parse
Advanced Benchmarking at ParseAdvanced Benchmarking at Parse
Advanced Benchmarking at Parse
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017
 
Orion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning TrainingOrion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning Training
 

Kürzlich hochgeladen

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Kürzlich hochgeladen (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

MongoDB performance tuning and monitoring with MMS

  • 1. Technical Support Manager, North America @ 10gen Nicholas Tang #MongoDB Performance Tuning and Monitoring Using MMS
  • 2. Performance Tuning and Monitoring Using MMS, Nicholas Tang Agenda • What is MMS? • Why use it? • Setting it up and getting around • Performance and monitoring (the fun stuff) • Wrap up
  • 4. Performance Tuning and Monitoring Using MMS, Nicholas Tang What is MMS? The MongoDB Monitoring Service: a free service (or software) for monitoring and management
  • 5. Performance Tuning and Monitoring Using MMS, Nicholas Tang Metric collection and reporting
  • 6. Performance Tuning and Monitoring Using MMS, Nicholas Tang Alerting
  • 7. Performance Tuning and Monitoring Using MMS, Nicholas Tang Event Tracking
  • 8. Performance Tuning and Monitoring Using MMS, Nicholas Tang Logs and Profile data
  • 9. Performance Tuning and Monitoring Using MMS, Nicholas Tang Hardware stats (CPU, disk)
  • 10. Performance Tuning and Monitoring Using MMS, Nicholas Tang DB stats
  • 11. Performance Tuning and Monitoring Using MMS, Nicholas Tang Basic user management
  • 12. What’s in it for me?
  • 13. Performance Tuning and Monitoring Using MMS, Nicholas Tang Why? • Great high level view + detailed metrics • Low effort, high-return • Makes it easier for us to help you! • Makes you more attractive, promotes bone strength and muscle tone * * - these last points still under review
  • 14. How do I use this crazy thing?
  • 15. Performance Tuning and Monitoring Using MMS, Nicholas Tang Setting it up http://mms.10gen.com/help/monitoring/tutorial/ • Setup an account • Install the agent • Add your hosts • Optional: hardware stats through munin-node • Optional: enable logging and profiling • More info: http://mms.10gen.com/help/monitoring/install/
  • 16. Performance Tuning and Monitoring Using MMS, Nicholas Tang Notes • Agent written in Python (moving to Go) • Failover: run multiple agents (1 primary) • Hosts: use CNAMEs, especially on AWS! • You can use a group per env (each needs an agent) • Connections are over SSL • On-Premise solution for Enterprise customers that don’t want to use the hosted service
  • 18. Performance Tuning and Monitoring Using MMS, Nicholas Tang Finding the bottleneck Source:http://www.flickr.com/photos/laenulfean/462715479/
  • 19. Performance Tuning and Monitoring Using MMS, Nicholas Tang What is performance tuning? 1. Assess the problem and establish acceptable behavior 2. Measure the current performance 3. Find the bottleneck* 4. Remove the bottleneck 5. Re-test to confirm 6. Lather, rinse, repeat * - (This is often the hard part) (Adapted from http://en.wikipedia.org/wiki/Performance_tuning )
  • 20. Performance Tuning and Monitoring Using MMS, Nicholas Tang Pro-Tip: know thyself You have to recognize normal to know when it isn’t. Source:http://www.flickr.com/photos/skippy/6853920/
  • 21. Performance Tuning and Monitoring Using MMS, Nicholas Tang Some handy metrics to watch • Memory usage • Opcounters • Lock % • Queues • Background flush average • Replication stats
  • 22. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: replication lag Scenario: Customer reports 150,000s of replication lag == almost 2 days of lag!
  • 23. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: replication lag Some common causes of replication lag: • Secondaries underspecced vs primaries • Access patterns between primary/ secondaries • Insufficient bandwidth • Foreground index builds on secondaries
  • 24. Performance Tuning and Monitoring Using MMS, Nicholas Tang Fun fact: oplog idempotency Operations in the oplog only affect the value once, so they can be run multiple times safely. Example: If you increment n from 2 to 3, n = 3 is fine; n + 1 is not. Frequent, large updates means a big oplog to sync. Updates that change sets mean writing the entire new version of the set to the oplog.
  • 25. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: replication lag • Secondaries underspecced vs primaries • Access patterns between primary/ secondaries • Insufficient bandwidth • Foreground index builds on secondaries “…when you have eliminated the impossible,whatever remains,however improbable,must be the truth…” -- Sherlock Holmes SirArthur Conan Doyle,The Sign of the Four
  • 26. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: replication lag Example: • ~1500 ops per minute (opcounters) • 0.1 MB per object (average object size, local db) ~1500 ops/min / 60 seconds * 0.1 MB/op * 8b/B =~ 20 mbps required bandwidth
  • 27. Performance Tuning and Monitoring Using MMS, Nicholas Tang Remember to use alerts! Don’t wait until your secondaries fall off your oplog!
  • 28. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Scenario: user-facing web application. Customer was seeing significant performance degradation after adding and removing an index from their replicaset. Their replicaset had 2 visible data-bearing nodes, each on real hardware, with dedicated 15K RPM disks and a significant amount of RAM. Why were things slow?
  • 29. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Opcounters: queries rose a bit but writes were flat…
  • 30. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Background flush average: went up considerably!
  • 31. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Queues: also went up considerably!
  • 32. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Journal stats: went up much higher than the ops…
  • 33. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Connections: also went up…
  • 34. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Background flush average: consistent until then
  • 35. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Opcounters: interesting… around July 9th
  • 36. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Page faults: something’s going on!
  • 37. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Local DB average object size: growing!
  • 38. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Now what? Time to analyze the logs – what query or queries were going crazy? And what sort of query would grow in size without growing significantly in volume? Remember: growing disk latency (maybe caused by page faults?) and journal/ oplog entries growing even though inserts/ updates were flat.
  • 39. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Log analysis The best tools for analyzing MongoDB logs are included in mtools*: • mlogfilter (filter logs for slow queries, table scans, etc…) • mplotqueries (graph query response times and volumes) * https://github.com/rueckstiess/mtools
  • 40. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Log analysis (example syntax) Show me queries that took more than 1000 ms from 6 am to 6 pm: mlogfilter mongodb.log --from 06:00 --to 18:00 --slow 1000 > mongodb-filtered.log Now, graph those queries: mplotqueries --logscale mongodb- filtered.log
  • 41. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance
  • 42. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Filter more! --operation Logarithmic! --logscale
  • 43. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Sample query Wed Jul 17 14:16:44 [conn60560] update x.y query: { e: ”[id1]" } update: { $addToSet: { fr: ”[id2]" } } nscanned:1 nupdated:1 keyUpdates:1 locks(micros) w:889 6504ms 6.5 seconds to add a single value to a set!
  • 44. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance http://docs.mongodb.org/manual/reference/operator/addTo Set/ The $addToSet operator adds a value to an array only if the value is not in the array already. If the value is in the array, $addToSet returns without modifying the array. Consider the following example: db.collection.update( { field: value }, { $addToSet: { field: value1 } } ); Here, $addToSet appends value1 to the array stored in field, only if value1 is not already a member of this array.
  • 45. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance https://jira.mongodb.org/browse/SERVER- 8192 “IndexSpec::getKeys() finds the set of index keys for a given document and index key spec. It's used when inserting / updating / deleting a document to update the index entries,and also for performing in memorysorts,deduping $or clauses and for other purposes. Right now extracting 10k elements from a nested object field within an array takes on the order of seconds on a decentlyfast machine.We could see how much we can optimize the implementation.”
  • 46. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance What else?! Wed Jul 17 14:11:59 [conn56541] update x.y query: { e: ”[id1]" } update: { $addToSet: { fr: ”[id2]" } } nscanned:1 nmoved:1 nupdated:1 keyUpdates:0 locks(micros) w:85145 11768ms Almost 12 seconds! This time, there’s “nmoved:1”, too. This means a document was moved on disk – it outgrew the space allocated for it.
  • 47. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance But wait, there’s more! Wed Jul 17 13:40:14 [conn28600] query x.y [snip] ntoreturn:16 ntoskip:0 nscanned:16779 scanAndOrder:1 keyUpdates:0 numYields: 906 locks(micros) r:46877422 nreturned:16 reslen:6948 38172ms 38 seconds! Scanned 17k documents, returned 16.
  • 48. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance What next? Short term fix: disable the new feature for the heaviest users! After that: • rework the code to avoid $addToSet • add indexes for queries scanning collections • use powerOf2Sizes* to reduce fragmentation/ document moves * http://docs.mongodb.org/manual/reference/command/collMod/
  • 49. Performance Tuning and Monitoring Using MMS, Nicholas Tang Example: slow performance Did it work? (Yes.) (So far. ;) )
  • 50. Performance Tuning and Monitoring Using MMS, Nicholas Tang Examining memory and disk Memory: resident vs virtual vs (non-)mapped
  • 51. Performance Tuning and Monitoring Using MMS, Nicholas Tang Examining memory and disk Page faults and Record Stats
  • 52. Performance Tuning and Monitoring Using MMS, Nicholas Tang Examining memory and disk Background flush and Disk IO (Checkout http://www.wmarrow.com/strcalc/ )
  • 53. Performance Tuning and Monitoring Using MMS, Nicholas Tang Monitoring: watch for warnings MMS warns you if your systems have startup warnings or if they are running outdated versions. Don’t ignore these!
  • 55. Performance Tuning and Monitoring Using MMS, Nicholas Tang What’s next? • Visual update (June 3rd) • Backup service (join the queue!) • More UI/ UX improvements: – Enhanced dashboards – Improved cluster view
  • 56. Performance Tuning and Monitoring Using MMS, Nicholas Tang Summary • MMS is a great, free service • Setup is easy • Metrics are awesome, preventing failures even more awesome • There’s more functionality coming soon!
  • 57. Performance Tuning and Monitoring Using MMS, Nicholas Tang Questions?

Hinweis der Redaktion

  1. Show of hands: who is responsible in some ways for monitoring? Who has used Nagios, Cacti, Zenoss, Graphite, or some other similar tools?
  2. 5-10 minutes for What, Why, and How, and then the rest of the time to Performance and monitoring and the wrap-up.Talk a little bit about why it’s helpful for 10gen support.
  3. Understand the components (i.e. potential bottlenecks)Test and measure each oneWatch performance before, during, after the testsWatch trends over time
  4. Know your environment – a critical piece of understanding what changed is to know the way things were before. The great thing about MMS is that not only does it provide you with what’s happening right now, but it also provides you with history – the sort of context you need to be able to identify changes, which is a critical piece of finding and fixing bottlenecks.
  5. Memory: get back to thisOpcounters: commands, queries, etc. per time unitLock%: Time spent in a write-lock state, global time == global lock + hottest database.Queues: operations waiting for read lock, write lock, or global lock (total).Background flush: time it takes to flush the journal to disk (via fsync) – by default once per minute, so the closer to 60s, the bigger the problem.Repl Lag: number of seconds secondary is behind primary in writing each oplog entry.Replica: number of hours of oplog on the primary
  6. We had a customer report replication lag – almost 150,00 seconds of it. We examined their systems – checked CPU, checked IO capacity, checked network utilization, and had them do an initial sync via data file copy, and nothing worked – even though the systems seemed fine.
  7. Background index creation on secondaries: fixed in 2.6
  8. Background index creation on secondaries: fixed in 2.6
  9. NOTE: That’s the minimum required assuming no overhead, no competing traffic, nothing else… and that’s just to keep up!In the customer’s case, they had huge updates, which since the oplog is idempotent, meant huge oplog entries, and it turns out the bandwidth required was 3x their available bandwidth (30 mbps vs 10 mbps).
  10. Moral of the story: pay attention to these things, get alerted when they first start to go south, and you can resolve them before things blow up at 3 am.
  11. Blue: commands, purple: queries, green: updates, orange: deletes, red: getmores, yellow: inserts
  12. Blue: commands, purple: queries, green: updates, orange: deletes, red: getmores, yellow: inserts
  13. Memory: resident vs. virtual vs. mapped vs. non-mapped (connections)Page faults: accessing a page of memory that is in virtual memory but not resident in physical memory. Page fault on normal spinning disk is ~40k slower than direct memory access. However, the size of page faults also matters: 100 small page faults/ sec might be better than 10 large ones! Check readahead!Record stats: number of accesses not in memory, and page faults required to get them into memoryBtree: misses/ missRatio indicates indexes can’t be stored in memory (see above re: page faulting)
  14. Memory: resident vs. virtual vs. mapped vs. non-mapped (connections)Page faults: accessing a page of memory that is in virtual memory but not resident in physical memory. Page fault on normal spinning disk is ~40k slower than direct memory access. However, the size of page faults also matters: 100 small page faults/ sec might be better than 10 large ones! Check readahead!Record stats: number of accesses not in memory, and page faults required to get them into memoryBtree: misses/ missRatio indicates indexes can’t be stored in memory (see above re: page faulting)
  15. Background flush: average time it takes to flush the journal to disk (fsync).IO time: amount of time (in ms) spent waiting on disk for a read or write operation.
  16. Also worth noting: the exposed DB check in settings, to tell you if you messed up your firewall settings.