MongoDB can profile slow operations. However, MongoDB stores them in a capped collection on every profiled node. That makes it difficult to get a quick overview of a sharded system or a historical view. This talk, given at the MongoDB User Group Berlin on 4 June 2013, explains how idealo overcame these shortcomings.
idealo and MongoDB
● idealo: Europe's leading price comparison website
  ● Germany, Austria, United Kingdom, France, Italy, Poland and Spain
  ● 250 million offers online (May 2013)
  ● fast-growing
● different types of databases (MySQL, Oracle, MongoDB)
● MongoDB in production since v1.6
● sharding in production since MongoDB v1.8
● MongoDB stores offers for back-end usage
● 30 MongoDB servers for offerStore + 3 servers for offerHistory
● 15 MongoDB servers for other purposes
● nearly 15 TB of data altogether
Review profiling
● MongoDB supports profiling of “slow” operations
● “slow” means exceeding a threshold (slowms) that is set when turning profiling on (default: 100 ms)
● profiling can be enabled per database or per instance on a running mongod
● the profiler writes the collected data to the capped collection “system.profile”
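The following minimal Python sketch covers the settings above. It builds the document that a driver sends to MongoDB's `profile` command, plus a filter for reading slow entries back from `system.profile` through its `millis` field. The helper names `profile_command` and `slow_ops_filter` are hypothetical. With pymongo, the command could be issued as `db.command(profile_command(1, 100))`.

```python
def profile_command(level=1, slowms=100):
    """Build the document for MongoDB's `profile` command (hypothetical helper).

    level 0 = profiling off, 1 = only ops slower than `slowms`, 2 = all ops.
    The command name must come first; Python dicts keep insertion order.
    """
    if level not in (0, 1, 2):
        raise ValueError("profiling level must be 0, 1 or 2")
    if slowms < 0:
        raise ValueError("slowms must not be negative")
    return {"profile": level, "slowms": slowms}


def slow_ops_filter(threshold_ms):
    """Filter on the `millis` field of system.profile entries (hypothetical helper)."""
    return {"millis": {"$gte": threshold_ms}}


print(profile_command())       # {'profile': 1, 'slowms': 100}
print(slow_ops_filter(500))    # {'millis': {'$gte': 500}}
```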
Inconveniences
● each mongod needs to be handled separately
  ● replica set: connect to the primary and to every secondary
  ● sharding: the router (mongos) gives only an incomplete view, so every replica set member × n shards must be handled
● the capped collection only covers a limited time span
● the different formats of the “query” field make querying more difficult
● bug: ops run through mongos omit the user (JIRA: SERVER-7538)
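This sketch shows why sharding multiplies the work. The hypothetical helper `profiled_hosts` lists every mongod that a collector would have to tail. It assumes shard documents shaped like those in the `config.shards` collection, where a replica-set shard's `host` reads `replSetName/host1:port,host2:port`. Treat that string as a seed list only. Membership may have changed since the shard was added, so a real collector should confirm the current members with each replica set. The hostnames below are made up.

```python
def profiled_hosts(shard_docs):
    """List every (shard id, host) pair a collector would have to tail.

    `shard_docs` mimics documents from the config.shards collection: `host` is
    "host:port" for a standalone shard or "replSetName/h1:port,h2:port" for a
    replica-set shard. The member list is treated as a seed list only.
    """
    pairs = []
    for doc in shard_docs:
        # rpartition drops an optional "replSetName/" prefix and keeps the members.
        _, _, members = doc["host"].rpartition("/")
        for member in members.split(","):
            member = member.strip()
            if member:
                pairs.append((doc["_id"], member))
    return pairs


shards = [
    {"_id": "shard0000", "host": "rs0/db1:27017,db2:27017,db3:27017"},
    {"_id": "shard0001", "host": "rs1/db4:27017,db5:27017,db6:27017"},
]
print(len(profiled_hosts(shards)))  # 6 mongods for just 2 shards
```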
idealo requirements
● quick overview of slow-op types and their quantity within a time period (“type” means the combination of operation type, user, server, and queried and sorted fields)
● historical view showing how slow ops evolve, in order to extrapolate them
● discover spikes in time or in slow-op types
● filter by slow-op type and/or time range to drill down
Steps to go
● two global steps:
  1) collect and aggregate slow ops from all mongods into one global collection
  2) a GUI to query and display the results
Step 1 of 2
● global collection:
  ● allows easy and fast querying of the whole MongoDB (sharded) system
  ● keeps historical data (not a capped collection)
  ● located on a separate replica set to avoid interfering with the profiled mongods
● collector:
  ● guarantee that only one instance runs at a time (or add logic to avoid duplicate entries)
  ● use tailable cursors to collect data from the profiled mongods
  ● in case of failure: reconnect before uncollected entries get overwritten in the capped collection, but without flooding the server (avoid a DoS)
  ● monitor it (Nagios etc.)
● profiled entries:
  ● reduce their size by keeping only the interesting fields
  ● make them easier to query (i.e. a single schema)
  ● condense the fields inside “query” and “orderby” into simple values (e.g. lists of field names)
  ● choose short field names
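The following Python sketch shows one way to reduce the entries. It assumes three shapes for the `query` field: a plain filter, `{"query": ..., "orderby": ...}`, and `{"$query": ..., "$orderby": ...}`. It keeps only the queried field names and drops their values, so ops that differ only in values fall into the same slow-op type. The short field names (`adr`, `op`, `user`, `fld`, `sort`, `ms`, `ts`) and all function names are hypothetical and do not describe idealo's actual schema.

```python
LOGICAL_OPS = ("$and", "$or", "$nor")


def field_names(query):
    """Collect the field names used in a filter, ignoring their values."""
    names = set()
    for key, value in query.items():
        if key in LOGICAL_OPS and isinstance(value, list):
            # Descend into the clauses of logical operators.
            for clause in value:
                if isinstance(clause, dict):
                    names.update(field_names(clause))
        elif not key.startswith("$"):
            names.add(key)
    return sorted(names)


def split_query(raw):
    """Return (filter, sort) for the assumed shapes of the "query" field.

    Caveat: a plain filter on a document field literally named "query"
    would be mistaken for the wrapped form.
    """
    for q_key, o_key in (("$query", "$orderby"), ("query", "orderby")):
        if q_key in raw:
            return raw.get(q_key) or {}, raw.get(o_key) or {}
    return raw, {}


def normalize_entry(doc, server):
    """Reduce a system.profile document to a compact, single-schema record."""
    flt, sort = split_query(doc.get("query") or {})
    return {
        "adr": server,                # profiled mongod (host:port)
        "op": doc.get("op"),
        "user": doc.get("user", ""),  # may be missing, e.g. for ops via mongos
        "fld": field_names(flt),      # queried fields, values dropped
        "sort": list(sort.keys()),    # sorted fields, order preserved
        "ms": doc.get("millis"),
        "ts": doc.get("ts"),
    }


entry = {
    "op": "query",
    "query": {"$query": {"shopId": 42, "$or": [{"price": {"$lt": 10}}, {"brand": "x"}]},
              "$orderby": {"updated": -1}},
    "millis": 250,
    "user": "app",
}
print(normalize_entry(entry, "db1:27017"))
```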
Step 2 of 2
● GUI:
  ● x-axis: time at which the op was executed
  ● y-axis: duration of the slow op
  ● point size: quantity of that slow-op type
  ● zoomable along the x or y axis
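This Python sketch shows one way to map grouped results to plot points. Each group is assumed (hypothetically) to carry a time bucket `t`, a duration `ms` and a `count`. The radius is scaled by the square root of the count, so the point's area rather than its radius is proportional to the quantity. This stops large groups from visually swamping the chart. It is a design choice for illustration, not necessarily what idealo's GUI does.

```python
import math


def to_points(groups, max_radius=20.0):
    """Turn grouped slow-op results into (x, y, radius) scatter points.

    Each group is assumed to hold a time bucket `t`, a duration `ms`
    (e.g. the maximum duration within the group) and a `count`.
    """
    if not groups:
        return []
    largest = max(g["count"] for g in groups) or 1  # avoid division by zero
    # Radius ~ sqrt(count) keeps the point area proportional to the count.
    return [(g["t"], g["ms"], max_radius * math.sqrt(g["count"] / largest))
            for g in groups]


print(to_points([{"t": 1, "ms": 200, "count": 100},
                 {"t": 2, "ms": 150, "count": 25}]))
# [(1, 200, 20.0), (2, 150, 10.0)]
```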
How to query slow ops
● grouping by a time component allows a resolution by year, month, week, etc.
● grouping by server address, user, operation, queried fields and sorted fields defines the different slow-op types
● a filter allows focusing on a time period and on specific slow ops
● use the slavePreferred option (read from secondaries where possible)
● error handling, e.g. when the result exceeds the 16 MB maximum
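The grouping and filtering above map onto MongoDB's aggregation framework. The Python sketch below builds such a pipeline. It assumes collected records with these hypothetical short field names: `adr` (server address), `op`, `user`, `fld` (queried fields), `sort` (sorted fields), `ms` (duration) and `ts` (timestamp). `build_pipeline` and `RESOLUTIONS` are illustrative names. `$match`, `$group`, `$year`, `$month`, `$week`, `$dayOfMonth`, `$hour`, `$sum`, `$max` and `$avg` are standard aggregation operators.

```python
from datetime import datetime

# Date operators that make up the time bucket for each resolution.
RESOLUTIONS = {
    "year": ["$year"],
    "month": ["$year", "$month"],
    "week": ["$year", "$week"],
    "day": ["$year", "$month", "$dayOfMonth"],
    "hour": ["$year", "$month", "$dayOfMonth", "$hour"],
}


def build_pipeline(start, end, resolution="day", **filters):
    """Build a pipeline counting slow-op types per time bucket (hypothetical helper)."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolution}")
    # Restrict to the time range, plus optional equality filters (e.g. op="query").
    match = {"ts": {"$gte": start, "$lt": end}}
    match.update(filters)
    # Time bucket, e.g. {"year": {"$year": "$ts"}, "week": {"$week": "$ts"}}.
    time_key = {op.strip("$"): {op: "$ts"} for op in RESOLUTIONS[resolution]}
    # A slow-op type = server, user, operation, queried and sorted fields.
    group_id = dict(time_key, adr="$adr", user="$user", op="$op", fld="$fld", sort="$sort")
    return [
        {"$match": match},
        {"$group": {
            "_id": group_id,
            "count": {"$sum": 1},
            "maxMs": {"$max": "$ms"},
            "avgMs": {"$avg": "$ms"},
        }},
        {"$sort": {"_id": 1}},
    ]


pipeline = build_pipeline(datetime(2013, 5, 1), datetime(2013, 6, 1), "week", op="query")
```

With pymongo, the slavePreferred behaviour corresponds to `ReadPreference.SECONDARY_PREFERRED`, e.g. `coll.with_options(read_preference=ReadPreference.SECONDARY_PREFERRED).aggregate(pipeline)`. In MongoDB 2.4, the whole aggregation result came back as a single document, which is where the 16 MB limit bites. Later versions return a cursor.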