The document discusses using MongoDB as both a primary data store and queueing system for Server Density. It describes how Server Density implemented queuing functionality in MongoDB using the findAndModify command to atomically retrieve and update documents. It also provides an overview of monitoring considerations for MongoDB in production, including keeping indexes and frequently accessed data in memory, watching for disk I/O spikes or slow queries that may indicate insufficient memory, and using db.serverStatus() to monitor connection usage and check for limits.
4. • Server Density
• +7TB / mth
• +1bn docs / mth
• 2-5k inserts/s @ 3ms
We use MongoDB as our primary data store but also as a queueing system. So I'm going to
talk first about how we built the queuing functionality into Mongo and then more generally
about what you need to keep an eye on when monitoring MongoDB in production.
23. Implementation
• Consumers
db.runCommand(
{ findAndModify : <collection>, <options> } )
query: filter (WHERE)
{ query: { inProg: false } }
Specify the query just like any normal query against Mongo. The very first document that
matches this will be returned. Since we're building a queuing system, we're using a field
called inProg, so we're asking it to give us documents where this is false, i.e. the processing
of that document isn't in progress.
25. Implementation
• Consumers
db.runCommand(
{ findAndModify : <collection>, <options> } )
sort: selects the first one on multi-match
{ sort: { added: -1 } }
We can also sort, e.g. on a timestamp. Note that -1 sorts in descending order, so this
example returns the most recently added document first; use { added: 1 } to return the
oldest documents first. You could also build a priority system to return more important
documents first.
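The consumer call on these slides could be assembled along these lines. This is a minimal sketch: the update clause that sets inProg to true and the collection name "queue" are assumptions, since only the query and sort options are shown here. The function only builds the command document you would pass to db.runCommand(); it does not talk to a server.

```javascript
// Builds the command document for db.runCommand(). The command name must be
// the first key, and JavaScript objects keep string keys in insertion order.
function buildClaimCommand(collection, oldestFirst) {
  return {
    findAndModify: collection,
    query: { inProg: false },               // only documents nobody has claimed
    sort: { added: oldestFirst ? 1 : -1 },  // 1 = ascending, i.e. oldest first
    update: { $set: { inProg: true } }      // assumed claim marker, not from the slides
  };
}

// "queue" is a hypothetical collection name.
const cmd = buildClaimCommand("queue", true);
console.log(JSON.stringify(cmd));
```

Because findAndModify finds and updates in one atomic step, two consumers running this at the same time should not be handed the same document.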
33. It's a little different,
but not entirely new.
The problem is that MongoDB is fairly new and whilst it's still just another database running
on a server, there are things that are new and unusual. This means that some old
assumptions are still valid, but others aren't. You don't have to approach it as a completely
new thing, but it is a little different. There are disadvantages to this but one advantage is you
can use it for novel tasks, like queuing.
34. Keep it in RAM. Obviously.
www.flickr.com/photos/comedynose/4388430444/
The first and most obvious thing to note is that keeping everything in RAM is faster. But what
does that actually mean and how do you know when something is in RAM?
35. How do you know?
> db.stats()
{
    "collections" : 3,
    "objects" : 379970142,
    "avgObjSize" : 146.4554114991488,
    "dataSize" : 55648683504,      // 51GB
    "storageSize" : 61795435008,
    "numExtents" : 64,
    "indexes" : 1,
    "indexSize" : 21354514128,     // 19GB
    "fileSize" : 100816388096,
    "ok" : 1
}
The easiest way is to check the database size. The MongoDB console provides an easy way to
look at the data and index sizes, and the output is provided in bytes.
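Since the output is in bytes, a small helper makes it easier to read. This is only a sketch: bytesToGB is a hypothetical name, and it rounds down to whole binary gigabytes, which matches the 51GB and 19GB annotations on the slide.

```javascript
// Convert a byte count, as reported by db.stats(), to whole binary gigabytes.
function bytesToGB(bytes) {
  return Math.floor(bytes / (1024 * 1024 * 1024));
}

// Values copied from the db.stats() output on this slide.
const stats = { dataSize: 55648683504, indexSize: 21354514128 };
console.log("data:", bytesToGB(stats.dataSize) + "GB");    // data: 51GB
console.log("index:", bytesToGB(stats.indexSize) + "GB");  // index: 19GB
```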
36. Where should it go?
What?      Should it be in memory?
Indexes    Always
Data       If you can
In every case, having something in memory is going to be faster than not. However, that's not
always feasible if you have massive data sets. Instead, you want to make sure you always
have enough RAM to store all the indexes, which is what the db.stats() output is for. And if
you can, have space for data too. MongoDB is smart about its memory management so it will
keep commonly accessed data in RAM where possible.
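The "indexes always, data if you can" rule can be turned into a rough check against the db.stats() numbers. This is a sketch under assumptions: memoryFit is a hypothetical helper, and it compares against total RAM without allowing for the OS or other processes, so treat the result as optimistic.

```javascript
// Classify how much of a database could fit in memory, given the
// indexSize and dataSize fields from db.stats() (both in bytes).
function memoryFit(stats, ramBytes) {
  const indexesFit = stats.indexSize <= ramBytes;
  const everythingFits = stats.indexSize + stats.dataSize <= ramBytes;
  if (everythingFits) return "indexes and data fit";
  if (indexesFit) return "indexes fit, data only partly";
  return "indexes do not fit";
}

const GB = 1024 * 1024 * 1024;
const dbStats = { dataSize: 55648683504, indexSize: 21354514128 };
console.log(memoryFit(dbStats, 32 * GB)); // indexes fit, data only partly
```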
37. How you'll know
1) Slow queries
Thu Oct 14 17:01:11 [conn7410] update sd.apiLog
query: { c: "android/setDeviceToken", a: 1466, u:
"blah", ua: "Server Density Android" } 51926ms
www.flickr.com/photos/tonivc/2283676770/
Although it is not the only possible cause, a slow query can indicate insufficient memory. It
might be that you haven't got the optimal indexes for the query, but if indexes are being used
and it's still slow, it could be because of a disk i/o bottleneck caused by the data not being
in RAM. Doing an explain on the query will show you which indexes it is using.
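If you want to pull slow operations out of the log automatically, the duration is the number before "ms" at the end of the line. A minimal sketch, assuming every slow-query line ends that way as the example on this slide does; slowQueryMillis is a hypothetical helper name.

```javascript
// Return the trailing duration in milliseconds from a mongod log line,
// or null when the line does not end in "<number>ms".
function slowQueryMillis(logLine) {
  const match = /(\d+)ms\s*$/.exec(logLine);
  return match ? Number(match[1]) : null;
}

// The log entry from this slide, joined onto one line.
const line = 'Thu Oct 14 17:01:11 [conn7410] update sd.apiLog query: ' +
  '{ c: "android/setDeviceToken", a: 1466, u: "blah", ua: "Server Density Android" } 51926ms';
console.log(slowQueryMillis(line)); // 51926
```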
38. How you'll know
2) Timeouts
cursor timed out (20000 ms)
These slow queries will obviously cause a slowdown in your app but they may also cause
timeouts. In the PHP driver a cursor will time out after 20,000ms by default, although this is
configurable.
39. How you'll know
3) Disk i/o spikes
www.flickr.com/photos/daddo83/3406962115/
You'll see write spikes because MongoDB syncs data to disk periodically, but if you're seeing
read spikes then that can indicate MongoDB is having to read the data files rather than
accessing data from memory. Be careful though because this won't distinguish between data
and indexes, or even other server activity. Read spikes can also occur even if you have little
or no read activity if the mongod is part of a cluster where the slaves are reading from the
oplog.
40. Watch your storage
1) Pre-alloc
It sounds obvious but our statistics show that people run out of disk space suddenly, even
though there is a predictable increase over time. Remember that MongoDB pre-allocates files
before the space is used, so you'll see your storage being used up in 2GB increments (once
you go past the smaller initial data file sizes).
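Because the growth is predictable, you can project roughly when the disk fills up. This is a sketch only: daysUntilFull is a hypothetical helper, it assumes linear growth, and holding back one 2GB file of headroom is our assumption to allow for the next pre-allocated data file.

```javascript
const GIB = 1024 * 1024 * 1024;

// Estimate whole days until the disk is full, assuming constant daily growth.
function daysUntilFull(diskBytes, usedBytes, growthBytesPerDay) {
  if (growthBytesPerDay <= 0) return Infinity;
  // Hold back one 2GB data file, since the next file is allocated
  // before the space in it is actually used.
  const usable = diskBytes - usedBytes - 2 * GIB;
  return Math.max(0, Math.floor(usable / growthBytesPerDay));
}

// Hypothetical numbers: 100GB disk, 60GB used, growing by 1GB a day.
console.log(daysUntilFull(100 * GIB, 60 * GIB, 1 * GIB)); // 38
```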
41. Watch your storage
2) Sharding maxSize
When adding a new shard you can specify the maximum amount of data you want to store on
that shard. This isn't a hard limit and is instead used as a guide. MongoDB will try to keep the
data balanced across all your shards so that it meets this setting but it may not. MongoDB
doesn't currently look at actual disk levels and assumes available capacity is the same across
all nodes. As such, it's advisable that you set this to around 70% of the total available disk
space.
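The 70% figure can be worked out like this. It is a sketch under assumptions: the helper names and the shard host string are hypothetical, and it assumes the addShard command takes maxSize in megabytes, which is worth checking against the documentation for your version.

```javascript
// Work out a maxSize value in megabytes as a percentage of the disk.
// Integer arithmetic avoids floating-point rounding surprises.
function shardMaxSizeMB(diskBytes, percent = 70) {
  const diskMB = Math.floor(diskBytes / (1024 * 1024));
  return Math.floor(diskMB * percent / 100);
}

// Build the command document; this does not contact a server.
function buildAddShardCommand(host, diskBytes) {
  return { addShard: host, maxSize: shardMaxSizeMB(diskBytes) };
}

// Hypothetical shard with a 100GB disk.
const addShardCmd = buildAddShardCommand("set3/rs3a:27018", 100 * 1024 * 1024 * 1024);
console.log(JSON.stringify(addShardCmd)); // {"addShard":"set3/rs3a:27018","maxSize":71680}
```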
42. Watch your storage
3) Logging
--quiet
db.runCommand("logRotate");
killall -SIGUSR1 mongod
Logging is verbose by default, so you'll want to use the quiet option to ensure only important
things are output. And assuming you're logging to a log file, you will want to periodically
rotate it via the MongoDB console so that it doesn't get too big. You can also send SIGUSR1
to all your mongod processes from the shell with killall, which triggers a log rotation. This
is useful if you want to script log rotation or put it into a cron job.
43. Watch your storage
4) Journaling
david@rs2b ~: ls -alh /mongodbdata/journal/
total 538M
drwxrwxr-x 2 david david 29 Mar 20 16:50 .
drwx------ 4 david david 4.0K Mar 13 09:50 ..
-rw------- 1 david david 538M Mar 20 17:00 j._862
-rw------- 1 david david 88 Mar 20 17:00 lsn
Mongo should rotate the journal files often but you need to remember that they will take up
some space too, and as new files are allocated and old ones deleted, you may see your disk
usage spiking up and down.
44. db.serverStatus()
The server status command provides a lot of different statistics that can help you, like this
map of traffic in central Tokyo.
47. Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:32 [conn2268] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters:
[ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 }
MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2d:27018 socket exception
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2c:27018 socket exception
Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2a:27018 socket exception
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1b") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2b") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2d:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2d:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2c:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2c:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2a:27018
Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname
Fri Nov 19 17:24:34 [conn2343] reconnect rs2a:27018 failed
Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE)
Fri Nov 19 17:24:35 [conn2343] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters:
[ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 }
MessagingPort say send() errno:9 Bad file descriptor (NONE)
We've recently had this problem and it manifests itself by the logs filling up all available disk
space instantly, and in some cases completely crashing the server.
48. connPoolStats
> db.runCommand("connPoolStats")
{
    "hosts" : {
        "config1:27019" : {
            "available" : 2,
            "created" : 6
        },
        "set1/rs1a:27018,rs1b:27018" : {
            "available" : 1,
            "created" : 249
        },
...
    },
    "totalAvailable" : 5,
    "totalCreated" : 1002,
    "numDBClientConnection" : 3490,
    "numAScopedConnection" : 3,
}
connPoolStats allows you to see the connection pools that have been set up by a mongos to
connect to different members of the replica set shards. This is useful to correlate against
open file descriptors so you can see if there are suddenly a large number of connections, or if
there are a low number of available connections across your entire cluster.
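One way to do that correlation is to compare the connection count with the open file limit of the process. This is a rough sketch: connectionUsage is a hypothetical helper, the limit of 4096 is a made-up value you would replace with your own ulimit, and comparing numDBClientConnection directly with the limit is our simplification, since other open files count towards it too.

```javascript
// Summarise a connPoolStats result against an open file descriptor limit.
function connectionUsage(poolStats, fdLimit) {
  return {
    clientConnections: poolStats.numDBClientConnection,
    fractionOfLimit: poolStats.numDBClientConnection / fdLimit,
    availableRatio: poolStats.totalAvailable / poolStats.totalCreated
  };
}

// Totals copied from the connPoolStats output on this slide.
const poolStats = { totalAvailable: 5, totalCreated: 1002, numDBClientConnection: 3490 };
const usage = connectionUsage(poolStats, 4096); // 4096 is a hypothetical limit
console.log((usage.fractionOfLimit * 100).toFixed(1) + "% of the descriptor limit");
```

A fraction creeping towards 1 would be a warning that "Too many open files" errors are not far off.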
49. db.serverStatus()
3) Index counters
"indexCounters" : {
    "btree" : {
        "accesses" : 15180175,
        "hits" : 15178725,
        "misses" : 1450,
        "resets" : 0,
        "missRatio" : 0.00009551932
    }
},
The miss ratio is what you're looking at here. If you're seeing a lot of index misses then you
need to look at your queries to see if they're making optimal use of the indexes you've
created. You should consider adding new indexes and seeing if your queries run faster as a
result. You can use the explain syntax to see which indexes queries are hitting, and the total
execution time so you can benchmark them before and after.
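The miss ratio is simply misses divided by accesses, so you can recompute it yourself, for example over the difference between two samples. A minimal sketch; indexMissRatio is a hypothetical helper name.

```javascript
// Recompute the miss ratio from the btree counters in serverStatus.
function indexMissRatio(btree) {
  return btree.accesses === 0 ? 0 : btree.misses / btree.accesses;
}

// Counters copied from the indexCounters output on this slide.
const btree = { accesses: 15180175, hits: 15178725, misses: 1450 };
console.log(indexMissRatio(btree).toFixed(8)); // 0.00009552
```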
50. db.serverStatus()
4) Op counters
www.flickr.com/photos/cosmic_bandita/2395369614/
The op counters (inserts, updates, deletes and queries) are fun to look at, especially if the
numbers are high, but you have to be careful that these are not just vanity metrics. There are
some things you can use them for though. If you have a high number of inserts and updates,
i.e. writes, then you may want to look at your fsync time setting. By default this will flush to
disk every 60 seconds but if you're doing thousands of writes per second you might want to
do this sooner for durability. Of course you can also ensure the write happens from within
the driver. Queries can show whether you need to offload reads to your slaves, which can be
done through the drivers, so that you're spreading the load across your servers and only
writing to the master. Deletes can also cause concurrency problems if you're doing a large
number of them and the database keeps having to yield.
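The op counters are running totals, so to get a per-second rate you take two samples and divide the difference by the time between them. This is a sketch: opRates is a hypothetical helper, the sample numbers are made up, and it assumes the counters sit under the insert, query, update and delete fields of the opcounters section.

```javascript
// Turn two opcounters samples into per-second rates.
function opRates(previous, current, elapsedSeconds) {
  const rates = {};
  for (const op of ["insert", "query", "update", "delete"]) {
    rates[op] = (current[op] - previous[op]) / elapsedSeconds;
  }
  return rates;
}

// Two hypothetical samples taken 10 seconds apart.
const before = { insert: 1000, query: 500, update: 200, delete: 10 };
const after = { insert: 31000, query: 1100, update: 800, delete: 10 };
console.log(opRates(before, after, 10).insert); // 3000
```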
51. db.serverStatus()
5) Background flushing
Picture is unrelated! Mmm, ice cream.
The server status output allows you to see the last time data was flushed to disk, and how
long that took. This is useful to see if you're causing high disk load but also so you can
monitor how often data is being written. Remember that until data has been synced to disk,
you could lose it in the event of a crash or power outage.
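A monitoring script could boil the flush stats down to something like this. It is a sketch under assumptions: flushSummary is a hypothetical helper, the sample values are made up, and the field names last_ms, average_ms and last_finished are what we would expect in the backgroundFlushing section, so check them against the output of your version.

```javascript
// Summarise the backgroundFlushing section of serverStatus.
function flushSummary(backgroundFlushing, now) {
  const ageMs = now.getTime() - backgroundFlushing.last_finished.getTime();
  return {
    secondsSinceLastFlush: ageMs / 1000,
    lastFlushMs: backgroundFlushing.last_ms,
    slowerThanAverage: backgroundFlushing.last_ms > backgroundFlushing.average_ms
  };
}

// Hypothetical sample values.
const sample = {
  last_ms: 850,
  average_ms: 120,
  last_finished: new Date("2010-12-02T01:09:00Z")
};
console.log(flushSummary(sample, new Date("2010-12-02T01:09:45Z")));
```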
52. db.serverStatus()
6) Dur
If you have journaling enabled then serverStatus will also show some stats such as how many
commits have occurred, the amount of data written and how long various operations have
taken. This can be useful for seeing how much overhead durability adds to servers. We've
found no noticeable difference when enabling journaling and that's on servers processing
billions of operations.
53. rs.status()
{
    "_id" : 1,
    "name" : "rs3b:27018",
    "health" : 1,
    "state" : 2,
    "stateStr" : "SECONDARY",
    "uptime" : 1886098,
    "optime" : {
        "t" : 1291252178000,
        "i" : 13
    },
    "optimeDate" : ISODate("2010-12-02T01:09:38Z"),
    "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")
},
www.ex-astris-scientia.org/inconsistencies/ent_vs_tng.htm (yes it's a replicator from Star Trek)
If you're running a replica set then you can use the rs.status() command to get information
about the whole replica set, on any set member. This gives you a few stats about the current
member as well as a full list of every member in the set.
54. rs.status()
1) myState
Value Meaning
0 Starting up (phase 1)
1 Primary
2 Secondary
3 Recovering
4 Fatal error
5 Starting up (phase 2)
6 Unknown state
7 Arbiter
8 Down
en.wikipedia.org/wiki/State_of_matter
The first value is myState, which shows you the status of the server you executed the
command on. However, it's also used in the list of members the command provides, so
you can see the state of any member in the replica set, as that member sees it. This is useful
to understand why members might be down because other members can't see them.
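The state table translates directly into a lookup you can use when alerting. This is a sketch: the helper names are hypothetical, the member list is made up, and treating only Primary, Secondary and Arbiter as healthy is our assumption.

```javascript
// State codes from the table on this slide, indexed by value.
const REPLICA_STATES = [
  "Starting up (phase 1)", "Primary", "Secondary", "Recovering", "Fatal error",
  "Starting up (phase 2)", "Unknown state", "Arbiter", "Down"
];

function describeState(code) {
  return REPLICA_STATES[code] || "Unrecognised state " + code;
}

// List members whose state is not Primary (1), Secondary (2) or Arbiter (7).
function unhealthyMembers(status) {
  return status.members
    .filter(m => ![1, 2, 7].includes(m.state))
    .map(m => m.name + ": " + describeState(m.state));
}

// Hypothetical rs.status() result, trimmed to the fields used here.
const status = { myState: 2, members: [
  { name: "rs3a:27018", state: 1 },
  { name: "rs3b:27018", state: 2 },
  { name: "rs3c:27018", state: 8 }
] };
console.log(unhealthyMembers(status)); // [ 'rs3c:27018: Down' ]
```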
55. rs.status()
2) Optime
"optimeDate" : ISODate("2010-12-02T01:09:38Z")
www.flickr.com/photos/robbie73/4244846566/
Replica set members who are not master will be secondary, which means they'll act as a slave
staying up to date with the master. The optimeDate allows you to see whether a member is
behind on the replication sync. The timestamp is the last applied log item so if it's up to date,
it'll be very close to the current actual time on the server.
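Measuring how far behind a member is comes down to subtracting its optimeDate from the current time. A minimal sketch, assuming the clocks are in sync; replicationLagSeconds is a hypothetical helper and the "now" value is made up. Plain Date objects stand in for the shell's ISODate here.

```javascript
// Seconds between a member's last applied operation and the given time.
function replicationLagSeconds(member, now) {
  const lagMs = now.getTime() - member.optimeDate.getTime();
  return Math.max(0, lagMs / 1000);
}

// optimeDate from this slide; the current time is hypothetical.
const member = { name: "rs3b:27018", optimeDate: new Date("2010-12-02T01:09:38Z") };
console.log(replicationLagSeconds(member, new Date("2010-12-02T01:10:08Z"))); // 30
```

On a quiet primary nothing new is written to the oplog, so a growing gap against the wall clock is not always real lag; comparing against the primary's own optimeDate avoids that.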
56. rs.status()
3) Heartbeat
"lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")
www.flickr.com/photos/drawblindfaith/3400981091/
The whole idea behind replica sets is that they automate failover in the event of failure
somewhere. This is done by a regular heartbeat that all members send out to all other
members. The status output shows you the last time that particular member was contacted
from the current member. In the event of a network partition it may be that some members
can't communicate with each other, and when there is an error you'll see it in this section too.
57. mongostat
The mongostat tool is included as part of the standard MongoDB download and gives you a
quick, real time snapshot of the current state of your servers.
58. mongostat
1) faults
Picture is unrelated! Snowmobile in Norway.
The faults column shows you the number of Linux page faults per second. This is when
Mongo accesses something that is mapped to the virtual address space but not in physical
memory, i.e. it results in a read from disk. High values here indicate you may not have
enough RAM to store all necessary data and disk accesses may start to become the
bottleneck.
59. mongostat
2) locked
www.flickr.com/photos/bbusschots/4541573665/
The next column is locked, which shows the % of time in a global write lock. When this is
happening no other queries will complete until the lock is given up, or the lock owner yields.
This is indicative of a large, global operation like a remove() or dropping a collection and can
result in slow performance.
60. mongostat
3) index miss
www.flickr.com/photos/gareandkitty/276471187/
Index miss is like we saw in the server status output except instead of an aggregate total,
you can see queries hitting (or missing) the index in real time. This is useful if you're
debugging specific queries in development or need to track down a server that is performing
badly.
61. mongostat
4) queues
When MongoDB gets too many queries to handle in real time, it queues them up. This is
represented in mongostat by the read and write queue columns. When this starts to increase
you will see slowdowns in executing queries as they have to wait to run through the queue. You
can alleviate this by stopping any more queries until the queue has dissipated. Queues will
tend to spike if you're doing a lot of write operations alongside other write-heavy ops, such
as large ranged removes. The second column is the active reads and writes.
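If you are scraping mongostat output, the queue column can be split into its read and write parts. This is a sketch that assumes the value is printed as "read|write", for example "12|3"; the helper names and the threshold are hypothetical.

```javascript
// Split a mongostat queue value such as "12|3" into numbers.
function parseQueues(column) {
  const [read, write] = column.split("|").map(Number);
  return { read, write };
}

// True when either queue is longer than the threshold.
function isQueueing(column, threshold) {
  const q = parseQueues(column);
  return q.read > threshold || q.write > threshold;
}

console.log(parseQueues("12|3"));   // { read: 12, write: 3 }
console.log(isQueueing("12|3", 10)); // true
```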
62. mongostat
5) Diagnostics
The last three columns show the total number of connections per server, the replica set they
belong to and the status of that server. This is useful if you need to quickly see which server
is a master in a replica set.
63. Current operations
db.currentOp();
{
    "opid" : "shard1:299939199",
    "active" : true,
    "lockType" : "write",
    "waitingForLock" : false,
    "secs_running" : 15419,
    "op" : "remove",
    "ns" : "sd.metrics",
    "query" : {
        "accId" : 1391,
        "tA" : {
            "$lte" : ISODate("2010-11-24T19:53:00Z")
        }
    },
    "client" : "10.121.12.228:44426",
    "desc" : "conn"
},
www.flickr.com/photos/jeffhester/2784666811/
The db.currentOp() function will give you a full list of every operation currently in progress. In
this case there's a long-running remove which has been active for over 4 hours. You can see
that it's targeted at shard 1 and the query is based on an account ID and a timestamp. It's
part of our retention scripts to remove older metrics data. This is useful because you can
track down long-running queries which might be hurting performance, and kill them off using
the opid.
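Picking the long runners out of that list is easy to script. This is a sketch: longRunningOps is a hypothetical helper, the second operation in the sample is made up, and it assumes the operations arrive in an inprog array as db.currentOp() returns them.

```javascript
// Return active operations that have been running for at least minSeconds.
function longRunningOps(currentOp, minSeconds) {
  return currentOp.inprog
    .filter(op => op.active && op.secs_running >= minSeconds)
    .map(op => ({ opid: op.opid, op: op.op, ns: op.ns, hours: op.secs_running / 3600 }));
}

// First entry is taken from this slide; the second is a made-up short query.
const currentOp = { inprog: [
  { opid: "shard1:299939199", active: true, secs_running: 15419, op: "remove", ns: "sd.metrics" },
  { opid: "shard1:299939200", active: true, secs_running: 2, op: "query", ns: "sd.apiLog" }
] };

for (const op of longRunningOps(currentOp, 3600)) {
  // In the mongo shell you could pass op.opid to db.killOp() at this point.
  console.log(op.opid, op.op, op.ns, op.hours.toFixed(1) + "h");
}
```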