4. [Diagram: the operations lifecycle: Prototype, Test, Monitor, Capacity Planning, supported by an Ops Playbook, so you avoid reinventing the wheel]
5. Essentials
• Disable NUMA
• Pick an appropriate file system (XFS, ext4)
• Pick a 64-bit OS
– Recent Linux kernel, Win2k8R2
• More RAM
– Spend on RAM, not cores
• Faster disks
– SSDs vs. SAN
– Separate journal and data files
6. Key things to consider
• Profiling
– Baseline/Blue print: Understand what should happen
– Ensure good Index usage
• Monitoring
– SNMP, munin, Zabbix, cacti, nagios
– MongoDB Monitoring Service (MMS)
• Sizing
– Understand Capability (RAM, IOPS)
– Understand Use Cases + Schema
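The sizing step reduces to simple arithmetic: hot data plus indexes against available RAM. A minimal sketch, with invented numbers (none of these figures come from the deck):

```javascript
// Sizing sketch: does the working set (hot data + indexes) fit in RAM?
const GB = 1024 * 1024 * 1024;

function fitsInRam(hotDataBytes, indexBytes, ramBytes) {
  return hotDataBytes + indexBytes <= ramBytes;
}

console.log(fitsInRam(20 * GB, 4 * GB, 32 * GB)); // true
console.log(fitsInRam(30 * GB, 8 * GB, 32 * GB)); // false
```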
7. What is your SLA?
• High Availability?
– 24x7x365 operation?
– Limited maintenance window?
• Data Protection?
– Failure of a Single Node?
– Failure of a Data Center?
• Disaster Recovery?
– Manual or automatic failover?
– Data Center, Region, Continent?
8. Build & Test your Playbook
• Backups
• Restores (backups are not enough)
• Upgrades
• Replica Set Operations
• Sharding Operations
10. How to see metrics
• mongostat
• MongoDB plug-ins for
– munin, Zabbix, cacti, ganglia
• Hosted Services
– MMS - 10gen
– Server Density, Cloudkick
• Profiling
12. Metrics in detail: opcounters
• Counts:
Insert, Update, Delete, Query, Commands
• Operation counters are mostly straightforward:
more is better
• Some operations are counted differently on a replica set primary than on a secondary
• getLastError(), system.status, etc. are also counted
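The "more is better" reading usually comes from rates rather than raw totals. A sketch of turning two opcounter snapshots (shaped like the `db.serverStatus().opcounters` document) into per-second rates; the sample numbers are invented for illustration:

```javascript
// Convert two opcounter snapshots into per-second rates
function opRates(prev, curr, intervalSec) {
  const rates = {};
  for (const op of Object.keys(curr)) {
    rates[op] = (curr[op] - prev[op]) / intervalSec;
  }
  return rates;
}

// Two snapshots taken 10 seconds apart (invented values)
const t0 = { insert: 1000, query: 5000, update: 400, delete: 20, command: 900 };
const t1 = { insert: 1500, query: 6000, update: 500, delete: 30, command: 1100 };

console.log(opRates(t0, t1, 10));
// { insert: 50, query: 100, update: 10, delete: 1, command: 20 }
```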
14. Metrics in detail: resident memory
• Key metric: to a very high degree, the
performance of a mongod is a measure of how
much data fits in RAM.
• If this quantity is stably lower than available
physical memory, the mongod is likely
performing well.
• Correlated metrics: page faults, B-Tree misses
16. [Diagram: Collection 1 and Index 1 mapped through Virtual Address Space 1 into Physical RAM; an access served from RAM costs ~100 ns, one that goes to disk costs ~10,000,000 ns]
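The two figures on this slide make the point numerically: a disk access is roughly 100,000 times slower than a RAM access, so even a tiny miss rate dominates average access time. The miss-rate arithmetic below is illustrative, not from the deck:

```javascript
// ~100 ns per RAM access vs ~10,000,000 ns per disk access (from the slide)
const RAM_NS = 100;
const DISK_NS = 10000000;

// Average access time when a fraction of accesses fault to disk
function avgAccessNs(missRate) {
  return (1 - missRate) * RAM_NS + missRate * DISK_NS;
}

console.log(DISK_NS / RAM_NS);   // 100000x penalty per miss
console.log(avgAccessNs(0.001)); // ~10099.9 ns: 0.1% misses -> ~100x slower
```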
17. Metrics in detail: page faults
• This counts reads and writes to pages of the data files that aren't resident in memory
• If this is persistently non-zero, your data doesn't
fit in memory.
• Correlated metrics: resident memory, B-Tree
misses, iostats
18. Working Set
> db.blogs.stats()
{
    "ns" : "test.blogs",
    "count" : 1338330,
    "size" : 46915928,                  // size of data
    "avgObjSize" : 35.05557523181876,   // average document size
    "storageSize" : 86092032,           // size on disk (and in memory!)
    "numExtents" : 12,
    "nindexes" : 2,
    "lastExtentSize" : 20872960,
    "paddingFactor" : 1,
    "flags" : 0,
    "totalIndexSize" : 99860480,        // size of all indexes
    "indexSizes" : {                    // size of each index
        "_id_" : 55877632,
        "name_1" : 43982848
    },
    "ok" : 1
}
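To connect these fields to the RAM question, a small sketch (plain JavaScript, using the numbers from the stats() output above) of the memory footprint this collection implies:

```javascript
// Footprint implied by collection stats: storage size plus all indexes
function memoryFootprintBytes(stats) {
  return stats.storageSize + stats.totalIndexSize;
}

// Values taken from the db.blogs.stats() output above
const blogsStats = { storageSize: 86092032, totalIndexSize: 99860480 };

console.log(memoryFootprintBytes(blogsStats)); // 185952512 bytes (~177 MB)
```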
20. Metrics in detail: lock percentage and queues
• By itself, lock % can be misleading: a high lock percentage just means that writing is happening.
• But when lock % is high and queued readers or writers are non-zero, the mongod is probably at its write capacity.
• Correlated metrics: iostats
22. explain, hint
// explain() shows the plan used by the operation
> db.c.find(<query>).explain()
// hint() forces a query to use a specific index
// x_1 is the name of the index from db.c.getIndexes()
> db.c.find( {x:1} ).hint("x_1")
24. Metrics in detail: B-Tree
• Counts B-tree accesses, including page faults serviced during index lookups
• If misses are persistently non-zero, your indexes
don't fit in RAM. (You might need to change or
drop indexes, or shard your data.)
• Correlated metrics: resident memory, page
faults, iostats
25. B-Trees' strengths
• B-Tree indexes are designed for range queries
over a single dimension
• Think of a compound index on { A, B } as being
an index on the concatenation of the A and B
values in documents
• MongoDB can use its indexes for sorting as well
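The concatenation intuition can be sketched outside the database: ordering by the compound key (A, B) means ordering by A with ties broken by B, which is also why an index on { A: 1, B: 1 } can serve a sort on those fields. Plain JavaScript, illustrative only:

```javascript
// Sorting by the compound key (A, B): compare A first, break ties on B
const docs = [
  { A: 2, B: 1 },
  { A: 1, B: 9 },
  { A: 1, B: 3 },
];

const byCompoundKey = [...docs].sort((x, y) => (x.A - y.A) || (x.B - y.B));

console.log(byCompoundKey);
// [ { A: 1, B: 3 }, { A: 1, B: 9 }, { A: 2, B: 1 } ]
```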
26. B-Trees' weaknesses
• Range queries on the first field of a compound index are suboptimal
• Range queries over multiple dimensions are suboptimal
• In both cases a suboptimal index may still be better than nothing, but it is best to see whether you can reshape the problem
27. Indexing dark corners
• Some functionality can't currently use indexes in all cases:
– $where JavaScript clauses
– $mod, $not, $ne
– regex
• A negation may be transformed into a range query
– then an index can be used
• Complicated regular expressions scan a whole
index
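One regex case that does index well is an anchored prefix such as /^abc/, because the prefix maps to a contiguous key range, so only that slice of the B-tree is scanned. The bound computation below is a deliberate simplification to show the idea (plain JavaScript, not MongoDB's actual planner code):

```javascript
// For an anchored prefix, scan index keys in the range [prefix, upper),
// where upper is the prefix with its last character incremented.
function prefixBounds(prefix) {
  const upper = prefix.slice(0, -1) +
    String.fromCharCode(prefix.charCodeAt(prefix.length - 1) + 1);
  return { lower: prefix, upper };
}

const b = prefixBounds("abc");
console.log(b); // { lower: 'abc', upper: 'abd' }
console.log("abcdef" >= b.lower && "abcdef" < b.upper); // true: in range
console.log("abd" >= b.lower && "abd" < b.upper);       // false: out of range
```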
30. Journal on another disk
• The journal's write load is very different from the data files'
– journal = append-only
– data files = randomly accessed
• Putting the journal on a separate disk or RAID device (e.g., via a symlink) minimizes seek-time overhead from journaling
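A sketch of the symlink approach. The paths are placeholders (under /tmp so the snippet can be run harmlessly); in practice the target would be a mount point on the dedicated journal disk, and mongod must be stopped before moving the journal:

```shell
# Placeholder paths for illustration; substitute your real dbpath and a
# mount point on the separate journal disk.
DBPATH=/tmp/demo-dbpath
JOURNAL_DISK=/tmp/demo-fastdisk/journal

mkdir -p "$DBPATH" "$JOURNAL_DISK"
rm -rf "$DBPATH/journal"                 # stop mongod first when doing this for real
ln -s "$JOURNAL_DISK" "$DBPATH/journal"  # mongod now journals onto the other disk
ls -l "$DBPATH/journal"
```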
31. --directoryperdb
• Allows storage tiering
– Different access patterns
– Different Disk Types / Speeds
• Use --directoryperdb
• Symlink individual database directories onto the appropriate disks
32. Dynamically change log level
// Raise the logging level to get more detail
> db.adminCommand({ setParameter: 1, logLevel: 1 })
// Restore the default level
> db.adminCommand({ setParameter: 1, logLevel: 0 })