The document discusses various techniques for optimizing MongoDB databases, including:
- Making _id values meaningful and using fixed-width hashes to improve indexing performance.
- Refactoring activity feed documents to group related data into a single field indexed by MongoDB to enable fast queries.
- Removing unnecessary indexes to improve performance without needing more hardware.
- Addressing inefficiencies from variable document sizes through techniques like pre-allocating space to prevent document moves within collections.
4. _id and indexes
• Bad Ideas
– ObjectId("4fb284…")
– Big Compound Indexes
– Long, Variable Width Strings Miss Indexes
• Good Ideas
– Make _id mean something
– Fixed Width Hashes
– Use _id as a compound index
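A minimal sketch of the "good ideas" above, assuming the `_id` is built from a timestamp, the feed owner, the actor and a fixed-width hash, as the example on the next slide suggests. The helper name `make_feed_id`, the `payload` argument and the choice of MD5 are assumptions for illustration; the slides do not say which hash was used.

```python
import hashlib


def make_feed_id(created, username, actor, payload):
    """Build a meaningful _id: sortable timestamp, owner, actor, fixed-width hash."""
    # An MD5 hex digest is always 32 characters, whatever the payload length,
    # so the tail of the key never varies in width (hash choice is an assumption).
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return "-".join([created, username, actor, digest])


feed_id = make_feed_id("201109122304", "lucas", "dan", "dan loved a song")
print(feed_id)
```

Because the leading parts of the key carry meaning, the default `_id` index can serve prefix lookups that would otherwise need a separate compound index.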
5. activity feeds: first attempt
{"_id": "201109122304-lucas-dan-c7dede43…",
"username": "lucas", "created": 201109122304,
"actor": "dan", "verb": "love"}
db.user.feed.find({'username': 'lucas', 'verb': 'love'})
.sort({'created': -1})
Working just fine for 4MM documents, but getting slow…
6. new version of activity feeds
{"_id": "201109122304-lucas-dan-c7dede43…",
"uid": "lucas-201109122304",
"vid": "lucas-love-201109122304", "actor": "dan"}
db.user.feed.find({'vid': /^lucas-/})
.sort({'vid': -1})
Fast for all 3 use cases!
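The refactored keys can be sketched as follows. This is an illustration, not the author's code: `feed_keys`, `prefix_query` and `match_prefix` are hypothetical helpers, and `match_prefix` is only an in-memory stand-in for `find(...).sort(...)` so the example runs without a database. The filter it builds uses MongoDB's `$regex` operator with a `^` anchor, the same shape as the `/^lucas-/` query on the slide.

```python
import re

# Only allow prefixes with no regex metacharacters, so "^" + prefix is a
# plain anchored prefix match (an assumption made to keep the sketch safe).
SAFE_PREFIX = re.compile(r"[A-Za-z0-9-]+")


def feed_keys(username, verb, created):
    """Pack the fields that used to be queried separately into two string keys."""
    return {
        "uid": f"{username}-{created}",
        "vid": f"{username}-{verb}-{created}",
    }


def prefix_query(field, prefix):
    """Build a filter document equivalent to {field: /^prefix/}."""
    if not SAFE_PREFIX.fullmatch(prefix):
        raise ValueError(f"unsafe prefix: {prefix!r}")
    return {field: {"$regex": "^" + prefix}}


def match_prefix(docs, field, prefix):
    """In-memory stand-in for find(prefix_query(...)).sort({field: -1})."""
    hits = [d for d in docs if d[field].startswith(prefix)]
    return sorted(hits, key=lambda d: d[field], reverse=True)


docs = [
    dict(feed_keys("lucas", "love", "201109122304"), actor="dan"),
    dict(feed_keys("lucas", "love", "201109130915"), actor="ann"),
    dict(feed_keys("lucas", "follow", "201109140000"), actor="bob"),
    dict(feed_keys("dan", "love", "201109150000"), actor="lucas"),
]

# Everything lucas received, newest first, via the uid key.
newest_first = [d["actor"] for d in match_prefix(docs, "uid", "lucas-")]
# Only the "love" events for lucas, newest first, via the vid key.
loves = [d["actor"] for d in match_prefix(docs, "vid", "lucas-love-")]
```

Because the timestamps are fixed width, sorting the string keys in descending order also sorts by time within a given prefix.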
9. padding factor
• Variable document size
• Allocate for the latest and fattest
• Document moves
• Can be very inefficient
• More RAM!
• Pre-allocate to prevent moves
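One common way to pre-allocate, sketched here as an assumption rather than as the approach used in the talk, is to insert each document with a throwaway filler field and then remove that field with `$unset`. On the storage engine of that era the record kept its allocated size, so later growth could happen in place. The field name `_pad` and both helper names are hypothetical.

```python
PAD_FIELD = "_pad"  # hypothetical name for the throwaway filler field


def padded_insert_doc(doc, pad_bytes):
    """Return a copy of doc carrying a filler string of pad_bytes characters."""
    padded = dict(doc)  # leave the caller's document untouched
    padded[PAD_FIELD] = "x" * pad_bytes
    return padded


def strip_padding_update():
    """Update document that removes the filler once the record is allocated."""
    return {"$unset": {PAD_FIELD: ""}}


original = {"_id": "lucas", "followers": []}
to_insert = padded_insert_doc(original, 1024)
# With a driver, the two steps would be roughly:
#   collection.insert_one(to_insert)
#   collection.update_one({"_id": "lucas"}, strip_padding_update())
```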
10. unbounded embedded lists
• Useful for followers, favorites
• Good for a few things, bad for lots
• Constantly bumping up padding factor
• Lots of document moves
11. a metaphor
• You run a coffee shop and can buy only one size of cup. Which size do you buy?
• On average, each customer has only one cup
• Heavy drinkers have hundreds of cups
credit: Macintex macintex.deviantart.com
12. bucketing!
• Split list across multiple documents
• Median number of items = bucket size
• Pre-allocate
• Easy seeking and traversal
• Much faster
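The bullets above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `owner` / `bucket` / `items` field names and the `owner-N` style of `_id` are all hypothetical and not taken from the talk.

```python
from statistics import median


def pick_bucket_size(list_lengths):
    """Use the median list length as the bucket size (the slide's rule of thumb)."""
    return max(1, int(median(list_lengths)))


def to_buckets(owner, items, bucket_size):
    """Split one long list into bucket documents numbered from 1."""
    buckets = []
    for start in range(0, len(items), bucket_size):
        number = start // bucket_size + 1
        buckets.append({
            "_id": f"{owner}-{number}",
            "owner": owner,
            "bucket": number,
            "items": items[start:start + bucket_size],
        })
    return buckets


def locate(index, bucket_size):
    """Map a position in the logical list to (bucket number, offset in bucket)."""
    return index // bucket_size + 1, index % bucket_size


size = pick_bucket_size([1, 1, 2, 3, 400])  # one heavy user, many light ones
buckets = to_buckets("lucas", ["a", "b", "c", "d", "e"], size)
```

Seeking is plain arithmetic: `locate(4, size)` gives the bucket and offset that hold the fifth item, so only one small document needs to be read.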
13. hey charts!
[Chart: storage for site.meta 1, site.meta 2, site.songs 1 and site.songs 2. Legend: allocated and unused; allocated and full of data.]
14. same charts when using bucketing
[Chart: storage for site.meta 1, site.meta 2, site.songs 1 - 1 and 1 - 2, and site.songs 2 - 1 through 2 - 6. Legend: allocated and unused; allocated and full of data.]
15. doesn’t work for everything…
• Picking right bucket size
• Defragging
• Random insertion
– Easy for things you don’t much care about the order of
– More difficult if you’re going to insert and change the order later
17. paying it back
• Bent mongoengine to make this easy
• Follow github.com/exfm
• Also added tooling for
– Trace all queries
– Aggregate tracing by request middleware
– Raise exceptions when queries miss an index
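The last item can be approximated by inspecting the output of `explain()`. The sketch below is not the exfm tooling, whose API the slides do not show. It assumes only the two explain shapes I am confident of: older servers report `"cursor": "BasicCursor"` for an unindexed scan, and newer ones report a `COLLSCAN` stage in `queryPlanner.winningPlan`. `MissedIndexError` and `assert_uses_index` are hypothetical names.

```python
class MissedIndexError(Exception):
    """Raised when an explain plan shows a full collection scan."""


def _has_collscan(plan):
    """Walk a winningPlan tree looking for a COLLSCAN stage."""
    if not isinstance(plan, dict):
        return False
    if plan.get("stage") == "COLLSCAN":
        return True
    children = [plan.get("inputStage")] + list(plan.get("inputStages", []))
    return any(_has_collscan(child) for child in children)


def assert_uses_index(explain):
    """Raise MissedIndexError if the explain output indicates no index was used."""
    # Older explain format: a top-level "cursor" string.
    if explain.get("cursor") == "BasicCursor":
        raise MissedIndexError("query used BasicCursor (no index)")
    # Newer explain format: a tree of stages under queryPlanner.winningPlan.
    winning = explain.get("queryPlanner", {}).get("winningPlan", {})
    if _has_collscan(winning):
        raise MissedIndexError("winning plan contains COLLSCAN")


# Passes silently: the plan reads from an index scan.
assert_uses_index({"queryPlanner": {"winningPlan": {
    "stage": "FETCH", "inputStage": {"stage": "IXSCAN"}}}})
```

In a test suite or request middleware, a check like this could be run against the explain output of each query so that unindexed queries fail loudly during development.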