This deck describes Parse's process for benchmarking MongoDB upgrades by replaying recorded production workloads on test servers. Upgrading from 2.4.10 to 2.6.3 initially showed a 33-75% drop in throughput, caused by query planner bugs. Working with MongoDB, Parse identified and helped fix several of these bugs; 2.6.5 improved performance, though initially still below 2.4.10 levels, and further optimization pushed throughput above 2.4.10 levels when testing with more workers and operations.
2. Parse?
• Parse is a backend service for mobile apps
• Data Storage
• Server-side code
• Push Notifications
• Analytics
• … all by dropping an SDK into your app
3. Parse Stats
• Parse has 400,000 apps
• Rapidly growing MongoDB deployment with:
• 500 databases
• 2.5M collections
• 8M indexes
• 50 TB of storage (excluding replication)
• We have all kinds of workloads!
4. Variety is Fun
• We support just about any kind of workload you can imagine
• Games, social networking, events, travel, music, etc.
• Apps that are read heavy or write heavy
• Heavy push users (time sensitive notifications)
• Apps that store large objects
• Apps that use us for backups
• Inefficient queries
5. 2.6 - Why Upgrade?
• General desire to stay current; a precursor to 2.8 and pluggable storage engines
• Specific features in 2.6
• Background indexing on secondaries
• Index intersection
• Query plan summary logging
6. Upgrading is Scary
• In the early days, we just upgraded
• Put a new version on a secondary
• ???
• Upgrade primaries
• ???
• Fix bugs as we find them - LIVE!
7. Upgrading
• We’re too big now to cowboy it up
• Upgrading blindly is a potential catastrophe
• In particular, we want to avoid:
• Significant performance regressions
• Unexpected bugs that break customer apps
8. Benchmarking
• We know that:
• Benchmarking can detect performance regressions between versions
• Tools and sample workloads (sysbench, YCSB, …) already exist
• MongoDB runs its own benchmarks
• Our workload is complex - we want more confidence
9. A Customized Approach
• Why not test with production workloads?
• Flashback: https://github.com/ParsePlatform/flashback
• Record - a Python tool to record ops
• Replay - a Go tool to play back ops
10. Record
• Record leverages Mongo's profiling and the oplog
• Profiling is enabled on all DBs
• Inserts are collected from the oplog
• All other ops are taken from the profile collection (system.profile)
• Ops are recorded for a specified time period (24h) and then merged
• Produces a JSON file of ops to feed the replay tool (see the sketch below)
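A minimal sketch of the recording idea using pymongo, not Flashback's actual implementation: profiling level 2 captures every op in system.profile, while inserts are tailed from the oplog. The connection string and DB name are placeholders.

```python
# A rough sketch of the recording idea, not Flashback itself.
# "DB_NAME" and the connection string are placeholders.
import time
import pymongo

client = pymongo.MongoClient("mongodb://primary-host:27017")
db = client["DB_NAME"]

# Profiling level 2 records every operation in db.system.profile.
db.command("profile", 2)

# Reads, updates, and removes come from the profile collection...
for op in db["system.profile"].find({"op": {"$in": ["query", "update", "remove"]}}):
    print(op["op"], op.get("ns"), op.get("ts"))

# ...while inserts are tailed from the oplog on the primary.
oplog = client["local"]["oplog.rs"]
cursor = oplog.find({"op": "i"}, cursor_type=pymongo.CursorType.TAILABLE_AWAIT)
while cursor.alive:
    for entry in cursor:
        print("insert", entry["ns"], entry["ts"])
    time.sleep(1)
```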
12. Base Snapshot
• Need to replay prod ops on prod data
• It's best to play back ops on a consistent copy of the data; otherwise:
• inserts fail with duplicate key errors
• deletes are no-ops
• queries don't return the right data
• Using EBS snapshots, we grab a copy of the DB during the recording
• Discard ops recorded before the snapshot (see the sketch below)
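As a hedged illustration of that last step, assuming a simple ops file with one JSON op per line and an epoch "ts" field (Flashback's real format may differ):

```python
# Discard recorded ops that predate the snapshot. The file layout (one
# JSON op per line with an epoch "ts" field) is an assumption here;
# Flashback's actual format may differ.
import json

SNAPSHOT_TS = 1404000000  # epoch seconds when the EBS snapshot was taken

with open("ops.json") as src, open("ops_trimmed.json", "w") as dst:
    for line in src:
        op = json.loads(line)
        if op["ts"] >= SNAPSHOT_TS:  # keep only ops after the snapshot
            dst.write(line)
```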
14. Base Snapshot
• Snapshot is restored to our benchmark server(s)
• EBS volume has to be “warmed” because snapshot blocks are not instantiated
• Multi-TB volumes can take a few hours to warm
• After warming we create an LVM snapshot
• We can “rewind” (merge) after each playback, iterating faster (see the sketch below)
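A rough sketch of that warm/snapshot/rewind cycle. In practice these are commands an operator runs; the device and LVM names below are placeholders.

```python
# Warm/snapshot/rewind cycle as shell commands driven from Python.
# Device and LVM volume names are placeholders.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Warm the restored EBS volume by reading every block once.
run(["dd", "if=/dev/xvdf", "of=/dev/null", "bs=1M"])

# 2. Mark the clean state with an LVM snapshot.
run(["lvcreate", "--size", "50G", "--snapshot",
     "--name", "clean", "/dev/vg_data/mongo"])

# ... run a playback against the origin volume ...

# 3. "Rewind": merge the snapshot back, reverting the origin volume to
#    the clean state (recreate the snapshot before the next playback).
run(["lvconvert", "--merge", "/dev/vg_data/clean"])
```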
15. Playback
1. Freeze the LVM volume
2. Start the version of mongo being tested
3. Adjust replay parameters (illustrated in the sketch below)
• # of workers
• # of ops
• timestamp to start at (when the base snapshot was taken)
4. Go!
5. Client-side results are logged to a file; server-side results are collected from monitoring tools
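A conceptual Python sketch of the replay loop (Flashback's actual replayer is written in Go); the ops-file format and field names are the same assumptions as above:

```python
# Conceptual replay loop: N workers play back recorded ops as fast as
# possible, starting from the snapshot timestamp. Flashback's real
# replayer is Go; field names here are illustrative assumptions.
import json
import threading
import pymongo

NUM_WORKERS = 10
NUM_OPS = 10_000_000
START_TS = 1404000000  # when the base snapshot was taken

client = pymongo.MongoClient("mongodb://benchmark-host:27017")
db = client["DB_NAME"]

def worker(ops):
    for op in ops:
        if op["op"] == "query":
            list(db[op["collection"]].find(op["filter"]))
        elif op["op"] == "insert":
            db[op["collection"]].insert_one(op["doc"])

with open("ops_trimmed.json") as f:
    ops = [json.loads(line) for line in f]
ops = [op for op in ops if op["ts"] >= START_TS][:NUM_OPS]

# Shard the op stream across workers and replay with no think time.
threads = [threading.Thread(target=worker, args=(ops[i::NUM_WORKERS],))
           for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```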
17. Our Workload
• 24h of ops collected
• 10M ops at a time, as fast as possible
• 10 workers
• No warming of the RS (replica set)
• LVM snapshot reset and mongod restarted for each version
• Rinse and repeat for multiple replica sets
20. Results
• 33% loss in throughput
• A second workload showed a roughly 75% drop in throughput
• 3,669.73 ops/sec vs. 975.64 ops/sec
• Ouch! What do we do next?
23. Bug Hunt!
• Old-fashioned troubleshooting begins
• Began isolating query patterns and collections with high max times
• Reproduced the issue, confirmed slowness in 2.6
• Lots of documentation and log gathering, including the extremely verbose QLOG
• Started an investigation with the Mongo team that ran for several weeks
24. What we found
• Basically, the new query planner in 2.6 meets the Parse auto-indexer
• We create lots of indexes automatically
• More indexes to score and potentially race
• Increased likelihood of running into query planner bugs
25. Example 1
Remove op on "Installation":
{ "installationId": { "$ne": ? }, "appIdentifier": "?", "deviceToken": "?" }
• 9M documents
• installationId is a UUID, a unique value
• "installationId": { "$ne": ? } matches most documents
• deviceToken is a unique token identifying the device
26. { "installationId": {"$ne": ? }, "appIdentifier": "?", "deviceToken": “?”}
• Three candidate indexes:
{installationId: 1, deviceToken: 1}
{deviceToken: 1, installationId: 1}
{deviceToken: 1}
• The second and third indexes are clearly better candidates
for this query, since the device token is a simple point lookup.
• Mongo bug where the work required to skip keys was not
factored in to the plan ranking, causing the inefficient plan to
sometimes tie
• Since it’s a remove op, held the write lock for the DB
• Fixed in: https://jira.mongodb.org/browse/SERVER-14311
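A hedged repro sketch of this shape with pymongo; the collection contents and values are placeholders, and explain output varies by server version:

```python
# Repro sketch for Example 1: build the three candidate indexes and ask
# the planner to explain the query shape. Values are placeholders.
import pymongo

coll = pymongo.MongoClient()["DB_NAME"]["Installation"]

coll.create_index([("installationId", 1), ("deviceToken", 1)])
coll.create_index([("deviceToken", 1), ("installationId", 1)])
coll.create_index([("deviceToken", 1)])

query = {
    "installationId": {"$ne": "some-uuid"},  # matches almost everything
    "appIdentifier": "com.example.app",
    "deviceToken": "abc123",                 # unique: a point lookup
}

# A deviceToken-prefixed index should win; under SERVER-14311 the
# {installationId, deviceToken} plan could tie with it instead.
print(coll.find(query).explain())
```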
27. Example 2
Query on "Activity":
{ $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } }
• 25M documents
• _p_project and _p_newProject are pointers to unique IDs of other objects
• acl matches most documents
• Four candidate indexes for this query:
{ _p_newProject: 1 }
{ _p_project: 1 }
{ _p_project: 1, _created_at: 1 }
{ acl: 1 }
28. { $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } }
• The query planner would race multiple plans using these indexes
• Due to a bug, one of the raced indexes would do a full index scan (acl)
• The index scan was non-yielding, tying up the lock until it had completed
• Parse's query killer job kills non-yielding queries after 45s (a sketch of the idea follows)
• The query planner would fail to cache a plan, and would re-run the race on the next query with the same pattern
• Fixed: https://jira.mongodb.org/browse/SERVER-15152
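Parse's query killer is an internal job; this is only a minimal sketch of the idea, assuming a MongoDB version where currentOp and killOp exist as admin commands (3.6+):

```python
# Minimal query-killer sketch (Parse's actual job is internal).
# Assumes currentOp/killOp exist as admin commands (MongoDB 3.6+);
# the 45s threshold matches the slide, everything else is illustrative.
import time
import pymongo

client = pymongo.MongoClient()
MAX_SECS = 45

while True:
    for op in client.admin.command("currentOp")["inprog"]:
        if op.get("op") == "query" and op.get("secs_running", 0) > MAX_SECS:
            print("killing opid", op["opid"])
            client.admin.command("killOp", op=op["opid"])
    time.sleep(5)
```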
29. Example 3
Query on "Activity": { $or: [ { _p_project: "?" }, { _p_newProject: "?" } ], acl: { $in: [ "a", "b", "c" ] } } (same as the previous example)
• Usually fast, but occasionally saw high nscanned and query time > 60s
• Since there were indexes on all of the AND-ed fields, this was a candidate for index intersection
• planSummary: IXSCAN { _p_project: 1 }, IXSCAN { _p_newProject: 1 }, IXSCAN { acl: 1.0 }
• acl was not selective, but _p_project and _p_newProject would sometimes match 0 documents during the race
• The intersection-based query plan would get cached, making subsequent queries slow (one mitigation is sketched below)
• Fixed in https://jira.mongodb.org/browse/SERVER-14961
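Until the fix landed, one stopgap for a badly cached plan was to flush it. planCacheClear is a real command introduced in 2.6; the DB name and query values below are placeholders:

```python
# Flush a bad cached plan. planCacheClear shipped in 2.6; the DB name
# and query values are placeholders.
import pymongo

db = pymongo.MongoClient()["DB_NAME"]

# Clear every cached plan for the Activity collection...
db.command("planCacheClear", "Activity")

# ...or clear only the plan cached for this slide's query shape.
db.command("planCacheClear", "Activity", query={
    "$or": [{"_p_project": "x"}, {"_p_newProject": "x"}],
    "acl": {"$in": ["a", "b", "c"]},
})
```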
31. Comparison (latency per op type; FAM = findAndModify)

         2.4.10 P99 | 2.4.10 MAX | 2.6.4 P99 | 2.6.4 MAX | 2.6.5 P99 | 2.6.5 MAX
query    18 ms      | 20,953 ms  | 19 ms     | 60,001 ms | 10 ms     | 4,352 ms
insert   23 ms      | 6,290 ms   | 50 ms     | 48,837 ms | 24 ms     | 2,225 ms
update   22 ms      | 3,835 ms   | 21 ms     | 48,776 ms | 23 ms     | 4,535 ms
FAM      22 ms      | 6,159 ms   | 24 ms     | 49,254 ms | 23 ms     | 4,353 ms
33. What now?
• 2.6 has a green light on performance
• Working through functionality testing
• Unit/integration testing catching the majority of issues
• Bonus: the Flashback error log is helping us identify problems not caught by tests
34. Wrap Up
• Benchmarking with something representative of your production workload is worth the time
• Saved us from discovering slowness in production and the inevitable, painful rollbacks
• Using actual production data is even better
• Helped us avoid new bugs
• Learned a lot about our own service (our indexing algorithms need some work)
• Initial work can be reused to efficiently test future versions