MongoDB Chunks – Distribution, Splitting, and Merging
This presentation discusses how the distribution of chunks can create scenarios where you may need to manually move, split, or merge chunks in your sharded cluster. Scenarios requiring these actions can exist with both optimal and sub-optimal shard keys. This is especially true when a change to the workload causes hot-spotting, unbalanced storage, or empty chunks.
1. MongoDB Chunks
Jason Terpko
MongoDB Chunks – Distribution, Splitting, and Merging
NoSQL DBA, Rackspace/ObjectRocket
www.linkedin.com/in/jterpko, jason.terpko@rackspace.com
2. My Story
• Started out with relational databases in public education, then financial services
• Next came online media distribution combined with a paywall
• For analytics, started working with columnar databases and engines with
compression
• Made the switch to NoSQL at ObjectRocket by Rackspace
3. Overview
• MongoDB and Sharding
• What is a chunk in MongoDB?
• Chunk Distribution and Scaling MongoDB
• Use Cases For Splitting
• Use Cases For Merging
• Reconsidering Your Shard Key
18. Merging Process
How we have resolved this with JavaScript in the past:
1. Check Balancer State
2. Read in empty chunks from ChunkHunter.py results
collection
3. Locate adjacent chunk
4. If empty and adjacent chunks reside on the same shard,
merge
5. Else move chunk to the shard with the adjacent chunk, then
merge
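The steps above can be sketched in plain JavaScript. This is a minimal illustration and not the script used in the talk: `planMerge`, `findAdjacent`, and `sameBound` are hypothetical helper names, the chunk objects only mimic the `ns`, `min`, `max`, and `shard` fields of `config.chunks` documents, and skipping the work while the balancer is enabled is an assumption about what "check balancer state" implies. The function only builds the `moveChunk` and `mergeChunks` command documents; running them (for example with `db.adminCommand(...)` on a mongos) is left to the caller.

```javascript
// Sketch only: plans the admin commands for merging ONE empty chunk.
// Bounds are compared via JSON.stringify, which assumes a consistent field
// order and JSON-safe shard key values (an assumption for this sketch).
function sameBound(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Step 3: locate a chunk adjacent to `empty` (right neighbor first, then left).
function findAdjacent(chunks, empty) {
  const right = chunks.find(c => c.ns === empty.ns && sameBound(c.min, empty.max));
  if (right) return { chunk: right, side: "right" };
  const left = chunks.find(c => c.ns === empty.ns && sameBound(c.max, empty.min));
  if (left) return { chunk: left, side: "left" };
  return null;
}

// Steps 1, 4, 5: return the commands to run, in order, or [] if nothing should run.
function planMerge(chunks, empty, balancerEnabled) {
  if (balancerEnabled) return []; // Step 1 (assumed policy): do not race the balancer
  const adj = findAdjacent(chunks, empty);
  if (!adj) return [];
  const commands = [];
  // Step 5: the two chunks must live on the same shard before merging.
  if (adj.chunk.shard !== empty.shard) {
    commands.push({
      moveChunk: empty.ns,
      bounds: [empty.min, empty.max],
      to: adj.chunk.shard
    });
  }
  // Step 4: merge the contiguous range covering both chunks.
  const bounds = adj.side === "right"
    ? [empty.min, adj.chunk.max]
    : [adj.chunk.min, empty.max];
  commands.push({ mergeChunks: empty.ns, bounds: bounds });
  return commands;
}

// Usage with made-up chunk metadata (namespace and shard names are illustrative).
const sampleChunks = [
  { ns: "app.events", min: { uid: 0 }, max: { uid: 100 }, shard: "shard0" },
  { ns: "app.events", min: { uid: 100 }, max: { uid: 200 }, shard: "shard1" }
];
const mergePlan = planMerge(sampleChunks, sampleChunks[0], false);
console.log(JSON.stringify(mergePlan, null, 2));
```

Because each merge changes the chunk map, a caller would likely re-read the chunk metadata before planning the next merge rather than planning many merges from one snapshot.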
What have we learned from the current process?
20. Chunk Size and Splits
How frequently are chunks being split?
• Global Change
• split & splitVector
• Decreasing the size
• Increasing the size
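One way to approach the question of split frequency is to count split events per day. The sketch below is an illustration under assumptions, not the method from the talk: `splitsPerDay` is a hypothetical helper, and the entries are assumed to look like `config.changelog` documents with `what`, `ns`, and `time` fields, where `what` is `"split"` or `"multi-split"` for split events. Verify the field names and values against your MongoDB version.

```javascript
// Sketch only: counts split events per UTC day for one namespace.
// `changelog` is an array of objects shaped like config.changelog entries
// (assumed fields: what, ns, time).
function splitsPerDay(changelog, ns) {
  const counts = {};
  for (const entry of changelog) {
    if (entry.ns !== ns) continue;
    if (entry.what !== "split" && entry.what !== "multi-split") continue;
    // Bucket by calendar day in UTC, formatted as YYYY-MM-DD.
    const day = new Date(entry.time).toISOString().slice(0, 10);
    counts[day] = (counts[day] || 0) + 1;
  }
  return counts;
}

// Usage with made-up entries (namespace and timestamps are illustrative).
const sampleLog = [
  { what: "split", ns: "app.events", time: "2016-04-01T10:00:00Z" },
  { what: "multi-split", ns: "app.events", time: "2016-04-01T18:30:00Z" },
  { what: "moveChunk.commit", ns: "app.events", time: "2016-04-01T19:00:00Z" },
  { what: "split", ns: "app.events", time: "2016-04-02T02:15:00Z" },
  { what: "split", ns: "app.users", time: "2016-04-02T03:00:00Z" }
];
const splitCounts = splitsPerDay(sampleLog, "app.events");
console.log(splitCounts);
```

Comparing these counts before and after a chunk size change gives a rough view of whether decreasing or increasing the size changed how often splits happen.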
21. Review Your Shard Key
How frequently does this occur?
• Is this a recurring problem?
• What impact does it have on your business?
• Re-analyzing your structure, workload, and access patterns
• What method will you use to re-shard a sharded collection?