2. Agenda
• Why Sharding
• Sharding Architecture
• What is Sharding
• Sharding Balancer
• Write/Reads with Sharding
• Sharding Limitation
• Demo
3. Why Sharding
• All writes go to master
• Latency sensitive queries still go to master
• A single replica set is limited to 12 nodes
• Memory can’t hold the active dataset once it gets big
• Local disk is not big enough
• Scaling up vertically is too expensive
5. Config Servers
• We have three config servers (mongod) in a prod cluster, or one in a test environment
• Changes are made using two-phase commit to provide strong consistency among all three config servers
• If any one of them is down, the metadata becomes read-only
• The system stays online as long as at least one of the three is up
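The availability rules above can be written as a small decision function. This is only a sketch of what the slide states, not MongoDB's implementation; the function name and the state labels are hypothetical.

```javascript
// Sketch of the availability rules on this slide (hypothetical helper).
// `upCount` is the number of reachable config servers out of `total`.
function configMetadataState(upCount, total = 3) {
  if (upCount < 0 || upCount > total) {
    throw new RangeError("upCount must be between 0 and total");
  }
  if (upCount === 0) return "offline";      // no config server left to read from
  if (upCount < total) return "read-only";  // metadata changes are blocked
  return "read-write";                      // all servers up, changes can commit
}

console.log(configMetadataState(3)); // read-write
console.log(configMetadataState(2)); // read-only
console.log(configMetadataState(0)); // offline
```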
6. Shards
• Each shard can be a single master, a master/slave pair, or a replica set
• A replica set provides automatic failover for the sharded cluster
• Shards are regular mongod processes
7. Mongos
• Sharding Router
• Acts just like a mongod to clients, making the cluster “invisible” to them
• You can have as many as you want
• It’s suggested to run it on the app servers
• It caches metadata from config servers
9. What is Sharding
• It’s range based
• Automatic balancing for changes in load and data distribution
• Convert from a single replica set to a sharded cluster without downtime
• Easy addition of new shards without downtime
• Scales to one thousand nodes
• No single points of failure
• Automatic failover
10. Shard key
• It can be one or more fields
• Every document needs a shard key (null is OK)
• The shard key can’t be updated
• MongoDB's sharding is order-preserving. You can define each shard key field as ascending or descending, like { tag : 1, timestamp : -1 }
• null < numbers < strings < objects < arrays <
binary data < ObjectIds < booleans < dates <
regular expressions
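To make the cross-type ordering concrete, here is a minimal sketch in plain JavaScript that sorts mixed values by the type order listed above. The helper names are hypothetical. ObjectId is left out because it needs a driver class, and Uint8Array is only a stand-in for binary data.

```javascript
// Rank of each value type, following the order listed on the slide.
// ObjectId (rank 7) is omitted because it needs a driver-specific class.
function typeRank(value) {
  if (value === null) return 1;
  if (typeof value === "number") return 2;
  if (typeof value === "string") return 3;
  if (value instanceof Uint8Array) return 6; // stand-in for binary data
  if (typeof value === "boolean") return 8;
  if (value instanceof Date) return 9;
  if (value instanceof RegExp) return 10;
  if (Array.isArray(value)) return 5;
  if (typeof value === "object") return 4;
  throw new TypeError("unsupported shard key type");
}

// Compare two shard key values: first by type rank, then by value for
// numbers and strings (other same-type pairs are treated as equal here).
function compareKeys(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 2) return a - b;
  if (ra === 3) return a < b ? -1 : a > b ? 1 : 0;
  return 0;
}

const keys = ["abc", 42, null, true, [1], { a: 1 }, 7, new Date(0)];
const sorted = keys.slice().sort(compareKeys);
console.log(sorted.map(typeRank)); // ranks come out in ascending order
```

Because chunks are ranges over this ordering, a string key always sorts after every numeric key, whatever its content.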
11. Chunk
• A chunk is a contiguous range of data from a
particular collection
• Collection is broken into chunks by range
• A chunk is a logical concept, not a physical
reality. $minKey <= key < $maxKey
• Each document must belong to one and only
one chunk
• Default size is 64 MB; it can be changed with --chunkSize
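The range rule means any key maps to exactly one chunk. A minimal sketch of that lookup follows, assuming numeric shard keys and using -Infinity / Infinity as stand-ins for $minKey / $maxKey. The chunk table and shard names are made up for illustration.

```javascript
// Chunks for one collection, sorted by lower bound. Each chunk covers
// min <= key < max. -Infinity / Infinity stand in for $minKey / $maxKey.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard1" },
  { min: 100, max: 500, shard: "shard2" },
  { min: 500, max: Infinity, shard: "shard1" },
];

// Binary search for the single chunk whose range contains `key`.
function findChunk(sortedChunks, key) {
  let lo = 0;
  let hi = sortedChunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = sortedChunks[mid];
    if (key < c.min) hi = mid - 1;        // key is left of this chunk
    else if (key >= c.max) lo = mid + 1;  // upper bound is exclusive
    else return c;
  }
  throw new Error("chunk ranges do not cover key " + key);
}

console.log(findChunk(chunks, 100).shard); // shard2
console.log(findChunk(chunks, 99).shard);  // shard1
```

Note that two chunks on the same shard need not be adjacent ranges, which is why a chunk is a logical rather than a physical unit.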
19. Chunk Migration
• Chunk Migration is an expensive operation
• Only one chunk migration happens at any
time
• Balancer will automatically migrate chunks between shards, based on the overall size of each shard
• You can also move chunks manually
20. Sharding Balancer
• Keeps data evenly distributed across all shards
• Minimizes the amount of data transferred
• For a balancing round to occur, a shard must have at least nine more chunks than the least-populous shard
• It can be turned off by running this against the config database:
• db.settings.update({"_id" : "balancer"}, {"$set" : {"stopped" : true }}, true)
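The nine-chunk threshold can be sketched as a simple check over per-shard chunk counts. This only illustrates the rule as stated on the slide, not the balancer's actual code; the function name and the sample counts are hypothetical.

```javascript
// chunkCounts maps shard name -> number of chunks (hypothetical data).
// Returns true when the most-populous shard has at least `threshold`
// more chunks than the least-populous one.
function needsBalancing(chunkCounts, threshold = 9) {
  const counts = Object.values(chunkCounts);
  if (counts.length < 2) return false; // nothing to balance against
  return Math.max(...counts) - Math.min(...counts) >= threshold;
}

console.log(needsBalancing({ shard1: 20, shard2: 12 })); // false (difference is 8)
console.log(needsBalancing({ shard1: 21, shard2: 12 })); // true (difference is 9)
```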
25. Choosing Shard Key
• A good shard key distributes reads and writes across shards while keeping the data you’re using together
• Don’t use an ascending shard key like ID
• Don’t use a low-cardinality shard key like continent
• Don’t use a random shard key like MD5
• Good example: coarsely ascending key + search key
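A small simulation shows why a purely ascending key is a poor choice: every new document lands in the last chunk, while a coarsely ascending key plus a search key spreads new writes over several chunks. The chunk boundaries, month strings and user names below are made up for illustration.

```javascript
// Compare compound keys such as [month, user] field by field.
function compareCompound(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// `bounds` holds sorted chunk lower bounds; chunk i covers
// bounds[i] <= key < bounds[i + 1]. Returns the index of the chunk for `key`.
function chunkIndex(bounds, key, cmp) {
  let idx = 0;
  for (let i = 0; i < bounds.length; i++) {
    if (cmp(key, bounds[i]) >= 0) idx = i;
  }
  return idx;
}

// Case 1: purely ascending key. Every new id falls into the last chunk.
const idBounds = [0, 1000, 2000];
const idHits = new Set(
  [2001, 2002, 2003, 2004].map((id) => chunkIndex(idBounds, id, (a, b) => a - b))
);

// Case 2: coarsely ascending month + search key (user name).
const compoundBounds = [
  ["2012-01", ""],
  ["2012-02", ""],
  ["2012-02", "h"],
  ["2012-02", "p"],
];
const newDocs = [
  ["2012-02", "alice"],
  ["2012-02", "jim"],
  ["2012-02", "zoe"],
  ["2012-02", "bob"],
];
const compoundHits = new Set(
  newDocs.map((key) => chunkIndex(compoundBounds, key, compareCompound))
);

console.log(idHits.size);       // 1: all writes hit one chunk
console.log(compoundHits.size); // 3: writes spread over three chunks
```

The month prefix still keeps recent data together, which is the locality half of the trade-off.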
27. Sharding Limitation
• A unique index can’t be created unless it has the shard key as a prefix
• You can’t update the shard key
• Only one chunk move in the cluster at a time
• Sharding does not yet support data center awareness
• Adding new shards brings more traffic to the existing cluster
• 20 PB size limit
28. Demo
• Startup Shards
• Startup config servers
• Startup mongos
• Configure Shards
• Shard Data
• Look at config data @ mongo config server
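The "Configure Shards" and "Shard Data" steps might look like the sketch below when run in a mongo shell connected to mongos. It assumes a shell version that provides the `sh` helper. The host names, database, collection and shard key are hypothetical placeholders, not the values used in the demo.

```javascript
// `sh` is the mongo shell's sharding helper. Host names, database,
// collection and shard key below are hypothetical placeholders.
function configureSharding(sh) {
  // Configure Shards: register each shard with the cluster.
  sh.addShard("localhost:10000");
  sh.addShard("localhost:10001");
  // Shard Data: enable sharding on a database, then shard a collection.
  sh.enableSharding("test");
  sh.shardCollection("test.people", { name: 1 });
}

// Outside the shell, a recording stub shows the order of the calls.
const calls = [];
const stub = new Proxy({}, {
  get: (_, name) => (...args) => calls.push([name, ...args]),
});
configureSharding(stub);
console.log(calls.map((c) => c[0]));
```

Inside the shell, calling `configureSharding(sh)` would issue the real commands. The stub is only there so the sketch can be tried without a cluster.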
35. Look at config data
• Log in to the config database
• db.shards.find()
• db.databases.find()
• db.chunks.find()
• db.printShardingStatus(true)
36. Recommended Reading
• MongoDB documentation
• http://www.mongodb.org/display/DOCS/Sharding
• Book “Scaling MongoDB”
• You can find it on www.safaribooksonline.com