2. What is it about?
- It’s not about sharding, it’s resharding
- What can sharding do for you
- What you must do first to obtain it
- Use case
3. Sharding Basics
- The goal is to maintain the impression that things look like this:
[Diagram labels: SearchCriteria, using an index, scanning the collection]
4. Sharding Basics (cont)
- When they actually are like this:
[Diagram labels: SearchCriteria, using an index, scanning the collection]
5. A Detail
- Partitioning a collection is relatively easy
- A bit of application logic to find a partition and that’s it
- Or is it?
6. The Certainty
- Things change
- You get spotted, and your query volume grows
- You build new functionality, and your access pattern changes
- You buy new machines, and your fixed partitioning scheme goes out the window
7. Insurance
- Sharding is not about partitioning. It’s about repartitioning without your having to ask
- Adding or removing shards
- Splitting and moving chunks*
- The logic of finding a chunk is MongoDB’s, not the application’s
* Chunk: an (arbitrary) unit that can move at once between shards
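To make the chunk idea concrete, here is a minimal sketch of a routing table that maps shard-key ranges to shards and supports splitting and moving chunks. It is a toy model written for illustration, not MongoDB's actual implementation; the `ChunkTable` class, its method names, and the shard names are all assumptions.

```python
import bisect


class ChunkTable:
    """Toy routing table: maps shard-key ranges (chunks) to shard names."""

    def __init__(self, first_shard):
        # Start with one chunk covering the whole key space.
        # lower_bounds[i] is the inclusive lower bound of chunk i.
        self.lower_bounds = [float("-inf")]
        self.owners = [first_shard]

    def find(self, key):
        # Index of the chunk whose range [lower, next_lower) contains key.
        return bisect.bisect_right(self.lower_bounds, key) - 1

    def shard_for(self, key):
        return self.owners[self.find(key)]

    def split(self, at_key):
        # Split the chunk containing at_key; both halves stay on the same shard.
        i = self.find(at_key)
        if self.lower_bounds[i] == at_key:
            return  # already a chunk boundary, nothing to do
        self.lower_bounds.insert(i + 1, at_key)
        self.owners.insert(i + 1, self.owners[i])

    def move(self, key, to_shard):
        # Reassign the chunk containing key to another shard.
        self.owners[self.find(key)] = to_shard


table = ChunkTable("shard0")
table.split(1000)           # chunks: (-inf, 1000) and [1000, +inf)
table.move(1000, "shard1")  # the upper chunk migrates to a new shard
```

The application only ever calls something like `shard_for`; splits and moves change the table behind its back, which is the point of the slide.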
9. Starting to Shard
- You can load data into a sharded collection or shard an existing one*
- Automatic range partitioning will take place
- Data placement is taken care of for you
- By default, the collection is sharded over _id, but you can specify a different sharding key
- An index is built automatically over that key
* 1.6
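As a rough illustration of specifying a sharding key, the sketch below only builds the command documents one might send to the admin database. It assumes the `enableSharding` and `shardCollection` command names from current MongoDB documentation (the spelling in 1.6 may differ), and the namespace `mydb.urls` and the helper function names are hypothetical. Nothing here talks to a server.

```python
def enable_sharding_command(database):
    """Command document that marks a database as shardable (assumed name)."""
    return {"enableSharding": database}


def shard_collection_command(namespace, key_field="_id"):
    """Command document that shards a collection over key_field.

    namespace is "<database>.<collection>"; the default key mirrors the
    slide's statement that _id is used unless another key is specified.
    """
    return {"shardCollection": namespace, "key": {key_field: 1}}


# Hypothetical usage: shard mydb.urls over user_id instead of _id.
commands = [
    enable_sharding_command("mydb"),
    shard_collection_command("mydb.urls", "user_id"),
]
```

With a driver such as PyMongo, each document would typically be passed to the admin database's `command` method; that step is omitted so the sketch runs without a cluster.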
11. A Digression
- A shard can actually live in a group of replicated servers
- Fault tolerance is obtained that way
- Our focus here is incremental scalability and aggregate performance
12. On Reads, I
- Lookup over the shard key or a prefix thereof
- Sharding at its best!
- Search criteria can be satisfied by a single chunk
- Lookup inside the chunk uses the index
- May or may not need to access the collection
- Example: shard by user_id, return the user’s name
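The user_id example can be sketched as a targeted read: the router uses the shard key to pick one shard and never contacts the others. The chunk boundaries, shard names, and sample documents below are made up for illustration.

```python
import bisect

# Hypothetical layout: inclusive lower bounds of each chunk on user_id,
# and the shard that owns each chunk.
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 2000]
CHUNK_OWNERS = ["shard0", "shard1", "shard2"]

# Each shard holds its own documents, keyed by the shard key.
SHARDS = {
    "shard0": {42: {"user_id": 42, "name": "Ada"}},
    "shard1": {1500: {"user_id": 1500, "name": "Grace"}},
    "shard2": {2500: {"user_id": 2500, "name": "Edsger"}},
}


def shards_for(query):
    """Return the shards a query has to visit."""
    if "user_id" in query:
        i = bisect.bisect_right(CHUNK_LOWER_BOUNDS, query["user_id"]) - 1
        return [CHUNK_OWNERS[i]]  # shard key present: exactly one shard
    return list(SHARDS)  # no shard key: every shard is a candidate


def find_name(user_id):
    """Category I read: look up a user's name on a single shard."""
    (shard,) = shards_for({"user_id": user_id})
    doc = SHARDS[shard].get(user_id)
    return doc["name"] if doc else None
```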
13. On Reads, II
- Lookup over a secondary index
- Not bad: merges results from shards
- Example: {country: "UK"} with a secondary index over country
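A minimal sketch of this second kind of read, assuming each shard keeps its own local index over country: the query goes to every shard, each shard answers from its index, and the partial results are merged. The `Shard` class and the sample documents are illustrative only.

```python
from collections import defaultdict


class Shard:
    """Toy shard with a local secondary index over the country field."""

    def __init__(self, docs):
        self.docs = docs
        # Index: country -> positions of matching documents in self.docs.
        self.by_country = defaultdict(list)
        for pos, doc in enumerate(docs):
            self.by_country[doc["country"]].append(pos)

    def find_by_country(self, country):
        # Index lookup; no scan of self.docs is needed.
        return [self.docs[pos] for pos in self.by_country.get(country, [])]


def scatter_gather(shards, country):
    """Ask every shard, then merge the partial results."""
    results = []
    for shard in shards:
        results.extend(shard.find_by_country(country))
    return results


shards = [
    Shard([{"user_id": 1, "country": "UK"}, {"user_id": 2, "country": "BR"}]),
    Shard([{"user_id": 3, "country": "UK"}]),
]
uk_users = scatter_gather(shards, "UK")
```

Every shard is contacted, but each one does only an index lookup, which is why the slide calls this case "not bad".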
14. On Reads, III
- Lookups where indexes won’t help
- Traversing shards sequentially or in parallel?*
* 1.6
16. The Sharding Key
- Choose wisely; you’re marrying it
- Often, you’re better off defining a unique key that stores data the application wants to query
- (An internally generated _id is really not it)
17. Mind Your Queries
- Sure, dynamic partitioning is automatic
- But, ultimately, the system’s response time and scalability depend on how your application queries it
- If the most important queries fall into category I, the remaining ones into II, and seldom any query that matters into III, you’ll be fine
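One way to audit an application's queries against the three read categories is a small classifier like the sketch below. It is a simplification under stated assumptions: a query counts as category I if it filters on the leading field of the shard key, as II if it filters on the leading field of some secondary index, and as III otherwise. The function name and sample fields are hypothetical.

```python
def read_category(query_fields, shard_key, secondary_indexes):
    """Classify a query as read category "I", "II", or "III".

    query_fields: field names the query filters on.
    shard_key: ordered list of fields in the shard key.
    secondary_indexes: list of ordered field lists, one per index.
    """
    fields = set(query_fields)
    if shard_key[0] in fields:
        return "I"  # shard key (or a prefix of it): routed to few chunks
    if any(index[0] in fields for index in secondary_indexes):
        return "II"  # every shard is asked, but each uses an index
    return "III"  # every shard scans its part of the collection


shard_key = ["user_id"]
indexes = [["country"]]
workload = {
    "name by user": ["user_id"],
    "users in a country": ["country"],
    "users by signup source": ["source"],
}
report = {name: read_category(f, shard_key, indexes) for name, f in workload.items()}
```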
18. Pick Your Indexes
- MongoDB allows both sharding and secondary indexes
- Critical queries that are not served by the sharding index can get help from secondary indexes
- Sometimes, you can’t help them all…
- Index selection is a trade-off between querying and updates, insertions, and deletions
20. Bit.ly History
- User creates URL shortener
- Sharding is used to store all past URLs of a user
- Sharding key: user_id
- Indexes: timestamp (desc)
- Queries:
  - Shortened URLs by a given user
  - Last n URLs by any user
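The two queries can be sketched over a toy layout. The data, shard names, and the user-to-shard map are invented, and "last n URLs by any user" is ambiguous in the slide: `urls_by_user` covers the per-user reading (a targeted read), while `last_n_overall` covers the across-all-users reading (a scatter-gather merge). Neither is claimed to be how Bit.ly actually does it.

```python
import heapq
from itertools import islice

# Hypothetical placement: user_id is the shard key, so all of a user's URLs
# live on one shard. Rows are (timestamp, user_id, short_url), kept newest
# first to mimic a descending index on timestamp.
SHARDS = {
    "shard0": [(105, "u1", "bit.ly/e"), (103, "u3", "bit.ly/c"), (101, "u1", "bit.ly/a")],
    "shard1": [(104, "u2", "bit.ly/d"), (102, "u2", "bit.ly/b")],
}
USER_TO_SHARD = {"u1": "shard0", "u3": "shard0", "u2": "shard1"}  # stand-in for chunk lookup


def urls_by_user(user_id, n=None):
    """Targeted read: one shard, rows already in newest-first order."""
    rows = (r for r in SHARDS[USER_TO_SHARD[user_id]] if r[1] == user_id)
    return [url for _, _, url in islice(rows, n)]  # n=None returns all


def last_n_overall(n):
    """Scatter-gather read: merge each shard's newest-first rows."""
    merged = heapq.merge(*SHARDS.values(), key=lambda r: r[0], reverse=True)
    return [url for _, _, url in islice(merged, n)]
```

The per-user query touches a single shard, while the overall query has to consult every shard and merge, which is the difference between read categories I and II.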