Managing Data and
Operation Distribution In
MongoDB
Antonios Giannopoulos and Jason Terpko
DBAs @ Rackspace/ObjectRocket
linkedin.com/in/antonis/ | linkedin.com/in/jterpko/
1
Introduction
www.objectrocket.com
2
Antonios Giannopoulos Jason Terpko
Overview
• Sharded Cluster
• Shard Keys Selection
• Shard Key Operations
• Chunk Management
• Data Distribution
• Orphaned documents
• Q&A
3
Sharded
Cluster • Cluster Metadata
• Data Layer
• Query Routing
• Cluster Communication
4
Cluster Metadata
Data Layer
…
s1 s2 sN
Replication
Data redundancy relies on an idempotent log of operations.
Query Routing
…
s1 s2 sN
Sharded Cluster
…
s1 s2 sN
Cluster Communication
How do independent components become a cluster and communicate?
● Replica Set
○ Replica Set Monitor
○ Replica Set Configuration
○ Network Interface ASIO Replication / Network Interface ASIO Shard Registry
○ Misc: replSetName, keyFile, clusterRole
● Mongos Configuration
○ configDB Parameter
○ Network Interface ASIO Shard Registry
○ Replica Set Monitor
○ Task Executor
● Post Add Shard
○ Collection config.shards
○ Replica Set Monitor
○ Task Executor Pool
○ config.system.sessions
Primary Shard
…
s1 s2 sN
Database <foo>
Collection UUID
Cluster Metadata
config.collections
Data Layer (mongod)
config.collections
With featureCompatibilityVersion 3.6, all collections are assigned an immutable UUID.
Important
• UUIDs for a namespace must match
• Use 4.0+ tools for a sharded cluster restore
Shard Key -
Selection • Profiling
• Identify shard key candidates
• Pick a shard key
• Challenges
14
Sharding
…
15
s1 s2 sN
Database <foo> Collection <foo>
Shards are
Physical Partitions
chunk chunk
Chunks are
Logical Partitions
chunk chunkchunk chunk
What is a Chunk?
Chunks are the logical partitions your collection is divided into; the shard key determines how documents map to chunks and how data is distributed across the cluster.
● Maximum size is defined in config.settings
○ Default 64MB
● Before 3.4.11: hardcoded maximum document count of 250,000
● Version 3.4.11 and higher: maximum of 1.3 × (configured chunk size / average document size) documents
● Chunk map is stored in config.chunks
○ Continuous range from MinKey to MaxKey
● Chunk map is cached at both the mongos and mongod
○ Query Routing
○ Sharding Filter
● Chunks distributed by the Balancer
○ Using moveChunk
○ Up to maxSize
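The chunk map described above can be modeled as an ordered, continuous set of ranges. A minimal Python sketch, where MinKey/MaxKey are stood in for with infinities and all names are illustrative:

```python
# Model the chunk map as ordered (min, max, shard) ranges and check the
# invariant config.chunks maintains: a continuous range from MinKey to
# MaxKey with no gaps or overlaps between adjacent chunks.
MINKEY, MAXKEY = float("-inf"), float("inf")

def is_continuous(chunks):
    """chunks: list of (min, max, shard) tuples sorted by min."""
    if not chunks or chunks[0][0] != MINKEY or chunks[-1][1] != MAXKEY:
        return False
    return all(cur[1] == nxt[0] for cur, nxt in zip(chunks, chunks[1:]))

chunk_map = [(MINKEY, 100, "s1"), (100, 500, "s2"), (500, MAXKEY, "s1")]
print(is_continuous(chunk_map))  # → True
```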
Shard Key Selection
17
Profiling
Helps identify your workload
Requires Level 2 – db.setProfilingLevel(2)
May need to increase profiler size
Shard Key Selection
18
Profiling → Candidates
Export statements types with frequency
Export statement patterns with frequency
Produces a list of shard key candidates
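The export step can be approximated with a short Python pass over exported system.profile documents. A sketch only: the sample documents below mimic the profiler's output shape, and the field handling is simplified.

```python
from collections import Counter

def statement_patterns(profile_docs):
    """Reduce system.profile entries to (op, namespace, sorted filter
    fields) patterns and count their frequency; the most frequent
    patterns suggest shard key candidates."""
    patterns = Counter()
    for doc in profile_docs:
        query = doc.get("query") or doc.get("command", {}).get("filter", {})
        shape = (doc.get("op"), doc.get("ns"), tuple(sorted(query)))
        patterns[shape] += 1
    return patterns

sample = [
    {"op": "query", "ns": "foo.users", "query": {"email": "a@b.c"}},
    {"op": "query", "ns": "foo.users", "query": {"email": "d@e.f"}},
    {"op": "update", "ns": "foo.users", "query": {"_id": 1}},
]
print(statement_patterns(sample).most_common(1))
```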
Shard Key Selection
19
Profiling → Candidates → Built-in Constraints
The shard key and its values are immutable
Must not contain NULLs
Update and findAndModify operations must contain the shard key
Unique constraints must be maintained by a prefix of the shard key
A shard key cannot contain special index types (i.e. text)
Potentially reduces the list of candidates
Shard Key Selection
20
Profiling → Candidates → Built-in Constraints → Schema Constraints
Cardinality
Monotonically increasing keys
Data hotspots
Operational hotspots
Targeted vs scatter-gather operations
Shard Key Selection
21
Profiling → Candidates → Built-in Constraints → Schema Constraints → Future
Poor cardinality
Growth and data hotspots
Data pruning & TTL indexes
Schema changes
Try to simulate the dataset at 3, 6, and 12 months
Shard key -
Operations • Apply a shard key
• Revert a shard key
22
Apply a shard key
23
Create the associated index
Make sure the balancer is stopped:
sh.stopBalancer()
sh.getBalancerState()
Apply the shard key:
sh.shardCollection("foo.col",{field1:1,...,fieldN:1})
Allow a burn period
Start the balancer
Sharding
…
s1 s2 sN
Database <foo> Collection <foo>
chunk chunk
sh.shardCollection("foo.foo", <key>)
sh.startBalancer()
chunk chunk chunk chunk
Burn Period
Revert a shard key
25
Two categories of problematic shard keys:
o Affects functionality (exceptions, inconsistent data, …)
o Affects performance (operational hotspots, …)
Dump/Restore
o Requires downtime – writes, and in some cases reads
o Time-consuming operation
o You may restore into a sharded or unsharded collection
o Pre-creating the indexes is recommended
o The same or a new cluster can be used
o A streaming dump/restore is an option
o In special cases, like time-series data, it can be fast
Revert a shard key
26
Dual writes
o Mongo to Mongo connector or Change streams
o No downtime
o Requires extra capacity
o May increase latency
o Same or new cluster can be used
o Adds complexity
Alter the config database
o Requires downtime – but minimal
o Easy during burn period
o Time-consuming if chunks are distributed
o Has overhead during chunk moves
Revert a shard key
27
Process:
1) Disable the balancer – sh.stopBalancer()
2) Move all chunks to the primary shard (skip during burn period)
3) Stop one secondary from the config server ReplSet (for rollback)
4) Stop all mongos and all shards
5) On the config server replset primary execute:
db.getSiblingDB('config').chunks.remove({ns:<collection name>})
db.getSiblingDB('config').collections.remove({_id:<collection name>})
6) Start all mongos and shards
7) Start the secondary from the config server replset
Rollback:
• After step 6, stop all mongos and shards
• Stop the running members of the config server ReplSet and wipe their data directory
• Start all config server replset members
• Start all mongos and shards
Revert a shard key
28
Online option requested on SERVER-4000 - May be supported in 4.2
Further reading - Morphus: Supporting Online Reconfigurations in Sharded NoSQL
Systems http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf
Special use cases:
Extend a shard key by adding field(s) ({a:1} to {a:1, b:1})
o Possible (and easier) if b's max and min (per a) are predefined
o For example, {year:1, month:1} extended to {year:1, month:1, day:1}
Reduce the elements of a shard key ({a:1, b:1} to {a:1})
o Possible (and easier) if all distinct "a" values are in the same shard
o Chunks sharing the same "a" min bound add complexity
Revert a shard key
29
Always perform a dry run
Balancer/Autosplit must be disabled
You must take downtime during the change
*There might be a more optimal code path, but the above one worked like a charm
Chunk
Splitting and
Merging
• Pre-splitting
• Auto Splits
• Manual Intervention
30
Distribution Goal
…
31
s1* s2 s4
Database <foo>
25% 25%
25%
50G 50G 50G
Database Size: 200G
Primary Shard: s1
Pre-Split – Hashed Keys
32
Shard keys using MongoDB’s hashed index allow the use of numInitialChunks.
Hashing Mechanism
jdoe@gmail.com 694ea0904ceaf766c6738166ed89bafb NumberLong("7588178963792066406")
Value 64-bits of MD5 64-bit Integer
Estimation
Size = Collection size (in MB) / 32
Count = Number of documents / 125000
Limit = Number of shards * 8192
numInitialChunks = Min(Max(Size, Count), Limit)
1,600 = 51,200 / 32
800 = 100,000,000 / 125,000
32,768 = 4 *8192
1600 = Min(Max(1600, 800), 32768)
Command
db.runCommand( { shardCollection: "foo.users", key: { "uid": "hashed" }, numInitialChunks : 1600 } );
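The estimation above translates directly to a small helper. This is a sketch of the slide's heuristic, not MongoDB's internal code:

```python
def num_initial_chunks(coll_size_mb, doc_count, shard_count):
    """Estimate numInitialChunks for a hashed shard key: size chunks
    for ~32 MB / ~125,000 docs each, capped at 8192 chunks per shard."""
    size = coll_size_mb // 32
    count = doc_count // 125_000
    limit = shard_count * 8192
    return min(max(size, count), limit)

# 50 GB (51,200 MB) collection, 100M documents, 4 shards:
print(num_initial_chunks(51_200, 100_000_000, 4))  # → 1600
```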
Pre-Split – Deterministic
33
Use Case: Collection containing user profiles with email as the unique key.
Prerequisites
1. Shard key analysis complete
2. Understanding of access patterns
3. Knowledge of the data
4. Unique key constraint
Pre-Split – Deterministic
34
Prerequisites → Split
Initial Chunk Splits
Pre-Split – Deterministic
35
Prerequisites → Split → Balance
Pre-Split – Deterministic
36
Prerequisites → Split → Balance → Split
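For a deterministic pre-split, initial split points can be derived from a sorted sample of the key values. A hypothetical sketch (each returned boundary would then feed a sh.splitAt() call):

```python
def split_points(sorted_keys, n_chunks):
    """Pick n_chunks - 1 evenly spaced boundary values from a sorted
    sample of shard key values."""
    step = len(sorted_keys) / n_chunks
    return [sorted_keys[int(step * i)] for i in range(1, n_chunks)]

# Sample of 1,000 unique email keys, pre-split into 4 chunks:
emails = sorted(f"user{i:04d}@example.com" for i in range(1000))
print(split_points(emails, 4))
```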
Automatic Splitting
37
Controlling Auto-Split
• sh.enableAutoSplit()
• sh.disableAutoSplit()
Alternatively
Mongos
• The component responsible for tracking statistics
• Bytes Written Statistics
• Multiple Mongos Servers for HA
Sub-Optimal Distribution
…
38
s1* s2 s4
Database <foo>
40% 20%
20%
50G 50G 50G
Database Size: 200G
Primary Shard: s1
Chunks: Balanced
Maintenance – Splitting
39
Five Helpful Resources:
• collStats
• config.chunks
• dataSize
• oplog.rs
• system.profile*
*with setProfilingLevel at 2, analyze both reads and writes
Maintenance - Merging
45
Analyze
Maintenance - Merging
46
Analyze → Move
Maintenance - Merging
47
Analyze → Move → Merge
Balancing
• Balancer overview
• Balancing with defaults
• Create a better distribution
• Create a better balancing
48
Balancer
49
The balancer process is responsible for redistributing the chunks of each sharded collection evenly among the shards.
It takes into account the number of chunks (and not the amount of data)
Number of Chunks Migration Threshold
Fewer than 20 2
20-79 4
80 and greater 8
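The migration threshold table translates directly to code; a sketch of the table above:

```python
def migration_threshold(total_chunks):
    """Chunk-count difference between the most and least loaded shards
    that triggers a balancing round, per the table above."""
    if total_chunks < 20:
        return 2
    if total_chunks < 80:
        return 4
    return 8

print([migration_threshold(n) for n in (10, 40, 100)])  # → [2, 4, 8]
```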
Jumbo Chunks: MongoDB cannot move a chunk if the number of documents in the chunk is greater than
1.3 times the result of dividing the configured chunk size by the average document
size. db.collection.stats() includes the avgObjSize field, which represents the average document size in
the collection. Prior to 3.4.11 the maximum was 250,000 documents.
Balancer
50
Parallel Migrations:
Before 3.4: one migration at a time
After 3.4: parallel migrations, as long as the source and destination shards aren't involved in another migration
Settings:
chunkSize: Default is 64MB – Lives in config.settings
_waitForDelete: Default is false – Lives in config.settings
_secondaryThrottle: Default is true. After 3.4, WiredTiger uses false. – Lives in config.settings
activeWindow: Default is 24h – Lives in config.settings
maxSize: Default is unlimited – Lives in config.shards
disableBalancing: Disables/Enables balancing per collection
autoSplit: Disables/Enables splits
Balancing
51
Balancer only cares about the number of chunks per shard.
(Figure: chunk distributions – best case, our case, our goal)
Balancing
52
The "apple algorithm" we are going to introduce is simple.
For a collection, it requires an ordered chunk map with these attributes: chunk size,
chunk bounds (min, max), and the shard each chunk belongs to.
1 Pick the first chunk (current)
2 Tentatively merge current with the next chunk
3 If the merged size is lower than a configured threshold, keep the merge and go to step 2
4 Otherwise, discard the merge, set next as current, and go to step 2
Let's now see the implementation in Python.
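The original script lives in the GitHub repo linked later in the deck; since the code screenshots did not survive this export, here is a minimal, hypothetical re-sketch of the merge pass described above (sizes in MB; the threshold value is an assumption):

```python
def merge_chunks(chunks, threshold_mb=32):
    """chunks: ordered list of dicts with 'size' (MB), 'min', 'max'.
    Greedily merge each chunk into its predecessor while the combined
    size stays under the threshold, mirroring steps 1-4 above."""
    merged = [dict(chunks[0])]
    for nxt in chunks[1:]:
        cur = merged[-1]
        if cur["size"] + nxt["size"] < threshold_mb:
            cur["size"] += nxt["size"]
            cur["max"] = nxt["max"]   # extend the merged chunk's bounds
        else:
            merged.append(dict(nxt))  # start a new current chunk
    return merged

# Ten tiny 4 MB chunks collapse into far fewer merge candidates:
tiny = [{"size": 4, "min": i * 10, "max": (i + 1) * 10} for i in range(10)]
print([c["size"] for c in merge_chunks(tiny)])  # → [28, 12]
```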
Balancing - Variables
53
Balancing – Basic functions
54
Balancing – Main function
55
Balancing – Helper functions
56
Balancing - Output
57
Balancing
59
Can the algorithm do better?
Can we improve the balancing after running the script?
Making the bounds stricter and adding more parameters would improve it.
-OR- Chunk buckets may be the answer.
The script produces chunks sized between (chunksize/2) and (chunksize).
It improves balancing, but it may not achieve a perfect distribution.
The idea is to categorize the chunks into buckets between (chunksize/2) and
(chunksize), and have each shard hold an equal number of chunks from each
bucket.
Balancing - Buckets
60
For example, chunksize=64 we can create the following buckets:
o Bucket1 for sizes between 32 and 36 MiB
o Bucket2 for sizes between 36 and 40 MiB
o Bucket3 for sizes between 40 and 44 MiB
o Bucket4 for sizes between 44 and 48 MiB
o Bucket5 for sizes between 48 and 52 MiB
o Bucket6 for sizes between 52 and 56 MiB
o Bucket7 for sizes between 56 and 60 MiB
o Bucket8 for sizes between 60 and 64 MiB
More buckets mean more accuracy, but they may cause more chunk moves.
The diversity of the chunk sizes plays a major role.
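The bucket assignment from the example can be sketched as a one-liner-style helper (assuming chunksize = 64 and eight equal-width 4 MiB buckets, as above):

```python
def bucket_for(size_mib, chunksize=64):
    """Assign a chunk sized in [chunksize/2, chunksize) to one of 8
    equal-width buckets, e.g. for chunksize=64: bucket 1 is 32-36 MiB,
    bucket 2 is 36-40 MiB, ... bucket 8 is 60-64 MiB."""
    lo = chunksize // 2
    width = (chunksize - lo) // 8
    return min((size_mib - lo) // width + 1, 8)

print([bucket_for(s) for s in (33, 38, 63)])  # → [1, 2, 8]
```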
Balancing - Buckets
61
Balancing – Get the code
62
GitHub Repo - https://bit.ly/2M0LnxG
Orphaned
Documents • Definition
• Issues
• Cleanup
63
Definition/Impact
64
Definition: Orphaned documents are documents on a shard that also
exist in chunks on other shards
How can they occur:
- Failed migration
- Failed cleanup (RangeDeleter)
- Direct access to the shards
Impact:
- Space
- Performance
- Application consistency
Cleanup
65
cleanupOrphaned
• Must run on every shard
• Removes the Orphans automatically
• No dry run / Poor reporting
Drain shard(s)
• Expensive – storage/performance
• Locate shards with orphans
Cleanup Cont.
66
There are ways to scan more intelligently:
• Skip unsharded collections
db.collections.find({"dropped" : false},{_id:1})
• Skip collections without migrations
db.changelog.distinct("ns",{"what":"moveChunk.start"})
• Check first event - changelog is a capped collection
Cleanup Cont.
67
An offline method to clean up orphans:
mongodump/mongorestore the shards with orphans, plus the config.chunks collection
Remove the documents in all ranges belonging to the shard(s)
The "leftovers" are the orphaned documents
It's a bit more tricky with "hashed" keys, since the chunk ranges are defined on the hashed key values
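For a ranged shard key, the per-document check behind this method can be sketched as follows (MinKey/MaxKey modeled with infinities; all names are illustrative):

```python
# Given the chunk map for one collection, a document is an orphan on a
# shard if the chunk that owns its shard key value lives elsewhere.
MINKEY, MAXKEY = float("-inf"), float("inf")

def owning_shard(chunks, key_value):
    """chunks: sorted list of (min, max, shard); ranges are [min, max)."""
    for lo, hi, shard in chunks:
        if lo <= key_value < hi:
            return shard
    raise ValueError("chunk map is not continuous")

chunks = [(MINKEY, 100, "s1"), (100, MAXKEY, "s2")]
# A document with shard key value 42 found on s2 is an orphan:
print(owning_shard(chunks, 42) != "s2")  # → True
```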
Questions?
68
Rate Our Session
69
70
We’re Hiring!
Looking to join a dynamic & innovative team?
https://www.objectrocket.com/careers/
or email careers@objectrocket.com
Thank you!
Address:
401 Congress Ave Suite 1950
Austin, TX 78701
Support:
1-800-961-4454
Sales:
1-888-440-3242
71
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 

Kürzlich hochgeladen (20)

Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 

Managing data and operation distribution in MongoDB

  • 1. Managing Data and Operation Distribution In MongoDB Antonios Giannopoulos and Jason Terpko DBAs @ Rackspace/ObjectRocket linkedin.com/in/antonis/ | linkedin.com/in/jterpko/ 1
  • 3. Overview • Sharded Cluster • Shard Keys Selection • Shard Key Operations • Chunk Management • Data Distribution • Orphaned documents • Q&A www.objectrocket.com 3
  • 4. Sharded Cluster • Cluster Metadata • Data Layer • Query Routing • Cluster Communication www.objectrocket.com 4
  • 7. Replication Data redundancy relies on an idempotent log of operations.
  • 10. Cluster Communication How do independent components become a cluster and communicate? ● Replica Set ○ Replica Set Monitor ○ Replica Set Configuration ○ Network Interface ASIO Replication / Network Interface ASIO Shard Registry ○ Misc: replSetName, keyFile, clusterRole ● Mongos Configuration ○ configDB Parameter ○ Network Interface ASIO Shard Registry ○ Replica Set Monitor ○ Task Executor ● Post Add Shard ○ Collection config.shards ○ Replica Set Monitor ○ Task Executor Pool ○ config.system.sessions
  • 11. Primary Shard … s1 s2 sN Database <foo>
  • 12. Collection UUID Cluster Metadata config.collections Data Layer (mongod) config.collections With featureCompatibilityVersion 3.6 all collections are assigned an immutable UUID.
  • 13. Collection UUID With featureCompatibilityVersion 3.6 all collections are assigned an immutable UUID. Cluster Metadata config.collections Data Layer (mongod) config.collections Important • UUIDs for a namespace must match • Use 4.0+ tools for a sharded cluster restore
  • 14. Shard Key - Selection • Profiling • Identify shard key candidates • Pick a shard key • Challenges www.objectrocket.com 14
  • 15. Sharding … 15 s1 s2 sN Database <foo> Collection <foo> Shards are Physical Partitions chunk chunk Chunks are Logical Partitions chunk chunk chunk chunk
  • 16. What is a Chunk? The mission of the shard key is to create chunks: the logical partitions your collection is divided into, which determine how data is distributed across the cluster. ● Maximum size is defined in config.settings ○ Default 64MB ● Before 3.4.11: Hardcoded maximum document count of 250,000 ● Version 3.4.11 and higher: 1.3 × (configured chunk size / average document size) ● Chunk map is stored in config.chunks ○ Continuous range from MinKey to MaxKey ● Chunk map is cached at both the mongos and mongod ○ Query Routing ○ Sharding Filter ● Chunks distributed by the Balancer ○ Using moveChunk ○ Up to maxSize
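The post-3.4.11 document-count limit described on this slide can be sketched in a few lines of Python (the helper name is made up; in a real deployment the average document size comes from the avgObjSize field of db.collection.stats()):

```python
# Approximate the document-count ceiling for a migratable chunk:
# 1.3 * (configured chunk size / average document size).
def max_docs_per_chunk(chunk_size_bytes, avg_obj_size_bytes):
    return int(1.3 * (chunk_size_bytes / avg_obj_size_bytes))

# With the default 64MB chunk size and 512-byte documents:
chunk_size = 64 * 1024 * 1024
print(max_docs_per_chunk(chunk_size, 512))  # 170393
```

Compare this with the pre-3.4.11 hardcoded limit of 250,000 documents: for small documents the new rule is actually stricter.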
  • 17. Shard Key Selection www.objectrocket.com 17 Profiling Helps identify your workload Requires Level 2 – db.setProfilingLevel(2) May need to increase profiler size
  • 18. Shard Key Selection www.objectrocket.com 18 Candidates Profiling Export statement types with frequency Export statement patterns with frequency Produces a list of shard key candidates
  • 19. Shard Key Selection www.objectrocket.com 19 Built-in Constraints Candidates Profiling Key and value are immutable Must not contain NULLs Update and findAndModify operations must contain the shard key Unique constraints must be maintained by a prefix of the shard key A shard key cannot contain special index types (i.e. text) Potentially reduces the list of candidates
  • 20. Shard Key Selection www.objectrocket.com 20 Schema Constraints Built-in Constraints Candidates Profiling Cardinality Monotonically increasing keys Data Hotspots Operational Hotspots Targeted vs Scatter-gather operations
  • 21. Shard Key Selection www.objectrocket.com 21 Future Schema Constraints Built-in Constraints Candidates Profiling Poor cardinality Growth and data hotspots Data pruning & TTL indexes Schema changes Try to simulate the dataset in 3, 6 and 12 months
  • 22. Shard key - Operations • Apply a shard key • Revert a shard key www.objectrocket.com 22
  • 23. Apply a shard key www.objectrocket.com 23 Create the associated index Make sure the balancer is stopped: sh.stopBalancer() sh.getBalancerState() Apply the shard key: sh.shardCollection("foo.col", {field1:1,...,fieldN:1}) Allow a burn period Start the balancer
  • 24. Sharding … s1 s2 sN Database <foo> Collection <foo> chunk chunk sh.shardCollection("foo.foo", <key>) sh.startBalancer() chunk chunk chunk chunk Burn Period
  • 25. Revert a shard key www.objectrocket.com 25 Two categories: o Affects functionality (exceptions, inconsistent data, …) o Affects performance (operational hotspots, …) Dump/Restore o Requires downtime – write and in some cases read o Time-consuming operation o You may restore into a sharded or unsharded collection o Better to pre-create indexes o Same or new cluster can be used o Streaming dump/restore is an option o In special cases, like time-series data, it can be fast
  • 26. Revert a shard key www.objectrocket.com 26 Dual writes o Mongo to Mongo connector or Change streams o No downtime o Requires extra capacity o May increase latency o Same or new cluster can be used o Adds complexity Alter the config database o Requires downtime – but minimal o Easy during burn period o Time consuming, if chunks are distributed o Has overhead during chunk moves
  • 27. Revert a shard key www.objectrocket.com 27 Process: 1) Disable the balancer – sh.stopBalancer() 2) Move all chunks to the primary shard (skip during burn period) 3) Stop one secondary from the config server ReplSet (for rollback) 4) Stop all mongos and all shards 5) On the config server replset primary execute: db.getSiblingDB('config').chunks.remove({ns:<collection name>}) db.getSiblingDB('config').collections.remove({_id:<collection name>}) 6) Start all mongos and shards 7) Start the secondary from the config server replset Rollback: • After step 6, stop all mongos and shards • Stop the running members of the config server ReplSet and wipe their data directory • Start all config server replset members • Start all mongos and shards
  • 28. Revert a shard key www.objectrocket.com 28 Online option requested on SERVER-4000 - May be supported in 4.2 Further reading - Morphus: Supporting Online Reconfigurations in Sharded NoSQL Systems http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf Special use cases: Extend a shard key by adding field(s) ({a:1} to {a:1,b:1}) o Possible (and easier) if b’s max and min (per a) are predefined o For example {year:month} to be extended to {year:month:day} Reduce the elements of a shard key ({a:1, b:1} to {a:1}) o Possible (and easier) if all distinct “a” values are in the same shard o There aren’t chunks with the same “a.min” (adds complexity)
  • 29. Revert a shard key www.objectrocket.com 29 Always perform a dry run Balancer/Autosplit must be disabled You must take downtime during the change *There might be a more optimal code path but the above one worked like a charm
  • 30. Chunk Splitting and Merging • Pre-splitting • Auto Splits • Manual Intervention www.objectrocket.com 30
  • 31. Distribution Goal … 31 s1* s2 s4 Database <foo> 25% 25% 25% 50G 50G 50G Database Size: 200G Primary Shard: s1
  • 32. Pre-Split – Hashed Keys 32 Shard keys using MongoDB’s hashed index allow the use of numInitialChunks. Hashing Mechanism jdoe@gmail.com 694ea0904ceaf766c6738166ed89bafb NumberLong("7588178963792066406") Value 64-bits of MD5 64-bit Integer Estimation Size = Collection size (in MB) / 32 Count = Number of documents / 125000 Limit = Number of shards * 8192 numInitialChunks = Min(Max(Size, Count), Limit) 1,600 = 51,200 / 32 800 = 100,000,000 / 125,000 32,768 = 4 * 8192 1600 = Min(Max(1600, 800), 32768) Command db.runCommand( { shardCollection: "foo.users", key: { "uid": "hashed" }, numInitialChunks : 1600 } );
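The estimation on this slide is easy to verify in Python (the helper name is made up; the formula and the example numbers are taken straight from the slide):

```python
# numInitialChunks estimation for a hashed shard key, as on the slide:
# aim for ~32MB or ~125k documents per chunk, capped at 8192 chunks/shard.
def num_initial_chunks(coll_size_mb, doc_count, num_shards):
    size = coll_size_mb / 32        # size-driven chunk count
    count = doc_count / 125000      # document-count-driven chunk count
    limit = num_shards * 8192       # upper bound for the cluster
    return int(min(max(size, count), limit))

# 51,200MB collection, 100M documents, 4 shards:
print(num_initial_chunks(51200, 100_000_000, 4))  # 1600
```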
  • 33. Pre-Split – Deterministic 33 Use Case: Collection containing user profiles with email as the unique key. Prerequisites 1. Shard key analysis complete 2. Understanding of access patterns 3. Knowledge of the data 4. Unique key constraint
  • 37. Automatic Splitting 37 Controlling Auto-Split • sh.enableAutoSplit() • sh.disableAutoSplit() Alternatively Mongos • The component responsible for tracking statistics • Bytes Written Statistics • Multiple Mongos Servers for HA
  • 38. Sub-Optimal Distribution … 38 s1* s2 s4 Database <foo> 40% 20% 20% 50G 50G 50G Database Size: 200G Primary Shard: s1 Chunks: Balanced
  • 39. Maintenance – Splitting 39 Five Helpful Resources: • collStats • config.chunks • Profiler • Oplog • dataSize
  • 40. Maintenance – Splitting 40 Five Helpful Resources: • collStats • config.chunks • dataSize • oplog.rs • system.profile
  • 41. Maintenance – Splitting 41 Five Helpful Resources: • collStats • config.chunks • dataSize • oplog.rs • system.profile Or:
  • 42. Maintenance – Splitting 42 Five Helpful Resources: • collStats • config.chunks • dataSize • oplog.rs • system.profile *with setProfilingLevel at 2, analyze both read and writes
  • 43. Maintenance – Splitting 43 Five Helpful Resources: • collStats • config.chunks • dataSize • oplog.rs • system.profile* *with setProfilingLevel at 2, analyze both read and writes
  • 44. Sub-Optimal Distribution … 44 s1* s2 s4 Database <foo> 40% 20% 20% 50G 50G 50G Database Size: 200G Primary Shard: s1 Chunks: Balanced
  • 48. Balancing • Balancer overview • Balancing with defaults • Create a better distribution • Create a better balancing www.objectrocket.com 48
  • 49. Balancer 49 The balancer process is responsible for redistributing the chunks evenly among the shards for every sharded collection. It takes into account the number of chunks (and not the amount of data). Migration thresholds (number of chunks → threshold): Fewer than 20 → 2; 20-79 → 4; 80 and greater → 8 Jumbo Chunks: MongoDB cannot move a chunk if the number of documents in the chunk is greater than 1.3 times the result of dividing the configured chunk size by the average document size. db.collection.stats() includes the avgObjSize field, which represents the average document size in the collection. Prior to 3.4.11 the max was 250,000 documents
  • 50. Balancer 50 Parallel Migrations: Before 3.4, one migration at a time After 3.4, parallel migrations as long as source and destination aren't involved in another migration Settings: chunkSize: Default is 64M – Lives on config.settings _waitForDelete: Default is false – Lives on config.settings _secondaryThrottle: Default is true. After 3.4 WT uses false. – Lives on config.settings activeWindow - Default is 24h. – Lives on config.settings maxSize – Default is unlimited. Lives on config.shards disableBalancing: Disables/Enables balancing per collection autoSplit: Disables/Enables splits
  • 51. Balancing 51 Balancer only cares about the number of chunks per shard. Best case Our case Our goal
  • 52. Balancing 52 The “apple algorithm” we are going to introduce is simple. For a collection, it requires an ordered chunk map with attributes: chunk size, chunk bounds (min, max) and the shard each chunk belongs to. 1 Pick the first chunk (current) 2 Merge current with next 3 If the merged size is lower than a configured threshold then go to step 2 4 else do not merge; set next as current Let's now see the implementation in Python.
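Slides 54-56 showed the Python implementation as images, which this transcript does not reproduce, so here is a minimal sketch of the merge pass just described. The function name, the dict field names, and the MB-based sizes are assumptions; a real run would apply the proposed merges with the mergeChunks command and only merge contiguous ranges on the same shard:

```python
def merge_small_chunks(chunks, threshold_mb):
    """Greedily coalesce consecutive chunks on the same shard while the
    merged size stays below the threshold; otherwise start a new range."""
    if not chunks:
        return []
    merged = [dict(chunks[0])]
    for nxt in chunks[1:]:
        cur = merged[-1]
        if cur["shard"] == nxt["shard"] and cur["size"] + nxt["size"] < threshold_mb:
            cur["size"] += nxt["size"]   # merge: extend the current range
            cur["max"] = nxt["max"]
        else:
            merged.append(dict(nxt))     # start a new merged chunk
    return merged

# Toy ordered chunk map (sizes in MB, bounds simplified to integers):
chunk_map = [
    {"min": 0,  "max": 10, "size": 20, "shard": "s1"},
    {"min": 10, "max": 20, "size": 30, "shard": "s1"},
    {"min": 20, "max": 30, "size": 30, "shard": "s1"},
    {"min": 30, "max": 40, "size": 10, "shard": "s2"},
]
print(merge_small_chunks(chunk_map, 64))
```

With a 64MB threshold the first two chunks merge into one 50MB range; the third would push it to 80MB so it starts a new range, and the fourth stays separate because it lives on another shard.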
  • 54. Balancing – Basic functions 54
  • 55. Balancing – Main function 55
  • 56. Balancing – Helper functions 56
  • 58. Balancing 58 Can the algorithm do better? Can we improve the balancing post running the script?
  • 59. Balancing 59 Can the algorithm do better? Can we improve the balancing after running the script? Making the bounds stricter and adding more parameters will improve it. -OR- Chunk Buckets may be the answer. The script produces chunks between (chunksize/2) and (chunksize). It improves balancing but may not achieve a perfect distribution. The idea is to categorize the chunks into buckets between (chunksize/2) and (chunksize) and have each shard hold an equal number of chunks from each bucket
  • 60. Balancing - Buckets 60 For example, chunksize=64 we can create the following buckets: o Bucket1 for sizes between 32 and 36 MiB o Bucket2 for sizes between 36 and 40 MiB o Bucket3 for sizes between 40 and 44 MiB o Bucket4 for sizes between 44 and 48 MiB o Bucket5 for sizes between 48 and 52 MiB o Bucket6 for sizes between 52 and 56 MiB o Bucket7 for sizes between 56 and 60 MiB o Bucket8 for sizes between 60 and 64 MiB More buckets means more accuracy but it may cause more chunk moves. The diversity of the chunks plays a major role
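The bucket boundaries listed above can be computed rather than hard-coded. A small sketch (the function name is made up; sizes are in MiB, and the top boundary is folded into the last bucket):

```python
def bucket_for(size_mib, chunksize=64, width=4):
    """Map a chunk size to its bucket number (1..8 for the defaults),
    or None if the size falls outside the (chunksize/2, chunksize) range
    that the merge script produces."""
    lo = chunksize // 2                       # 32 MiB for the default 64MiB
    if size_mib < lo or size_mib > chunksize:
        return None
    # clamp so that size == chunksize lands in the last bucket
    return min((size_mib - lo) // width + 1, (chunksize - lo) // width)

print(bucket_for(33))  # 1  (32-36 MiB)
print(bucket_for(63))  # 8  (60-64 MiB)
```

Per-shard balance can then be judged by comparing the count of chunks in each bucket across shards, which is closer to balancing by data size than counting chunks alone.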
  • 62. Balancing – Get the code 62 GitHub Repo - https://bit.ly/2M0LnxG
  • 63. Orphaned Documents • Definition • Issues • Cleanup www.objectrocket.com 63
  • 64. Definition/Impact 64 Definition: Orphaned documents are those documents on a shard that also exist in chunks on other shards How can they occur: - Failed migration - Failed cleanup (RangeDeleter) - Direct access to the shards Impact: - Space - Performance - Application consistency
  • 65. Cleanup 65 cleanupOrphaned • Must run on every shard • Removes the Orphans automatically • No dry run / Poor reporting Drain shard(s) • Expensive – storage/performance • Locate shards with orphans
  • 66. Cleanup Cont. 66 There are ways to scan more intelligently: • Skip unsharded collections db.collections.find({"dropped" : false},{_id:1}) • Skip collections without migrations db.changelog.distinct("ns",{"what":"moveChunk.start"}) • Check first event - changelog is a capped collection
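The two filters above can be combined into one pass. This sketch applies them to plain document lists so the logic is visible; in a real cluster they would be queries against config.collections and config.changelog (e.g. via pymongo), and the sample data here is made up:

```python
def namespaces_to_scan(collections, changelog):
    """Return sharded, non-dropped namespaces that have had at least one
    chunk migration -- the only places orphaned documents can exist."""
    sharded = {c["_id"] for c in collections if not c.get("dropped")}
    migrated = {e["ns"] for e in changelog if e["what"] == "moveChunk.start"}
    return sorted(sharded & migrated)

# Stand-ins for config.collections and config.changelog documents:
collections = [
    {"_id": "foo.users", "dropped": False},
    {"_id": "foo.logs",  "dropped": False},
    {"_id": "foo.old",   "dropped": True},
]
changelog = [
    {"ns": "foo.users", "what": "moveChunk.start"},
    {"ns": "foo.old",   "what": "moveChunk.start"},
]
print(namespaces_to_scan(collections, changelog))  # ['foo.users']
```

Remember the caveat from the slide: config.changelog is capped, so its first event should be checked before trusting the migration filter.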
  • 67. Cleanup Cont. 67 An offline method to clean up orphans: mongodump/mongorestore the shards with orphans and the config.chunks collection Remove documents in all ranges belonging to the shard(s) The “leftovers” are the orphaned documents It's a bit more tricky with “hashed” keys:
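The extra difficulty with hashed keys is that the chunk a document belongs to is determined by the hash of the key value, not the value itself, so each key must be hashed before comparing against the ranges in config.chunks (the mongo shell exposes convertShardKeyToHashed() for this since 4.0). The toy sketch below only mimics the md5-to-int64 idea from slide 32; the function name is made up, and the server actually hashes the BSON-encoded element, so the values will not match real server output:

```python
import hashlib
import struct

def pseudo_hashed_key(value):
    """Toy approximation of a hashed shard key: the first 8 bytes of the
    md5 digest interpreted as a signed 64-bit integer. Illustrative only --
    MongoDB hashes the BSON element, so real values differ."""
    digest = hashlib.md5(str(value).encode()).digest()
    return struct.unpack("<q", digest[:8])[0]

h = pseudo_hashed_key("jdoe@gmail.com")
assert -2**63 <= h < 2**63   # any hashed key falls in the int64 range
```

Whatever hashing is used, the orphan check becomes: hash the key, find the chunk range containing the hash, and verify that chunk is owned by the shard the document was found on.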
  • 70. www.objectrocket.com 70 We’re Hiring! Looking to join a dynamic & innovative team? https://www.objectrocket.com/careers/ or email careers@objectrocket.com
  • 71. Thank you! Address: 401 Congress Ave Suite 1950 Austin, TX 78701 Support: 1-800-961-4454 Sales: 1-888-440-3242 www.objectrocket.com 71