Rick Copeland is a consultant who previously worked as a software engineer and wrote books on SQLAlchemy and Python. He argues that MongoDB scales by embracing the same trade-offs that let relational databases scale: no joins, no transactions, denormalized data. Scaling techniques for MongoDB include using documents to improve data locality, optimizing indexes, being aware of the working set, scaling disks, replication for fault tolerance and read scaling, and sharding for further read and write scaling.
2. Now a consultant, but formerly…
Software engineer at SourceForge, early adopter of MongoDB (version 0.8)
Wrote the SQLAlchemy book (I love SQL when it’s used well)
Mainly write Python now, but have done C++, C#, Java, JavaScript, VHDL, Verilog, …
3. You can do it with an RDBMS as long as you…
Don’t use joins
Don’t use transactions
Use read-only slaves
Use memcached
Denormalize your data
Use custom sharding/partitioning
Do a lot of vertical scaling
▪ (we’re going to need a bigger box)
6. Use documents to improve locality
Optimize your indexes
Be aware of your working set
Scale your disks
Replication for fault-tolerance and read scaling
Sharding for read and write scaling
7. Relational (SQL) → MongoDB
Database → Database
Table → Collection
Index → Index (B-tree, range-based)
Row → Document (think JSON; dynamically typed)
Column → Field (primitive types, plus arrays and embedded documents)
8. {
  title: "Slides for Scaling with MongoDB",
  author: "Rick Copeland",
  date: ISODate("2012-02-29T19:30:00Z"),
  text: "My slides are available on speakerdeck.com",
  comments: [
    { author: "anonymous",
      date: ISODate("2012-02-29T19:30:01Z"),
      text: "Fristpsot!" },
    { author: "mark",
      date: ISODate("2012-02-29T19:45:23Z"),
      text: "Nice slides" } ] }
Embed comment data in the blog post document
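A minimal sketch of what that locality buys you in the shell (the posts collection name is my assumption, not from the slides):

> // one insert stores the post and its comments together on disk
> db.posts.insert({ title: "Slides for Scaling with MongoDB",
    author: "Rick Copeland",
    comments: [ { author: "mark", text: "Nice slides" } ] })
> // one read returns the post and all of its comments -- no join needed
> db.posts.findOne({ author: "Rick Copeland" })
> // dot notation reaches into the embedded array
> db.posts.find({ "comments.author": "mark" })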
16. Working set = sizeof(frequently used data) + sizeof(frequently used indexes)
Right-aligned indexes reduce working set size
Working set should fit in available RAM for best performance
Page faults are the biggest cause of performance loss in MongoDB
17. > db.foo.stats()
{
  "ns" : "test.foo",
  "count" : 1338330,
  "size" : 46915928,                  ← data size
  "avgObjSize" : 35.05557523181876,   ← average doc size
  "storageSize" : 86092032,           ← size on disk (or RAM!)
  "numExtents" : 12,
  "nindexes" : 2,
  "lastExtentSize" : 20872960,
  "paddingFactor" : 1,
  "flags" : 0,
  "totalIndexSize" : 99860480,        ← size of all indexes
  "indexSizes" : {
    "_id_" : 55877632,
    "x_1" : 43982848 },               ← size of each index
  "ok" : 1
}
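As a rough check (a sketch using the fields above; real working-set accounting is fuzzier than this), compare data-plus-index size against your RAM:

> var s = db.foo.stats()
> (s.storageSize + s.totalIndexSize) / (1024 * 1024)   // ≈ 177 MB that would like to be in RAM
> db.serverStatus().extra_info.page_faults             // Linux only: rising fast means you've outgrown RAM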
19. [Diagram: data spread across three disks, ~200 seeks/second each]
Faster, but less reliable
20. [Diagram: data spread across three disks, ~400 seeks/second each]
Faster and more reliable ($$$ though)
21. Old and busted: master/slave replication
The new hotness: replica sets with automatic failover (sketch below)
[Diagram: Primary takes reads and writes; two Secondaries take reads]
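A minimal sketch of standing up such a three-member set in the shell (host names are placeholders):

> rs.initiate({
    _id: "rs0",
    members: [ { _id: 0, host: "db1.example.net:27017" },
               { _id: 1, host: "db2.example.net:27017" },
               { _id: 2, host: "db3.example.net:27017" } ] })
> rs.status()   // watch one member win the election and become PRIMARY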
22. Primary handles all writes
Application optionally sends reads to slaves (see the sketch below)
Heartbeat manages automatic failover
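Reading from secondaries is opt-in. In shells of this era you flag the connection; drivers expose the same idea as a read preference (a sketch, details vary by driver and version):

> rs.slaveOk()                                       // allow this shell connection to read from a secondary
> db.getMongo().setReadPref("secondaryPreferred")    // newer shells: prefer secondaries when available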
23. A special collection (the oplog) records operations idempotently
Secondaries read from the primary's oplog and replay operations locally
Space for the oplog is preallocated and fixed
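The oplog is an ordinary capped collection in the local database, so you can inspect it yourself (shown for replica sets; op is the operation type, ns the namespace, o the document or change):

> use local
> db.oplog.rs.find().sort({ $natural: -1 }).limit(1)   // the most recent replicated operation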
25. Use heartbeat signal to detect failure
When the primary can't be reached, elect a new one
The replica that is most up-to-date is chosen
If there is skew, changes not on the new primary are saved to a .bson file for manual reconciliation
The application can require data to be replicated to a majority to ensure this doesn't happen
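In the API of this era that guarantee was spelled getLastError with w: "majority"; newer drivers attach a write concern to the operation itself. A sketch:

> db.posts.insert({ title: "Slides for Scaling with MongoDB" })
> db.runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 })   // block until a majority has the write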
26. Priority
  ▪ Slower nodes get lower priority
  ▪ Backup or read-only nodes can be set to never become primary
slaveDelay
  ▪ Fat-finger protection
Data center awareness and tagging
  ▪ Application can ensure complex replication guarantees
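All of these knobs live in the replica set configuration; a sketch touching each one (member layout and tag values are made up):

> var cfg = rs.conf()
> cfg.members[1].priority = 0.5          // slower box: less likely to be elected
> cfg.members[2].priority = 0            // never primary: a backup / read-only node
> cfg.members[2].slaveDelay = 3600       // stay an hour behind the primary: fat-finger protection
> cfg.members[2].hidden = true           // delayed members are usually hidden from clients
> cfg.members[0].tags = { dc: "east" }   // data center awareness for tag-based write concerns
> rs.reconfig(cfg)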
27. Reads scale nicely
  ▪ As long as the working set fits in RAM
  ▪ … and you don’t mind eventual consistency
Sharding to the rescue!
  ▪ Automatically partitioned data sets
  ▪ Scale writes and reads
  ▪ Automatic load balancing between the shards
29. Sharding is per-collection and range-based
The highest-impact (and hardest-to-change) decision you make is the shard key
  ▪ Random keys: good for writes, bad for reads
  ▪ Right-aligned index: bad for writes
  ▪ Small # of discrete keys: very bad
  ▪ Ideal: balance writes, make reads routable by mongos
Optimal shard key selection is hard
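A sketch of turning sharding on (the database, collection, and key here are illustrative, not from the talk):

> sh.enableSharding("blog")
> sh.shardCollection("blog.posts", { author: 1, _id: 1 })   // compound key: writes spread across authors, reads by author stay routable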
31. Writes and reads both scale (with a good choice of shard key)
Reads scale while remaining strongly consistent
Partitioning ensures you get more usable RAM
Pitfall: don’t wait too long to add capacity
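Adding capacity is one command per new shard; the catch is that the balancer needs headroom and time to migrate chunks onto it, which is why waiting too long hurts (host string is a placeholder):

> sh.addShard("rs1/db4.example.net:27017")
> sh.status()   // watch chunk counts even out as the balancer migrates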
You’d like to just ‘add capacity’, but you end up having to buy a bigger server. Build your own infrastructure and you pay more for less as you scale. The cloud can help with this, but only up to a point: what happens when you’re already using the largest instance? Time to rearchitect.
There are a lot of features that make RDBMSs attractive. But as we scale, we need to turn off a lot of them to get performance increases. We end up with something that scales, but it’s hard to use.
RAM functions as a cache. Replication ends up caching documents in multiple locations; sharding makes sure documents have only one ‘home’.
A single shard is a replica set. mongos is a router that determines where reads and writes go. Documents are ‘chunked’ into ranges; chunks can be split and migrated to other servers based on load. Configuration servers persist the location of particular shard key ranges. The cluster stays alive when one or more config servers are down, but there can be no migration.