Scaling with MongoDB

MongoDB's architecture features built-in support for horizontal scalability, and high availability through replica sets. Auto-sharding allows users to easily distribute data across many nodes. Replica sets enable automatic failover and recovery of database nodes within or across data centers. This session will provide an introduction to scaling with MongoDB by one of MongoDB's early adopters.

  1. Rick Copeland @rick446 Arborian Consulting, LLC
  2. • Now a consultant, but formerly…
     • Software engineer at SourceForge, early adopter of MongoDB (version 0.8)
     • Wrote the SQLAlchemy book (I love SQL when it’s used well)
     • Mainly write Python now, but have done C++, C#, Java, JavaScript, VHDL, Verilog, …
  3. • You can do it with an RDBMS as long as you…
     • Don’t use joins
     • Don’t use transactions
     • Use read-only slaves
     • Use memcached
     • Denormalize your data
     • Use custom sharding/partitioning
     • Do a lot of vertical scaling
         ▪ (we’re going to need a bigger box)
  4. +1 Year
  5. • Use documents to improve locality
     • Optimize your indexes
     • Be aware of your working set
     • Scaling your disks
     • Replication for fault-tolerance and read scaling
     • Sharding for read and write scaling
  6. Relational (SQL)    MongoDB       Notes
     Database            Database
     Table               Collection    Dynamic typing
     Index               Index         B-tree (range-based)
     Row                 Document      Think JSON
     Column              Field         Primitive types + arrays, documents
  7. { title: "Slides for Scaling with MongoDB",
       author: "Rick Copeland",
       date: ISODate("2012-02-29T19:30:00Z"),
       text: "My slides are available on speakerdeck.com",
       comments: [
         { author: "anonymous",
           date: ISODate("2012-02-29T19:30:01Z"),
           text: "Fristpsot!" },
         { author: "mark",
           date: ISODate("2012-02-29T19:45:23Z"),
           text: "Nice slides" } ] }
     Embed comment data in blog post document
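
A minimal sketch of the embedding idea on slide 7, using plain Python dictionaries rather than a live database. The `add_comment` helper and the variable names are hypothetical and only illustrate that the comments travel with the post as one document.

```python
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the blog post document on the slide.
post = {
    "title": "Slides for Scaling with MongoDB",
    "author": "Rick Copeland",
    "date": datetime(2012, 2, 29, 19, 30, 0, tzinfo=timezone.utc),
    "text": "My slides are available on speakerdeck.com",
    "comments": [],
}


def add_comment(post, author, text, when):
    """Append a comment to the list embedded in the post itself."""
    post["comments"].append({"author": author, "date": when, "text": text})


add_comment(post, "anonymous", "Fristpsot!",
            datetime(2012, 2, 29, 19, 30, 1, tzinfo=timezone.utc))
add_comment(post, "mark", "Nice slides",
            datetime(2012, 2, 29, 19, 45, 23, tzinfo=timezone.utc))

# Rendering the page needs only this one document; there is no separate
# comments table to join against.
comment_authors = [c["author"] for c in post["comments"]]
```

Because the comments are stored next to the post, reading the whole page is one lookup, which is the locality argument the following slides make.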
  8. Seek = 5+ ms
     Read = really really fast
  9. Post Comment Author
  10. Post Author Comment Comment Comment Comment Comment
  11. Find where x equals 7, scanning 1 2 3 4 5 6 7 in order: looked at 7 objects
  12. Find where x equals 7, using an index tree (4; 2 6; 1 3 5 7): looked at 3 objects
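
The two lookups on slides 11 and 12 can be sketched as follows. This is an illustration only: it uses a small binary search tree with the same values as the slide, whereas MongoDB's indexes are B-trees, and the function names are made up for this example.

```python
def linear_scan(values, target):
    """Count how many values are examined before the target is found."""
    looked_at = 0
    for value in values:
        looked_at += 1
        if value == target:
            break
    return looked_at


def leaf(value):
    """Build a tree node with no children."""
    return (value, None, None)


# Tree from the slide as (value, left, right) tuples: root 4, then 2 and 6.
tree = (4, (2, leaf(1), leaf(3)), (6, leaf(5), leaf(7)))


def tree_search(node, target):
    """Count how many nodes are visited while descending toward the target."""
    looked_at = 0
    while node is not None:
        value, left, right = node
        looked_at += 1
        if target == value:
            break
        node = left if target < value else right
    return looked_at


scan_cost = linear_scan([1, 2, 3, 4, 5, 6, 7], 7)   # visits every value
index_cost = tree_search(tree, 7)                   # visits 4, then 6, then 7
```

The gap grows with collection size: the scan cost rises linearly with the number of documents, the tree descent only with the depth of the tree.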
  13. Entire index must fit in RAM
  14. Only small portion in RAM
  15. • Working set = sizeof(frequently used data) + sizeof(frequently used indexes)
      • Right-aligned indexes reduce working set size
      • Working set should fit in available RAM for best performance
      • Page faults are the biggest cause of performance loss in MongoDB
  16. > db.foo.stats()
      {
        "ns" : "test.foo",
        "count" : 1338330,
        "size" : 46915928,                  // data size
        "avgObjSize" : 35.05557523181876,   // average doc size
        "storageSize" : 86092032,           // size on disk (or RAM!)
        "numExtents" : 12,
        "nindexes" : 2,
        "lastExtentSize" : 20872960,
        "paddingFactor" : 1,
        "flags" : 0,
        "totalIndexSize" : 99860480,        // size of all indexes
        "indexSizes" : {                    // size of each index
          "_id_" : 55877632,
          "x_1" : 43982848
        },
        "ok" : 1
      }
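
As a rough worked example, the figures in the stats output above give an upper bound on the working set for this collection: all data plus all indexes. The real working set is only the frequently used part, so this is a worst case, and the 1 GiB RAM figure is a made-up assumption for illustration.

```python
# Figures copied from the db.foo.stats() output on slide 16.
stats = {
    "count": 1338330,
    "size": 46915928,            # bytes of document data
    "totalIndexSize": 99860480,  # bytes of all indexes
}

# Average document size, matching the reported avgObjSize.
avg_obj_size = stats["size"] / stats["count"]

# Worst case: every document and every index page is "frequently used".
upper_bound_bytes = stats["size"] + stats["totalIndexSize"]


def fits_in_ram(working_set_bytes, ram_bytes):
    """True when the estimated working set fits in the given amount of RAM."""
    return working_set_bytes <= ram_bytes


ram_bytes = 1 * 1024 ** 3  # hypothetical server with 1 GiB available
print(f"upper bound: {upper_bound_bytes / 1024 ** 2:.1f} MiB, "
      f"fits: {fits_in_ram(upper_bound_bytes, ram_bytes)}")
```

Note that the indexes here are about twice the size of the data, so index size dominates the estimate for collections of small documents.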
  17. ~200 seeks / second
  18. Three disks at ~200 seeks / second each: faster, but less reliable
  19. Three disks at ~400 seeks / second each: faster and more reliable ($$$ though)
  20. • Old and busted
          ▪ master/slave replication
      • The new hotness
          ▪ replica sets with automatic failover
      Diagram: the primary takes reads and writes; two secondaries take reads
  21. • Primary handles all writes
      • Application optionally sends reads to slaves
      • Heartbeat manages automatic failover
  22. • Special collection (the oplog) records operations idempotently
      • Secondaries read from primary oplog and replay operations locally
      • Space is preallocated and fixed for the oplog
  23. {
        "ts" : Timestamp(1317653790000, 2),
        "h" : -6022751846629753359,
        "op" : "i",                  // insert
        "ns" : "confoo.People",      // collection name
        "o" : {                      // object to insert
          "_id" : ObjectId("4e89cd1e0364241932324269"),
          "first" : "Rick",
          "last" : "Copeland"
        }
      }
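
To illustrate why idempotent oplog entries matter, here is a simplified sketch of replaying entries against an in-memory store. It is not MongoDB's replication code: the `apply_op` and `replay` helpers are hypothetical, the entry layout is reduced to the fields shown on the slide plus an assumed update form, and the `visits` field is invented for the example.

```python
def apply_op(collections, entry):
    """Apply one simplified oplog entry to an in-memory {ns: {_id: doc}} store."""
    docs = collections.setdefault(entry["ns"], {})
    if entry["op"] == "i":
        # Insert keyed by _id: applying it again just overwrites the same doc.
        docs[entry["o"]["_id"]] = dict(entry["o"])
    elif entry["op"] == "u":
        # Updates carry final values (a "set"), not deltas such as "add 1".
        docs[entry["o2"]["_id"]].update(entry["o"]["$set"])
    elif entry["op"] == "d":
        docs.pop(entry["o"]["_id"], None)


def replay(collections, oplog):
    """Apply every entry in order and return the store."""
    for entry in oplog:
        apply_op(collections, entry)
    return collections


oplog = [
    {"op": "i", "ns": "confoo.People",
     "o": {"_id": "4e89cd1e0364241932324269", "first": "Rick", "last": "Copeland"}},
    # An increment is recorded as the resulting value, so replaying is safe.
    {"op": "u", "ns": "confoo.People",
     "o2": {"_id": "4e89cd1e0364241932324269"}, "o": {"$set": {"visits": 5}}},
]

once = replay({}, oplog)
twice = replay(replay({}, oplog), oplog)
```

A secondary that is unsure whether it already applied an entry can apply it again and still end up in the same state as the primary.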
  24. • Use heartbeat signal to detect failure
      • When primary can’t be reached, elect a new one
      • Replica that’s the most up-to-date is chosen
      • If there is skew, changes not on new primary are saved to a .bson file for manual reconciliation
      • Application can require data to be replicated to a majority to ensure this doesn’t happen
  25. • Priority
          ▪ Slower nodes with lower priority
          ▪ Backup or read-only nodes to never be primary
      • slaveDelay
          ▪ Fat-finger protection
      • Data center awareness and tagging
          ▪ Application can ensure complex replication guarantees
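
The failover rules on slides 24 and 25 can be sketched roughly as below. This is a toy model under stated assumptions, not MongoDB's actual election protocol: the `Member` class, the integer `optime`, and the tie-breaking order are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    optime: int              # hypothetical: position reached in the oplog
    priority: float = 1.0    # 0 means "never become primary"
    reachable: bool = True   # result of the heartbeat check


def elect_primary(members):
    """Pick a new primary among reachable members, or None if impossible."""
    reachable = [m for m in members if m.reachable]
    # Without a majority of the set visible, no election can succeed.
    if len(reachable) * 2 <= len(members):
        return None
    candidates = [m for m in reachable if m.priority > 0]
    if not candidates:
        return None
    # Prefer the most up-to-date member; break ties by higher priority.
    return max(candidates, key=lambda m: (m.optime, m.priority)).name


members = [
    Member("a", optime=100, reachable=False),  # old primary, now down
    Member("b", optime=95),
    Member("c", optime=98, priority=0),        # backup node, never primary
]
new_primary = elect_primary(members)
```

In this run `b` wins even though `c` is further ahead, because `c` has priority 0. Any writes that reached only `a` (optime 96 to 100) are the "skew" the slide says would need manual reconciliation.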
  26. • Reads scale nicely
          ▪ As long as the working set fits in RAM
          ▪ … and you don’t mind eventual consistency
      • Sharding to the rescue!
          ▪ Automatically partitioned data sets
          ▪ Scale writes and reads
          ▪ Automatic load balancing between the shards
  27. Diagram: two MongoS routers and three configuration servers (Config 1, Config 2, Config 3) in front of four shards. Shard 1 holds keys 0..10, Shard 2 holds 10..20, Shard 3 holds 20..30, Shard 4 holds 30..40. Each shard has one primary and two secondaries.
  28. • Sharding is per-collection and range-based
      • The highest-impact choice (and hardest to change decision) you make is the shard key
      • Random keys: good for writes, bad for reads
      • Right-aligned index: bad for writes
      • Small # of discrete keys: very bad
      • Ideal: balance writes, make reads routable by mongos
      • Optimal shard key selection is hard
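
A minimal sketch of range-based routing, using the chunk ranges from the slide 27 diagram (0..10, 10..20, 20..30, 30..40). The `route` and `shards_for_range` helpers are hypothetical stand-ins for what the mongos router does, and the key sequences are made up to show the trade-offs listed on slide 28.

```python
import bisect
from collections import Counter

# Upper bounds of the first three chunks; the last chunk takes everything above.
boundaries = [10, 20, 30]
shards = ["Shard 1", "Shard 2", "Shard 3", "Shard 4"]


def route(key):
    """Return the shard whose range contains this shard key value."""
    return shards[bisect.bisect_right(boundaries, key)]


def shards_for_range(low, high):
    """Return every shard a query for low <= key <= high must visit."""
    first = bisect.bisect_right(boundaries, low)
    last = bisect.bisect_right(boundaries, high)
    return shards[first:last + 1]


# Right-aligned (ever-increasing) keys: every new write lands on one shard.
increasing_writes = Counter(route(k) for k in range(35, 40))

# Scattered keys (a fixed permutation of 0..39): writes spread evenly.
scattered_writes = Counter(route((i * 7) % 40) for i in range(40))

# A narrow range query is routable to a single shard.
narrow_query = shards_for_range(12, 18)
wide_query = shards_for_range(5, 25)
```

Here the increasing keys put all five writes on Shard 4, while the scattered keys put ten writes on each shard. The cost of scattering shows up on reads: a query that cannot be expressed as a narrow key range has to visit several shards.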
  29. Diagram: three shards (Shard 1, Shard 2, Shard 3; the third labeled RS3) spread across a primary data center and a secondary data center. Each shard has three members with priorities 1, 1 and 0. There are three configuration servers (Config 1, Config 2, Config 3).
  30. • Writes and reads both scale (with good choice of shard key)
      • Reads scale while remaining strongly consistent
      • Partitioning ensures you get more usable RAM
      • Pitfall: don’t wait too long to add capacity
  31. Rick Copeland @rick446 Arborian Consulting, LLC

Editor's notes

  • You’d like to just ‘add capacity’, but you end up having to buy a bigger server. Build your own infrastructure and you pay more for less as you scale. The cloud can help with this, but only up to a point; what happens when you’re using the largest instance? Time to rearchitect.
  • There are a lot of features that make RDBMSs attractive. But as we scale, we need to turn off a lot of them to get performance increases. We end up with something that scales, but it’s hard to use.
  • RAM functions as a cache. Replication ends up caching documents in multiple locations. Sharding makes sure documents only have one ‘home’.
  • A single shard is a replica set. MongoS is a router that determines where reads and writes go. Documents are ‘chunked’ into ranges. Chunks can be split and migrated to other servers based on load. Configuration servers persist the location of particular shard key ranges. The cluster stays alive when one or more config servers are down, but there can be no migration.
