- MongoDB 3.0 introduces pluggable storage engines, with WiredTiger as the first integrated engine, providing document-level locking, compression, and improved concurrency over MMAPv1.
- WiredTiger uses a B+tree structure on disk and stores each collection and index in its own file, with no padding and no in-place updates. It includes a write-ahead transaction log for durability.
- To use WiredTiger, launch mongod with the --storageEngine=wiredTiger option, and upgrade existing deployments through mongodump/mongorestore or initial sync of a replica member. Some MMAPv1 options do not apply to WiredTiger.
4. Storage Engine Layer
● Vision: many storage engines optimized for many different use cases
● One data model, one API, one set of operational concerns – but under the hood, many options for every use case under the sun
5. Storage Engine Layer
[Architecture diagram: example future state – diverse workloads (Content Repo, IoT Sensor Backend, Ad Service, Customer Analytics, Archive) all go through the MongoDB Query Language (MQL) + native drivers and the MongoDB document data model, with Management and Security as cross-cutting layers, on top of pluggable storage engines: MMAPv1 and WT (supported in MongoDB 3.0), In-Memory (experimental), and possible future storage engines.]
7. Why WT?
• WT addresses weaknesses of MMAPv1
– Compression
– Online compaction
– Highly concurrent and vertically scalable
• Document-level locking
• Allows full hardware utilization
• More tunable
– Higher ceiling for potential improvement
8. Why WT?
• Strong foundation
– Authors are former members of the Berkeley DB team
• WT product and team acquired by MongoDB
• Standalone engine already in use in large deployments, including Amazon
10. Document-Level Concurrency
• Improved concurrency
– Uses algorithms to minimize contention between threads
• On contention for the same document, one thread yields
• Atomic updates replace latching/locking
– Writes no longer block all other writes
– CPU utilization correlates directly with performance
12. Compression
• Compression is on by default in WT
• MongoDB 3.0 (formerly numbered 2.8) supports two compression algorithms
– snappy (default)
• Good compression with little CPU/performance impact
– zlib
• Extremely good compression at the cost of additional CPU/degraded performance
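As a sketch, the block compressor is chosen at mongod startup via `--wiredTigerCollectionBlockCompressor` (accepted values: snappy, zlib, or none); the dbpath below is a placeholder:

```shell
# Select the per-collection block compressor at startup.
# snappy is the default; zlib trades CPU for a better ratio; none disables it.
mongod --dbpath /data/wt \
       --storageEngine wiredTiger \
       --wiredTigerCollectionBlockCompressor snappy
```

The setting applies to collections created after startup; it does not rewrite existing data.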
14. WT Internals
• File format
– Data stored as a conventional B+ tree on disk
– Inserts go into one big file per collection
– In-memory and on-disk formats are decoupled
• Variable-length pages on disk
• Inserts go into in-memory skiplists
15. WT Internals
• File layout
– Each collection & index stored in its own file
• Only one storage engine can be used at a time
– mongod will fail to start if MMAPv1 files are found in the dbpath
– No in-place updates
• Rewrites the document to the end of the file every time
• No padding factor
16. WT Internals
• File layout
– Does not extend the concept of a DB to the disk layout
• The concept of a DB is purely logical
– The journal has its own folder under the dbpath
– Makes it easy to store indexes on separate volumes
17. WT Internals
• Deprecates MMAPv1-specific catalog metadata
– system.indexes & system.namespaces
– Going forward, system metadata should be accessed via explicit commands
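A minimal sketch of the command-based access from the mongo shell; the database name `mydb` and collection name `users` are placeholders:

```shell
# Enumerate collections and indexes via explicit commands
# instead of querying the deprecated system.namespaces / system.indexes
mongo mydb --eval '
  printjson(db.runCommand({ listCollections: 1 }));
  printjson(db.runCommand({ listIndexes: "users" }));
'
```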
18. WT Internals
• Cache
– WT uses two caches
• The WiredTiger engine cache
• The file system cache
– By default, the WT cache is the larger of 50% of system memory or 1 GB
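The engine cache can be sized explicitly at startup with `--wiredTigerCacheSizeGB`; the value and dbpath below are placeholders for illustration:

```shell
# Cap the WiredTiger engine cache at 4 GB, leaving the remaining RAM
# to the file system cache (which WT also relies on)
mongod --dbpath /data/wt --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
```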
19. WT Internals
• When is data written to disk?
– Checkpoints
• By default, performed every 60 seconds or after 2 GB of journal data has been written
• Analogous to MMAPv1 data file flushes
– When the WT cache is full, pages are evicted to disk
20. WT Internals
• Durability
– WT uses a write-ahead transaction log
• journalCommitInterval cannot be set
• The j:true write concern can be used
• --journal or --nojournal can be used
• Old journal files are truncated after each checkpoint
– Data files are always consistent; journal recovery after an unclean shutdown is no longer needed
• At most, data since the last checkpoint (< 60 s) is lost
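For example, a single write can request journal acknowledgement with the j:true write concern; the database name `test` and collection name `events` are placeholders:

```shell
# j:true blocks until the write has reached WT's write-ahead log,
# so it survives an unclean shutdown even before the next checkpoint
mongo test --eval '
  printjson(db.events.insert({ ts: new Date() },
                             { writeConcern: { j: true } }))
'
```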
21. WT Internals
• Supported platforms
– Linux
– Windows
– Mac OS X
• Unsupported platforms
– NO Solaris (yet)
– NO 32-bit (ever)
22. How Do I Install/Upgrade It?
• Installation
– Starting from scratch, add one additional flag when launching mongod
• --storageEngine=wiredTiger
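A minimal sketch of a fresh deployment; the dbpath is a placeholder:

```shell
# New deployment: an empty dbpath plus one extra flag selects WiredTiger
mkdir -p /data/wt
mongod --dbpath /data/wt --storageEngine wiredTiger
```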
23. How Do I Install/Upgrade It?
• Upgrade procedure
– Two ways:
• mongodump/mongorestore
• Initial sync of a new replica set member running WT
– Replica sets with mixed storage engines are supported
– You CANNOT copy raw data files
• WT will fail to start if the wrong data format is in the dbpath
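The dump/restore path can be sketched as follows; hosts, paths, and the log file are placeholders:

```shell
# 1. Dump from the running MMAPv1 instance
mongodump --host localhost:27017 --out /backup/dump
# 2. Stop the old mongod, then start a new one on an *empty* dbpath
#    (never copy the old MMAPv1 files into it)
mongod --dbpath /data/wt --storageEngine wiredTiger \
       --fork --logpath /var/log/mongod-wt.log
# 3. Restore the dump into the WiredTiger instance
mongorestore --host localhost:27017 /backup/dump
```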
24. What Options Are Relevant?
• Options that still apply
– journal
– nojournal
– repair
– repairPath
– upgrade
25. What Options Are Relevant?
• MMAPv1-only options
– quota
– quotaFiles
– noprealloc
– nssize
– smallfiles
– syncdelay
– journalOptions
– journalCommitInterval
26. What Options Are Relevant?
• File-format-specific options removed from the tools
– --dbpath
Speaker Notes
• WT will allow us to go beyond limitations that previously hindered MongoDB on MMAPv1, at the cost of some additional administrative complexity.
• Still experimenting with other knobs and libraries to determine what is best for different use cases and what our recommendations are.
• system.profile & system.users are still there.
• listCollections, listIndexes, and createIndexes should be used instead.
• MMAPv1 ONLY uses the file system cache.
• There will be many different use cases for which these knobs will need to be adjusted, e.g. a bulk load with no journaling or no checkpointing.
• The purpose of the journal is very different in MMAPv1 and WT: in MMAPv1 you need it to recover a consistent view of the data (plus recover the data that was written); in WT it is only to get the data back.
• This makes running with nojournal and w:majority totally safe.
• mongodump/mongorestore: without stopping writes to the instance, the dump won't be point-in-time unless the oplog options (--oplog on dump, --oplogReplay on restore) are used.
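A sketch of the point-in-time variant from the last note; the output path is a placeholder, and --oplog requires dumping from a replica set member (it reads the oplog):

```shell
# Capture oplog entries that arrive while the dump is running,
# then replay them on restore for a consistent point-in-time snapshot
mongodump --oplog --out /backup/pit
mongorestore --oplogReplay /backup/pit
```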