1. 5 Pitfalls to Avoid with MongoDB
Tim Callaghan
VP/Engineering,
Tokutek
tim@tokutek.com
@tmcallaghan
2. Tokutek: Database Performance Engines
What is Tokutek?
Tokutek® offers high performance and scalability for MySQL,
MariaDB and MongoDB. Our easy-to-use open source solutions
are compatible with your existing code and application
infrastructure.
Tokutek Performance Engines Remove Limitations
-Improve insertion performance by 20X
-Reduce HDD and flash storage requirements up to 90%
-No need to rewrite code
Tokutek Mission:
Empower your database to handle the Big Data
requirements of today’s applications
4. Housekeeping
• This presentation will be available for replay following
the event
• We welcome your questions; please use the console
on the right of your screen and we will answer
following the presentation
• A copy of the presentation is available upon request
5. Agenda
• Describe use-cases that lead to well known pitfalls
• How can they be avoided?
• Test, Measure, and Analyze (benchmark)
8. What is TokuMX?
• TokuMX = MongoDB with improved storage
• Drop in replacement for MongoDB v2.4 applications
• Including replication and sharding
• Same data model
• Same query language
• Drivers just work
• No Full Text or Geospatial
• Open Source
– http://github.com/Tokutek/mongo
10. 1a : Space
• MongoDB databases often grow quite large
• it easily allows users to...
• store large documents
• keep them around for a long time
• de-normalized data needs more space
• Operational challenges
• Big disks are cheap, but not fast
• Cloud storage is even slower
• Fast disks (flash) are expensive
• Backups are large as well
• Unfortunately, MongoDB does not offer compression
• goal = use less disk/flash
11. 1a : Space : Avoidance
• TokuMX offers built-in compression
• 3 compression algorithms
• quicklz, zlib, lzma, (none)
• Everything is compressed
• Field names and values
• Secondary indexes too
13. 1a : Space : Analyze
size on disk, ~31 million inserts (lower is better)
14. 1a : Space : Analyze
size on disk, ~31 million inserts (lower is better)
TokuMX achieved
11.6:1 compression
15. 1b : Space
• MongoDB stores field names in each document
• Lots of redundant data
• When field names are long, documents may
contain more field name data than actual values
• Google “mongodb long field names”
• Lots of blogs and advice
• ... but descriptive schemas are useful!
16. 1b : Space : Avoidance
• Again, TokuMX offers built-in compression
• Field names are compressed along with values
• Compression algorithms love redundant data
• Be descriptive and toss that data dictionary!
• Who knows what is in field “zq”? Not me.
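The claim that compression absorbs redundant field names can be checked with a rough experiment. The sketch below is only an approximation: it assumes JSON text and Node's built-in zlib as stand-ins for the real storage format, and the document values, the count of 1000, and the helper names makeDocs and measure are made up for illustration. It does not reproduce TokuMX's on-disk layout.

```javascript
// Toy comparison: what do long field names cost before and after compression?
// Assumption: JSON text + Node's zlib stand in for the real storage format.
const zlib = require("zlib");

// Build n documents using the given field names (values are made up).
function makeDocs(n, names) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({
      [names.first]: "user" + i,
      [names.last]: "name" + i,
      [names.email]: "user" + i + "@example.com",
    });
  }
  return docs;
}

// Return raw and zlib-compressed byte counts for a set of documents.
function measure(docs) {
  const raw = Buffer.from(JSON.stringify(docs), "utf8");
  return { raw: raw.length, compressed: zlib.deflateSync(raw).length };
}

const longNames = measure(
  makeDocs(1000, { first: "first_name", last: "last_name", email: "email_address" })
);
const shortNames = measure(makeDocs(1000, { first: "fn", last: "ln", email: "ea" }));

console.log("raw bytes saved by short names:", longNames.raw - shortNames.raw);
console.log(
  "compressed bytes saved by short names:",
  longNames.compressed - shortNames.compressed
);
```

The raw saving grows with every document, while the compressed saving should be far smaller because the repeated names are encoded as back-references.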
17. 1b : Space : Test
schema 1 - long field names (10/20/20)
{ first_name : "Tim",
last_name : "Callaghan",
email_address : "tim@tokutek.com" }
schema 2 - short field names (26 fewer bytes per doc)
{ fn : "Tim",
ln : "Callaghan",
ea : "tim@tokutek.com" }
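The 26-byte figure can be reproduced by adding up the field name lengths of the two schemas. This is a minimal sketch that counts only the name bytes and ignores per-field overhead that is identical in both schemas; fieldNameBytes is a hypothetical helper name.

```javascript
// Sum the UTF-8 byte length of every field name in a document.
function fieldNameBytes(doc) {
  return Object.keys(doc).reduce(
    (sum, key) => sum + Buffer.byteLength(key, "utf8"),
    0
  );
}

const longDoc = { first_name: "Tim", last_name: "Callaghan", email_address: "tim@tokutek.com" };
const shortDoc = { fn: "Tim", ln: "Callaghan", ea: "tim@tokutek.com" };

const savedPerDoc = fieldNameBytes(longDoc) - fieldNameBytes(shortDoc);
console.log(savedPerDoc); // 26
console.log(savedPerDoc * 100e6); // 2600000000 bytes over 100 million documents
```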
18. 1b : Space : Analyze
size on disk, 100 million inserts (lower is better)
19. 1b : Space : Analyze
size on disk, 100 million inserts (lower is better)
TokuMX is substantially
smaller, even without
compression
20. 1b : Space : Analyze
size on disk, 100 million inserts (lower is better)
In TokuMX, field name length
has almost no impact on size due
to compression
MongoDB was ~10% smaller
with short field names
23. 2 : Replication : Avoidance
• TokuMX replication allows secondary servers to process
replication without IO
• Simply injecting messages into the Fractal Tree
Indexes on the secondary server
• The “Hard Work” was done on the primary
• Read-before-write
• Uniqueness checking
• Elimination of replication lag
• Your secondaries are fully available for read scaling!
• Run multiple secondaries on a single server
24. 2 : Replication : Test
• Sysbench
• Workload
• point + range queries, update, delete, insert
• 16 collections, 10 million rows, 16GB RAM
• Setup
• loaded data on single server
• shut down and copied data folder
• created secondary
• Ran benchmark
27. 3 : Declining Performance
• MongoDB insert/update/delete performance drops
dramatically when the indexes do not fit in memory
• Operations are limited by IOPS
• Generally 1 operation per available IO
• Fewer with secondary indexes: maintaining each one costs 1 more IO
• Solution: Add RAM or Shard.
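The arithmetic behind this slide can be written out as a back-of-the-envelope estimate. The sketch below assumes exactly one IO for the primary index plus one per secondary index, as the bullets state; the function name estimateWritesPerSecond and the figure of 1000 IOPS are hypothetical.

```javascript
// Rough upper bound on writes per second once indexes no longer fit in memory.
// Assumes one IO for the primary index plus one IO per secondary index.
function estimateWritesPerSecond(iops, secondaryIndexes) {
  return iops / (1 + secondaryIndexes);
}

console.log(estimateWritesPerSecond(1000, 0)); // 1000
console.log(estimateWritesPerSecond(1000, 3)); // 250
```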
28. 3 : Declining Performance : Avoidance
• TokuMX runs on Tokutek’s Fractal Tree indexes
• Message buffers delay IO and reduce cache disruption
• Perform many operations per IO
• Many workloads don’t need additional memory or
sharding, they just need better indexing
• RAM = $$$
• Sharding = $$$ + Complexity
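The idea of performing many operations per IO can be illustrated with a toy counter. This is not how Fractal Tree indexes are implemented (they buffer messages inside the tree itself); it is a simplified sketch, with hypothetical names BufferedWriter and countIOs, showing only why batching writes reduces the number of IOs.

```javascript
// Toy model only: counts IOs when writes are collected in a buffer
// and flushed in batches. A buffer size of 1 models one IO per write.
class BufferedWriter {
  constructor(bufferSize) {
    this.bufferSize = bufferSize; // hypothetical number of messages per flush
    this.pending = [];
    this.ioCount = 0;
  }

  write(message) {
    this.pending.push(message);
    if (this.pending.length >= this.bufferSize) this.flush();
  }

  flush() {
    if (this.pending.length === 0) return;
    this.ioCount += 1; // one IO carries the whole batch
    this.pending = [];
  }
}

// Push `operations` writes through a buffer and report how many IOs it took.
function countIOs(operations, bufferSize) {
  const writer = new BufferedWriter(bufferSize);
  for (let i = 0; i < operations; i++) writer.write({ op: "insert", id: i });
  writer.flush(); // write out whatever is left
  return writer.ioCount;
}

console.log(countIOs(10000, 1)); // 10000
console.log(countIOs(10000, 100)); // 100
```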
34. 4 : Concurrency
• MongoDB originally implemented a global write lock
• 1 writer at a time
• MongoDB v2.2 moved this lock to the database level
• 1 writer at a time in each database
• This severely limits the write performance of servers
• 36 shards on 1 server example
• Allows for more concurrency
• High operational complexity
• Google “mongodb multiple shards same server”
40. 5 : Got Transactions?
• MongoDB does not support “transactions”
• Each operation is visible to everyone
• There are work-arounds, Google “mongodb transactions”
• http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/
This document provides a pattern for doing multi-document
updates or “transactions” using a two-phase commit
approach for writing data to multiple documents.
Additionally, you can extend this process to provide a
rollback like functionality.
(the document is 8 web pages long)
• MongoDB does not support multi-version concurrency control
(MVCC)
• Readers do not get a consistent view of the data, as they can be
interrupted by writers
• People try, Google “mongodb mvcc”
41. 5 : Transactions : Avoidance
• ACID
• In MongoDB, multi-insertion operations allow for
partial success
• Asked to store 5 documents, 3 succeeded
• TokuMX offers “all or nothing” behavior
• Document level locking
• MVCC
• In MongoDB, queries can be interrupted by writers.
• The effects of these writers are visible to the reader
• TokuMX offers MVCC
• Reads are consistent as of the operation start
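The difference between partial success and “all or nothing” can be shown with a toy in-memory collection. This sketch does not use MongoDB or TokuMX; the Map-based collection and the function names insertOne, insertBatchPartial, and insertBatchAtomic are assumptions made purely to contrast the two behaviors.

```javascript
// Toy in-memory collection keyed by _id, used to contrast two batch behaviors.
function insertOne(collection, doc) {
  if (collection.has(doc._id)) throw new Error("duplicate _id: " + doc._id);
  collection.set(doc._id, doc);
}

// Partial success: documents inserted before the failure stay in place.
function insertBatchPartial(collection, docs) {
  let inserted = 0;
  try {
    for (const doc of docs) {
      insertOne(collection, doc);
      inserted++;
    }
  } catch (err) {
    // Stop at the first error; earlier inserts are not undone.
  }
  return inserted;
}

// All or nothing: work on a copy and publish it only if every insert succeeded.
function insertBatchAtomic(collection, docs) {
  const working = new Map(collection);
  try {
    for (const doc of docs) insertOne(working, doc);
  } catch (err) {
    return 0; // the original collection was never touched
  }
  for (const [id, doc] of working) collection.set(id, doc);
  return docs.length;
}

// Five documents, the fourth of which repeats an earlier _id.
const batch = [{ _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 1 }, { _id: 5 }];
const partial = new Map();
const atomic = new Map();
console.log(insertBatchPartial(partial, batch), partial.size); // 3 3
console.log(insertBatchAtomic(atomic, batch), atomic.size); // 0 0
```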
42. 5 : Transactions : Avoidance
• Transactions in TokuMX
• db.runCommand("beginTransaction")
• ... perform 1 or more operations
• db.runCommand("rollbackTransaction") |
db.runCommand("commitTransaction")
• Note: not available in sharded environments
• For more information
• http://www.tokutek.com/2013/04/mongodb-transactions-yes/
• http://www.tokutek.com/2013/04/mongodb-multi-statement-transactions-yes-we-can/
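The begin / work / commit-or-rollback sequence could be wrapped in a small helper. The sketch below is an assumption-laden illustration: withTransaction and transfer are hypothetical helper names, the accounts collection and its balance field are invented, and only the three command names come from the slide. It reacts only to thrown exceptions and does not inspect the documents that runCommand returns.

```javascript
// Run work(db) between beginTransaction and commitTransaction,
// rolling back if work throws. Helper names here are hypothetical.
function withTransaction(db, work) {
  db.runCommand("beginTransaction");
  try {
    var result = work(db);
    db.runCommand("commitTransaction");
    return result;
  } catch (err) {
    db.runCommand("rollbackTransaction"); // undo everything since beginTransaction
    throw err;
  }
}

// Example: move an amount between two documents in a made-up accounts collection.
function transfer(db, fromId, toId, amount) {
  return withTransaction(db, function (d) {
    d.accounts.update({ _id: fromId }, { $inc: { balance: -amount } });
    d.accounts.update({ _id: toId }, { $inc: { balance: amount } });
  });
}
```

In a TokuMX shell this might be called as transfer(db, "alice", "bob", 25); as the slide notes, it would not apply in a sharded environment.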
43. Tokutek: Database Performance Engines
Any Questions?
Download TokuMX at www.tokutek.com/download
Register for product updates, access to premium
content, and invitations at www.tokutek.com
Join the Conversation