6. MongoDB
● It is the “new MySQL”
● Project started in 2007 by 10gen (now MongoDB Inc)
● Cross-platform, open-source
● 5th most used DBMS & most used Document Store*
(next DS CouchDB - 21st)
* According to db-engines.com as of Oct 2014
7. Characteristics
● “It's really a hybrid database with features from a few
different places.” (Gaetan Voyer-Perrault on Quora)
● Document Oriented but NO SCHEMA!
● Documents grouped in Collections
● Binary JSON (BSON) format
● Load Balancing (automated sharding; the shard key
can be user-defined)
● Replication (Replica Sets)
● Automated failover
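A rough sketch of what a user-defined shard key buys you: every document is routed by one field, so documents with the same key value always land on the same shard. This is a toy hash-based router in plain Python, not MongoDB's balancer (which moves chunks of shard-key ranges or hashed values around); `shard_for`, the three shards and the sample documents are all made up for illustration.

```python
import hashlib

def shard_for(shard_key_value: str, num_shards: int) -> int:
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Hypothetical documents: no schema, so the fields differ between documents.
docs = [
    {"user_id": "alice", "email": "alice@example.com"},
    {"user_id": "bob", "tags": ["admin"]},
    {"user_id": "carol"},
]

# Route each document to a shard by its (assumed) shard key "user_id".
NUM_SHARDS = 3
shards = {i: [] for i in range(NUM_SHARDS)}
for doc in docs:
    shards[shard_for(doc["user_id"], NUM_SHARDS)].append(doc)
```

Picking a key with many distinct, evenly used values spreads the load; a key with few values would pile documents onto a few shards.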
8. Characteristics - continued
● Primary and Secondary Indexes
● JavaScript for UDF
● MapReduce
● Capped Collections
● Aggregation Framework since 2.2
● Ad-hoc Query Support
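Capped Collections behave like a fixed-size ring buffer: once full, the oldest documents are overwritten in insertion order. A minimal plain-Python model of that behaviour, assuming a cap expressed as a document count (real capped collections are sized in bytes, with an optional maximum document count); the `CappedCollection` class is hypothetical, not a MongoDB API.

```python
from collections import deque

class CappedCollection:
    """Toy model: keeps only the newest `max_docs` documents."""

    def __init__(self, max_docs: int):
        # deque(maxlen=...) silently drops the oldest item when full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc: dict) -> None:
        self._docs.append(doc)

    def find(self) -> list:
        # Documents come back in insertion order.
        return list(self._docs)

log = CappedCollection(max_docs=3)
for i in range(5):
    log.insert({"seq": i})

newest = log.find()  # the two oldest entries (seq 0 and 1) are gone
```

This is why capped collections suit logs and buffers: writes stay cheap and the size never grows.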
10. Generic performance tips
● Use 64-bit OS
● Lots of RAM, fast disks (was anyone expecting
something else?)
● Ensure that at least indexes + working set fit in RAM
(check with db.stats(), db.<coll>.stats()); if not, you might
want to try TokuMX
● Design for de-normalized data models
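The "fits in RAM" check can be sketched as simple arithmetic over the `dataSize` and `indexSize` fields that db.stats() reports. The numbers below are invented, and `working_set_fraction` is an assumed knob: the real working set is whatever part of the data you actually touch, which only you can estimate.

```python
def fits_in_ram(stats: dict, ram_bytes: int, working_set_fraction: float = 1.0) -> bool:
    """Return True if indexes plus the assumed working set fit in RAM."""
    needed = stats["indexSize"] + stats["dataSize"] * working_set_fraction
    return needed <= ram_bytes

GIB = 1024 ** 3
ram = 8 * GIB

# Made-up stats, shaped like the relevant part of db.stats() output.
small_db = {"dataSize": 6 * GIB, "indexSize": 1 * GIB}
big_db = {"dataSize": 12 * GIB, "indexSize": 1 * GIB}

small_ok = fits_in_ram(small_db, ram)                          # 7 GiB needed
big_ok = fits_in_ram(big_db, ram)                              # 13 GiB needed
big_hot_half_ok = fits_in_ram(big_db, ram, working_set_fraction=0.5)
```

Treating the whole `dataSize` as the working set is the pessimistic case; lowering the fraction models a collection where only recent data is hot.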
11. Generic performance tips - continued
● Tune Write Concerns (weaker acknowledgement => faster
writes, but less durability)
● Shard early
● Fixed (or at least bounded) record size => better write
performance
● Use short attribute names (reduces index & data size,
OFC!)
● EXT4 or XFS
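Why short attribute names matter: with no schema, the field names are stored inside every single document. A quick way to see the effect, using compact JSON as a stand-in for BSON (an approximation: BSON encodes values differently, but it also repeats the field names per document). The two sample documents are invented.

```python
import json

def encoded_size(doc: dict) -> int:
    """Size in bytes of the compact JSON encoding (a rough proxy for BSON)."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))

# Same values, different field names.
long_doc = {
    "customer_first_name": "Ada",
    "customer_last_name": "Lovelace",
    "account_creation_timestamp": 1414800000,
}
short_doc = {"fn": "Ada", "ln": "Lovelace", "ts": 1414800000}

saved_per_doc = encoded_size(long_doc) - encoded_size(short_doc)
saved_for_130m_docs = saved_per_doc * 130_000_000  # bytes, for a large collection
```

The saving per document is small, but it is paid on every document, so it adds up on large collections. The trade-off is readability of the stored data.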
12. IRL
● Virtualized server, 8G RAM, 4 vCPU - no sharding, no
replica sets
● 100 inserts/s, 130M doc collection WITH secondary
index (avg doc size 0.6k)
● 20 inserts/s, 3M doc collection WITH 18 secondary
indexes (avg doc size 10k)
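A back-of-the-envelope check of these figures against the server's RAM. Assumptions: "0.6k" and "10k" are read as 600 and 10,000 bytes, "8G" as 8 x 10^9 bytes, and index size is ignored entirely.

```python
def collection_bytes(doc_count: int, avg_doc_bytes: int) -> int:
    """Raw data size: number of documents times average document size."""
    return doc_count * avg_doc_bytes

RAM_BYTES = 8 * 10**9  # "8G RAM", decimal gigabytes assumed

many_small_docs = collection_bytes(130_000_000, 600)    # 78 * 10**9 bytes
fewer_large_docs = collection_bytes(3_000_000, 10_000)  # 30 * 10**9 bytes

# How many times larger than RAM each collection's raw data is.
small_ratio = many_small_docs / RAM_BYTES
large_ratio = fewer_large_docs / RAM_BYTES
```

Under these assumptions neither collection's data fits in RAM even before counting indexes, so only a hot subset can be memory-resident on this machine.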
13. Use Cases
● Logs
● Location Data (Mongo has built in Geospatial ops)
● Account and User Profiles
● Messaging
● (complex) Config Data
● http://www.mongodb.com/who-uses-mongodb (hint:
Expedia, Business Insider, The Weather Channel,
Foursquare, eBay)
15. Redis
● Salvatore Sanfilippo (@antirez)
● Started in 2009
● Key-Value Store
● 11th most used DBMS & most used KV Store* (next
KVS memcached - 19th)
● Sponsored by Pivotal (spinoff EMC/VMware)
* According to db-engines.com as of Oct 2014
16. Characteristics
● Holds all data in memory, persists on disk
● Data Models
○ Strings/Blobs/Bit-Maps (not really Bitmaps)
○ Hashtables
○ Linked Lists
○ Sets
○ Sorted Sets
● HyperLogLog (since 2.8.9 - trades accuracy for memory)
● Master-Slave Replication
● High Availability (through Sentinel)
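What "trade accuracy for memory" means for HyperLogLog: instead of remembering every element, keep a small fixed array of registers and estimate the number of distinct items from hash patterns. The following is a simplified textbook-style sketch, not Redis's implementation; `TinyHLL`, the SHA-1 hash and the choice of 1024 registers are assumptions for illustration.

```python
import hashlib
import math

class TinyHLL:
    """Simplified HyperLogLog with 2**p registers (p >= 7 for this alpha)."""

    def __init__(self, p: int = 10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw

hll = TinyHLL()
for i in range(10_000):
    hll.add(f"user-{i}")
first_estimate = hll.count()

for i in range(10_000):      # re-adding the same items changes nothing
    hll.add(f"user-{i}")
estimate = hll.count()
```

The state is 1024 small integers no matter how many items are added; the price is an estimate that is typically a few percent off rather than an exact count. In Redis itself the equivalent operations are PFADD and PFCOUNT.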
17. Characteristics - continued
● Redis Cluster in the works (not production-ready yet) -
sharding
○ asynchronous replication
○ does not guarantee strong consistency (may ‘forget’ writes)
● AOF fsync - default every 1s (appendfsync everysec)
● Does not support secondary indexes
● Pub/Sub mode since 2.0
● Key expiry
● Server scripting with Lua
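Key expiry, sketched in plain Python: each key may carry a deadline, and an expired key is dropped when someone tries to read it. This models only the lazy, on-access half of the idea (Redis also expires keys actively in the background); `ExpiringStore` and the injected fake clock are illustrative, not a Redis client API.

```python
import time

class ExpiringStore:
    """Toy key-value store with per-key time-to-live, expired lazily on read."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired: remove it and report a miss
            return None
        return value

# A fake clock makes the example deterministic.
now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
store.set("session:42", "abc", ttl_seconds=10)
store.set("config", "on")           # no TTL: never expires

before = store.get("session:42")    # still within its TTL
now[0] = 10.0                       # advance time past the deadline
after = store.get("session:42")
```

This is the mechanism that makes Redis convenient for caches and sessions: stale entries disappear without any cleanup code in the application.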
18. IRL
● Virtualized server, 4G RAM, 1 vCPU
● 50k+ GET/SET per second (redis-benchmark)
● Only 128 queries out of 1165550375 took over 10ms
(0.00001%)
○ uptime_in_days:439
○ used_memory_human:424.09M
○ used_memory_peak_human:834.94M
○ total_connections_received:1352935
○ db0:keys=610884,expires=355397
19. Generic performance tips
● Use short key names (reduces data size, OFC!)
● You can create secondary indexes (but you have to
maintain them yourself, e.g. using Sets)
● You can have ad-hoc queries (really just one kind of
query): using SORT
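What maintaining a secondary index yourself involves: on every write you update both the record and a set of ids per indexed value, and you must remember to remove the id from the old set when the value changes. A plain-Python model of that bookkeeping, with the corresponding Redis set commands named in comments; `UserStore` and the `city` field are made up for illustration.

```python
from collections import defaultdict

class UserStore:
    """Toy model of a primary key-value store plus a hand-kept index."""

    def __init__(self):
        self.users = {}                   # primary: user id -> fields
        self.by_city = defaultdict(set)   # secondary index: city -> set of ids

    def save(self, user_id: str, fields: dict) -> None:
        old = self.users.get(user_id)
        if old is not None:
            # Forgetting this step leaves stale index entries behind.
            self.by_city[old["city"]].discard(user_id)   # like SREM
        self.users[user_id] = dict(fields)
        self.by_city[fields["city"]].add(user_id)        # like SADD

    def find_by_city(self, city: str) -> list:
        return sorted(self.by_city.get(city, set()))     # like SMEMBERS

store = UserStore()
store.save("u1", {"name": "Ann", "city": "Lyon"})
store.save("u2", {"name": "Bob", "city": "Lyon"})
store.save("u1", {"name": "Ann", "city": "Oslo"})        # Ann moves

in_lyon = store.find_by_city("Lyon")
in_oslo = store.find_by_city("Oslo")
```

Against a real server the record and index updates would normally be grouped (for example in a MULTI/EXEC transaction or a Lua script) so they cannot drift apart.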
20. Use Cases
● Cache
● IPSS/IPC
● Queue mechanisms (see e.g. Resque)
● Log/Task buffers
● Statistics and aggregation datastore
● (anywhere you use memcached)
● http://redis.io/topics/whos-using-redis (hint: Twitter,
GitHub, Snapchat, Stack Overflow, among others)