This document provides an overview and introduction to NoSQL databases. It discusses what NoSQL is, examples of popular NoSQL databases like MongoDB, CouchDB, HBase, Cassandra, Redis, and Hadoop. It covers common concepts like CAP theorem, architectures of these databases, use cases, and how they compare to traditional relational databases. The document also discusses open source projects, academic research, and how hybrid architectures combining multiple databases are becoming more common.
1. NOSQL. WTW?
adicu.com
February 2011
Alexander Sicular
@siculars
2. Who is this blowhard?
Columbia University pays my mortgage
For the better part of a decade in Medical Informatics
Am not shilling for any of these companies
Am not a computer scientist
Am a computer science enthusiast, particularly in the area of Informatics
3. NoSQL or NOSQL?
Not Only SQL
Non/post relational
Big tent policy
Umbrella term
Fragmented
http://www.flickr.com/photos/morgennebel/2933723145/
4. Your Usage Patterns
Read vs. Write
Mutable vs. Immutable
Product Considerations:
In place updates
Write Only Logs
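A minimal sketch of the two write styles named on this slide, assuming a toy in-memory key-value store. The class names `InPlaceStore` and `AppendOnlyLogStore` are hypothetical and do not come from any of the products discussed; real systems add persistence, compaction, and indexing on top of this idea.

```python
class InPlaceStore:
    """Mutable style: a write overwrites the previous value for the key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # the old value is gone after this

    def get(self, key):
        return self._data.get(key)


class AppendOnlyLogStore:
    """Immutable style: every write is appended, nothing is overwritten."""

    def __init__(self):
        self._log = []  # list of (key, value) records in write order

    def put(self, key, value):
        self._log.append((key, value))

    def get(self, key):
        # The latest record for a key wins, so scan from the end of the log.
        for logged_key, value in reversed(self._log):
            if logged_key == key:
                return value
        return None

    def history(self, key):
        # Earlier versions stay readable because the log is never rewritten.
        return [value for logged_key, value in self._log if logged_key == key]
```

The trade-off in this sketch: the in-place store has cheap reads and loses history, while the log has cheap sequential writes and keeps every version, at the cost of a slower read unless an index is layered on top.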
5. This vs. That
Riak wiki comparisons page
http://wiki.basho.com/Riak-Comparisons.html
Popular one page comparison of a number of NOSQL players by Kristof Kovacs:
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
6. NOSQL concepts are
Not Brand New
Memcached since 2003 http://memcached.org
Google papers 2004-2006
Amazon Dynamo 2007
Consistent Hashing 2007 http://www.last.fm/user/RJ/journal/2007/04/10/rz_libketama_-_a_consistent_hashing_algo_for_memcache_clients
Using relational systems as a key-value blob store
2009 FriendFeed (not the first) http://bret.appspot.com/entry/how-friendfeed-uses-mysql
7. Why NOSQL
Support for “Very Large” data sets
Schemaless
Denormalized
Green field
New applications
http://www.flickr.com/photos/gailtang/1243984297/
8. Academia
Google:
Bigtable http://labs.google.com/papers/bigtable.html
GFS http://labs.google.com/papers/gfs.html
M/R http://labs.google.com/papers/mapreduce.html
Amazon:
Dynamo http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
NOSQL Summer http://nosqlsummer.org/papers
NOSQL Tapes http://nosqltapes.com
9. Under the Hood
Terminology
Write Only Log http://en.wikipedia.org/wiki/Log-structured_file_system
Merkle Trees http://en.wikipedia.org/wiki/Hash_tree
B-trees http://en.wikipedia.org/wiki/B-tree
Vector clock http://en.wikipedia.org/wiki/Vector_clock
Bloom filters http://en.wikipedia.org/wiki/Bloom_filters
Big O Notation http://en.wikipedia.org/wiki/Big_o_notation
Consistent Hashing http://en.wikipedia.org/wiki/Consistent_hashing
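Of the terms above, consistent hashing is the one that shows up most often in this deck, so here is a minimal sketch of a hash ring. It assumes MD5 as the hash function and 100 virtual points per node; the `HashRing` class and its method names are hypothetical and are not the API of libketama or of any database mentioned here.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node (assumed value)
        self._positions = []      # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # Map a string to a large integer position on the ring.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            self._owner[position] = node
            bisect.insort(self._positions, position)

    def remove_node(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            del self._owner[position]
            self._positions.remove(position)

    def get_node(self, key):
        if not self._positions:
            raise KeyError("ring is empty")
        # Walk clockwise to the first point after the key, wrapping around.
        index = bisect.bisect(self._positions, self._hash(key))
        index %= len(self._positions)
        return self._owner[self._positions[index]]
```

The point of the ring: when a node joins, the only keys that change owner are the ones the new node takes over, so most cached or stored data stays where it was. A plain `hash(key) % number_of_nodes` scheme would reshuffle nearly everything instead.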
13. MongoDB
10Gen, MongoHQ, MongoLab
C++
huMONGOus
Sharded scaling, replicated master/slave
Located in NYC (go visit them)
Soft landing for those coming from mysql (relational databases)
Native javascript
Secondary indexes
17. Hadoop
Cloudera, Apache Foundation
Java
High latency
Batch oriented
HDFS is GFS based
Open source Google stack via the Google papers
Huge ecosystem
Yahoo, FB, Twitter, Fortune 500
Pig, Hive, Flume
18. HBase
Java
Low latency store
sits on top of Hadoop
Modeled after Google Bigtable
Column oriented
Thrift, protobuf
Backend for new Facebook Messaging service
19. Cassandra
Apache
Java
Column oriented
Like Bigtable and Dynamo
Originated at Facebook
At Twitter, Distributed counting
http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
20. Redis
OpenRedis
C
REmote DIctionary Server
incredibly fast
memcached on steroids
replicated master/slave
Specific data structures
21. Commonalities
Open Source
Adherence to common or standard:
data formats
json, bson, utf8, binary
data transport mechanisms
http, thrift, protobuf, simple wire protocols
22. Ok. So Now What?
Analyze your requirements
Mailing lists
IRC, twitter
Project pages, wiki
Github/Google Code/Bitbucket:
project page
specific language clients
23. Variety Pack
Hybrid architectures will become the norm
Twitter - mysql, cassandra, hadoop
Google - mysql, GAE (BT)
Facebook - mysql, cassandra, hbase, memcached
Yahoo - mysql, hadoop
LinkedIn - voldemort
http://www.flickr.com/photos/uncleweed/82245324/
24. Questions?