NoSQL - We know what it isn't, but what is it?

NoSQL
Now we know what it’s not... what is it?

What are we running
from?
• Relational databases are the de facto
standard for storing data in a web
application.
• A lot of times, that data isn’t really
relational at all.
• RDBMSs have lots of rules that can impact
performance.

Rules? What Rules?
• Classic relational databases follow the
ACID rules:
• Atomicity
• Consistency
• Isolation
• Durability

Atomicity
• If any part of the update fails, it all fails.
• Databases have to be able to lock tables
and rows for operations, which can block
or delay other incoming requests.
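
A minimal sketch of the all-or-nothing idea, in plain Ruby with an in-memory Hash standing in for a table. The `atomic_update` helper and the account names are made up for illustration; a real RDBMS gets there with transaction logs and locks, not by copying data.

```ruby
# Hypothetical helper: apply a block of changes to a copy of the data,
# and only keep the copy if every step succeeds.
def atomic_update(table)
  working_copy = Marshal.load(Marshal.dump(table)) # deep copy
  yield working_copy
  table.replace(working_copy) # "commit": swap in the new state
  true
rescue StandardError
  false # "rollback": the original table was never touched
end

accounts = { "alice" => 100, "bob" => 50 }

# Both steps succeed, so both are kept.
atomic_update(accounts) do |t|
  t["alice"] -= 30
  t["bob"] += 30
end

# The second step fails, so the first one is thrown away too.
atomic_update(accounts) do |t|
  t["alice"] -= 500
  raise ArgumentError, "insufficient funds" if t["alice"] < 0
  t["bob"] += 500
end
```

After both calls, `accounts` holds only the result of the first update.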

Consistency
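• A transaction must leave the database in a
valid state, with every integrity constraint
satisfied.
• My interpretation for replicated setups:
after a transaction, all copies of the data
must be consistent with each other.
• Replication across lots of shards is
expensive, especially if there’s locking
involved.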

Isolation
• Data involved in a transaction must be
inaccessible to other operations.
• Remember the thing about locked rows
and tables?
• It’s a bummer.

Durability
• Once a user is notified that a transaction
has completed, the data must survive (even
a crash or restart), stay accessible, and
satisfy all integrity constraints.

I come not to bury
MySQL...
• Relational databases are great for a lot of
uses.
• If you have data that’s actually relational and
you need transactions, joins and have a
limited number of data types, then an
RDBMS will work for you.

But...
• RDBMSs have been
treated like hammers
and used for things
they’re not good at and
weren’t designed for.
• Like the web...

Thus were born...
• Key-Value Stores
• Wide-Column Stores
• Document Stores/Databases
• Graph Databases

All thrown together &
clumsily dubbed...

Which, despite its
negative sound,
supposedly means:
“Not Only SQL”

Yeah, I don’t believe it
either...

Key-Value
Just what it sounds like. You set a Key to a Value and
can then retrieve it.
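
A toy version in plain Ruby, just to show how small the contract is. The `TinyKV` class and its method names are illustrative only; they are not the API of memcached, Redis or Riak.

```ruby
# A toy in-memory key-value store: set a key, get it back, delete it.
class TinyKV
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key] # nil when the key is missing
  end

  def delete(key)
    @data.delete(key)
  end
end

store = TinyKV.new
store.set("user:42:name", "Kevin")
store.get("user:42:name") # the value we just stored
store.get("missing")      # nil, there is no such key
```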

Key-Value Benefits
• Simple
• High performance (usually) because there
are no transactions or relations so it’s a
simple bucket and lookup.
• Extremely flexible
• Commonly used as caches in front of
slower resources (like MySQL - bazinga!)
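
The cache-in-front-of-something-slow pattern can be sketched like this. `ReadThroughCache` and `slow_db` are hypothetical names, and the Hash is only a stand-in for a real cache server and a real database.

```ruby
# Read-through cache sketch: check the fast store first, fall back to the
# slow source on a miss, then remember the answer for next time.
class ReadThroughCache
  attr_reader :misses

  def initialize(&loader)
    @cache = {}
    @loader = loader
    @misses = 0
  end

  def fetch(key)
    return @cache[key] if @cache.key?(key)

    @misses += 1
    @cache[key] = @loader.call(key)
  end
end

slow_db = { 1 => "first post", 2 => "second post" } # stand-in for MySQL
cache = ReadThroughCache.new { |id| slow_db[id] }

cache.fetch(1) # miss: goes to slow_db
cache.fetch(1) # hit: served from the cache
```

A real setup also needs expiry and invalidation when the underlying row changes, which this sketch leaves out.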

Popular Players
• memcached - in memory only, extremely
efficient hashing algorithm allows you to
scale easily to hundreds of nodes.
• Redis - persistent, slightly more complex
than memcached (supports lists, sets and
hashes) but still highly performant.
• Riak - The Rails Machine guys love it. Jesse?
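
How does hashing let you spread keys over lots of nodes? Here is a rough sketch of a consistent-hash ring, the general technique many cache clients use to pick a server for a key. The `HashRing` class, the node names and the choice of MD5 are all assumptions for illustration, not memcached's actual implementation.

```ruby
require "digest/md5"

# Hypothetical hash ring: each node gets several points on a ring, and a key
# belongs to the first node point at or after the key's own hash.
class HashRing
  def initialize(nodes, points_per_node = 100)
    @ring = {}
    nodes.each do |node|
      points_per_node.times { |i| @ring[hash_of("#{node}-#{i}")] = node }
    end
    @sorted = @ring.keys.sort
  end

  def node_for(key)
    h = hash_of(key)
    point = @sorted.bsearch { |p| p >= h } || @sorted.first # wrap around
    @ring[point]
  end

  private

  # Use the first 32 bits of the MD5 digest as a position on the ring.
  def hash_of(str)
    Digest::MD5.hexdigest(str)[0, 8].to_i(16)
  end
end

ring = HashRing.new(%w[cache1 cache2 cache3])
ring.node_for("user:42") # always the same node for the same key
```

The point of the ring is that dropping a node only moves the keys that lived on that node; everything else stays where it was.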

My Uses
• memcached: Read-through cache for
Rails with cache-money.
• redis: persistent cache for results from
our algorithm, partitioned by version and
instance.

Wide Column
• Family of databases modeled on either
Google’s BigTable or Amazon’s Dynamo.
• Pick two out of three from the CAP
theorem in order to get horizontal
scalability.
• Data stored by column instead of by row.
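
To picture "by column instead of by row", here is the same data laid out both ways in plain Ruby. This is a simplification with made-up field names; BigTable-style systems group columns into families and add a lot more machinery.

```ruby
# Turn an array of row hashes into one array per column.
def to_columns(rows)
  rows.each_with_object({}) do |row, cols|
    row.each { |name, value| (cols[name] ||= []) << value }
  end
end

# Row-oriented: each record keeps all of its fields together.
rows = [
  { id: 1, artist: "Artist A", plays: 120 },
  { id: 2, artist: "Artist B", plays: 95 },
  { id: 3, artist: "Artist C", plays: 130 }
]

# Column-oriented: positions line up across the per-column arrays.
columns = to_columns(rows)

# Summing one column touches a single array instead of every whole record.
total_plays = columns[:plays].sum
```

That is why this layout suits queries you know ahead of time: a scan over one column never has to read the others.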

CAP?
• Consistency: All clients always have the
same view of the data.
• Availability: Each client can always read
and write.
• Partition Tolerance: The system works
well despite physical network partitions.

Use cases
• Making sense out of large amounts of data
where you know your query scenario
ahead of time.
• Large = 100s of millions of records.
• Data-mining log files and other sources of
similar data.

Big Players
• HBase
• Cassandra
• Hypertable
• Amazon’s SimpleDB
• Google’s BigTable (the granddaddy of all of
them)

Graph Databases
• Store nodes, edges and properties
• Think of them as Things, Connections and
Properties
• Good for storing properties and
relationships.
• Honestly, I don’t fully understand them...
anyone?
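
One way to make Things, Connections and Properties concrete is a toy graph in plain Ruby. Everything here (the node names, the `neighbors` helper) is invented for illustration and has nothing to do with how Neo4j or the others store data.

```ruby
# Nodes are Things with Properties.
nodes = {
  alice: { type: "Person", name: "Alice" },
  bob:   { type: "Person", name: "Bob" },
  mongo: { type: "Database", name: "MongoDB" }
}

# Edges are Connections, and they can carry Properties too.
edges = [
  { from: :alice, to: :bob,   label: "knows", since: 2009 },
  { from: :alice, to: :mongo, label: "likes" },
  { from: :bob,   to: :mongo, label: "likes" }
]

# Follow outgoing edges with a given label from one node.
def neighbors(edges, from, label)
  edges.select { |e| e[:from] == from && e[:label] == label }.map { |e| e[:to] }
end

# "What do the people Alice knows like?" is a two-hop traversal.
friends = neighbors(edges, :alice, "knows")
liked_by_friends = friends.flat_map { |f| neighbors(edges, f, "likes") }.uniq
liked_names = liked_by_friends.map { |id| nodes[id][:name] }
```

Questions like that two-hop one are where a graph database is meant to shine, since in SQL each hop is another join.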

The Players
• Neo4j
• FlockDB
• HyperGraphDB

Document Stores
• Short on relationships, tall on rich data
types.
• Big on eventual consistency and flexible
schemas.
• Hybrid of traditional RDBMS and Key-Value
stores.
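
Roughly what a "document" looks like, sketched as a Ruby Hash and serialized to JSON. The field names are made up; the point is that arrays and nested documents sit inside the record instead of in separate tables.

```ruby
require "json"

# One self-contained document: tags are an array and comments are embedded.
post = {
  "title" => "NoSQL for the rest of us",
  "tags" => ["nosql", "mongodb"],
  "author" => { "name" => "Example Author" },
  "comments" => [
    { "by" => "reader1", "text" => "Nice post" }
  ]
}

# A second document in the same collection can have different fields.
page = { "title" => "About", "body" => "Hello" }

# Documents round-trip through JSON without any schema definition.
json = JSON.generate(post)
restored = JSON.parse(json)
restored["comments"].first["by"] # reach into an embedded document
```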

Use Cases
• Content Management Systems
• Applications with rapid partial updates
• Anything you would normally put in an
RDBMS but don’t need joins or
transactions for.

The Players
• CouchDB
• MongoDB
• Terrastore

MongoDB
• Support for rich data types: arrays, hashes,
embedded documents, etc
• Support for adding and removing things
from arrays and embedded documents
(addToSet, for example).
• Map/Reduce support and strong indexes
• Regular expression support in queries

Design Considerations
• Embedded Documents - Use only if the
embedded document will always be
selected with the parent.
• Indexes - MongoDB punishes you much
earlier for missing indexes than MySQL.
• Document size - Currently, documents
are limited to 4MB, which should be large
enough, but if it’s not...

Real-World MongoDB
• We use MongoDB heavily at MIS.
• Statistics application and reporting
• Top-secret new application
• Web crawler and indexer
• CMS

Real-World Example
Let’s do tags. Everything is taggable now, right?

And to get a “thing’s”
tags?
SELECT `tags`.* FROM `tags`
INNER JOIN `taggings` ON `tags`.id = `taggings`.tag_id
WHERE ((`taggings`.taggable_id = 237)
AND (`taggings`.taggable_type = 'Song'))

Yuck!
That’s a lot of pain for something so simple.
And I didn’t even show you finding things with tag “x”.
Or how to set and unset tags on a “thing”.
Ouch.

The MongoDB Way
Using MongoMapper and Rails 3

class Post
  include MongoMapper::Document

  key :title, String
  key :body, String
  key :tags, Array

  ensure_index :tags
end

Let’s Make This Easy...
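# These methods live inside the Post class.

# Add a tag in memory and, for saved documents, push it to MongoDB too.
def add_tag(tag)
  tag = Post.clean_tag(tag)
  tags << tag unless tags.include?(tag) # match add_to_set: no duplicates
  add_to_set(:tags => tag) unless new_record?
end

# Remove a tag in memory and, for saved documents, pull it from MongoDB too.
def remove_tag(tag)
  tag = Post.clean_tag(tag)
  tags.delete(tag)
  pull(:tags => tag) unless new_record?
end

# Normalize one tag: trim, lowercase, spaces to hyphens, drop everything else.
def self.clean_tag(str)
  str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
end

# Split a comma-separated string into an array of cleaned tags.
def self.clean_tags(str)
  str.split(",").map { |t| clean_tag(t) }
end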
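
If you can't make the demo, here is what the tag-cleaning helpers above do, copied into a standalone module so it runs without MongoMapper or a database. The `TagCleaner` name is only for this sketch.

```ruby
# Standalone copy of the tag-cleaning logic from the Post class.
module TagCleaner
  def self.clean_tag(str)
    str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
  end

  def self.clean_tags(str)
    str.split(",").map { |t| clean_tag(t) }
  end
end

TagCleaner.clean_tag("  Ruby on Rails! ")
# => "ruby-on-rails"

TagCleaner.clean_tags("Ruby on Rails, NoSQL ,Mongo DB!")
# => ["ruby-on-rails", "nosql", "mongo-db"]
```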

Demo Time
Sorry if you’re looking at this later, but it’s console time!

Why I Love MongoDB
• Document model fits how I build web apps.
• For most apps, I don’t need transactions.
• Eventual consistency is actually OK.
• Partial updates and arrays make things that
are a pain in SQL-land absolutely painless.
• It’s just smart enough without getting in the
way.

What’s NoSQL, really?
• The right tool for the job.
• We’ve got lots of options for storing
application data.
• The key is picking the one that solves our
real problem.
• And if an RDBMS is the right tool, that’s OK
too.

Further Reading
• Visual NoSQL: http://blog.nahurst.com/visual-guide-to-nosql-systems
• MongoDB: http://mongodb.org
• MongoMapper: http://mongomapper.com/

Thanks!
• Kevin Lawver
• @kplawver
• kevin@lawver.net
• http://kevinlawver.com
