2. Who is Steve Storey?
• Hobby Coder for ~20 years
Everything from Spectrum 48k through x86 assembler
• Professional Coder for 12 years
Everything from Pascal + REXX up to Java + PHP
• Application Architect at MOO
Everything from coding to meetings
3. Who are MOO?
• MOO is a London-based online printing company
• Launched in 2006 with one product – The MiniCard
• Expanded to 5 products
• Now has a UK and US printing/shipping facility
• We ship globally – to over 43 countries in our first year
Now well over 100, including Antarctica
5. Where's the NoSQL?
<This slide intentionally left blank>
But who can tell what the future holds?
6. What is NoSQL?
• Mostly an inflammatory descriptive term
• Refers to a database of semi-structured data
“semi-structured” defined however you like it
“database” might or might not have in-built query capability
not “relational” as per RDBMS, but might allow arbitrary relationships between
data nodes
• 4 general types
Key/Value – simple arbitrary data store (unstructured)
Graph databases – Inspired by Euler + graph theory
BigTable – clones of Google's BigTable database
Document – essentially associative arrays
7. What is NoSQL NOT?
• A new idea
• A replacement for SQL
NoSQL = “Not Only SQL” ?
Entirely complementary to RDBMS systems
• Non-transactional
This does slightly depend on your definition of transactional
8. Quick List of popular implementations
• Apache CouchDB
• MongoDB
• Amazon Dynamo
Built by Amazon for internal services such as the shopping cart
• Memcached
• Neo4j
• More at http://en.wikipedia.org/wiki/Structured_storage
9. Quick List of less popular implementations
• Lotus Notes/Domino
In fact – very popular with corporates, just not their employees
1.0 released in 1989
One of its engineers was Damien Katz, who later went on to write CouchDB
10. What is CouchDB?
• Document store
A document is an associative array (in fact a JSON associative array)
• Allows developer-defined views on the documents
Akin to materialised views found in Oracle
Views use a Map/Reduce engine
• RESTful HTTP interface
Client APIs written for most higher level languages
Also means that you can host an AJAX app entirely in CouchDB
• Built-in fault tolerant replication
NOTE! Not clustering
“Eventually consistent”
Lock-less updates (Multi-version concurrency control)
11. Why do I need documents?
• How much data is document-like?
Wikis
Blogs
SQL tables with CLOB fields (text/mediumtext/longtext)
• Schema-less
Arbitrary fields can be added at any point to any document
The DB doesn't attach any significance to (almost) any field
“_id” and “_rev” are special
• Hierarchical data structures
moo.com Pack data model
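A minimal sketch of what a schema-less, hierarchical document might look like, written as a Python dict and round-tripped through JSON. Only "_id" and "_rev" come from the slides; every other field name and value is invented for illustration and is not MOO's actual Pack model.

```python
import json

# Hypothetical "pack" document. Apart from _id and _rev, all field names
# are assumptions made for this example.
pack = {
    "_id": "pack-0001",
    "_rev": "1-abc",  # placeholder revision token; the database assigns real ones
    "type": "pack",
    "product": "minicard",
    "cards": [  # hierarchy lives inside the document, not in joined tables
        {"side": "front", "image": "img-17", "text": ["Example Name"]},
        {"side": "back", "image": None, "text": ["Example Title", "MOO"]},
    ],
}

# Schema-less: a new field can be added to this one document at any point.
pack["gift_wrap"] = True

# The document is just JSON on the wire.
encoded = json.dumps(pack, sort_keys=True)
decoded = json.loads(encoded)
```

No other document has to gain a "gift_wrap" field for this one to be valid.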
12. Woah! No schema?
• Requires thinking a bit differently
Field usage is defined by the code
Less restrictive in reality since different fields can be used for different concerns
No type or null restrictions (but documents can be validated at save time)
A document should represent the complete state of that part of the data model
• Doesn't necessarily mean acting very differently
Does all your code definitely attach the same meaning to all the DB fields?
Even the meaning of status flags?
How long does it take to add a new column to a MySQL DB?
How much time do developers take learning your ORM solution?
How much time is spent mapping objects and relationships to tables, only to
load the complete tree on every request?
What happens to the careful DB guarantees if you shard your data?
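Validation at save time can be sketched as a function that inspects the new document (and the old one, if any) and rejects the save by raising an error. This is a plain-Python illustration of the idea, not CouchDB's own validation hook; the "post" type and its fields are hypothetical.

```python
class ValidationError(Exception):
    """Raised to reject a document before it is saved."""


def validate_post(new_doc, old_doc=None):
    """Check a hypothetical 'post' document; other types pass untouched."""
    if new_doc.get("type") != "post":
        return  # this code attaches no meaning to other document types

    title = new_doc.get("title")
    if not isinstance(title, str) or not title.strip():
        raise ValidationError("title must be a non-empty string")

    # Rules can also compare against the previous revision.
    if old_doc is not None and new_doc.get("author") != old_doc.get("author"):
        raise ValidationError("author cannot change")


def try_save(new_doc, old_doc=None):
    """Return None if the save would be accepted, else the rejection message."""
    try:
        validate_post(new_doc, old_doc)
    except ValidationError as err:
        return str(err)
    return None
```

The field rules live in code rather than in a schema, which is the point of the slide: the database stays indifferent, the application does not.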
13. Simple Views
• Matches a set of documents on some condition
the WHERE clause
also the FROM clause
• Outputs a set of fields, or parts of the associative array
the SELECT clause
• Usually coded in JavaScript
CouchDB does however support alternative view servers
View servers for Python, PHP, Ruby, Erlang, Perl available
• Uses only the “map” part of map/reduce
• No joins
But the documents represent the full state of that part of the data model ... right?
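The map-only view described above can be simulated in a few lines. CouchDB views are usually JavaScript; this is a Python stand-in running over an in-memory list, and the document fields ("type", "published", "author", "title") and the helper run_view are assumptions made for the example.

```python
def map_published_posts(doc):
    """Map function: yields (key, value) pairs for matching documents."""
    # The condition plays the role of WHERE (and, via "type", of FROM).
    if doc.get("type") == "post" and doc.get("published"):
        # The emitted value plays the role of the SELECT list.
        yield doc["author"], {"title": doc["title"]}


def run_view(map_fn, docs):
    """Apply map_fn to each document on its own and sort the rows by key."""
    rows = []
    for doc in docs:
        # The map function sees exactly one document: no joins.
        for key, value in map_fn(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    rows.sort(key=lambda row: (row["key"], row["id"]))
    return rows


docs = [
    {"_id": "p1", "type": "post", "published": True, "author": "zoe", "title": "B"},
    {"_id": "p2", "type": "post", "published": False, "author": "ann", "title": "Draft"},
    {"_id": "w1", "type": "wiki", "title": "Home"},
    {"_id": "p3", "type": "post", "published": True, "author": "ann", "title": "A"},
]
rows = run_view(map_published_posts, docs)
```

Here rows holds two entries, keyed by author, with the draft post and the wiki page filtered out.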
14. Advanced Views
• Still no joins
• Can perform complex calculations
You can only rely on the content of the document being processed
But the documents represent the full state of that part of the data model ... right?
• A Reduce function can be used to aggregate calculations
• Map and reduce intermediate results are indexed
Once calculated for a document, they never need to be re-calculated until the
document is updated
It's therefore very fast!
• Not as obvious how to program them
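A rough sketch of why views stay fast: map output is remembered per document revision, so only changed documents are re-mapped before the reduce step. This Python class is an illustration only. It caches map output but not intermediate reduce results, and the names IncrementalView and map_calls, plus the "order" fields, are hypothetical.

```python
class IncrementalView:
    """Caches map output per document revision; reduces over the cache."""

    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self._cache = {}    # doc id -> (rev, list of (key, value) rows)
        self.map_calls = 0  # counts how often the map function really ran

    def update(self, docs):
        for doc in docs:
            cached = self._cache.get(doc["_id"])
            # Re-map only if the document is new or its revision changed.
            if cached is None or cached[0] != doc["_rev"]:
                self.map_calls += 1
                self._cache[doc["_id"]] = (doc["_rev"], list(self.map_fn(doc)))

    def reduce(self):
        values = [
            value
            for _rev, emitted in self._cache.values()
            for _key, value in emitted
        ]
        return self.reduce_fn(values)


def map_order_totals(doc):
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


view = IncrementalView(map_order_totals, sum)
docs = [
    {"_id": "o1", "_rev": "1-a", "type": "order", "customer": "ann", "total": 10},
    {"_id": "o2", "_rev": "1-b", "type": "order", "customer": "bob", "total": 5},
]
view.update(docs)
first_total = view.reduce()

# Only o2 changes, so only o2 is mapped again on the next update.
docs[1] = {"_id": "o2", "_rev": "2-c", "type": "order", "customer": "bob", "total": 7}
view.update(docs)
second_total = view.reduce()
```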
15. What about transactions?
• ACID compliant
On a document-by-document basis
Tolerant of a very wide array of failure modes due to Erlang paradigms
The documented way to cleanly stop a CouchDB server is to kill the process
• No user-defined transactions
Essentially it's auto-commit
But the documents represent the full state of that part of the data model ... right?
No isolation levels, so don't run your banking on it …
No isolation levels to get in the way when you're storing data for a single user
Effective isolation is READ_COMMITTED
• No distributed transactions
The world is eventually consistent
A given user tied to a particular CouchDB server will always have a consistent
world view
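Per-document atomicity without locks can be sketched as a revision check: a write succeeds only if it names the revision currently stored, otherwise it is rejected and the caller must re-read. TinyStore below is a hypothetical in-memory stand-in, not a CouchDB client; the "N-suffix" revision format loosely imitates CouchDB's.

```python
import uuid


class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class TinyStore:
    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])  # hand out a copy, never the stored dict

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        current_rev = current["_rev"] if current else None
        # No locks: the writer must simply prove it saw the latest revision.
        if doc.get("_rev") != current_rev:
            raise Conflict(doc["_id"])
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        stored = dict(doc, _rev=f"{generation}-{uuid.uuid4().hex[:8]}")
        self._docs[doc["_id"]] = stored  # the whole document is swapped at once
        return stored["_rev"]


store = TinyStore()
store.put({"_id": "basket-1", "items": 1})

# Two clients read the same revision...
first = store.get("basket-1")
second = store.get("basket-1")

# ...the first write wins, the second is told to re-read and retry.
first["items"] = 2
store.put(first)
second["items"] = 3
try:
    store.put(second)
    second_write_rejected = False
except Conflict:
    second_write_rejected = True
```

Each put is effectively its own auto-committed transaction, which is why there is nothing to span across several documents.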
16. Scaling
• Master/master replication strategy
• Eventually consistent replication
But the documents represent the full state of that part of the data model ... right?
• Requires conflict resolution in the application code
This might be as simple as last update wins
Can equally be a user-driven process – the application code sees all conflicts of
a document and can decide how to proceed
• Offline working is easy
In fact – in-built for AJAX applications hosted within the CouchDB database
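"Last update wins" can be sketched as a function over the conflicting revisions of one document. This assumes the application stamps each document with an "updated_at" field (ISO 8601, UTC); that field is a convention invented for this example and is not something the database provides.

```python
def last_update_wins(revisions):
    """Pick the revision with the newest application timestamp.

    ISO 8601 UTC timestamps of equal precision sort correctly as strings.
    Ties fall back to the revision id so that every replica picks the
    same winner.
    """
    return max(revisions, key=lambda doc: (doc["updated_at"], doc["_rev"]))


# Two replicas edited the same document while disconnected.
conflicting = [
    {"_id": "page-1", "_rev": "2-aaa", "body": "edited in London",
     "updated_at": "2010-05-01T10:00:00Z"},
    {"_id": "page-1", "_rev": "2-bbb", "body": "edited in the US",
     "updated_at": "2010-05-01T10:05:00Z"},
]
winner = last_update_wins(conflicting)
```

A user-driven process would present the same list of revisions to a person instead of calling max.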
17. Weaknesses
• DBs require periodic compaction
All document operations (including deletion) are appended to the DB file
• Under heavy update load, storage may be sub-optimal
Try MongoDB, which does in-place updates – but requires greater transactional
overhead as a result
• SQL skills don't map (or reduce) over to CouchDB
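Why compaction is needed can be shown with a toy append-only log: every save or delete adds an entry, so the file only grows until it is rewritten with the latest state. AppendOnlyLog is a hypothetical illustration, and it simplifies by dropping deletion markers during compaction, which a replicating database may need to keep.

```python
class AppendOnlyLog:
    """Toy stand-in for an append-only database file."""

    def __init__(self):
        self.entries = []  # each entry is (doc_id, body); body None = deleted

    def save(self, doc_id, body):
        self.entries.append((doc_id, body))  # old versions stay in the file

    def delete(self, doc_id):
        self.entries.append((doc_id, None))  # deletion is an append too

    def compact(self):
        latest = {}
        for doc_id, body in self.entries:
            latest[doc_id] = body  # later entries overwrite earlier ones
        # Rewrite the "file" with only the live, latest versions.
        self.entries = [
            (doc_id, body) for doc_id, body in latest.items() if body is not None
        ]


log = AppendOnlyLog()
log.save("a", {"n": 1})
log.save("a", {"n": 2})
log.save("b", {"n": 1})
log.save("a", {"n": 3})
log.delete("b")
size_before = len(log.entries)

log.compact()
size_after = len(log.entries)
```

Three updates and a delete leave five entries on disk for one live document, which is the heavy-update cost the slide warns about.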
18. The known unknowns
What's been left out
• Security
Recently introduced in 0.11.0
Pluggable authentication, defaults to CouchDB hosted _users database
Fairly powerful when combined with the validation functionality
• Caveat emptor ...
There's more to the details of everything I've talked about
19. Concluding ...
• What's MOO doing with all this?
• NoSQL databases have their place
There's more to the details of everything I've talked about
• SQL can do everything NoSQL can
Might take rather longer to do it
NoSQL is better suited for some use-cases
• Many different implementations for different use-cases
Each has its own strengths and weaknesses
• Download and try a few!
20. Questions?
• Steve Storey - steves@moo.com
• CouchDB - http://couchdb.apache.org/
• Further reading - http://books.couchdb.org/relax/