2. INTRODUCTIONS
• Mike Miller
• not an Apache CouchDB committer
• ParticlePhysics Faculty at U. of
Washington
• Cloudant cofounder, Chief Scientist
• Background: machine learning, analysis,
big data, globally distributed systems
STS 3/10/2010 2 Mike Miller UW/Cloudant
3. NEW PROBLEMS => NEW SOLUTIONS
( http://www.economist.com/specialreports/
PrinterFriendly.cfm?story_id=15557443)
Big Data Dynamic, Realtime
STS 3/10/2010 3 Mike Miller UW/Cloudant
4. NEW PROBLEMS => NEW SOLUTIONS
“old” “new”
Scale this horizontally/linearly
STS 3/10/2010 4 Mike Miller UW/Cloudant
5. CAP THEOREM
read record
consistency availability
partition
tolerance write record
masterless, scalable
Brewer: Pick two (at a time) (e.g., Cloudant’s Couch)
STS 3/10/2010 5 Mike Miller UW/Cloudant
6. NoSQL: Not Only SQL
• Name blame: Eric Evans from Rackspace
• Pragmatic response to new class of problems poorly fit to
RDBMS:
• heterogeneous/dynamic data, geographic/network separation,
scaling/concurrency
• Not an abandonment of 40+ years of DB research
• From home-brew => general solutions
• Best collection of online talks: https://nosqleast.com/2009/#agenda
STS 3/10/2010 6 Mike Miller UW/Cloudant
7. TAXONOMY: 1
E. Eifrem, no:sql(east)
STS 3/10/2010 7 Mike Miller UW/Cloudant
8. TAXONOMY: 2
E. Eifrem, no:sql(east)
STS 3/10/2010 8 Mike Miller UW/Cloudant
9. EMIL’S CLASSIFICATION
but neglects: concurrency, performance, reliability...
STS 3/10/2010 9 Mike Miller UW/Cloudant
11. CouchDB
• easy, flexible, robust, concurrent, replicates
• Document: Schema-free database server
• RESTful JSON API
• views: MapReduce system for generating custom
• replication: Bi-directional incremental
• couchapp: lightweight HTML+JavaScript apps served directly
from CouchDB using views to transform JSON
STS 3/10/2010 11 Mike Miller UW/Cloudant
12. OF THE WEB
Django may be built for the Web, but
CouchDB is built of the Web. I've never
seen software that so completely embraces
the philosophies behind HTTP ... this is
what the software of the future looks like.
Jacob Kaplan-Moss
October 17 2007
http://jacobian.org/writing/of-the-web/
STS 3/10/2010 12 Mike Miller UW/Cloudant
13. SHOULD YOU CARE?
The internet happened, and we ignored it.
In retrospect that was a mistake.
Paraphrased from Bill Warner (Founder Avid, Wildfire)
YCombinator Summer, 2008
CouchDB: An enabling technology
STS 3/10/2010 13 Mike Miller UW/Cloudant
14. FROM INTEREST TO ADOPTION
• 100+ production apps • Active
commercial
•3
development
books being written
• Rapidly maturing
• Vibrant, open
STS 3/10/2010 community 14 Mike Miller UW/Cloudant
15. LETS TAKE A LOOK
STS 3/10/2010 15 Mike Miller UW/Cloudant
16. DOCUMENTS
• Documents are JSON Objects
• Underscore-prefixed fields are reserved
• Documents can have binary attachments
• MVCC _rev deterministically generated from doc content
STS 3/10/2010 16 Mike Miller UW/Cloudant
17. REST API
• Create
PUT /mydb/mydocid
• Retrieve
GET /mydb/mydocid etag header
plug-n-play cache
• Update
PUT /mydb/mydocid
• Delete
DELETE /mydb/mydocid
STS 3/10/2010 17 Mike Miller UW/Cloudant
19. ROBUST
• Never overwrite previously committed data
• Documents stored in append, only b+trees, ‘copy-on-write’
• In the event of a server crash or power failure, just restart
CouchDB -- there is no “repair”
• Take snapshots with “cp”
• Configurable levels of durability: can choose to fsync after
every update, or less often to gain better throughput
STS 3/10/2010 19 Mike Miller UW/Cloudant
20. CONCURRENT
• Erlang approach: lightweight processes to model the natural
concurrency in a problem
• For CouchDB that means one process per TCP connection
• Lock-free architecture; each process works with an MVCC
snapshot of a DB.
• Performance degrades gracefully under heavy concurrent load
STS 3/10/2010 20 Mike Miller UW/Cloudant
21. CONCURRENCY IN ACTION: BBC
• WYSIWYG: graceful at scale: load, volume, concurrency
STS 3/10/2010 21 Mike Miller UW/Cloudant
22. VIEWS: B.Y.O. INDEX
• Custom, persistent representations of document data
• “Close to the metal” -- no dynamic queries in production, so
you know exactly what you’re getting
• Generated using MapReduce functions written in JavaScript
(and other languages)
• Each view must have a map function and may also have a
reduce function
• Leverages view collation, rich view query API
STS 3/10/2010 22 Mike Miller UW/Cloudant
23. VIEWS: INCREMENTAL
• Computing a view can be expensive, so CouchDB saves the
result in a B-tree and keeps it up-to-date
• Leaf nodes store map results, inner nodes store reductions of
children
http://horicky.blogspot.com/2008/10/couchdb-implementation.html
STS 3/10/2010 23 Mike Miller UW/Cloudant
24. VIEWS: EXAMPLE
key value
STS 3/10/2010 24 Mike Miller UW/Cloudant
28. REPLICATION
• Peer-based, bi-directional replication using normal HTTP calls
• Mediated by a replicator process which can live on the
source, target, or somewhere else entirely
• Replicate a subset of documents in a DB meeting criteria
defined in a custom filter function (coming soon)
• Applications (_design documents) replicate along with the
data
• Ideal for offline applications -- “ground computing”
STS 3/10/2010 28 Mike Miller UW/Cloudant
29. REPLICATION
Source Replicator Target
What updates do you
have since X?
Which of these
GET /db/_changes?since=X are you missing?
POST /db/_missing_revs
Target needs these
id@rev documents
Save these documents
[GET /db/doc?open_revs=...]
POST /db/_bulk_docs
Record the latest source update seen by the target
STS 3/10/2010 29 Mike Miller UW/Cloudant
33. CONFLICT RESOLUTION
• Replication can introduce conflicts in a multi-master setup
• CouchDB deterministically chooses a winner; the loser is
saved as a conflict revision
• Conflict revisions are replicated; if you replicate A→B and
B→A, both will agree on the winning and losing revisions
STS 3/10/2010 33 Mike Miller UW/Cloudant
34. CONFLICT RESOLUTION
Case 1: save the same update to multiple replication partners
PUT /a/foo PUT /b/foo
replicate
No Conflict
STS 3/10/2010 34 Mike Miller UW/Cloudant
35. CONFLICT RESOLUTION
Case 2: save different updates to the same document
PUT /a/foo PUT /b/foo
replicate
Conflict
STS 3/10/2010 35 Mike Miller UW/Cloudant
36. CONFLICT RESOLUTION
replicate
No Conflict
STS 3/10/2010 36 Mike Miller UW/Cloudant
37. CONFLICT RESOLUTION
save different updates to the same document to replication
partners
STS 3/10/2010 37 Mike Miller UW/Cloudant
38. REPLICATION IN ACTION: UBUNTU 9.10
“When Ubuntu 9.10 is released on October 29th every single Ubuntu user will
have an address book stored in CouchDB that replicates with one.ubuntu.com,
and Tomboy notes that are replicated via a web API at the application but then
stored in CouchDB and carried along in the CouchDB replication that we have
set up. Optionally they can also store all their Firefox bookmarks in CouchDB
and have those replicated as well. We'll be doing our best to help teach
application developers to use CouchDB in order to ʻcloud-enableʼ their apps. A
couch on every desktop!”
Elliot Murphy, 10/13/2009
replication is a potential game-changer
STS 3/10/2010 38 Mike Miller UW/Cloudant
39. STUFF I DIDN’T TALK ABOUT
• Server-side processing: validate, upsert, show, list
• Authentication using HTTP Basic, Cookie, OAuth
• couchapp: serve lightweight HTML+JavaScript web apps directly out of
CouchDB using _show and _list functions to transform JSON [1]
• _external indexers, including full-text search powered by CouchDB-Lucene [2]
• The many excellent client libraries, view servers, etc. contributed by the
community [3]
• SQLite wrapper for CouchDB [4], and much more...
[1]: http://github.com/couchapp/couchapp
[2]: http://github.com/rnewson/couchdb-lucene
[3]: http://wiki.apache.org/couchdb/Related_Projects
[4]: http://apsw.googlecode.com/svn/publish/couchdb.html
STS 3/10/2010 39 Mike Miller UW/Cloudant
40. TAKE IT FOR A SPIN
hosted free:
cloudant.com
easy offline:
CouchDBX
STS 3/10/2010 40 Mike Miller UW/Cloudant