2. OUTLINE
⢠Introduction and Overview
⢠CouchDB Basics
⢠Special Topics in Relaxation: Scaling CouchDB
⢠Use Cases In the Wild
⢠Takeaways
Windy City DB 2 June 26, 2010
3. HI
⢠Alan Hoffman
⢠@_hoffman
⢠alan@cloudant.com
⢠Experimental particle physicist
⢠Background: machine learning, big
data analysis, distributed systems
⢠Co-founder of Cloudant (Hosted
Couch)
⢠Not a committer, but...
4. COUCH: THE BIG PICTURE
⢠Apache project
⢠Schema-free document database management system
⢠Robust, concurrent, fault-tolerant
⢠RESTful JSON API
⢠Custom persistent views using MapReduce
⢠Bi-directional incremental replication
⢠Futon web admin console
5. WHO CARES?
"The internet happened, and we ignored it. In retrospect, that was a mistake."
-Bill Warner (Avid, Wildfire, Techstars), Summer 2008
Disruptive technologies enable new business
6. DOCUMENTS
[Figure: example document annotated with Primary Key, MVCC & Insta-cache, Nested Structures, Binary Attachments]
• Reserved fields are prefixed with an underscore
• MVCC _rev deterministically generated from doc content
• Binary attachments
7. RESTFUL API
⢠Create
PUT /mydb/mydocid
⢠Retrieve
GET /mydb/mydocid âBuilt of the Web
Completely embraces... HTTPâ
⢠Update
PUT /mydb/mydocid
-Jacob Kaplan-Moss
⢠Delete October 2007
DELETE /mydb/mydocid
GET /mydb/_all_docs?include_docs=true
http://wiki.apache.org/couchdb/Reference
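The four verbs above map directly onto plain HTTP requests. A minimal sketch that builds (but does not send) them, assuming a local server at http://localhost:5984; the helper name `couch_request`, the document body and the revision value "1-abc" are illustrative placeholders, not taken from the talk:

```python
import json
from urllib.request import Request

BASE = "http://localhost:5984"  # assumption: a local CouchDB on its default port


def couch_request(method, path, body=None):
    """Build (but do not send) an HTTP request for a CouchDB endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BASE + path, data=data, headers=headers, method=method)


# Create: PUT a JSON body at the chosen document id.
create = couch_request("PUT", "/mydb/mydocid", {"title": "hello"})
# Retrieve: GET the same URL.
retrieve = couch_request("GET", "/mydb/mydocid")
# Update: PUT again, sending back the current revision ("1-abc" is a placeholder).
update = couch_request("PUT", "/mydb/mydocid", {"_rev": "1-abc", "title": "hello again"})
# Delete: name the revision being removed in the query string.
delete = couch_request("DELETE", "/mydb/mydocid?rev=1-abc")
# Fetch every document in the database, bodies included.
all_docs = couch_request("GET", "/mydb/_all_docs?include_docs=true")
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running server; it is left out here so the sketch runs without one.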
8. VIEWS
[Figure: view diagram labeled map, reduce, key, value]
• Docs can be indexed by any attribute using views. Custom, persistent representations of the data.
• Each view must have a map function and may also have a reduce function
• View indices are stored in B-trees for efficient lookup by map key
• Stored in special documents called _design documents
9. INCREMENTAL
⢠Computing a view can be expensive, so CouchDB saves
the result in a B-tree and keeps it up-to-date
⢠Only new docs or changed docs get âre-indexedâ
⢠Leaf nodes store map results, inner nodes store reductions
of children
http://horicky.blogspot.com/2008/10/couchdb-implementation.html
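The "only changed docs get re-indexed" idea can be sketched with a toy index that remembers the last change sequence it has processed. This is an illustration only: the class name `IncrementalView` is hypothetical, a dict stands in for the B-tree, and the stored reductions in inner nodes are not modeled:

```python
class IncrementalView:
    """Toy view index that re-maps only docs changed since the last update."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}  # doc id -> list of emitted (key, value) rows
        self.last_seq = 0      # highest change sequence already indexed

    def update(self, changes):
        """Apply (seq, doc) changes; return how many docs were re-indexed."""
        reindexed = 0
        for seq, doc in sorted(changes, key=lambda change: change[0]):
            if seq <= self.last_seq:
                continue  # already indexed, skip the expensive map call
            self.rows_by_doc[doc["_id"]] = list(self.map_fn(doc))
            self.last_seq = seq
            reindexed += 1
        return reindexed

    def rows(self):
        """All emitted rows, sorted by key."""
        return sorted(row for rows in self.rows_by_doc.values() for row in rows)


view = IncrementalView(lambda doc: [(doc["type"], 1)])
first = view.update([(1, {"_id": "a", "type": "post"}), (2, {"_id": "b", "type": "post"})])
# Replaying old changes costs nothing; only sequence 3 is new work.
second = view.update([(2, {"_id": "b", "type": "post"}), (3, {"_id": "b", "type": "comment"})])
```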
10. ROBUST
⢠Never overwrite previously committed
data
⢠Append only b+trees, âcopy-on-writeâ
⢠Server crash, power failure? just restart
CouchDB -- there is no ârepairâ
⢠Take snapshots with âcpâ
J.C. Anderson
⢠ACID at the single document level
11. REPLICATION
[Figure: replication from source to target, with progress, in one click]
The beauty of MVCC
CouchDB => "Cloud ready"
12. REPLICATION
⢠Peer-based, bi-directional replication using normal HTTP
⢠Mediated by a replicator process which can
live on the source, target, or somewhere else
entirely
⢠Replicate a subset of documents in a DB
meeting criteria deďŹned in a custom ďŹlter
function
⢠Applications (_design documents) replicate
along with the data
⢠Ideal for ofďŹine applications: âground
computingâ
13. FILTERED REPLICATION
Write the filter function
Embed it in a design doc
Specify in the replication request
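The three steps above might look roughly like this. The filter is JavaScript source stored as a string in a design document, and the replication request is a JSON body that would be sent with POST /_replicate. The names "example", "by_owner" and "owner", the target URL and the helper `replication_request` are all hypothetical:

```python
import json

# Steps 1 and 2: a filter function embedded in a design doc.
# It returns true for docs that should replicate.
design_doc = {
    "_id": "_design/example",
    "filters": {
        "by_owner": "function(doc, req) { return doc.owner === req.query.owner; }"
    },
}


def replication_request(source, target, filter_name=None, query_params=None):
    """Step 3: build the JSON body for POST /_replicate."""
    body = {"source": source, "target": target}
    if filter_name:
        body["filter"] = filter_name  # "<design doc name>/<filter name>"
    if query_params:
        body["query_params"] = query_params  # exposed to the filter as req.query
    return body


body = replication_request(
    "mydb",
    "http://example.com:5984/mydb",  # placeholder target
    filter_name="example/by_owner",
    query_params={"owner": "alan"},
)
payload = json.dumps(body)
```

Leaving out `filter_name` gives the body for an ordinary, unfiltered replication.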
14. MULTI-COUCH SETUPS
[Figure: three topologies: Master-Slave, Master-Master, Robust Multi-Master]
15. CONFLICTS
[Figure: PUT /a/foo and PUT /b/foo, then replicate => Conflict]
• Replication can introduce conflicts in a multi-master setup
• CouchDB deterministically chooses a winner but the loser is saved with the document as a conflicting rev
• Conflicting revs are replicated; both source and target will agree on winning and losing revs
• Compacting the DB removes all losing revs
16. BUILDING A BIG COUCH
Doesn't
Why CouchDB ^ Doesn't Scale
17. WHAT WE TALK ABOUT WHEN WE TALK ABOUT SCALING
• Horizontal scaling: more servers create more capacity
• Transparent to the application: adding more capacity should not affect the business logic of the application
• No single point of failure
Physics Joke! Pseudo Scalars
http://adam.heroku.com/past/2009/7/6/sql_databases_dont_scale/
18. COUCHDB LOUNGE
⢠Proxy-based partitioning and clustering PUT/GET
application
⢠Designed originally for use at Meebo Dumbproxy
(nginx)
⢠Uses consistent hashing to partition docs
across nodes
⢠Dumbproxy - nginx module that handles
simple GETs and PUTs
⢠Smartproxy - A twisted/python daemon
that handles view requests Smartproxy
⢠Want to know more? R. Leeds (tilgovi)
http://tilgovi.github.com/couchdb-lounge/
GET /_deisgn/...
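A generic sketch of consistent hashing, to show why it suits partitioning docs across nodes: adding a node moves only the docs that land on the new node. This is not Lounge's actual scheme; the class `HashRing`, the MD5 choice, the virtual-node count and the node names are all assumptions:

```python
import bisect
import hashlib


def ring_hash(key):
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Each node owns several points; a doc belongs to the next point clockwise."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (ring_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, doc_id):
        index = bisect.bisect_right(self._points, ring_hash(doc_id)) % len(self._ring)
        return self._ring[index][1]


nodes = ["couch1", "couch2", "couch3"]
ring = HashRing(nodes)
bigger = HashRing(nodes + ["couch4"])

doc_ids = [f"doc{i}" for i in range(1000)]
moved = [d for d in doc_ids if ring.node_for(d) != bigger.node_for(d)]
```

Every id in `moved` now lives on couch4; nothing is reshuffled between the three original nodes.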
19. OPEN CLOUDANT
⢠Clustering in a ring (a la Dynamo)
PUT http://alan.cloudant.com/dbname/blah?w=2 ⢠Any node can handle a request
⢠O(1) lookup
N=3
Load Balancer
⢠Quorum system (N, R, W)
W=2
R=2
⢠Views distributed like documents
24
Node 1
No
⢠Distributed erlang
de A B C D de
No B 2
Y
Z
A
C
D
⢠Masterless
X hash(blah) = E E
C N
od â Horiziontally Scalable
e
D 3
E â No SPOF
F
â Transparent to the
D
application
No
E
de
4
F
Coming soon to a
G
github near you!
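The (N, R, W) settings in the figure can be read as simple counting rules: a write succeeds once W of the N replicas acknowledge it, and a read once R replicas answer. A minimal sketch of that arithmetic in the Dynamo style; the class `Quorum` and its method names are hypothetical, and this is not Cloudant's implementation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quorum:
    n: int  # replicas kept for each document
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def write_ok(self, acks):
        """A write succeeds once at least W replicas have acknowledged it."""
        return acks >= self.w

    def read_ok(self, responses):
        """A read succeeds once at least R replicas have answered."""
        return responses >= self.r

    def reads_see_latest_write(self):
        """With R + W > N, every read set overlaps every write set."""
        return self.r + self.w > self.n


# The values from the figure: N=3, W=2, R=2.
quorum = Quorum(n=3, r=2, w=2)
```

With N=3 and W=2, one replica can be down and writes still succeed, which is part of what "No SPOF" means here.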
20. IN THE WILD
⢠15+ million deployments
⢠Activecommercial support
⢠3 books
⢠1.0 imminent
⢠Vibrant, open community
21. CASE #1: REALTIME ANALYTICS
⢠Analytics on high-rate advertising data
⢠ETL analysis workďŹow too slow for their customers (24 hr cycle)
⢠Needed a realtime solution
⢠Complicated SQL stored procedures for social graph analysis
required 40+ postgres tables
⢠Replaced it all with a single CouchDB document type and two
views:
⢠group level collation to bin data at multiple granularities => customers get
updated results in seconds, not hours
⢠single view (30 lines of JS) for graph analysis.
22. MONEY QUOTE
Migrating to CouchDB really opened a lot of doors
for us product-wise. The time delay between data
arriving in our systems and becoming available to our
customers went from 24 hours to less than 30 min - on
similar hardware - even while we greatly increased the
level of granularity that our processing provided
23. CASE #2: EASYBIB
⢠Online bibliography service, ~10 years old, initially built on MySQL
(and Coldfusion)
⢠Had suffered through many migrations
⢠Choice: massive sharding and replication of MySQL v. âanother
optionâ
⢠Why Couch:
⢠Schema Free (replacing 40 - 50 tables with 3 DBs)
⢠Easily scalable
⢠Strong community support
âIn your best Borat voice: âGreat Success!ââ
24. CASE #3: MEEBO
• "All your friends and networks, from wherever you are."
• Why Couch?
  • No Schema (and ergo, no schema migrations)
  • Replication
  • Could deal with queries that would break on a sharded RDBMS
  • REST interface -- easy to re-use existing tools and libraries
  • Easy to write a proxy layer that keeps sharding out of the app logic
• Wishes? Speed, API stability, native clustering
25. PARAPHRASING THE MASSES
⢠Why CouchDB?
⢠Simple, robust, concurrent, fun
⢠successful in production
⢠Why Not Couch?
⢠Missing Features
⢠ad hoc queries
⢠authz/authn
⢠doesnât scale
⢠Too New -- api still changing, still alpha
⢠âToo Slowâ
26. PARAPHRASING THE MASSES
⢠Why CouchDB?
⢠Simple, robust, concurrent, fun, scalable, powerful
⢠successful in production, active community, industry adoption
⢠Why Not Couch?
⢠Missing Features
⢠ad hoc queries
⢠authz/authn
⢠doesnât scale
⢠Too New -- api still changing, still alpha
⢠âToo Slowâ
27. PARAPHRASING THE MASSES
⢠Why CouchDB?
⢠Simple, robust, concurrent, fun, scalable, powerful
⢠successful in production, active community, industry adoption
⢠Why Not Couch?
⢠Missing Features
⢠ad hoc queries True, by design
⢠authz/authn Included in 0.11
⢠doesnât scale Lounge, Pillow, Open Cloudant, etc
⢠Too New -- api still changing, still alpha
⢠âToo Slowâ 0.11 Feature freeze and 1.0 imminent
Perhaps, but...
28. DESERVING OF MORE TIME
⢠CouchApp: HTML+JS framework for building
lightweight, portable apps and serving them directly
from CouchDB
⢠http://github.com/couchapp/couchapp/
⢠External indexers like CouchDB-Lucene
⢠http://github.com/rnewson/couchdb-lucene
⢠The plethora of client libraries and tools...
29. TRY IT OUT
Hosted Free:
Cloudant.com
Easy Offline:
CouchDBX
30. THANK YOU
⢠Books
⢠CouchDB: The DeďŹnitive Guide. J. Chris Anderson, Jan
Lehnardt, Noah Slater
⢠Beginning CouchDB. Joe Lennon
⢠Web
⢠http://wiki.apache.org/couchdb/
⢠http://planet.couchdb.org/
⢠IRC
⢠Freenode #couchdb
⢠Freenode #cloudant
32. AUTHZ/AUTHN
⢠Remember, Couch acts like a web service
⢠Authentication:
⢠0.11+ ships with support for OAuth, cookie, and basic
⢠Handlers speciďŹed in a conďŹg ďŹle
⢠Users deďŹned in authentication database (â_usersâ by default)
⢠Authorization
⢠3 levels: DB reader, DB admin, Server Admin
⢠Per DB roles deďŹned in security document
33. EXAMPLES
User Document
Security Document
Caution!
Do not leave arrays blank
http://wiki.apache.org/couchdb/Security_Features_Overview
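The two example documents from this slide did not survive extraction. A hedged reconstruction of their general shape, using field names as they were in the 0.11/1.0 era (later releases changed some of them, so check the wiki page above). The user name, role names, password and salt are made up, and the arrays are deliberately non-empty, following the caution on the slide:

```python
import hashlib

# Placeholder credentials; a real salt would be random.
password = "secret"
salt = "1234abcd"

# User document, stored in the authentication database ("_users" by default).
user_doc = {
    "_id": "org.couchdb.user:alice",
    "type": "user",
    "name": "alice",
    "roles": ["analyst"],
    # Assumption: hex SHA-1 of the password concatenated with the salt.
    "password_sha": hashlib.sha1((password + salt).encode("utf-8")).hexdigest(),
    "salt": salt,
}

# Security document, stored per database (PUT /mydb/_security).
security_doc = {
    "admins": {"names": ["bob"], "roles": ["dbadmin"]},
    "readers": {"names": ["alice"], "roles": ["analyst"]},
}


def can_read(user, security):
    """Simplified check: a reader is listed by name or holds a listed role."""
    readers = security["readers"]
    return user["name"] in readers["names"] or bool(
        set(user["roles"]) & set(readers["roles"])
    )


allowed = can_read(user_doc, security_doc)
```

`can_read` only illustrates how names and roles combine; the real check happens inside CouchDB.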
34. DRAWBACKS
• "Futon -- difficult to use for installations that have a lot of DBs (1000+)"
• "Tools for managing design docs are deficient"
• "Client libraries too focused on Couch as the 'M' in MVC apps."
• "Couch 1.0 is a moving target"