2. Karel Minařík
→ Independent web designer and developer
→ Ruby, Rails, Git and CouchDB propagandista in .cz
→ Previously: Flash Developer; Art Director; Information Architect;… (see LinkedIn)
→ @karmiq at Twitter
→ karmi.cz
CouchDB — A Database for the Web
5. Apache CouchDB is a distributed, fault-tolerant
and schema-free document-oriented database
accessible via a RESTful HTTP/JSON API.
http://wiki.apache.org/couchdb
6. Apache CouchDB is a distributed, fault-tolerant
and schema-free document-oriented database
accessible via a RESTful HTTP/JSON API.
Highlighted: Distributed, Schema-Free, Document-Oriented, RESTful, HTTP, JSON
http://wiki.apache.org/couchdb
7. Talk Outline
➡ The NoSQL Moniker
➡ The CouchDB Story
➡ Schema-free Document Storage
➡ HTTP From Top to Bottom
➡ RESTful API
➡ Querying With Map/reduce
➡ Fault-tolerant, Distributed, Highly-available and Concurrent
➡ Demo: Example Application (Address Book)
10. NOSQL
Reasons for NoSQL
NoSQL is neither a “protest movement”
nor “trendy bullshit”.
Reasons for developing new
databases are real.
Most stem from some real pain.
11. NOSQL
Database denormalization at Digg
SELECT `digdate`, `id` FROM `Diggs`
WHERE `userid` IN (1, 2, 3, 4, ... 1000000)
AND itemid = 123 ORDER BY `digdate` DESC, `id` DESC;
“A full query can actually clock in at 1.5kb, which is many times
larger than the actual data we want. With a cold cache, this query
can take 14 seconds to execute.”
“Non-relational data stores reverse this model completely, because they don’t have the
complex read operations of SQL. The model forces you to shift your computation to the
writes, while reducing most reads to simple operations – the equivalent of SELECT *
FROM `Table`.”
http://about.digg.com/blog/looking-future-cassandra
12. NOSQL
Redis: Big O Notation Built–In
redis> rpush mylist 1
redis> rpush mylist 2
redis> lpop mylist
"1"
...
redis> llen mylist
(integer) 1000000
redis> lpop mylist
"2"
$ redis-benchmark
...
====== LPOP ======
10025 requests completed in 0.53 seconds
...
93.43% <= 3 milliseconds
13. NOSQL
Use Case: Job Queue
RPUSH + LPOP: O(1), even with millions of items
http://github.com/defunkt/resque/blob/master/lib/resque.rb#L133-138
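Why pushing at one end and popping at the other can stay O(1) regardless of list length can be illustrated with a small in-memory model. This is only a sketch: the `JobQueue` class and the job names are made up for illustration, and it says nothing about how Redis stores lists internally.

```javascript
// In-memory model of a list used as a job queue.
// rpush appends at the tail, lpop removes from the head; both touch only
// the ends of a linked list, so their cost does not grow with the length.
class JobQueue {
  constructor() {
    this.head = null;
    this.tail = null;
    this.length = 0;
  }

  // Append a job at the tail and return the new length.
  rpush(job) {
    const node = { job: job, next: null };
    if (this.tail) {
      this.tail.next = node;
    } else {
      this.head = node;
    }
    this.tail = node;
    this.length += 1;
    return this.length;
  }

  // Remove and return the job at the head, or null when the queue is empty.
  lpop() {
    if (!this.head) {
      return null;
    }
    const node = this.head;
    this.head = node.next;
    if (!this.head) {
      this.tail = null;
    }
    this.length -= 1;
    return node.job;
  }
}

const queue = new JobQueue();
queue.rpush("send-email");
queue.rpush("resize-image");
console.log(queue.lpop());  // send-email (first in, first out)
console.log(queue.length);  // 1
```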
14. 1 The CouchDB Story
15. THE COUCHDB STORY
Damien Katz: CouchDB and Me
Damien Katz
(RubyFringe 2008)
http://www.infoq.com/presentations/katz-couchdb-and-me
16. THE COUCHDB STORY
Damien Katz: CouchDB and Me
In the beginning, there were C++, XML and
a custom query language.
Stuff nobody ever got fired for.
Then came Erlang, HTTP, JSON and map/reduce.
17. 2 Schema–free Documents
18. SCHEMA-FREE STORAGE
“Relational Data”
OH: “The world is relational!!!”
17 minutes ago via Tweetie for Mac
Retweeted by 10000 people
19. That does not mean the world conforms
to the third normal form.
20. In fact, it’s rather the exact opposite.
21. SCHEMA–FREE DOCUMENTS
The Textbook Example
Design a customer database.
People have names, e-mail, phone numbers, …
How many phone numbers?
22. SCHEMA–FREE DOCUMENTS
The Textbook Example
Relational Databases 101
Customers
id INTEGER A N P
first_name VARCHAR
last_name VARCHAR
phone VARCHAR
Now. What about multiple phone numbers?
http://en.wikipedia.org/wiki/First_normal_form#Domains_and_values
23. SCHEMA–FREE DOCUMENTS
The Textbook Example
The “solution”, Pt. 1
Customers
id INTEGER A N P
first_name VARCHAR
last_name VARCHAR
phone VARCHAR
“We will use the database only from the application, anyway.”
http://en.wikipedia.org/wiki/First_normal_form#Domains_and_values
24. SCHEMA–FREE DOCUMENTS
The Textbook Example
The “solution”, Pt. 2
Customers
id INTEGER A N P
first_name VARCHAR
last_name VARCHAR
phone_1 VARCHAR
phone_2 VARCHAR
phone_3 VARCHAR
“This is clearly better design!”
Alright. Then, please answer these questions:
• How do you search for a customer given a phone number?
• Which customers have the same phone number?
• How many phone numbers does a customer have?
Then, please add the ability to store four phone numbers. Thanks.
http://en.wikipedia.org/wiki/First_normal_form#Domains_and_values
25. SCHEMA–FREE DOCUMENTS
The Textbook Example
The Right Solution
Customers
id INTEGER U A N P
first_name VARCHAR
last_name VARCHAR
CustomerPhones
customer_id INTEGER i N F
phone VARCHAR N
http://en.wikipedia.org/wiki/First_normal_form#Domains_and_values
26. SCHEMA–FREE DOCUMENTS
The Textbook Example
mysql> SELECT * FROM Customers LEFT JOIN CustomerPhones
       ON Customers.id = CustomerPhones.customer_id;
+----+------------+-----------+-------------+-------+
| id | first_name | last_name | customer_id | phone |
+----+------------+-----------+-------------+-------+
|  1 | John       | Smith     |           1 | 123   |
|  1 | John       | Smith     |           1 | 456   |
+----+------------+-----------+-------------+-------+
27. SCHEMA–FREE DOCUMENTS
The Textbook Example
mysql> SELECT * FROM Customers WHERE id = 1;
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
|  1 | John       | Smith     |
+----+------------+-----------+
mysql> SELECT phone FROM CustomerPhones WHERE customer_id IN (1);
+-------+
| phone |
+-------+
| 123   |
| 456   |
+-------+
28. SCHEMA–FREE DOCUMENTS
Structured data
But, damn!, I want something like this:
{
"id" : 1,
"first_name" : "Clara",
"last_name" : "Rice",
"phones" : ["0000 777 888 999", "0000 123 456 789", "0000 314 181 116"]
}
“No problem, you just iterate over the rows and build your object. That’s the way it is!”
“If this would be too painful, we will put some cache there.”
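The "iterate over the rows and build your object" step quoted above might look roughly like the following. It is a sketch only: the row shape mirrors the LEFT JOIN output from the Customers/CustomerPhones example, and the helper name `rowsToCustomers` is invented for illustration.

```javascript
// Hypothetical helper: fold flat LEFT JOIN rows into one object per customer,
// collecting the phone numbers into an array.
function rowsToCustomers(rows) {
  const byId = new Map();
  for (const row of rows) {
    if (!byId.has(row.id)) {
      byId.set(row.id, {
        id: row.id,
        first_name: row.first_name,
        last_name: row.last_name,
        phones: []
      });
    }
    // A LEFT JOIN yields a null phone for customers without any number.
    if (row.phone !== null && row.phone !== undefined) {
      byId.get(row.id).phones.push(row.phone);
    }
  }
  return Array.from(byId.values());
}

const joinedRows = [
  { id: 1, first_name: "John", last_name: "Smith", customer_id: 1, phone: "123" },
  { id: 1, first_name: "John", last_name: "Smith", customer_id: 1, phone: "456" }
];

console.log(JSON.stringify(rowsToCustomers(joinedRows)));
```

This is the code every application ends up carrying when the storage model and the shape of the object differ.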
29. SCHEMA–FREE DOCUMENTS
Ephemeral data
Not everything needs to be done “right”. Right?
class User < ActiveRecord::Base
serialize :preferences
end
30. SCHEMA–FREE DOCUMENTS
“Consistency”
Does the “Right Way” sometimes fail?
Hell yeah.
EXAMPLE
When designing an invoicing application, you store the
customer for the invoice the “right way”, via foreign keys.
Then, the customer address changes.
Did the address on the invoice also change?
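One reading of this example: an invoice records what was true when it was issued, so the address belongs inside the invoice document itself. A minimal sketch of that idea follows; the field names and the `createInvoice` helper are assumptions made for illustration, not part of any CouchDB API.

```javascript
// Hypothetical helper: copy the customer's address into the invoice when it is
// created, so later changes to the customer do not rewrite the invoice.
function createInvoice(customer, items) {
  return {
    type: "invoice",
    customer_id: customer._id,
    // Deep copy via a JSON round-trip; adequate for plain JSON data like this.
    billing_address: JSON.parse(JSON.stringify(customer.address)),
    items: items
  };
}

const customer = { _id: "clara-rice", address: { city: "Erinshire" } };
const invoice = createInvoice(customer, [{ title: "Consulting", price: 100 }]);

customer.address.city = "Beckerborough";    // the customer moves
console.log(invoice.billing_address.city);  // Erinshire
```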
31. SCHEMA–FREE DOCUMENTS
Documents in the Real World
[Image: assorted business cards, each laying out names, addresses, phone, fax and e-mail details differently]
http://guide.couchdb.org/draft/why.html#better
32. SCHEMA–FREE DOCUMENTS
Documents in the Real World
{
  "_id" : "clara-rice",
  "_rev" : "1-def456",
  "first_name" : "Clara",
  "last_name" : "Rice",
  "phones" : {
    "mobile" : "0000 777 888 999",
    "home" : "0000 123 456 789",
    "work" : "0000 314 181 116"
  },
  "addresses" : {
    "home" : {
      "street" : "Wintheiser Ports",
      "number" : "789/23",
      "city" : "Erinshire",
      "country" : "United Kingdom"
    }
  },
  "occupation" : "model",
  "birthday" : "1970/05/01",
  "groups" : ["friends", "models"],
  "created_at" : "2010/01/01 10:00:00 +0000"
}
37. HTTP
Built “Of the Web”
Django may be built for the Web,
but CouchDB is built of the Web.
I’ve never seen software that so completely
embraces the philosophies behind HTTP.
Jacob Kaplan-Moss, Of the Web (2007)
http://jacobian.org/writing/of-the-web/
38. HTTP
Built “Of the Web”
CouchDB makes Django look old-school in the
same way that Django makes ASP look
outdated.
http://jacobian.org/writing/of-the-web/
39. HTTP
Built “Of the Web”
HTTP is the lingua franca of our age; if you
speak HTTP, it opens up all sorts of doors.
There’s something almost subversive about
CouchDB; it’s completely language-, platform-,
and OS-agnostic.
http://jacobian.org/writing/of-the-web/
43. HTTP
HTTP from Top to Bottom
$ curl -X POST http://localhost:5984/_replicate \
  -d '{"source":"database",
       "target":"http://example.org/database"}'
44. HTTP
Making Real Use of HTTP
$ curl -i -X GET $HOST/my-database/abc123
HTTP/1.1 200 OK
Server: CouchDB/1.0.1 (Erlang OTP/R14B)
Etag: "4-f04f2435e031054d6b5298c5841ae052"
Date: Thu, 23 Sep 2010 12:56:37 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 73
Cache-Control: must-revalidate
{"_id":"abc123","_rev":"4-f04f2435e031054d6b5298c5841ae052","foo":"bar"}
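Note that the Etag in this response is the document's `_rev` in quotes, which is what lets ordinary HTTP caching work. A client could use it for a conditional GET, roughly as sketched below. The helper names `revFromEtag` and `fetchIfChanged` and their arguments are placeholders, a global `fetch` (browsers, Node 18+) is assumed, and the expectation that the server answers 304 Not Modified to a matching `If-None-Match` is standard HTTP behaviour that should be verified against your CouchDB version.

```javascript
// The ETag of a document response is its _rev wrapped in double quotes.
function revFromEtag(etag) {
  return etag.replace(/^"|"$/g, "");
}

// Conditional GET: send the known ETag back in If-None-Match and skip
// parsing the body when the server reports that nothing has changed.
async function fetchIfChanged(baseUrl, db, id, knownEtag) {
  const response = await fetch(`${baseUrl}/${db}/${encodeURIComponent(id)}`, {
    headers: knownEtag ? { "If-None-Match": knownEtag } : {}
  });
  if (response.status === 304) {
    return { changed: false, etag: knownEtag };
  }
  return {
    changed: true,
    etag: response.headers.get("ETag"),
    doc: await response.json()
  };
}

console.log(revFromEtag('"4-f04f2435e031054d6b5298c5841ae052"'));
// 4-f04f2435e031054d6b5298c5841ae052
```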
46. HTTP
What is “RESTful”?
REST is a set of principles that define how Web standards, such as HTTP and
URIs, are supposed to be used. (...) In summary, the five key principles are:
➡ Give every “thing” an ID
➡ Link things together
➡ Use standard methods
➡ Resources with multiple representations
➡ Communicate statelessly
Stefan Tilkov, A Brief Introduction to REST
http://www.infoq.com/articles/rest-introduction
47. HTTP
What is “RESTful”?
The basic idea is even simpler, though.
HTTP is not just a “transfer protocol”.
It is the interface for interacting with “things” itself.
48. 4 Fault-Tolerant and Concurrent
49. CouchDB has no off switch.
CouchDB has no repair command.
$ kill -9 <PID>
50. FAULT–TOLERANT
Erlang
Erlang!
http://www.youtube.com/watch?v=uKfKtXYLG78
51. FAULT–TOLERANT
Erlang
Erlang's main strength is support for concurrency. It has a small but powerful
set of primitives to create processes and communicate among them.
(…) a benchmark with 20 million processes has been successfully performed.
http://en.wikipedia.org/wiki/Erlang_(programming_language)
55. MAP/REDUCE
The Concept
module Enumerable
alias :reduce :inject unless method_defined? :reduce
end
(1..3).map { |number| number * 2 }
# => [2, 4, 6]
(1..3).reduce(0) { |sum, number| sum + number }
# => 6
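Since CouchDB views are written in JavaScript, the same two operations are shown here in that language, using only the built-in array methods:

```javascript
// map transforms every element; reduce folds the list into a single value.
const doubled = [1, 2, 3].map(function (number) {
  return number * 2;
});

const total = [1, 2, 3].reduce(function (sum, number) {
  return sum + number;
}, 0);

console.log(doubled); // [ 2, 4, 6 ]
console.log(total);   // 6
```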
56. MAP/REDUCE
The Simplest View
function(doc) {
if (doc.last_name && doc.first_name) {
emit( doc.last_name + ' ' + doc.first_name, doc )
}
}
57. MAP/REDUCE
The Simplest View
INPUT
function(doc) {
if (doc.last_name && doc.first_name) {
emit( doc.last_name + ' ' + doc.first_name, doc )
}
}
OUTPUT KEY VALUE
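To see what such a map function produces without a running server, it can be run over a few documents locally. The sketch below is a stand-in for the view server, not its real implementation: `runMap` is a hypothetical helper, `emit` is passed in as an argument instead of being a global, and keys are sorted as plain strings rather than with CouchDB's own collation rules.

```javascript
// Run a map function over documents and collect the emitted rows, sorted by key.
function runMap(mapFn, docs) {
  const rows = [];
  const emit = function (key, value) {
    rows.push({ key: key, value: value });
  };
  docs.forEach(function (doc) {
    mapFn(doc, emit);
  });
  return rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
}

// The view from the slide, with emit as an explicit parameter.
const byName = function (doc, emit) {
  if (doc.last_name && doc.first_name) {
    emit(doc.last_name + " " + doc.first_name, doc);
  }
};

const docs = [
  { _id: "kaelyn-bailey", first_name: "Kaelyn", last_name: "Bailey" },
  { _id: "lottie-armstrong", first_name: "Lottie", last_name: "Armstrong" },
  { _id: "no-name" } // skipped: it has neither first_name nor last_name
];

const viewRows = runMap(byName, docs);
console.log(viewRows.map(function (row) { return row.key; }));
// [ 'Armstrong Lottie', 'Bailey Kaelyn' ]
```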
58. MAP/REDUCE
The Result of Map
Key: "Armstrong Lottie"
Value:
  _id: "lottie-armstrong",
  _rev: "2-fcb71b26096957b3ff3ffd2970f3c933",
  addresses: {
    home: {
      city: "Murphyville"
      ...
    }
  },
  first_name: "Lottie",
  last_name: "Armstrong",
  occupation: "programmer",
Key: "Bailey Kaelyn"
Value:
  _id: "kaelyn-bailey",
  _rev: "1-2e25e6c9448520fa796988894423a23b",
  addresses: {
    home: {
      city: "Lake Dedric"
      ...
    }
  },
  first_name: "Kaelyn",
  last_name: "Bailey",
  occupation: "supermodel"
... ...
61. MAP/REDUCE
Result of Even Simpler View
http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_occupation
62. MAP/REDUCE
A Simple Reduce
function(keys, values) {
return sum(values)
}
63. MAP/REDUCE
Result of a Simple Reduce
http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_occupation
64. MAP/REDUCE
Built–In Erlang Reduce functions
$ couchdb
Apache CouchDB has started. Time to relax.
_count
_sum
_stats
http://wiki.apache.org/couchdb/Built-In_Reduce_Functions#Available_Build-In_Functions
65. MAP/REDUCE
Map/Reduce for Counting “tag-like stuff”
function(doc) {
for (var group in doc.groups) {
emit(doc.groups[group], 1)
}
}
_count
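What this map function combined with `_count` yields when the view is queried with `group=true` (one row per distinct key) can be approximated locally. The sketch is an assumption-laden imitation, not the view server: `countByKey` is a hypothetical helper and `emit` is passed in explicitly.

```javascript
// Count emitted rows per key, which is what _count returns per group.
function countByKey(mapFn, docs) {
  const counts = new Map();
  const emit = function (key, value) {
    counts.set(key, (counts.get(key) || 0) + 1);
  };
  docs.forEach(function (doc) {
    mapFn(doc, emit);
  });
  return Object.fromEntries(counts);
}

// Same idea as the map function on the slide: one row per group a person is in.
const byGroup = function (doc, emit) {
  (doc.groups || []).forEach(function (group) {
    emit(group, 1);
  });
};

const people = [
  { _id: "clara-rice", groups: ["friends", "models"] },
  { _id: "kaelyn-bailey", groups: ["models"] },
  { _id: "lottie-armstrong" } // no groups, emits nothing
];

console.log(countByKey(byGroup, people)); // { friends: 1, models: 2 }
```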
66. MAP/REDUCE
Result of the Map phase
http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_groups
67. MAP/REDUCE
Result of the Reduce Phase
http://localhost:5984/_utils/database.html?addressbook/_design/person/_view/by_groups
72. QUERYING VIEWS
Parameters for querying views
key
startkey
startkey_docid
endkey
endkey_docid
limit
stale
descending
skip
group
group_level
reduce
include_docs
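These parameters are passed in the query string of the view URL. One detail that trips people up is that `key`, `startkey` and `endkey` are JSON values, so a string key keeps its double quotes. A minimal sketch of building such a URL follows; `viewUrl` is a hypothetical helper, and the database, design document and view names are taken from the address book example.

```javascript
// Hypothetical helper that builds a view query URL.
// key, startkey and endkey are JSON-encoded; everything else is sent as-is.
function viewUrl(base, db, designDoc, view, params) {
  const jsonParams = ["key", "startkey", "endkey"];
  const query = Object.keys(params || {}).map(function (name) {
    const raw = jsonParams.indexOf(name) !== -1
      ? JSON.stringify(params[name])
      : String(params[name]);
    return encodeURIComponent(name) + "=" + encodeURIComponent(raw);
  }).join("&");
  const path = [base, db, "_design", designDoc, "_view", view].join("/");
  return query ? path + "?" + query : path;
}

console.log(viewUrl("http://localhost:5984", "addressbook", "person", "by_occupation",
  { key: "supermodel", include_docs: true, limit: 10 }));
// http://localhost:5984/addressbook/_design/person/_view/by_occupation?key=%22supermodel%22&include_docs=true&limit=10
```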
74. QUERYING VIEWS
A Complex Map/Reduce
SELECT
COUNT(*) AS count,
DATE_FORMAT(published_at, "%Y/%m/%d") AS date,
keywords.value AS keyword
FROM feed_entries
INNER JOIN feeds ON feed_entries.feed_id = feeds.id
INNER JOIN keywords ON feeds.keyword_id = keywords.id
WHERE DATE_SUB(CURDATE(), INTERVAL 90 DAY) <= feed_entries.published_at
GROUP BY date, keyword
ORDER BY date, keyword ASC;
75. QUERYING VIEWS
A Complex Map/Reduce
But. We don’t need a table. We need the data in a format like this:
Streamgraph.load_data({
max : 170,
keywords : ['ruby', 'python', 'erlang', 'javascript', 'haskell'],
values : [
{ date: '2010/01/01', ruby: 50, python: 20, erlang: 5, javascript: 30, haskell: 50 },
{ date: '2010/02/01', ruby: 20, python: 20, erlang: 2, javascript: 40, haskell: 43 },
{ date: '2010/03/01', ruby: 70, python: 20, erlang: 10, javascript: 80, haskell: 15 },
{ date: '2010/04/01', ruby: 20, python: 40, erlang: 8, javascript: 30, haskell: 12 },
{ date: '2010/05/01', ruby: 150, python: 30, erlang: 12, javascript: 40, haskell: 18 },
{ date: '2010/06/01', ruby: 30, python: 10, erlang: 14, javascript: 170, haskell: 14 }
]
});
76. QUERYING VIEWS
The Map Phase
function(doc) {
var fix_date = function(junk) {
var formatted = junk.toString().replace(/-/g,"/").replace("T"," ").substring(0,19);
return new Date(formatted);
};
// Format integers to have at least two digits.
var f = function(n) { return n < 10 ? '0' + n : n; }
// This is a format that collates in order and tends to work with
// JavaScript's new Date(string) date parsing capabilities, unlike rfc3339.
Date.prototype.toJSON = function() {
return this.getUTCFullYear() + '/' +
f(this.getUTCMonth() + 1) + '/' +
f(this.getUTCDate()) + ' ' +
f(this.getUTCHours()) + ':' +
f(this.getUTCMinutes()) + ':' +
f(this.getUTCSeconds()) + ' +0000';
};
if (doc['couchrest-type'] == 'Mention') {
for ( var keyword in doc.keywords ) {
var key = fix_date(doc.published_at).toJSON().substring(0,10);
var value = {};
value[ doc.keywords[keyword] ] = 1;
emit( key, value);
}
}
}
77. QUERYING VIEWS
The Reduce Phase
function(keys, values, rereduce) {
  if (rereduce) {
    // values are results of earlier reduce calls: merge their counts
    var result = {}
    for ( var item in values ) {
      for ( var prop in values[item] ) {
        if ( result[prop] ) { result[prop] += values[item][prop] }
        else { result[prop] = values[item][prop] }
      }
    }
    return result;
  }
  else {
    // Prepare the data for the re-reduce
    var date = keys[0][0];
    var result = {}
    for ( var value in values ) {
      var item = values[value];
      for ( var prop in item ) {
        if ( result[prop] ) { result[prop] += item[prop] }
        else { result[prop] = item[prop] }
      }
    }
    return result;
  }
}
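Both branches of this reduce function do the same thing: they merge objects of per-keyword counts. That property is what makes re-reducing safe, and it can be checked locally. In the sketch below, `mergeCounts` and the sample values are made up for illustration; only the `(keys, values, rereduce)` signature comes from CouchDB.

```javascript
// Merge a list of { keyword: count } objects into one, summing counts per keyword.
function mergeCounts(values) {
  const result = {};
  values.forEach(function (item) {
    Object.keys(item).forEach(function (prop) {
      result[prop] = (result[prop] || 0) + item[prop];
    });
  });
  return result;
}

// Same signature as a CouchDB reduce function; keys are not needed here.
function reduce(keys, values, rereduce) {
  return mergeCounts(values);
}

// Values as the map phase would emit them for a single day.
const mapped = [{ ruby: 1 }, { python: 1 }, { ruby: 1 }, { erlang: 1 }];

// Reduce everything at once...
const onePass = reduce(null, mapped, false);

// ...or reduce two chunks separately and re-reduce the partial results.
const partials = [
  reduce(null, mapped.slice(0, 2), false),
  reduce(null, mapped.slice(2), false)
];
const twoPass = reduce(null, partials, true);

console.log(onePass); // { ruby: 2, python: 1, erlang: 1 }
console.log(twoPass); // { ruby: 2, python: 1, erlang: 1 }
```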
80. QUERYING VIEWS
Complex Queries
So… What if you need something like:
Show me all supermodels who live in Beckerborough.
Out of luck?
81. COMPLEX QUERIES
CouchDB–Lucene
This guy knows.
Show me all supermodels who live in Beckerborough.
82. COMPLEX QUERIES
foo AND bar
Couchdb-Lucene.
When you need foo AND bar.
http://github.com/rnewson/couchdb-lucene
83. COUCHDB-LUCENE
Indexing function
function(doc) {
var result = new Document();
if (doc.occupation) {
result.add(doc.occupation, {"field":"occupation"})
}
if (doc.addresses) {
for (var address in doc.addresses) {
result.add(doc.addresses[address].city, {"field":"city"})
}
}
return result;
}
http://localhost:5984/addressbook/_fti/_design/person/search?q=occupation:supermodel AND city:Beckerborough
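A browser will encode the spaces and colons in that query for you; in code they need to be encoded explicitly. A small sketch, assuming the same URL layout as above: `luceneSearchUrl` is a hypothetical helper, and it only handles single-word field values (values containing spaces would need Lucene quoting, which is not covered here).

```javascript
// Hypothetical helper: build a couchdb-lucene search URL from field/value pairs
// joined with AND, following the URL layout shown above.
function luceneSearchUrl(base, db, designDoc, index, terms) {
  const q = Object.keys(terms).map(function (field) {
    return field + ":" + terms[field];
  }).join(" AND ");
  const path = [base, db, "_fti", "_design", designDoc, index].join("/");
  return path + "?q=" + encodeURIComponent(q);
}

console.log(luceneSearchUrl("http://localhost:5984", "addressbook", "person", "search",
  { occupation: "supermodel", city: "Beckerborough" }));
// http://localhost:5984/addressbook/_fti/_design/person/search?q=occupation%3Asupermodel%20AND%20city%3ABeckerborough
```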
84. 6 Distributed
88. DISTRIBUTED
Simple Clustering With HTTP Reverse Proxies
http://ephemera.karmi.cz/post/247255194/simple-couchdb-multi-master-clustering-via-nginx
94. DISTRIBUTED
Resources
➡ http://guide.couchdb.org
➡ https://nosqleast.com/2009/#speaker/miller
➡ http://www.couchone.com/migrating-to-couchdb
➡ http://wiki.apache.org/couchdb/
➡ http://blog.couchone.com/
➡ http://stackoverflow.com/tags/couchdb/
95. 8 Demo: Example Application
96. DEMO
Application
SOURCE CODE: http://github.com/karmi/couchdb-showcase
http://karmi.couchone.com/addressbook/_design/person/_list/all/all