This document provides an overview of CouchDB, a document-oriented database:
- CouchDB stores data as JSON documents with dynamic schemas, supports CRUD operations via a RESTful HTTP API, and uses MapReduce functions for data processing and querying.
- Documents can be queried and sorted using views defined with MapReduce, allowing flexible data analysis.
- CouchDB is designed for high availability and scalability, using an incremental MapReduce approach and eventual consistency across nodes through asynchronous replication.
8. Document-oriented Databases
• Comparable to documents in the real world
• Records are stored as schema-less documents
  ■ Each document is uniquely named
  ■ Documents are the primary unit of storage
• Structures are not explicitly defined
  ■ No tables with uniform, pre-defined fields
  ■ Every document can have varying fields of different types
• Documents are self-contained
  ■ Data is not decomposed into tables with relations
  ■ Documents contain the context needed to understand them
9. Document-oriented Databases
• Examples
  ■ Lotus Notes
  ■ Amazon SimpleDB
  ■ CouchDB
• Key-Value Stores
  ■ Amazon S3
  ■ Dynamo: Amazon’s Highly Available Key-value Store, DeCandia, et al., 2007
  ■ Facebook Cassandra
  ■ Recently accepted as an Apache incubation project
  ■ Google BigTable
  ■ Bigtable: A Distributed Storage System for Structured Data, Chang, et al., 2006
11. What is CouchDB?
• Document database server
• REST API
• JSON documents
• Views with MapReduce
• Highly scalable
12. Document Database Server
• Implemented in Erlang
  ■ Ericsson Language
  ■ Highly concurrent, functional programming language
• Designed with modern web applications in mind
• Atomic Consistent Isolated Durable (ACID)
• “Crash-only” design
• Supports external handlers
  ■ Change notification
  ■ Custom processing
13. REST HTTP API
• Representational State Transfer
  ■ A set of principles about how resources are defined and addressed
• World Wide Web (HTTP) is RESTful
  ■ Uniform interface for accessing resources
  ■ Resources identified by URI
  ■ Actions transmitted in HTTP methods
  ■ Status communicated in status codes
14. REST HTTP API
CRUD
• Create, Read, Update, and Delete
• In HTTP
  ■ POST /some/resource/id (create)
  ■ GET /some/resource/id (read)
  ■ PUT /some/resource/id (update)
  ■ DELETE /some/resource/id (delete)
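The CRUD-to-HTTP mapping above can be sketched in Python as follows. This is a minimal illustration, not CouchDB's documented client interface: the server address `http://localhost:5984` and the helper name `build_request` are assumptions, and the requests are only built, never sent, so no server is needed to run it.

```python
from typing import Optional
from urllib.request import Request

# Assumed server address; replace with the address of a real server.
BASE_URL = "http://localhost:5984"

# CRUD operation -> HTTP method, following the mapping on the slide.
CRUD_METHODS = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(operation: str, path: str, body: Optional[bytes] = None) -> Request:
    """Build (but do not send) the HTTP request for one CRUD operation."""
    method = CRUD_METHODS[operation]
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(BASE_URL + path, data=body, headers=headers, method=method)


req = build_request("update", "/some/resource/id", b'{"title": "A Blog Post"}')
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; the response status code then reports the outcome, as described under "Status communicated in status codes".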
15. JSON Documents
• JavaScript Object Notation
  ■ Considered language-independent
• CouchDB stored XML documents before version 0.8
  ■ Suitable if content is already in XML
  ■ Human readable, but can be onerous to type
  ■ Markup language, requires transformation from/to data structures
• Represents primitive data types and structures
  ■ Strings, numbers, booleans
  ■ Arrays, dictionaries
  ■ Null
• Documents can have attachments
16. JSON Documents
Example
{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}
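As a small illustration of how the JSON types listed earlier map onto a host language, the sketch below parses the example document with Python's standard `json` module. The document is written here as strict JSON (quoted keys, straight quotes); the variable names are only illustrative.

```python
import json

# The example blog post, written as strict JSON.
raw = """{
  "_id": "post1",
  "_rev": "123456",
  "title": "A Blog Post",
  "tags": ["blue", "glue"],
  "post_date": 1239910768,
  "body": "Once upon a time…",
  "is_published": true
}"""

doc = json.loads(raw)

# JSON strings, numbers, booleans, and arrays become str, int, bool, and list.
field_types = {name: type(value).__name__ for name, value in doc.items()}
print(field_types)

# Serializing and parsing again yields an equal document.
round_tripped = json.loads(json.dumps(doc))
print(round_tripped == doc)
```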
21. Views
• Used to sort and filter through data
• Lazily evaluated, highly efficient
  ■ Similar to indexing in relational databases
• Defined in design documents
  ■ Documents named _design/…
• Consist of map and reduce functions
  ■ Language independent
  ■ JavaScript supported by default
  ■ Mozilla SpiderMonkey included
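To make the idea of a design document concrete, the sketch below builds one as a Python dictionary and serializes it to JSON. It assumes the usual layout in which a design document carries a `views` object whose entries hold the map function as a JavaScript source string; the names `_design/posts` and `by_date` are hypothetical.

```python
import json

# JavaScript map function, stored as a string inside the design document.
# It emits each post keyed by its post_date.
MAP_BY_DATE = "function(doc) { if (doc.post_date) { emit(doc.post_date, doc); } }"

# Hypothetical design document: the names "posts" and "by_date" are made up.
design_doc = {
    "_id": "_design/posts",
    "views": {
        "by_date": {"map": MAP_BY_DATE},
    },
}

print(json.dumps(design_doc, indent=2))
```

Because the design document is itself an ordinary JSON document, it would be stored through the same HTTP API as any other document.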
22. Data Processing with MapReduce
• Programming model for processing and generating large data sets
• Related, but not equivalent, to map and reduce operations in functional languages
• Takes and produces key/value pairs with map and reduce functions
• Map functions
  ■ Take input key/value pairs and produce an intermediate set of key/value pairs
• Reduce functions
  ■ Take an intermediate key and the set of values for that key, and merge them into a possibly smaller set of values
• MapReduce: Simplified Data Processing on Large Clusters, Jeff Dean, Sanjay Ghemawat, Google Inc.
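The map and reduce roles described above can be sketched in a few lines of single-process Python. This is only an illustration of the programming model, not a distributed implementation and not CouchDB's own view engine; the helper `map_reduce`, the document id `post2`, and the tag-counting task are assumptions chosen for the example.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run a map phase, group intermediate values by key, then reduce each group."""
    intermediate = defaultdict(list)
    for key, value in records:
        # Each input pair may produce any number of intermediate pairs.
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)
    # Each intermediate key and its values are merged into one result.
    return {key: reduce_fn(key, values) for key, values in sorted(intermediate.items())}


def tag_map(doc_id, doc):
    """Emit (tag, 1) for every tag on a post."""
    for tag in doc["tags"]:
        yield tag, 1


def count_reduce(tag, values):
    """Merge the emitted ones into a count per tag."""
    return sum(values)


posts = [
    ("post1", {"title": "A Blog Post", "tags": ["blue", "glue"]}),
    ("post2", {"title": "Just Yesterday", "tags": ["blue", "clue"]}),
]

counts = map_reduce(posts, tag_map, count_reduce)
print(counts)  # {'blue': 2, 'clue': 1, 'glue': 1}
```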
23. Data Processing with MapReduce
Example
{
_id: “post1”,
_rev: “123456”,
title: “A Blog Post”,
tags: [“blue”, “glue”],
post_date: 1239910768,
body: “Once upon a time…”,
is_published: true
}
24. Data Processing with MapReduce
Example
“post1” = {
_id: “post1”,
_rev: “123456”,
title: “A Blog Post”,
tags: [“blue”, “glue”],
post_date: 1239910768,
body: “Once upon a time…”,
is_published: true
}
25. Data Processing with MapReduce
Example
“post1” = {
title: “A Blog Post”,
tags: [“blue”, “glue”],
post_date: 1239910768,
body: “Once upon a time…”
}
26. Data Processing with MapReduce
Emit Posts by post_date
“post1” = {
title: “A Blog Post”,
tags: [“blue”, “glue”],
post_date: 1239910768,
body: “Once upon a time…”
}
1239910768 = {
title: “A Blog Post”,
tags: [“blue”, “glue”],
post_date: 1239910768,
body: “Once upon a time…”
}
27. Data Processing with MapReduce
Emit Posts by post_date
1208456184 {title: “A bloody long time ago”, …}
1215421546 {title: “A blue moon ago”, …}
1222654641 {title: “Just Yesterday”, …}
1239910768 {title: “A Blog Post”, …}
1246816518 {title: “That was Then”, …}
1251687980 {title: “This is Now”, …}
1264836981 {title: “When Will Then Be Now?”, …}
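The sorted listing above behaves like an index on `post_date`. A rough Python sketch of that idea follows: a map function emits one row per post, the rows are kept sorted by key, and a key range is answered with a binary search. The `query` helper and its `startkey`/`endkey` parameters are illustrative names for this sketch, and only four of the posts are included.

```python
from bisect import bisect_left, bisect_right

posts = [
    {"title": "A Blog Post", "post_date": 1239910768},
    {"title": "Just Yesterday", "post_date": 1222654641},
    {"title": "This is Now", "post_date": 1251687980},
    {"title": "A blue moon ago", "post_date": 1215421546},
]


def map_by_date(doc):
    """Emit one (post_date, title) row per post."""
    yield doc["post_date"], doc["title"]


# The "view": all emitted rows, sorted by key.
rows = sorted(row for doc in posts for row in map_by_date(doc))
keys = [key for key, _ in rows]


def query(startkey, endkey):
    """Return titles whose key lies in the inclusive range [startkey, endkey]."""
    lo = bisect_left(keys, startkey)
    hi = bisect_right(keys, endkey)
    return [title for _, title in rows[lo:hi]]


print(query(1220000000, 1240000000))  # ['Just Yesterday', 'A Blog Post']
```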
28. Data Processing with MapReduce
Emit Posts by tag
“post1” = {
title: “A Blog Post”,
tags: [“blue”, “glue”],
post_date: 1239910768,
body: “Once upon a time…”
}
“blue” = { title: “A Blog Post”, … }
“glue” = { title: “A Blog Post”, … }
29. Data Processing with MapReduce
Emit Posts by tag
blue {title: “Just Yesterday”, …}
blue {title: “A Blog Post”, …}
clue {title: “Just Yesterday”, …}
flue {title: “When Will Then Be Now?”, …}
flue {title: “This is Now”, …}
glue {title: “A Blog Post”, …}
wazoo {title: “That was Then”, …}
30. Data Processing with MapReduce
Emit Posts by tag, Reduced
blue {title: “Just Yesterday”, …}, {title: “A Blog Post”, …}
clue {title: “Just Yesterday”, …}
flue {title: “When Will Then Be Now?”, …}, {title: “This is Now”, …}
glue {title: “A Blog Post”, …}
wazoo {title: “That was Then”, …}
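The step from the per-tag rows to the reduced listing can be sketched as a map that emits one row per tag, followed by grouping rows that share a key. This is a simplified in-memory illustration; the tag assignments for posts other than “A Blog Post” are inferred from the listings above, and collecting titles into lists is just one possible reduce.

```python
from itertools import groupby
from operator import itemgetter

posts = [
    {"title": "Just Yesterday", "tags": ["blue", "clue"]},
    {"title": "A Blog Post", "tags": ["blue", "glue"]},
    {"title": "When Will Then Be Now?", "tags": ["flue"]},
    {"title": "This is Now", "tags": ["flue"]},
    {"title": "That was Then", "tags": ["wazoo"]},
]


def map_by_tag(doc):
    """Emit one (tag, title) row for each tag on a post."""
    for tag in doc["tags"]:
        yield tag, doc["title"]


# Sort the emitted rows by tag; the sort is stable, so rows that share
# a tag keep the order in which they were emitted.
rows = sorted((row for doc in posts for row in map_by_tag(doc)), key=itemgetter(0))

# "Reduce": merge all rows that share a tag into one list of titles.
grouped = {
    tag: [title for _, title in group]
    for tag, group in groupby(rows, key=itemgetter(0))
}

print(grouped["blue"])  # ['Just Yesterday', 'A Blog Post']
```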
31. Scalability
• Incremental MapReduce
• Multiversion Concurrency Control (MVCC)
  ■ Achieves serializability through multiversioning instead of locking
  ■ Eliminates waits to access objects
  ■ Updates create new document revisions
  ■ Tradeoff point: no waits, increased data storage
• Incremental Distributed Replication
• Eventual Consistency
  ■ Changes eventually propagate through distributed systems
  ■ Tradeoff point: increased availability and tolerance, decreased freshness
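The MVCC tradeoff above (no waiting, more storage) can be illustrated with a toy in-memory store. This is a sketch under simplifying assumptions, not CouchDB's storage engine: the class `ToyStore` and the exception `ConflictError` are hypothetical, and revisions are plain integers, whereas real revision identifiers look different.

```python
class ConflictError(Exception):
    """Raised when a write is based on a revision that is no longer current."""


class ToyStore:
    def __init__(self):
        # doc_id -> list of (rev, body); old revisions are kept, never overwritten.
        self._history = {}

    def put(self, doc_id, body, rev=None):
        """Append a new revision if `rev` matches the current one; never block."""
        history = self._history.get(doc_id, [])
        current_rev = history[-1][0] if history else None
        if rev != current_rev:
            raise ConflictError(f"expected rev {current_rev}, got {rev}")
        new_rev = len(history) + 1
        self._history[doc_id] = history + [(new_rev, dict(body))]
        return new_rev

    def get(self, doc_id, rev=None):
        """Return (rev, body) for the latest revision, or for a specific older one."""
        history = self._history[doc_id]
        return history[-1] if rev is None else history[rev - 1]


store = ToyStore()
r1 = store.put("post1", {"title": "A Blog Post"})
r2 = store.put("post1", {"title": "A Better Blog Post"}, rev=r1)

# A writer still holding the old revision is rejected instead of waiting on a lock.
try:
    store.put("post1", {"title": "Stale Edit"}, rev=r1)
    conflict = False
except ConflictError:
    conflict = True

print(r1, r2, conflict)  # 1 2 True
```

Readers holding an older revision can keep reading it while newer revisions are written, which is where the extra storage goes.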