5. Key-Value Stores
• A mapping from a key to a value
• The store doesn't know anything about the key
or value
• The store doesn't know anything about the insides
of the value
• Operations
• Set, get, or delete a key-value pair
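The three operations above can be sketched as a tiny in-memory store. This is only an illustration: the class name `KeyValueStore` and the key `user:1` are made up for the example, and the value is kept as an opaque string that the store never inspects.

```javascript
// Minimal in-memory key-value store sketch (hypothetical class name).
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  // The value is stored as an opaque blob; the store never looks inside it.
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  // Returns true if the key existed and was removed.
  delete(key) {
    return this.data.delete(key);
  }
}

const store = new KeyValueStore();
store.set("user:1", JSON.stringify({ name: "Chris" }));
const raw = store.get("user:1"); // the caller has to decode the value itself
const user = JSON.parse(raw);
const removed = store.delete("user:1");
```

Because the store treats values as opaque, any query on the insides of a value has to happen in the application.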
6. Column-Oriented Stores
• Like a relational store, but flipped around: all data
for a column is kept together
• An index provides a means to get a column value for a
record
• Operations:
• Get, insert, and delete records; update fields
• Streaming column data in and out of Hadoop
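A rough sketch of the "flipped around" layout, assuming a made-up table with `id`, `author` and `views` fields. Here the array position plays the role of the index that maps a record to its column values; real column stores use more elaborate structures.

```javascript
// Row-oriented view: each record keeps all of its fields together.
const rows = [
  { id: 1, author: "Chris", views: 10 },
  { id: 2, author: "Fred", views: 25 },
  { id: 3, author: "Chris", views: 5 },
];

// Column-oriented view: all values for one column are kept together.
// The array position acts as the index from a record to its column value.
const columns = { id: [], author: [], views: [] };
for (const row of rows) {
  columns.id.push(row.id);
  columns.author.push(row.author);
  columns.views.push(row.views);
}

// Scanning a single column touches only that column's data.
const totalViews = columns.views.reduce((sum, v) => sum + v, 0);

// Rebuilding a record means reading the same position in every column.
function getRecord(position) {
  return {
    id: columns.id[position],
    author: columns.author[position],
    views: columns.views[position],
  };
}
```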
7. Graph Databases
• Stores vertex-to-vertex edges
• Operations:
• Getting and setting edges
• Sometimes possible to annotate vertices or edges
• Query languages support finding paths between
vertices, subject to various constraints
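As an illustration of edge operations and a constrained path query, here is a small sketch. The function names and the "at most `maxHops` edges" constraint are assumptions chosen for the example, not the API of any particular graph database.

```javascript
// Toy graph store: a map from a vertex to the set of vertices it points to.
const edges = new Map();

function setEdge(from, to) {
  if (!edges.has(from)) edges.set(from, new Set());
  edges.get(from).add(to);
}

function getEdges(from) {
  return [...(edges.get(from) || [])];
}

// Breadth-first search for a path from start to goal, constrained to
// at most maxHops edges. Returns the path as an array, or null.
function findPath(start, goal, maxHops) {
  const queue = [[start]];
  const seen = new Set([start]);
  while (queue.length > 0) {
    const path = queue.shift();
    const last = path[path.length - 1];
    if (last === goal) return path;
    if (path.length - 1 >= maxHops) continue; // hop limit reached
    for (const next of getEdges(last)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push([...path, next]);
      }
    }
  }
  return null;
}

setEdge("alice", "bob");
setEdge("bob", "carol");
setEdge("carol", "dave");
```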
8. Document Stores
• The store is a container for documents
• Documents are made up of named fields
• Fields may or may not have type definitions
• e.g. XSDs for XML stores, vs. schema-less JSON stores
• Can create "secondary indexes"
• These provide the ability to query on any document field(s)
• Operations:
• Insert and delete documents
• Update fields within documents
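One way to picture a document store with a secondary index is the sketch below. The class `DocumentStore` and its methods are hypothetical; the point is only that the index maps a field value to the matching document ids, so a query on that field does not need to scan every document.

```javascript
// Toy document store: documents keyed by _id, plus secondary indexes that
// map a field value to the set of _ids having that value.
class DocumentStore {
  constructor() {
    this.docs = new Map();
    this.indexes = new Map(); // field name -> Map(value -> Set of _id)
  }
  createIndex(field) {
    const index = new Map();
    for (const doc of this.docs.values()) this.addToIndex(index, field, doc);
    this.indexes.set(field, index);
  }
  addToIndex(index, field, doc) {
    if (!(field in doc)) return; // fields are optional (schema-less)
    if (!index.has(doc[field])) index.set(doc[field], new Set());
    index.get(doc[field]).add(doc._id);
  }
  insert(doc) {
    this.docs.set(doc._id, doc);
    for (const [field, index] of this.indexes) this.addToIndex(index, field, doc);
  }
  remove(id) {
    const doc = this.docs.get(id);
    if (!doc) return;
    for (const [field, index] of this.indexes) {
      const ids = index.get(doc[field]);
      if (ids) ids.delete(id);
    }
    this.docs.delete(id);
  }
  // Query by an indexed field without scanning every document.
  findBy(field, value) {
    const ids = this.indexes.get(field).get(value) || new Set();
    return [...ids].map((id) => this.docs.get(id));
  }
}

const posts = new DocumentStore();
posts.createIndex("author");
posts.insert({ _id: 1, author: "Chris", text: "About MongoDB..." });
posts.insert({ _id: 2, author: "Fred", text: "Best Post Ever!" });
const byChris = posts.findBy("author", "Chris");
```

Field updates are left out of the sketch; they would also have to keep every index in sync.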
9. What is MongoDB?
MongoDB is a scalable, high-performance,
open source NoSQL database.
• Document-oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce
• GridFS
10. 10gen
• Company behind MongoDB
– (A)GPL license, own copyrights, engineering team
– support, consulting, commercial license
• Management
– Google/DoubleClick, Oracle, Apple, NetApp
– Funding: Sequoia, Union Square, Flybridge
– Offices in NYC, Palo Alto, London, Dublin
– 100+ employees
11. Where can you use it?
MongoDB is implemented in C++
• Platforms: 32/64-bit Windows, Linux, Mac OS X,
FreeBSD, Solaris
Drivers are available in many languages
10gen supported
• C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript,
Perl, PHP, Python, Ruby, Scala, Node.JS
Community supported
• Clojure, ColdFusion, F#, Go, Groovy, Lua, R ...
http://www.mongodb.org/display/DOCS/Drivers
12. History
• First release – February 2009
• v1.0 – August 2009
• v1.2 – December 2009 – MapReduce, ++
• v1.4 – March 2010 – Concurrency, Geo
• v1.6 – August 2010 – Sharding, Replica Sets
• v1.8 – March 2011 – Journaling, Geosphere
• v2.0 – September 2011 – V1 Indexes, Concurrency
• v2.2 – Soon – Aggregation, Concurrency
14. Documents
Blog Post Document
> p = { author: "Chris",
date: new ISODate(),
text: "About MongoDB...",
tags: ["tech", "databases"]}
> db.posts.save(p)
15. Querying
> db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Chris",
date : ISODate("2012-02-02T11:52:27.442Z"),
text : "About MongoDB...",
tags : [ "tech", "databases" ] }
Notes:
_id is unique, but can be anything you'd like
16. Introducing BSON
JSON has a powerful but limited set of data types
• arrays, objects, strings, numbers, booleans and null
BSON is a binary representation of JSON
• Adds extra data types such as Date, Int types, Id, …
• Optimized for performance and navigational abilities
• And compression
MongoDB sends and stores data in BSON
• bsonspec.org
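The limited type set of JSON can be seen with plain JavaScript, no MongoDB required. This small sketch shows a date losing its type on a JSON round trip, which is the kind of gap the extra BSON types are meant to fill.

```javascript
const post = { author: "Chris", date: new Date(0) };

// JSON has no date type, so the Date is serialized as a string...
const roundTripped = JSON.parse(JSON.stringify(post));

// ...and comes back as a string, not a Date.
const dateType = typeof roundTripped.date;
const isStillDate = roundTripped.date instanceof Date;
```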
17. Secondary Indexes
Create index on any Field in Document
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.findOne({author: 'Chris'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Chris", ... }
18. Compound Indexes
Create index on multiple fields in a Document
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1, ts: -1})
> db.posts.find({author: 'Chris'}).sort({ts: -1})
[{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author: "Chris", ...},
{ _id : ObjectId("4f61d325c496820ceba84124"),
author: "Chris", ...}]
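To make the ordering of a compound index concrete, the sketch below sorts a few made-up entries the way an index on `{author: 1, ts: -1}` would order them: ascending by author, then descending by `ts`. The numeric `ts` values are an assumption for the example.

```javascript
// Comparator mirroring the key order {author: 1, ts: -1}.
function compareByAuthorThenTsDesc(a, b) {
  if (a.author < b.author) return -1;
  if (a.author > b.author) return 1;
  return b.ts - a.ts; // descending on ts within the same author
}

const entries = [
  { author: "Fred", ts: 3 },
  { author: "Chris", ts: 1 },
  { author: "Chris", ts: 7 },
].sort(compareByAuthorThenTsDesc);
```

With entries kept in this order, all posts by one author sit next to each other, already sorted newest first, so the query above does not need a separate sort step.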
21. Atomic Operations
$set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit
// Create a comment
> new_comment = { author: "Fred",
date: new Date(),
text: "Best Post Ever!"}
// Add to post
> db.posts.update({ _id: "..." },
{"$push": {comments: new_comment},
"$inc": {comment_count: 1}
});
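What the modifiers do to a document can be sketched on a plain object. The helper `applyUpdate` is hypothetical and covers only `$set`, `$inc` and `$push` on top-level fields; it illustrates the semantics, not the server's atomicity or implementation.

```javascript
// Apply a small subset of update modifiers to a plain object.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (!Array.isArray(doc[field])) doc[field] = []; // create the array if needed
    doc[field].push(item);
  }
  return doc;
}

const post = { _id: 1, author: "Chris", text: "About MongoDB..." };
applyUpdate(post, {
  $push: { comments: { author: "Fred", text: "Best Post Ever!" } },
  $inc: { comment_count: 1 },
});
```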
22. Nested Documents
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Chris",
date : "Thu Feb 02 2012 11:50:01",
text : "About MongoDB...",
tags : [ "tech", "databases" ],
comments : [{
author : "Fred",
date : "Fri Feb 03 2012 13:23:11",
text : "Best Post Ever!"
}],
comment_count : 1
}
24. Secondary Indexes
// Index nested documents
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Fred"})
// Index on tags (multi-key index)
> db.posts.ensureIndex({tags: 1})
> db.posts.find({tags: "tech"})
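A multi-key index can be pictured as one index entry per array element. The sketch below builds such a lookup table in memory; the function name `buildMultiKeyIndex` and the sample documents are invented for the illustration.

```javascript
// Build a multi-key index: one entry per array element of the field.
function buildMultiKeyIndex(docs, field) {
  const index = new Map(); // value -> array of _id
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const docs = [
  { _id: 1, tags: ["tech", "databases"] },
  { _id: 2, tags: ["tech"] },
];
const tagIndex = buildMultiKeyIndex(docs, "tags");
```

Looking up `"tech"` in this table returns both documents, which is why a query on a single tag can use the index even though `tags` is an array.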
25. Geo
• Geo-spatial queries
• Require a geo index
• Find points near a given point
• Find points within a polygon/sphere
// geospatial index
> db.posts.ensureIndex( { "author.location": "2d" } )
> db.posts.find( { "author.location" :
{ $near : [22, 42] } } )
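The result of a "near" query can be approximated by sorting points by flat (Euclidean) distance from the target. This brute-force sketch uses made-up places and scans every point, which is exactly the work a geo index is there to avoid.

```javascript
// Return points ordered by planar distance from the target [x, y].
function near(points, [x, y]) {
  const distance = (p) => Math.hypot(p.location[0] - x, p.location[1] - y);
  return [...points].sort((a, b) => distance(a) - distance(b));
}

const places = [
  { name: "far", location: [30, 50] },
  { name: "close", location: [22, 43] },
  { name: "mid", location: [25, 42] },
];
const ordered = near(places, [22, 42]).map((p) => p.name);
```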
26. Map Reduce
The caller provides map and reduce functions written
in JavaScript
// Emit each tag
> map = function() {
    this.tags.forEach(
      function(item) { emit(item, 1); }
    );
  };
// Calculate totals
> reduce = function(key, values) {
    var total = 0;
    var valuesSize = values.length;
    for (var i = 0; i < valuesSize; i++) {
      total += parseInt(values[i], 10);
    }
    return total;
  };
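How the two functions fit together can be sketched in memory. This is a simplified stand-in, not the server's algorithm: the helper `mapReduce` is hypothetical, and `emit` is passed to the map function as an argument here, whereas in the shell it is a global.

```javascript
// In-memory sketch of the map/reduce flow: map emits (key, value) pairs,
// values are grouped by key, then reduce folds each group to one result.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Call map with `this` bound to each document, as in the shell example.
  for (const doc of docs) map.call(doc, emit);
  const results = {};
  for (const [key, values] of groups) results[key] = reduce(key, values);
  return results;
}

const posts = [
  { tags: ["tech", "databases"] },
  { tags: ["tech"] },
];
const tagCounts = mapReduce(
  posts,
  function (emit) { this.tags.forEach((tag) => emit(tag, 1)); },
  (key, values) => values.reduce((total, v) => total + v, 0)
);
```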
29. GridFS
Save files in mongoDB
Stream data back to the client
# (Python) Create a new instance of GridFS
>>> import gridfs
>>> fs = gridfs.GridFS(db)  # db is an existing pymongo database handle
# Save file to mongo (open in binary mode)
>>> my_image = open('my_image.jpg', 'rb')
>>> file_id = fs.put(my_image)
# Read file
>>> fs.get(file_id).read()
34. Deployment
• Single server
- Needs a strong backup plan
• Replica sets
- High availability
- Automatic failover
• Sharded
- Horizontal scaling
- Auto balancing
[Diagram labels: single server "P"; replica set "P S S"; sharded cluster several "P S S" groups]
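Horizontal scaling by sharding can be pictured as routing each document to a shard based on a key. The sketch below assumes range-based routing on an `author` key; the ranges and shard names are invented for the illustration, and balancing (moving ranges between shards) is not shown.

```javascript
// Hypothetical key ranges on a shard key, each owned by one shard.
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: "\uffff", shard: "shard2" },
];

// Pick the shard whose range [min, max) contains the key.
function routeToShard(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

const targets = ["chris", "mary", "zoe"].map(routeToShard);
```

The application only supplies the key; which server holds the data is decided by the routing layer, which is why little application code has to change when moving to a sharded setup.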
35. MongoDB Use Cases
• Archiving
• Content Management
• Ecommerce
• Finance
• Gaming
• Government
• Metadata Storage
• News & Media
• Online Advertising
• Online Collaboration
• Real-time stats/analytics
• Social Networks
• Telecommunications
37. download at mongodb.org
conferences, appearances, and meetups
http://www.10gen.com/events
Facebook: http://bit.ly/mongofb | Twitter: @mongodb | LinkedIn: http://linkd.in/joinmongo
support, training, and this talk brought to you by
• JSON-style documents with dynamic schemas offer simplicity and power.
• Index on any attribute, just like you're used to.
• Mirror across LANs and WANs for scale and peace of mind.
• Scale horizontally without compromising functionality.
• Rich, document-based queries.
• Atomic modifiers for contention-free performance.
• Flexible aggregation and data processing.
• Store files of any size without complicating your stack.
• No joins, for scalability: joins across shards are highly inefficient and difficult to perform in SQL.
• MongoDB is geared for easy scaling: going from a single node to a distributed cluster is easy.
• Little or no application code changes are needed to scale from a single node to a sharded cluster.
• If a document is always presented as a whole, a single document gives performance benefits.
• A single document is not a panacea, as we'll see.