2. Topics
Introduction
• Basic Data Modeling
• Manipulating Data
• Evolving a schema
Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues
3. So why model data?
http://www.flickr.com/photos/42304632@N00/493639870/
4. Benefits of relational
• Before relational
• Data and Logic combined
• After relational
• Separation of concerns
• Data modeled independent of logic
• Logic freed from concerns of data design
• MongoDB continues this separation
5. Normalization
Goals
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias toward a particular query
In MongoDB
• Similar goals apply
• The rules are different
9. DB Considerations
How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce
Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle
Further Considerations
• No Joins
• Document writes are atomic
47. Trees
Full Tree in Document
{ comments: [
{ author: "Bernie", text: "...",
replies: [
{author: "James", text: "...",
replies: [ ] }
] }
]
}
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 16MB limit
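The "hard to search" drawback can be illustrated with a minimal sketch in plain JavaScript, run against an in-memory copy of the document rather than a live database. The `post` object and the `findByAuthor` helper are hypothetical names introduced for illustration; the point is only that a nested tree has to be walked recursively in application code.

```javascript
// Hypothetical post document using the nested layout from the slide.
const post = {
  comments: [
    { author: "Bernie", text: "first", replies: [
      { author: "James", text: "reply", replies: [] }
    ] }
  ]
};

// Depth-first walk that collects every comment written by `author`.
// Each level of nesting requires another recursive call.
function findByAuthor(comments, author, found = []) {
  for (const c of comments) {
    if (c.author === author) found.push(c);
    findByAuthor(c.replies || [], author, found);
  }
  return found;
}

const jamesComments = findByAuthor(post.comments, "James");
console.log(jamesComments.map(c => c.text));
```

The whole document has to be loaded before the walk can start, which is also why partial results are awkward with this layout.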
48. Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the ids of the children
- Can support graphs (multiple parents / children)
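As a rough sketch of what parent links cost, the following plain JavaScript simulates the collection with an array. `nodes`, `findChildren` and `descendants` are hypothetical names; `findChildren` stands in for a query such as `find({ parent: { $in: ids } })`. Finding all descendants takes one query per level of the tree.

```javascript
// Hypothetical nodes collection that uses parent links only.
const nodes = [
  { _id: "a", parent: null },
  { _id: "b", parent: "a" },
  { _id: "c", parent: "b" },
  { _id: "d", parent: "b" },
  { _id: "e", parent: "a" },
];

// Stand-in for a query that matches nodes whose parent is in `ids`.
function findChildren(ids) {
  return nodes.filter(n => ids.includes(n.parent));
}

// Breadth-first traversal: one simulated query per tree level.
function descendants(rootId) {
  const ids = [];
  let frontier = [rootId];
  let queries = 0;
  while (frontier.length > 0) {
    const children = findChildren(frontier);
    queries += 1;
    frontier = children.map(n => n._id);
    ids.push(...frontier);
  }
  return { ids, queries };
}

const fromA = descendants("a");
console.log(fromA);
```

The final query, which returns nothing, is what tells the loop it has reached the leaves.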
51. Array of Ancestors
- Store all Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
//find all descendants of b:
> db.tree2.find( { ancestors: 'b' } )
//find all direct descendants of b:
> db.tree2.find( { parent: 'b' } )
//find all ancestors of f:
> ancestors = db.tree2.findOne( { _id: 'f' } ).ancestors
> db.tree2.find( { _id: { $in : ancestors } } )
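The slide does not show how the ancestors array is maintained. One possible approach, sketched here in plain JavaScript with a `Map` standing in for the `tree2` collection, is to copy the parent's array and append the parent's id at insert time. `tree` and `insertNode` are hypothetical names.

```javascript
// Stand-in for the tree2 collection, keyed by _id.
const tree = new Map();

// Insert a node; a child's ancestors are its parent's ancestors plus the parent.
function insertNode(id, parentId) {
  if (parentId === undefined) {
    tree.set(id, { _id: id }); // root node: no ancestors, no parent
    return;
  }
  const parent = tree.get(parentId);
  if (!parent) throw new Error("unknown parent: " + parentId);
  const ancestors = [...(parent.ancestors || []), parentId];
  tree.set(id, { _id: id, ancestors, parent: parentId });
}

insertNode("a");
insertNode("b", "a");
insertNode("c", "b");

// In-memory equivalent of find({ ancestors: 'b' }).
const descendantsOfB = [...tree.values()]
  .filter(n => (n.ancestors || []).includes("b"))
  .map(n => n._id);
console.log(tree.get("c"), descendantsOfB);
```

Moving a subtree would require rewriting the ancestors array of every node below it, which this sketch does not attempt.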
52. Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use a regular expression search to find parts of a tree
{ comments: [
{ author: "Bernie", text: "initial post",
path: "/" },
{ author: "Jim", text: "jim’s comment",
path: "/jim" },
{ author: "Bernie", text: "Bernie’s reply to Jim",
path : "/jim/bernie"} ] }
// Find the conversations Jim was a part of
> db.posts.find( { "comments.path": /jim/i } )
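An unanchored pattern such as /jim/i also matches unrelated path segments that merely contain "jim". The sketch below, in plain JavaScript on an in-memory array, compares it with an anchored prefix pattern. The "Jimmy" comment and the `escapeRegExp` and `subtreePattern` helpers are hypothetical additions for illustration.

```javascript
// In-memory copy of the comments array, plus one hypothetical extra entry.
const comments = [
  { author: "Bernie", path: "/" },
  { author: "Jim", path: "/jim" },
  { author: "Bernie", path: "/jim/bernie" },
  { author: "Jimmy", path: "/jimmy" }, // not part of Jim's thread
];

// Escape regular-expression metacharacters in a path prefix.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Match the node at `prefix` and everything below it, but not siblings
// whose names merely start with the same letters.
function subtreePattern(prefix) {
  return new RegExp("^" + escapeRegExp(prefix) + "(/|$)");
}

const loose = comments.filter(c => /jim/i.test(c.path)).map(c => c.path);
const strict = comments
  .filter(c => subtreePattern("/jim").test(c.path))
  .map(c => c.path);
console.log(loose, strict);
```

An anchored, case-sensitive prefix pattern also has a better chance of using an index on the path field than an unanchored or case-insensitive one.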
54. Queue
• Need to maintain order and state
• Ensure that updates to the queue are atomic
{ inprogress: false,
priority: 1,
...
}
// find highest priority job and mark as in-progress
job = db.jobs.findAndModify( {
query: { inprogress: false },
sort: { priority: -1 },
update: { $set: {inprogress: true,
started: new Date() } },
new: true } )
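To make the intended semantics concrete, here is a minimal in-memory sketch in plain JavaScript of what the call above does to the queue. `jobs` and `claimNextJob` are hypothetical names, and the atomicity that findAndModify provides on the server is only assumed here (the sketch runs in a single thread).

```javascript
// Hypothetical in-memory queue with the fields used on the slide.
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
  { _id: 3, inprogress: true, priority: 9 },
];

// Pick the highest-priority job that is not in progress, mark it,
// and return the modified job (like new: true). Returns null if none match.
function claimNextJob(queue, now = new Date()) {
  const candidates = queue
    .filter(j => !j.inprogress)
    .sort((x, y) => y.priority - x.priority);
  if (candidates.length === 0) return null;
  const job = candidates[0];
  job.inprogress = true;
  job.started = now;
  return job;
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
const third = claimNextJob(jobs);
console.log(first._id, second._id, third);
```

Because the select and the update happen in one step, two workers calling this in turn never receive the same job; with a separate find followed by an update, they could.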
55. Summary
Schema design is different in MongoDB
Basic data design principles stay the same
Focus on how the app manipulates data
Rapidly evolve schema to meet your requirements
Enjoy your new freedom, use it wisely :-)
In 1.6.x, M/R results are stored in a temporary collection until the connection is closed. You could define an output collection instead.
In 1.8.x, M/R results are stored in a permanent collection unless you specify otherwise.
key: the fields to group by
initial: the initial value of the aggregation counter
reduce: the reduce function that aggregates the objects being iterated over
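The roles of key, initial and reduce can be sketched with a small in-memory grouping function in plain JavaScript. This is not the server's implementation: `group` and the sample `docs` are hypothetical, and `key` is taken as an array of field names for simplicity.

```javascript
// Group documents by the fields in `key`, start each group from a copy of
// `initial`, and fold every document into its group with `reduce`.
function group(docs, { key, initial, reduce }) {
  const buckets = new Map();
  for (const doc of docs) {
    const keyValues = key.map(k => doc[k]);
    const id = JSON.stringify(keyValues);
    if (!buckets.has(id)) {
      const out = { ...initial };
      key.forEach((k, i) => { out[k] = keyValues[i]; });
      buckets.set(id, out);
    }
    reduce(doc, buckets.get(id)); // reduce mutates the group's accumulator
  }
  return [...buckets.values()];
}

// Hypothetical sample data: count comments per author.
const docs = [{ author: "Bernie" }, { author: "James" }, { author: "Bernie" }];
const counts = group(docs, {
  key: ["author"],
  initial: { count: 0 },
  reduce: (doc, out) => { out.count += 1; },
});
console.log(counts);
```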
find() only returns documents where the field exists (and, in this case, where its value is greater than 0).
Indexes can still be created on fields that don't appear in all documents.
New in 1.8.x: sparse indexes. A sparse index only includes the documents where the field exists. In a normal index, non-existent fields are treated as null values.
The greater the height of the tree, the harder it becomes to query.
Normalized: two collections instead of one. More flexible, but requires more queries to retrieve the same data.
Strong life-cycle association: use an embedded array. Otherwise you have options: an embedded array or tree, or normalize the data.
Two collections. One option: arrays of keys (pointers) in each document that point to documents in the other collection.
Only one query to find the category for a product given the product id, and only one query to find the products in a category given the category id.
Alternative: only store an array of keys in the documents of one collection.
Advantage: less storage space required in the categories collection.
Finding all the products in a given category is still one query.
Disadvantage: finding all the categories for a given product takes two queries.
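The one-query versus two-query trade-off can be sketched in plain JavaScript with two in-memory arrays standing in for the collections. The sample data, the `category_ids` field name and both helper functions are assumptions made for illustration; the keys are stored only on the product side.

```javascript
// Hypothetical collections: keys are stored only in the products documents.
const products = [
  { _id: 10, name: "Blender", category_ids: [20, 30] },
  { _id: 11, name: "Toaster", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "Kitchen" },
  { _id: 30, name: "Gifts" },
];

// One lookup: match products whose key array contains the category id.
function productsInCategory(categoryId) {
  return products.filter(p => p.category_ids.includes(categoryId));
}

// Two lookups: fetch the product, then fetch the categories it points to.
function categoriesForProduct(productId) {
  const product = products.find(p => p._id === productId);
  if (!product) return [];
  return categories.filter(c => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map(p => p.name));
console.log(categoriesForProduct(10).map(c => c.name));
```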
Document size limit: 4MB in 1.6.x, 16MB in 1.8.x.
findAndModify returns one result object; the update is atomic.
query: the query filter
sort: if multiple documents match, the first one in the sorted results is used
update: a modifier object that specifies the modifications to make
new: return the modified object; otherwise the old object is returned