This document discusses schema design patterns for MongoDB. It begins by comparing terminology between relational databases and MongoDB. Common patterns for modeling one-to-one, one-to-many, and many-to-many relationships are presented using examples of patrons, books, authors, and publishers. Embedded documents are recommended when related data always appears together, while references are used when more flexibility is needed. The document emphasizes focusing on how the application accesses and manipulates data when deciding between embedded documents and references. It also stresses evolving schemas to meet changing requirements and application logic.
11. Schema Design Considerations
• What is a priority?
– High consistency
– High read performance
– High write performance
• How does the application access and manipulate
data?
– Read/Write Ratio
– Types of Queries / Updates
– Data life-cycle and growth
– Analytics (Map Reduce, Aggregation)
12. Tools for Data Access
• Flexible Schemas
• Embedded data structures
• Secondary Indexes
• Multi-Key Indexes
• Aggregation Framework
– Pipeline operators: $project, $match, $limit,
$skip, $sort, $group, $unwind
• No Joins
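The pipeline operators listed above compose as an ordered array of stages. The following is a minimal sketch, assuming a `books` collection whose documents carry an `authors` array of names; the second book is made up for illustration. In the mongo shell the array would be passed to `db.books.aggregate(pipeline)`; the plain JavaScript function below computes the same counts over an in-memory array so the idea can be tried without a server.

```javascript
// Hypothetical sample data standing in for a `books` collection.
const books = [
  { title: "MongoDB: The Definitive Guide", authors: ["Kristina Chodorow", "Mike Dirolf"] },
  { title: "Hypothetical second book", authors: ["Kristina Chodorow"] }
];

// Pipeline as it would be passed to db.books.aggregate(pipeline) in the mongo shell.
const pipeline = [
  { $unwind: "$authors" },                              // one document per (book, author) pair
  { $group: { _id: "$authors", count: { $sum: 1 } } },  // count books per author
  { $sort: { count: -1 } }                              // highest count first
];

// Plain-JavaScript equivalent of the three stages, for illustration only.
function booksPerAuthor(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const name of doc.authors) {                   // $unwind
      counts.set(name, (counts.get(name) || 0) + 1);    // $group with $sum: 1
    }
  }
  return [...counts.entries()]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);                 // $sort descending
}

console.log(booksPerAuthor(books));
```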
18. One to One Relations
• “Contains” relationships are often
embedded.
• Document provides a holistic representation
of objects with embedded entities.
• Optimized read performance.
22. Publishers and Books relation
• Publishers put out many books
• Books have one publisher
23. Book Data
MongoDB: The Definitive Guide,
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA
24. Book Model with Embedded Publisher
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: "1980",
    location: "CA"
  }
}
25. Book Model with Normalized Publisher
publisher = {
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
26. Link with Publisher _id as a Reference
publisher = {
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher_id: "oreilly"
}
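With the reference stored on the book, reading a book together with its publisher takes two lookups, since there are no joins. The following is a minimal sketch that uses in-memory arrays in place of collections. `findOne` here is a hypothetical helper written for this example, not the driver's method, and the `publishers` collection name is an assumption. In the mongo shell the second step would look like `db.publishers.findOne({ _id: book.publisher_id })`.

```javascript
// In-memory stand-ins for the `publishers` and `books` collections (assumed names).
const publishers = [
  { _id: "oreilly", name: "O'Reilly Media", founded: "1980", location: "CA" }
];
const books = [
  { title: "MongoDB: The Definitive Guide", pages: 216, publisher_id: "oreilly" }
];

// Hypothetical helper: first document whose fields all equal those in `query`,
// or null when nothing matches.
function findOne(collection, query) {
  const match = collection.find(doc =>
    Object.keys(query).every(key => doc[key] === query[key]));
  return match === undefined ? null : match;
}

// Lookup 1: the book. Lookup 2: its publisher, via the stored reference.
const book = findOne(books, { title: "MongoDB: The Definitive Guide" });
const publisher = findOne(publishers, { _id: book.publisher_id });

console.log(book.title, "is published by", publisher.name);
```

The second lookup is the application-level logic that replaces a relational join.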
27. Link with Book _ids as a Reference
publisher = {
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA",
  books: [ "123456789", ... ]
}
book = {
  _id: "123456789",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
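With the references stored on the publisher, listing a publisher's books means reading the publisher and then fetching every book whose `_id` appears in its `books` array (in the mongo shell, a `$in` query such as `db.books.find({ _id: { $in: publisher.books } })`). The following is a minimal sketch with in-memory arrays; every id and title other than the first book is made up for illustration, and `findByIds` is a hypothetical helper.

```javascript
const publisher = {
  name: "O'Reilly Media",
  books: ["123456789", "987654321"]   // second id is hypothetical
};
const books = [
  { _id: "123456789", title: "MongoDB: The Definitive Guide" },
  { _id: "987654321", title: "Hypothetical second book" },
  { _id: "555555555", title: "Hypothetical book from another publisher" }
];

// Stand-in for { _id: { $in: ids } }: keep documents whose _id is in the list.
function findByIds(collection, ids) {
  const wanted = new Set(ids);
  return collection.filter(doc => wanted.has(doc._id));
}

const publishedBooks = findByIds(books, publisher.books);
console.log(publishedBooks.map(b => b.title));
```

Every new book requires an update to the publisher document, which is why this layout suits a bounded, small number of items.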
28. Where do you put the reference?
• Reference to single publisher on books
– Use when items have unbounded growth (unlimited # of
books)
• Array of books in publisher document
– Optimal when many means a handful of items
– Use when there is a bound on potential growth
35. Referencing vs. Embedding
• Embedding is a bit like pre-joining data
• Document level operations are easy for the
server to handle
• Embed when the “many” objects always
appear with (viewed in the context of) their
parents.
• Reference when you need more flexibility
How does your application access and
manipulate data?
41. Where do you put the reference?
Think about common queries
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}
db.books.find( { "authors.name": "Kristina Chodorow" } )
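A query on `"authors.name"` matches a book when any element of its `authors` array has that name. The following is a minimal sketch of that matching rule over an in-memory array; `findByAuthorName` is a hypothetical helper and the second book is made up for illustration.

```javascript
const books = [
  {
    title: "MongoDB: The Definitive Guide",
    authors: [
      { _id: "kchodorow", name: "Kristina Chodorow" },
      { _id: "mdirolf", name: "Mike Dirolf" }
    ]
  },
  {
    title: "Hypothetical other book",
    authors: [{ _id: "someone", name: "Someone Else" }]
  }
];

// Mimics { "authors.name": name }: a book matches if ANY array element matches.
function findByAuthorName(collection, name) {
  return collection.filter(book =>
    book.authors.some(author => author.name === name));
}

console.log(findByAuthorName(books, "Kristina Chodorow").map(b => b.title));
```

Because the author's name is copied into the book, this common query needs no second lookup against the author documents.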
42. Where do you put the reference?
Think about indexes
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
author = {
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "New York"
}
db.books.createIndex( { "authors.name": 1 } )
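An index on `"authors.name"` is a multikey index: it holds one entry per array element, so a book with two authors is reachable from either name. The following is a toy illustration using a `Map`. It is an analogy for why the lookup avoids scanning every book, not a description of MongoDB's actual index implementation, and `buildAuthorNameIndex` is a hypothetical helper.

```javascript
const books = [
  {
    title: "MongoDB: The Definitive Guide",
    authors: [
      { _id: "kchodorow", name: "Kristina Chodorow" },
      { _id: "mdirolf", name: "Mike Dirolf" }
    ]
  }
];

// Build name -> [books], adding one entry for EACH element of the authors array.
function buildAuthorNameIndex(collection) {
  const index = new Map();
  for (const book of collection) {
    for (const author of book.authors) {
      if (!index.has(author.name)) index.set(author.name, []);
      index.get(author.name).push(book);
    }
  }
  return index;
}

const authorNameIndex = buildAuthorNameIndex(books);
// Direct lookup by key instead of a scan over all books.
console.log((authorNameIndex.get("Mike Dirolf") || []).map(b => b.title));
```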
43. Summary
• Schema design is different in MongoDB
• Basic data design principles apply
• Focus on how the application accesses and
manipulates data
• Evolve schema to meet changing
requirements
• Application-level logic is important!
Flexibility – Ability to represent rich data structures
Performance – Benefit from data locality
Concrete example of a typical blog using a document-oriented, denormalized approach
Tools for data access
Tools for data manipulation
Slow to get address data every time you query for a user. Requires an extra operation.
A patron may have two addresses. In a relational database, you would need a separate table for this. With MongoDB, you simply start storing the address field as an array. Only patrons with multiple addresses need this schema, so no migration is necessary. Caution: additional application logic is required!
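The extra application logic amounts to accepting both shapes of the field when reading. The following is a minimal sketch, assuming the field is called `address` and holds either a single embedded document or an array of them; the patron documents and the `getAddresses` helper are hypothetical.

```javascript
// Hypothetical patrons: one with a single embedded address,
// one already storing an array of addresses, one with none.
const patronSingle = {
  _id: "patron1",
  address: { street: "123 Example St.", city: "Exampleton" }
};
const patronMulti = {
  _id: "patron2",
  address: [
    { street: "1 First Ave.", city: "Exampleton" },
    { street: "2 Second Ave.", city: "Sampleville" }
  ]
};
const patronNone = { _id: "patron3" };

// Always return an array, whichever shape the document uses.
function getAddresses(patron) {
  if (patron.address === undefined || patron.address === null) return [];
  return Array.isArray(patron.address) ? patron.address : [patron.address];
}

for (const patron of [patronSingle, patronMulti, patronNone]) {
  console.log(patron._id, getAddresses(patron).length);
}
```

Keeping this normalization in one function means the rest of the application never has to know which patrons were written under the older shape.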
The publisher is repeated for every book: data duplication!
The publisher is better modeled as a separate entity with its own collection.
Now to create a relation between the two entities, you can choose to reference the publisher from the book document. This is similar to the relational approach for this very same problem.
OR: because we are using MongoDB and documents can have arrays, you can choose to model the relation by creating and maintaining an array of books within each publisher entity. Be careful with mutable, growing arrays. See next slide.
Costly for a small number of books because to get the publisher
And data locality provides speed
tie back to examples, give some concrete scenarios
Authors often use pseudonyms for a book even though it’s the same individual.
To get books by a particular author:
– Get the author
– Get books that have that author id in the array
To get the authors given a book: a single query.
To get books by a particular author:
– Get the author id
– Get books that have that author id in the array
Getting the titles of books published by an author is a single query.
Getting the authors of a book takes 2 queries:
– Get the book id
– Query the authors for that id in their books
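The two-query case in the last note can be sketched as follows, assuming each author document keeps a `books` array of book ids. The `authors` collection name, the `books` field on authors, and the third author are assumptions for illustration. In the mongo shell the second step would be `db.authors.find({ books: book._id })`, which matches documents whose array contains the value.

```javascript
// In-memory stand-ins for the `books` and `authors` collections.
const books = [{ _id: "123456789", title: "MongoDB: The Definitive Guide" }];
const authors = [
  { _id: "kchodorow", name: "Kristina Chodorow", books: ["123456789"] },
  { _id: "mdirolf", name: "Mike Dirolf", books: ["123456789"] },
  { _id: "someone", name: "Hypothetical Author", books: ["000000000"] }
];

// Query 1: get the book's id.
const book = books.find(b => b.title === "MongoDB: The Definitive Guide");

// Query 2: authors whose `books` array contains that id.
const bookAuthors = authors.filter(a => a.books.includes(book._id));

console.log(bookAuthors.map(a => a.name));
```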