15. > patron = db.patrons.find({ _id : "joe" })
{
  _id: "joe",
  name: "Joe Bookreader",
  address: {
    street: "123 Fake St.",
    city: "Faketon",
    state: "MA",
    zip: 12345
  }
}
A Patron and their Address
16. One-to-One Relationships
• “Belongs to” relationships are often embedded.
• Holistic representation of entities with their
embedded attributes and relationships.
• Optimized for read performance
20. Migration Possibilities
• Migrate all documents when the schema changes.
• Migrate On-Demand
– As we pull up a patron’s document, we make the change.
– Any patrons that never come into the library never get
updated.
• Leave it alone
– As long as the application knows about both types…
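The on-demand option above can be sketched as a small upgrade step that runs whenever a patron document is loaded. This is a minimal sketch in plain JavaScript, assuming the old shape stores a single `address` subdocument and the new shape stores an `addresses` array. The function name `upgradePatron` is hypothetical, and writing the upgraded document back to the database is left to the application.

```javascript
// Hypothetical helper: upgrade a patron document to the new shape on read.
// Assumes old documents carry a single `address` and new ones an `addresses` array.
function upgradePatron(patron) {
  // Already in the new shape: nothing to do.
  if (Array.isArray(patron.addresses)) {
    return { patron, changed: false };
  }
  // Copy every field except the old `address`, then wrap it in an array.
  const { address, ...rest } = patron;
  const upgraded = { ...rest, addresses: address ? [address] : [] };
  return { patron: upgraded, changed: true };
}

const oldDoc = {
  _id: "joe",
  name: "Joe Bookreader",
  address: { street: "123 Fake St.", city: "Faketon", state: "MA", zip: 12345 }
};

const result = upgradePatron(oldDoc);
// When `changed` is true, the application would save `result.patron`.
console.log(result.changed, result.patron.addresses.length);
```

Patrons whose documents are never loaded keep the old shape, which is why the application still has to recognize both.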
22. Book
MongoDB: The Definitive Guide,
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA
35. > authors = db.authors.find({ _id : "kchodorow" })
{
  _id: "kchodorow",
  name: "Kristina Chodorow",
  hometown: "Cincinnati",
  books: [ { id: "123", title: "MongoDB: The Definitive Guide" } ]
}
> book = db.books.find({ _id : "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  authors: [
    { id: "kchodorow", name: "Kristina Chodorow" },
    { id: "mdirolf", name: "Mike Dirolf" }
  ]
}
Links on both Authors and Books
36. Linking vs. Embedding
• Embedding
– Great for read performance
– Writes can be slow
– Data integrity needs to be managed
• Linking
– Flexible
– Data integrity is built-in
– Work is done during reads
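The read-side trade-off above can be made concrete with a minimal in-memory sketch. Two `Map` objects stand in for collections (an assumption for illustration; no database driver is involved), and a counter shows that the embedded patron needs one lookup while the linked patron needs two. The names `findById`, `readEmbedded`, `readLinked`, and the `address_id` field are hypothetical.

```javascript
// Maps stand in for collections; `lookups` counts simulated queries.
let lookups = 0;
function findById(collection, id) {
  lookups += 1;
  return collection.get(id);
}

// Embedded: the address lives inside the patron document.
const patronsEmbedded = new Map([
  ["joe", { _id: "joe", name: "Joe Bookreader", address: { city: "Faketon", state: "MA" } }]
]);

// Linked: the patron stores only a reference to a separate address document.
const patronsLinked = new Map([
  ["joe", { _id: "joe", name: "Joe Bookreader", address_id: "addr1" }]
]);
const addresses = new Map([
  ["addr1", { _id: "addr1", city: "Faketon", state: "MA" }]
]);

function readEmbedded(id) {
  lookups = 0;
  const patron = findById(patronsEmbedded, id);
  return { city: patron.address.city, lookups };
}

function readLinked(id) {
  lookups = 0;
  const patron = findById(patronsLinked, id);
  const address = findById(addresses, patron.address_id);
  return { city: address.city, lookups };
}

console.log(readEmbedded("joe")); // one simulated query
console.log(readLinked("joe"));   // two simulated queries
```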
39. > book = db.books.find({ _id : "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  categories: ["MongoDB", "Databases", "Programming"]
}
> db.books.find({ categories: "Databases" })
Categories as an Array
40. > book = db.books.find({ _id : "123" })
{
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  category: "Programming/Databases/MongoDB"
}
> db.books.find({ category: /^Programming\/Databases/ })
Categories as a Path
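A prefix query like the one on this slide depends on the pattern being anchored at the start of the string. As a minimal sketch in plain JavaScript (no database involved), the hypothetical helper `categoryPrefix` builds such an anchored expression from a path and escapes regex metacharacters so category names are matched literally. The books with ids "456" and "789" are sample data invented for illustration.

```javascript
// Hypothetical helper: build an anchored regex that matches a category path
// itself or any path nested beneath it.
function categoryPrefix(path) {
  // Escape regex metacharacters so names match literally.
  const escaped = path.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // "^" anchors at the start; "(/|$)" keeps "Databases" from matching "DatabasesX".
  return new RegExp("^" + escaped + "(/|$)");
}

const books = [
  { _id: "123", category: "Programming/Databases/MongoDB" },
  { _id: "456", category: "Programming/Languages/JavaScript" },
  { _id: "789", category: "Cooking/Databases" }
];

const pattern = categoryPrefix("Programming/Databases");
const matches = books.filter((b) => pattern.test(b.category)).map((b) => b._id);
console.log(matches);
```

Because the match is anchored, "Cooking/Databases" is excluded even though it contains the same category name in a different hierarchy.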
41. Conclusion
• Schema design is different in MongoDB
• Basic data design principles stay the same
• Focus on how an application accesses/manipulates
data
• Evolve the schema to meet requirements as they
change
Schema Design is very important; its impact on your application is pervasive.
We call the “dynamic” nature of a schema in MongoDB an “Application Defined Schema”.
Wrong data structure will hurt you.
Proper data structure can make all the pieces fall into place.
A document is JSON.
A value can be an integer, string, document, array, array of documents, etc…
Focus on the way we store our data, neglecting the way we use it.
Focus on how we use our data, neglecting (sort-of) how we store it.
Has all the answers, but none can be given in an optimal way.
Has zero knowledge of your application’s known queries, use cases, or client-side data structures.
Has all the answers, but also knows what questions are going to be asked.
Takes advantage of known queries, use cases, and client-side data structures.
Imagine a patron walks up to the counter and presents his/her library card to check out some books. The first thing a librarian might want to do is confirm the patron’s address so as to have a place to send the library police when the book isn’t returned in a timely manner.
This is entirely doable, and might be advantageous in a number of other use cases. But since we want to look up the patron and their address at the same time, this is inefficient because it requires 2 queries.
Embedded directly into the patron document.
Only 1 query is necessary.
Holistic view of a patron.
Read performance is optimized because we only need a single query and a single disk/memory hit. Write performance change is negligible.
Business Requirements Change!
A librarian wants to know all the places his/her book might be hiding out, and more addresses for a patron means more places to look.
Now, just store addresses as an array.
Embedded directly into the patron document.
Only 1 query is necessary.
Holistic view of a patron.
Schema isn’t rigid, but dynamic.
An application defines the schema, and having two ways to represent addresses is entirely possible.
Duplicate the publisher in every book that the publisher has published. Data duplication is OK because the publisher data is immutable.
Best way to figure out how something is going to perform is to measure.
We still have the previous question: who is the publisher of this book? Answering it takes 2 queries.
Same problems that exist in traditional systems. Foreign keys, while keeping data integrity, tend to erase history.
Unbounded arrays are BAD!
Take advantage of data that’s immutable. Duplicate data is OK.
Recursive search to find all books about databases.
When a category hierarchy gets changed, all documents will need to be re-categorized.
If one category name exists in multiple hierarchies, then further refinement would need to happen.
Uses a multi-key index.
When a category hierarchy gets changed, all documents will need to be re-categorized.
Uses an index because of the anchored regular expression.