12. Tools for Working with Data
• Dynamic Schemas
• Embedded data structures
• Ad-hoc queries
– Simple Queries
– Aggregation Framework
• Secondary indexes
• Multi-Key indexes
13. Tools for Manipulating Data
• On the way out
– Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte
– Vector: $in, $nin, $all, $size
• On the way in
– Scalar: $inc, $set, $unset
– Vector: $push, $pop, $pull, $pushAll, $pullAll, $addToSet
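As a rough illustration of how these operators appear in practice, the query and update documents below are written as plain JavaScript objects. The field names (`pages`, `language`, `checkouts`, `tags`) are assumptions made for this sketch, not part of the original slides; in the mongo shell such objects would typically be passed to `db.books.find(...)` and `db.books.update(...)`.

```javascript
// Query documents ("on the way out"). Field names are hypothetical.
const scalarQuery = {
  pages: { $gt: 100, $lte: 500 },  // range test on a scalar field
  language: { $ne: "French" },     // scalar inequality
  publisher: { $exists: true }     // field must be present
};

const vectorQuery = {
  // matches when the authors array contains any of the listed values
  authors: { $in: ["Kristina Chodorow", "Mike Dirolf"] }
};

// Update documents ("on the way in").
const scalarUpdate = {
  $inc: { checkouts: 1 },          // increment a counter field
  $set: { language: "English" }    // overwrite a single field
};

const vectorUpdate = {
  $addToSet: { tags: "databases" } // append only if not already in the array
};

// In the mongo shell these would be used roughly as:
//   db.books.find(scalarQuery)
//   db.books.update({ _id: "123" }, scalarUpdate)
console.log(JSON.stringify({ scalarQuery, vectorQuery, scalarUpdate, vectorUpdate }));
```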
18. One to One Relations
• “Belongs to” relationships are often embedded
• Document model provides a holistic
representation of objects with embedded entities
• Optimized for read performance
19. Use Case #2
As a Librarian, I want to
store multiple addresses
so I have a better chance
of getting my book back.
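A minimal sketch of what storing multiple addresses might look like. The patron document, its field names, and the address values are all hypothetical; the point is only that a single embedded address can become an array without a schema migration.

```javascript
// Hypothetical patron document: names and values are illustrative only.
const patron = {
  _id: "joe",
  name: "Joe Bookreader",
  addresses: [
    { street: "1 Example St.", city: "Faketon", state: "MA", zip: "12345" },
    { street: "2 Sample Ave.", city: "Faketon", state: "MA", zip: "12346" }
  ]
};

// Adding another address just appends to the array.
// Shell equivalent (assumed collection name "patrons"):
//   db.patrons.update({ _id: "joe" }, { $push: { addresses: newAddress } })
function addAddress(doc, address) {
  // Return a copy so the original document is left untouched.
  return { ...doc, addresses: [...doc.addresses, address] };
}

const updated = addAddress(patron, {
  street: "3 Demo Rd.", city: "Boston", state: "MA", zip: "02101"
});
console.log(updated.addresses.length);
```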
23. Book Data
MongoDB: The Definitive Guide,
By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA
24. Book Model with Embedded Publisher
book = {
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English",
  publisher: {
    name: "O’Reilly Media",
    founded: "1980",
    location: "CA"
  }
}
25. Book Model with Embedded Publisher
• Optimized for read performance of Books
• Other queries are difficult
– All publishers
26. Use Case #4
As a Librarian, I want to
see all the publishers in
the system.
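To see why this use case is awkward with embedded publishers, here is a small in-memory sketch: every book has to be visited and the embedded publishers de-duplicated. The second book is invented for illustration. In the mongo shell, something like `db.books.distinct("publisher.name")` would do similar work on the server side.

```javascript
// Books with embedded publishers; the second book is a made-up example.
const books = [
  { _id: "123", title: "MongoDB: The Definitive Guide",
    publisher: { name: "O’Reilly Media", founded: "1980", location: "CA" } },
  { _id: "456", title: "Hypothetical Second Book",
    publisher: { name: "O’Reilly Media", founded: "1980", location: "CA" } }
];

// Collect distinct publishers by name. Every book must be scanned,
// because no separate list of publishers exists anywhere.
function allPublishers(bookList) {
  const byName = new Map();
  for (const book of bookList) {
    if (!byName.has(book.publisher.name)) {
      byName.set(book.publisher.name, book.publisher);
    }
  }
  return [...byName.values()];
}

console.log(allPublishers(books).map(p => p.name));
```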
27. Book Model with a Publisher Link
publisher = {
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA"
}
book = {
  _id: "123",
  publisher_id: "oreilly",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
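A sketch of the trade-off this model makes, using in-memory arrays in place of collections (the collection names `publishers` and `books` are assumptions). Listing publishers becomes a plain read, while showing a book together with its publisher now takes a second lookup.

```javascript
const publishers = [
  { _id: "oreilly", name: "O’Reilly Media", founded: "1980", location: "CA" }
];
const books = [
  { _id: "123", publisher_id: "oreilly", title: "MongoDB: The Definitive Guide" }
];

// Listing all publishers no longer touches books.
// Shell equivalent: db.publishers.find()
function allPublisherNames() {
  return publishers.map(p => p.name);
}

// Loading a book with its publisher follows the link in a second step.
// Shell equivalent:
//   var b = db.books.findOne({ _id: "123" });
//   db.publishers.findOne({ _id: b.publisher_id });
function bookWithPublisher(bookId) {
  const book = books.find(b => b._id === bookId);
  if (!book) return null;
  const publisher = publishers.find(p => p._id === book.publisher_id) || null;
  return { ...book, publisher };
}

console.log(allPublisherNames());
console.log(bookWithPublisher("123").publisher.name);
```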
28. Use Case #5
As a Librarian, I want to
see all the books a
publisher has published.
29. Publisher Model with Book Links
publisher = {
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: "1980",
  location: "CA",
  books: ["123", …]
}
book = {
  _id: "123",
  title: "MongoDB: The Definitive Guide",
  authors: [ "Kristina Chodorow", "Mike Dirolf" ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
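With book links stored on the publisher, finding a publisher's books might look like the two-step sketch below. The arrays stand in for collections, and the second book is invented so the filter has something to exclude.

```javascript
const publishers = [
  { _id: "oreilly", name: "O’Reilly Media", books: ["123"] }
];
const books = [
  { _id: "123", title: "MongoDB: The Definitive Guide" },
  { _id: "999", title: "Hypothetical Book From Another Publisher" }
];

// Step 1: load the publisher. Step 2: load the books named in its list.
// Shell equivalent:
//   var p = db.publishers.findOne({ _id: "oreilly" });
//   db.books.find({ _id: { $in: p.books } });
function booksByPublisher(publisherId) {
  const publisher = publishers.find(p => p._id === publisherId);
  if (!publisher) return [];
  const ids = new Set(publisher.books);
  return books.filter(b => ids.has(b._id));
}

console.log(booksByPublisher("oreilly").map(b => b.title));
```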
30. Use Case #6
As a Librarian, I want to
find the author(s) of
book “Foo”.
37. Linking vs. Embedding
• Embedding
– Great for read performance
• One seek to load entire object
• One roundtrip to database
– Writes can be slow
– Maintaining data integrity
• Linking
– More flexibility
– Data integrity is maintained
– Work is done during reads
44. Single Table Inheritance
id  type    area  radius  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10            5       2
• Sparse data
• Is a missing value optional, or is it an error?
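One way the same rows might be stored as documents, sketched under the assumption of a `shapes` collection: each document carries only the fields that apply to its type, so there are no empty columns.

```javascript
// The three table rows as documents; absent fields are simply omitted.
const shapes = [
  { _id: 1, type: "circle", area: 3.14, radius: 1 },
  { _id: 2, type: "square", area: 4, length: 2 },
  { _id: 3, type: "rect", area: 10, length: 5, width: 2 }
];

// Find every shape that has a radius.
// Shell equivalent: db.shapes.find({ radius: { $exists: true } })
const withRadius = shapes.filter(s => "radius" in s);

// Find shapes by a field they share.
// Shell equivalent: db.shapes.find({ area: { $gt: 3.5 } })
const largerShapes = shapes.filter(s => s.area > 3.5);

console.log(withRadius.map(s => s.type), largerShapes.map(s => s.type));
```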
46. Summary
• Schema design is different in MongoDB
• Basic data design principles stay the same
• Focus on how application accesses/manipulates
data
• Rapidly evolve schema to meet your
requirements
Concrete example of a typical blog using a document-oriented, denormalized approach.
Represent rich data structures and complex relationships while keeping that data together on disk.
Focus on the way we store our data, neglecting the way we use it.
Document design cares first about how it’s used and we let that drive how we store the data.
Tools for data access
Slow to get address data every time you query for a user. Requires an extra operation.
Patron may have multiple addresses. With MongoDB, you simply start storing the address field as an array.
Data duplication is OK! Publisher is immutable.
The best way to figure out how something is going to perform is to measure it.
What happens when oreilly moves? Do all the books have their publisher location changed?
Keep in mind that documents that keep growing are not good.
To get the authors given a book:
- Single query
To get books by a particular author:
- Get the author id
- Get books that have that author id in the array
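The two access paths in this note can be sketched as follows. The slide showing the actual model is not included here, so the shape below (author `_id` and `name` embedded in each book's `authors` array) and the author ids are assumptions made for illustration.

```javascript
// Assumed model: each book embeds its authors' ids and names.
const authors = [
  { _id: "kchodorow", name: "Kristina Chodorow" },
  { _id: "mdirolf", name: "Mike Dirolf" }
];
const books = [
  { _id: "123", title: "MongoDB: The Definitive Guide",
    authors: [
      { _id: "kchodorow", name: "Kristina Chodorow" },
      { _id: "mdirolf", name: "Mike Dirolf" }
    ] }
];

// Authors of a book: one lookup, since the names travel with the book.
// Shell equivalent: db.books.findOne({ title: "..." }).authors
function authorsOfBook(title) {
  const book = books.find(b => b.title === title);
  return book ? book.authors.map(a => a.name) : [];
}

// Books by an author: get the author id, then match it inside the array.
// Shell equivalent: db.books.find({ "authors._id": authorId })
function booksByAuthor(name) {
  const author = authors.find(a => a.name === name);
  if (!author) return [];
  return books.filter(b => b.authors.some(a => a._id === author._id));
}

console.log(authorsOfBook("MongoDB: The Definitive Guide"));
console.log(booksByAuthor("Mike Dirolf").map(b => b.title));
```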
Getting the title of a book published by an author is a single query. Getting the authors of a book takes 2 queries: get the book id, then query the authors for books containing that id.
Rule is to measure.
Easy to query by parent category. Hard to find in subcategories.
Immediate parent is a regexp query that is anchored to the beginning. Anywhere in the hierarchy is a regexp query. Not indexed. Hierarchy information cannot be changed.
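The slide behind this note is not included, so the following is only a guess at the kind of model it describes: each category stores its ancestors as a delimited `path` string, and hierarchy queries become regular expressions on that string. The path format and category ids are hypothetical.

```javascript
// Hypothetical categories; `path` lists ancestor ids from the root,
// each followed by a comma.
const categories = [
  { _id: "a", path: "" },
  { _id: "b", path: "a," },
  { _id: "c", path: "a,b," },
  { _id: "d", path: "a,b,c," }
];

// Anchored to the beginning: everything whose path starts at root "a".
// Shell equivalent: db.categories.find({ path: /^a,/ })
const underA = categories.filter(c => /^a,/.test(c.path)).map(c => c._id);

// Anywhere in the hierarchy: everything that has "c" somewhere above it.
// The pattern is not anchored to the start of the string.
// Shell equivalent: db.categories.find({ path: /(^|,)c,/ })
const belowC = categories.filter(c => /(^|,)c,/.test(c.path)).map(c => c._id);

console.log(underA, belowC);
```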