- The document discusses modeling data in MongoDB based on cardinality and access patterns.
- It provides examples of embedding related data for one-to-one and one-to-many relationships, and references for large collections.
- The document recommends considering read/write patterns and embedding objects for efficient access, while breaking out data if it grows too large.
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
25% Code JoeD discount on Back to Basics webinar
1.
2. Code JoeD gets you a 25% discount off the list price
Early Bird Registration Ends May 13, 2016
3. Back to Basics 2016 : Webinar 3
Thinking in Documents
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole
V1.1
4. 4
Review
• Webinar 1 : Introduction to NoSQL
– Types of NoSQL database
– MongoDB is a document database
– Replica Sets and Shards
• Webinar 2
– Building a basic application
– Adding indexes
– Using Explain to measure database operators
5. 5
Thinking in Documents
• Documents in MongoDB are Javascript Objects (JSON)
• Actually they are encoded as BSON
• BSON is “Binary JSON”
• BSON allows efficient encoding and decoding of JSON
• Required for efficient transmission and storage on disk
• Eliminates the need to “text parse” all the sub objects
• Full spec is online at http://bsonspec.org/
6. 6
Example Document
{
first_name: ‘Paul’,
surname: ‘Miller’,
cell: 447557505611,
city: ‘London’,
location: [45.123,47.232],
Profession: [‘banking’, ‘finance’, ‘trader’],
cars: [
{ model: ‘Bentley’,
year: 1973,
value: 100000, … },
{ model: ‘Rolls Royce’,
year: 1965,
value: 330000, … }
]
}
Fields can contain an array
of sub-documents
Fields
Typed field values
Fields can
contain arrays
7. 7
Data Stores – Key Value
Key 1 Value
Key 1 Value
Key 1 Value
8. 8
Data Stores - Relational
Key 1
Value 1
Value 1
Value 1
Value 1
Key 2
Value 1
Value 1
Value 1
Value 1
Key 3
Value 1
Value 1
Value 1
Value 1
Key 4
Value 1
Value 1
Value 1
Value 1
9. 9
Data Stores - Document
Key3
Key4
Key5
Value 3
Value 5
Value 4Key6
Value 5Key7
Value 2
Value 1Key1
Key1
Key1
Key2
11. 11
Some Example Queries
# Will find the first two documents
db.demo.find( { “key1” : “value” } )
# find the second document by nested value
db.demo.find( { "key1.key3.key4" : "value 3" } )
# will find the third document
db.demo.find( { "key1.key6" : "value 6" } )
12. 12
Modelling and Cardinality
• One to One
–Title to blog post
• One to Many
–Blog post to comments
• One to Millions
–Blog post to site views (e.g. Huffington Post)
13. 13
One To One
{
“Title” : “This is a blog post”,
“Body” : “This is the body text of a very
short blog post”,
…
}
We can index on “Title” and “Body”.
14. 14
One to Many
{
“Title” : “This is a blog post”,
“Body” : “This is the body text”,
“Comments” : [ { “name” : “Joe Drumgoole”,
“email” : “Joe.Drumgoole@mongodb.com”,
“comment” : “I love your writing style” },
{ “name” : “John Smith”,
“email” : “John.Smith@example.com”,
“comment” : “I hate your writing style” }]
}
Where we expect a small number of comments we can embed them
in the main document
15. 15
Key Concerns
• What are the write patterns?
–Comments are added more frequently than posts
–Comments may have images, tags, large bodies of text
• What are the read patterns?
–Comments may not be displayed
–May be shown in their own window
–People rarely look at all the comments
16. 16
Approach 2 – Separate Collection
• Keep all comments in a separate comments collection
• Add references to comments as an array of comment IDs
• Requires two queries to display blog post and associated comments
• Requires two writes to create a comments
{
_id : ObjectID( “AAAA” ),
name : “Joe Drumgoole”,
email : “Joe.Drumgoole@mongodb.com”,
comment :“I love your writing style”,
}
{
_id : ObjectID( “AAAB” ),
name : “John Smith”,
email : “Joe.Drumgoole@mongodb.com”,
comment :“I hate your writing style”,
}
{
“_id” : ObjectID( “ZZZZ” ),
“Title” : “A Blog Title”,
“Body” : “A blog post”,
“comments” : [ ObjectID( “AAAA” ),
ObjectID( “AAAB” )]
}
{
“_id” : ObjectID( “ZZZZ” ),
“Title” : “A Blog Title”,
“Body” : “A blog post”,
“comments” : []
}
17. 17
Approach 3 – A Hybrid Approach
{
“_id” : ObjectID( “ZZZZ” ),
“Title” : “A Blog Title”,
“Body” : “A blog post”,
“comments” : [{
“_id” : ObjectID( “AAAA” )
“name” : “Joe Drumgoole”,
“email” : “Joe.D@mongodb.com”,
comment :“I love your writing style”,
}
{
_id : ObjectID( “AAAB” ),
name : “John Smith”,
email : “Joe.Drumgoole@mongodb.com”,
comment :“I hate your writing style”,
}]
}
{
“_post_jd” : ObjectID( “ZZZZ” ),
“comments” : [{
“_id” : ObjectID( “AAAA” )
“name” : “Joe Drumgoole”,
“email” : “Joe.D@mongodb.com”,
“comment” :“I love your writing
style”,
}
{...},{...},{...},{...},{...},{...}
,{..},{...},{...},{...} ]
18. 18
What About One to A Million
• What is we were tracking mouse position for heat tracking?
– Each user will generate hundreds of data points per visit
– Thousands of data points per post
– Millions of data points per blog site
• Reverse the model
– Store a blog ID per event
{
“post_id” : ObjectID(“ZZZZ”),
“timestamp” : ISODate("2005-01-02T00:00:00Z”),
“location” : [24, 34]
“click” : False,
}
20. 20
Guidelines
• Embed objects for one to one capabilities
• Look at read and write patterns to determine when to break out data
• Don’t get stuck in “one record” per item thinking
• Embrace the hierarchy
• Think about cardinality
• Grow your data by adding documents not be increasing document size
• Think about your indexes
• Document updates are transactions