Thinking in Documents
Perl Engineer & Evangelist, MongoDB, Inc
Mike Friedman
#mongodb
@friedo
Agenda
• What is a Record?
• Core Concepts
• What is an Entity?
• Associating Entities
• General Recommendations
All application development is Schema Design
Success comes from Proper Data Structure
What is a Record?
Key → Value
• One-dimensional storage
• Single value is a blob
• Query on key only
• No schema
• Value cannot be updated, only replaced
Key Blob
Relational
• Two-dimensional storage (tuples)
• Each field contains a single value
• Query on any field
• Very structured schema (table)
• In-place updates
• Normalization requires many tables, joins, and indexes, and leads to poor data locality
Primary Key
Document
• N-dimensional storage
• Each field can contain 0, 1, many, or embedded values
• Query on any field & level
• Flexible schema
• Inline updates *
• Embedding related data has optimal data locality, requires fewer indexes, and performs better
_id
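The three record models above can be compared side by side in a small sketch. This uses plain Python structures rather than any database; the contact data, the table layouts, and the helper `city_from_tables` are all hypothetical and exist only to show how the shape of a record changes what a lookup has to do.

```python
# Hypothetical contact data, used only for illustration.

# Key -> value: the store sees one opaque blob; only the key is queryable.
kv_store = {"contact:1": b'{"name": "Ann", "city": "Paris"}'}

# Relational: flat tuples with one value per field; related data sits in another table.
contacts = [(1, "Ann")]            # (contact_id, name)
addresses = [(10, 1, "Paris")]     # (address_id, contact_id, city)

# Document: a field may hold an array or an embedded document.
document = {
    "_id": 1,
    "name": "Ann",
    "address": {"city": "Paris"},
    "phones": ["555-0100", "555-0101"],
}


def city_from_tables(contact_id):
    """Relational-style lookup: match the foreign key in the address table (a join)."""
    for _address_id, owner_id, city in addresses:
        if owner_id == contact_id:
            return city
    return None


# The document answers the same question with a single nested access.
city_from_document = document["address"]["city"]
```

In the document form the related data travels with its parent, which is the data-locality point made on the slide.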
Core Concepts
Traditional Schema Design
Focus on data storage
Document Schema Design
Focus on data use
What answers do I have?
What questions do I have?
Schema Design is
Flexible
Flexibility
• Choices for schema design
• Each record can have different fields
• Field names consistent for programming
• Common structure can be enforced by application
• Easy to evolve as needed
Building Blocks of Document Schema Design
1 - Arrays
[
1, 2, 3, "four",
5, "six", [ 7, 8, 9 ]
]
1 – Arrays
Multiple Values per Field
Each field in a document can be:
• Absent
• Set to null
• Set to a single value
• Set to an array of many values
1 – Arrays
Multiple Values per Field
• Query for any matching value
– Can be indexed, and each value in the array is in the index
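A rough sketch of what "query for any matching value" and "each value in the array is in the index" can mean in practice. This is plain Python, not MongoDB's actual implementation; the `tags` field and the helpers `values_of`, `find_by_value`, and `build_multikey_index` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical documents covering the four cases: array, single value, null, absent.
contacts = [
    {"_id": 1, "name": "Ann", "tags": ["friend", "work"]},
    {"_id": 2, "name": "Bob", "tags": "work"},
    {"_id": 3, "name": "Cy", "tags": None},
    {"_id": 4, "name": "Di"},
]


def values_of(doc, field):
    """Normalize a field to a list of values (empty for null or absent)."""
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_by_value(docs, field, wanted):
    """Return the _id of every document where any value of the field matches."""
    return [d["_id"] for d in docs if wanted in values_of(d, field)]


def build_multikey_index(docs, field):
    """Index every array element separately, so each value points at its documents."""
    index = defaultdict(list)
    for d in docs:
        for v in values_of(d, field):
            index[v].append(d["_id"])
    return dict(index)


work_ids = find_by_value(contacts, "tags", "work")
tag_index = build_multikey_index(contacts, "tags")
```

The same query works whether the field holds one value or many, which is why the application does not need a separate code path for arrays.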
2 – Embedded Documents
{
"foo": 42,
"bar": 43,
"stuff": { ... },
...
}
2 - Embedded Documents
• A value in a document can be another document
• Nested documents provide structure
• Query any field at any level
– Can be indexed
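"Query any field at any level" is usually written with a dotted path. The following is a minimal sketch of resolving such a path against nested dictionaries; `get_path` is a hypothetical helper, and the contents of `stuff` are invented because the slide elides them.

```python
def get_path(doc, path):
    """Follow a dotted path such as "stuff.size.w" through nested documents.

    Returns None when any step of the path is missing.
    """
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# The nested values under "stuff" are made up for this example.
doc = {"foo": 42, "bar": 43, "stuff": {"color": "red", "size": {"w": 3}}}

width = get_path(doc, "stuff.size.w")
```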
What is an Entity?
An Entity
• Object in your model
• Associations with other entities
Referencing (Relational)      Embedding (Document)
has_one                       embeds_one
belongs_to                    embedded_in
has_many                      embeds_many
has_and_belongs_to_many
Let's model something together
How about a business card?
Business Card
Referencing
Addresses
{
  "_id": ...,
  "street": ...,
  "city": ...,
  "state": ...,
  "zip_code": ...,
  "country": ...
}
Contacts
{
  "_id": ...,
  "name": ...,
  "title": ...,
  "company": ...,
  "phone": ...,
  "address_id": ...
}
Embedding
Contacts
{
  "_id": ...,
  "name": ...,
  "title": ...,
  "company": ...,
  "address": {
    "street": ...,
    "city": ...,
    "state": ...,
    "zip_code": ...,
    "country": ...
  },
  "phone": ...
}
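The difference between the two layouts shows up when the application reads a business card. A minimal sketch, assuming in-memory dictionaries stand in for collections and that each dictionary lookup counts as one fetch; all names and values here are hypothetical.

```python
# Referencing: two collections, linked by address_id.
addresses = {100: {"_id": 100, "street": "1 Main St", "city": "Springfield"}}
contacts_ref = {1: {"_id": 1, "name": "Ann", "address_id": 100}}

# Embedding: the address lives inside the contact document.
contacts_emb = {
    1: {"_id": 1, "name": "Ann",
        "address": {"street": "1 Main St", "city": "Springfield"}}
}


def card_by_reference(contact_id):
    """Build the card from two fetches: the contact, then its address."""
    contact = contacts_ref[contact_id]             # fetch 1
    address = addresses[contact["address_id"]]     # fetch 2
    return {"name": contact["name"], "city": address["city"]}, 2


def card_embedded(contact_id):
    """Build the card from a single fetch of the contact document."""
    contact = contacts_emb[contact_id]             # fetch 1
    return {"name": contact["name"], "city": contact["address"]["city"]}, 1


ref_card, ref_fetches = card_by_reference(1)
emb_card, emb_fetches = card_embedded(1)
```

Both functions return the same card; only the number of round trips differs.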
Relational Schema
Contact
• name
• company
• title
• phone
Address
• street
• city
• state
• zip_code
Document Schema
Contact
• name
• company
• title
• phone
• address
– street
– city
– state
– zip_code
How are they different? Why?
Schema Flexibility
{
  "name": ...,
  "title": ...,
  "company": ...,
  "address": {
    "street": ...,
    "city": ...,
    "state": ...,
    "zip_code": ...
  },
  "phone": ...
}
{
  "name": ...,
  "url": ...,
  "title": ...,
  "company": ...,
  "email": ...,
  "address": {
    "street": ...,
    "city": ...,
    "state": ...,
    "zip_code": ...
  },
  "phone": ...,
  "fax": ...
}
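Two documents in the same collection can carry different fields, and any common structure is then enforced by the application. A minimal sketch of that idea, assuming a hypothetical rule that every card needs `name` and `company`; the cards and helper names are invented for illustration.

```python
# Hypothetical rule enforced by the application, not by the database.
REQUIRED = {"name", "company"}

card_a = {"name": "Ann", "title": "CTO", "company": "Acme", "phone": "555-0100"}
card_b = {"name": "Bob", "url": "https://example.com", "company": "Acme",
          "email": "bob@example.com", "fax": "555-0199"}


def missing_required(doc):
    """List the required fields a document lacks."""
    return sorted(REQUIRED - doc.keys())


def display_line(doc):
    """Render a card, tolerating optional fields that may be absent."""
    reach = doc.get("email") or doc.get("phone") or "no contact info"
    return f'{doc["name"]} ({doc["company"]}): {reach}'


line_a = display_line(card_a)
line_b = display_line(card_b)
```

Field names stay consistent across documents, so the same code handles both shapes.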
Example
Let’s Look at an Address Book
Address Book
• What questions do I have?
• What are my entities?
• What are my associations?
Address Book Entity-Relationship
Contacts
• name
• company
• title
Addresses
• type
• street
• city
• state
• zip_code
Phones
• type
• number
Emails
• type
• address
Thumbnails
• mime_type
• data
Portraits
• mime_type
• data
Groups
• name
Twitters
• name
• location
• web
• bio
[Diagram: the entities are linked by associations labeled 1 or N]
Associating Entities
One to One
Schema Design Choices
• contact holds twitter_id (contact 1:1 twitter)
• twitter holds contact_id (contact 1:1 twitter)
Redundant to track relationship on both sides
• Both references must be updated for consistency
• May save a fetch?
• Contact embeds twitter (1)
One to One
General Recommendation
• Full contact info all at once
– Contact embeds twitter
• Parent-child relationship
– "contains"
• No additional data duplication
• Can query or index on embedded field
– e.g., "twitter.name"
• Exceptional cases…
– Reference portrait which has very large data
Contact
• twitter (embedded twitter, 1)
One to Many
Schema Design Choices
• contact holds phone_ids: [ ] (contact 1:N phone)
• phone holds contact_id (contact 1:N phone)
Redundant to track relationship on both sides
• Both references must be updated for consistency
• Not possible in relational DBs
• Save a fetch?
• Contact embeds phones (N)
One to Many
General Recommendation
• Full contact info all at once
– Contact embeds multiple phones
• Parent-children relationship
– "contains"
• No additional data duplication
• Can query or index on any field
– e.g., { "phones.type": "mobile" }
• Exceptional cases…
– Scaling: maximum document size is 16 MB
Contact
• phones (embedded phone, N)
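The query `{ "phones.type": "mobile" }` reaches through an array of embedded documents. The sketch below imitates that matching rule in plain Python; it is not MongoDB's query engine, and `values_at`, `matches`, and the sample contact are hypothetical.

```python
def values_at(doc, path):
    """Collect every value reachable at a dotted path, descending into arrays."""
    current = [doc]
    for key in path.split("."):
        next_level = []
        for node in current:
            candidates = node if isinstance(node, list) else [node]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_level.append(candidate[key])
        current = next_level
    # A final value that is itself an array contributes each of its elements.
    flat = []
    for value in current:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def matches(doc, query):
    """True when, for every path in the query, some value at that path equals the target."""
    return all(wanted in values_at(doc, path) for path, wanted in query.items())


# Hypothetical contact that embeds multiple phones.
contact = {
    "name": "Ann",
    "phones": [
        {"type": "mobile", "number": "555-0100"},
        {"type": "work", "number": "555-0101"},
    ],
}

has_mobile = matches(contact, {"phones.type": "mobile"})
```

Because the phones are embedded, the whole contact comes back in one read and no second collection is consulted.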
Many to Many
Traditional Relational Association
Join table
Contacts
• name
• company
• title
• phone
Groups
• name
GroupContacts
• group_id
• contact_id
Use arrays instead of a join table
Many to Many
Schema Design Choices
• group holds contact_ids: [ ] (group N:N contact)
• contact holds group_ids: [ ] (group N:N contact)
Redundant to track relationship on both sides
• Both references must be updated for consistency
• group embeds contacts (N)
• contact embeds groups (N)
Redundant to track relationship on both sides
• Duplicated data must be updated for consistency
Many to Many
General Recommendation
• Depends on use case
1. Simple address book
• Contact references groups
2. Corporate email groups
• Group embeds contacts for performance
• Exceptional cases
– Scaling: maximum document size is 16MB
– Scaling may affect performance and working set
contact
• group_ids: [ ] (group N:N contact)
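For the simple address book case, "contact references groups" can be sketched with an array of group ids on each contact, which replaces the relational join table. The data and the helpers `groups_of` and `members_of` are hypothetical.

```python
# Hypothetical collections: groups keyed by _id, contacts holding group_ids arrays.
groups = {
    1: {"_id": 1, "name": "family"},
    2: {"_id": 2, "name": "work"},
}
contacts = [
    {"_id": 10, "name": "Ann", "group_ids": [1, 2]},
    {"_id": 11, "name": "Bob", "group_ids": [2]},
]


def groups_of(contact):
    """Resolve a contact's group references to group names."""
    return [groups[group_id]["name"] for group_id in contact["group_ids"]]


def members_of(group_id):
    """Find the contacts whose group_ids array contains the group."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]


ann_groups = groups_of(contacts[0])
work_members = members_of(2)
```

The relationship is stored on one side only, so there is a single place to update when membership changes.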
Document model - holistic and efficient representation
Contacts
• name
• company
• title
addresses
• type
• street
• city
• state
• zip_code
phones
• type
• number
emails
• type
• address
thumbnail
• mime_type
• data
twitter
• name
• location
• web
• bio
Portraits
• mime_type
• data
Groups
• name
[Diagram: the entities are linked by associations labeled 1 or N]
Contact document example
{
  "name" : "Gary J. Murakami, Ph.D.",
  "company" : "MongoDB, Inc.",
  "title" : "Lead Engineer",
  "twitter" : {
    "name" : "Gary Murakami",
    "location" : "New Providence, NJ",
    "web" : "http://www.nobell.org"
  },
  "portrait_id" : 1,
  "addresses" : ...,
  "phones" : ...,
  "emails" : ...
}
Working Set
To reduce the working set, consider…
• Reference bulk data, e.g., portrait
• Reference less-used data instead of embedding
– Extract into referenced child document
Also for performance issues with large documents
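Referencing bulk data can be sketched as moving the portrait out of the contact and leaving only an id behind. This is an illustration under stated assumptions: `split_portrait` and `approx_size` are hypothetical helpers, and the length of the JSON text is used only as a rough stand-in for the stored document size.

```python
import json


def split_portrait(contact, portraits, portrait_id):
    """Move the embedded portrait into a separate collection and keep a reference."""
    slim = {key: value for key, value in contact.items() if key != "portrait"}
    portraits[portrait_id] = contact["portrait"]
    slim["portrait_id"] = portrait_id
    return slim


def approx_size(doc):
    """Rough size proxy: length of the document serialized as JSON text."""
    return len(json.dumps(doc))


# Hypothetical contact with a large embedded portrait.
contact = {
    "name": "Ann",
    "company": "Acme",
    "portrait": {"mime_type": "image/png", "data": "x" * 100_000},
}

portraits = {}
slim_contact = split_portrait(contact, portraits, portrait_id=1)
```

Reads that only need the name and company now touch the slim document, and the portrait is fetched only when it is actually shown.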
General Recommendations
Legacy Migration
1. Copy existing schema & some data to MongoDB
2. Iterate schema design development
Measure performance, find bottlenecks, and embed:
– one to one associations first
– one to many associations next
– many to many associations last
3. Migrate full dataset to new schema
New software application? Embed by default
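The embedding step of a migration can be sketched as folding child rows into their parent documents. This assumes rows have already been copied out of the legacy tables as dictionaries; the row layouts and the helper `embed_children` are hypothetical.

```python
# Hypothetical rows copied from two legacy tables.
contact_rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
phone_rows = [
    {"contact_id": 1, "type": "mobile", "number": "555-0100"},
    {"contact_id": 1, "type": "work", "number": "555-0101"},
]


def embed_children(parents, children, foreign_key, field):
    """Group child rows by parent id and embed them as an array on each parent."""
    by_parent = {}
    for row in children:
        # Drop the foreign key: the parent document now provides that context.
        child = {key: value for key, value in row.items() if key != foreign_key}
        by_parent.setdefault(row[foreign_key], []).append(child)
    return [{**parent, field: by_parent.get(parent["id"], [])} for parent in parents]


contact_docs = embed_children(contact_rows, phone_rows, "contact_id", "phones")
```

Running this kind of step on a sample of the data first makes it possible to measure the new schema before migrating the full dataset.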
Embedding over Referencing
• Embedding is a bit like pre-joined data
– BSON (Binary JSON) document ops are easy for the server
• Embed (90/10 rule of thumb, as follows)
– When the "one" or "many" objects are viewed in the context of their parent
– For performance
– For atomicity
• Reference
– When you need more scaling
– For easy consistency with "many to many" associations without duplicated data
It’s All About Your Application
• Programs + Databases = (Big) Data Applications
• Your schema is the impedance matcher
– Design choices: normalize/denormalize, reference/embed
– Melds programming with MongoDB for best of both
– Flexible for development and change
• Programs × MongoDB = Great Big Data Applications
Thank You
Perl Engineer & Evangelist, MongoDB
Mike Friedman
#mongodb
@friedo

Back to Basics 1: Thinking in documents
