Lead Engineer / Evangelist
Gary J. Murakami, Ph.D.
#MongoDB
Schema Design
Schema Design – Gary Murakami
Chess 4.5 (Northwestern University)
Larry Atkin & Dave Slate
Agenda
• What is a Record?
• Core Concepts
• What is an Entity?
• Associating Entities
• General Recommendations
• Questions
All application development is Schema Design
Success comes from Proper Data Structure
What is a Record?
Key → Value
• One-dimensional
• Single value is a blob
• Query on key only
• No schema
• Value cannot be updated, only replaced
Key Blob
Relational
• Two-dimensional (tuples)
• Each field is a single value
• Query on any field
• Very structured schema (table)
• In-place updates *
• Normalization requires many tables, joins, and indexes, and results in poor data locality and performance
Primary Key
Document
• N-dimensional
• Each field can contain 0, 1,
many, or embedded values
• Query on any field & level
• Flexible schema
• Inline updates *
• Embedding related data has optimal data locality, requires fewer indexes, and has better performance
_id
Core Concepts
Traditional Schema Design
Focus on data storage
Document Schema Design
Focus on data use
Another way to think about it
Traditional: What answers do I have?
Document: What questions do I have?
Three Building Blocks of Document Schema Design
1 – Flexibility
• Choices for schema design
• Each record can have different fields
• Field names consistent for programming
• Common structure can be enforced by application
• Easy to evolve as needed
2 – Arrays
Multiple Values per Field
• Each field can be:
– Absent
– Set to null
– Set to a single value
– Set to an array of many values
• Query for any matching value
– Can be indexed and each value in the array is in the index
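The idea that each array value gets its own index entry (MongoDB calls this a multikey index) can be sketched with plain Python dicts. This is only an illustration under stated assumptions: the contacts, the `groups` field, and `build_multikey_index` are hypothetical names, not MongoDB API, and absent and null fields are both treated as `None` for simplicity.

```python
from collections import defaultdict

# Hypothetical contacts; "groups" is an array field invented for this sketch.
contacts = [
    {"_id": 1, "name": "Contact A", "groups": ["work", "chess"]},  # array of values
    {"_id": 2, "name": "Contact B", "groups": "work"},             # single value
    {"_id": 3, "name": "Contact C", "groups": None},               # set to null
    {"_id": 4, "name": "Contact D"},                               # field absent
]


def build_multikey_index(docs, field):
    """Map each indexed value to the set of _ids whose field holds it."""
    index = defaultdict(set)
    for doc in docs:
        value = doc.get(field)  # absent and null both become None here
        # A single value and an array are indexed alike: one entry per value.
        values = value if isinstance(value, list) else [value]
        for item in values:
            index[item].add(doc["_id"])
    return index


group_index = build_multikey_index(contacts, "groups")
work_ids = sorted(group_index["work"])    # contacts whose groups include "work"
chess_ids = sorted(group_index["chess"])  # contacts whose groups include "chess"
```

A lookup for any single value then finds every document whose array contains it, without scanning the documents themselves.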
3 – Embedded Documents
• Any value can be a document
• Nested documents provide structure
• Query any field at any level
– Can be indexed
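Querying "any field at any level" is usually written with a dotted path. A minimal sketch of how such a path could be resolved against nested documents, using plain Python dicts; `get_path` and `find` are hypothetical helpers for illustration, not driver calls, and the sample records are trimmed from the contact examples in this deck.

```python
# Two contact documents with an embedded address (fields trimmed for brevity).
contacts = [
    {"name": "Steven Jobs", "address": {"city": "Cupertino", "state": "CA"}},
    {"name": "Larry Page", "address": {"city": "Palo Alto", "state": "CA"}},
]


def get_path(doc, path):
    """Resolve a dotted path such as "address.city" in a nested dict."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # path does not exist in this document
        current = current[key]
    return current


def find(docs, path, value):
    """Return the documents whose value at the dotted path equals value."""
    return [doc for doc in docs if get_path(doc, path) == value]


cupertino_names = [doc["name"] for doc in find(contacts, "address.city", "Cupertino")]
california_names = [doc["name"] for doc in find(contacts, "address.state", "CA")]
```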
Belle and Endgame tablebases
Play chess with God – Ken Thompson
What is an Entity?
An Entity
• Object in your model
• Associations with other entities
Referencing (Relational)     Embedding (Document)
has_one                      embeds_one
belongs_to                   embedded_in
has_many                     embeds_many
has_and_belongs_to_many
MongoDB has both referencing and embedding for universal coverage
Let's model something together
How about a business card?
Business Card
Referencing
Contacts
{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "phone": "408-996-1010",
  "address_id": 1
}
Addresses
{
  "_id": 1,
  "street": "10260 Bandley Dr",
  "city": "Cupertino",
  "state": "CA",
  "zip_code": "95014",
  "country": "USA"
}
Embedding
Contacts
{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014",
    "country": "USA"
  },
  "phone": "408-996-1010"
}
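The practical difference between the two shapes is the number of fetches needed to answer a question such as "which city is this contact in?". A rough sketch, assuming a hypothetical in-memory `Collection` class that only counts lookups by `_id` (it is a stand-in for illustration, not the MongoDB driver API):

```python
class Collection:
    """Hypothetical in-memory collection that counts fetches by _id."""

    def __init__(self, docs):
        self._docs = {doc["_id"]: doc for doc in docs}
        self.fetches = 0

    def fetch(self, doc_id):
        self.fetches += 1
        return self._docs[doc_id]


# Referencing: the contact points at a separate address document.
addresses = Collection([{"_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino"}])
contacts_ref = Collection([{"_id": 2, "name": "Steven Jobs", "address_id": 1}])

# Embedding: the address lives inside the contact document.
contacts_emb = Collection([
    {"_id": 2, "name": "Steven Jobs",
     "address": {"street": "10260 Bandley Dr", "city": "Cupertino"}},
])


def city_by_reference(contact_id):
    contact = contacts_ref.fetch(contact_id)           # first fetch
    address = addresses.fetch(contact["address_id"])   # second fetch
    return address["city"]


def city_by_embedding(contact_id):
    return contacts_emb.fetch(contact_id)["address"]["city"]  # single fetch


ref_city = city_by_reference(2)
emb_city = city_by_embedding(2)
ref_fetches = contacts_ref.fetches + addresses.fetches
emb_fetches = contacts_emb.fetches
```

Both functions return the same city; the referenced layout needs two fetches where the embedded layout needs one.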
Relational Schema
Contact
• name
• company
• title
• phone
Address
• street
• city
• state
• zip_code
Document Schema
Contact
• name
• company
• title
• phone
• address
– street
– city
– state
– zip_code
How are they different? Why?
Schema Flexibility
{
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014"
  },
  "phone": "408-996-1010"
}
{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "larry@google.com",
  "address": {
    "street": "555 Bryant, #106",
    "city": "Palo Alto",
    "state": "CA",
    "zip_code": "94301"
  },
  "phone": "650-618-1499",
  "fax": "650-330-0100"
}
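Because records in one collection may carry different fields, any common structure has to be checked by the application. A minimal sketch of such a check, assuming a hypothetical set of required fields (`REQUIRED_FIELDS` and `missing_fields` are illustrative names, and the choice of required fields is an assumption, not something the slides specify):

```python
REQUIRED_FIELDS = {"name", "company", "phone"}  # assumption for this sketch

jobs = {
    "name": "Steven Jobs",
    "title": "VP, New Product Development",
    "company": "Apple Computer",
    "address": {"street": "10260 Bandley Dr", "city": "Cupertino",
                "state": "CA", "zip_code": "95014"},
    "phone": "408-996-1010",
}
page = {
    "name": "Larry Page",
    "url": "http://google.com/",
    "title": "CEO",
    "company": "Google!",
    "email": "larry@google.com",
    "address": {"street": "555 Bryant, #106", "city": "Palo Alto",
                "state": "CA", "zip_code": "94301"},
    "phone": "650-618-1499",
    "fax": "650-330-0100",
}
incomplete = {"name": "Contact A"}  # hypothetical record missing fields


def missing_fields(doc):
    """Return the required top-level fields that the document lacks."""
    return sorted(REQUIRED_FIELDS - set(doc))


# Fields the second card has that the first does not: the schema simply grew.
extra_in_page = sorted(set(page) - set(jobs))
```

Both business cards pass the check even though the second one adds `url`, `email`, and `fax`; only the incomplete record is flagged.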
Longest “Database Endgame” Mate
• Augment schema with metadata
– Distance to mate (DTM)
– Distance to conversion (DTC)
• Retrograde analysis of DB
• Longest checkmate
– 6 piece – 262 moves, KRNKNN
– 7 piece – 517 moves, so far
• Completion by 2015
Example
Let’s Look at an Address Book
Address Book
• What questions do I have?
• What are my entities?
• What are my associations?
Address Book Entity-Relationship
Contacts (name, company, title)
• 1 to N: Addresses (type, street, city, state, zip_code)
• 1 to N: Phones (type, number)
• 1 to N: Emails (type, address)
• 1 to 1: Thumbnails (mime_type, data)
• 1 to 1: Portraits (mime_type, data)
• N to N: Groups (name)
• 1 to 1: Twitters (name, location, web, bio)
Associating Entities
One to One
Schema Design Choices
• contact holds twitter_id (contact 1 to 1 twitter)
• twitter holds contact_id (contact 1 to 1 twitter)
• Redundant to track relationship on both sides
– Both references must be updated for consistency
– Saves a fetch if no twitter
• Contact embeds twitter (1)
One to One
General Recommendation
• Full contact info all at once
– Contact embeds twitter
• Parent-child relationship
– “contains”
• No additional data duplication
• Can query or index on embedded field
– e.g., “twitter.name”
Diagram: Contact embeds 1 twitter in its twitter field
One to Many
Schema Design Choices
• contact holds phone_ids: [ ] (contact 1 to N phone)
• phone holds contact_id (contact 1 to N phone)
• Redundant to track relationship on both sides
– Both references must be updated for consistency
– Not possible in relational DBs
– Saves a fetch if no phones
• Contact embeds phones (N)
One to Many
General Recommendation
• Full contact info all at once
– Contact embeds multiple phones
• Parent-children relationship
– “contains”
• No additional data duplication
• Can query or index on any field
– e.g., { "phones.type": "mobile" }
Diagram: Contact embeds N phone documents in its phones field
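A query such as { "phones.type": "mobile" } matches a contact when any embedded phone in the array has that type. A small sketch of that matching rule with plain Python dicts; the contacts, the fictional phone numbers, and `find_by_phone_type` are hypothetical and stand in for what the database would do.

```python
# Hypothetical contacts with embedded phones (numbers are fictional).
contacts = [
    {"name": "Contact A", "phones": [
        {"type": "work", "number": "555-0100"},
        {"type": "mobile", "number": "555-0101"},
    ]},
    {"name": "Contact B", "phones": [{"type": "home", "number": "555-0102"}]},
    {"name": "Contact C"},  # no phones at all
]


def find_by_phone_type(docs, phone_type):
    """Names of contacts where at least one embedded phone has the given type."""
    return [
        doc["name"]
        for doc in docs
        if any(phone.get("type") == phone_type for phone in doc.get("phones", []))
    ]


mobile_names = find_by_phone_type(contacts, "mobile")
home_names = find_by_phone_type(contacts, "home")
fax_names = find_by_phone_type(contacts, "fax")
```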
Many to Many
Traditional Relational Association
Join table
Contacts
• name
• company
• title
• phone
Groups
• name
GroupContacts
• group_id
• contact_id
Use arrays instead of a join table
Many to Many
Schema Design Choices
• group holds contact_ids: [ ] (group N to N contact)
• contact holds group_ids: [ ] (group N to N contact)
– Redundant to track relationship on both sides
– Both references must be updated for consistency
• group embeds contacts (N)
• contact embeds groups (N)
– Redundant to track relationship on both sides
– Duplicated data must be updated for consistency
Many to Many
General Recommendation
• Depends on use case
1. Simple address book
• Contact references groups
2. Corporate email groups
• Group embeds contacts for performance
Diagram: contact holds group_ids: [ ] (group N to N contact)
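For the simple address book case, the many-to-many association can be kept on one side only, as an array of group ids on each contact. A sketch with plain Python dicts showing that both directions of the relationship remain answerable; the group names, contacts, and helper functions are hypothetical.

```python
# Hypothetical groups keyed by _id, and contacts that reference them.
groups = {
    10: {"_id": 10, "name": "Family"},
    20: {"_id": 20, "name": "Chess club"},
}
contacts = [
    {"_id": 1, "name": "Contact A", "group_ids": [10, 20]},
    {"_id": 2, "name": "Contact B", "group_ids": [20]},
    {"_id": 3, "name": "Contact C", "group_ids": []},
]


def group_names(contact):
    """Follow the contact's references to the group documents."""
    return [groups[group_id]["name"] for group_id in contact["group_ids"]]


def members(group_id):
    """Find contacts whose group_ids array contains the given group."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]


a_groups = group_names(contacts[0])
chess_members = members(20)
family_members = members(10)
```

Only one array has to be updated when a contact joins or leaves a group, which avoids the both-sides consistency problem noted in the design choices.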
Document model - holistic and efficient representation
Contacts (name, company, title)
• embeds N addresses (type, street, city, state, zip_code)
• embeds N phones (type, number)
• embeds N emails (type, address)
• embeds 1 thumbnail (mime_type, data)
• embeds 1 twitter (name, location, web, bio)
• references Portraits, 1 to 1 (mime_type, data)
• references Groups, N to N (name)
Contact document example
{
  "name": "Gary J. Murakami, Ph.D.",
  "company": "10gen (the MongoDB company)",
  "title": "Lead Engineer and Ruby Evangelist",
  "twitter": {
    "name": "GaryMurakami",
    "location": "New Providence, NJ",
    "web": "http://www.nobell.org"
  },
  "portrait_id": 1,
  "addresses": [
    { "type": "work", "street": "229 W 43rd St.", "city": "New York", "zip_code": "10036" }
  ],
  "phones": [
    { "type": "work", "number": "1-866-237-8815 x8015" }
  ],
  "emails": [
    { "type": "work", "address": "gary.murakami@10gen.com" },
    { "type": "home", "address": "gjm@nobell.org" }
  ]
}
Can We Solve Chess One Day?
• Chess tablebase problem
– Chess programs often play worse
– Search is not localized, poor cache performance, seeks
– Working set too large for memory
• Endgame database size – big data
– 5 piece: 7 GB compressed 75%
• 157 MB Shredderbase – 1000x
• 441 MB Shredderbase – 10,000x
– 6 piece: 1.2 TB compressed
– 7 piece: 70 TB estimated by 2015
Working Set
1. To reduce the working set
– reference less-used data instead of embedding
• extract into referenced child document
– reference bulk data, e.g., portrait
2. To increase resources
– read from secondaries in a replica set
– use sharding
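Moving bulk data such as a portrait out of the contact and behind a reference keeps the frequently read document small. A rough sketch of that split, assuming hypothetical helper names; the JSON length used here is only a crude proxy for size, since MongoDB stores documents as BSON.

```python
import json

# Hypothetical contact with a large embedded portrait (payload is dummy data).
contact = {
    "_id": 7,
    "name": "Contact A",
    "portrait": {"mime_type": "image/jpeg", "data": "x" * 50000},
}


def approx_size(doc):
    """Rough size estimate; JSON length is a proxy, not the BSON size."""
    return len(json.dumps(doc))


def split_portrait(doc, portrait_id):
    """Return (slim_contact, portrait_doc) with the bulk data moved out."""
    slim = {key: value for key, value in doc.items() if key != "portrait"}
    slim["portrait_id"] = portrait_id               # reference replaces the embed
    portrait = {"_id": portrait_id, **doc["portrait"]}
    return slim, portrait


slim_contact, portrait_doc = split_portrait(contact, 1)
size_before = approx_size(contact)
size_after = approx_size(slim_contact)
```

The slim contact is what listing and search queries touch; the portrait document is fetched only when the picture is actually shown.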
General Recommendations
Embedding over Referencing
• Embed
– When “one” or “many” objects are viewed with their parent
– For performance
– For atomicity
• Reference
– When you need more scaling: max document size is 16 MB
– For easy “many to many” associations
– For smaller parent documents and working set
Legacy Migration
1. Copy existing schema & some data to MongoDB
2. Iterate schema design
   1. Measure performance and find bottlenecks
   2. Denormalize by embedding
      1. one to one associations first
      2. one to many associations next
      3. many to many associations last
   3. Examine, measure and analyze, review concerns, scaling
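The "denormalize by embedding" step can be pictured as folding rows from child tables into their parent record. A minimal sketch, assuming hypothetical relational rows represented as Python dicts (the table layouts and `embed_rows` are illustrative, not a migration tool):

```python
# Hypothetical rows copied from a relational schema.
contact_rows = [
    {"id": 2, "name": "Steven Jobs", "address_id": 1},
    {"id": 3, "name": "Contact B", "address_id": None},
]
address_rows = [{"id": 1, "street": "10260 Bandley Dr", "city": "Cupertino"}]
phone_rows = [{"contact_id": 2, "type": "work", "number": "408-996-1010"}]


def embed_rows(contacts, addresses, phones):
    """Build one document per contact with its address and phones embedded."""
    address_by_id = {row["id"]: row for row in addresses}
    docs = []
    for row in contacts:
        doc = {"_id": row["id"], "name": row["name"]}
        # One to one first: embed the address in place of the foreign key.
        address = address_by_id.get(row["address_id"])
        if address is not None:
            doc["address"] = {k: v for k, v in address.items() if k != "id"}
        # One to many next: collect the child rows into an array.
        doc["phones"] = [
            {"type": p["type"], "number": p["number"]}
            for p in phones
            if p["contact_id"] == row["id"]
        ]
        docs.append(doc)
    return docs


documents = embed_rows(contact_rows, address_rows, phone_rows)
```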
New Application
1. Focus on your application
   1. Requests
   2. Responses
   3. Business-domain model objects / data structures
2. Then persist language object data to MongoDB
   1. Collections
   2. Associations
3. Refactor for optimization and add indices
It’s All About Your Application
• Your schema is the impedance matcher
– Design choices: normalize/denormalize, reference/embed
– Melds programming with MongoDB for best of both
– Flexible for development and change
• Programs + Databases = (Big) Data Applications
• Programs MongoDB = Great Big Data Applications
• Play chess with God
• Play music with God – AAC
Lead Engineer / Evangelist
Gary J. Murakami, Ph.D.
#MongoDB
Questions?
"His pattern indicates two-dimensional thinking." - Spock
Star Trek II: The Wrath of Khan
www.3dchessfederation.com
Thank you so much to our community who made An Evening with MongoDB Minneapolis possible:
• David Hussman
• Josh Kennedy
• Matthew Chimento
• Jeffrey Lemmerman
• Dan Chamberlain
• Christopher Rueber
• Erin Newkirk
Thank you DevJam for hosting our event!

 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Schema Design by Gary Murakami

  • 1. Lead Engineer / Evangelist Gary J. Murakami, Ph.D. #MongoDB Schema Design
  • 2. Schema Design – Gary Murakami
  • 3. Schema Design – Gary Murakami Chess 4.5 (Northwestern University) Larry Atkin & Dave Slate
  • 4. Schema Design – Gary Murakami Agenda • What is a Record? • Core Concepts • What is an Entity? • Associating Entities • General Recommendations • Questions
  • 5. Schema Design – Gary Murakami All application development is Schema Design
  • 6. Schema Design – Gary Murakami Success comes from Proper Data Structure
  • 7. What is a Record?
  • 8. Schema Design – Gary Murakami Key → Value • One-dimensional • Single value is a blob • Query on key only • No schema • Value cannot be updated, only replaced Key Blob
  • 9. Schema Design – Gary Murakami Relational • Two-dimensional (tuples) • Each field is a single value • Query on any field • Very structured schema (table) • In-place updates * • Normalization requires many tables, joins, indexes, and poor data locality and performance Primary Key
  • 10. Schema Design – Gary Murakami Document • N-dimensional • Each field can contain 0, 1, many, or embedded values • Query on any field & level • Flexible schema • Inline updates * • Embedding related data has optimal data locality, requires fewer indexes, has better performance _id
  • 12. Schema Design – Gary Murakami Traditional Schema Design Focus on data storage
  • 13. Schema Design – Gary Murakami Document Schema Design Focus on data use
  • 14. Schema Design – Gary Murakami Another way to think about it Traditional: What answers do I have? Document: What questions do I have?
  • 15. Schema Design – Gary Murakami Three Building Blocks of Document Schema Design
  • 16. Schema Design – Gary Murakami 1 – Flexibility • Choices for schema design • Each record can have different fields • Field names consistent for programming • Common structure can be enforced by application • Easy to evolve as needed
  • 17. Schema Design – Gary Murakami 2 – Arrays Multiple Values per Field • Each field can be: – Absent – Set to null – Set to a single value – Set to an array of many values • Query for any matching value – Can be indexed and each value in the array is in the index
  • 18. Schema Design – Gary Murakami 3 - Embedded Documents • Any value can be a document • Nested documents provide structure • Query any field at any level – Can be indexed
  • 19. Schema Design – Gary Murakami Belle and Endgame tablebases Play chess with God – Ken Thompson
  • 20. What is an Entity?
  • 21. Schema Design – Gary Murakami An Entity • Object in your model • Associations with other entities Referencing (Relational): has_one, belongs_to, has_many, has_and_belongs_to_many Embedding (Document): embeds_one, embedded_in, embeds_many MongoDB has both referencing and embedding for universal coverage
  • 22. Schema Design – Gary Murakami Let's model something together How about a business card?
  • 23. Business Card Schema Design – Gary Murakami
  • 24. Schema Design – Gary Murakami Referencing Contacts { "_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer", "phone": "408-996-1010", "address_id": 1 } Addresses { "_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014", "country": "USA" }
  • 25. Schema Design – Gary Murakami Embedding Contacts { "_id": 2, "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer", "address": { "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014", "country": "USA" }, "phone": "408-996-1010" }
  • 26. Schema Design – Gary Murakami Relational Schema Contact • name • company • title • phone Address • street • city • state • zip_code
  • 27. Schema Design – Gary Murakami Document Schema Contact • name • company • title • phone • address • street • city • state • zip_code
  • 28. Schema Design – Gary Murakami How are they different? Why? Relational: Contact • name • company • title • phone Address • street • city • state • zip_code Document: Contact • name • company • title • phone • address • street • city • state • zip_code
  • 29. Schema Design – Gary Murakami Schema Flexibility { "name": "Steven Jobs", "title": "VP, New Product Development", "company": "Apple Computer", "address": { "street": "10260 Bandley Dr", "city": "Cupertino", "state": "CA", "zip_code": "95014" }, "phone": "408-996-1010" } { "name": "Larry Page", "url": "http://google.com/", "title": "CEO", "company": "Google!", "email": "larry@google.com", "address": { "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301" }, "phone": "650-618-1499", "fax": "650-330-0100" }
  • 30. Schema Design – Gary Murakami Longest “Database Endgame” Mate • Augment schema with meta data – Distance to mate (DTM) – Distance to conversion (DTC) • Retrograde analysis of DB • Longest checkmate – 6 piece – 262 moves, KRNKNN – 7 piece – 517 moves, so far • Completion by 2015
  • 32. Schema Design – Gary Murakami Let’s Look at an Address Book
  • 33. Schema Design – Gary Murakami Address Book • What questions do I have? • What are my entities? • What are my associations?
  • 34. Schema Design – Gary Murakami Address Book Entity-Relationship Contacts • name • company • title Addresses • type • street • city • state • zip_code Phones • type • number Emails • type • address Thumbnails • mime_type • data Portraits • mime_type • data Groups • name N 1 N 1 N N N 1 1 1 11 Twitters • name • location • web • bio 1 1
  • 36. Schema Design – Gary Murakami One to One (Address Book entity-relationship diagram repeated from slide 34)
  • 37. Schema Design – Gary Murakami One to One Schema Design Choices contact • twitter_id twitter1 1 contact twitter • contact_id1 1 Redundant to track relationship on both sides • Both references must be updated for consistency • Saves a fetch if no twitter Contact • twitter twitter 1
  • 38. Schema Design – Gary Murakami One to One General Recommendation • Full contact info all at once – Contact embeds twitter • Parent-child relationship – “contains” • No additional data duplication • Can query or index on embedded field – e.g., “twitter.name” Contact • twitter twitter 1
  • 39. Schema Design – Gary Murakami One to Many (Address Book entity-relationship diagram repeated from slide 34)
  • 40. Schema Design – Gary Murakami One to Many Schema Design Choices contact • phone_ids: [ ] phone1 N contact phone • contact_id1 N Redundant to track relationship on both sides • Both references must be updated for consistency • Not possible in relational DBs • Saves a fetch if no phones Contact • phones phone N
  • 41. Schema Design – Gary Murakami One to Many General Recommendation • Full contact info all at once – Contact embeds multiple phones • Parent-children relationship – “contains” • No additional data duplication • Can query or index on any field – e.g., { “phones.type”: “mobile” } Contact • phones phone N
  • 42. Schema Design – Gary Murakami Many to Many (Address Book entity-relationship diagram repeated from slide 34)
  • 43. Schema Design – Gary Murakami Many to Many Traditional Relational Association Join table Contacts • name • company • title • phone Groups • name GroupContacts • group_id • contact_id X Use arrays instead
  • 44. Schema Design – Gary Murakami Many to Many Schema Design Choices group • contact_ids: [ ] contactN N group contact • group_ids: [ ] N N Redundant to track relationship on both sides • Both references must be updated for consistency Redundant to track relationship on both sides • Duplicated data must be updated for consistency group • contacts contact N contact • groups group N
  • 45. Schema Design – Gary Murakami Many to Many General Recommendation • Depends on use case 1. Simple address book • Contact references groups 2. Corporate email groups • Group embeds contacts for performance group contact • group_ids: [ ] N N
  • 46. Schema Design – Gary Murakami Contacts • name • company • title addresses • type • street • city • state • zip_code phones • type • number emails • type • address thumbnail • mime_type • data Portraits • mime_type • data Groups • name N 1 N 1 twitter • name • location • web • bio N N N 1 1 Document model - holistic and efficient representation
  • 47. Schema Design – Gary Murakami Contact document example { "name": "Gary J. Murakami, Ph.D.", "company": "10gen (the MongoDB) company", "title": "Lead Engineer and Ruby Evangelist", "twitter": { "name": "GaryMurakami", "location": "New Providence, NJ", "web": "http://www.nobell.org" }, "portrait_id": 1, "addresses": [ { "type": "work", "street": "229 W 43rd St.", "city": "New York", "zip_code": "10036" } ], "phones": [ { "type": "work", "number": "1-866-237-8815 x8015" } ], "emails": [ { "type": "work", "address": "gary.murakami@10gen.com" }, { "type": "home", "address": "gjm@nobell.org" } ] }
  • 48. Schema Design – Gary Murakami Can We Solve Chess One Day? • Chess tablebase problem – Chess programs often play worse – Search is not localized, poor cache performance, seeks – Working set too large for memory • Endgame database size – big data – 5 piece: 7 GB compressed 75% • 157 MB Shredderbase – 1000x • 441 MB Shredderbase – 10,000x – 6 piece: 1.2 TB compressed – 7 piece: 70 TB estimated by 2015
  • 49. Schema Design – Gary Murakami Working Set 1. To reduce the working set – reference less-used data instead of embedding • extract into referenced child document – reference bulk data, e.g., portrait 2. To increase resources – read from secondaries in a replica set – use sharding
  • 51. Schema Design – Gary Murakami Embedding over Referencing • Embed – When “one” or “many” objects are viewed with their parent – For performance – For atomicity • Reference – When you need more scaling: max document size is 16MB – For easy “many to many” associations – For smaller parent documents and working set
  • 52. Schema Design – Gary Murakami Legacy Migration 1. Copy existing schema & some data to MongoDB 2. Iterate schema design 1. Measure performance and find bottlenecks 2. Denormalize by embedding 1. one to one associations first 2. one to many associations next 3. many to many associations last 3. Examine, measure and analyze, review concerns, scaling
  • 53. Schema Design – Gary Murakami New Application 1. Focus on your application 1. Requests 2. Responses 3. Business-domain model objects / data structures 2. Then persist language object data to MongoDB 1. Collections 2. Associations 3. Refactor for optimization and add indices
  • 54. Schema Design – Gary Murakami It’s All About Your Application • Your schema is the impedance matcher – Design choices: normalize/denormalize, reference/embed – Melds programming with MongoDB for best of both – Flexible for development and change • Programs+Databases = (Big) Data Applications
  • 55. Schema Design – Gary Murakami It’s All About Your Application • Your schema is the impedance matcher – Design choices: normalize/denormalize, reference/embed – Melds programming with MongoDB for best of both – Flexible for development and change • Programs MongoDB = Great Big Data Applications • Play chess with God
  • 56. Schema Design – Gary Murakami It’s All About Your Application • Your schema is the impedance matcher – Design choices: normalize/denormalize, reference/embed – Melds programming with MongoDB for best of both – Flexible for development and change • Programs MongoDB = Great Big Data Applications • Play music with God – AAC
  • 57.
  • 58. Lead Engineer / Evangelist Gary J. Murakami, Ph.D. #MongoDB Questions? "His pattern indicates two-dimensional thinking.” - Spock Star Trek II: The Wrath of Khan www.3dchessfederation.com
  • 59. Thank you so much to our community who made An Evening with MongoDB Minneapolis possible: • David Hussman • Josh Kennedy • Matthew Chimento • Jeffrey Lemmerman • Dan Chamberlain • Christopher Rueber • Erin Newkirk Thank you DevJam for hosting our event!
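
The referencing and embedding choices in slides 24, 25, and 43 to 45 can be sketched in plain Python. This is a minimal in-memory illustration, not MongoDB driver code: the `collections` dict, the `fetch` helper, and the `stats` counter are hypothetical stand-ins for database round trips, and the `group_ids` values and group names are made up for the example. The contact and address fields follow the slides.

```python
# In-memory stand-in for a database: collection name -> {_id: document}.
# All helper names are hypothetical; a real application would use a driver.
collections = {
    "addresses": {
        1: {"_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino",
            "state": "CA", "zip_code": "95014", "country": "USA"},
    },
    # Referencing: the contact stores ids that point at other documents.
    "contacts_referencing": {
        2: {"_id": 2, "name": "Steven Jobs", "company": "Apple Computer",
            "address_id": 1,
            "group_ids": [10, 11]},  # made-up ids: array of references
    },
    # Embedding: the address lives inside the contact document.
    "contacts_embedding": {
        2: {"_id": 2, "name": "Steven Jobs", "company": "Apple Computer",
            "address": {"street": "10260 Bandley Dr", "city": "Cupertino",
                        "state": "CA", "zip_code": "95014",
                        "country": "USA"}},
    },
    "groups": {  # made-up group names
        10: {"_id": 10, "name": "founders"},
        11: {"_id": 11, "name": "cupertino"},
    },
}

stats = {"fetches": 0}


def fetch(collection, _id):
    """Simulate one round trip to the database and count it."""
    stats["fetches"] += 1
    return collections[collection][_id]


def city_by_reference(contact_id):
    """Two fetches: the contact, then the address it references."""
    contact = fetch("contacts_referencing", contact_id)
    address = fetch("addresses", contact["address_id"])
    return address["city"]


def city_by_embedding(contact_id):
    """One fetch: the address arrives with the contact."""
    contact = fetch("contacts_embedding", contact_id)
    return contact["address"]["city"]


def group_names(contact_id):
    """Many to many without a join table: follow the array of group ids."""
    contact = fetch("contacts_referencing", contact_id)
    return [fetch("groups", gid)["name"] for gid in contact["group_ids"]]
```

Calling `city_by_reference(2)` and `city_by_embedding(2)` returns the same city, but the counter in `stats` shows two simulated fetches for the first and one for the second, which is the data-locality argument the slides make for embedding.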

Editor's notes

  1. A long, long time ago in a state not too far from here, I was in high school. There I discovered the wonder of computer programming. I was on the chess team, and on the wrestling team. I ran laps as conditioning for wrestling, and to keep running, I dreamed up algorithms and data structures to play chess. The importance of data structures was confirmed to me at Northwestern University when I took a course that used Pascal and Niklaus Wirth’s book “Algorithms + Data Structures = Programs.” And such data structures could be used to program computers to play chess. Next slide – skip the following. At the Illinois High School chess finals, I was astounded by my opponent. Fortunately, it was not by his play on the chess board, but by an extremely thick printout of his Tic-Tac-Toe program. It was one huge nested if statement that exhaustively enumerated all of the possibilities. The complexity of this is illustrated in the diagram that shows the map for O – playing second – of optimal moves. I knew that the “program” was an abuse of a programming language and a tree, and worse than a chess blunder, a travesty. An application without good Schema Design is a similar travesty.
  2. Chess 4.5 was a pioneering chess program in the 1970s. It was the first program to win a human chess tournament. I enjoyed playing against it at Northwestern, and I even played a rated chess game against the programmer Dave Slate. Chess 4.5 added a database of “book” openings that greatly improved the capability of the program. So the chess program melded algorithms, data structures, and a database to take on human chess masters. Could you do similar great things with good schema design?
  3. Perhaps you will have moments of insight where you say “Aha!” For those of you who say “Of course, I knew that,” may the truth resonate and grow. Some might disagree strongly with my general recommendations. May you all find the presentation interesting and thought provoking. And may it inspire enthusiasm in your schema design work for your applications.
  4. Schema Design is very important; its impact on your application is pervasive.
  5. The wrong data structure will hurt you. The proper data structure can make all the pieces fall into place.
  6. One-dimensional storage can be very fast but is limited with respect to querying. Speed is why key-value stores are popular for modern web applications.
  7. A record in a traditional relational DB is a tuple, or row, in a table. This table representation forces normalization of your data. Normalization is good for querying anything that the data can answer, and it is good for new queries. Relational DBs won out over other DBs that came before. To me, the winning technology is that every field or value is first class: in essence, every field can be addressed in queries and can be indexed for faster responses. But normalization requires many tables, joins to rehydrate relations, and indexes to make joins faster, and it results in poor data locality. For example, in order to represent an array, another table must be used just for that array. Slow performance is why NoSQL alternatives are becoming popular. In-place updates *: SQL storage may use “padding” space for dynamic strings instead of fixed allocation.
  8. “Document” is somewhat of a misnomer: not the Constitution or XML → object data (without methods), often visualized as JSON. Inline updates *: a padding factor can reduce the need to move a document. The essential capability (querying and indexing) persists and gets even better. The document structure can match your data structures – your schema.
  9. Answers → data. Questions → application. Does your schema take advantage of your application-specific knowledge of known queries, use cases, and client-program data structures? Traditional DBs make it hard to take advantage of them. Document DBs make it easy to take advantage of them. MongoDB documents can match your application – given good schema design.
  10. Not “schema-less” but rather “flexible schema.” Common structure can be enforced by the application. While MongoDB does not enforce common structure, neither does it restrict your application. Documents may have a common structure that is optionally extended at the document level. Use this flexibility for a class hierarchy with subclasses: the traditional relational representation requires separate tables, or a workaround with multiple mostly-empty columns. Example: three days for a schema migration. Keywords: flexible, choice, evolve, change, modify
  11. The lack of multivalued fields is usually the first complaint of programmers who don’t wish to pay the cost of normalization. The concept of arrays incorporates multiple values and also associations involving many entities. Keywords: array, multiple, many
  12. Documents may have a common structure that is optionally extended at the document level. The application mapping can enforce the required and optional fields. What could you do with these building blocks? Perhaps play chess, and beat human chess masters?
  13. Belle (picture on the left) was the first computer built for the sole purpose of chess playing. It was developed by former coworkers of mine, Joe Condon and Ken Thompson, at Bell Labs in the 1970s and 1980s. Ken is renowned for developing the Unix operating system in the C programming language. Belle officially became the first master-level machine in 1983 and dominated play throughout the 1980s. Ken used Belle extensively for pioneering research with chess endgame tablebases. Starting from all possible checkmates with 3 pieces, retrograde analysis was used to exhaustively calculate all possible positions with forced mates. Ken completed the endgame tablebase for up to five pieces and published it on CD-ROM. It represents years of compute time and is still available online under the caption “Play chess with God.” Good Schema Design matched the endgame tablebase to live chess playing so that Belle could beat human chess masters. Let’s investigate good schema design for an application.
  14. “Vintage” business card
  15. Contact and Address entities are associated one to one. Traditional relational association is via referencing. In this example, the contact record for Steve Jobs has a reference to his address via the address_id field.
  16. We’ve discussed Entities, Associations, Referencing, Embedding, and business cards, and we’ll build on that knowledge. Chess programmers have built on the endgame database with interesting results.
  17. Entity-Relational diagram
  18. Entity-Relational diagram for embedding documents
  19. Left – relational – requires two fetches/queries (or a join in a relational DB). Right – document – requires only one fetch/query and has data locality.
  20. We have discussed Entities, Associations, Referencing, Embedding, and business cards as sample data. We’ll build on that knowledge. Chess programmers have built on the endgame database with interesting results.
  21. Likewise for your application, use Schema Design and the flexible schema of MongoDB to empower your database analytics
  22. A common example will help us understand the joy of flexible document structure.
  23. Left: One to one. We're going to assume users only have one Twitter account. A thumbnail is a small profile image while a portrait is a very large profile image. Right: One to many. Middle: Many to many.
  24. Arrays of references are more direct than a join table and save a fetch.
  25. Fundamentally not “contains.” Concerns – exceptional cases: exceeding the maximum document size due to large data or scaling; transferring very large documents is probably a performance concern; scaling may affect working set size. The schema can be adjusted to improve performance: fetch only the data that you need.
  26. Embedding entities in the contact document reduces six fetches to one
  27. We’ve completed our address book example, but what about chess?
  28. Chess is not just an interesting challenge that raises philosophical questions about the intelligence of humans and computers. It is also a prime example of the effectiveness of algorithms plus data structures, plus good schema design for databases. And the endgame database has the challenges of big data and working set size that we face in our growing big data applications.
  29. To increase resources with MongoDB: use a replica set and read from secondaries; use sharding.
  30. Embedding is a bit like pre-joined data. BSON (Binary JSON) document ops are easy for the server. Choose embedding by default as opposed to referencing. Embed (following the 90/10 rule of thumb) when the “one” or “many” objects are viewed in the context of their parent. Reference for easy consistency with “many to many” associations without duplicated data. Referencing is not just the default for relational DBs; there is no other choice.
  31. You no longer have to coerce your data into a form acceptable to a SQL database. You can now architect or tailor your data to your application in your programming language and persist it to MongoDB.
  32. May you build Great Big Data Applications. Perhaps you can say inspiring quotes like Ken Thompson, “Play chess with God.”
  33. Good news – giving power and control back to the programmer and the programming language. Ken and I worked on Perceptual Audio Coding, better known as Advanced Audio Coding or AAC, as found in the iPod and iPhone. So I hope that this will inspire you to “Play music with God” to build your killer app. How is this made possible? Here’s the technology in MongoDB that makes this all possible.
  34. BSON (Binary JSON) is the “magic” or core technology in MongoDB for data structures and performance.BSON does not have to be parsed like JSON, but is rather a format that can be traversed easily.
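
The point in note 34 about traversal can be illustrated with a small Python sketch. It hand-encodes a document using only two BSON element types (UTF-8 string, type byte 0x02, and 32-bit integer, type byte 0x10) and then looks up a field by stepping over elements with their length prefixes instead of parsing text. The function names are hypothetical, the `followers` field is made up for the example, and this is a teaching sketch under those assumptions, not the encoder a MongoDB driver uses.

```python
import struct


def encode_element(name, value):
    """Encode one BSON element; this sketch handles only str and int32."""
    key = name.encode("utf-8") + b"\x00"          # field name as a C string
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # String: type 0x02, name, int32 byte length (incl. null), bytes.
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, int):
        # 32-bit integer: type 0x10, name, little-endian int32.
        return b"\x10" + key + struct.pack("<i", value)
    raise TypeError("unsupported type in this sketch")


def encode_document(fields):
    """Wrap the elements with the total-size prefix and trailing null."""
    body = b"".join(encode_element(k, v) for k, v in fields.items())
    # Total size counts the 4-byte prefix itself and the terminating null.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def find_field(doc, wanted):
    """Walk the elements, skipping unwanted values by their known size."""
    pos = 4                                        # skip the size prefix
    while doc[pos] != 0:                           # 0x00 ends the document
        type_byte = doc[pos]
        end_of_name = doc.index(b"\x00", pos + 1)
        name = doc[pos + 1:end_of_name].decode("utf-8")
        pos = end_of_name + 1
        if type_byte == 0x02:
            (size,) = struct.unpack_from("<i", doc, pos)
            value_end = pos + 4 + size
            if name == wanted:
                return doc[pos + 4:value_end - 1].decode("utf-8")
        elif type_byte == 0x10:
            value_end = pos + 4
            if name == wanted:
                return struct.unpack_from("<i", doc, pos)[0]
        else:
            raise ValueError("unsupported type in this sketch")
        pos = value_end                            # jump past the value
    return None


doc = encode_document({"name": "GaryMurakami", "followers": 42})
```

With `doc` built as above, `find_field(doc, "followers")` reaches the integer by jumping over the string value using its length prefix, without decoding the string's contents. A JSON text would have to be scanned character by character to find where that string ends.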