Schema Design by Gary Murakami

Lead Engineer / Evangelist
Gary J. Murakami, Ph.D.
#MongoDB
Schema Design

Chess 4.5 (Northwestern University)
Larry Atkin & Dave Slate

Agenda
• What is a Record?
• Core Concepts
• What is an Entity?
• Associating Entities
• General Recommendations
• Questions

All application development is
Schema Design

Success comes from
Proper Data
Structure

Key → Value
• One-dimensional
• Single value is a blob
• Query on key only
• No schema
• Value cannot be updated, only replaced
Key → Blob

Relational
• Two-dimensional (tuples)
• Each field is a single value
• Query on any field
• Very structured schema (table)
• In-place updates *
• Normalization requires many tables, joins, and indexes, and results in poor data locality and performance
Primary Key

Document
• N-dimensional
• Each field can contain 0, 1,
many, or embedded values
• Query on any field & level
• Flexible schema
• Inline updates *
• Embedding related data gives optimal data locality, requires fewer indexes, and performs better
_id

Traditional Schema Design
Focus on data storage

Document Schema Design
Focus on data use

Another way to think about it
Traditional:
What answers do I have?
Document:
What questions do I
have?

Three Building Blocks of
Document Schema
Design

1 – Flexibility
• Choices for schema design
• Each record can have different fields
• Field names consistent for programming
• Common structure can be enforced by
application
• Easy to evolve as needed

2 – Arrays
Multiple Values per Field
• Each field can be:
– Absent
– Set to null
– Set to a single value
– Set to an array of many values
• Query for any matching value
– Can be indexed and each value in the array is in the
index
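
The four field states and "query for any matching value" can be sketched in plain Python. This imitates the matching behavior only; it is not how MongoDB implements queries or indexes, and the `tags` field and helper names are made up for illustration.

```python
def field_matches(doc, field, wanted):
    """True if doc[field] equals wanted, or is a list that contains wanted."""
    if field not in doc:
        return False
    value = doc[field]
    if isinstance(value, list):
        return wanted in value
    return value == wanted

def build_index(docs, field):
    """Map each value, or each array element, to the positions of the docs holding it."""
    index = {}
    for pos, doc in enumerate(docs):
        if field not in doc:
            continue  # simplification: absent fields are left out of this toy index
        value = doc[field]
        values = value if isinstance(value, list) else [value]
        for v in values:
            index.setdefault(v, []).append(pos)
    return index

docs = [
    {"name": "a"},                            # field absent
    {"name": "b", "tags": None},              # set to null
    {"name": "c", "tags": "work"},            # set to a single value
    {"name": "d", "tags": ["work", "home"]},  # set to an array of many values
]

matches = [d["name"] for d in docs if field_matches(d, "tags", "work")]
index = build_index(docs, "tags")
```

Document `d` appears in the index once per array element, which is the idea behind "each value in the array is in the index".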

3 – Embedded Documents
• Any value can be a document
• Nested documents provide structure
• Query any field at any level
– Can be indexed
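
"Query any field at any level" is usually written with a dotted path such as `address.city`. The sketch below resolves such a path against a nested Python dictionary; `get_path` is a hypothetical helper, not a MongoDB API.

```python
def get_path(doc, path):
    """Follow a dotted path such as "address.city"; return None if a step is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

contact = {
    "name": "Larry Page",
    "address": {"city": "Palo Alto", "zip_code": "94301"},
}

city = get_path(contact, "address.city")
state = get_path(contact, "address.state")  # this document has no state
```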

Belle and Endgame tablebases
Play chess with God – Ken
Thompson

An Entity
• Object in your model
• Associations with other entities
Referencing (Relational)     Embedding (Document)
has_one                      embeds_one
belongs_to                   embedded_in
has_many                     embeds_many
has_and_belongs_to_many
MongoDB has both referencing and embedding for universal
coverage

Let's model something
together
How about a business
card?

Business Card

Referencing
Contacts
{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "phone": "408-996-1010",
  "address_id": 1
}
Addresses
{
  "_id": 1,
  "street": "10260 Bandley Dr",
  "city": "Cupertino",
  "state": "CA",
  "zip_code": "95014",
  "country": "USA"
}

Embedding
Contacts
{
  "_id": 2,
  "address": {
    "zip_code": "95014",
    "country": "USA"
  },
  "phone": "408-996-1010"
}
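
A rough sketch of what the two layouts mean for a read, with Python dictionaries standing in for collections. The loader functions are hypothetical, and the embedded address is assumed to carry the same fields as the referenced one.

```python
# Referencing: contacts and addresses live in separate collections.
addresses = {
    1: {"_id": 1, "street": "10260 Bandley Dr", "city": "Cupertino",
        "state": "CA", "zip_code": "95014", "country": "USA"},
}
contacts_referencing = {
    2: {"_id": 2, "name": "Steven Jobs", "phone": "408-996-1010", "address_id": 1},
}

# Embedding: the address is stored inside the contact document.
contacts_embedding = {
    2: {"_id": 2, "name": "Steven Jobs", "phone": "408-996-1010",
        "address": {"street": "10260 Bandley Dr", "city": "Cupertino",
                    "state": "CA", "zip_code": "95014", "country": "USA"}},
}

def load_referenced(contact_id):
    """Two lookups: the contact, then the address it points to."""
    contact = dict(contacts_referencing[contact_id])
    address = dict(addresses[contact.pop("address_id")])
    address.pop("_id")
    contact["address"] = address
    return contact

def load_embedded(contact_id):
    """One lookup returns the contact together with its address."""
    return contacts_embedding[contact_id]
```

Both loaders produce the same contact; the embedded layout gets there with a single fetch.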

Relational Schema
Contact
• name
• company
• title
• phone
Address
• street
• city
• state
• zip_code

Document Schema
Contact
• name
• company
• title
• phone
• address
– street
– city
– state
– zip_code

How are they different? Why?
Contact
• name
• company
• title
• phone
Address
• street
• city
• state
• zip_code
Contact
• name
• company
• title
• phone
• address
– street
– city
– state
– zip_code

Schema Flexibility
{
  "address": {
    "zip_code": "95014"
  },
  "phone": "408-996-1010"
}
{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "larry@google.com",
  "address": {
    "street": "555 Bryant, #106",
    "city": "Palo Alto",
    "zip_code": "94301"
  },
  "phone": "650-618-1499",
  "fax": "650-330-0100"
}

Longest “Database Endgame”
Mate
• Augment schema with meta data
– Distance to mate (DTM)
– Distance to conversion (DTC)
• Retrograde analysis of DB
• Longest checkmate
– 6 piece – 262 moves, KRNKNN
– 7 piece – 517 moves, so far
• Completion by 2015

Let’s Look at an
Address Book

Address Book
• What questions do I have?
• What are my entities?
• What are my associations?

Address Book Entity-Relationship
Contacts
• name
• company
• title
Addresses (N per contact)
• type
• street
• city
• state
• zip_code
Phones (N per contact)
• type
• number
Emails (N per contact)
• type
• address
Thumbnails (1 per contact)
• mime_type
• data
Portraits (1 per contact)
• mime_type
• data
Groups (N to N with contacts)
• name
Twitters (1 per contact)
• name
• location
• web
• bio

One to One
Schema Design Choices
Referencing
• contact
– twitter_id (contact 1 to 1 twitter)
• twitter
– contact_id (twitter 1 to 1 contact)
• Redundant to track relationship on both sides
– Both references must be updated for consistency
– Saves a fetch if no twitter
Embedding
• Contact
– twitter (Contact embeds 1 twitter)

One to One
General Recommendation
• Full contact info all at once
– Contact embeds twitter
• Parent-child relationship
– “contains”
• No additional data duplication
• Can query or index on embedded field
– e.g., “twitter.name”
Contact
• twitter (embedded, 1)

One to Many
Referencing
• contact
– phone_ids: [ ] (contact 1 to N phone)
• phone
– contact_id (contact 1 to N phone)
• Redundant to track relationship on both sides
– Both references must be updated for consistency
– Not possible in relational DBs
– Saves a fetch if no phones
Embedding
• Contact
– phones (Contact embeds N phones)

One to Many
• Full contact info all at once
– Contact embeds multiple phones
• Parent-children relationship
– “contains”
• No additional data duplication
• Can query or index on any field
– e.g., { “phones.type”: “mobile” }
Contact
• phones (embedded, N)
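
The `{ "phones.type": "mobile" }` query matches a contact when any embedded phone has that type. Below is a plain-Python imitation of that matching rule, not MongoDB's implementation. `values_at` and `has_value` are hypothetical helpers, and the second contact is invented for the example.

```python
def values_at(doc, path):
    """Collect every value reachable by a dotted path, descending into lists."""
    current = [doc]
    for key in path.split("."):
        next_level = []
        for item in current:
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and key in candidate:
                    next_level.append(candidate[key])
        current = next_level
    # Flatten a trailing list so that array elements match individually.
    result = []
    for value in current:
        result.extend(value if isinstance(value, list) else [value])
    return result

def has_value(doc, path, wanted):
    """True if any value found at the dotted path equals wanted."""
    return wanted in values_at(doc, path)

contacts = [
    {"name": "Gary J. Murakami, Ph.D.",
     "phones": [{"type": "work", "number": "1-866-237-8815 x8015"}]},
    {"name": "Example Contact",  # invented record
     "phones": [{"type": "work", "number": "555-0100"},
                {"type": "mobile", "number": "555-0101"}]},
    {"name": "No Phones"},
]

mobile_owners = [c["name"] for c in contacts if has_value(c, "phones.type", "mobile")]
```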

Many to Many
Traditional Relational Association
Contacts
• name
• company
• title
• phone
Groups
• name
GroupContacts (join table)
• group_id
• contact_id
Use arrays instead of a join table

Many to Many
Referencing
• group
– contact_ids: [ ] (group N to N contact)
• contact
– group_ids: [ ] (contact N to N group)
• Redundant to track relationship on both sides
– Both references must be updated for consistency
Embedding
• group
– contacts (group embeds N contacts)
• contact
– groups (contact embeds N groups)
• Redundant to track relationship on both sides
– Duplicated data must be updated for consistency

Many to Many
• Depends on use case
1. Simple address book
• Contact references groups
2. Corporate email groups
• Group embeds contacts for performance
contact
• group_ids: [ ] (contact N to N group)
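
For the simple address book case, here is a sketch of "contact references groups" with an array of group ids and no join table. All names and ids are invented, and the two helper functions are hypothetical.

```python
# Groups collection, keyed by _id.
groups = {
    1: {"_id": 1, "name": "family"},
    2: {"_id": 2, "name": "work"},
}

# Each contact carries an array of group ids; the relationship is tracked on one side only.
contacts = [
    {"_id": 10, "name": "Ann", "group_ids": [1, 2]},
    {"_id": 11, "name": "Bob", "group_ids": [2]},
]

def groups_of(contact):
    """Resolve a contact's group ids to group names."""
    return [groups[group_id]["name"] for group_id in contact["group_ids"]]

def members_of(group_id):
    """Find the contacts whose group_ids array contains the given id."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]
```

Both directions of the association can be answered from the single `group_ids` array, so there is only one place to update.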

Document model: holistic and efficient representation
Contacts
• name
• company
• title
• addresses (embedded, N)
– type
– street
– city
– state
– zip_code
• phones (embedded, N)
– type
– number
• emails (embedded, N)
– type
– address
• thumbnail (embedded, 1)
– mime_type
– data
• twitter (embedded, 1)
– name
– location
– web
– bio
Portraits (referenced, 1)
• mime_type
• data
Groups (referenced, N to N)
• name

Contact document example
{
  "name": "Gary J. Murakami, Ph.D.",
  "company": "10gen (the MongoDB company)",
  "title": "Lead Engineer and Ruby Evangelist",
  "twitter": {
    "name": "GaryMurakami",
    "location": "New Providence, NJ",
    "web": "http://www.nobell.org"
  },
  "portrait_id": 1,
  "addresses": [
    { "type": "work", "street": "229 W 43rd St.", "city": "New York", "zip_code": "10036" }
  ],
  "phones": [
    { "type": "work", "number": "1-866-237-8815 x8015" }
  ],
  "emails": [
    { "type": "work", "address": "gary.murakami@10gen.com" },
    { "type": "home", "address": "gjm@nobell.org" }
  ]
}

Can We Solve Chess
One Day?
• Chess tablebase problem
– Chess programs often play worse
– Search is not localized, poor cache performance, seeks
– Working set too large for memory
• Endgame database size – big data
– 5 piece: 7 GB compressed 75%
• 157 MB Shredderbase – 1000x
• 441 MB Shredderbase – 10,000x
– 6 piece: 1.2 TB compressed
– 7 piece: 70 TB estimated by 2015

Working Set
1. To reduce the working set
– reference less-used data instead of embedding
• extract into referenced child document
– reference bulk data, e.g., portrait
2. To increase resources
– read from secondaries in a replica set
– use sharding
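
A small sketch of "reference bulk data, e.g., portrait": the large portrait is moved into its own document and the contact keeps only a `portrait_id`. The `extract_portrait` helper and the sample bytes are made up for illustration.

```python
def extract_portrait(contact, portrait_id):
    """Return (slim_contact, portrait_doc) with the bulk data moved out of the contact."""
    slim = {key: value for key, value in contact.items() if key != "portrait"}
    slim["portrait_id"] = portrait_id
    portrait = dict(contact["portrait"], _id=portrait_id)
    return slim, portrait

contact = {
    "name": "Gary J. Murakami, Ph.D.",
    # Placeholder bytes standing in for image data.
    "portrait": {"mime_type": "image/jpeg", "data": b"\xff\xd8" + bytes(1000)},
}

slim, portrait = extract_portrait(contact, 1)
```

Reads that only need the name and phone numbers now load the slim document and never touch the image bytes.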

Embedding over Referencing
• Embed
– When “one” or “many” objects are viewed with their parent
– For performance
– For atomicity
• Reference
– When you need more scaling: max document size is
16MB
– For easy “many to many” associations
– For smaller parent documents and working set

Legacy Migration
1. Copy existing schema & some data to MongoDB
2. Iterate schema design
   1. Measure performance and find bottlenecks
   2. Denormalize by embedding
      1. one to one associations first
      2. one to many associations next
      3. many to many associations last
   3. Examine, measure and analyze, review concerns, scaling
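
"Denormalize by embedding" for a one to many association can be sketched as folding child rows into their parent. The rows below are invented, and `embed_phones` is a hypothetical helper rather than a migration tool.

```python
# Rows as they might come out of two relational tables.
contact_rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
]
phone_rows = [
    {"contact_id": 1, "type": "work", "number": "555-0100"},
    {"contact_id": 1, "type": "mobile", "number": "555-0101"},
]

def embed_phones(contact_rows, phone_rows):
    """Build one document per contact with its phone rows embedded as an array."""
    phones_by_contact = {}
    for row in phone_rows:
        phone = {key: value for key, value in row.items() if key != "contact_id"}
        phones_by_contact.setdefault(row["contact_id"], []).append(phone)
    return [
        {"_id": c["id"], "name": c["name"], "phones": phones_by_contact.get(c["id"], [])}
        for c in contact_rows
    ]

documents = embed_phones(contact_rows, phone_rows)
```

The foreign key `contact_id` disappears because the parent document now contains its children.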

New Application
1. Focus on your application
   1. Requests
   2. Responses
   3. Business-domain model objects / data structures
2. Then persist language object data to MongoDB
   1. Collections
   2. Associations
3. Refactor for optimization and add indices

It’s All About Your Application
• Your schema is the impedance matcher
– Design choices: normalize/denormalize, reference/embed
– Melds programming with MongoDB for best of both
– Flexible for development and change
• Programs+Databases = (Big) Data Applications
• Programs MongoDB = Great Big Data Applications
• Play chess with God
• Play music with God – AAC

Lead Engineer / Evangelist
Gary J. Murakami, Ph.D.
#MongoDB
Questions?
"His pattern indicates
two-dimensional thinking."
- Spock
Star Trek II: The Wrath of Khan
www.3dchessfederation.com

Thank you so much to our community who
made An Evening with MongoDB Minneapolis
possible:
• David Hussman
• Josh Kennedy
• Matthew Chimento
• Jeffrey Lemmerman
• Dan Chamberlain
• Christopher Rueber
• Erin Newkirk
Thank you DevJam for hosting our event!
