The hardest part of moving from a tabular database world to a modern world of objects and JSON is how to model your data. This year at OSN, Matt from MongoDB will take data modeling one step further than prior years and focus specifically on advanced schema design patterns to optimize the ease-of-use and performance of your data access layer and application.
5. # O S N 2 0 1 8
Why MongoDB?
Best way to
work with data
Intelligently put data
where you need it
Freedom
to run anywhere
Intelligent Operational Data Platform
6. # O S N 2 0 1 8
Best way to work with data
Easy: Work with data in a natural,
intuitive way
Flexible: Adapt and make
changes quickly
Fast: Get great performance
with less code
Versatile: Supports a wide
variety of data models and
queries
7. # O S N 2 0 1 8
Easy & Versatile - Rich Query
Functionality MongoDB
Expressive Queries
• Find anyone with phone # “1-212…”
• Check if the person with number “555…” is on the “do not call” list
Geospatial
• Find the best offer for the customer at geo coordinates of 42nd St. and
6th Ave
Text Search
• Find all tweets that mention the firm within the last 2 days
Aggregation
• Count and sort number of customers by city
Native Binary
JSON support
• Add an additional phone number to Mark Smith’s record without rewriting
the document
• Update just 2 phone numbers out of 10
• Sort on the modified date
{ customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones: [ {
      number : "1-212-777-1212",
      dnc : true,
      type : "home"
    },
    {
      number : "1-212-777-1213",
      type : "cell"
    }]
}
Joins ($lookup)
• Query for all San Francisco residents, look up their transactions, and
sum the amount by person
Graph queries
($graphLookup)
• Query for all people within 3 degrees of separation from Mark
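A few of the operations listed on this slide can be sketched as the filter, update and pipeline documents a driver would receive. This is only an illustration: the dicts are built but not sent to a server, the `customers` collection name, the extra phone number and its "work" type are made up, and the operators used (`$elemMatch`, `$push`, `$group`, `$sum`, `$sort`) are standard MongoDB operators.

```python
# Filter, update and pipeline documents for some of the operations above.
# They are plain dicts; with a driver such as PyMongo they would be passed to
# find(), update_one() or aggregate() on a hypothetical `customers` collection.

# Find anyone with a given phone number (dot notation reaches into the array).
find_by_phone = {"phones.number": "1-212-777-1212"}

# Check whether a number is on the "do not call" list: $elemMatch requires
# both conditions to hold on the same array element.
on_dnc_list = {
    "phones": {"$elemMatch": {"number": "1-212-777-1212", "dnc": True}}
}

# Add a phone number to Mark Smith's document without rewriting it.
# (filter, update) pair; the new number and type are invented for the example.
add_phone = (
    {"customer_id": 1},
    {"$push": {"phones": {"number": "1-212-777-1214", "type": "work"}}},
)

# Count and sort the number of customers by city.
customers_by_city = [
    {"$group": {"_id": "$city", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
```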
8. # O S N 2 0 1 8
Intelligently put data where you need it
Ability to run both operational &
analytics workloads on same cluster,
for timely insight and lower cost
Workload Isolation
Elastic horizontal scalability -
add/remove capacity dynamically
without downtime
Scalability
Declare data locality rules for
governance (e.g. data sovereignty), tiers of
service & local low latency access
Locality
Built-in multi-geography high
availability, replication & automated
failover
High Availability
9. # O S N 2 0 1 8
Freedom to run anywhere
Local
On-premises
Server & Mainframe Private cloud
Fully managed cloud service
Hybrid cloud Public cloud
● Database that runs the same everywhere
● Leverage the benefits of a multi-cloud strategy
● Global coverage
● Avoid lock-in
Convenience: same codebase, same APIs, same tools, wherever you run
10. # O S N 2 0 1 8
MongoDB Atlas: Database as a service
mongodb.com/atlas
Self-service and elastic
• Deploy in minutes
• Scale up/down without
downtime
• Automated upgrades
Global and highly available
• 52 Regions worldwide
• Replica sets optimized for
availability
• Cross-region replication
Secure by default
• Network isolation and Peering
• Encryption in flight and at rest
• Role-based access control
• SOC 2 Type 1 / Privacy Shield
Comprehensive Monitoring
• Performance Advisor
• Dashboards w/ 100+ metrics
• Real Time Performance
• Customizable alerting
Managed Backup
• Point in Time Restore
• Queryable backups
• Consistent snapshots
Cloud Agnostic
• AWS, Azure, and GCP
• Easy migrations
• Consistent experience
11. # O S N 2 0 1 8
MongoDB Compass MongoDB Connector for BI
MongoDB Enterprise Server
Enterprise Advanced for Self-Managed
Commercial License
(No AGPL Copyleft Restrictions)
Platform
Certifications
MongoDB Ops Manager
Monitoring &
Alerting
Query
Optimization
Backup &
Recovery
Automation &
Configuration
Schema Visualization
Data Exploration
Ad-Hoc Queries
Visualization
Analysis
Reporting
LDAP & Kerberos Auditing
In-Memory
Storage Engine
Encryption at Rest
REST API
Emergency
Patches
Customer
Success
Program
On-Demand
Online Training
Warranty
Limitation of
Liability
Indemnification
24x7 Support
(1-hour SLA)
13. # O S N 2 0 1 8
• 10 years with the document model
• Use of a common methodology and vocabulary when designing schemas for MongoDB
• Ability to model schemas using building blocks
• Less art and more methodology
Why this Talk?
14. # O S N 2 0 1 8
Ensure:
• Good performance &
scalability
• Fast development
despite constraints
• Hardware
• RAM faster than Disk
• Disk cheaper than RAM
• Network latency
• Reduce costs $$$
• Database Server
• Maximum size for a document
• Atomicity of a write (ACID GA soon)
• Data set
• Size of data
Why do we Create Models?
16. # O S N 2 0 1 8
World Movie Database (WMDB)
- Logical Data Model
Any events, characters and
entities depicted in this
presentation are fictional.
Any resemblance or similarity to
reality is entirely coincidental
17. # O S N 2 0 1 8
• Frequency of Access
• Subset ✔️
• Approximation
• Extended Reference
Patterns by Category
• Grouping
• Computed ✔️
• Bucket ✔️
• Outlier
• Representation
• Entity ✔️
• Document Versioning ✔️
• Schema Versioning ✔️
• Mixed Attributes
• Tree
• Polymorphism
18. # O S N 2 0 1 8
Problem:
• How to get started modeling data in MongoDB rather than as a relational model
• In a relational design, the logical model is spread across tables
• Today’s languages use OOP and JSON
• Spreading data across tables is harder to use and performs worse
Use cases:
• Almost every operational application written in a modern language
• Also applicable to analytics environments
Issue #1 – How to Model Data in Documents
19. # O S N 2 0 1 8
Solution:
• Simply store data in the objects or JSON used in the
application/service
Benefits:
• Faster development
• Faster performance
• Easier to partition and scale
Pattern #1 - Entity
20. # O S N 2 0 1 8
Logical Model to Documents
Typically map to objects & JSON
3 collections:
A. movies
B. moviegoers
C. screenings
22. # O S N 2 0 1 8
Possible solutions:
A. Reduce the size of your working set (no extra cost!)
B. Add more RAM per machine
C. Start sharding or add more shards
Issue #2: Working Set Doesn’t Fit in RAM
23. # O S N 2 0 1 8
In this example, we can:
• Limit the list of actors and
crew to 20
• Limit the embedded reviews
to the top 20
• …
Pattern #2: Subset
24. # O S N 2 0 1 8
Problem:
• There are 1-N or N-N relationships, and only a few fields or
documents that always need to be shown
• Only infrequently do you need to pull all of the related data
Use cases:
• Main actors of a movie
• List of reviews or comments
Generalizing the Subset Pattern
25. # O S N 2 0 1 8
Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
Subset Pattern - Solution
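A minimal sketch of the Subset pattern, using plain Python dicts and lists in place of collections. The limit of 20 comes from the slides; ranking reviews by a `rating` field, the field names `top_reviews`, `main_cast` and `num_reviews`, and the sample data are assumptions made for the example.

```python
from operator import itemgetter

MAX_EMBEDDED = 20  # limit from the slides: top 20 reviews, 20 cast and crew


def build_movie_summary(movie, reviews, cast):
    """Return the "main page" movie document with only a subset embedded.

    The full `reviews` and `cast` lists are assumed to live in their own
    collections; only the most relevant entries are duplicated here.
    """
    top_reviews = sorted(reviews, key=itemgetter("rating"), reverse=True)
    summary = dict(movie)
    summary["top_reviews"] = top_reviews[:MAX_EMBEDDED]
    summary["main_cast"] = cast[:MAX_EMBEDDED]  # cast assumed ordered by billing
    summary["num_reviews"] = len(reviews)  # lets the page link to "all reviews"
    return summary


# Hypothetical source data: 100 reviews and 50 cast members.
reviews = [{"review_id": i, "rating": i % 5} for i in range(100)]
cast = [{"name": f"Actor {i}"} for i in range(50)]

doc = build_movie_summary({"_id": 100, "title": "The Shape of Water"}, reviews, cast)
print(len(doc["top_reviews"]), len(doc["main_cast"]), doc["num_reviews"])
```

The embedded copies are duplicates, so the source collections remain the place to read the complete lists from.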
26. # O S N 2 0 1 8
• How duplication is handled
A. Update both source and target in real time from the application
(optionally in a transaction)
B. Use Change Streams to subscribe to changes and asynchronously update the
target
C. Update target from source at regular intervals. Examples:
• Most popular items => update nightly
• Revenues from a movie => update every hour
• Last 10 reviews => update hourly? daily?
Implementation Reality of Patterns:
Consistency
27. # O S N 2 0 1 8
Change Streams For Sync and Real-Time
Apps
Change Streams API
Business
Apps
User Data
Sensors
Clickstream
Real-Time
Event Notifications
Message Queue
Syncing with other
collections/microservices
28. # O S N 2 0 1 8
• CPU is on fire!
Issue #3: High CPU Usage
29. # O S N 2 0 1 8
{
  title: "The Shape of Water",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}
Issue #3: ..caused by repeated
calculations
30. # O S N 2 0 1 8
For example:
• Apply a sum, count, ...
• rollup data by minute, hour,
day
• As long as you don’t mess
with your source, you can
recreate the rollups
Pattern #3: Computed
31. # O S N 2 0 1 8
Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
• example: 1K writes per hour vs 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing
Computed Pattern
32. # O S N 2 0 1 8
Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over
Computed Pattern - Solution
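A minimal sketch of the Computed pattern on plain dicts: each write also updates the stored totals, so reads never have to re-aggregate. The field names `viewings`, `viewers` and `revenues` follow the earlier example document; the per-screening fields and sample values are assumptions. With a driver, the first function would typically be one update using `$inc`.

```python
def record_screening(movie, viewers, revenue):
    """Apply one write and keep the precomputed totals in step with it."""
    movie["viewings"] = movie.get("viewings", 0) + 1
    movie["viewers"] = movie.get("viewers", 0) + viewers
    movie["revenues"] = movie.get("revenues", 0) + revenue
    return movie


def recompute_from_source(screenings):
    """Rebuild the rollup from the untouched source records."""
    return {
        "viewings": len(screenings),
        "viewers": sum(s["viewers"] for s in screenings),
        "revenues": sum(s["revenue"] for s in screenings),
    }


# Hypothetical source records, one per screening.
screenings = [
    {"viewers": 500, "revenue": 5000},
    {"viewers": 1500, "revenue": 15000},
]

movie = {"title": "The Shape of Water"}
for s in screenings:
    record_screening(movie, s["viewers"], s["revenue"])

totals = recompute_from_source(screenings)
```

Because the source records are left alone, `recompute_from_source` can always rebuild the rollup if the stored totals are ever in doubt.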
33. # O S N 2 0 1 8
• How to quickly change schemas over time with new
requirements?
• How to know what fields are in the results?
Issue #4: Need to change the fields in the
documents
34. # O S N 2 0 1 8
Problem:
• Updating the schema of a collection or database:
• Is not atomic
• Is a long operation
• Is not necessary, as there is no single schema as in an RDBMS
• You may not want to update all documents, only new ones going forward
Use cases:
• Practically any database that will go to production
Schema Versioning Pattern
35. # O S N 2 0 1 8
Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
Schema Versioning Pattern – Solution
36. # O S N 2 0 1 8
Add a field to track the
schema version number, per
document
Does not have to exist for
version 1
Always have the option to
loop through and update all
docs but not forced to
Pattern #4: Schema Versioning
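A minimal sketch of reading documents under the Schema Versioning pattern. The field name `schema_version` and the v1 to v2 change (a single `director` string becoming a `directors` list) are made up for illustration; a missing version field is treated as version 1, as the slide allows.

```python
CURRENT_SCHEMA_VERSION = 2


def upgrade(doc):
    """Return `doc` in the current schema shape, upgrading a copy if needed.

    Documents without a `schema_version` field are treated as version 1.
    The v1 -> v2 change shown here is a hypothetical example.
    """
    version = doc.get("schema_version", 1)
    if version == 1:
        doc = dict(doc)  # leave the caller's document untouched
        doc["directors"] = [doc.pop("director")] if "director" in doc else []
        doc["schema_version"] = 2
    return doc


old_doc = {"_id": 1, "title": "OK Movie", "director": "A. Director"}
new_doc = upgrade(old_doc)
print(new_doc)
```

An application can call `upgrade` on every read and write the result back only when the document is next modified, which matches the "not forced to update all docs" point above.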
37. # O S N 2 0 1 8
• Updating data in place can be seen as deleting previous version
• Regulated industries often require an audit trail for X years
• Insight can be gleaned from measuring changing data (e.g. claims
processing, code check-ins, etc.)
• Many possible approaches here
Issue #5: Need to track and query current
and previous versions of documents
38. # O S N 2 0 1 8
Problem:
• Should we track field-level changes or entire documents?
• Consider how to handle consistency requirements during changes
Use cases:
• Most apps storing business transactions
• Any data useful to see over time
Pattern #5: Document Versioning
39. # O S N 2 0 1 8
Solution:
• Ultimately dependent on the situation
• But 2 main approaches are most common
• Tracking a few updates in one document
• Separate collections for latest and for historical changes
Benefits:
• First option saves on disk space
• Second option gives good performance no matter how many
changes
Document Versioning Pattern – Solution
40. # O S N 2 0 1 8
Have an array of
previous values that
were changed
Compare-and-swap
(on version) for
thread-safe update
to the document
If Few Changes
Movie
{
  _id: 100,
  current: {
    v: 3, name: "Best Movie Ever", budget: 450, actual: 450
  },
  prev: [
    {v: 1, name: "OK Movie", budget: 450},
    {v: 2, name: "Good Movie", actual: 400}
  ]
}
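A minimal sketch of the compare-and-swap update for the "few changes" layout, on a plain dict. It follows one reading of the slide, in which each `prev` entry records the old values of the fields that changed; the function name and return convention are assumptions. With a driver, the version check would normally be part of the update filter (for example matching on `_id` and `current.v`), so that a stale writer matches no document.

```python
def apply_update(stored, expected_version, changes):
    """Compare-and-swap update of a document holding `current` and `prev`.

    Returns the new document, or None when `stored` is no longer at
    `expected_version` (another writer got there first; re-read and retry).
    """
    current = stored["current"]
    if current["v"] != expected_version:
        return None
    # Keep only the old values of the fields that are about to change.
    old_values = {k: current[k] for k in changes if k in current}
    prev_entry = {"v": current["v"], **old_values}
    new_current = {**current, **changes, "v": current["v"] + 1}
    return {
        **stored,
        "current": new_current,
        "prev": stored.get("prev", []) + [prev_entry],
    }


doc = {
    "_id": 100,
    "current": {"v": 2, "name": "Good Movie", "budget": 450, "actual": 400},
    "prev": [{"v": 1, "name": "OK Movie"}],
}
updated = apply_update(doc, 2, {"name": "Best Movie Ever", "actual": 450})
stale = apply_update(updated, 2, {"name": "Late Writer"})  # version is now 3
```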
41. # O S N 2 0 1 8
Unbounded Numbers of Changes
Current Collection
{
  _id: 100,
  v: 3,
  name: "Best Movie Ever",
  budget: 450,
  actual: 450
}
History Collection
{
  movieId: 100,
  v: 1,
  name: "OK Movie",
  budget: 450,
  t: Date("2018-06-01…")
}
History Collection
{
  movieId: 100,
  v: 2,
  name: "Good Movie",
  budget: 450,
  actual: 400,
  t: Date("2018-06-01…")
}
History Collection
{
  movieId: 100,
  v: 3,
  name: "Best Movie Ever",
  budget: 450,
  actual: 450,
  t: Date("2018-06-01…")
}
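A minimal sketch of the two-collection layout, with a dict keyed by `_id` standing in for the current collection and a list standing in for the history collection. The function name is hypothetical. Each save appends a full copy of the new version to history and then replaces the current document; against a real database the two writes would need a transaction or a careful ordering, which this sketch does not model.

```python
from datetime import datetime, timezone


def save_version(current_coll, history_coll, movie_id, changes):
    """Write a new version: append a full copy to history, then replace current."""
    previous = current_coll.get(movie_id, {"_id": movie_id, "v": 0})
    new_doc = {**previous, **changes, "v": previous["v"] + 1}
    # History documents reference the movie through `movieId`, not `_id`.
    snapshot = {k: v for k, v in new_doc.items() if k != "_id"}
    history_coll.append(
        {"movieId": movie_id, **snapshot, "t": datetime.now(timezone.utc)}
    )
    current_coll[movie_id] = new_doc
    return new_doc


current, history = {}, []
save_version(current, history, 100, {"name": "OK Movie", "budget": 450})
save_version(current, history, 100, {"name": "Good Movie", "actual": 400})
save_version(current, history, 100, {"name": "Best Movie Ever", "actual": 450})
```

The current collection stays at one small document per movie however many versions accumulate, which is where the steady read performance of this option comes from.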
42. # O S N 2 0 1 8
• It is known that a series of items are often read/written together
• E.g. last month’s transactions, 100 device samples, prices for an
hour
• Often would store each item in a separate record in RDBMSs
• With arrays in documents, have the option of storing many items
together
Issue #6: Poor Performance
Reading/Writing a Series of Many Items
43. # O S N 2 0 1 8
Problem:
• Do we know a series of items will be accessed together and not
randomly?
• Should we store a document per item, like with RDBMSs?
• How to balance write vs. read performance?
Use cases:
• Transactions: orders, claims, payments, etc.
• Time series: IoT, market data, tweets, reviews, comments, etc.
• Often used for analytics and reporting
Pattern #6: Bucket Pattern
44. # O S N 2 0 1 8
Solution:
• Store as an array of items in a document (a certain number or
time window)
• Often each item is written by itself, and then rolled into the bucket
asynchronously for high performance reading
• Retention period can differ between the items and the bucket
Benefits:
• Reads are many times faster (easily 10x or more)
• Also often saves on disk space as field names are stored fewer
times
Bucket Pattern – Solution
45. # O S N 2 0 1 8
• Likely need to write each item in case of app failure (short retention)
• Async write the buckets
• Might keep buckets longer than raw items
Storing Buckets and Optionally
Each Item
Screening
{
  _id: 200,
  location: "135 W. 34th St., NYC",
  date: Date("2018-06-01 5:00PM"),
  numViewers: 500,
  revenues: 5000
}
ScreeningBucket
{ _id: 2000,
  movieId: 100,
  metro: "New York",
  day: Date("2018-06-01"),
  numViewers: 50000,
  ...,
  screenings: [
    {id: 200, t: "5:00", v: 500},
    {id: 201, t: "7:30", v: 1500}
  ]
}
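A minimal sketch of the asynchronous roll-up step, grouping raw screening items into one bucket per movie, metro area and day. The bucket fields mirror the ScreeningBucket example; the assumption that each raw item already carries `movieId`, `metro`, `day` and `time` fields is made for the illustration and is not shown in the Screening document above.

```python
def build_buckets(screenings):
    """Group raw screening documents into one bucket per (movie, metro, day)."""
    buckets = {}
    for s in screenings:
        key = (s["movieId"], s["metro"], s["day"])
        bucket = buckets.setdefault(
            key,
            {
                "movieId": s["movieId"],
                "metro": s["metro"],
                "day": s["day"],
                "numViewers": 0,
                "screenings": [],
            },
        )
        bucket["numViewers"] += s["numViewers"]  # bucket-level total
        # Short field names inside the array keep each bucket compact.
        bucket["screenings"].append(
            {"id": s["_id"], "t": s["time"], "v": s["numViewers"]}
        )
    return list(buckets.values())


# Hypothetical raw items, as the application would have written them.
raw = [
    {"_id": 200, "movieId": 100, "metro": "New York", "day": "2018-06-01",
     "time": "5:00", "numViewers": 500},
    {"_id": 201, "movieId": 100, "metro": "New York", "day": "2018-06-01",
     "time": "7:30", "numViewers": 1500},
    {"_id": 202, "movieId": 100, "metro": "Boston", "day": "2018-06-01",
     "time": "6:00", "numViewers": 300},
]
buckets = build_buckets(raw)
```

Reading a day of screenings for one metro area then touches a single bucket document instead of one document per screening.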
46. # O S N 2 0 1 8
Lambda Architecture Helps Balance
Reads/Writes
App Writes
Data
Async Processing
(change stream or
periodic batch)
Each Item (MongoDB)
Buckets of Items in MongoDB
Queries
Message Queue
And/Or
48. # O S N 2 0 1 8
What our Patterns did for us
Problem Pattern
How to model data in documents Entity
Using too much RAM Subset
Using too much CPU Computed
No downtime to upgrade schema Schema Versioning
How to track previous versions Document Versioning
How to improve performance of series of data Bucket
49. # O S N 2 0 1 8
• Mixed Attributes* – using key/value pairs in arrays to allow searching on dozens of variable
fields
• Approximation* – reducing frequency of calculations with approximate values
• Extended Reference – detailed data stored in separate collection for lookup on drill down
• Trees – store 1 or multiple levels as one document and/or use $graphLookup to recursively
traverse
• Polymorphism – each document represents an item, but each item can have different fields
(e.g. product catalog)
• Outlier* - avoid having a few documents drive the design, and impact performance for all
* = covered in other presentations on mongodb.com
Other Patterns
50. # O S N 2 0 1 8
A. Simple grouping from tables to collections is often not optimal
B. Learn a common vocabulary for designing schemas with MongoDB
C. Use patterns as "plug-and-play" to improve performance
Takeaways
51. # O S N 2 0 1 8
• A previous webinar, which this talk extends, covers 3 different patterns
https://www.mongodb.com/presentations/advanced-schema-design-patterns
• MongoDB in-person training courses on Schema Design
• MongoDB University
https://university.mongodb.com
• M001: MongoDB Basics
• (Upcoming) M220: Data Modeling
How Can I Learn More About Schema
Design?
52. # O S N 2 0 1 8
For More Information About MongoDB
Resource Location
Public Atlas DBaaS mongodb.com/cloud/atlas
Case Studies mongodb.com/customers
Presentations mongodb.com/presentations
Free Online Training university.mongodb.com
Webinars and Events mongodb.com/events
Documentation docs.mongodb.com
MongoDB Downloads mongodb.com/download
53. # M D B l o c a l
Thank you for using MongoDB!