June 14, 2018 | Twin Cities
#OSN2018
Advanced Schema Design Patterns
{
  "name": "Matt Kalan",
  "titles": ["Master Solution Architect", "Enterprise Architect"],
  "location": "Minneapolis, MN",
  "yearsAtMDB": 5.5,
  "contactInfo": {
    "email": "matt.kalan@mongodb.com",
    "twitter": ["@MatthewKalan", "@MongoDB"],
    "linkedIn": ["mkalan", "MongoDB"]
  }
}
Who Am I?
• Quick MongoDB overview
• Review of each Schema Design Pattern
• Patterns we couldn’t get to
• Q&A (and throughout)
Agenda
Quick MongoDB Overview
Why MongoDB?
• Best way to work with data
• Intelligently put data where you need it
• Freedom to run anywhere
Intelligent Operational Data Platform
Best way to work with data
• Easy: Work with data in a natural, intuitive way
• Flexible: Adapt and make changes quickly
• Fast: Get great performance with less code
• Versatile: Supports a wide variety of data models and queries
Easy & Versatile - Rich Query Functionality
Example document:
{
  customer_id: 1,
  first_name: "Mark",
  last_name: "Smith",
  city: "San Francisco",
  phones: [
    { number: "1-212-777-1212", dnc: true, type: "home" },
    { number: "1-212-777-1213", type: "cell" }
  ]
}
MongoDB query functionality:
Expressive Queries
• Find anyone with phone # "1-212…"
• Check if the person with number "555…" is on the "do not call" list
Geospatial
• Find the best offer for the customer at the geo coordinates of 42nd St. and 6th Ave
Text Search
• Find all tweets that mention the firm within the last 2 days
Aggregation
• Count and sort number of customers by city
Native Binary JSON support
• Add an additional phone number to Mark Smith's record without rewriting the document
• Update just 2 phone numbers out of 10
• Sort on the modified date
Joins ($lookup)
• Query for all San Francisco residents, look up their transactions, and sum the amount by person
Graph queries ($graphLookup)
• Query for all people within 3 degrees of separation from Mark
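As a rough illustration of the "count and sort number of customers by city" item, the sketch below runs the same logic over plain Python dictionaries. The extra customers and the helper name `count_customers_by_city` are made up for the example; the `PIPELINE` constant shows what the equivalent aggregation would typically look like with the standard `$group` and `$sort` stages, and it is not executed here.

```python
from collections import Counter

# Sample data: the first customer comes from the slide, the others are hypothetical.
customers = [
    {"customer_id": 1, "first_name": "Mark", "last_name": "Smith", "city": "San Francisco"},
    {"customer_id": 2, "first_name": "Ann", "last_name": "Lee", "city": "New York"},
    {"customer_id": 3, "first_name": "Raj", "last_name": "Patel", "city": "San Francisco"},
]

# What the equivalent aggregation pipeline would typically look like (not run here).
PIPELINE = [
    {"$group": {"_id": "$city", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def count_customers_by_city(docs):
    """Group documents by city and return the counts, largest first."""
    counts = Counter(doc["city"] for doc in docs)
    return [{"_id": city, "count": n} for city, n in counts.most_common()]


for row in count_customers_by_city(customers):
    print(row["_id"], row["count"])
```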
Intelligently put data where you need it
• Workload Isolation: Ability to run both operational & analytics workloads on the same cluster, for timely insight and lower cost
• Scalability: Elastic horizontal scalability - add/remove capacity dynamically without downtime
• Locality: Declare data locality rules for governance (e.g. data sovereignty), tiers of service & local low-latency access
• High Availability: Built-in multi-geography high availability, replication & automated failover
Freedom to run anywhere
Deployment options: Local, On-premises, Server & Mainframe, Private cloud, Hybrid cloud, Public cloud, Fully managed cloud service
● Database that runs the same everywhere
● Leverage the benefits of a multi-cloud strategy
● Global coverage
● Avoid lock-in
Convenience: same codebase, same APIs, same tools, wherever you run
MongoDB Atlas: Database as a service
mongodb.com/atlas
Self-service and elastic
• Deploy in minutes
• Scale up/down without
downtime
• Automated upgrades
Global and highly available
• 52 Regions worldwide
• Replica sets optimized for
availability
• Cross-region replication
Secure by default
• Network isolation and Peering
• Encryption in flight and at rest
• Role-based access control
• SOC 2 Type 1 / Privacy Shield
Comprehensive Monitoring
• Performance Advisor
• Dashboards w/ 100+ metrics
• Real Time Performance
• Customizable alerting
Managed Backup
• Point in Time Restore
• Queryable backups
• Consistent snapshots
Cloud Agnostic
• AWS, Azure, and GCP
• Easy migrations
• Consistent experience
Enterprise Advanced for Self-Managed
• MongoDB Ops Manager: Monitoring & Alerting, Query Optimization, Backup & Recovery, Automation & Configuration
• MongoDB Compass: Schema Visualization, Data Exploration, Ad-Hoc Queries
• MongoDB Connector for BI: Visualization, Analysis, Reporting
• MongoDB Enterprise Server: LDAP & Kerberos, Auditing, In-Memory Storage Engine, Encryption at Rest
• Commercial License (No AGPL Copyleft Restrictions)
• Platform Certifications
• REST API
• Emergency Patches
• Customer Success Program
• On-Demand Online Training
• Warranty, Limitation of Liability, Indemnification
• 24x7 Support (1 hour SLA)
Schema Design Patterns
• 10 years with the document model
• Use of a common methodology and vocabulary when designing schemas for MongoDB
• Ability to model schemas using building blocks
• Less art and more methodology
Why this Talk?
Ensure:
• Good performance & scalability
• Fast development
Despite constraints:
• Hardware
    • RAM faster than Disk
    • Disk cheaper than RAM
    • Network latency
    • Reduce costs $$$
• Database Server
    • Maximum size for a document
    • Atomicity of a write (multi-document ACID transactions GA soon)
• Data set
    • Size of data
Why do we Create Models?
However, Don't Over Design!
World Movie Database (WMDB)
- Logical Data Model
Any events, characters and
entities depicted in this
presentation are fictional.
Any resemblance or similarity to
reality is entirely coincidental
Patterns by Category
• Frequency of Access
    • Subset ✔️
    • Approximation
    • Extended Reference
• Grouping
    • Computed ✔️
    • Bucket ✔️
    • Outlier
• Representation
    • Entity ✔️
    • Document Versioning ✔️
    • Schema Versioning ✔️
    • Mixed Attributes
    • Tree
    • Polymorphism
Problem:
• How to get started modeling data in MongoDB rather than as a relational model
• In a relational database, the logical model is spread across tables
• Today's languages use OOP and JSON
• Spreading data across tables is harder to use and performs worse
Use cases:
• Almost every operational application built with modern languages
• Also applicable to analytics environments
Issue #1 – How to Model Data in Documents
Solution:
• Simply store data in the objects or JSON used in the application/service
Benefits:
• Faster development
• Faster performance
• Easier to partition and scale
Pattern #1 - Entity
Logical Model to Documents
Typically map to objects & JSON
3 collections:
A. movies
B. moviegoers
C. screenings
3 Main Entities
Moviegoer
{
  _id: 1,
  ...
  viewings: [
    {m: 100, d: Date("2016-05-24")},
    {m: 200, d: Date("2017-03-18")}
  ],
  ratings: [
    {m: 100, v: 3, c: "great"}
  ]
}
Movie
{
  _id: 100,
  name: "Best Movie Ever",
  castAndCrew: [
    {fn: "Joe", ln: "Smith", …},
    …
  ],
  reviews: [
    {d: Date("2018-05-25"), r: "awful", …},
    …
  ],
  quotes: […]
}
Screening
{
  _id: 200,
  movieId: 100,
  location: "NYC",
  numViewers: 500,
  revenues: 100000
}
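One way to picture the Entity pattern is to start from the objects the application already uses and serialize them as they are. The sketch below is only an illustration under that assumption: the dataclass names are invented, the field names follow the Moviegoer document above, and dates are kept as plain ISO strings for simplicity.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Viewing:
    m: int  # _id of the movie that was watched
    d: str  # viewing date, kept as an ISO string in this sketch


@dataclass
class Rating:
    m: int  # _id of the movie being rated
    v: int  # rating value
    c: str  # free-text comment


@dataclass
class Moviegoer:
    _id: int
    viewings: List[Viewing] = field(default_factory=list)
    ratings: List[Rating] = field(default_factory=list)


goer = Moviegoer(
    _id=1,
    viewings=[Viewing(100, "2016-05-24"), Viewing(200, "2017-03-18")],
    ratings=[Rating(100, 3, "great")],
)

# The application object maps directly to one document: no join tables needed.
doc = asdict(goer)
print(json.dumps(doc, indent=2))
```

The nested lists stay inside the parent document, which is the point of the pattern: one read returns the whole entity.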
Possible solutions:
A. Reduce the size of your working set (no extra cost!)
B. Add more RAM per machine
C. Start sharding or add more shards
Issue #2: Working Set Doesn’t Fit in RAM
In this example, we can:
• Limit the list of actors and crew to 20
• Limit the embedded reviews to the top 20
• …
Pattern #2: Subset
Problem:
• There are 1-N or N-N relationships, and only a few fields or documents always need to be shown
• Only infrequently do you need to pull all of the related data
Use cases:
• Main actors of a movie
• List of reviews or comments
Generalizing the Subset Pattern
Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
Subset Pattern - Solution
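A minimal sketch of the Subset pattern on plain dictionaries follows. It assumes the full cast list is already ordered by importance and that each review carries a hypothetical `score` field used to pick the "top" reviews; neither assumption comes from the slides, and the function name is illustrative.

```python
def build_movie_document(movie, full_cast, all_reviews, limit=20):
    """Return a copy of the movie with only a small, duplicated subset embedded.

    The complete cast and review lists are assumed to live in their own
    collections; only the first `limit` entries are copied into the movie.
    """
    doc = dict(movie)
    doc["castAndCrew"] = full_cast[:limit]
    # "score" is a hypothetical ranking field used to choose the top reviews.
    doc["reviews"] = sorted(all_reviews, key=lambda r: r["score"], reverse=True)[:limit]
    return doc


movie = {"_id": 100, "name": "Best Movie Ever"}
full_cast = [{"fn": "Person", "ln": str(i)} for i in range(50)]
all_reviews = [{"r": f"review {i}", "score": i} for i in range(100)]

page_doc = build_movie_document(movie, full_cast, all_reviews)
print(len(page_doc["castAndCrew"]), len(page_doc["reviews"]))
```

The "main page" then needs only `page_doc`; the full lists are fetched only when a user drills down.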
• How duplication is handled
A. Update both source and target in real time from the application (optional: transaction)
B. Use Change Streams to subscribe to changes and asynchronously update the target
C. Update target from source at regular intervals. Examples:
• Most popular items => update nightly
• Revenues from a movie => update every hour
• Last 10 reviews => update hourly? daily?
Implementation Reality of Patterns: Consistency
Change Streams For Sync and Real-Time Apps
[Diagram labels: Change Streams API; Business Apps, User Data, Sensors, Clickstream; Real-Time Event Notifications, Message Queue, Syncing with other collections/microservices]
• CPU is on fire!
Issue #3: High CPU Usage
{
  title: "The Shape of Water",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}
Issue #3: ...caused by repeated calculations
For example:
• Apply a sum, count, ...
• Roll up data by minute, hour, day
• As long as you don't mess with your source, you can recreate the rollups
Pattern #3: Computed
Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
• Example: 1K writes per hour vs. 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing
Computed Pattern
Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over
Computed Pattern - Solution
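The Computed pattern can be sketched as keeping running totals on the movie document at write time, so that the many reads never recompute them. The example below works on in-memory dictionaries; the function names and sample screenings are illustrative, and in MongoDB the increments would typically be expressed with an `$inc` update rather than Python arithmetic.

```python
def apply_screening(movie, screening):
    """Fold one new screening into the movie's stored totals (done once per write)."""
    movie["viewings"] = movie.get("viewings", 0) + 1
    movie["viewers"] = movie.get("viewers", 0) + screening["numViewers"]
    movie["revenues"] = movie.get("revenues", 0) + screening["revenues"]


def recompute_totals(screenings):
    """Rebuild the rollup from the untouched source data, e.g. after a bug fix."""
    return {
        "viewings": len(screenings),
        "viewers": sum(s["numViewers"] for s in screenings),
        "revenues": sum(s["revenues"] for s in screenings),
    }


# Hypothetical source records.
screenings = [
    {"numViewers": 500, "revenues": 5000},
    {"numViewers": 1500, "revenues": 15000},
]

movie = {"title": "The Shape of Water"}
for s in screenings:
    apply_screening(movie, s)

print(movie)
```

Because the source screenings are kept, `recompute_totals` can always recreate the stored values, which is the safety net the slide on rollups refers to.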
• How to quickly change schemas over time with new requirements?
• How to know what fields are in the results?
Issue #4: Need to change the fields in the documents
Problem:
• Updating the schema of a collection or database is:
    • Not atomic
    • A long operation
    • Not necessary, as there is no single enforced schema as in RDBMSs
• May not want to update all documents, only apply the change going forward
Use cases:
• Practically any database that will go to production
Schema Versioning Pattern
Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
Schema Versioning Pattern – Solution
• Add a field to track the schema version number, per document
• Does not have to exist for version 1
• Always have the option to loop through and update all docs, but not forced to
Pattern #4: Schema Versioning
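A small sketch of how an application might upgrade documents lazily, when it reads them. Everything specific here is an assumption made for illustration: the field name `schema_version`, the hypothetical change from `name` (version 1) to `title` (version 2), and the helper names. Documents without the field are treated as version 1, in line with the note above.

```python
CURRENT_SCHEMA_VERSION = 2


def _v1_to_v2(doc):
    """Hypothetical migration: version 2 renames "name" to "title"."""
    upgraded = dict(doc)
    upgraded["title"] = upgraded.pop("name")
    return upgraded


# One migration step per source version; later versions would add entries here.
MIGRATIONS = {1: _v1_to_v2}


def upgrade(doc):
    """Return the document in the current schema, leaving the input untouched."""
    version = doc.get("schema_version", 1)  # a missing field means version 1
    upgraded = dict(doc)
    while version < CURRENT_SCHEMA_VERSION:
        upgraded = MIGRATIONS[version](upgraded)
        version += 1
    upgraded["schema_version"] = version
    return upgraded


old_doc = {"_id": 100, "name": "Best Movie Ever"}
print(upgrade(old_doc))
```

The upgraded document can be written back on its next modification, or a background job can loop over the remaining old documents at its own pace.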
• Updating data in place can be seen as deleting previous version
• Regulated industries often require an audit trail for X years
• Insight can be gleaned from measuring changing data (e.g. claims processing, code check-ins, etc.)
• Many possible approaches here
Issue #5: Need to track and query current and previous versions of documents
Problem:
• Should we track field-level changes or entire documents?
• Consider how to handle consistency requirements during changes
Use cases:
• Most apps storing business transactions
• Any data useful to see over time
Pattern #5: Document Versioning
Solution:
• Ultimately dependent on the situation
• But 2 main approaches are most common
    • Tracking a few updates in one document
    • Separate collections for latest and for historical changes
Benefits:
• First option saves on disk space
• Second option gives good performance no matter how many changes
Document Versioning Pattern – Solution
If Few Changes
• Have an array of previous values that were changed
• Compare-and-swap (on version) for thread-safe update to the document
Movie
{
  _id: 100,
  current: {
    v: 3, name: "Best Movie Ever", budget: 450, actual: 450
  },
  prev: [
    {v: 1, name: "OK Movie", budget: 450},
    {v: 2, name: "Good Movie", actual: 400}
  ]
}
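The compare-and-swap idea can be sketched on an in-memory dictionary as below. The function and exception names are illustrative, and the sketch records only the prior values of the fields being changed, which is one possible reading of the `prev` array above. Note that this in-memory check is not atomic by itself; against a database, the version check would normally be part of the filter of a single update so that the test and the write happen together.

```python
class VersionConflict(Exception):
    """Raised when the document changed since the caller last read it."""


def update_movie(movies, movie_id, expected_version, changes):
    """Apply `changes` only if the stored version matches `expected_version`."""
    doc = movies[movie_id]
    current = doc["current"]
    if current["v"] != expected_version:
        raise VersionConflict(
            f"expected v{expected_version}, found v{current['v']}"
        )
    # Remember the old values of just the fields that are about to change.
    previous = {"v": current["v"]}
    for name in changes:
        if name in current:
            previous[name] = current[name]
    doc["prev"].append(previous)
    current.update(changes)
    current["v"] = expected_version + 1
    return doc


movies = {
    100: {
        "_id": 100,
        "current": {"v": 2, "name": "Good Movie", "budget": 450, "actual": 400},
        "prev": [{"v": 1, "name": "OK Movie", "budget": 450}],
    }
}

update_movie(movies, 100, expected_version=2,
             changes={"name": "Best Movie Ever", "actual": 450})
print(movies[100])
```

A caller holding a stale version number gets `VersionConflict` and is expected to re-read the document and retry.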
Unbounded Numbers of Changes
Current Collection
{
  _id: 100,
  v: 3,
  name: "Best Movie Ever",
  budget: 450,
  actual: 450
}
History Collection
{
  movieId: 100,
  v: 1,
  name: "OK Movie",
  budget: 450,
  t: Date("2018-06-01…")
}
History Collection
{
  movieId: 100,
  v: 2,
  name: "Good Movie",
  budget: 450,
  actual: 400,
  t: Date("2018-06-01…")
}
History Collection
{
  movieId: 100,
  v: 3,
  name: "Best Movie Ever",
  budget: 450,
  actual: 450,
  t: Date("2018-06-01…")
}
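The two-collection approach might be sketched as follows, with a dictionary standing in for the current collection and a list for the history collection. The function name `save_version` and the `now` parameter are assumptions for the example. The two writes are separate steps here, so a real implementation would still need to decide how to keep them consistent (for instance with a transaction or an asynchronous copy).

```python
from datetime import datetime, timezone


def save_version(current, history, movie_id, changes, now=None):
    """Write the new state to `current` and append a full snapshot to `history`."""
    now = now or datetime.now(timezone.utc)
    doc = dict(current.get(movie_id, {"_id": movie_id, "v": 0}))
    doc.update(changes)
    doc["v"] += 1
    current[movie_id] = doc

    # History rows reference the movie by movieId and carry a timestamp.
    snapshot = {k: v for k, v in doc.items() if k != "_id"}
    snapshot["movieId"] = movie_id
    snapshot["t"] = now
    history.append(snapshot)
    return doc


current, history = {}, []
save_version(current, history, 100, {"name": "OK Movie", "budget": 450})
save_version(current, history, 100, {"name": "Good Movie", "actual": 400})
save_version(current, history, 100, {"name": "Best Movie Ever", "actual": 450})

print(current[100])
print(len(history))
```

Reads of the latest state touch only the small current document, regardless of how many history rows accumulate.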
• It is known that a series of items are often read/written together
    • E.g. last month's transactions, 100 device samples, prices for an hour
• Often would store each item in a separate record in RDBMSs
• With arrays in documents, have the option of storing many items together
Issue #6: Poor Performance Reading/Writing a Series of Many Items
Problem:
• Do we know a series of items will be accessed together and not randomly?
• Should we store a document per item, like with RDBMSs?
• How to balance write vs. read performance?
Use cases:
• Transactions: orders, claims, payments, etc.
• Time series: IoT, market data, tweets, reviews, comments, etc.
• Often used for analytics and reporting
Pattern #6: Bucket Pattern
Solution:
• Store as an array of items in a document (a certain number or time window)
• Often each item is written by itself, and then rolled into the bucket asynchronously for high performance reading
• Retention period can be different for item vs. the bucket
Benefits:
• Reads are many times faster (easily 10x or more)
• Also often saves on disk space as field names are stored fewer times
Bucket Pattern – Solution
Storing Buckets and Optionally Each Item
• Likely need to write each item in case of app failure (short retention)
• Async write the buckets
• Might keep buckets longer than raw items
Screening
{
  _id: 200,
  location: "135 W. 34th St., NYC",
  date: Date("2018-06-01 5:00PM"),
  numViewers: 500,
  revenues: 5000
}
ScreeningBucket
{
  _id: 2000,
  movieId: 100,
  metro: "New York",
  day: Date("2018-06-01"),
  numViewers: 50000,
  ...,
  screenings: [
    {id: 200, t: "5:00", v: 500},
    {id: 201, t: "7:30", v: 1500}
  ]
}
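The asynchronous roll-up step could look roughly like the sketch below, which groups raw screenings into one bucket per movie, metro area and day. The input fields `movieId`, `metro`, `day` and `time` on each raw screening are assumptions for the example (the Screening document above shows only location and date), and the function name is illustrative.

```python
def build_screening_buckets(screenings):
    """Group raw screening items into one bucket per (movie, metro, day)."""
    buckets = {}
    for s in screenings:
        key = (s["movieId"], s["metro"], s["day"])
        bucket = buckets.setdefault(key, {
            "movieId": s["movieId"],
            "metro": s["metro"],
            "day": s["day"],
            "numViewers": 0,   # precomputed total for the whole bucket
            "screenings": [],  # compact per-item entries with short field names
        })
        bucket["numViewers"] += s["numViewers"]
        bucket["screenings"].append({"id": s["_id"], "t": s["time"], "v": s["numViewers"]})
    return list(buckets.values())


# Hypothetical raw items, as they might be written one by one.
raw = [
    {"_id": 200, "movieId": 100, "metro": "New York", "day": "2018-06-01", "time": "5:00", "numViewers": 500},
    {"_id": 201, "movieId": 100, "metro": "New York", "day": "2018-06-01", "time": "7:30", "numViewers": 1500},
    {"_id": 202, "movieId": 100, "metro": "New York", "day": "2018-06-02", "time": "5:00", "numViewers": 700},
]

for bucket in build_screening_buckets(raw):
    print(bucket["day"], bucket["numViewers"], len(bucket["screenings"]))
```

A report for one day then reads a single bucket document instead of one document per screening.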
Lambda Architecture Helps Balance Reads/Writes
[Diagram labels: App Writes Data; Each Item (MongoDB) and/or Message Queue; Async Processing (change stream or periodic batch); Buckets of Items in MongoDB; Queries]
Extremely Common with Time Series & IoT
SensorSample
{
  _id: 200,
  loc: {
    type: "Point",
    coordinates: [-93, 45]
  },
  date: Date("2018-06-01 5:00PM"),
  temp: 54
}
SampleBucket
{
  _id: 2000,
  loc: {
    type: "Point",
    coordinates: [-93, 45]
  },
  startTime: Date("2018-06-01 5:00PM"),
  endTime: Date("2018-06-01 6:00PM"),
  minTemp: 50, maxTemp: 60, ...,
  samples: [
    {t: Date("2018-06-01 5:00PM"), v: 51.5},
    {t: Date("2018-06-01 5:01PM"), v: 52},
    ...
  ]
}
What our Patterns did for us
Problem | Pattern
How to model data in documents | Entity
Using too much RAM | Subset
Using too much CPU | Computed
No downtime to upgrade schema | Schema Versioning
How to track previous versions | Document Versioning
How to improve performance of series of data | Bucket
• Mixed Attributes* – using key/values in arrays to allow searching on dozens of variable fields
• Approximation* – reducing frequency of calculations with approximate values
• Extended Reference – detailed data stored in separate collection for lookup on drill down
• Trees – store 1 or multiple levels as one document and/or use $graphLookup to recursively traverse
• Polymorphism – each document represents an item, but each item can have different fields (e.g. product catalog)
• Outlier* – avoid having a few documents drive the design and impact performance for all
* = covered in other presentations on mongodb.com
Other Patterns
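As a brief illustration of the Mixed Attributes idea, the sketch below moves a product's variable fields into a single array of key/value pairs, so that one search path covers all of them. The field name `attrs`, the sample product, and both helper names are hypothetical; in a database, the lookup would usually be an element match on the array backed by one index over the key and value fields.

```python
def to_attribute_array(doc, fixed_fields):
    """Keep the fixed fields as-is and move every other field into a k/v array."""
    converted = {k: v for k, v in doc.items() if k in fixed_fields}
    converted["attrs"] = [
        {"k": k, "v": v} for k, v in doc.items() if k not in fixed_fields
    ]
    return converted


def find_by_attribute(docs, key, value):
    """Return documents that have an attribute entry matching both key and value."""
    return [
        d for d in docs
        if any(a["k"] == key and a["v"] == value for a in d["attrs"])
    ]


# Hypothetical catalog entry with a few variable fields.
product = {"_id": 1, "name": "Lamp", "color": "red", "voltage": 110, "shade": "linen"}
converted = to_attribute_array(product, fixed_fields={"_id", "name"})

print(converted)
print(find_by_attribute([converted], "color", "red"))
```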
A. Simple grouping from tables to collections is often not optimal
B. Learn a common vocabulary for designing schemas with MongoDB
C. Use patterns as "plug-and-play" to improve performance
Take Aways
• The previous webinar, which this talk extends, covers 3 different patterns
https://www.mongodb.com/presentations/advanced-schema-design-patterns
• MongoDB in-person training courses on Schema Design
• MongoDB University
https://university.mongodb.com
• M001: MongoDB Basics
• (Upcoming) M220: Data Modeling
How Can I Learn More About Schema Design?
For More Information About MongoDB
Resource | Location
Public Atlas DBaaS | mongodb.com/cloud/atlas
Case Studies | mongodb.com/customers
Presentations | mongodb.com/presentations
Free Online Training | university.mongodb.com
Webinars and Events | mongodb.com/events
Documentation | docs.mongodb.com
MongoDB Downloads | mongodb.com/download
Thank You for using MongoDB!

Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 

Advanced MongoDB Schema Design Patterns for OSN 2018

  • 1. June 14, 2018 | Twin Cities #OSN2018 Advanced Schema Design Patterns
  • 2. { "name": "Matt Kalan", "titles": [ "Master Solution Architect", "Enterprise Architect" ], "location": "Minneapolis, MN", "yearsAtMDB": 5.5, "contactInfo": { "email": "matt.kalan@mongodb.com", "twitter": ["@MatthewKalan", "@MongoDB"], "linkedIn": ["mkalan", "MongoDB"] } } Who Am I?
  • 3. • Quick MongoDB overview • Review of each Schema Design Pattern • Patterns we couldn’t get to • Q&A (and throughout) Agenda
  • 4. Quick MongoDB Overview
  • 5. Why MongoDB? Best way to work with data Intelligently put data where you need it Freedom to run anywhere Intelligent Operational Data Platform
  • 6. Best way to work with data Easy: Work with data in a natural, intuitive way Flexible: Adapt and make changes quickly Fast: Get great performance with less code Versatile: Supports a wide variety of data models and queries
  • 7. Easy & Versatile - Rich Query Functionality MongoDB Expressive Queries • Find anyone with phone # “1-212…” • Check if the person with number “555…” is on the “do not call” list Geospatial • Find the best offer for the customer at geo coordinates of 42nd St. and 6th Ave Text Search • Find all tweets that mention the firm within the last 2 days Aggregation • Count and sort number of customers by city Native Binary JSON support • Add an additional phone number to Mark Smith’s record without rewriting the document • Update just 2 phone numbers out of 10 • Sort on the modified date { customer_id : 1, first_name : "Mark", last_name : "Smith", city : "San Francisco", phones: [ { number : “1-212-777-1212”, dnc : true, type : “home” }, { number : “1-212-777-1213”, type : “cell” }] } Joins ($lookup) • Query for all San Francisco residents, look up their transactions, and sum the amount by person Graph queries ($graphLookup) • Query for all people within 3 degrees of separation from Mark
  • 8. Intelligently put data where you need it Ability to run both operational & analytics workloads on same cluster, for timely insight and lower cost Workload Isolation Elastic horizontal scalability - add/remove capacity dynamically without downtime Scalability Declare data locality rules for governance (e.g. data sovereignty), tiers of service & local low latency access Locality Built-in multi-geography high availability, replication & automated failover High Availability
  • 9. Freedom to run anywhere Local On-premises Server & Mainframe Private cloud Fully managed cloud service Hybrid cloud Public cloud ● Database that runs the same everywhere ● Leverage the benefits of a multi-cloud strategy ● Global coverage ● Avoid lock-in Convenience: same codebase, same APIs, same tools, wherever you run
  • 10. MongoDB Atlas: Database as a service mongodb.com/atlas Self-service and elastic • Deploy in minutes • Scale up/down without downtime • Automated upgrades Global and highly available • 52 Regions worldwide • Replica sets optimized for availability • Cross-region replication Secure by default • Network isolation and Peering • Encryption in flight and at rest • Role-based access control • SOC 2 Type 1 / Privacy Shield Comprehensive Monitoring • Performance Advisor • Dashboards w/ 100+ metrics • Real Time Performance • Customizable alerting Managed Backup • Point in Time Restore • Queryable backups • Consistent snapshots Cloud Agnostic • AWS, Azure, and GCP • Easy migrations • Consistent experience
  • 11. MongoDB Compass MongoDB Connector for BI MongoDB Enterprise Server Enterprise Advanced for Self-Managed Commercial License (No AGPL Copyleft Restrictions) Platform Certifications MongoDB Ops Manager Monitoring & Alerting Query Optimization Backup & Recovery Automation & Configuration Schema Visualization Data Exploration Ad-Hoc Queries Visualization Analysis Reporting LDAP & Kerberos Auditing In-Memory Storage Engine Encryption at Rest REST API Emergency Patches Customer Success Program On-Demand Online Training Warranty Limitation of Liability Indemnification 24x7 Support (1 hour SLA)
  • 12. Schema Design Patterns
  • 13. • 10 years with the document model • Use of a common methodology and vocabulary when designing schemas for MongoDB • Ability to model schemas using building blocks • Less art and more methodology Why this Talk?
  • 14. Ensure: • Good performance & scalability • Fast development despite constraints • Hardware • RAM faster than Disk • Disk cheaper than RAM • Network latency • Reduce costs $$$ • Database Server • Maximum size for a document • Atomicity of a write (ACID GA soon) • Data set • Size of data Why do we Create Models?
  • 15. However, Don't Over Design!
  • 16. World Movie Database (WMDB) - Logical Data Model Any events, characters and entities depicted in this presentation are fictional. Any resemblance or similarity to reality is entirely coincidental
  • 17. • Frequency of Access • Subset ✔️ • Approximation • Extended Reference Patterns by Category • Grouping • Computed ✔️ • Bucket ✔️ • Outlier • Representation • Entity ✔️ • Document Versioning ✔️ • Schema Versioning ✔️ • Mixed Attributes • Tree • Polymorphism
  • 18. Problem: • How to get started modeling data in MongoDB, not as a relational model • Logical model is spread across tables • Today’s languages use OOP and JSON • Spreading data across tables is harder to use and performs worse Use cases: • Almost every operational application written in modern languages • Also applicable to analytics environments Issue #1 – How to Model Data in Documents
  • 19. Solution: • Simply store data in the objects or JSON used in the application/service Benefits: • Faster development • Faster performance • Easier to partition and scale Pattern #1 - Entity
  • 20. Logical Model to Documents Typically map to objects & JSON 3 collections: A. movies B. moviegoers C. screenings
  • 21. Moviegoer { _id: 1, ... viewings: [ {m: 100, d: 2016-05-24}, {m: 200, d: 2017-03-18} ], ratings: [ {m: 100, v: 3, c: “great”} ] } 3 Main Entities Movie { _id: 100, name: “Best Movie Ever”, castAndCrew: [ {fn: “Joe”, ln: “Smith”, …} … ], reviews: [ {d: 2018-05-25, r: “awful”, …} … ], quotes: […] } Screening { _id: 200, movieId: 100, location: “NYC”, numViewers: 500, revenues: 100000 }
  • 22. Possible solutions: A. Reduce the size of your working set (no extra cost!) B. Add more RAM per machine C. Start sharding or add more shards Issue #2: Working Set Doesn’t Fit in RAM
  • 23. In this example, we can: • Limit the list of actors and crew to 20 • Limit the embedded reviews to the top 20 • … Pattern #2: Subset
  • 24. Problem: • There are 1-N or N-N relationships, and only a few fields or documents that always need to be shown • Only infrequently do you need to pull all of the related data Use cases: • Main actors of a movie • List of reviews or comments Generalizing the Subset Pattern
  • 25. Solution: • Keep duplicates of a small subset of fields in the main collection Benefits: • Allows for fast data retrieval and a reduced working set size • One query brings all the information needed for the "main page" Subset Pattern - Solution
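The Subset pattern can be illustrated with a minimal sketch. It uses plain Python dicts as stand-ins for MongoDB documents, so no database is needed to run it. The `rating` field, the `build_movie_summary` helper, and the sample data are assumptions made for illustration; they do not appear in the slides.

```python
from operator import itemgetter


def build_movie_summary(movie, all_reviews, max_reviews=20, max_cast=20):
    """Build the 'main page' document: the movie plus a small embedded subset.

    The full review list is assumed to live in a separate collection;
    only the top-rated reviews are duplicated into the movie document.
    """
    top_reviews = sorted(all_reviews, key=itemgetter("rating"), reverse=True)
    return {
        "_id": movie["_id"],
        "name": movie["name"],
        # Keep at most max_cast cast and crew entries in the main document.
        "castAndCrew": movie["castAndCrew"][:max_cast],
        # Duplicate only a few fields of the top reviews.
        "reviews": [
            {"d": r["d"], "r": r["r"], "rating": r["rating"]}
            for r in top_reviews[:max_reviews]
        ],
    }


# Hypothetical sample data: 50 cast members and 200 reviews.
movie = {
    "_id": 100,
    "name": "Best Movie Ever",
    "castAndCrew": [{"fn": "Joe", "ln": "Smith"}] * 50,
}
all_reviews = [
    {"movieId": 100, "d": "2018-05-25", "r": f"review {i}", "rating": i % 5}
    for i in range(200)
]
summary = build_movie_summary(movie, all_reviews)
```

The summary document stays small no matter how many reviews accumulate, which is what keeps the working set down; the remaining reviews would be fetched from their own collection only when a user asks for them.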
  • 26. • How duplication is handled A. Update both source and target in real time from the application (optionally in a transaction) B. Use Change Streams to subscribe to changes and asynchronously update the target C. Update the target from the source at regular intervals. Examples: • Most popular items => update nightly • Revenues from a movie => update every hour • Last 10 reviews => update hourly? daily? Implementation Reality of Patterns: Consistency
  • 27. Change Streams For Sync and Real-Time Apps ChangeStreamsAPI Business Apps User Data Sensors Clickstream Real-Time Event Notifications Message Queue Syncing with other collections/microservices
  • 28. • CPU is on fire! Issue #3: High CPU Usage
  • 29. { title: "The Shape of Water", ... viewings: 5000, viewers: 385000, revenues: 5074800 } Issue #3: ...caused by repeated calculations
  • 30. For example: • Apply a sum, count, ... • Roll up data by minute, hour, day • As long as you don’t mess with your source, you can recreate the rollups Pattern #3: Computed
  • 31. Problem: • There is data that needs to be computed • The same calculations would happen over and over • Reads outnumber writes: • example: 1K writes per hour vs 1M reads per hour Use cases: • Have revenues per movie showing, want to display sums • Time series data, Event Sourcing Computed Pattern
  • 32. Solution: • Apply a computation or operation on data and store the result Benefits: • Avoid re-computing the same thing over and over Computed Pattern - Solution
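As a rough sketch of the Computed pattern, the totals can be maintained at write time so that reads never repeat the sum. Plain Python dicts stand in for documents here; the helper names (`new_movie`, `record_screening`, `recompute`) and the sample figures are hypothetical.

```python
def new_movie(title):
    """Movie document with precomputed totals, as in the 'Shape of Water' slide."""
    return {"title": title, "viewings": 0, "viewers": 0, "revenues": 0}


def record_screening(movie, screening):
    """Update the stored totals once per write instead of summing on every read."""
    movie["viewings"] += 1
    movie["viewers"] += screening["numViewers"]
    movie["revenues"] += screening["revenues"]


def recompute(title, screenings):
    """Rebuild the rollup from the untouched source data."""
    movie = new_movie(title)
    for screening in screenings:
        record_screening(movie, screening)
    return movie


# Hypothetical source data: two screenings of one movie.
screenings = [
    {"numViewers": 500, "revenues": 5000},
    {"numViewers": 1500, "revenues": 18000},
]
movie = new_movie("The Shape of Water")
for screening in screenings:
    record_screening(movie, screening)
```

Because the screenings themselves are never modified, `recompute` can always regenerate the totals, which matches the slide's point that the rollups can be recreated as long as the source is left alone.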
  • 33. • How to quickly change schemas over time with new requirements? • How to know what fields are in the results? Issue #4: Need to change the fields in the documents
  • 34. Problem: • Updating the schema of a collection or database is: • Not atomic • Long operation • Is not necessary, as there is not one schema as in RDBMSs • May not want to update all documents, only do it going forward Use cases: • Practically any database that will go to production Schema Versioning Pattern
  • 35. Solution: • Have a field keeping track of the schema version Benefits: • Don't need to update all the documents at once • May not have to update documents until their next modification Schema Versioning Pattern – Solution
  • 36. Add a field to track the schema version number, per document Does not have to exist for version 1 Always have the option to loop through and update all docs but not forced to Pattern #4: Schema Versioning
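One way to picture Schema Versioning is an upgrade step applied lazily when a document is read. The sketch below uses plain Python dicts in place of documents. The field name `schema_version`, the v1-to-v2 change (splitting `name` into `first_name` and `last_name`), and all helper names are assumptions chosen for illustration.

```python
CURRENT_SCHEMA_VERSION = 2


def upgrade_v1_to_v2(doc):
    """Hypothetical migration: v2 splits 'name' into first and last name."""
    first, _, last = doc["name"].partition(" ")
    upgraded = {k: v for k, v in doc.items() if k != "name"}
    upgraded.update({"first_name": first, "last_name": last, "schema_version": 2})
    return upgraded


# One upgrade function per source version.
MIGRATIONS = {1: upgrade_v1_to_v2}


def read_moviegoer(doc):
    """Return the document in the current schema, upgrading it if needed.

    A missing version field is treated as version 1, as the slide suggests.
    """
    version = doc.get("schema_version", 1)
    while version < CURRENT_SCHEMA_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["schema_version"]
    return doc


old_doc = {"_id": 1, "name": "Mark Smith"}  # written before versioning existed
new_doc = {"_id": 2, "first_name": "Ann", "last_name": "Lee", "schema_version": 2}
```

Documents already at the current version pass through untouched, and an application could choose to write the upgraded form back on its next modification rather than migrating the whole collection at once.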
  • 37. • Updating data in place can be seen as deleting previous version • Regulated industries often require an audit trail for X years • Insight can be gleaned from measuring changing data (e.g. claims processing, code check-ins, etc.) • Many possible approaches here Issue #5: Need to track and query current and previous versions of documents
  • 38. Problem: • Should we track field-level changes or entire documents? • Consider how to handle consistency requirements during changes Use cases: • Most apps storing business transactions • Any data useful to see over time Pattern #5: Document Versioning
  • 39. Solution: • Ultimately dependent on the situation • But 2 main approaches are most common • Tracking a few updates in one document • Separate collections for latest and for historical changes Benefits: • First option saves on disk space • Second option gives good performance no matter how many changes Document Versioning Pattern – Solution
  • 40. Have an array of previous values that were changed Compare-and-swap (on version) for thread-safe update to the document If Few Changes Movie { _id: 100, current: { v: 3, name: “Best Movie Ever”, budget: 450, actual: 450 }, prev: [ {v: 1, name: “OK Movie”, budget: 450}, {v: 2, name: “Good Movie”, actual: 400} ] }
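The compare-and-swap idea for the "few changes" case can be sketched in memory as follows. A Python dict keyed by `_id` stands in for the collection, and `VersionConflict` and `update_movie` are hypothetical names. Recording only the previous values of fields that actually changed is one possible reading of the slide, not necessarily the author's exact scheme.

```python
class VersionConflict(Exception):
    """Raised when the document changed since the caller last read it."""


def update_movie(store, movie_id, expected_version, changes):
    """Apply changes only if the stored version matches the one the caller read."""
    doc = store[movie_id]
    current = doc["current"]
    if current["v"] != expected_version:
        raise VersionConflict(
            f"expected v{expected_version}, found v{current['v']}"
        )
    # Record the old values of the fields that are about to change.
    previous = {"v": current["v"]}
    for field, new_value in changes.items():
        if field in current and current[field] != new_value:
            previous[field] = current[field]
    doc["prev"].append(previous)
    current.update(changes)
    current["v"] += 1
    return doc


store = {
    100: {"_id": 100, "current": {"v": 1, "name": "OK Movie", "budget": 450}, "prev": []}
}
update_movie(store, 100, expected_version=1, changes={"name": "Good Movie", "actual": 400})

# A writer still holding version 1 is rejected instead of overwriting version 2.
try:
    update_movie(store, 100, expected_version=1, changes={"name": "Stale Title"})
    stale_write_rejected = False
except VersionConflict:
    stale_write_rejected = True
```

Against a real database the version check would typically be part of the update's filter, so that the comparison and the write happen as one atomic operation on the document.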
  • 41. Unbounded Numbers of Changes Current Collection { _id: 100, v: 3, name: “Best Movie Ever”, budget: 450, actual: 450 } History Collection { movieId: 100, v: 1, name: “OK Movie”, budget: 450, t: Date(“2018-06-01…”) } History Collection { movieId : 100, v: 2, name: “Good Movie”, budget: 450, actual: 400, t: Date(“2018-06-01…”) } History Collection { movieId : 100, v: 3, name: “Best Movie Ever”, budget: 450, actual: 450, t: Date(“2018-06-01…”) }
  • 42. • It is known that a series of items are often read/written together • E.g. last month’s transactions, 100 device samples, prices for an hour • Often would store each item in a separate record in RDBMSs • With arrays in documents, have the option of storing many items together Issue #6: Poor Performance Reading/Writing a Series of Many Items
  • 43. Problem: • Do we know a series of items will be accessed together and not randomly? • Should we store a document per item, like with RDBMSs? • How to balance write vs. read performance? Use cases: • Transactions: orders, claims, payments, etc. • Time series: IoT, market data, tweets, reviews, comments, etc. • Often used for analytics and reporting Pattern #6: Bucket Pattern
  • 44. Solution: • Store as an array of items in a document (a certain number or time window) • Often each item is written by itself, and then rolled into the bucket asynchronously for high performance reading • Retention period can be different for the item vs. the bucket Benefits: • Reads are many times faster (easily 10x or more) • Also often saves on disk space as field names are stored fewer times Bucket Pattern – Solution
  • 45. • Likely need to write each item in case of app failure (short retention) • Async write the buckets • Might keep buckets longer than raw items Storing Buckets and Optionally Each Item Screening { _id: 200, location: “135 W. 34th St., NYC”, date: Date(“2018-06-01 5:00PM”), numViewers: 500, revenues: 5000 } ScreeningBucket { _id: 2000, movieId: 100, metro: “New York”, day: Date(“2018-06-01”), numViewers: 50000, ..., screenings: [ {id: 200, t: “5:00”, v: 500}, {id: 201, t: “7:30”, v: 1500} ] }
  • 46. Lambda Architecture Helps Balance Reads/Writes App Writes Data Async Processing (change stream or periodic batch) Each Item (MongoDB) Buckets of Items in MongoDB Queries Message Queue And/Or
  • 47. Extremely Common with Time Series & IoT SensorSample { _id: 200, loc: { type: “Point”, coordinates: [-93, 45] }, date: Date(“2018-06-01 5:00PM”), temp: 54 } SampleBucket { _id: 2000, loc: { type: “Point”, coordinates: [-93, 45] }, startTime: Date(“2018-06-01 5:00PM”), endTime: Date(“2018-06-01 6:00PM”), minTemp: 50, maxTemp: 60, ..., samples: [ {t: Date(“2018-06-01 5:00PM”), v: 51.5}, {t: Date(“2018-06-01 5:01PM”), v: 52}, ... ] }
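The roll-up from individual sensor samples into hourly buckets might look like the sketch below. It works on plain Python dicts shaped like the SensorSample and SampleBucket documents above. The `bucket_samples` function and the choice of one bucket per location per clock hour are assumptions for illustration.

```python
from datetime import datetime, timedelta


def bucket_samples(samples):
    """Group raw samples into one bucket per location per clock hour."""
    buckets = {}
    for sample in sorted(samples, key=lambda s: s["date"]):
        start = sample["date"].replace(minute=0, second=0, microsecond=0)
        key = (tuple(sample["loc"]["coordinates"]), start)
        bucket = buckets.setdefault(
            key,
            {
                "loc": sample["loc"],
                "startTime": start,
                "endTime": start + timedelta(hours=1),
                "minTemp": sample["temp"],
                "maxTemp": sample["temp"],
                "samples": [],
            },
        )
        # Keep the precomputed summary fields in step with the raw samples.
        bucket["minTemp"] = min(bucket["minTemp"], sample["temp"])
        bucket["maxTemp"] = max(bucket["maxTemp"], sample["temp"])
        bucket["samples"].append({"t": sample["date"], "v": sample["temp"]})
    return list(buckets.values())


# Hypothetical readings from one sensor: two in the 5 PM hour, one after 6 PM.
loc = {"type": "Point", "coordinates": [-93, 45]}
samples = [
    {"loc": loc, "date": datetime(2018, 6, 1, 17, 0), "temp": 51.5},
    {"loc": loc, "date": datetime(2018, 6, 1, 17, 1), "temp": 52},
    {"loc": loc, "date": datetime(2018, 6, 1, 18, 5), "temp": 60},
]
hourly = bucket_samples(samples)
```

In the arrangement the slides describe, a job like this would run asynchronously (from a change stream or a periodic batch), so writes stay cheap single-item inserts while reads fetch one bucket document per hour.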
  • 48. What our Patterns did for us (Problem → Pattern): How to model data in documents → Entity; Using too much RAM → Subset; Using too much CPU → Computed; No downtime to upgrade schema → Schema Versioning; How to track previous versions → Document Versioning; How to improve performance of series of data → Bucket
  • 49. • Mixed Attributes* – using key/values in arrays to allow searching on dozens of variable fields • Approximation* – reducing frequency of calculations with approximate values • Extended Reference – detailed data stored in separate collection for lookup on drill down • Trees – store 1 or multiple levels as one document and/or use $graphLookup to recursively traverse • Polymorphism – each document represents an item, but each item can have different fields (e.g. product catalog) • Outlier* - avoid having a few documents drive the design, and impact performance for all * = covered in other presentations on mongodb.com Other Patterns
  • 50. A. Simple grouping from tables to collections is often not optimal B. Learn a common vocabulary for designing schemas with MongoDB C. Use patterns as "plug-and-play" to improve performance Take Aways
  • 51. • The previous webinar, which this talk extends, covers 3 different patterns https://www.mongodb.com/presentations/advanced-schema-design-patterns • MongoDB in-person training courses on Schema Design • MongoDB University https://university.mongodb.com • M001: MongoDB Basics • (Upcoming) M220: Data Modeling How Can I Learn More About Schema Design?
  • 52. For More Information About MongoDB (Resource: Location) Public Atlas DBaaS: mongodb.com/cloud/atlas; Case Studies: mongodb.com/customers; Presentations: mongodb.com/presentations; Free Online Training: university.mongodb.com; Webinars and Events: mongodb.com/events; Documentation: docs.mongodb.com; MongoDB Downloads: mongodb.com/download
  • 53. Thank You for using MongoDB!