SlideShare ist ein Scribd-Unternehmen logo
1 von 43
Solutions Architect, MongoDB
Jay Runkel
@jayrunkel
Time Series Data – Part 1
Schema Design
Our Mission Today
We need to prepare for this
Develop Nationwide traffic monitoring
system
Traffic sensors to monitor interstate
conditions
• 16,000 sensors
• Measure
• Speed
• Travel time
• Weather, pavement, and traffic conditions
• Support desktop, mobile, and car navigation
systems
Model After NY State Solution
Other requirements
• Need to keep 3 year history
• Three data centers
• NJ, Chicago, LA
• Need to support 5M simultaneous users
• Peak volume (rush hour)
• Every minute, each request the 10 minute average
speed for 50 sensors
Master Agenda
• Successfully deploy a MongoDB application at
scale
• Use case: traffic data
• Presentation Components
1. Schema Design
2. Aggregation
3. ClusterArchitecture
Time Series Data Schema
Design
Agenda
• Similarities between MongoDB and Olympic
weight lifting
• What is time series data?
• Schema design considerations
• Analysis of alternative schemas
• Questions
Before we get started…
Lifting heavy things requires
• Technique
• Planning
• Practice
• Analysis
• Tuning
Without planning…
Tailor your schema to your
application workload
Time Series
A time series is a sequence of data points, measured
typically at successive points in time spaced at
uniform time intervals.
– Wikipedia
0 2 4 6 8 10 12
time
Time Series Data is Everywhere
• Free hosted service for monitoring MongoDB systems
– 100+ system metrics visualized and alerted
• 25,000+ MongoDB systems submitting data every 60
seconds
• 90% updates, 10% reads
• ~75,000 updates/second
• ~5.4B operations/day
• 8 commodity servers
Example: MongoDB Monitoring Service
Time Series Data is Everywhere
Application Requirements
Event Resolution
Analysis
– Dashboards
– Analytics
– Reporting
Data Retention Policies
Event and Query Volumes
Schema Design
Aggregation Queries
Cluster Architecture
Schema Design
Considerations
Schema Design Goal
Store Event Data
SupportAnalytical Queries
Find best compromise of:
– Memory utilization
– Write performance
– Read/Analytical Query Performance
Accomplish with realistic amount of hardware
Designing For Reading, Writing, …
• Document per event
• Document per minute (average)
• Document per minute (second)
• Document per hour
Document Per Event
{
segId: “I80_mile23”,
speed: 63,
ts: ISODate("2013-10-16T22:07:38.000-0500")
}
• Relational-centric approach
• Insert-driven workload
Document Per Minute (Average)
{
segId: “I80_mile23”,
speed_num: 18,
speed_sum: 1134,
ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Pre-aggregate to compute average per minute more easily
• Update-driven workload
• Resolution at the minute-level
Document Per Minute (By Second)
{
segId: “I80_mile23”,
speed: { 0: 63, 1: 58, …, 58: 66, 59: 64 }
ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Store per-second data at the minute level
• Update-driven workload
• Pre-allocate structure to avoid document moves
Document Per Hour (By Second)
{
segId: “I80_mile23”,
speed: { 0: 63, 1: 58, …, 3598: 45, 3599: 55 }
ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 3599 steps
Document Per Hour (By Second)
{
segId: “I80_mile23”,
speed: {
0: {0: 47, …, 59: 45},
….
59: {0: 65, …, 59: 66}
ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level with nesting
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 59+59 steps
Characterizing Write Differences
• Example: data generated every second
• For 1 minute:
• Transition from insert driven to update driven
– Individual writes are smaller
– Performance and concurrency benefits
Document Per Event
60 writes
Document Per Minute
1 write, 59 updates
Characterizing Read Differences
• Example: data generated every second
• Reading data for a single hour requires:
• Read performance is greatly improved
– Optimal with tuned block sizes and read ahead
– Fewer disk seeks
Document Per Event
3600 reads
Document Per Minute
60 reads
Characterizing Memory Differences
• _id index for 1 billion events:
• _id index plus segId and ts index:
• Memory requirements significantly reduced
– Fewer shards
– Lower capacity servers
Document Per Event
~32 GB
Document Per Minute
~.5 GB
Document Per Event
~100 GB
Document Per Minute
~2 GB
Traffic Monitoring System
Schema
Quick Analysis
Writes
– 16,000 sensors, 1 update per minute
– 16,000 / 60 = 267 updates per second
Reads
– 5M simultaneous users
– Each requests data for 50 sensors per minute
Tailor your schema to your
application workload
Reads: Impact of Alternative
Schemas
10 minute average query
Schema 1 sensor 50 sensors
1 doc per event 10 500
1 doc per 10 min 1.9 95
1 doc per hour 1.3 65
Query: Find the average speed over the
last
ten minutes
10 minute average query with 5M
users
Schema ops/sec
1 doc per event 42M
1 doc per 10 min 8M
1 doc per hour 5.4M
Writes: Impact of alternative
schemas
1 Sensor - 1 Hour
Schema Inserts Updates
doc/event 60 0
doc/10 min 6 54
doc/hour 1 59
16000 Sensors – 1 Day
Schema Inserts Updates
doc/event 23M 0
doc/10 min 2.3M 21M
doc/hour .38M 22.7M
Queries will require two indexes
{
“segId” : “20484097”,
”ts" : ISODate(“2013-10-10T23:06:37.000Z”),
”time" : "237",
"speed" : "52",
“pavement”: “Wet Spots”,
“status” : “Wet Conditions”,
“weather” : “Light Rain”
}
~70 bytes per document
Memory: Impact of alternative
schemas
1 Sensor - 1 Hour
Schema
# of
Documents
Index Size
(bytes)
doc/event 60 4200
doc/10 min 6 420
doc/hour 1 70
16000 Sensors – 1 Day
Schema
# of
Documents Index Size
doc/event 23M 1.3 GB
doc/10 min 2.3M 131 MB
doc/hour .38M 1.4 MB
Tailor your schema to your
application workload
Summary
• Tailor your schema to your application workload
• Aggregating events will
– Improve write performance: inserts  updates
– Improve analytics performance: fewer document reads
– Reduce index size  reduce memory requirements
Questions?
@jayrunkel
jay.runkel@mongodb.com
Part 2 – July 9th, 2:00 PM EST
Part 3 - July 16th, 2:00 PM EST

Weitere ähnliche Inhalte

Was ist angesagt?

Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
confluent
 

Was ist angesagt? (20)

Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
Analyzing Petabyte Scale Financial Data with Apache Pinot and Apache Kafka | ...
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
An Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and KibanaAn Intro to Elasticsearch and Kibana
An Intro to Elasticsearch and Kibana
 
Big Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best PracticesBig Data Architectural Patterns and Best Practices
Big Data Architectural Patterns and Best Practices
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWS
 
MongoDB for Time Series Data
MongoDB for Time Series DataMongoDB for Time Series Data
MongoDB for Time Series Data
 
Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...
Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...
Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...
 
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3...
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Bu...
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastore
 
Data Discovery at Databricks with Amundsen
Data Discovery at Databricks with AmundsenData Discovery at Databricks with Amundsen
Data Discovery at Databricks with Amundsen
 

Ähnlich wie MongoDB for Time Series Data: Schema Design

MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB
 
MongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor ManagementMongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB
 

Ähnlich wie MongoDB for Time Series Data: Schema Design (20)

MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor ManagementMongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management
 
MongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor ManagementMongoDB for Time Series Data: Setting the Stage for Sensor Management
MongoDB for Time Series Data: Setting the Stage for Sensor Management
 
Mongo db 2.4 time series data - Brignoli
Mongo db 2.4 time series data - BrignoliMongo db 2.4 time series data - Brignoli
Mongo db 2.4 time series data - Brignoli
 
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy IndustriesWebinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
Webinar: MongoDB Use Cases within the Oil, Gas, and Energy Industries
 
Cloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark AnalyticsCloud Security Monitoring and Spark Analytics
Cloud Security Monitoring and Spark Analytics
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best Practices
 
Performance Monitoring for the Cloud - Java2Days 2017
Performance Monitoring for the Cloud - Java2Days 2017Performance Monitoring for the Cloud - Java2Days 2017
Performance Monitoring for the Cloud - Java2Days 2017
 
MongoDB and the Internet of Things
MongoDB and the Internet of ThingsMongoDB and the Internet of Things
MongoDB and the Internet of Things
 
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
A Low-Cost IoT Application for the Urban Traffic of Vehicles, Based on Wirele...
 
Codemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsCodemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of Things
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case StudyMongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
MongoDB World 2018: Overnight to 60 Seconds: An IOT ETL Performance Case Study
 
Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]Architecture for Scale [AppFirst]
Architecture for Scale [AppFirst]
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike Freedman
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 
MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...
MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...
MongoDB IoT City Tour LONDON: Managing the Database Complexity, by Arthur Vie...
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 

Mehr von MongoDB

Mehr von MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

MongoDB for Time Series Data: Schema Design

  • 1. Solutions Architect, MongoDB Jay Runkel @jayrunkel Time Series Data – Part 1 Schema Design
  • 3. We need to prepare for this
  • 4. Develop Nationwide traffic monitoring system
  • 5.
  • 6. Traffic sensors to monitor interstate conditions • 16,000 sensors • Measure • Speed • Travel time • Weather, pavement, and traffic conditions • Support desktop, mobile, and car navigation systems
  • 7. Model After NY State Solution
  • 8. Other requirements • Need to keep 3 year history • Three data centers • NJ, Chicago, LA • Need to support 5M simultaneous users • Peak volume (rush hour) • Every minute, each request the 10 minute average speed for 50 sensors
  • 9. Master Agenda • Successfully deploy a MongoDB application at scale • Use case: traffic data • Presentation Components 1. Schema Design 2. Aggregation 3. ClusterArchitecture
  • 10. Time Series Data Schema Design
  • 11. Agenda • Similarities between MongoDB and Olympic weight lifting • What is time series data? • Schema design considerations • Analysis of alternative schemas • Questions
  • 12. Before we get started…
  • 13.
  • 14. Lifting heavy things requires • Technique • Planning • Practice • Analysis • Tuning
  • 16.
  • 17. Tailor your schema to your application workload
  • 18. Time Series A time series is a sequence of data points, measured typically at successive points in time spaced at uniform time intervals. – Wikipedia 0 2 4 6 8 10 12 time
  • 19. Time Series Data is Everywhere
  • 20. • Free hosted service for monitoring MongoDB systems – 100+ system metrics visualized and alerted • 25,000+ MongoDB systems submitting data every 60 seconds • 90% updates, 10% reads • ~75,000 updates/second • ~5.4B operations/day • 8 commodity servers Example: MongoDB Monitoring Service
  • 21. Time Series Data is Everywhere
  • 22. Application Requirements Event Resolution Analysis – Dashboards – Analytics – Reporting Data Retention Policies Event and Query Volumes Schema Design Aggregation Queries Cluster Architecture
  • 24. Schema Design Goal Store Event Data SupportAnalytical Queries Find best compromise of: – Memory utilization – Write performance – Read/Analytical Query Performance Accomplish with realistic amount of hardware
  • 25. Designing For Reading, Writing, … • Document per event • Document per minute (average) • Document per minute (second) • Document per hour
  • 26. Document Per Event { segId: “I80_mile23”, speed: 63, ts: ISODate("2013-10-16T22:07:38.000-0500") } • Relational-centric approach • Insert-driven workload
  • 27. Document Per Minute (Average) { segId: “I80_mile23”, speed_num: 18, speed_sum: 1134, ts: ISODate("2013-10-16T22:07:00.000-0500") } • Pre-aggregate to compute average per minute more easily • Update-driven workload • Resolution at the minute-level
  • 28. Document Per Minute (By Second) { segId: “I80_mile23”, speed: { 0: 63, 1: 58, …, 58: 66, 59: 64 } ts: ISODate("2013-10-16T22:07:00.000-0500") } • Store per-second data at the minute level • Update-driven workload • Pre-allocate structure to avoid document moves
  • 29. Document Per Hour (By Second) { segId: “I80_mile23”, speed: { 0: 63, 1: 58, …, 3598: 45, 3599: 55 } ts: ISODate("2013-10-16T22:00:00.000-0500") } • Store per-second data at the hourly level • Update-driven workload • Pre-allocate structure to avoid document moves • Updating last second requires 3599 steps
  • 30. Document Per Hour (By Second) { segId: “I80_mile23”, speed: { 0: {0: 47, …, 59: 45}, …. 59: {0: 65, …, 59: 66} ts: ISODate("2013-10-16T22:00:00.000-0500") } • Store per-second data at the hourly level with nesting • Update-driven workload • Pre-allocate structure to avoid document moves • Updating last second requires 59+59 steps
  • 31. Characterizing Write Differences • Example: data generated every second • For 1 minute: • Transition from insert driven to update driven – Individual writes are smaller – Performance and concurrency benefits Document Per Event 60 writes Document Per Minute 1 write, 59 updates
  • 32. Characterizing Read Differences • Example: data generated every second • Reading data for a single hour requires: • Read performance is greatly improved – Optimal with tuned block sizes and read ahead – Fewer disk seeks Document Per Event 3600 reads Document Per Minute 60 reads
  • 33. Characterizing Memory Differences • _id index for 1 billion events: • _id index plus segId and ts index: • Memory requirements significantly reduced – Fewer shards – Lower capacity servers Document Per Event ~32 GB Document Per Minute ~.5 GB Document Per Event ~100 GB Document Per Minute ~2 GB
  • 35. Quick Analysis Writes – 16,000 sensors, 1 update per minute – 16,000 / 60 = 267 updates per second Reads – 5M simultaneous users – Each requests data for 50 sensors per minute
  • 36. Tailor your schema to your application workload
  • 37. Reads: Impact of Alternative Schemas 10 minute average query Schema 1 sensor 50 sensors 1 doc per event 10 500 1 doc per 10 min 1.9 95 1 doc per hour 1.3 65 Query: Find the average speed over the last ten minutes 10 minute average query with 5M users Schema ops/sec 1 doc per event 42M 1 doc per 10 min 8M 1 doc per hour 5.4M
  • 38. Writes: Impact of alternative schemas 1 Sensor - 1 Hour Schema Inserts Updates doc/event 60 0 doc/10 min 6 54 doc/hour 1 59 16000 Sensors – 1 Day Schema Inserts Updates doc/event 23M 0 doc/10 min 2.3M 21M doc/hour .38M 22.7M
  • 39. Queries will require two indexes { “segId” : “20484097”, ”ts" : ISODate(“2013-10-10T23:06:37.000Z”), ”time" : "237", "speed" : "52", “pavement”: “Wet Spots”, “status” : “Wet Conditions”, “weather” : “Light Rain” } ~70 bytes per document
  • 40. Memory: Impact of alternative schemas 1 Sensor - 1 Hour Schema # of Documents Index Size (bytes) doc/event 60 4200 doc/10 min 6 420 doc/hour 1 70 16000 Sensors – 1 Day Schema # of Documents Index Size doc/event 23M 1.3 GB doc/10 min 2.3M 131 MB doc/hour .38M 1.4 MB
  • 41. Tailor your schema to your application workload
  • 42. Summary • Tailor your schema to your application workload • Aggregating events will – Improve write performance: inserts  updates – Improve analytics performance: fewer document reads – Reduce index size  reduce memory requirements
  • 43. Questions? @jayrunkel jay.runkel@mongodb.com Part 2 – July 9th, 2:00 PM EST Part 3 - July 16th, 2:00 PM EST