#mongodb

Time Series Data in MongoDB
Sandeep Parikh
Partner Technical Services, MongoDB Inc.
Agenda
• What is time series data?
• Schema design considerations
• Broader use case: operational intelligence
• MMS Monitoring schema design
• Thinking ahead
• Questions
What is time series data?
Time Series Data is Everywhere
• Financial markets pricing (stock ticks)
• Sensors (temperature, pressure, proximity)
• Industrial fleets (location, velocity, operational)
• Social networks (status updates)
• Mobile devices (calls, texts)
• Systems (server logs, application logs)
Time Series Data at a Higher Level
• Widely applicable data model
• Applies to several different “data use cases”
• Various schema and modeling options
• Application requirements drive schema design
Time Series Data Considerations
• Resolution of raw events
• Resolution needed to support
– Applications
– Analysis
– Reporting
• Data retention policies
– Data ages out
– Retention
Schema Design Considerations
Designing For Writing and Reading
• Document per event
• Document per minute (average)
• Document per minute (by second)
• Document per hour (by second)
Document Per Event
{
  server: "server1",
  load: 92,
  ts: ISODate("2013-10-16T22:07:38.000-0500")
}
• Relational-centric approach
• Insert-driven workload
• Aggregations computed at application-level
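As a rough illustration of what "aggregations computed at application-level" can mean for this schema, the sketch below averages load per minute in plain JavaScript. The sample events and the helper name `averageLoadPerMinute` are assumptions made for the example; only the field names come from the slide.

```javascript
// Hypothetical per-event documents, shaped like the slide's example.
const events = [
  { server: "server1", load: 92, ts: new Date("2013-10-16T22:07:38.000-05:00") },
  { server: "server1", load: 88, ts: new Date("2013-10-16T22:07:39.000-05:00") },
  { server: "server1", load: 60, ts: new Date("2013-10-16T22:08:01.000-05:00") },
];

// Group events by minute and average the load in application code.
function averageLoadPerMinute(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    // Truncate the timestamp to the start of its minute.
    const minute = Math.floor(doc.ts.getTime() / 60000) * 60000;
    const bucket = buckets.get(minute) || { sum: 0, count: 0 };
    bucket.sum += doc.load;
    bucket.count += 1;
    buckets.set(minute, bucket);
  }
  return [...buckets].map(([minute, b]) => ({
    ts: new Date(minute),
    avg: b.sum / b.count,
  }));
}

const perMinute = averageLoadPerMinute(events);
```

Every event has to be read back before the average is known, which is the cost the pre-aggregated schemas on the following slides try to avoid.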
Document Per Minute (Average)
{
  server: "server1",
  load_num: 92,
  load_sum: 4500,
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Pre-aggregate to compute average per minute more easily
• Update-driven workload
• Resolution at the minute-level
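One possible way to drive this schema is sketched below: each sample becomes an `$inc` on the per-minute document, and the average is derived on read. The helper names `buildAverageUpdate` and `averageLoad` are hypothetical, and treating `load_num` as the sample count is an assumption based on the field names.

```javascript
// Build the filter and update for one load sample (field names follow the slide).
function buildAverageUpdate(server, load, when) {
  // Truncate the sample time to the start of its minute.
  const minute = new Date(Math.floor(when.getTime() / 60000) * 60000);
  return {
    filter: { server: server, ts: minute },
    update: { $inc: { load_num: 1, load_sum: load } },
  };
}

// Average for a stored per-minute document (assumes load_num is the sample count).
function averageLoad(doc) {
  return doc.load_num > 0 ? doc.load_sum / doc.load_num : null;
}

const op = buildAverageUpdate("server1", 92, new Date("2013-10-16T22:07:38.000-05:00"));
// With a driver, op.filter and op.update could be passed to an upserting update
// so that the first sample of each minute creates the document.
```

Only the running sum and count survive, so the individual per-second values cannot be recovered later.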
Document Per Minute (By Second)
{
  server: "server1",
  load: { 0: 15, 1: 20, …, 58: 45, 59: 40 },
  ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Store per-second data at the minute level
• Update-driven workload
• Pre-allocate structure to avoid document moves
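A minimal sketch of the pre-allocation idea, assuming a placeholder value of 0 for seconds that have not been recorded yet (the slide does not say which placeholder to use). The helper names are hypothetical.

```javascript
// Create a per-minute document with all 60 second slots already present,
// so later updates overwrite values instead of growing the document.
function preallocateMinute(server, minuteStart) {
  const load = {};
  for (let second = 0; second < 60; second++) {
    load[second] = 0; // placeholder value (an assumption)
  }
  return { server: server, load: load, ts: minuteStart };
}

// Build the $set for one sample, addressed by its second within the minute.
function buildSecondUpdate(when, value) {
  return { $set: { ["load." + when.getUTCSeconds()]: value } };
}

const doc = preallocateMinute("server1", new Date("2013-10-16T22:07:00.000-05:00"));
const update = buildSecondUpdate(new Date("2013-10-16T22:07:38.000-05:00"), 92);
```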
Document Per Hour (By Second)
{
  server: "server1",
  load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 3599 steps
Document Per Hour (By Second)
{
  server: "server1",
  load: {
    0: {0: 15, …, 59: 45},
    …
    59: {0: 25, …, 59: 75}
  },
  ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level with nesting
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 59+59 steps
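The step counts on the two per-hour slides can be reproduced with a little arithmetic. The sketch below assumes a "step" means one field skipped before the target is reached, which is how the 3599 and 59+59 figures appear to be derived; the function names are hypothetical.

```javascript
// Dotted field path for a second of the hour in the nested layout.
function nestedPath(secondOfHour) {
  const minute = Math.floor(secondOfHour / 60);
  const second = secondOfHour % 60;
  return "load." + minute + "." + second;
}

// Fields skipped before reaching the target in the flat layout.
function flatSteps(secondOfHour) {
  return secondOfHour;
}

// Fields skipped in the nested layout: minutes first, then seconds.
function nestedSteps(secondOfHour) {
  return Math.floor(secondOfHour / 60) + (secondOfHour % 60);
}

const lastSecond = 3599;
const path = nestedPath(lastSecond);   // "load.59.59"
const flat = flatSteps(lastSecond);    // 3599
const nested = nestedSteps(lastSecond); // 118, i.e. 59 + 59
```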
Characterizing Write Differences
• Example: data generated every second
• Capturing data per minute requires:
– Document per event: 60 writes
– Document per minute: 1 write, 59 updates
• Transition from insert driven to update driven
– Individual writes are smaller
– Performance and concurrency benefits
Characterizing Read Differences
• Example: data generated every second
• Reading data for a single hour requires:
– Document per event: 3600 reads
– Document per minute: 60 reads
• Read performance is greatly improved
– Optimal with tuned block sizes and read ahead
– Fewer disk seeks
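The read counts above follow from dividing the time range by the span each document covers. A small sketch of that arithmetic, assuming one sample per second and a range aligned to bucket boundaries (the function name is hypothetical):

```javascript
// Number of documents needed to cover a time range for a given bucket size.
function documentsForRange(rangeSeconds, secondsPerDocument) {
  return Math.ceil(rangeSeconds / secondsPerDocument);
}

const perEvent = documentsForRange(3600, 1);   // 3600 documents for one hour
const perMinute = documentsForRange(3600, 60); // 60 documents for one hour
const perHour = documentsForRange(3600, 3600); // 1 document for one hour
```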
MMS Monitoring Schema Design
MMS Monitoring
• MongoDB Management Service (MMS) Monitoring
• Available in two flavors
– Free cloud-hosted monitoring
– On-premise with MongoDB Enterprise
• Monitor single node, replica set, or sharded cluster deployments
• Metric dashboards and custom alert triggers
MMS Application Requirements
• Resolution defines granularity of stored data
• Range controls the retention policy, e.g. after 24 hours only 5-minute resolution
• Display dictates the stored pre-aggregations, e.g. total and count
Monitoring Schema Design
{
  timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
  num_samples: 58,
  total_samples: 108000000,
  type: "memory_used",
  values: {
    0: 999999,
    …
    59: 1800000
  }
}
• Per-minute document model
• Documents store individual metrics and counts
• Supports “total” and “avg/sec” display
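How the stored counters might feed the "total" and "avg/sec" displays is sketched below. It assumes roughly one sample per second, so the average is taken as `total_samples / num_samples`; the slide does not spell out the exact formula, and `displayValues` is a hypothetical helper.

```javascript
// Derive display values from the pre-aggregated counters in a per-minute document.
function displayValues(doc) {
  return {
    total: doc.total_samples,
    // Average per recorded sample; equals avg/sec only if one sample is taken per second.
    average: doc.num_samples > 0 ? doc.total_samples / doc.num_samples : null,
  };
}

// Counters taken from the slide's example document (values omitted).
const shown = displayValues({
  timestamp_minute: new Date("2013-10-10T23:06:00.000Z"),
  num_samples: 58,
  total_samples: 108000000,
  type: "memory_used",
});
```

Neither value requires scanning the `values` sub-document, which is the point of storing the counters alongside it.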
Monitoring Data Updates
db.metrics.update(
  {
    timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
    type: "memory_used"
  },
  {
    $set: { "values.59": 2000000 },
    $inc: { num_samples: 1, total_samples: 2000000 }
  }
)

• Single update required to add new data and increment associated counts
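The single update can be generated from a raw sample rather than written by hand. The sketch below is one way to do that; `buildMetricUpdate` is a hypothetical helper, and the field names follow the monitoring document on the previous slide.

```javascript
// Turn one sample into the filter and update documents for its per-minute bucket.
function buildMetricUpdate(type, when, value) {
  // Copy the timestamp and zero out seconds and milliseconds.
  const minute = new Date(when.getTime());
  minute.setUTCSeconds(0, 0);
  return {
    filter: { timestamp_minute: minute, type: type },
    update: {
      // Store the raw value in its per-second slot.
      $set: { ["values." + when.getUTCSeconds()]: value },
      // Keep the pre-aggregated counters in step with the stored values.
      $inc: { num_samples: 1, total_samples: value },
    },
  };
}

const sample = buildMetricUpdate(
  "memory_used",
  new Date("2013-10-10T23:06:59.000Z"),
  2000000
);
// sample.filter and sample.update correspond to the two arguments of the
// update call shown on the slide.
```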
Monitoring Data Management
• Data stored at different granularity levels for read performance
• Collections are organized into specific intervals
• Retention is managed by simply dropping collections as they age out
• Document structure is pre-created to maximize write performance
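A sketch of interval-based collections and drop-based retention follows. The naming scheme (`<prefix>_YYYYMMDD`, one collection per UTC day) is purely an assumption for illustration; the slides do not show how MMS actually names or partitions its collections.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical naming scheme: one collection per UTC day, e.g. "metrics_minute_20131010".
function collectionName(prefix, day) {
  return prefix + "_" + day.toISOString().slice(0, 10).replace(/-/g, "");
}

// Names that are older than the retention window and could be dropped whole.
function expiredCollections(names, prefix, now, retentionDays) {
  const cutoff = collectionName(prefix, new Date(now.getTime() - retentionDays * DAY_MS));
  // Fixed-width dates sort correctly as strings, so a plain comparison is enough.
  return names.filter((name) => name.startsWith(prefix + "_") && name < cutoff);
}

const now = new Date("2013-10-10T23:06:00.000Z");
const existing = [
  "metrics_minute_20131008",
  "metrics_minute_20131009",
  "metrics_minute_20131010",
];
const toDrop = expiredCollections(existing, "metrics_minute", now, 1);
```

Dropping a whole collection removes aged-out data in one operation instead of deleting documents one by one.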
Use Case: Operational Intelligence
What is Operational Intelligence
• Storing log data
– Capturing application and/or server generated events
• Hierarchical aggregation
– Rolling approach to generate rollups
– e.g. hourly > daily > weekly > monthly
• Pre-aggregated reports
– Processing data to generate reporting from raw events
Storing Log Data
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

{

_id: ObjectId('4f442120eb03305789000000'),
host: "127.0.0.1",
user: 'frank',
time: ISODate("2000-10-10T20:55:36Z"),
path: "/apache_pb.gif",
request: "GET /apache_pb.gif HTTP/1.0",
status: 200,
response_size: 2326,
referrer: "http://www.example.com/start.html",
user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}
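The mapping from the log line to the document can be sketched with a regular expression. This is a minimal parser for well-formed lines in the combined log format only; the helper names are hypothetical, and treating a `-` response size as 0 is an assumption.

```javascript
// host, identity (ignored), user, [time], "method path protocol", status, size, "referrer", "user agent"
const COMBINED_LOG =
  /^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+) ([^"]+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// Convert "10/Oct/2000:13:55:36 -0700" into a Date.
function parseLogTime(text) {
  const m = /^(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(text);
  if (!m || !(m[2] in MONTHS)) return null;
  const local = Date.UTC(+m[3], MONTHS[m[2]], +m[1], +m[4], +m[5], +m[6]);
  const offsetMinutes = (m[7] === "-" ? -1 : 1) * (+m[8] * 60 + +m[9]);
  // Subtract the zone offset to get from local wall-clock time to UTC.
  return new Date(local - offsetMinutes * 60000);
}

// Build a document shaped like the slide's example (without _id).
function parseLogLine(line) {
  const m = COMBINED_LOG.exec(line);
  if (!m) return null;
  return {
    host: m[1],
    user: m[2],
    time: parseLogTime(m[3]),
    path: m[5],
    request: m[4] + " " + m[5] + " " + m[6],
    status: Number(m[7]),
    response_size: m[8] === "-" ? 0 : Number(m[8]),
    referrer: m[9],
    user_agent: m[10],
  };
}

const logDoc = parseLogLine(
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 ' +
  '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
);
```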
Pre-Aggregation
• Analytics across raw events can involve many reads
• Alternative schemas can improve read and write performance
• Data can be organized into more coarse buckets
• Transition from insert-driven to update-driven workloads
Pre-Aggregated Log Data
{
timestamp_minute: ISODate("2000-10-10T20:55:00Z"),
resource: "/index.html",
page_views: {
0: 50,
…
59: 250
}

}
• Leverage time-series style bucketing
• Track individual metrics (e.g. page views)
• Improve performance for reads/writes

• Minimal processing overhead
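A sketch of how a single request could be folded into the per-minute page view document, and how the minute's total can be read back from it. The helper names are hypothetical; the field names follow the example document above.

```javascript
// Build the filter and $inc for one page view at a given time.
function buildPageViewIncrement(resource, when) {
  const minute = new Date(Math.floor(when.getTime() / 60000) * 60000);
  return {
    filter: { timestamp_minute: minute, resource: resource },
    update: { $inc: { ["page_views." + when.getUTCSeconds()]: 1 } },
  };
}

// Sum the per-second counters of one bucket document.
function totalPageViews(doc) {
  return Object.values(doc.page_views).reduce((sum, n) => sum + n, 0);
}

const hit = buildPageViewIncrement("/index.html", new Date("2000-10-10T20:55:36Z"));
const total = totalPageViews({ page_views: { 0: 50, 59: 250 } });
```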
Hierarchical Aggregation
• Analytical approach as opposed to schema approach
– Leverage built-in Aggregation Framework or MapReduce
• Execute multiple tasks sequentially to aggregate at varying levels
• Raw events > Hourly > Weekly > Monthly
• Rolling approach distributes the aggregation workload
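The rolling idea can be illustrated in plain JavaScript: each level is computed from the level below it rather than from the raw events. This sketch uses hourly and daily buckets because they have fixed widths; it stands in for the Aggregation Framework or MapReduce jobs the slide mentions, and the data and the `rollUp` helper are made up for the example.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Sum counts into fixed-width buckets; the output has the same shape as the
// input, so one level's result can feed the next level.
function rollUp(rows, bucketMs) {
  const totals = new Map();
  for (const row of rows) {
    const start = Math.floor(row.ts.getTime() / bucketMs) * bucketMs;
    totals.set(start, (totals.get(start) || 0) + row.count);
  }
  return [...totals]
    .sort((a, b) => a[0] - b[0])
    .map(([start, count]) => ({ ts: new Date(start), count: count }));
}

// Hypothetical raw events, one page view each.
const raw = [
  { ts: new Date("2000-10-10T20:55:36Z"), count: 1 },
  { ts: new Date("2000-10-10T20:58:02Z"), count: 1 },
  { ts: new Date("2000-10-10T21:03:10Z"), count: 1 },
  { ts: new Date("2000-10-11T01:00:00Z"), count: 1 },
];

const hourly = rollUp(raw, HOUR_MS);  // raw events > hourly
const daily = rollUp(hourly, DAY_MS); // hourly > daily, without touching raw events
```

Because each level reads only the previous level's output, the expensive pass over raw events happens once.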
Thinking Ahead
Before You Start
• What are the application requirements?
• Is pre-aggregation useful for your application?
• What are your retention and age-out policies?
• What are the gotchas?
– Pre-create document structure to avoid fragmentation and performance problems
– Organize your data for growth – time series data grows fast!
Down The Road
• Scale-out considerations
– Vertical vs. horizontal (with sharding)
• Understanding the data
– Aggregation
– Analytics
– Reporting
• Deeper data analysis
– Patterns
– Predictions
Scaling Time Series Data in MongoDB
• Vertical growth
– Larger instances with more CPU and memory
– Increased storage capacity
• Horizontal growth
– Partitioning data across many machines
– Dividing and distributing the workload
Time Series Sharding Considerations
• What are the application requirements?
– Primarily collecting data
– Primarily reporting data
– Both
• Map those back to
– Write performance needs
– Read/write query distribution
– Collection organization (see MMS Monitoring)
• Example: {metric name, coarse timestamp}
Aggregates, Analytics, Reporting
• Aggregation Framework can be used for analysis
– Does it work with the chosen schema design?
– What sorts of aggregations are needed?
• Reporting can be done on predictable, rolling basis
– See “Hierarchical Aggregation”
• Consider secondary reads for analytical operations
– Minimize load on production primaries
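Whether the Aggregation Framework fits depends on the schema. For the document-per-event schema, a pipeline along the following lines would compute the average load per server over a time range. The pipeline is built as plain data here so it can be inspected without a database; the function name and the output field names `avg_load` and `samples` are assumptions.

```javascript
// Build an aggregation pipeline for per-event documents ({ server, load, ts }).
function buildAverageLoadPipeline(start, end) {
  return [
    // Restrict to the requested time window first so later stages see less data.
    { $match: { ts: { $gte: start, $lt: end } } },
    // One output document per server with its average load and sample count.
    { $group: { _id: "$server", avg_load: { $avg: "$load" }, samples: { $sum: 1 } } },
    { $sort: { _id: 1 } },
  ];
}

const pipeline = buildAverageLoadPipeline(
  new Date("2013-10-16T22:00:00.000-05:00"),
  new Date("2013-10-16T23:00:00.000-05:00")
);
// With a driver, this array could be passed to an aggregate call, optionally
// with a secondary read preference to keep analytical load off the primary.
```

The bucketed schemas store values under numbered keys, so a pipeline like this would not apply to them unchanged; that is the kind of mismatch the first question above is asking about.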
Deeper Data Analysis
• Leverage MongoDB-Hadoop connector
– Bi-directional support for reading/writing
– Works with online and offline data (e.g. backup files)
• Compute using MapReduce
– Patterns
– Recommendations
– Etc.
• Explore data
– Pig
– Hive
Questions?
Resources
• Schema Design for Time Series Data in MongoDB
http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
• Operational Intelligence Use Case
http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence
• Data Modeling in MongoDB
http://docs.mongodb.org/manual/data-modeling/
• Schema Design (webinar)
http://www.mongodb.com/events/webinar/schema-design-oct2013
