Time series data can be found everywhere around you, from financial markets to social networks to sensors. It comes from a multitude of sources, but those sources share some common attributes: the data is large in volume, ordered by time, and primarily aggregated for access. Time series data is a great fit for MongoDB, and in this webinar we take a closer look at how to model it by exploring the schema of a tool that has become very popular in the community: MongoDB Management Service (MMS). We'll walk through different schema design considerations, examine how they impact the features and functionality of MMS, and review the workload differences across designs.
4. Time Series Data is Everywhere
• Financial markets pricing (stock ticks)
• Sensors (temperature, pressure, proximity)
• Industrial fleets (location, velocity, operational)
• Social networks (status updates)
• Mobile devices (calls, texts)
• Systems (server logs, application logs)
5. Time Series Data at a Higher Level
• Widely applicable data model
• Applies to several different “data use cases”
• Various schema and modeling options
• Application requirements drive schema design
6. Time Series Data Considerations
• Resolution of raw events
• Resolution needed to support
– Applications
– Analysis
– Reporting
• Data retention policies
– Data ages out over time
– Retention windows may differ by resolution
10. Document Per Minute (Average)
{
server: "server1",
load_num: 92,
load_sum: 4500,
ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Pre-aggregate so the per-minute average is computed easily as load_sum / load_num
• Update-driven workload
• Resolution at the minute-level
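As a sketch of the update-driven workload this model implies (the collection name metrics and the sample value 45 are assumptions, not from the webinar):
db.metrics.update(
{ server: "server1", ts: ISODate("2013-10-16T22:07:00.000-0500") },
{ $inc: { load_num: 1, load_sum: 45 } },
{ upsert: true }
)
Each incoming sample is a single in-place update; the minute's average is load_sum / load_num at read time.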
11. Document Per Minute (By Second)
{
server: "server1",
load: { 0: 15, 1: 20, …, 58: 45, 59: 40 },
ts: ISODate("2013-10-16T22:07:00.000-0500")
}
• Store per-second data at the minute level
• Update-driven workload
• Pre-allocate structure to avoid document moves
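A minimal sketch of the pre-allocation step (the zero placeholder values and the metrics collection name are assumptions):
// At the start of each minute, insert a document with all 60
// second-slots zeroed, so later $set updates never grow the
// document and force a move on disk.
var zeros = {};
for (var i = 0; i < 60; i++) zeros[i] = 0;
db.metrics.insert({
server: "server1",
load: zeros,
ts: ISODate("2013-10-16T22:07:00.000-0500")
})
Each sample then becomes an in-place update, e.g. db.metrics.update({ server: "server1", ts: … }, { $set: { "load.17": 23 } }).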
12. Document Per Hour (By Second)
{
server: "server1",
load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 },
ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating the last second requires 3,599 steps, since BSON fields are scanned sequentially
13. Document Per Hour (By Second)
{
server: "server1",
load: {
0: { 0: 15, …, 59: 45 },
…,
59: { 0: 25, …, 59: 75 }
},
ts: ISODate("2013-10-16T22:00:00.000-0500")
}
• Store per-second data at the hourly level with nesting
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating the last second requires only 59 + 59 steps (59 outer keys, then 59 inner keys)
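A sketch of the nested update this layout enables (the metrics collection name is an assumption):
db.metrics.update(
{ server: "server1", ts: ISODate("2013-10-16T22:00:00.000-0500") },
{ $set: { "load.59.59": 75 } }
)
To reach the field, the scan walks at most 59 outer keys and then 59 inner keys, instead of 3,599 keys in the flat hourly layout.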
14. Characterizing Write Differences
• Example: data generated every second
• Capturing data per minute requires:
– Document per event: 60 writes
– Document per minute: 1 write, 59 updates
• Transition from insert driven to update driven
– Individual writes are smaller
– Performance and concurrency benefits
15. Characterizing Read Differences
• Example: data generated every second
• Reading data for a single hour requires:
– Document per event: 3600 reads
– Document per minute: 60 reads
• Read performance is greatly improved
– Optimal with tuned block sizes and read ahead
– Fewer disk seeks
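As a sketch, reading an hour of per-minute documents becomes a single range query (collection and field names follow the earlier examples):
db.metrics.find({
server: "server1",
ts: {
$gte: ISODate("2013-10-16T22:00:00.000-0500"),
$lt: ISODate("2013-10-16T23:00:00.000-0500")
}
})
With an index on { server: 1, ts: 1 }, the 60 documents are likely stored close together, so the query touches far fewer disk blocks than 3600 individual event reads.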
17. MMS Monitoring
• MongoDB Management Service (MMS) Monitoring
• Available in two flavors
– Free cloud-hosted monitoring
– On-premise with MongoDB Enterprise
• Monitor single node, replica set, or sharded cluster deployments
• Metric dashboards and custom alert triggers
20. MMS Application Requirements
• Resolution defines the granularity of stored data
• Range controls the retention policy, e.g. after 24 hours keep only 5-minute resolution
• Display dictates the stored pre-aggregations, e.g. total and count
21. Monitoring Schema Design
{
timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
num_samples: 58,
total_samples: 108000000,
type: "memory_used",
values: {
0: 999999,
…,
59: 1800000
}
}
• Per-minute document model
• Documents store individual metrics and counts
• Supports “total” and “avg/sec” display
22. Monitoring Data Updates
db.metrics.update(
{
timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
type: "memory_used"
},
{
$set: { "values.59": 2000000 },
$inc: { num_samples: 1, total_samples: 2000000 }
}
)
• A single update is required to add the new data point and increment the associated counts
23. Monitoring Data Management
• Data stored at different granularity levels for read performance
• Collections are organized into specific time intervals
• Retention is managed by simply dropping collections as they age out
• Document structure is pre-created to maximize write performance
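A minimal sketch of interval-organized collections (the per-day naming scheme is an assumption; MMS's actual layout may differ):
// doc is the pre-allocated per-minute document from earlier slides.
// Writes for October 10 go to that day's collection:
db.getCollection("metrics_20131010").insert(doc);
// Aging out October 7 is a single, cheap operation:
db.getCollection("metrics_20131007").drop();
Dropping a whole collection is far cheaper than remove()-ing millions of expired documents and avoids the resulting fragmentation.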
25. What is Operational Intelligence?
• Storing log data
– Capturing application and/or server generated events
• Hierarchical aggregation
– Rolling approach to generate rollups
– e.g. hourly > daily > weekly > monthly
• Pre-aggregated reports
– Processing data to generate reporting from raw events
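A sketch of one hourly-to-daily rollup step, assuming a shell where aggregate() returns a cursor (the metrics_hourly/metrics_daily names and the total/count fields are illustrative, not the webinar's implementation):
var day = ISODate("2013-10-10T00:00:00Z");
var next = ISODate("2013-10-11T00:00:00Z");
db.metrics_hourly.aggregate([
{ $match: { type: "memory_used", ts: { $gte: day, $lt: next } } },
{ $group: { _id: "$type", total: { $sum: "$total" }, count: { $sum: "$count" } } }
]).forEach(function (r) {
// Write the rolled-up day back as a single daily document
db.metrics_daily.insert({ type: r._id, ts: day, total: r.total, count: r.count });
});
The same pattern repeats at each level: daily documents roll up into weekly, weekly into monthly.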
27. Pre-Aggregation
• Analytics across raw events can involve many reads
• Alternative schemas can improve read and write performance
• Data can be organized into coarser buckets
• Transition from insert-driven to update-driven workloads
31. Before You Start
• What are the application requirements?
• Is pre-aggregation useful for your application?
• What are your retention and age-out policies?
• What are the gotchas?
– Pre-create document structure to avoid fragmentation and performance problems
– Organize your data for growth – time series data grows fast!
32. Down The Road
• Scale-out considerations
– Vertical vs. horizontal (with sharding)
• Understanding the data
– Aggregation
– Analytics
– Reporting
• Deeper data analysis
– Patterns
– Predictions
33. Scaling Time Series Data in MongoDB
• Vertical growth
– Larger instances with more CPU and memory
– Increased storage capacity
• Horizontal growth
– Partitioning data across many machines
– Dividing and distributing the workload
34. Time Series Sharding Considerations
• What are the application requirements?
– Primarily collecting data
– Primarily reporting data
– Both
• Map those back to
– Write performance needs
– Read/write query distribution
– Collection organization (see MMS Monitoring)
• Example: {metric name, coarse timestamp}
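As a sketch of that example shard key (the namespace and field names are assumptions):
sh.shardCollection(
"monitoring.metrics",
{ type: 1, timestamp_minute: 1 }
)
Leading with the metric name spreads concurrent writes across chunks, while the coarse timestamp keeps each metric's range queries targeted; a timestamp-only key would funnel all current writes into one hot chunk.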
35. Aggregates, Analytics, Reporting
• Aggregation Framework can be used for analysis
– Does it work with the chosen schema design?
– What sorts of aggregations are needed?
• Reporting can be done on predictable, rolling basis
– See “Hierarchical Aggregation”
• Consider secondary reads for analytical operations
– Minimize load on production primaries
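As an illustration (not from the webinar) of the Aggregation Framework over the per-minute monitoring schema, computing an exact per-sample average for one hour:
db.metrics.aggregate([
{ $match: {
type: "memory_used",
timestamp_minute: { $gte: ISODate("2013-10-10T23:00:00Z"), $lt: ISODate("2013-10-11T00:00:00Z") }
} },
{ $group: { _id: "$type", total: { $sum: "$total_samples" }, samples: { $sum: "$num_samples" } } },
{ $project: { avg: { $divide: ["$total", "$samples"] } } }
])
Summing the totals and counts before dividing keeps the average exact, which simply averaging the per-minute averages would not.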
36. Deeper Data Analysis
• Leverage MongoDB-Hadoop connector
– Bi-directional support for reading/writing
– Works with online and offline data (e.g. backup files)
• Compute using MapReduce
– Patterns
– Recommendations
– Etc.
• Explore data
– Pig
– Hive
38. Resources
• Schema Design for Time Series Data in MongoDB
http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
• Operational Intelligence Use Case
http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence
• Data Modeling in MongoDB
http://docs.mongodb.org/manual/data-modeling/
• Schema Design (webinar)
http://www.mongodb.com/events/webinar/schema-design-oct2013