Real Time Analytics
Chad Tindel
chad.tindel@10gen.com
The goal
[Diagram: several data sources feeding a real-time analytics engine]
Solution goals
Simple log storage
Design Pattern
Aggregation - Pipelines
• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Conceptually, the members of a collection are passed through a pipeline to produce a result
– Similar to a Unix command-line pipe
Aggregation Pipeline
Aggregation - Pipelines
db.collection.aggregate( [
  { $match: … },
  { $group: … },
  { $limit: … }
  // etc.
] )
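The pipe analogy can be made concrete by treating each stage as a function from a stream of documents to a stream of documents. The sketch below is an in-memory illustration only, not how the server executes a pipeline; `match`, `limit` and `run_pipeline` are hypothetical helpers.

```python
from functools import reduce
from itertools import islice


# Hypothetical stages: each takes an iterable of documents and returns an iterable.
def match(predicate):
    """Keep only documents for which predicate(doc) is true (like $match)."""
    return lambda docs: (d for d in docs if predicate(d))


def limit(n):
    """Let at most n documents through (like $limit)."""
    return lambda docs: islice(docs, n)


def run_pipeline(docs, stages):
    """Feed each stage's output into the next one, like `cmd1 | cmd2 | cmd3`."""
    return list(reduce(lambda stream, stage: stage(stream), stages, iter(docs)))


docs = [
    {"author": "dave", "score": 70},
    {"author": "ann", "score": 80},
    {"author": "dave", "score": 95},
]
result = run_pipeline(docs, [match(lambda d: d["author"] == "dave"), limit(1)])
```

Because the stages are lazy generators, `limit(1)` stops pulling from `match` after the first hit, which mirrors the streaming behaviour the slide describes.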
Pipeline Operations
• $match
– Uses a query predicate (like .find({…})) as a filter
{ $match : { author : "dave" } }
{ $match : { score : { $gt : 50, $lte : 90 } } }
Pipeline Operations
• $project
– Uses a specification document to determine the shape of the result (similar to the optional second argument of .find())
• Include or exclude fields
• Compute new fields
– Arithmetic expressions, including built-in functions
– Pull fields from nested documents to the top
– Push fields from the top down into new virtual documents
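The three kinds of reshaping listed above (include, compute, pull up, push down) can be sketched as a tiny inclusion-only projector. This is a hypothetical model: `project` is not a MongoDB API, computed fields are written as Python callables rather than aggregation expressions, and exclusion (`field: 0`) is not handled.

```python
def project(doc, spec):
    """Inclusion-style projection: 1 keeps a field, a callable computes a new one."""
    out = {}
    if spec.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]  # _id is kept unless explicitly excluded
    for field, rule in spec.items():
        if field == "_id":
            continue
        if callable(rule):
            out[field] = rule(doc)  # computed field
        elif rule and field in doc:
            out[field] = doc[field]  # included field
    return out


doc = {
    "_id": 1,
    "path": "/index.html",
    "bytes": 2326,
    "client": {"host": "127.0.0.1", "agent": "Mozilla/4.08"},
}
shaped = project(doc, {
    "path": 1,
    "kb": lambda d: d["bytes"] / 1024,          # arithmetic on a field
    "host": lambda d: d["client"]["host"],      # pull a nested field to the top
    "req": lambda d: {"path": d["path"]},       # push a field down into a new sub-document
})
```

Fields not named in the spec (here `bytes` and `client`) simply do not appear in the output, which is what makes `$project` useful for trimming documents before `$group`.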
Pipeline Operations
• $unwind
– Hands out array elements one at a time
{ $unwind : "$myarray" }
• $unwind "streams" arrays
– Array values are doled out one at a time in the context of their surrounding document
– Makes it possible to filter out elements before returning
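The "one element at a time, in the context of the surrounding document" behaviour can be modeled as a generator that yields one copy of the parent document per array element. This is a hypothetical in-memory sketch; the handling of missing, empty and scalar values follows the default `$unwind` behaviour of current MongoDB versions, not necessarily the 2012 release these slides describe.

```python
def unwind(docs, field):
    """Yield one copy of each document per element of doc[field] (like $unwind)."""
    for doc in docs:
        values = doc.get(field)
        if values is None or values == []:
            continue  # by default, missing, null or empty arrays produce no output
        if not isinstance(values, list):
            values = [values]  # a non-array value is treated as a one-element array
        for value in values:
            out = dict(doc)  # shallow copy keeps the surrounding document's fields
            out[field] = value
            yield out


posts = [
    {"_id": 1, "tags": ["mongodb", "draft", "analytics"]},
    {"_id": 2, "tags": []},
    {"_id": 3},
]
# Unwind, then filter out individual elements, as the slide describes.
kept = [d for d in unwind(posts, "tags") if d["tags"] != "draft"]
```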
Pipeline Operations
• $group
– Aggregates items into buckets defined by a key
Grouping
• $group aggregation expressions
– Define a grouping key as the _id of the result
– Total grouped column values: $sum
– Average grouped column values: $avg
– Collect grouped column values in an array or set: $push, $addToSet
– Other functions
• $min, $max, $first, $last
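To see what each accumulator contributes, the sketch below buckets documents by a key and computes every accumulator from the list at once. It is a hypothetical in-memory model: `group`, `key_fn` and `value_fn` are illustrative names, and the result field names (`total`, `avg`, …) are arbitrary, as they would be in a real `$group` stage.

```python
def group(docs, key_fn, value_fn):
    """Bucket documents by key_fn(doc) and apply $group-style accumulators to value_fn(doc)."""
    buckets = {}  # dicts keep insertion order, so buckets appear in first-seen order
    for d in docs:
        buckets.setdefault(key_fn(d), []).append(value_fn(d))
    results = []
    for key, values in buckets.items():
        results.append({
            "_id": key,                               # the grouping key becomes _id
            "total": sum(values),                     # $sum
            "avg": sum(values) / len(values),         # $avg
            "all": list(values),                      # $push
            "distinct": sorted(set(values)),          # $addToSet (MongoDB gives no order)
            "min": min(values),                       # $min
            "max": max(values),                       # $max
            "first": values[0],                       # $first (input order)
            "last": values[-1],                       # $last (input order)
        })
    return results


scores = [
    {"author": "dave", "score": 70},
    {"author": "ann", "score": 80},
    {"author": "dave", "score": 90},
]
out = group(scores, lambda d: d["author"], lambda d: d["score"])
```

`$first` and `$last` depend on input order, so in a real pipeline they are normally preceded by a `$sort`.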
Pipeline Operations
• $sort
– Sort documents
– Sort specifications are the same as for .sort() on a query, e.g., $sort:{ key1: 1, key2: -1, …}
{ $sort : { "total" : -1 } }
Pipeline Operations
• $limit
– Only allow the specified number of documents to pass
{ $limit : 20 }
Pipeline Operations
• $skip
– Skip over the specified number of documents
{ $skip : 10 }
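`$sort`, `$skip` and `$limit` are typically combined to page through results; note that the stages apply in the order written, so paging means `$skip` before `$limit`. The hypothetical helper below models that sequence in memory; a multi-key sort spec is passed as a list of `(field, 1 | -1)` pairs because key order matters.

```python
def sort_skip_limit(docs, sort_spec, skip=0, limit=None):
    """Apply $sort, then $skip, then $limit. sort_spec is a list of (field, 1 | -1)."""
    ordered = list(docs)
    # Sort by the least significant key first; Python's sort is stable (also with
    # reverse=True), so earlier passes survive as tie-breakers for later keys.
    for field, direction in reversed(sort_spec):
        ordered.sort(key=lambda d, f=field: d[f], reverse=(direction == -1))
    end = None if limit is None else skip + limit
    return ordered[skip:end]


pages = [
    {"p": "/a", "total": 5},
    {"p": "/b", "total": 9},
    {"p": "/c", "total": 7},
    {"p": "/d", "total": 1},
]
# Equivalent in spirit to [{$sort: {total: -1}}, {$skip: 1}, {$limit: 2}]
second_page = sort_skip_limit(pages, [("total", -1)], skip=1, limit=2)
```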
Computed Expressions
• Available in $project operations
• Prefix expression language
– Add two fields: { $add: [ "$field1", "$field2" ] }
– Provide a value for a missing field: { $ifNull: [ "$field1", "$field2" ] }
– Nesting: { $add: [ "$field1", { $ifNull: [ "$field2", "$field3" ] } ] }
Computed Expressions (continued)
• String functions
– $toUpper, $toLower, $substr
• Date field extraction
– Get year, month, day, hour, etc., from an ISODate ($year, $month, $dayOfMonth, $hour, …)
• Date arithmetic
• Null value substitution (like MySQL IFNULL(), Oracle NVL())
• Ternary conditional ($cond)
– Return one of two values based on a predicate
• Other functions…
– And we can easily add more as required
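"Prefix expression language" means every operator is a one-key document whose value holds its arguments, so expressions nest naturally. A tiny recursive evaluator makes this visible. It is a hypothetical sketch covering only `$add`, `$ifNull`, `$cond` (array form), `$gt` and `$toUpper`, with top-level field paths only and numeric operands assumed for `$add`; it is not MongoDB's implementation.

```python
def evaluate(expr, doc):
    """Evaluate a small subset of aggregation expressions against one document."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])  # field path, e.g. "$field1" (top-level fields only)
    if isinstance(expr, dict) and len(expr) == 1:
        op, args = next(iter(expr.items()))
        if op == "$add":
            return sum(evaluate(a, doc) for a in args)  # assumes numeric operands
        if op == "$ifNull":
            value = evaluate(args[0], doc)
            return value if value is not None else evaluate(args[1], doc)
        if op == "$cond":
            test, then, otherwise = args  # [ if, then, else ]
            return evaluate(then, doc) if evaluate(test, doc) else evaluate(otherwise, doc)
        if op == "$gt":
            return evaluate(args[0], doc) > evaluate(args[1], doc)
        if op == "$toUpper":
            value = evaluate(args, doc)
            return "" if value is None else str(value).upper()
        raise ValueError("unsupported operator: " + op)
    return expr  # a literal value


doc = {"field1": 2, "field3": 5, "name": "dave"}
nested = {"$add": ["$field1", {"$ifNull": ["$field2", "$field3"]}]}
```

Here `field2` is missing, so `$ifNull` falls back to `field3` and the nested expression evaluates to 2 + 5.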
Sample data
Original Event Data
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
As JSON
doc = {
  _id: ObjectId('4f442120eb03305789000000'),
  host: "127.0.0.1",
  time: ISODate("2000-10-10T20:55:36Z"),
  path: "/apache_pb.gif",
  referer: "http://www.example.com/start.html",
  user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}
Insert to MongoDB
db.logs.insert( doc )
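Getting from the raw log line to the JSON document is a parsing step the slides leave implicit. A minimal sketch, assuming Apache "combined" log format and an English/C locale for the `%b` month name; `LOG_RE` and `parse_line` are hypothetical names. The timestamp is converted from its `-0700` offset to UTC, which is how `13:55:36 -0700` becomes `20:55:36Z` in the document above.

```python
import re
from datetime import datetime, timezone

# Apache combined format: host ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Turn one combined-format log line into the document shape shown above."""
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError("not a combined-format log line")
    # %z accepts offsets such as -0700; store the instant in UTC.
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z").astimezone(timezone.utc)
    return {
        "host": m["host"],
        "time": ts,
        "path": m["path"],
        "referer": m["referer"],
        "user_agent": m["user_agent"],
    }


SAMPLE = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
          '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
parsed = parse_line(SAMPLE)
```

With PyMongo, the result could then be stored with `db.logs.insert_one(parsed)`; MongoDB generates the `_id` if it is absent.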
Dynamic Queries
Find all logs for a URL
db.logs.find( { 'path' : '/index.html' } )
Find all logs for a time range
db.logs.find( { 'time' :
    { '$gte' : new Date(2012,0),
      '$lt' : new Date(2012,1) } } );
Find all logs for a host over a range of dates
db.logs.find( {
    'host' : '127.0.0.1',
    'time' : { '$gte' : new Date(2012,0),
               '$lt' : new Date(2012,1) } } );
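The same host-and-range filter can be built from application code. Two details are easy to miss when porting the shell examples: JavaScript `Date` months are 0-based (`new Date(2012, 0)` is January), and the shell constructor uses the local time zone, whereas the sketch below uses explicit UTC datetimes. `host_time_filter` is a hypothetical helper; the `$gte`/`$lt` pair gives a half-open interval, so consecutive ranges never double-count a boundary record.

```python
from datetime import datetime, timezone


def host_time_filter(host, start, end):
    """Query document for one host over the half-open interval [start, end)."""
    return {"host": host, "time": {"$gte": start, "$lt": end}}


# Python months are 1-based: this covers January 2012, like new Date(2012,0) .. new Date(2012,1)
january = host_time_filter(
    "127.0.0.1",
    datetime(2012, 1, 1, tzinfo=timezone.utc),
    datetime(2012, 2, 1, tzinfo=timezone.utc),
)
```

With PyMongo the dictionary is passed unchanged, e.g. `db.logs.find(january)`.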
Aggregation Framework
Requests per day by URL
db.logs.aggregate( [
    { '$match': {
        'time': {
            '$gte': new Date(2012,0),
            '$lt': new Date(2012,1) } } },
    { '$project': {
        'path': 1,
        'date': {
            'y': { '$year': '$time' },
            'm': { '$month': '$time' },
            'd': { '$dayOfMonth': '$time' } } } },
    { '$group': {
        '_id': {
            'p': '$path',
            'y': '$date.y',
            'm': '$date.m',
            'd': '$date.d' },
        'hits': { '$sum': 1 } } }
] )
Aggregation Framework
{
  'ok': 1,
  'result': [
    { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 1 }, 'hits': 124 },
    { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 2 }, 'hits': 245 },
    { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 3 }, 'hits': 322 },
    { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 4 }, 'hits': 175 },
    { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 5 }, 'hits': 94 }
  ]
}
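As a cross-check of what the three stages compute, here is an in-memory equivalent: filter by time ($match), derive path/year/month/day ($project), then count per key ($group with $sum: 1). `hits_per_day` and the sample `logs` are hypothetical. MongoDB does not guarantee the order of `$group` output, so the sketch sorts its rows for readability.

```python
from collections import Counter
from datetime import datetime


def hits_per_day(logs, start, end):
    """Count requests per (path, year, month, day) for logs with start <= time < end."""
    counts = Counter()
    for doc in logs:
        t = doc["time"]
        if start <= t < end:                                     # $match
            counts[(doc["path"], t.year, t.month, t.day)] += 1   # $project + $group/$sum
    return [
        {"_id": {"p": p, "y": y, "m": m, "d": d}, "hits": n}
        for (p, y, m, d), n in sorted(counts.items())
    ]


logs = [
    {"path": "/index.html", "time": datetime(2012, 1, 1, 9, 0)},
    {"path": "/index.html", "time": datetime(2012, 1, 1, 17, 30)},
    {"path": "/index.html", "time": datetime(2012, 1, 2, 8, 15)},
    {"path": "/about.html", "time": datetime(2011, 12, 31, 23, 59)},  # outside the range
]
rows = hits_per_day(logs, datetime(2012, 1, 1), datetime(2012, 2, 1))
```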
Roll-ups with map-reduce
Design Pattern
Map Reduce – Map Phase
Generate hourly rollups from log data
var map = function() {
  // Key: the page path plus the log time truncated to the start of its hour
  var key = {
    p: this.path,
    d: new Date(
      this.time.getFullYear(),
      this.time.getMonth(),
      this.time.getDate(),
      this.time.getHours(),
      0, 0, 0) };
  emit( key, { hits: 1 } );
}
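Map Reduce – Reduce Phase
Generate hourly rollups from log data
var reduce = function(key, values) {
  // Return the same shape as the emitted values: MongoDB may feed
  // earlier reduce results back into reduce
  var r = { hits: 0 };
  values.forEach(function(v) {
    r.hits += v.hits;
  });
  return r;
}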
Map Reduce
Generate hourly rollups from log data
// last_run holds the cutoff saved by the previous run
cutoff = new Date(2012,0,1)
query = { 'time': { '$gte': last_run, '$lt': cutoff } }
db.logs.mapReduce( map, reduce, {
    'query': query,
    'out': { 'reduce' : 'stats.hourly' } } )
last_run = cutoff
Map Reduce Output
> db.stats.hourly.find()
{ '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T00:00:00Z") },
  'value': { 'hits': 124 } },
{ '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T01:00:00Z") },
  'value': { 'hits': 245 } },
{ '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T02:00:00Z") },
  'value': { 'hits': 322 } },
{ '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T03:00:00Z") },
  'value': { 'hits': 175 } },
... More ...
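The incremental job works because `out: { reduce: ... }` merges each new result with any existing document that has the same key by calling reduce again on both. The in-memory sketch below models that: each run processes only documents in `[last_run, cutoff)`, and an hour that straddles two runs ends up with the combined count. `map_hit`, `reduce_hits` and `map_reduce_into` are hypothetical names, and a plain dict stands in for the `stats.hourly` collection.

```python
from datetime import datetime


def map_hit(doc):
    """Emit ((path, hour_start), {"hits": 1}) for one log document."""
    hour_start = doc["time"].replace(minute=0, second=0, microsecond=0)
    return (doc["path"], hour_start), {"hits": 1}


def reduce_hits(key, values):
    """Sum partial counts; safe to re-apply to its own output."""
    return {"hits": sum(v["hits"] for v in values)}


def map_reduce_into(out, docs, last_run, cutoff):
    """Process docs in [last_run, cutoff) and merge into `out` like out: {reduce: ...}."""
    emitted = {}
    for doc in docs:
        if last_run <= doc["time"] < cutoff:
            key, value = map_hit(doc)
            emitted.setdefault(key, []).append(value)
    for key, values in emitted.items():
        new = reduce_hits(key, values)
        # Existing output for the same key is re-reduced together with the new result
        out[key] = reduce_hits(key, [out[key], new]) if key in out else new
    return cutoff  # becomes last_run for the next invocation


hourly = {}
docs = [
    {"path": "/index.html", "time": datetime(2012, 1, 1, 0, 5)},
    {"path": "/index.html", "time": datetime(2012, 1, 1, 0, 40)},
    {"path": "/index.html", "time": datetime(2012, 1, 1, 1, 10)},
]
last_run = map_reduce_into(hourly, docs, datetime(2012, 1, 1, 0, 0), datetime(2012, 1, 1, 0, 30))
last_run = map_reduce_into(hourly, docs, last_run, datetime(2012, 1, 1, 2, 0))
```

This is also why the reduce function must return the same shape it receives: its output can become one of its own inputs.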
Chained Map Reduce
[Diagram: Collection 1: Raw Logs → Map/Reduce (runs every hour) → Collection 2: Hourly Stats → Map/Reduce (runs every day) → Collection 3: Daily Stats]
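The second link in the chain reads the hourly collection instead of the raw logs: it re-keys each hourly total by its calendar day and sums with the same logic as the hourly reduce. A hypothetical in-memory sketch, using the hourly values from the output slide as input and a dict in place of each collection:

```python
from collections import defaultdict
from datetime import datetime


def roll_up_daily(hourly):
    """Second map-reduce stage: re-key (path, hour_start) totals by (path, day_start)."""
    daily = defaultdict(int)
    for (path, hour_start), value in hourly.items():
        day_start = hour_start.replace(hour=0, minute=0, second=0, microsecond=0)
        daily[(path, day_start)] += value["hits"]  # same summing logic as the hourly reduce
    return {key: {"hits": hits} for key, hits in daily.items()}


# Hourly documents shaped like the stats.hourly output shown earlier.
hourly = {
    ("/index.html", datetime(2012, 1, 1, 0)): {"hits": 124},
    ("/index.html", datetime(2012, 1, 1, 1)): {"hits": 245},
    ("/index.html", datetime(2012, 1, 1, 2)): {"hits": 322},
    ("/index.html", datetime(2012, 1, 1, 3)): {"hits": 175},
}
daily = roll_up_daily(hourly)
```

Because the daily job only touches pre-summed hourly rows (at most 24 per page per day), it stays cheap no matter how many raw log entries arrived.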
Pre-aggregated documents
Design Pattern
Pre-Aggregation
Data for URL / Date
{
  _id: "20001010/site-1/apache_pb.gif",
  metadata: {
    date: ISODate("2000-10-10T00:00:00Z"),
    site: "site-1",
    page: "/apache_pb.gif" },
  daily: 5468426,
  hourly: {
    "0": 227850,
    "1": 210231,
    ...
    "23": 20457 },
  minute: {
    "0": 3612,
    "1": 3241,
    ...
    "1439": 2819 }
}
Pre-Aggregation
Data for URL / Date
from datetime import datetime, time

id_daily = dt_utc.strftime('%Y%m%d/') + site + page
hour = dt_utc.hour
minute = dt_utc.minute
# Get a datetime that only includes date info
d = datetime.combine(dt_utc.date(), time.min)
query = {
    '_id': id_daily,
    'metadata': { 'date': d, 'site': site, 'page': page } }
# Minute counters are keyed by minute of the day (0-1439), matching the document above
update = { '$inc': {
    'daily': 1,
    'hourly.%d' % (hour,): 1,
    'minute.%d' % (hour * 60 + minute,): 1 } }
# PyMongo 3+ API; older drivers used update(query, update, upsert=True)
db.stats.daily.update_one(query, update, upsert=True)
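What the upsert does to the document can be modeled in memory: the first hit creates the document from the query fields, and `$inc` on dotted paths creates any missing sub-document or counter at zero before adding. The sketch below is hypothetical (`inc_upsert` and `log_hit` are illustrative names, and a dict keyed by `_id` stands in for `stats.daily`); it does not model MongoDB's storage or concurrency behaviour.

```python
from datetime import datetime, time


def inc_upsert(collection, _id, metadata, increments):
    """Mimic update_one({'_id': ..., 'metadata': ...}, {'$inc': ...}, upsert=True) in memory."""
    doc = collection.setdefault(_id, {"_id": _id, "metadata": metadata})
    for dotted, amount in increments.items():
        *parents, leaf = dotted.split(".")
        target = doc
        for part in parents:
            target = target.setdefault(part, {})  # $inc creates missing sub-documents
        target[leaf] = target.get(leaf, 0) + amount  # missing counters start at 0
    return doc


def log_hit(collection, dt_utc, site, page):
    """Record one hit in the per-day document for (site, page)."""
    id_daily = dt_utc.strftime("%Y%m%d/") + site + page
    minute_of_day = dt_utc.hour * 60 + dt_utc.minute  # 0..1439
    meta = {"date": datetime.combine(dt_utc.date(), time.min), "site": site, "page": page}
    return inc_upsert(collection, id_daily, meta, {
        "daily": 1,
        "hourly.%d" % dt_utc.hour: 1,
        "minute.%d" % minute_of_day: 1,
    })


stats = {}
log_hit(stats, datetime(2000, 10, 10, 20, 55, 36), "site-1", "/apache_pb.gif")
doc = log_hit(stats, datetime(2000, 10, 10, 20, 56, 5), "site-1", "/apache_pb.gif")
```

Every hit is a single in-place increment on one document, which is what lets the dashboard read a whole day of per-minute counts with one lookup.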
Pre-Aggregation
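Data for URL / Date
// Matching a whole embedded document is order-sensitive:
// the keys must appear in the same order as in the stored document
db.stats.daily.findOne(
    { 'metadata': { 'date': dt,
                    'site': 'site-1',
                    'page': '/index.html' } },
    { 'minute': 1 }
);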
Solution Architect, 10gen

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Schema Design by Chad Tindel, Solution Architect, 10gen

  • 1. Real Time Analytics Chad Tindel chad.tindel@10gen.com
  • 2. The goal [diagram]: three Data Sources feeding a Real Time Analytics Engine
  • 5. Aggregation - Pipelines • Aggregation requests specify a pipeline • A pipeline is a series of operations • Conceptually, the members of a collection are passed through a pipeline to produce a result – Similar to a Unix command-line pipe
  • 7. Aggregation - Pipelines db.collection.aggregate( [ {$match: … }, {$group: … }, {$limit: …}, etc ] )
  • 8. Pipeline Operations • $match – Uses a query predicate (like .find({…})) as a filter { $match : { author : "dave" } } { $match : { score : { $gt : 50, $lte : 90 } } }
  • 9. Pipeline Operations • $project – Uses a specification document to determine the shape of the result (similar to the optional 2nd argument of .find()) • Include or exclude fields • Compute new fields – Arithmetic expressions, including built-in functions – Pull fields from nested documents to the top – Push fields from the top down into new virtual documents
  • 10. Pipeline Operations • $unwind – Hands out array elements one at a time { $unwind : "$myarray" } • $unwind "streams" arrays – Array values are doled out one at a time in the context of their surrounding document – Makes it possible to filter out elements before returning
  • 11. Pipeline Operations • $group – Aggregates items into buckets defined by a key
  • 12. Grouping • $group aggregation expressions – Define a grouping key as the _id of the result – Total grouped column values: $sum – Average grouped column values: $avg – Collect grouped column values in an array or set: $push, $addToSet – Other functions • $min, $max, $first, $last
  • 13. Pipeline Operations • $sort – Sort documents – Sort specifications are the same as for a cursor's .sort(), e.g., $sort: { key1: 1, key2: -1, … } { $sort : { "total": -1 } }
  • 14. Pipeline Operations • $limit – Only allow the specified number of documents to pass { $limit : 20 }
  • 15. Pipeline Operations • $skip – Skip over the specified number of documents { $skip : 10 }
  • 16. Computed Expressions • Available in $project operations • Prefix expression language – Add two fields: { $add: ["$field1", "$field2"] } – Provide a value for a missing field: { $ifNull: ["$field1", "$field2"] } – Nesting: { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] } (continued)
  • 17. Computed Expressions (continued) • String functions – $toUpper, $toLower, $substr • Date field extraction – Get year, month, day, hour, etc., from an ISODate • Date arithmetic • Null value substitution (like MySQL ifnull(), Oracle nvl()) • Ternary conditional – Return one of two values based on a predicate • Other functions… – And we can easily add more as required
  • 18. Sample data – Original event data: 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)" – As JSON: doc = { _id: ObjectId('4f442120eb03305789000000'), host: "127.0.0.1", time: ISODate("2000-10-10T20:55:36Z"), path: "/apache_pb.gif", referer: "http://www.example.com/start.html", user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)" } – Insert into MongoDB: db.logs.insert( doc )
  • 19. Dynamic Queries – Find all logs for a URL: db.logs.find( { 'path': '/index.html' } ) – Find all logs for a time range: db.logs.find( { 'time': { '$gte': new Date(2012,0), '$lt': new Date(2012,1) } } ); – Find all logs for a host over a range of dates: db.logs.find( { 'host': '127.0.0.1', 'time': { '$gte': new Date(2012,0), '$lt': new Date(2012,1) } } );
  • 20. Aggregation Framework: Requests per day by URL db.logs.aggregate( [ { '$match': { 'time': { '$gte': new Date(2012,0), '$lt': new Date(2012,1) } } }, { '$project': { 'path': 1, 'date': { 'y': { '$year': '$time' }, 'm': { '$month': '$time' }, 'd': { '$dayOfMonth': '$time' } } } }, { '$group': { '_id': { 'p': '$path', 'y': '$date.y', 'm': '$date.m', 'd': '$date.d' }, 'hits': { '$sum': 1 } } } ] )
  • 21. Aggregation Framework { 'ok': 1, 'result': [ { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 1 }, 'hits': 124 }, { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 2 }, 'hits': 245 }, { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 3 }, 'hits': 322 }, { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 4 }, 'hits': 175 }, { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 5 }, 'hits': 94 } ] }
  • 23. Map Reduce – Map Phase: Generate hourly rollups from log data var map = function() { var key = { p: this.path, d: new Date( this.time.getFullYear(), this.time.getMonth(), this.time.getDate(), this.time.getHours(), 0, 0, 0) }; emit( key, { hits: 1 } ); }
  • 24. Map Reduce – Reduce Phase: Generate hourly rollups from log data var reduce = function(key, values) { var r = { hits: 0 }; values.forEach(function(v) { r.hits += v.hits; }); return r; }
  • 25. Map Reduce: Generate hourly rollups from log data cutoff = new Date(2012,0,1); query = { 'time': { '$gte': last_run, '$lt': cutoff } }; db.logs.mapReduce( map, reduce, { 'query': query, 'out': { 'reduce': 'stats.hourly' } } ); last_run = cutoff
  • 26. Map Reduce Output > db.stats.hourly.find() { '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T00:00:00Z") }, 'value': { 'hits': 124 } }, { '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T01:00:00Z") }, 'value': { 'hits': 245 } }, { '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T02:00:00Z") }, 'value': { 'hits': 322 } }, { '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T03:00:00Z") }, 'value': { 'hits': 175 } }, ... More ...
  • 27. Chained Map Reduce [diagram]: Collection 1: Raw Logs → Map Reduce (runs every hour) → Collection 2: Hourly Stats → Map Reduce (runs every day) → Collection 3: Daily Stats
  • 29. Pre-Aggregation Data for URL / Date { _id: "20001010/site-1/apache_pb.gif", metadata: { date: ISODate("2000-10-10T00:00:00Z"), site: "site-1", page: "/apache_pb.gif" }, daily: 5468426, hourly: { "0": 227850, "1": 210231, ... "23": 20457 }, minute: { "0": 3612, "1": 3241, ... "1439": 2819 } }
  • 30. Pre-Aggregation Data for URL / Date
      from datetime import datetime, time
      id_daily = dt_utc.strftime('%Y%m%d/') + site + page
      hour = dt_utc.hour
      minute = dt_utc.minute
      # Get a datetime that only includes date info
      d = datetime.combine(dt_utc.date(), time.min)
      query = { '_id': id_daily, 'metadata': { 'date': d, 'site': site, 'page': page } }
      # Minutes are keyed by minute of the day ("0" to "1439"), matching the document on slide 29
      update = { '$inc': { 'daily': 1, 'hourly.%d' % (hour,): 1, 'minute.%d' % (hour * 60 + minute,): 1 } }
      db.stats.daily.update(query, update, upsert=True)
  • 31. Pre-Aggregation Data for URL / Date db.stats.daily.findOne( {'metadata': {'date':dt, 'site':'site-1', 'page':'/index.html'}}, { 'minute': 1 } );
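Slide 5 compares a pipeline to a Unix pipe. The idea can be modelled in plain Python: each stage is a function from an iterable of documents to an iterable of documents, and the pipeline feeds the input through the stages in order. This is a sketch for intuition only and is not how the MongoDB server executes `aggregate()`. The names `run_pipeline`, `only_dave` and `first_two` are hypothetical.

```python
from functools import reduce
from itertools import islice
from typing import Callable, Iterable, List

Doc = dict
Stage = Callable[[Iterable[Doc]], Iterable[Doc]]


def run_pipeline(docs: Iterable[Doc], stages: List[Stage]) -> List[Doc]:
    """Pass docs through each stage in order, like `cat log | grep dave | head -2`."""
    return list(reduce(lambda stream, stage: stage(stream), stages, docs))


def only_dave(stream: Iterable[Doc]) -> Iterable[Doc]:
    # Plays the role of { $match: { author: "dave" } }
    return (d for d in stream if d.get("author") == "dave")


def first_two(stream: Iterable[Doc]) -> Iterable[Doc]:
    # Plays the role of { $limit: 2 }: yields at most two documents
    return islice(stream, 2)


articles = [
    {"author": "dave", "score": 80},
    {"author": "anna", "score": 95},
    {"author": "dave", "score": 40},
    {"author": "dave", "score": 70},
]
print(run_pipeline(articles, [only_dave, first_two]))
# [{'author': 'dave', 'score': 80}, {'author': 'dave', 'score': 40}]
```

Because the stages are generators, documents are pulled through one at a time instead of each stage materialising a full intermediate list.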
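The `$match` filters on slide 8 can be approximated in memory. The sketch below supports only top-level equality and the four range operators `$gt`, `$gte`, `$lt` and `$lte`. A document whose field is missing fails any range test. `matches` and `match_stage` are hypothetical helpers and are not part of any MongoDB driver.

```python
import operator

# Only the comparison operators needed for slide 8's examples
COMPARE = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc: dict, query: dict) -> bool:
    """Return True if doc satisfies every top-level condition in query."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict) and cond and all(k.startswith("$") for k in cond):
            # Operator form, e.g. { "$gt": 50, "$lte": 90 }: every operator must hold
            if value is None:
                return False
            if not all(COMPARE[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            # Equality form, e.g. { "author": "dave" }
            return False
    return True


def match_stage(docs, query):
    return [d for d in docs if matches(d, query)]


scores = [
    {"author": "dave", "score": 60},
    {"author": "dave", "score": 95},
    {"author": "eve", "score": 70},
    {"author": "eve"},
]
print(match_stage(scores, {"score": {"$gt": 50, "$lte": 90}}))
# [{'author': 'dave', 'score': 60}, {'author': 'eve', 'score': 70}]
```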
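Slide 9 lists several things `$project` can do: keep fields, pull a nested value to the top level, and push top-level fields down into a new virtual sub-document. Here is a small in-memory model of those three behaviours. It handles inclusion-style projection only and deliberately omits arithmetic expressions. `get_path` and `project` are hypothetical names.

```python
def get_path(doc, path):
    """Resolve a dotted path such as 'meta.site'; return None if any step is missing."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def project(doc, spec):
    """Inclusion-style projection: 1 keeps a field, "$a.b" pulls a nested value up,
    and a dict of "$field" references builds a new sub-document."""
    out = {}
    if spec.get("_id", 1) and "_id" in doc:
        out["_id"] = doc["_id"]  # _id is kept unless the spec sets it to 0
    for key, rule in spec.items():
        if key == "_id":
            continue
        if isinstance(rule, str) and rule.startswith("$"):
            out[key] = get_path(doc, rule[1:])           # pull a field up to the top
        elif isinstance(rule, dict):
            out[key] = {k: get_path(doc, v[1:]) for k, v in rule.items()}  # push down
        elif rule == 1 and key in doc:
            out[key] = doc[key]                          # plain inclusion
    return out


log = {"_id": 7, "path": "/index.html", "host": "127.0.0.1", "meta": {"site": "site-1"}}
print(project(log, {"path": 1, "site": "$meta.site"}))
# {'_id': 7, 'path': '/index.html', 'site': 'site-1'}
```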
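Slide 10 describes `$unwind` as streaming an array one element at a time inside its surrounding document. The in-memory sketch below follows that behaviour. As with MongoDB's default `$unwind`, documents whose field is missing, null or an empty array are dropped. A non-array value is treated as a one-element array, which matches newer MongoDB versions. `unwind` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def unwind(docs: Iterable[dict], field_path: str) -> Iterator[dict]:
    """Emit one copy of each document per element of the array at field_path."""
    field = field_path[1:] if field_path.startswith("$") else field_path
    for doc in docs:
        value = doc.get(field)
        if value is None or value == []:
            continue  # missing, null and empty arrays produce no output
        items = value if isinstance(value, list) else [value]
        for item in items:
            out = dict(doc)  # shallow copy keeps the surrounding document as context
            out[field] = item
            yield out


posts = [
    {"_id": 1, "tags": ["mongodb", "analytics"]},
    {"_id": 2, "tags": []},
    {"_id": 3},
]
print(list(unwind(posts, "$tags")))
# [{'_id': 1, 'tags': 'mongodb'}, {'_id': 1, 'tags': 'analytics'}]
```

Because each element arrives as its own document, a following filter can discard individual elements. For example, `[d for d in unwind(posts, "$tags") if d["tags"] != "mongodb"]` keeps only the `analytics` entry.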
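Slides 11 and 12 describe `$group`: documents are bucketed by a key that becomes `_id`, and each bucket is reduced with accumulators such as `$sum`, `$avg`, `$push` and `$addToSet`. The pure-Python sketch below models that. Null and missing-value handling is simplified, and `$addToSet` keeps first-seen order here, whereas MongoDB does not guarantee any order. `group` and its `(operator, argument)` tuple format are hypothetical.

```python
def _values(members, arg):
    # "$field" reads a field from each document; anything else is a constant (e.g. 1 to count)
    if isinstance(arg, str) and arg.startswith("$"):
        return [m.get(arg[1:]) for m in members]
    return [arg for _ in members]


def _accumulate(op, vals):
    nums = [v for v in vals if isinstance(v, (int, float)) and not isinstance(v, bool)]
    present = [v for v in vals if v is not None]
    if op == "$sum":
        return sum(nums)  # non-numeric values are ignored
    if op == "$avg":
        return sum(nums) / len(nums) if nums else None
    if op == "$push":
        return list(vals)
    if op == "$addToSet":
        unique = []
        for v in vals:
            if v not in unique:
                unique.append(v)
        return unique
    if op == "$min":
        return min(present) if present else None
    if op == "$max":
        return max(present) if present else None
    if op == "$first":
        return vals[0]
    if op == "$last":
        return vals[-1]
    raise ValueError(f"unsupported accumulator: {op}")


def group(docs, key, accumulators):
    """key is a "$field" reference; accumulators maps output name -> (operator, argument)."""
    buckets = {}
    for doc in docs:
        buckets.setdefault(doc.get(key[1:]), []).append(doc)
    return [
        {"_id": k, **{name: _accumulate(op, _values(members, arg))
                      for name, (op, arg) in accumulators.items()}}
        for k, members in buckets.items()
    ]


sales = [{"item": "a", "qty": 2}, {"item": "b", "qty": 5},
         {"item": "a", "qty": 4}, {"item": "a", "qty": 4}]
print(group(sales, "$item", {"total": ("$sum", "$qty"), "count": ("$sum", 1)}))
# [{'_id': 'a', 'total': 10, 'count': 3}, {'_id': 'b', 'total': 5, 'count': 1}]
```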
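Slides 13 to 15 cover `$sort`, `$limit` and `$skip`. A compound sort specification such as `{ key1: 1, key2: -1 }` can be modelled with repeated stable sorts: sort by the least significant key first and the most significant key last. This is a sketch that assumes every document contains the sort fields. The helper names are hypothetical.

```python
def sort_stage(docs, spec):
    """spec is an ordered dict such as {"total": -1, "path": 1}, as in { $sort: ... }."""
    out = list(docs)
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last gives a multi-key order.
    for field, direction in reversed(list(spec.items())):
        out.sort(key=lambda d: d[field], reverse=(direction == -1))
    return out


def skip_stage(docs, n):
    return list(docs)[n:]


def limit_stage(docs, n):
    return list(docs)[:n]


rows = [{"path": "/a", "total": 5}, {"path": "/b", "total": 9},
        {"path": "/c", "total": 5}, {"path": "/d", "total": 1}]
ordered = sort_stage(rows, {"total": -1, "path": 1})
print([r["path"] for r in ordered])
# ['/b', '/a', '/c', '/d']
```

Chaining `limit_stage(skip_stage(ordered, 1), 2)` corresponds to `{ $skip: 1 }` followed by `{ $limit: 2 }` and returns the `/a` and `/c` rows.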
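Slide 16 describes a prefix expression language in which each operator object holds its arguments in an array, and expressions can nest. A small recursive evaluator makes the nesting example concrete. It covers only field references, literals, `$add` and `$ifNull`. Following MongoDB's documented behaviour, a null or missing operand makes `$add` return null. `evaluate` is a hypothetical name.

```python
def evaluate(expr, doc):
    """Evaluate a small subset of aggregation expressions against one document."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])  # field reference; a missing field evaluates to None
    if isinstance(expr, dict):
        (op, args), = expr.items()  # prefix form: exactly one operator per object
        vals = [evaluate(a, doc) for a in args]  # evaluate nested arguments first
        if op == "$add":
            return None if any(v is None for v in vals) else sum(vals)
        if op == "$ifNull":
            return vals[0] if vals[0] is not None else vals[1]
        raise ValueError(f"unsupported operator: {op}")
    return expr  # numbers and other literals evaluate to themselves


row = {"field1": 2, "field3": 5}
print(evaluate({"$add": ["$field1", {"$ifNull": ["$field2", "$field3"]}]}, row))
# 7
```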
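Slide 18 turns an Apache log line into a JSON document whose `time` is in UTC. Note how `13:55:36 -0700` becomes `20:55:36Z`. A minimal Python parser for that conversion is sketched below. It assumes the Apache "combined" log format and an English/C locale for the month abbreviation. It also leaves `_id` for the database to assign. `LOG_RE` and `parse_line` are hypothetical names.

```python
import re
from datetime import datetime, timezone

# Assumed layout: host ident user [time] "request" status size "referer" "user agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str) -> dict:
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError("not a combined-format log line")
    # %z understands offsets like -0700; convert to UTC as in the slide's ISODate
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "host": m["host"],
        "time": ts.astimezone(timezone.utc),
        "path": m["path"],
        "referer": m["referer"],
        "user_agent": m["user_agent"],
    }


line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
        '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
print(parse_line(line)["time"].isoformat())
# 2000-10-10T20:55:36+00:00
```

With PyMongo, the resulting dict could then be stored with `db.logs.insert_one(doc)`.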
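The pipeline on slide 20 counts requests per path and calendar day. A pure-Python equivalent is handy for checking results such as those on slide 21 against a small sample. Note that `$month` returns 1 to 12, as Python's `datetime.month` does, even though JavaScript's `new Date(2012,0)` uses a zero-based month. `hits_per_day` is a hypothetical helper, and the sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime


def hits_per_day(logs, start, end):
    """Count requests per (path, year, month, day) for start <= time < end."""
    counts = Counter()
    for doc in logs:
        t = doc["time"]
        if start <= t < end:                                    # the $match stage
            counts[(doc["path"], t.year, t.month, t.day)] += 1  # $project + $group/$sum
    # $group output order is not guaranteed in MongoDB; sort here for stable output
    return [
        {"_id": {"p": p, "y": y, "m": m, "d": d}, "hits": n}
        for (p, y, m, d), n in sorted(counts.items())
    ]


sample = [
    {"path": "/index.html", "time": datetime(2012, 1, 1, 9, 30)},
    {"path": "/index.html", "time": datetime(2012, 1, 1, 17, 5)},
    {"path": "/index.html", "time": datetime(2012, 1, 2, 8, 0)},
    {"path": "/index.html", "time": datetime(2012, 2, 1, 0, 0)},  # outside January
]
for row in hits_per_day(sample, datetime(2012, 1, 1), datetime(2012, 2, 1)):
    print(row)
```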
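Slides 23 to 25 run the map-reduce job incrementally over a time window and write with `out: { reduce: ... }`. In that mode, a key that already exists in the output collection is re-reduced with the newly computed value. The sketch below models that flow in memory, using a half-open window `[last_run, cutoff)` so that a document stamped exactly at a cutoff is counted once. This only works because the reduce function returns the same shape it receives (`{hits: n}`) and can safely be applied to its own output. `map_fn`, `reduce_fn` and `incremental_map_reduce` are hypothetical names.

```python
from datetime import datetime


def map_fn(doc):
    # Same key as the slide: the path plus the timestamp truncated to the hour
    t = doc["time"]
    return (doc["path"], t.replace(minute=0, second=0, microsecond=0)), {"hits": 1}


def reduce_fn(key, values):
    return {"hits": sum(v["hits"] for v in values)}


def incremental_map_reduce(logs, out, last_run, cutoff):
    """Process docs with last_run <= time < cutoff and fold them into `out`."""
    emitted = {}
    for doc in logs:
        if last_run <= doc["time"] < cutoff:
            key, value = map_fn(doc)
            emitted.setdefault(key, []).append(value)
    for key, values in emitted.items():
        new_value = reduce_fn(key, values)
        if key in out:
            # Models out: { reduce: ... }: combine the stored result with the new one
            new_value = reduce_fn(key, [out[key], new_value])
        out[key] = new_value
    return cutoff  # becomes last_run for the next invocation


hourly = {}
events = [{"path": "/index.html", "time": datetime(2012, 1, 1, h, m)}
          for h, m in [(0, 15), (0, 45), (1, 0), (1, 5)]]
last = incremental_map_reduce(events, hourly, datetime(2011, 12, 31), datetime(2012, 1, 1, 1, 30))
print(hourly)
```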
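Slide 27 chains jobs: a daily job reads the hourly statistics instead of the raw logs, so each level processes far fewer records. A minimal in-memory sketch of the hourly-to-daily step is shown below. It assumes hourly results are keyed by `(path, hour_start)` with `{"hits": n}` values. `rollup_daily` is a hypothetical name, and the input numbers reuse the sample counts from slide 26.

```python
from datetime import date, datetime


def rollup_daily(hourly):
    """Fold {(path, hour_start): {"hits": n}} into {(path, day): {"hits": n}}."""
    daily = {}
    for (path, hour_start), value in hourly.items():
        day_key = (path, hour_start.date())
        # Same reduce logic as the hourly job: add up the hit counts
        prev = daily.get(day_key, {"hits": 0})
        daily[day_key] = {"hits": prev["hits"] + value["hits"]}
    return daily


hourly_stats = {
    ("/index.html", datetime(2012, 1, 1, 0)): {"hits": 124},
    ("/index.html", datetime(2012, 1, 1, 1)): {"hits": 245},
    ("/index.html", datetime(2012, 1, 2, 0)): {"hits": 10},
}
print(rollup_daily(hourly_stats))
```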
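Slides 29 and 30 pre-aggregate by sending one upsert with `$inc` per hit. The sketch below builds the same query and update documents and applies them to an in-memory dict, standing in for the server's upsert. On the first hit, the new document is seeded from the query's equality fields, and each dotted `$inc` path is created on demand. Minutes are keyed by minute of the day, following the flat `"0"` to `"1439"` layout in slide 29's sample document. `build_update` and `apply_upsert_inc` are hypothetical names.

```python
from datetime import datetime, time


def build_update(dt_utc, site, page):
    id_daily = dt_utc.strftime("%Y%m%d/") + site + page
    minute_of_day = dt_utc.hour * 60 + dt_utc.minute
    d = datetime.combine(dt_utc.date(), time.min)  # date only, as on slide 30
    query = {"_id": id_daily, "metadata": {"date": d, "site": site, "page": page}}
    update = {"$inc": {"daily": 1,
                       "hourly.%d" % dt_utc.hour: 1,
                       "minute.%d" % minute_of_day: 1}}
    return query, update


def apply_upsert_inc(store, query, update):
    """In-memory stand-in for update(query, update, upsert=True) with $inc only."""
    doc = store.setdefault(query["_id"], dict(query))  # upsert: seed from the query
    for dotted, amount in update["$inc"].items():
        *parents, leaf = dotted.split(".")
        target = doc
        for part in parents:
            target = target.setdefault(part, {})  # create embedded docs as needed
        target[leaf] = target.get(leaf, 0) + amount
    return doc


stats = {}
for hit in [datetime(2000, 10, 10, 20, 55, 36), datetime(2000, 10, 10, 20, 55, 59),
            datetime(2000, 10, 10, 21, 0, 1)]:
    apply_upsert_inc(stats, *build_update(hit, "site-1", "/apache_pb.gif"))
print(stats["20001010/site-1/apache_pb.gif"]["hourly"])
# {'20': 2, '21': 1}
```

Because every hit touches one known document, later reads (as on slide 31) fetch ready-made counters instead of scanning raw logs.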