SlideShare a Scribd company logo
1 of 33
Aggregation
Framework
Quick Overview of
Quick Overview of
Document-oriented
Schemaless
JSON-style documents
Rich Queries
Scales Horizontally
db.users.find({
last_name: 'Smith',
age: {$gt : 10}
});
SELECT * FROM users WHERE
last_name=‘Smith’ AND age > 10;
Computing Aggregations in
Databases
SQL-based
RDBMS
JOIN
GROUP BY
AVG(),
COUNT(),
SUM(), FIRST(),
LAST(),
etc.
MongoDB 2.0
MapReduce
MongoDB 2.2+
MapReduce
Aggregation Framework
MapReduce
var map = function()
{
...
emit(key, val);
}
var reduce = function(key, vals)
{
...
return resultVal;
}
Data
Map()
emit(k,v)
Sort(k)
Group(k)
Reduce(k,values)
k,v
Finalize(k,v)
k,v
MongoDB
map iterates on
documents
Document is $this
1 at time per shard
Input matches output
Can run multiple times
What’s wrong with just using
MapReduce?
Map/Reduce is very
powerful, but often overkill
Lots of users relying on it
for simple aggregation tasks
•
•
What’s wrong with just using
MapReduce?
Easy to screw up JavaScript
Debugging a M/R job sucks
Writing more JS for simple tasks should not be necessary
•
•
•
(ಠ︿ಠ)
Aggregation
Framework
Declarative (no need to write JS)
Implemented directly in C++
Expression Evaluation
Return computed values
Framework: We can extend it with new
ops
•
•
•
•
•
Input
Data
(collection)
Filter
Project
Unwind
Group
Sort
Limit
Result
(document)
db.article.aggregate(
{ $project : {author : 1,tags : 1}},
{ $unwind : "$tags" },
{ $group : {_id : “$tags”,
authors:{ $addToSet:"$author"}}
}
);
An aggregation command looks like:
db.article.aggregate(
{ $project : {author : 1, tags : 1}},
{ $unwind : "$tags" },
{ $group : {
_id : “$tags”,
authors : { $addToSet:"$author"}
}}
);
New Helper
Method:
.aggregate()
Operator
pipeline
db.runCommand({
aggregate : "article",
pipeline : [ {$op1, $op2, ...} ]
}
{
"result" : [
{ "_id" : "art", "authors" : [ "bill", "bob" ] },
{ "_id" : "sports", "authors" : [ "jane", "bob" ] },
{ "_id" : "food", "authors" : [ "jane", "bob" ] },
{ "_id" : "science", "authors" : [ "jane", "bill", "bob" ] }
],
"ok" : 1
}
Output Document Looks like this:
result: array of pipeline
output
ok: 1 for success, 0
otherwise
Pipeline
Input to the start of the pipeline is a collection
Series of operators - each one filters or transforms its
input
Passes output data to next operator in the pipeline
Output of the pipeline is the result document
•
•
•
•
ps -ax | tee processes.txt | more
Kind of like UNIX:
Let’s do:
1. Tour of the pipeline
operators
2. A couple examples based on
common SQL aggregation tasks
$match
$unwind
$group
$project
$skip $limit $sort
filters documents from pipeline with a query predicate
filtered with:
{$match: {author:”bob”}}
$match
{author: "bob", pageViews:5, title:"Lorem Ipsum..."}
{author: "bill", pageViews:3, title:"dolor sit amet..."}
{author: "joe", pageViews:52, title:"consectetur adipi..."}
{author: "jane", pageViews:51, title:"sed diam..."}
{author: "bob", pageViews:14, title:"magna aliquam..."}
{author: "bob", pageViews:53, title:"claritas est..."}
filtered with:
{$match: {pageViews:{$gt:50}}
{author:"bob",pageViews:5,title:"Lorem Ipsum..."}
{author:"bob",pageViews:14,title:"magna aliquam..."}
{author:"bob",pageViews:53,title:"claritas est..."}
{author: "joe", pageViews:52, title:"consectetur adipiscing..."}
{author: "jane", pageViews:51, title:"sed diam..."}
{author: "bob", pageViews:53, title:"claritas est..."}
Input:
$unwind
{
"_id" : ObjectId("4f...146"),
"author" : "bob",
"tags" :[ "fun","good","awesome"]
}
explode the “tags” array with:
{ $unwind : ”$tags” }
{ _id : ObjectId("4f...146"), author : "bob", tags:"fun"},
{ _id : ObjectId("4f...146"), author : "bob", tags:"good"},
{ _id : ObjectId("4f...146"), author : "bob", tags:"awesome"}
produces output:
Produce a new document for
each value in an input array
Bucket a subset of docs together,
calculate an aggregated output doc from the bucket
$sum
$max, $min
$avg
$first, $last
$addToSet
$push
db.article.aggregate(
{ $group : {
_id : "$author",
viewsPerAuthor : { $sum :
"$pageViews" }
}
}
);
$group
Output
Calculation
Operators:
db.article.aggregate(
{ $group : {
_id : "$author",
viewsPerAuthor : { $sum : "$pageViews" }
}
}
);
_id: selects a field to use as
bucket key for grouping
Output field name Operation used to calculate the
output value
($sum, $max, $avg, etc.)
$group (cont’d)
dot notation (nested fields)
a constant
a multi-key expression inside
{...}
•
•
•
also allowed here:
An example with $match and $group
SELECT SUM(price) FROM orders
WHERE customer_id = 4;
MongoDB:
SQL:
db.orders.aggregate(
{$match : {“$customer_id” : 4}},
{$group : { _id : null,
total: {$sum : “price”}})
English: Find the sum of all prices of the
orders placed by customer #4
An example with $unwind and $group
MongoDB:
SQL:
English:
db.posts.aggregate(
{ $unwind : "$tags" },
{ $group : {
_id : “$tags”,
authors : { $addToSet : "$author" }
}}
);
For all tags used in blog posts, produce a list of
authors that have posted under each tag
SELECT tag, author FROM post_tags LEFT
JOIN posts ON post_tags.post_id =
posts.id GROUP BY tag, author;
More operators - Controlling Pipeline Input
$skip
$limit
$sort
Similar to:
.skip()
.limit()
.sort()
in a regular Mongo query
$sort
specified the same way as index keys:
{ $sort : { name : 1, age: -1 } }
Must be used in order to take
advantage of $first/$last with
$group.
order input documents
$limit
limit the number of input documents
{$limit : 5}
$skip
skips over documents
{$skip : 5}
$project
Use for:
Add, Remove, Pull up, Push down, Rename
Fields
Building computed fields
Reshape a document
$project
(cont’d)
Include or exclude fields
{$project :
{ title : 1,
author : 1} }
Only pass on fields
“title” and “author”
{$project : { comments : 0}
Exclude
“comments” field,
keep everything
else
Moving + Renaming fields
{$project :
{ page_views : “$pageViews”,
catName : “$category.name”,
info : {
published : “$ctime”,
update : “$mtime”
}
}
}
Rename page_views to pageViews
Take nested field
“category.name”, move
it into top-level field
called “catName”
Populate a new
sub-document
into the output
$project
(cont’d)
db.article.aggregate(
{ $project : {
name : 1,
age_fixed : { $add:["$age", 2] }
}}
);
Building a Computed Field
Output
(computed field) Operands
Expression
$project
(cont’d)
Lots of Available
Expressions
$project
(cont’d)
Numeric $add $sub $mod $divide $multiply
Logical $eq $lte/$lt $gte/$gt $and $not $or $eq
Dates
$dayOfMonth $dayOfYear $dayOfWeek $second $minute
$hour $week $month $isoDate
Strings $substr $add $toLower $toUpper $strcasecmp
Example: $sort → $limit → $project→
$group
MongoDB:
SQL:
English: Of the most recent 1000 blog posts, how many
were posted within each calendar year?
SELECT YEAR(pub_time) as pub_year,
COUNT(*) FROM
(SELECT pub_time FROM posts ORDER BY
pub_time desc)
GROUP BY pub_year;
db.test.aggregate(
{$sort : {pub_time: -1}},
{$limit : 1000},
{$project:{pub_year:{$year:["$pub_time"]}}},
{$group: {_id:"$pub_year", num_year:{$sum:1}}}
)
Some Usage Notes
In BSON, order matters - so computed
fields always show up after regular fields
We use $ in front of field names to
distinguish fields from string literals
in expressions “$name”
“name”
vs.
Some Usage Notes
Use a $match,$sort and $limit
first in pipeline if possible
Cumulative Operators $group:
be aware of memory usage
Use $project to discard unneeded fields
Remember the 16MB output limit
Aggregation vs.
MapReduce
Framework is geared towards counting/accumulating
If you need something more exotic, use
MapReduce
No 16MB constraint on output size with
MapReduce
JS in M/R is not limited to any fixed set of expressions
•
•
•
•
thanks! ✌(-‿-)✌
questions?
$$$ BTW: we are hiring!
http://10gen.com/jobs $$$
@mpobrien
github.com/mpobrien
hit me up:

More Related Content

What's hot

Indexing & Query Optimization
Indexing & Query OptimizationIndexing & Query Optimization
Indexing & Query OptimizationMongoDB
 
MongoDB Fundamentals
MongoDB FundamentalsMongoDB Fundamentals
MongoDB FundamentalsMongoDB
 
Nestjs MasterClass Slides
Nestjs MasterClass SlidesNestjs MasterClass Slides
Nestjs MasterClass SlidesNir Kaufman
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)MongoDB
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the RoadmapEDB
 
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDBMongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDBMongoDB
 
Inside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseInside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseMike Dirolf
 
Mongodb Aggregation Pipeline
Mongodb Aggregation PipelineMongodb Aggregation Pipeline
Mongodb Aggregation Pipelinezahid-mian
 
Mongoose and MongoDB 101
Mongoose and MongoDB 101Mongoose and MongoDB 101
Mongoose and MongoDB 101Will Button
 
ES6 presentation
ES6 presentationES6 presentation
ES6 presentationritika1
 
Solid NodeJS with TypeScript, Jest & NestJS
Solid NodeJS with TypeScript, Jest & NestJSSolid NodeJS with TypeScript, Jest & NestJS
Solid NodeJS with TypeScript, Jest & NestJSRafael Casuso Romate
 
[Final] ReactJS presentation
[Final] ReactJS presentation[Final] ReactJS presentation
[Final] ReactJS presentation洪 鹏发
 

What's hot (20)

Indexing & Query Optimization
Indexing & Query OptimizationIndexing & Query Optimization
Indexing & Query Optimization
 
MongoDB Fundamentals
MongoDB FundamentalsMongoDB Fundamentals
MongoDB Fundamentals
 
MongodB Internals
MongodB InternalsMongodB Internals
MongodB Internals
 
Nestjs MasterClass Slides
Nestjs MasterClass SlidesNestjs MasterClass Slides
Nestjs MasterClass Slides
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)
 
Angular Directives
Angular DirectivesAngular Directives
Angular Directives
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the Roadmap
 
Express js
Express jsExpress js
Express js
 
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDBMongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB
 
Laravel Introduction
Laravel IntroductionLaravel Introduction
Laravel Introduction
 
Inside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source DatabaseInside MongoDB: the Internals of an Open-Source Database
Inside MongoDB: the Internals of an Open-Source Database
 
Mongodb Aggregation Pipeline
Mongodb Aggregation PipelineMongodb Aggregation Pipeline
Mongodb Aggregation Pipeline
 
Mongoose and MongoDB 101
Mongoose and MongoDB 101Mongoose and MongoDB 101
Mongoose and MongoDB 101
 
ES6 presentation
ES6 presentationES6 presentation
ES6 presentation
 
Angularjs PPT
Angularjs PPTAngularjs PPT
Angularjs PPT
 
NodeJS for Beginner
NodeJS for BeginnerNodeJS for Beginner
NodeJS for Beginner
 
Indexing
IndexingIndexing
Indexing
 
Solid NodeJS with TypeScript, Jest & NestJS
Solid NodeJS with TypeScript, Jest & NestJSSolid NodeJS with TypeScript, Jest & NestJS
Solid NodeJS with TypeScript, Jest & NestJS
 
An introduction to MongoDB
An introduction to MongoDBAn introduction to MongoDB
An introduction to MongoDB
 
[Final] ReactJS presentation
[Final] ReactJS presentation[Final] ReactJS presentation
[Final] ReactJS presentation
 

Viewers also liked

MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkTyler Brock
 
Aggregation in MongoDB
Aggregation in MongoDBAggregation in MongoDB
Aggregation in MongoDBKishor Parkhe
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation FrameworkMongoDB
 
Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Anuj Jain
 
Mongo db aggregation guide
Mongo db aggregation guideMongo db aggregation guide
Mongo db aggregation guideDeysi Gmarra
 
Aggregation Framework
Aggregation FrameworkAggregation Framework
Aggregation FrameworkMongoDB
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationJoe Drumgoole
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichNorberto Leite
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorHenrik Ingo
 
Transitioning from SQL to MongoDB
Transitioning from SQL to MongoDBTransitioning from SQL to MongoDB
Transitioning from SQL to MongoDBMongoDB
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB MongoDB
 
Cnidaria Dan Ctenophora (Ciri-ciri, struktur tubuh, klasifikasi dan contoh, r...
Cnidaria Dan Ctenophora (Ciri-ciri, struktur tubuh, klasifikasi dan contoh, r...Cnidaria Dan Ctenophora (Ciri-ciri, struktur tubuh, klasifikasi dan contoh, r...
Cnidaria Dan Ctenophora (Ciri-ciri, struktur tubuh, klasifikasi dan contoh, r...Herninda N. Shabrina
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB
 
MongoDB Aggregation Framework in action !
MongoDB Aggregation Framework in action !MongoDB Aggregation Framework in action !
MongoDB Aggregation Framework in action !Sébastien Prunier
 

Viewers also liked (17)

MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
Aggregation in MongoDB
Aggregation in MongoDBAggregation in MongoDB
Aggregation in MongoDB
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1
 
Mongo db aggregation guide
Mongo db aggregation guideMongo db aggregation guide
Mongo db aggregation guide
 
Aggregation Framework
Aggregation FrameworkAggregation Framework
Aggregation Framework
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced Aggregation
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop Connector
 
Transitioning from SQL to MongoDB
Transitioning from SQL to MongoDBTransitioning from SQL to MongoDB
Transitioning from SQL to MongoDB
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB
 
Cnidaria & Ctenophora
Cnidaria & CtenophoraCnidaria & Ctenophora
Cnidaria & Ctenophora
 
Cnidaria Dan Ctenophora (Ciri-ciri, struktur tubuh, klasifikasi dan contoh, r...
Cnidaria Dan Ctenophora (Ciri-ciri, struktur tubuh, klasifikasi dan contoh, r...Cnidaria Dan Ctenophora (Ciri-ciri, struktur tubuh, klasifikasi dan contoh, r...
Cnidaria Dan Ctenophora (Ciri-ciri, struktur tubuh, klasifikasi dan contoh, r...
 
Meetup MUG-RS KingHost
Meetup MUG-RS KingHostMeetup MUG-RS KingHost
Meetup MUG-RS KingHost
 
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and V...
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
 
MongoDB Aggregation Framework in action !
MongoDB Aggregation Framework in action !MongoDB Aggregation Framework in action !
MongoDB Aggregation Framework in action !
 

Similar to MongoDB Aggregation Framework

Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & AggregationMongoDB
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analyticsMongoDB
 
Query for json databases
Query for json databasesQuery for json databases
Query for json databasesBinh Le
 
MongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB
 
Schema Design by Chad Tindel, Solution Architect, 10gen
Schema Design  by Chad Tindel, Solution Architect, 10genSchema Design  by Chad Tindel, Solution Architect, 10gen
Schema Design by Chad Tindel, Solution Architect, 10genMongoDB
 
MongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and ProfilingMongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and ProfilingManish Kapoor
 
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDBMongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDBMongoDB
 
Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...Andreas Dewes
 
Mongoskin - Guilin
Mongoskin - GuilinMongoskin - Guilin
Mongoskin - GuilinJackson Tian
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaGuido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...confluent
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Spray Json and MongoDB Queries: Insights and Simple Tricks.
Spray Json and MongoDB Queries: Insights and Simple Tricks.Spray Json and MongoDB Queries: Insights and Simple Tricks.
Spray Json and MongoDB Queries: Insights and Simple Tricks.Andrii Lashchenko
 
CouchDB on Rails - RailsWayCon 2010
CouchDB on Rails - RailsWayCon 2010CouchDB on Rails - RailsWayCon 2010
CouchDB on Rails - RailsWayCon 2010Jonathan Weiss
 
CouchDB on Rails - FrozenRails 2010
CouchDB on Rails - FrozenRails 2010CouchDB on Rails - FrozenRails 2010
CouchDB on Rails - FrozenRails 2010Jonathan Weiss
 

Similar to MongoDB Aggregation Framework (20)

Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
 
MongoDB Meetup
MongoDB MeetupMongoDB Meetup
MongoDB Meetup
 
Query for json databases
Query for json databasesQuery for json databases
Query for json databases
 
MongoDB World 2018: Keynote
MongoDB World 2018: KeynoteMongoDB World 2018: Keynote
MongoDB World 2018: Keynote
 
Schema Design by Chad Tindel, Solution Architect, 10gen
Schema Design  by Chad Tindel, Solution Architect, 10genSchema Design  by Chad Tindel, Solution Architect, 10gen
Schema Design by Chad Tindel, Solution Architect, 10gen
 
MongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and ProfilingMongoDB Aggregations Indexing and Profiling
MongoDB Aggregations Indexing and Profiling
 
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDBMongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
 
Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Mongoskin - Guilin
Mongoskin - GuilinMongoskin - Guilin
Mongoskin - Guilin
 
Querying mongo db
Querying mongo dbQuerying mongo db
Querying mongo db
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Spray Json and MongoDB Queries: Insights and Simple Tricks.
Spray Json and MongoDB Queries: Insights and Simple Tricks.Spray Json and MongoDB Queries: Insights and Simple Tricks.
Spray Json and MongoDB Queries: Insights and Simple Tricks.
 
CouchDB on Rails
CouchDB on RailsCouchDB on Rails
CouchDB on Rails
 
CouchDB on Rails - RailsWayCon 2010
CouchDB on Rails - RailsWayCon 2010CouchDB on Rails - RailsWayCon 2010
CouchDB on Rails - RailsWayCon 2010
 
CouchDB on Rails - FrozenRails 2010
CouchDB on Rails - FrozenRails 2010CouchDB on Rails - FrozenRails 2010
CouchDB on Rails - FrozenRails 2010
 
Mongo DB 102
Mongo DB 102Mongo DB 102
Mongo DB 102
 

More from Caserta

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseCaserta
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Caserta
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?Caserta
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data LakeCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 

More from Caserta (20)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
The Emerging Role of the Data Lake
The Emerging Role of the Data LakeThe Emerging Role of the Data Lake
The Emerging Role of the Data Lake
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

MongoDB Aggregation Framework

  • 3. Quick Overview of Document-oriented Schemaless JSON-style documents Rich Queries Scales Horizontally db.users.find({ last_name: 'Smith', age: {$gt : 10} }); SELECT * FROM users WHERE last_name=‘Smith’ AND age > 10;
  • 4. Computing Aggregations in Databases SQL-based RDBMS JOIN GROUP BY AVG(), COUNT(), SUM(), FIRST(), LAST(), etc. MongoDB 2.0 MapReduce MongoDB 2.2+ MapReduce Aggregation Framework
  • 5. MapReduce var map = function() { ... emit(key, val); } var reduce = function(key, vals) { ... return resultVal; } Data Map() emit(k,v) Sort(k) Group(k) Reduce(k,values) k,v Finalize(k,v) k,v MongoDB map iterates on documents Document is $this 1 at time per shard Input matches output Can run multiple times
  • 6. What’s wrong with just using MapReduce? Map/Reduce is very powerful, but often overkill Lots of users relying on it for simple aggregation tasks • •
  • 7. What’s wrong with just using MapReduce? Easy to screw up JavaScript Debugging a M/R job sucks Writing more JS for simple tasks should not be necessary • • • (ಠ︿ಠ)
  • 8. Aggregation Framework Declarative (no need to write JS) Implemented directly in C++ Expression Evaluation Return computed values Framework: We can extend it with new ops • • • • •
  • 10. db.article.aggregate( { $project : {author : 1,tags : 1}}, { $unwind : "$tags" }, { $group : {_id : “$tags”, authors:{ $addToSet:"$author"}} } ); An aggregation command looks like:
  • 11. db.article.aggregate( { $project : {author : 1, tags : 1}}, { $unwind : "$tags" }, { $group : { _id : “$tags”, authors : { $addToSet:"$author"} }} ); New Helper Method: .aggregate() Operator pipeline db.runCommand({ aggregate : "article", pipeline : [ {$op1, $op2, ...} ] }
  • 12. { "result" : [ { "_id" : "art", "authors" : [ "bill", "bob" ] }, { "_id" : "sports", "authors" : [ "jane", "bob" ] }, { "_id" : "food", "authors" : [ "jane", "bob" ] }, { "_id" : "science", "authors" : [ "jane", "bill", "bob" ] } ], "ok" : 1 } Output Document Looks like this: result: array of pipeline output ok: 1 for success, 0 otherwise
  • 13. Pipeline Input to the start of the pipeline is a collection Series of operators - each one filters or transforms its input Passes output data to next operator in the pipeline Output of the pipeline is the result document • • • • ps -ax | tee processes.txt | more Kind of like UNIX:
  • 14. Let’s do: 1. Tour of the pipeline operators 2. A couple examples based on common SQL aggregation tasks $match $unwind $group $project $skip $limit $sort
  • 15. filters documents from pipeline with a query predicate filtered with: {$match: {author:”bob”}} $match {author: "bob", pageViews:5, title:"Lorem Ipsum..."} {author: "bill", pageViews:3, title:"dolor sit amet..."} {author: "joe", pageViews:52, title:"consectetur adipi..."} {author: "jane", pageViews:51, title:"sed diam..."} {author: "bob", pageViews:14, title:"magna aliquam..."} {author: "bob", pageViews:53, title:"claritas est..."} filtered with: {$match: {pageViews:{$gt:50}} {author:"bob",pageViews:5,title:"Lorem Ipsum..."} {author:"bob",pageViews:14,title:"magna aliquam..."} {author:"bob",pageViews:53,title:"claritas est..."} {author: "joe", pageViews:52, title:"consectetur adipiscing..."} {author: "jane", pageViews:51, title:"sed diam..."} {author: "bob", pageViews:53, title:"claritas est..."} Input:
  • 16. $unwind { "_id" : ObjectId("4f...146"), "author" : "bob", "tags" :[ "fun","good","awesome"] } explode the “tags” array with: { $unwind : ”$tags” } { _id : ObjectId("4f...146"), author : "bob", tags:"fun"}, { _id : ObjectId("4f...146"), author : "bob", tags:"good"}, { _id : ObjectId("4f...146"), author : "bob", tags:"awesome"} produces output: Produce a new document for each value in an input array
  • 17. Bucket a subset of docs together, calculate an aggregated output doc from the bucket $sum $max, $min $avg $first, $last $addToSet $push db.article.aggregate( { $group : { _id : "$author", viewsPerAuthor : { $sum : "$pageViews" } } } ); $group Output Calculation Operators:
  • 18. db.article.aggregate( { $group : { _id : "$author", viewsPerAuthor : { $sum : "$pageViews" } } } ); _id: selects a field to use as bucket key for grouping Output field name Operation used to calculate the output value ($sum, $max, $avg, etc.) $group (cont’d) dot notation (nested fields) a constant a multi-key expression inside {...} • • • also allowed here:
  • 19. An example with $match and $group SELECT SUM(price) FROM orders WHERE customer_id = 4; MongoDB: SQL: db.orders.aggregate( {$match : {“$customer_id” : 4}}, {$group : { _id : null, total: {$sum : “price”}}) English: Find the sum of all prices of the orders placed by customer #4
  • 20. An example with $unwind and $group MongoDB: SQL: English: db.posts.aggregate( { $unwind : "$tags" }, { $group : { _id : “$tags”, authors : { $addToSet : "$author" } }} ); For all tags used in blog posts, produce a list of authors that have posted under each tag SELECT tag, author FROM post_tags LEFT JOIN posts ON post_tags.post_id = posts.id GROUP BY tag, author;
  • 21. More operators - Controlling Pipeline Input $skip $limit $sort Similar to: .skip() .limit() .sort() in a regular Mongo query
  • 22. $sort specified the same way as index keys: { $sort : { name : 1, age: -1 } } Must be used in order to take advantage of $first/$last with $group. order input documents
  • 23. $limit limit the number of input documents {$limit : 5} $skip skips over documents {$skip : 5}
  • 24. $project Use for: Add, Remove, Pull up, Push down, Rename Fields Building computed fields Reshape a document
  • 25. $project (cont’d) Include or exclude fields {$project : { title : 1, author : 1} } Only pass on fields “title” and “author” {$project : { comments : 0} Exclude “comments” field, keep everything else
  • 26. Moving + Renaming fields {$project : { page_views : “$pageViews”, catName : “$category.name”, info : { published : “$ctime”, update : “$mtime” } } } Rename page_views to pageViews Take nested field “category.name”, move it into top-level field called “catName” Populate a new sub-document into the output $project (cont’d)
  • 27. db.article.aggregate( { $project : { name : 1, age_fixed : { $add:["$age", 2] } }} ); Building a Computed Field Output (computed field) Operands Expression $project (cont’d)
  • 28. Lots of Available Expressions $project (cont’d) Numeric $add $sub $mod $divide $multiply Logical $eq $lte/$lt $gte/$gt $and $not $or $eq Dates $dayOfMonth $dayOfYear $dayOfWeek $second $minute $hour $week $month $isoDate Strings $substr $add $toLower $toUpper $strcasecmp
  • 29. Example: $sort → $limit → $project→ $group MongoDB: SQL: English: Of the most recent 1000 blog posts, how many were posted within each calendar year? SELECT YEAR(pub_time) as pub_year, COUNT(*) FROM (SELECT pub_time FROM posts ORDER BY pub_time desc) GROUP BY pub_year; db.test.aggregate( {$sort : {pub_time: -1}}, {$limit : 1000}, {$project:{pub_year:{$year:["$pub_time"]}}}, {$group: {_id:"$pub_year", num_year:{$sum:1}}} )
  • 30. Some Usage Notes In BSON, order matters - so computed fields always show up after regular fields We use $ in front of field names to distinguish fields from string literals in expressions “$name” “name” vs.
  • 31. Some Usage Notes Use a $match,$sort and $limit first in pipeline if possible Cumulative Operators $group: be aware of memory usage Use $project to discard unneeded fields Remember the 16MB output limit
  • 32. Aggregation vs. MapReduce Framework is geared towards counting/accumulating If you need something more exotic, use MapReduce No 16MB constraint on output size with MapReduce JS in M/R is not limited to any fixed set of expressions • • • •
  • 33. thanks! ✌(-‿-)✌ questions? $$$ BTW: we are hiring! http://10gen.com/jobs $$$ @mpobrien github.com/mpobrien hit me up: