Exploring the
Aggregation Framework
Jason Mimick - Senior Consulting Engineer
jason.mimick@mongodb.com @jmimick
Original Slide Credits:
Jay Runkel jay.runkel@mongodb.com
et al
2
Warning or Whew
This is a “101” beginner talk!
Assuming you know some basics about
MongoDB
But basically nothing about the Aggregation
Framework
3
Agenda
1. Analytics in MongoDB?
2. Aggregation Framework
3. Aggregation Framework in Action
– US Census Data
– Aggregation Framework Options
4. New 3.2 stuff
– Friends of friends $lookup for self-joins
4
Analytics in MongoDB?
Create
Read
Update
Delete
Analytics
?
Group
Count
Derive Values
Filter
Average
Sort
5
For Example: US Census Data
• Census data from 1990, 2000, 2010
• Question:
Which US Division has the fastest growing population density?
– We only want to include data from states with more than 1M people
– We only want to include divisions larger than 100K square miles
Division = a group of US States
Population density = # of people / area of division
Data is provided at the state level
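As a quick worked example of the density arithmetic, here is a minimal sketch in plain JavaScript. The helper name `density` is made up for illustration; the population and area figures are the South Atlantic totals that appear in the results later in this deck.

```javascript
// Population density = people per unit of area (here: per square mile).
function density(population, areaSqMiles) {
  return population / areaSqMiles;
}

// South Atlantic division totals, as reported in the results later in this deck.
const area = 290433.4;
const density1990 = density(42293785, area);
const density2010 = density(58277380, area);
const densityDelta = density2010 - density1990;

console.log(density1990.toFixed(2), density2010.toFixed(2), densityDelta.toFixed(2));
// prints: 145.62 200.66 55.03
```

"Fastest growing" in the question is then simply the largest `densityDelta` across divisions.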
6
US Regions and Divisions
7
How would we solve this in SQL?
SELECT GROUP BY HAVING
Of course, we don’t have SQL
we’re a noSQL database
8
The Aggregation Framework
9
Core Concept: Pipeline
ps -ef | grep mongod
10
What is the Aggregation Pipeline?
A Series of Document Transformations
– Executed in stages
– Original input is a collection
– Output as a cursor or a collection
Rich Library of Functions
– Filter, compute, group, and summarize data
– Output of one stage sent to input of next
– Operations executed in sequential order
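The "output of one stage sent to input of next" idea can be sketched in plain JavaScript. This is only an analogy, not how the server runs a pipeline; `runPipeline`, `matchStage`, `sortStage`, `limitStage`, and the sample documents are all hypothetical names for illustration.

```javascript
// Each stage is a function: array of documents in, array of documents out.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const sortStage = (compare) => (docs) => [...docs].sort(compare);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Run stages in order, feeding each stage's output to the next one.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical input documents.
const states = [
  { name: "A", pop: 5 },
  { name: "B", pop: 1 },
  { name: "C", pop: 9 },
  { name: "D", pop: 7 },
];

const top2 = runPipeline(states, [
  matchStage((d) => d.pop > 2),
  sortStage((a, b) => b.pop - a.pop),
  limitStage(2),
]);
console.log(top2.map((d) => d.name)); // [ 'C', 'D' ]
```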
11
An Example Aggregation Pipeline
12
Syntax
>db.foo.aggregate( [ { stage1 },{ stage2 },{ stage3 }, … ])
mongo shell
1 db - variable pointing to current database
2 collection name
3 aggregate - method on collection
4 array of objects, each a pipeline operator
5 pipeline operators
1 2 3 4 ...5...
13
Syntax - Driver - Java
db.hospital.aggregate( [
{ "$group" : { "_id" : "$PatientID", "count" : { "$sum" : 1 } } },
{ "$match" : { "count" : { "$gte" : 5 } } },
{ "$sort" : { "count" : -1 } } ] )
14
Some Popular Pipeline Operators
$match Filter documents
$project Reshape documents
$group Summarize documents
$unwind Expand arrays in documents
$sort Order documents
$limit/$skip Paginate documents
$redact Restrict documents
$geoNear Proximity sort documents
$let,$map Define variables
15
80+ operators available as of MongoDB 3.2
Aggregation Framework in Action
(let’s play with the census data)
17
cData Collection
• Document For Each State
– Name
– Region
– Division
• Census Data For 1990, 2000, 2010
– Population
– Housing Units
– Occupied Housing Units
• Census Data is an array with three subdocuments
18
Count, Distinct
• Check out cData docs
• count()
• distinct()
When you start building your
aggregations you need to ‘get to know’ your
data!
19
Simple $group
Census data has a collection called regions
> db.regions.findOne()
{
"_id" : ObjectId("54d0e1ac28099359f5660f9f"),
"state" : "Connecticut",
"region" : "Northeast",
"regNum" : 1,
"division" : "New England",
"divNum" : 1
}
How can we find out how many states are in each
region?
20
> db.regions.aggregate( [
{ "$group" : { "_id" : "$region",
"count" : { "$sum" : 1 }
}
} ] )
{ "_id" : "West", "count" : 13 }
{ "_id" : "South", "count" : 17 }
{ "_id" : "Midwest", "count" : 12 }
{ "_id" : "Northeast", "count" : 9 }
// make more readable - store your pipeline ops in variables
>var group = { "$group" : { "_id" : "$region", "count" : {
"$sum" : 1 } } };
db.regions.aggregate( [ group ] )
21
$group
• Group documents by value
– _id - field reference, object,
constant
– Other output fields are computed
• $max, $min, $avg, $sum
• $addToSet, $push
• $first, $last
– Processes all data in memory by
default
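To make the $group semantics concrete, here is a rough plain-JavaScript imitation of grouping with $sum, $avg, and $push. It is a sketch, not MongoDB's implementation; `groupBy` and its output field names are invented for illustration, and the three sample documents are the ones used in the diagrams that follow.

```javascript
// Group documents by a key and compute a count, a sum, an average, and a pushed array.
function groupBy(docs, keyField, valueField, pushField) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField] ?? null; // a missing key is grouped under null
    if (!groups.has(key)) {
      groups.set(key, { _id: key, count: 0, total: 0, names: [] });
    }
    const g = groups.get(key);
    g.count += 1;                 // like { $sum: 1 }
    g.total += doc[valueField];   // like { $sum: "$areaM" }
    g.names.push(doc[pushField]); // like { $push: "$state" }
  }
  // The average is derived once every document has been seen, like $avg.
  return [...groups.values()].map((g) => ({ ...g, avg: g.total / g.count }));
}

const sampleStates = [
  { state: "New York", areaM: 218, region: "North East" },
  { state: "New Jersey", areaM: 90, region: "North East" },
  { state: "California", areaM: 300, region: "West" },
];

const byRegion = groupBy(sampleStates, "region", "areaM", "state");
console.log(byRegion);
```

Every group has to be held until the last input document arrives, which is why $group works in memory by default.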
22
Total US Area
Back to cData…
Can we use $group to find the total area of the
US (according to these data)?
23
db.cData.aggregate([
{"$group" : {"_id" : null,
"totalArea" : {$sum : "$areaM"},
"avgArea" : {$avg : "$areaM"}
}
}])
{ "_id" : null,
"totalArea" : 3802067.0700000003,
"avgArea" : 73116.67442307693 }
24
Area By Region
db.cData.aggregate([
{"$group" : {"_id" : "$region",
"totalArea" : {$sum : "$areaM"},
"avgArea" : {$avg : "$areaM"},
"numStates" : {$sum : 1},
"states" : {$push : "$name"}}}
])
{ "_id" : null, "totalArea" : 5393.18, "avgArea" : 2696.59, "numStates" : 2, "states" : [ "District of
Columbia", "Puerto Rico" ] }
{ "_id" : "Northeast", "totalArea" : 181319.86, "avgArea" : 20146.65111111111, "numStates" : 9, "states"
: [ "New Jersey", "Vermont", "Maine", "New Hampshire", "Rhode Island", "Pennsylvania", "Connecticut",
"Massachusetts", "New York" ] }
{ "_id" : "Midwest", "totalArea" : 821724.3700000001, "avgArea" : 68477.03083333334, "numStates" : 12,
"states" : [ "Iowa", "Missouri", "Ohio", "Indiana", "North Dakota", "Wisconsin", "Illinois", "Minnesota",
"Kansas", "South Dakota", "Michigan", "Nebraska" ] }
{ "_id" : "West", "totalArea" : 1873251.6300000001, "avgArea" : 144096.27923076923, "numStates" : 13,
"states" : [ "Colorado", "Wyoming", "California", "Utah", "Nevada", "Alaska", "Hawaii", "Montana", "New
Mexico", "Arizona", "Idaho", "Oregon", "Washington" ] }
{ "_id" : "South", "totalArea" : 920378.03, "avgArea" : 57523.626875, "numStates" : 16, "states" : [
"Alabama", "Georgia", "Maryland", "South Carolina", "Florida", "Mississippi", "Arkansas", "Louisiana",
"North Carolina", "Texas", "West Virginia", "Oklahoma", "Virginia", "Delaware", "Kentucky", "Tennessee" ]
}
25
Calculating Average State Area By
Region
Input documents:
{
  state: "New York",
  areaM: 218,
  region: "North East"
}
{
  state: "New Jersey",
  areaM: 90,
  region: "North East"
}
{
  state: "California",
  areaM: 300,
  region: "West"
}
Stage:
{ $group: {
    _id: "$region",
    avgAreaM: { $avg: "$areaM" }
}}
Output documents:
{
  _id: "North East",
  avgAreaM: 154
}
{
  _id: "West",
  avgAreaM: 300
}
26
Calculating Total Area and State Count
Input documents:
{
  state: "New York",
  areaM: 218,
  region: "North East"
}
{
  state: "New Jersey",
  areaM: 90,
  region: "North East"
}
{
  state: "California",
  areaM: 300,
  region: "West"
}
Stage:
{ $group: {
    _id: "$region",
    totArea: { $sum: "$areaM" },
    sCount: { $sum: 1 }
}}
Output documents:
{
  _id: "North East",
  totArea: 308,
  sCount: 2
}
{
  _id: "West",
  totArea: 300,
  sCount: 1
}
27
Total US Population By Year
db.cData.aggregate(
[{$unwind : "$data"},
{$group : {"_id" : "$data.year",
"totalPop" : {$sum : "$data.totalPop"}}},
{$sort : {"totalPop" : 1}}
])
{ "_id" : 1990, "totalPop" : 248709873 }
{ "_id" : 2000, "totalPop" : 281421906 }
{ "_id" : 2010, "totalPop" : 312471327 }
28
$unwind
• Flattens arrays
• Create documents from array elements
• Array replaced by element value
• Missing/empty fields → no output
• Non-array fields → error before 3.2; from 3.2 treated as a single-element array
• Pipe to $group to aggregate
{ "a" : "foo", "b" : [1, 2, 3] }
{ "a" : "foo", "b" : 1 }
{ "a" : "foo", "b" : 2 }
{ "a" : "foo", "b" : 3 }
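The flattening above can be imitated in a few lines of plain JavaScript. This is a sketch of the behavior, not the server's code; the function name `unwind` is chosen for illustration, and the non-array branch follows the 3.2 behavior (a non-array value is treated as a one-element array, where earlier versions raised an error).

```javascript
// Emit one copy of the document per array element, replacing the array by the element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined || value === null) continue; // missing or null: no output
    // 3.2 behavior: a non-array value is treated as a one-element array.
    const elements = Array.isArray(value) ? value : [value];
    for (const element of elements) out.push({ ...doc, [field]: element });
  }
  return out; // an empty array contributes no documents
}

const unwound = unwind([{ a: "foo", b: [1, 2, 3] }, { a: "bar", b: [] }, { a: "baz" }], "b");
console.log(unwound);
// [ { a: 'foo', b: 1 }, { a: 'foo', b: 2 }, { a: 'foo', b: 3 } ]
```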
29
$unwind
Input documents:
{
  state: "New York",
  census: [1990, 2000, 2010]
}
{
  state: "New Jersey",
  census: [1990, 2000]
}
{
  state: "California",
  census: [1980, 1990, 2000, 2010]
}
{
  state: "Delaware",
  census: [1990, 2000]
}
Stage:
{ $unwind: "$census" }
Output documents:
{ state: "New York", census: 1990 }
{ state: "New York", census: 2000 }
{ state: "New York", census: 2010 }
{ state: "New Jersey", census: 1990 }
{ state: "New Jersey", census: 2000 }
…
30
Southern State Population By Year
db.cData.aggregate(
[{$match : {"region" : "South"}},
{$unwind : "$data"},
{$group : {"_id" : "$data.year",
"totalPop" : {"$sum" :
"$data.totalPop"}}}])
{ "_id" : 2010, "totalPop" : 113954021 }
{ "_id" : 2000, "totalPop" : 99664761 }
{ "_id" : 1990, "totalPop" : 84839030 }
31
$match
• Filter documents
–Uses existing query syntax
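For intuition, here is a toy matcher in plain JavaScript that understands equality and a handful of comparison operators. It covers only a tiny slice of the real query language and is not how MongoDB evaluates queries; `matches` and `comparators` are illustrative names, and the documents come from the $match diagram in this deck.

```javascript
// The few comparison operators this sketch understands.
const comparators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

// True when the document satisfies every field condition in the query.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      return Object.entries(condition).every(([op, operand]) => {
        if (!(op in comparators)) throw new Error(`unsupported operator ${op}`);
        return comparators[op](value, operand);
      });
    }
    return value === condition; // plain equality, e.g. { region: "West" }
  });
}

const docs = [
  { state: "New York", areaM: 218, region: "North East" },
  { state: "Oregon", areaM: 245, region: "West" },
  { state: "California", areaM: 300, region: "West" },
];

const western = docs.filter((d) => matches(d, { region: "West" }));
const bigWestern = docs.filter((d) => matches(d, { region: "West", areaM: { $gte: 250 } }));
console.log(western.map((d) => d.state), bigWestern.map((d) => d.state));
// [ 'Oregon', 'California' ] [ 'California' ]
```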
32
$match
Input documents:
{
  state: "New York",
  areaM: 218,
  region: "North East"
}
{
  state: "Oregon",
  areaM: 245,
  region: "West"
}
{
  state: "California",
  areaM: 300,
  region: "West"
}
Stage:
{ $match:
  { "region" : "West" }
}
Output documents:
{
  state: "Oregon",
  areaM: 245,
  region: "West"
}
{
  state: "California",
  areaM: 300,
  region: "West"
}
33
Population Delta By State from 1990 to
2010
db.cData.aggregate([
{$unwind : "$data"},
{$sort : {"data.year" : 1}},
{$group :{"_id" : "$name",
"pop1990" : {"$first" : "$data.totalPop"},
"pop2010" : {"$last" : "$data.totalPop"}}},
{$project : {"_id" : 0, "name" : "$_id",
"delta" : {"$subtract" : ["$pop2010", "$pop1990"]},
"pop1990" : 1,
"pop2010" : 1}
}
])
34
{ "pop1990" : 3725789, "pop2010" : 3725789, "name" : "Puerto Rico", "delta" : 0 }
{ "pop1990" : 4866692, "pop2010" : 6724540, "name" : "Washington", "delta" : 1857848 }
{ "pop1990" : 4877185, "pop2010" : 6346105, "name" : "Tennessee", "delta" : 1468920 }
{ "pop1990" : 1227928, "pop2010" : 1328361, "name" : "Maine", "delta" : 100433 }
{ "pop1990" : 1006749, "pop2010" : 1567582, "name" : "Idaho", "delta" : 560833 }
{ "pop1990" : 1108229, "pop2010" : 1360301, "name" : "Hawaii", "delta" : 252072 }
{ "pop1990" : 3665228, "pop2010" : 6392017, "name" : "Arizona", "delta" : 2726789 }
{ "pop1990" : 638800, "pop2010" : 672591, "name" : "North Dakota", "delta" : 33791 }
{ "pop1990" : 6187358, "pop2010" : 8001024, "name" : "Virginia", "delta" : 1813666 }
{ "pop1990" : 550043, "pop2010" : 710231, "name" : "Alaska", "delta" : 160188 }
{ "pop1990" : 1109252, "pop2010" : 1316470, "name" : "New Hampshire", "delta" : 207218 }
{ "pop1990" : 10847115, "pop2010" : 11536504, "name" : "Ohio", "delta" : 689389 }
{ "pop1990" : 6016425, "pop2010" : 6547629, "name" : "Massachusetts", "delta" : 531204 }
{ "pop1990" : 6628637, "pop2010" : 9535483, "name" : "North Carolina", "delta" : 2906846 }
{ "pop1990" : 3287116, "pop2010" : 3574097, "name" : "Connecticut", "delta" : 286981 }
{ "pop1990" : 17990455, "pop2010" : 19378102, "name" : "New York", "delta" : 1387647 }
{ "pop1990" : 29760021, "pop2010" : 37253956, "name" : "California", "delta" : 7493935 }
{ "pop1990" : 16986510, "pop2010" : 25145561, "name" : "Texas", "delta" : 8159051 }
{ "pop1990" : 11881643, "pop2010" : 12702379, "name" : "Pennsylvania", "delta" : 820736 }
{ "pop1990" : 2842321, "pop2010" : 3831074, "name" : "Oregon", "delta" : 988753 }
35
$sort, $limit, $skip
• Sort documents by one or more
fields
– Same order syntax as cursors
– Waits for earlier pipeline operator to
return
– In-memory unless early and indexed
• Limit and skip follow cursor
behavior
36
$first, $last
• Accumulator operators, like $push and
$addToSet
• Must be used in $group
• $first and $last determined by document
order
• Typically used with $sort to ensure ordering is
known
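Why the $sort matters before $first/$last can be seen in a small plain-JavaScript sketch. The rows and names below are hypothetical; the point is only that "first" and "last" mean first and last in arrival order, so the order has to be fixed beforehand.

```javascript
// Hypothetical unwound documents, deliberately out of year order.
const rows = [
  { name: "X", year: 2010, pop: 30 },
  { name: "Y", year: 2000, pop: 8 },
  { name: "X", year: 1990, pop: 10 },
  { name: "Y", year: 1990, pop: 5 },
  { name: "X", year: 2000, pop: 20 },
  { name: "Y", year: 2010, pop: 9 },
];

// Like { $sort: { year: 1 } }: fix the order before grouping.
const sorted = [...rows].sort((a, b) => a.year - b.year);

// Like $first / $last inside $group: first and last value seen per group.
const byName = new Map();
for (const row of sorted) {
  if (!byName.has(row.name)) byName.set(row.name, { _id: row.name, first: row.pop });
  byName.get(row.name).last = row.pop;
}
const firstLast = [...byName.values()];
console.log(firstLast);
// [ { _id: 'X', first: 10, last: 30 }, { _id: 'Y', first: 5, last: 9 } ]
```

Without the sort, group "X" would report 30 as its first value, since that row happens to arrive first.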
37
$project
• Reshape/Transform Documents
– Include, exclude or rename fields
– Inject computed fields
– Create sub-document fields
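In plain JavaScript terms, a $project that renames and computes fields is just a per-document mapping. The sketch below is illustrative only; `projectDelta` is a made-up name, and the two input documents use the Virginia and South Dakota figures shown elsewhere in this deck.

```javascript
// Rename _id to name, compute delta, and drop everything else.
function projectDelta(doc) {
  return {
    name: doc._id,                    // like "name" : "$_id"
    delta: doc.pop2010 - doc.pop1990, // like { $subtract: ["$pop2010", "$pop1990"] }
  };
}

const grouped = [
  { _id: "Virginia", pop1990: 6187358, pop2010: 8001024 },
  { _id: "South Dakota", pop1990: 696004, pop2010: 814180 },
];

const projected = grouped.map(projectDelta);
console.log(projected);
// [ { name: 'Virginia', delta: 1813666 }, { name: 'South Dakota', delta: 118176 } ]
```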
38
Including and Excluding Fields
Input documents:
{
  "_id" : "Virginia",
  "pop1990" : 453588,
  "pop2010" : 3725789
}
{
  "_id" : "South Dakota",
  "pop1990" : 453588,
  "pop2010" : 3725789
}
Stage:
{ $project:
  { "_id" : 0,
    "pop1990" : 1,
    "pop2010" : 1
  }
}
Output documents:
{
  "pop1990" : 453588,
  "pop2010" : 3725789
}
{
  "pop1990" : 453588,
  "pop2010" : 3725789
}
39
Renaming and Computing Fields
Input documents:
{
  "_id" : "Virginia",
  "pop1990" : 6187358,
  "pop2010" : 8001024
}
{
  "_id" : "South Dakota",
  "pop1990" : 696004,
  "pop2010" : 814180
}
Stage (fields that are not listed, such as pop1990 and pop2010, are dropped):
{ $project:
  { "_id" : 0,
    "name" : "$_id",
    "delta" :
      {"$subtract" :
        ["$pop2010",
         "$pop1990"]}}
}
Output documents:
{
  "name" : "Virginia",
  "delta" : 1813666
}
{
  "name" : "South Dakota",
  "delta" : 118176
}
40
Compare number of people living within
500KM of Memphis, TN in 1990, 2000, 2010
41
Compare number of people living within
500KM of Memphis, TN in 1990, 2000, 2010
db.cData.aggregate([
{$geoNear : {
"near" : {"type" : "Point", "coordinates" : [90, 35]},
"distanceField" : "dist.calculated",
"maxDistance" : 500000,
"includeLocs" : "dist.location",
"spherical" : true }},
{$unwind : "$data"},
{$group : {"_id" : "$data.year",
"totalPop" : {"$sum" : "$data.totalPop"},
"states" : {"$addToSet" : "$name"}}},
{$sort : {"_id" : 1}}
])
42
{ "_id" : 1990, "totalPop" : 22644082, "states" : [ "Kentucky", "Missouri", "Alabama",
"Tennessee", "Mississippi", "Arkansas" ] }
{ "_id" : 2000, "totalPop" : 25291421, "states" : [ "Kentucky", "Missouri", "Alabama",
"Tennessee", "Mississippi", "Arkansas" ] }
{ "_id" : 2010, "totalPop" : 27337350, "states" : [ "Kentucky", "Missouri", "Alabama",
"Tennessee", "Mississippi", "Arkansas" ] }
43
$geoNear
• Order/Filter Documents by Location
– Requires a geospatial index
– Output includes physical distance
– Must be first aggregation stage
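What "order/filter by location" amounts to can be sketched with a great-circle distance in plain JavaScript. This uses the haversine formula and a mean Earth radius as simplifying assumptions, and it scans every document, whereas the real stage relies on the geospatial index. `haversineMeters` and `geoNear` are illustrative names; the coordinates are copied as they appear in this deck's diagrams.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius in meters, an approximation

// Great-circle distance between two [longitude, latitude] points, in meters.
function haversineMeters([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Attach the distance, drop documents beyond maxDistance, nearest first.
function geoNear(docs, near, maxDistance) {
  return docs
    .map((doc) => ({ ...doc, dist: haversineMeters(near, doc.center) }))
    .filter((doc) => doc.dist <= maxDistance)
    .sort((a, b) => a.dist - b.dist);
}

const centers = [
  { _id: "Tennessee", center: [86.6, 37.8] },
  { _id: "Virginia", center: [78.6, 37.5] },
];

const nearby = geoNear(centers, [90, 35], 500000);
console.log(nearby.map((d) => d._id)); // [ 'Tennessee' ]
```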
44
$geoNear
Input documents:
{
  "_id" : "Tennessee",
  "pop1990" : 4877185,
  "pop2010" : 6346105,
  "center" :
    {"type" : "Point",
     "coordinates" : [86.6, 37.8]}
}
{
  "_id" : "Virginia",
  "pop1990" : 6187358,
  "pop2010" : 8001024,
  "center" :
    {"type" : "Point",
     "coordinates" : [78.6, 37.5]}
}
Stage:
{$geoNear : {
  "near" : {"type" : "Point",
            "coordinates" : [90, 35]},
  maxDistance : 500000,
  spherical : true }}
Output documents:
{
  "_id" : "Tennessee",
  "pop1990" : 4877185,
  "pop2010" : 6346105,
  "center" :
    {"type" : "Point",
     "coordinates" : [86.6, 37.8]}
}
45
What if I want to save the results to a
collection?
db.cData.aggregate([
{$geoNear : {
"near" : {"type" : "Point", "coordinates" : [90, 35]},
"distanceField" : "dist.calculated",
"maxDistance" : 500000,
"includeLocs" : "dist.location",
"spherical" : true }},
{$unwind : "$data"},
{$group : {"_id" : "$data.year",
"totalPop" : {"$sum" : "$data.totalPop"},
"states" : {"$addToSet" : "$name"}}},
{$sort : {"_id" : 1}},
{$out : "peopleNearMemphis"}
])
46
$out
db.cData.aggregate([ <pipeline stages>,
{"$out" : "resultsCollection"}])
• Save aggregation results to a new collection
• NOTE: Overwrites any data existing in collection
• Transform documents - ETL
47
Back To The Original Question
• Which US Division has the fastest growing population density?
– We only want to include data from states with more than 1M people
– We only want to include divisions larger than 100K square miles
48
Division with Fastest Growing Pop Density
db.cData.aggregate(
[{$match : {"data.totalPop" : {"$gt" : 1000000}}},
{$unwind : "$data"},
{$sort : {"data.year" : 1}},
{$group : {"_id" : "$name",
"pop1990" : {"$first" : "$data.totalPop"},
"pop2010" : {"$last" : "$data.totalPop"},
"areaM" : {"$first" : "$areaM"},
"division" : {"$first" : "$division"}}},
{$group : {"_id" : "$division",
"totalPop1990" : {"$sum" : "$pop1990"},
"totalPop2010" : {"$sum" : "$pop2010"},
"totalAreaM" : {"$sum" : "$areaM"}}},
{$match : {"totalAreaM" : {"$gt" : 100000}}},
{$project : {"_id" : 0,
"division" : "$_id",
"density1990" : {"$divide" : ["$totalPop1990", "$totalAreaM"]},
"density2010" : {"$divide" : ["$totalPop2010", "$totalAreaM"]},
"denDelta" : {"$subtract" : [{"$divide" : ["$totalPop2010", "$totalAreaM"]},{"$divide" : ["$totalPop1990","$totalAreaM"]}]},
"totalAreaM" : 1,
"totalPop1990" : 1,
"totalPop2010" : 1}},
{$sort : {"denDelta" : -1}}])
49
{ "totalPop1990" : 42293785, "totalPop2010" : 58277380, "totalAreaM" : 290433.39999999997,
"division" : "South Atlantic", "density1990" : 145.62300685802668, "density2010" :
200.6566049221612, "denDelta" : 55.03359806413451 }
{ "totalPop1990" : 38577263, "totalPop2010" : 49169871, "totalAreaM" : 344302.94999999995,
"division" : "Pacific", "density1990" : 112.0445322934352, "density2010" : 142.80990331334658,
"denDelta" : 30.765371019911385 }
{ "totalPop1990" : 37602286, "totalPop2010" : 40872375, "totalAreaM" : 109331.91, "division" :
"Mid-Atlantic", "density1990" : 343.9278249140621, "density2010" : 373.8375648975674,
"denDelta" : 29.90973998350529 }
{ "totalPop1990" : 26702793, "totalPop2010" : 36346202, "totalAreaM" : 444052.01, "division" :
"West South Central", "density1990" : 60.134381555890265, "density2010" : 81.85122729204626,
"denDelta" : 21.716845736155996 }
{ "totalPop1990" : 15176284, "totalPop2010" : 18432505, "totalAreaM" : 183403.9, "division" :
"East South Central", "density1990" : 82.74788049763391, "density2010" : 100.50225213313348,
"denDelta" : 17.754371635499567 }
{ "totalPop1990" : 42008942, "totalPop2010" : 46421564, "totalAreaM" : 301368.57, "division" :
"East North Central", "density1990" : 139.39390560867048, "density2010" : 154.03585052017866,
"denDelta" : 14.641944911508176 }
{ "totalPop1990" : 12406123, "totalPop2010" : 20512410, "totalAreaM" : 618711.92, "division" :
"Mountain", "density1990" : 20.051533838236054, "density2010" : 33.153410071685705, "denDelta"
: 13.101876233449651 }
{ "totalPop1990" : 16324886, "totalPop2010" : 19018666, "totalAreaM" : 372541.8, "division" :
"West North Central", "density1990" : 43.820280033005695, "density2010" : 51.05109279012449,
"denDelta" : 7.230812757118798 }
Aggregate Options
51
Aggregate options
db.cData.aggregate([<pipeline stages>],
{'explain' : false,
'allowDiskUse' : true,
'cursor' : {'batchSize' : 5}})
explain – similar to find().explain()
allowDiskUse – enable use of disk to store intermediate
results
cursor – specify the batch size of the initial result set
New things in 3.2
53
$sample
{ $sample: { size: <positive integer> } }
● With WiredTiger: uses a pseudo-random cursor to return
docs
● With MMAPv1: uses the _id index to randomly
select docs
Used by Compass, Useful for unit tests, etc
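To show what "pick size random documents" means, here is a generic sketch in plain JavaScript using a partial Fisher-Yates shuffle. This is a textbook sampling technique swapped in for illustration, not the storage-engine mechanism described above; `sample` and its `random` parameter are hypothetical names.

```javascript
// Return `size` distinct documents chosen uniformly at random (partial Fisher-Yates).
function sample(docs, size, random = Math.random) {
  const pool = [...docs]; // copy so the input stays untouched
  const n = Math.min(size, pool.length);
  for (let i = 0; i < n; i++) {
    // Pick a random position in the not-yet-chosen tail and swap it forward.
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

const ids = Array.from({ length: 10 }, (_, i) => ({ _id: i }));
const picked = sample(ids, 3);
console.log(picked.length); // 3
```

Passing a seeded `random` function makes the selection repeatable, which is handy in unit tests.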
54
$lookup
• Performs a left outer join to another collection in the same database to filter in
documents from the “joined” collection for processing.
• To each input document, the $lookup stage adds a new array field whose
elements are the matching documents from the “joined” collection.
{
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}
The "from" collection CANNOT BE SHARDED
https://docs.mongodb.org/master/reference/operator/aggregation/lookup/
55
• Sample data:
> db.data.find()
{ "_id" : ObjectId("565e759ae6f9919371a53896"), "v" : 14, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a53897"), "v" : 664, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a53898"), "v" : 701, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a53899"), "v" : 312, "k" : 1 }
{ "_id" : ObjectId("565e759ae6f9919371a5389a"), "v" : 10, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389b"), "v" : 686, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a5389c"), "v" : 669, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389d"), "v" : 273, "k" : 2 }
{ "_id" : ObjectId("565e759ae6f9919371a5389e"), "v" : 473, "k" : 0 }
{ "_id" : ObjectId("565e759ae6f9919371a5389f"), "v" : 158, "k" : 2 }
> db.keys.find()
{ "_id" : 0, "name" : "East Meter" }
{ "_id" : 1, "name" : "Central Meter 12" }
{ "_id" : 2, "name" : "New HIFI Monitor" }
56
• Find the average "v" value per key, looking up the name for each "k"
db.data.aggregate( [
{ "$lookup" : {
"from" : "keys",
"localField" : "k",
"foreignField" : "_id",
"as" : "name" }
},
{ "$unwind" : "$name" },
{ "$project" : {
"k" : "$k",
"name" : "$name.name",
"v" : "$v" }
},
{ "$group" : {
"_id" : "$name",
"aveValue" : { "$avg" : "$v" }
}
},
{ "$project" : {
"_id" : 0,
"name" : "$_id",
"aveValue" : "$aveValue" }
}
]);
{ "aveValue" : 277.5, "name" : "New HIFI Monitor"}
{ "aveValue" : 559, "name" : "Central Meter 12"}
{ "aveValue" : 391, "name" : "East Meter"}
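The same join-then-average logic can be checked by hand in plain JavaScript, using the sample data shown earlier (ObjectIds omitted). This is a sketch of what the stages compute, not of how the server executes them; variable names such as `joined` and `averages` are illustrative.

```javascript
const data = [
  { v: 14, k: 0 }, { v: 664, k: 1 }, { v: 701, k: 1 }, { v: 312, k: 1 },
  { v: 10, k: 2 }, { v: 686, k: 0 }, { v: 669, k: 2 }, { v: 273, k: 2 },
  { v: 473, k: 0 }, { v: 158, k: 2 },
];
const keys = [
  { _id: 0, name: "East Meter" },
  { _id: 1, name: "Central Meter 12" },
  { _id: 2, name: "New HIFI Monitor" },
];

// Like $lookup: attach the matching "keys" documents as an array field.
const joined = data.map((d) => ({ ...d, name: keys.filter((key) => key._id === d.k) }));

// Like $unwind + $group with $avg: accumulate sum and count per looked-up name.
const totals = new Map();
for (const doc of joined) {
  for (const key of doc.name) {
    const t = totals.get(key.name) ?? { sum: 0, count: 0 };
    t.sum += doc.v;
    t.count += 1;
    totals.set(key.name, t);
  }
}
const averages = [...totals].map(([name, t]) => ({ name, aveValue: t.sum / t.count }));
console.log(averages);
// East Meter: 391, Central Meter 12: 559, New HIFI Monitor: 277.5
```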
57
friends of friends
Use $lookup to perform "self-joins" for graph problems.
Simple case: find the friends of someone's friends
Can extend this to find cliques, paths, etc.
Dataset:
{ "_id" : 1, "name" : "FLOYD", "friends" : [ "BILLIE",
"MARGENE", "HERMINIA", "LACRESHA", "SHAUN", "INOCENCIA",
"DEANA", "MARAGRET", "MICHELE", "KARLENE", "KASSANDRA",
"JOAN", "HIRAM" ] }
{ "_id" : 2, "name" : "ELIDA", "friends" : [ "ALI",
"KESHIA" ] }
...
58
59
don't forget your indexes…
Running FOF.friendsOfFriends(1)
2016-01-26T10:19:41.201-0500 I COMMAND [conn6] command friendship.friends command:
aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind:
"$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField:
"name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind:
"$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, {
$project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:42505581740
keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount:
{ r: 1124 } }, Database: { acquireCount: { r: 562 } }, Collection: { acquireCount: {
r: 562 } } } protocol:op_command 48ms
with indexes { "friends" : 1 } & { "name" : 1 }:
2016-01-26T10:17:45.167-0500 I COMMAND [conn6] command friendship.friends command:
aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind:
"$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField:
"name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind:
"$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, {
$project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:39053867824
keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount:
{ r: 32 } }, Database: { acquireCount: { r: 16 } }, Collection: { acquireCount: { r:
16 } } } protocol:op_command 2ms
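The pipeline in the log lines above ($match, $unwind, $lookup on name, two more $unwinds, $group) boils down to the following plain-JavaScript sketch. The five-person dataset is hypothetical, as is the `friendsOfFriends` function name; the `Map` keyed on name stands in for the { "name" : 1 } index.

```javascript
// Hypothetical dataset in the same shape as the friends collection.
const people = [
  { _id: 1, name: "ANN", friends: ["BOB", "CAT"] },
  { _id: 2, name: "BOB", friends: ["ANN", "DAN"] },
  { _id: 3, name: "CAT", friends: ["DAN", "EVE"] },
  { _id: 4, name: "DAN", friends: [] },
  { _id: 5, name: "EVE", friends: ["CAT"] },
];

function friendsOfFriends(collection, id) {
  // Plays the role of an index on name: direct lookup instead of a scan per friend.
  const byName = new Map(collection.map((p) => [p.name, p]));
  const start = collection.find((p) => p._id === id); // like $match on _id
  const result = new Set(); // like $group on the friend-of-friend name
  for (const friendName of start?.friends ?? []) {   // like $unwind "$friends"
    const friend = byName.get(friendName);           // like the $lookup self-join
    for (const fof of friend?.friends ?? []) result.add(fof);
  }
  return [...result];
}

console.log(friendsOfFriends(people, 1)); // [ 'ANN', 'DAN', 'EVE' ]
```

Like the logged pipeline, this sketch does not exclude the starting person from the result.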
60
lots of new mathematical operators
$stdDevSamp Calculates the sample standard deviation. { $stdDevSamp: <array> }
$stdDevPop Calculates the population standard deviation. { $stdDevPop: <array> }
$sqrt Calculates the square root. { $sqrt: <number> }
$abs Returns the absolute value of a number. { $abs: <number> }
$log Calculates the log of a number in the specified base. { $log: [ <number>, <base> ] }
$log10 Calculates the log base 10 of a number. { $log10: <number> }
$ln Calculates the natural log of a number. { $ln: <number> }
$pow Raises a number to the specified exponent. { $pow: [ <number>, <exponent> ] }
$exp Raises e to the specified exponent. { $exp: <number> }
$trunc Truncates a number to its integer. { $trunc: <number> }
$ceil Returns the smallest integer greater than or equal to the specified number.{$ceil:<number>}
$floor Returns the largest integer less than or equal to the specified number. {$floor: <number>}
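The difference between the two standard deviation operators is just the divisor. A minimal plain-JavaScript sketch, with illustrative function names and a made-up list of values:

```javascript
// Sum of squared deviations from the mean.
function sumOfSquares(values) {
  const mean = values.reduce((s, v) => s + v, 0) / values.length;
  return values.reduce((s, v) => s + (v - mean) ** 2, 0);
}

// Population standard deviation: divide by N.
const stdDevPop = (values) => Math.sqrt(sumOfSquares(values) / values.length);

// Sample standard deviation: divide by N - 1.
const stdDevSamp = (values) => Math.sqrt(sumOfSquares(values) / (values.length - 1));

const values = [2, 4, 4, 4, 5, 5, 7, 9]; // mean 5, sum of squares 32
console.log(stdDevPop(values));             // 2
console.log(stdDevSamp(values).toFixed(3)); // 2.138
```

Use the population form when the values are the whole data set, and the sample form when they are a sample of a larger one.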
61
new array operators
$slice Returns a subset of an array.
{ $slice: [ <array>, <n> ] } or { $slice: [ <array>, <position>, <n> ] }
$arrayElemAt Returns the element at the specified array index. { $arrayElemAt: [ <array>, <idx> ] }
$concatArrays Concatenates arrays. { $concatArrays: [ <array1>, <array2>, ... ]}
$isArray Determines if the operand is an array. { $isArray: [ <expression> ] }
$filter Selects a subset of the array based on the condition.
{
$filter:
{
input: <array>,
as: <string>,
cond: <expression>
}
}
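For readers who think in JavaScript, these array operators map closely onto built-in array methods. The helpers below are a loose analogy with made-up names and a made-up array; they cover only the two-argument $slice form.

```javascript
const years = [1990, 2000, 2010];

// Like $arrayElemAt: a negative index counts from the end of the array.
const arrayElemAt = (arr, idx) => (idx < 0 ? arr[arr.length + idx] : arr[idx]);

// Like { $slice: [ <array>, <n> ] }: first n elements, or last |n| when n is negative.
const sliceN = (arr, n) => (n >= 0 ? arr.slice(0, n) : arr.slice(n));

console.log(arrayElemAt(years, 0));         // 1990
console.log(arrayElemAt(years, -1));        // 2010
console.log(sliceN(years, 2));              // [ 1990, 2000 ]
console.log(sliceN(years, -2));             // [ 2000, 2010 ]
console.log(years.concat([2020]));          // like $concatArrays: [ 1990, 2000, 2010, 2020 ]
console.log(Array.isArray(years));          // like $isArray: true
console.log(years.filter((y) => y > 1995)); // like $filter with a cond: [ 2000, 2010 ]
```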
Summary
63
Analytics in MongoDB?
Create
Read
Update
Delete
Analytics
?
Group
Count
Derive Values
Filter
Average
Sort
YES!
64
Framework Use Cases
• Basic aggregation queries
• Ad-hoc reporting
• Real-time analytics
• Visualizing and reshaping data
Questions?
Thanks for attending & happy aggregating
Please complete survey
jason.mimick@mongodb.com
@jmimick

More Related Content

What's hot

Mongodb Aggregation Pipeline
Mongodb Aggregation PipelineMongodb Aggregation Pipeline
Mongodb Aggregation Pipelinezahid-mian
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichNorberto Leite
 
Aggregation Framework
Aggregation FrameworkAggregation Framework
Aggregation FrameworkMongoDB
 
Aggregation in MongoDB
Aggregation in MongoDBAggregation in MongoDB
Aggregation in MongoDBKishor Parkhe
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineJason Terpko
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationJoe Drumgoole
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation Amit Ghosh
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorHenrik Ingo
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB MongoDB
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB
 
Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Anuj Jain
 
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB
 
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2MongoDB
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB
 
MongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Debugging MongoDB PerformanceMongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Debugging MongoDB PerformanceMongoDB
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBantoinegirbal
 
2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDB2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDBantoinegirbal
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBNosh Petigara
 

What's hot (20)

Mongodb Aggregation Pipeline
Mongodb Aggregation PipelineMongodb Aggregation Pipeline
Mongodb Aggregation Pipeline
 
Aggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days MunichAggregation Framework MongoDB Days Munich
Aggregation Framework MongoDB Days Munich
 
Aggregation Framework
Aggregation FrameworkAggregation Framework
Aggregation Framework
 
Aggregation in MongoDB
Aggregation in MongoDBAggregation in MongoDB
Aggregation in MongoDB
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
MongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced AggregationMongoDB World 2016 : Advanced Aggregation
MongoDB World 2016 : Advanced Aggregation
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
 
Analytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop ConnectorAnalytics with MongoDB Aggregation Framework and Hadoop Connector
Analytics with MongoDB Aggregation Framework and Hadoop Connector
 
Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB Data Processing and Aggregation with MongoDB
Data Processing and Aggregation with MongoDB
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
 
Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1
 
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
 
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
 
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
MongoDB for Time Series Data Part 2: Analyzing Time Series Data Using the Agg...
 
MongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Debugging MongoDB PerformanceMongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Debugging MongoDB Performance
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 
Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework Beyond the Basics 2: Aggregation Framework
Beyond the Basics 2: Aggregation Framework
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDB2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 

Viewers also liked

MongoDB and Spring - Two leaves of a same tree
MongoDB and Spring - Two leaves of a same treeMongoDB and Spring - Two leaves of a same tree
MongoDB and Spring - Two leaves of a same treeMongoDB
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Webinar: Exploring the Aggregation Framework

  • 1. Exploring the Aggregation Framework Jason Mimick - Senior Consulting Engineer jason.mimick@mongodb.com @jmimick Original Slide Credits: Jay Runkel jay.runkel@mongodb.com et al
  • 2. 2 Warning or Whew This is a “101” beginner talk! Assuming you know some basics about MongoDB But basically nothing about the Aggregation Framework
  • 3. 3 Agenda 1. Analytics in MongoDB? 2. Aggregation Framework 3. Aggregation Framework in Action – US Census Data – Aggregation Framework Options 4. New 3.2 stuff – Friends of friends $lookup for self-joins
  • 5. 5 For Example: US Census Data • Census data from 1990, 2000, 2010 • Question: Which US Division has the fastest growing population density? – We only want to include data from states with more than 1M people – We only want to include divisions larger than 100K square miles Division = a group of US States Population density = # of people/Area of division Data is provided at the state level
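To make the question concrete, here is a small plain JavaScript sketch of the arithmetic, taking density as people per square mile and growth as the relative change between two census years. The division, its area, and its population figures are hypothetical and are not taken from the census data set; the function names are assumptions for illustration only.

```javascript
// Hypothetical figures for one division; not taken from the census data set.
const division = {
  name: "Example Division",
  areaSqMiles: 200000,
  population: { 1990: 10000000, 2010: 12000000 },
};

// Population density = number of people / area.
function density(population, areaSqMiles) {
  return population / areaSqMiles;
}

// Relative growth in density between two census years.
function densityGrowth(div, fromYear, toYear) {
  const before = density(div.population[fromYear], div.areaSqMiles);
  const after = density(div.population[toYear], div.areaSqMiles);
  return (after - before) / before;
}

const growth = densityGrowth(division, 1990, 2010);
console.log(growth.toFixed(2)); // 0.20
```

Because the area of a division does not change between censuses, density growth here equals population growth; the area still matters for the "larger than 100K square miles" filter.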
  • 6. 6 US Regions and Divisions
  • 7. 7 How would we solve this in SQL? SELECT GROUP BY HAVING Of course, we don’t have SQL we’re a noSQL database
  • 9. 9 Core Concept: Pipeline ps -ef | grep mongod
  • 10. 10 What is the Aggregation Pipeline? A Series of Document Transformations – Executed in stages – Original input is a collection – Output as a cursor or a collection Rich Library of Functions – Filter, compute, group, and summarize data – Output of one stage sent to input of next – Operations executed in sequential order
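The pipeline idea can be sketched in plain JavaScript without a database. This is only an analogy to show "output of one stage sent to input of next"; it is not how MongoDB implements aggregation, and the helper names (`match`, `project`, `sortBy`, `runPipeline`) and the sample documents are hypothetical.

```javascript
// Each stage takes an array of documents and returns a new array.
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (shape) => (docs) => docs.map(shape);
const sortBy = (compare) => (docs) => [...docs].sort(compare);

// Run the stages in order, feeding each stage's output to the next.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical input documents.
const states = [
  { name: "A", population: 500000 },
  { name: "B", population: 3000000 },
  { name: "C", population: 1500000 },
];

const result = runPipeline(states, [
  match((d) => d.population > 1000000),
  sortBy((a, b) => b.population - a.population),
  project((d) => ({ name: d.name })),
]);

console.log(result); // [ { name: 'B' }, { name: 'C' } ]
```

As with a Unix pipe, the order of the stages matters: filtering first means later stages see fewer documents.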
  • 12. 12 Syntax >db.foo.aggregate( [ { stage1 },{ stage2 },{ stage3 }, … ]) mongo shell 1 db - variable pointing to current database 2 collection name 3 aggregate - method on collection 4 array of objects, each a pipeline operator 5 pipeline operators 1 2 3 4 ...5...
  • 13. 13 Syntax - Driver - Java db.hospital.aggregate( [ { "$group" : { "_id" : "$PatientID", "count" : { "$sum" : 1 } } }, { "$match" : { "count" : { "$gte" : 5 } } }, { "$sort" : { "count" : -1 } } ] )
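What this three-stage pipeline computes can be traced by hand in plain JavaScript. The sketch below mimics the stages on an in-memory array; the visit records are hypothetical, and the code illustrates the logic only, not the driver API. Note that the filter runs after the grouping, so it tests the computed count, much like HAVING in SQL.

```javascript
// Hypothetical visit records; only PatientID matters here.
const visits = [
  ...Array(6).fill({ PatientID: "p1" }),
  ...Array(2).fill({ PatientID: "p2" }),
  ...Array(5).fill({ PatientID: "p3" }),
];

// $group: count documents per PatientID.
const counts = new Map();
for (const visit of visits) {
  counts.set(visit.PatientID, (counts.get(visit.PatientID) || 0) + 1);
}
const grouped = [...counts].map(([_id, count]) => ({ _id, count }));

// $match: keep groups with at least 5 documents.
const matched = grouped.filter((g) => g.count >= 5);

// $sort: highest count first.
const frequentPatients = [...matched].sort((a, b) => b.count - a.count);

console.log(frequentPatients);
// [ { _id: 'p1', count: 6 }, { _id: 'p3', count: 5 } ]
```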
  • 14. 14 Some Popular Pipeline Operators $match Filter documents $project Reshape documents $group Summarize documents $unwind Expand arrays in documents $sort Order documents $limit/$skip Paginate documents $redact Restrict documents $geoNear Proximity sort documents $let,$map Define variables
  • 15. 15 80+ operators available as of MongoDB 3.2
  • 16. Aggregation Framework in Action (let’s play with the census data)
  • 17. 17 cData Collection • Document For Each State – Name – Region – Division • Census Data For 1990, 2000, 2010 – Population – Housing Units – Occupied Housing Units • Census Data is an array with three subdocuments
  • 18. 18 Count, Distinct • Check out cData docs • count() • distinct() When you start building your aggregations you need to ‘get to know’ your data!
  • 19. 19 Simple $group Census data has a collection called regions > db.regions.findOne() { "_id" : ObjectId("54d0e1ac28099359f5660f9f"), "state" : "Connecticut", "region" : "Northeast", "regNum" : 1, "division" : "New England", "divNum" : 1 } How can we find out how many states are in each region?
  • 20. 20 > db.regions.aggregate( [ { "$group" : { "_id" : "$region", "count" : { "$sum" : 1 } } } ] ) { "_id" : "West", "count" : 13 } { "_id" : "South", "count" : 17 } { "_id" : "Midwest", "count" : 12 } { "_id" : "Northeast", "count" : 9 } // make more readable - store your pipeline ops in variables >var group = { "$group" : { "_id" : "$region", "count" : { "$sum" : 1 } } }; db.regions.aggregate( [ group ] )
  • 21. 21 $group • Group documents by value – _id - field reference, object, constant – Other output fields are computed • $max, $min, $avg, $sum • $addToSet, $push • $first, $last – Processes all data in memory by default
  • 22. 22 Total US Area Back to cData… Can we use $group to find the total area of the US (according to these data)?
  • 23. 23 db.cData.aggregate([ {"$group" : {"_id" : null, "totalArea" : {$sum : "$areaM"}, "avgArea" : {$avg : "$areaM"} } }]) { "_id" : null, "totalArea" : 3802067.0700000003, "avgArea" : 73116.67442307693 }
  • 24. 24 Area By Region db.cData.aggregate([ {"$group" : {"_id" : "$region", "totalArea" : {$sum : "$areaM"}, "avgArea" : {$avg : "$areaM"}, "numStates" : {$sum : 1}, "states" : {$push : "$name"}}} ]) { "_id" : null, "totalArea" : 5393.18, "avgArea" : 2696.59, "numStates" : 2, "states" : [ "District of Columbia", "Puerto Rico" ] } { "_id" : "Northeast", "totalArea" : 181319.86, "avgArea" : 20146.65111111111, "numStates" : 9, "states" : [ "New Jersey", "Vermont", "Maine", "New Hampshire", "Rhode Island", "Pennsylvania", "Connecticut", "Massachusetts", "New York" ] } { "_id" : "Midwest", "totalArea" : 821724.3700000001, "avgArea" : 68477.03083333334, "numStates" : 12, "states" : [ "Iowa", "Missouri", "Ohio", "Indiana", "North Dakota", "Wisconsin", "Illinois", "Minnesota", "Kansas", "South Dakota", "Michigan", "Nebraska" ] } { "_id" : "West", "totalArea" : 1873251.6300000001, "avgArea" : 144096.27923076923, "numStates" : 13, "states" : [ "Colorado", "Wyoming", "California", "Utah", "Nevada", "Alaska", "Hawaii", "Montana", "New Mexico", "Arizona", "Idaho", "Oregon", "Washington" ] } { "_id" : "South", "totalArea" : 920378.03, "avgArea" : 57523.626875, "numStates" : 16, "states" : [ "Alabama", "Georgia", "Maryland", "South Carolina", "Florida", "Mississippi", "Arkansas", "Louisiana", "North Carolina", "Texas", "West Virginia", "Oklahoma", "Virginia", "Delaware", "Kentucky", "Tennessee" ] }
  • 25. 25 Calculating Average State Area By Region { $group: { _id: "$region", avgAreaM: { $avg: "$areaM" } } } Input: { state: "New York", areaM: 218, region: "North East" } { state: "New Jersey", areaM: 90, region: "North East" } { state: "California", areaM: 300, region: "West" } Output: { _id: "North East", avgAreaM: 154 } { _id: "West", avgAreaM: 300 }
  • 26. 26 Calculating Total Area and State Count { $group: { _id: "$region", totArea: { $sum: "$areaM" }, sCount: { $sum: 1 } } } Input: { state: "New York", areaM: 218, region: "North East" } { state: "New Jersey", areaM: 90, region: "North East" } { state: "California", areaM: 300, region: "West" } Output: { _id: "North East", totArea: 308, sCount: 2 } { _id: "West", totArea: 300, sCount: 1 }
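What `$group` does with `$sum` and `$avg` can be mimicked in a few lines of plain JavaScript. The sketch below is an emulation for intuition only, not the server's implementation; `groupByRegion` is a hypothetical name, and the three input documents are the toy values from the two slides above.

```javascript
// Plain-JavaScript emulation (illustrative only) of:
//   { $group: { _id: "$region",
//               totArea:  { $sum: "$areaM" },
//               sCount:   { $sum: 1 },
//               avgAreaM: { $avg: "$areaM" } } }
function groupByRegion(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.region)) {
      groups.set(doc.region, { _id: doc.region, totArea: 0, sCount: 0 });
    }
    const group = groups.get(doc.region);
    group.totArea += doc.areaM; // $sum: "$areaM" adds the field value
    group.sCount += 1;          // $sum: 1 counts documents
  }
  // $avg is the accumulated total divided by the document count.
  return [...groups.values()].map((g) => ({
    ...g,
    avgAreaM: g.totArea / g.sCount,
  }));
}

const toyStates = [
  { state: "New York", areaM: 218, region: "North East" },
  { state: "New Jersey", areaM: 90, region: "North East" },
  { state: "California", areaM: 300, region: "West" },
];

const grouped = groupByRegion(toyStates);
console.log(grouped);
```

The grouping key becomes `_id` in each output document, which is why later stages refer to `$_id` when they want the region name back.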
  • 27. 27 Total US Population By Year db.cData.aggregate( [{$unwind : "$data"}, {$group : {"_id" : "$data.year", "totalPop" : {$sum : "$data.totalPop"}}}, {$sort : {"totalPop" : 1}} ]) { "_id" : 1990, "totalPop" : 248709873 } { "_id" : 2000, "totalPop" : 281421906 } { "_id" : 2010, "totalPop" : 312471327 }
  • 28. 28 $unwind • Flattens arrays • Create documents from array elements • Array replaced by element value • Missing/empty fields → no output • Non-array fields → error before 3.2; treated as a single-element array from 3.2 • Pipe to $group to aggregate { "a" : "foo", "b" : [1, 2, 3] } { "a" : "foo", "b" : 1 } { "a" : "foo", "b" : 2 } { "a" : "foo", "b" : 3 }
  • 29. 29 $unwind { $unwind: "$census" } Input: { state: "New York", census: [1990, 2000, 2010] } { state: "New Jersey", census: [1990, 2000] } { state: "California", census: [1980, 1990, 2000, 2010] } { state: "Delaware", census: [1990, 2000] } Output: { state: "New York", census: 1990 } { state: "New York", census: 2000 } { state: "New York", census: 2010 } { state: "New Jersey", census: 1990 } { state: "New Jersey", census: 2000 } …
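The flattening that `$unwind` performs can be sketched as a small JavaScript function. This is an emulation for intuition, with `unwind` as a hypothetical helper name. It assumes the 3.2 behavior for non-array values (treated as a single-element array); older servers raise an error in that case.

```javascript
// Illustrative emulation of { $unwind: "$<field>" } on an array of documents.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Missing or null field: the document produces no output.
    if (value === undefined || value === null) continue;
    // Assumption: non-array values behave like a one-element array (3.2+).
    const elements = Array.isArray(value) ? value : [value];
    // An empty array yields no iterations, hence no output either.
    for (const element of elements) {
      out.push({ ...doc, [field]: element }); // array replaced by one element
    }
  }
  return out;
}

const unwound = unwind(
  [
    { state: "New York", census: [1990, 2000, 2010] },
    { state: "New Jersey", census: [1990, 2000] },
  ],
  "census"
);
console.log(unwound.length); // 3 documents for New York plus 2 for New Jersey
```

Every other field is copied unchanged into each output document, which is what makes a following `$group` on the unwound value possible.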
  • 30. 30 Southern State Population By Year db.cData.aggregate( [{$match : {"region" : "South"}}, {$unwind : "$data"}, {$group : {"_id" : "$data.year", "totalPop" : {"$sum" : "$data.totalPop"}}}]) { "_id" : 2010, "totalPop" : 113954021 } { "_id" : 2000, "totalPop" : 99664761 } { "_id" : 1990, "totalPop" : 84839030 }
  • 31. 31 $match • Filter documents –Uses existing query syntax
  • 32. 32 $match { $match: { "region" : "West" } } Input: { state: "New York", areaM: 218, region: "North East" } { state: "Oregon", areaM: 245, region: "West" } { state: "California", areaM: 300, region: "West" } Output: { state: "Oregon", areaM: 245, region: "West" } { state: "California", areaM: 300, region: "West" }
  • 33. 33 Population Delta By State from 1990 to 2010 db.cData.aggregate([ {$unwind : "$data"}, {$sort : {"data.year" : 1}}, {$group :{"_id" : "$name", "pop1990" : {"$first" : "$data.totalPop"}, "pop2010" : {"$last" : "$data.totalPop"}}}, {$project : {"_id" : 0, "name" : "$_id", "delta" : {"$subtract" : ["$pop2010", "$pop1990"]}, "pop1990" : 1, "pop2010" : 1} } ])
  • 34. 34 { "pop1990" : 3725789, "pop2010" : 3725789, "name" : "Puerto Rico", "delta" : 0 } { "pop1990" : 4866692, "pop2010" : 6724540, "name" : "Washington", "delta" : 1857848 } { "pop1990" : 4877185, "pop2010" : 6346105, "name" : "Tennessee", "delta" : 1468920 } { "pop1990" : 1227928, "pop2010" : 1328361, "name" : "Maine", "delta" : 100433 } { "pop1990" : 1006749, "pop2010" : 1567582, "name" : "Idaho", "delta" : 560833 } { "pop1990" : 1108229, "pop2010" : 1360301, "name" : "Hawaii", "delta" : 252072 } { "pop1990" : 3665228, "pop2010" : 6392017, "name" : "Arizona", "delta" : 2726789 } { "pop1990" : 638800, "pop2010" : 672591, "name" : "North Dakota", "delta" : 33791 } { "pop1990" : 6187358, "pop2010" : 8001024, "name" : "Virginia", "delta" : 1813666 } { "pop1990" : 550043, "pop2010" : 710231, "name" : "Alaska", "delta" : 160188 } { "pop1990" : 1109252, "pop2010" : 1316470, "name" : "New Hampshire", "delta" : 207218 } { "pop1990" : 10847115, "pop2010" : 11536504, "name" : "Ohio", "delta" : 689389 } { "pop1990" : 6016425, "pop2010" : 6547629, "name" : "Massachusetts", "delta" : 531204 } { "pop1990" : 6628637, "pop2010" : 9535483, "name" : "North Carolina", "delta" : 2906846 } { "pop1990" : 3287116, "pop2010" : 3574097, "name" : "Connecticut", "delta" : 286981 } { "pop1990" : 17990455, "pop2010" : 19378102, "name" : "New York", "delta" : 1387647 } { "pop1990" : 29760021, "pop2010" : 37253956, "name" : "California", "delta" : 7493935 } { "pop1990" : 16986510, "pop2010" : 25145561, "name" : "Texas", "delta" : 8159051 } { "pop1990" : 11881643, "pop2010" : 12702379, "name" : "Pennsylvania", "delta" : 820736 } { "pop1990" : 2842321, "pop2010" : 3831074, "name" : "Oregon", "delta" : 988753 }
  • 35. 35 $sort, $limit, $skip • Sort documents by one or more fields – Same order syntax as cursors – Waits for earlier pipeline operator to return – In-memory unless early and indexed • Limit and skip follow cursor behavior
  • 36. 36 $first, $last • Collection operations like $push and $addToSet • Must be used in $group • $first and $last determined by document order • Typically used with $sort to ensure ordering is known
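Why `$first` and `$last` need a preceding `$sort` can be seen in a plain-JavaScript walk-through of the population-delta pipeline. This is a sketch, not the server's algorithm. The two states and their 1990 and 2010 populations are taken from the results slide; the document shape (`name`, `data[].year`, `data[].totalPop`) follows the fields used in the pipeline, and the 2000 entries are omitted for brevity.

```javascript
// Two states in the cData shape; Virginia's array is deliberately out of
// year order to show why sorting before $first/$last matters.
const cData = [
  { name: "Virginia", data: [{ year: 2010, totalPop: 8001024 },
                             { year: 1990, totalPop: 6187358 }] },
  { name: "Maine",    data: [{ year: 1990, totalPop: 1227928 },
                             { year: 2010, totalPop: 1328361 }] },
];

// {$unwind: "$data"}: one row per (state, census year).
const rows = cData.flatMap((s) => s.data.map((d) => ({ name: s.name, data: d })));

// {$sort: {"data.year": 1}}: oldest census first.
rows.sort((a, b) => a.data.year - b.data.year);

// {$group: {_id: "$name", pop1990: {$first: ...}, pop2010: {$last: ...}}}
const byState = new Map();
for (const row of rows) {
  if (!byState.has(row.name)) {
    // $first: value from the first row seen for this group.
    byState.set(row.name, { _id: row.name, pop1990: row.data.totalPop });
  }
  // $last: overwritten on every row, so the final row seen wins.
  byState.get(row.name).pop2010 = row.data.totalPop;
}

// {$project: {name: "$_id", delta: {$subtract: ["$pop2010", "$pop1990"]}, ...}}
const deltas = [...byState.values()].map((g) => ({
  name: g._id,
  pop1990: g.pop1990,
  pop2010: g.pop2010,
  delta: g.pop2010 - g.pop1990,
}));
console.log(deltas);
```

Without the sort step, Virginia's "first" value would be its 2010 population and the delta would come out negative.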
  • 37. 37 $project • Reshape/Transform Documents – Include, exclude or rename fields – Inject computed fields – Create sub-document fields
  • 38. 38 Including and Excluding Fields { $project: { "_id" : 0, "pop1990" : 1, "pop2010" : 1 } } Input: { "_id" : "Virginia", "pop1990" : 453588, "pop2010" : 3725789 } { "_id" : "South Dakota", "pop1990" : 453588, "pop2010" : 3725789 } Output: { "pop1990" : 453588, "pop2010" : 3725789 } { "pop1990" : 453588, "pop2010" : 3725789 }
  • 39. 39 Renaming and Computing Fields { $project: { "_id" : 0, "name" : "$_id", "delta" : {"$subtract" : ["$pop2010", "$pop1990"]} } } Input: { "_id" : "Virginia", "pop1990" : 6187358, "pop2010" : 8001024 } { "_id" : "South Dakota", "pop1990" : 696004, "pop2010" : 814180 } Output: { "name" : "Virginia", "delta" : 1813666 } { "name" : "South Dakota", "delta" : 118176 } Fields not listed (pop1990, pop2010) are dropped automatically; do not mix 0-exclusions with computed fields.
  • 40. 40 Compare number of people living within 500KM of Memphis, TN in 1990, 2000, 2010
  • 41. 41 Compare number of people living within 500KM of Memphis, TN in 1990, 2000, 2010 db.cData.aggregate([ {$geoNear : { "near" : {"type" : "Point", "coordinates" : [90, 35]}, "distanceField" : "dist.calculated", "maxDistance" : 500000, "includeLocs" : "dist.location", "spherical" : true }}, {$unwind : "$data"}, {$group : {"_id" : "$data.year", "totalPop" : {"$sum" : "$data.totalPop"}, "states" : {"$addToSet" : "$name"}}}, {$sort : {"_id" : 1}} ])
  • 42. 42 { "_id" : 1990, "totalPop" : 22644082, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] } { "_id" : 2000, "totalPop" : 25291421, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] } { "_id" : 2010, "totalPop" : 27337350, "states" : [ "Kentucky", "Missouri", "Alabama", "Tennessee", "Mississippi", "Arkansas" ] }
  • 43. 43 $geoNear • Order/Filter Documents by Location – Requires a geospatial index – Output includes physical distance – Must be first aggregation stage
  • 44. 44 $geoNear {$geoNear : { "near" : {"type" : "Point", "coordinates" : [90, 35]}, "maxDistance" : 500000, "spherical" : true }} Input: { "_id" : "Tennessee", "pop1990" : 4877185, "pop2010" : 6346105, "center" : {"type" : "Point", "coordinates" : [86.6, 37.8]} } { "_id" : "Virginia", "pop1990" : 6187358, "pop2010" : 8001024, "center" : {"type" : "Point", "coordinates" : [78.6, 37.5]} } Output: { "_id" : "Tennessee", "pop1990" : 4877185, "pop2010" : 6346105, "center" : {"type" : "Point", "coordinates" : [86.6, 37.8]} }
  • 45. 45 What if I want to save the results to a collection? db.cData.aggregate([ {$geoNear : { "near" : {"type" : "Point", "coordinates" : [90, 35]}, "distanceField" : "dist.calculated", "maxDistance" : 500000, "includeLocs" : "dist.location", "spherical" : true }}, {$unwind : "$data"}, {$group : {"_id" : "$data.year", "totalPop" : {"$sum" : "$data.totalPop"}, "states" : {"$addToSet" : "$name"}}}, {$sort : {"_id" : 1}}, {$out : "peopleNearMemphis"} ])
  • 46. 46 $out db.cData.aggregate([ <pipeline stages>, {"$out" : "resultsCollection"}]) • Save aggregation results to a new collection • NOTE: Overwrites any data existing in collection • Transform documents - ETL
  • 47. 47 Back To The Original Question • Which US Division has the fastest growing population density? – We only want to include data for states with more than 1M people – We only want to include divisions larger than 100K square miles
  • 48. 48 Division with Fastest Growing Pop Density db.cData.aggregate( [{$match : {"data.totalPop" : {"$gt" : 1000000}}}, {$unwind : "$data"}, {$sort : {"data.year" : 1}}, {$group : {"_id" : "$name", "pop1990" : {"$first" : "$data.totalPop"}, "pop2010" : {"$last" : "$data.totalPop"}, "areaM" : {"$first" : "$areaM"}, "division" : {"$first" : "$division"}}}, {$group : {"_id" : "$division", "totalPop1990" : {"$sum" : "$pop1990"}, "totalPop2010" : {"$sum" : "$pop2010"}, "totalAreaM" : {"$sum" : "$areaM"}}}, {$match : {"totalAreaM" : {"$gt" : 100000}}}, {$project : {"_id" : 0, "division" : "$_id", "density1990" : {"$divide" : ["$totalPop1990", "$totalAreaM"]}, "density2010" : {"$divide" : ["$totalPop2010", "$totalAreaM"]}, "denDelta" : {"$subtract" : [{"$divide" : ["$totalPop2010", "$totalAreaM"]},{"$divide" : ["$totalPop1990","$totalAreaM"]}]}, "totalAreaM" : 1, "totalPop1990" : 1, "totalPop2010" : 1}}, {$sort : {"denDelta" : -1}}])
  • 49. 49 { "totalPop1990" : 42293785, "totalPop2010" : 58277380, "totalAreaM" : 290433.39999999997, "division" : "South Atlantic", "density1990" : 145.62300685802668, "density2010" : 200.6566049221612, "denDelta" : 55.03359806413451 } { "totalPop1990" : 38577263, "totalPop2010" : 49169871, "totalAreaM" : 344302.94999999995, "division" : "Pacific", "density1990" : 112.0445322934352, "density2010" : 142.80990331334658, "denDelta" : 30.765371019911385 } { "totalPop1990" : 37602286, "totalPop2010" : 40872375, "totalAreaM" : 109331.91, "division" : "Mid-Atlantic", "density1990" : 343.9278249140621, "density2010" : 373.8375648975674, "denDelta" : 29.90973998350529 } { "totalPop1990" : 26702793, "totalPop2010" : 36346202, "totalAreaM" : 444052.01, "division" : "West South Central", "density1990" : 60.134381555890265, "density2010" : 81.85122729204626, "denDelta" : 21.716845736155996 } { "totalPop1990" : 15176284, "totalPop2010" : 18432505, "totalAreaM" : 183403.9, "division" : "East South Central", "density1990" : 82.74788049763391, "density2010" : 100.50225213313348, "denDelta" : 17.754371635499567 } { "totalPop1990" : 42008942, "totalPop2010" : 46421564, "totalAreaM" : 301368.57, "division" : "East North Central", "density1990" : 139.39390560867048, "density2010" : 154.03585052017866, "denDelta" : 14.641944911508176 } { "totalPop1990" : 12406123, "totalPop2010" : 20512410, "totalAreaM" : 618711.92, "division" : "Mountain", "density1990" : 20.051533838236054, "density2010" : 33.153410071685705, "denDelta" : 13.101876233449651 } { "totalPop1990" : 16324886, "totalPop2010" : 19018666, "totalAreaM" : 372541.8, "division" : "West North Central", "density1990" : 43.820280033005695, "density2010" : 51.05109279012449, "denDelta" : 7.230812757118798 }
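The density figures in these results can be re-derived by hand from the totals in the same documents, which is a quick way to sanity-check the final `$project` stage. The sketch below repeats the `$divide` and `$subtract` arithmetic in plain JavaScript for the South Atlantic row; the input numbers are copied from the output above and `densityGrowth` is a hypothetical helper name.

```javascript
// Density is people per square mile: total population / total area.
function densityGrowth({ totalPop1990, totalPop2010, totalAreaM }) {
  const density1990 = totalPop1990 / totalAreaM; // $divide
  const density2010 = totalPop2010 / totalAreaM; // $divide
  return { density1990, density2010, denDelta: density2010 - density1990 }; // $subtract
}

// Totals for the South Atlantic division, copied from the results slide.
const southAtlantic = densityGrowth({
  totalPop1990: 42293785,
  totalPop2010: 58277380,
  totalAreaM: 290433.39999999997,
});
console.log(southAtlantic);
```

The values agree with the slide to several decimal places (about 145.62, 200.66, and 55.03 people per square mile).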
  • 51. 51 Aggregate options db.cData.aggregate([<pipeline stages>], {'explain' : false, 'allowDiskUse' : true, 'cursor' : {'batchSize' : 5}}) explain – similar to find().explain() allowDiskUse – enable use of disk to store intermediate results cursor – specify the size of the initial batch of results
  • 53. 53 $sample { $sample: { size: <positive integer> } } ● WiredTiger - uses a pseudo-random cursor to return docs ● MMAPv1 - uses the _id index to randomly select docs Used by Compass; useful for unit tests, etc.
  • 54. 54 $lookup • Performs a left outer join to another collection in the same database to filter in documents from the “joined” collection for processing. • To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the “joined” collection. { $lookup: { from: <collection to join>, localField: <field from the input documents>, foreignField: <field from the documents of the "from" collection>, as: <output array field> } } The "from" collection CANNOT BE SHARDED https://docs.mongodb.org/master/reference/operator/aggregation/lookup/
  • 55. 55 • Sample data: > db.data.find() { "_id" : ObjectId("565e759ae6f9919371a53896"), "v" : 14, "k" : 0 } { "_id" : ObjectId("565e759ae6f9919371a53897"), "v" : 664, "k" : 1 } { "_id" : ObjectId("565e759ae6f9919371a53898"), "v" : 701, "k" : 1 } { "_id" : ObjectId("565e759ae6f9919371a53899"), "v" : 312, "k" : 1 } { "_id" : ObjectId("565e759ae6f9919371a5389a"), "v" : 10, "k" : 2 } { "_id" : ObjectId("565e759ae6f9919371a5389b"), "v" : 686, "k" : 0 } { "_id" : ObjectId("565e759ae6f9919371a5389c"), "v" : 669, "k" : 2 } { "_id" : ObjectId("565e759ae6f9919371a5389d"), "v" : 273, "k" : 2 } { "_id" : ObjectId("565e759ae6f9919371a5389e"), "v" : 473, "k" : 0 } { "_id" : ObjectId("565e759ae6f9919371a5389f"), "v" : 158, "k" : 2 } > db.keys.find() { "_id" : 0, "name" : "East Meter" } { "_id" : 1, "name" : "Central Meter 12" } { "_id" : 2, "name" : "New HIFI Monitor" }
  • 56. 56 • Find the average “v” value per key, looking up the name of each “k” db.data.aggregate( [ { "$lookup" : { "from" : "keys", "localField" : "k", "foreignField" : "_id", "as" : "name" } }, { "$unwind" : "$name" }, { "$project" : { "k" : "$k", "name" : "$name.name", "v" : "$v" } }, { "$group" : { "_id" : "$name", "aveValue" : { "$avg" : "$v" } } }, { "$project" : { "_id" : 0, "name" : "$_id", "aveValue" : "$aveValue" } } ]); { "aveValue" : 277.5, "name" : "New HIFI Monitor"} { "aveValue" : 559, "name" : "Central Meter 12"} { "aveValue" : 391, "name" : "East Meter"}
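The join-then-average pipeline can be replayed in plain JavaScript on the sample data from the previous slide, which makes it easier to see what each stage contributes. This is an emulation for intuition, not how the server performs `$lookup`; the `v` and `k` values and key names are copied from the sample data, with the ObjectId `_id` fields left out.

```javascript
const data = [
  { v: 14, k: 0 }, { v: 664, k: 1 }, { v: 701, k: 1 }, { v: 312, k: 1 },
  { v: 10, k: 2 }, { v: 686, k: 0 }, { v: 669, k: 2 }, { v: 273, k: 2 },
  { v: 473, k: 0 }, { v: 158, k: 2 },
];
const keys = [
  { _id: 0, name: "East Meter" },
  { _id: 1, name: "Central Meter 12" },
  { _id: 2, name: "New HIFI Monitor" },
];

// $lookup: add an array field "name" holding every keys document whose
// _id equals this document's k (left outer join: may be an empty array).
const joined = data.map((d) => ({
  ...d,
  name: keys.filter((key) => key._id === d.k),
}));

// $unwind "$name", then $project name: "$name.name".
const flat = joined.flatMap((d) =>
  d.name.map((key) => ({ k: d.k, v: d.v, name: key.name }))
);

// $group by name with $avg of v, then $project into { name, aveValue }.
const totals = new Map();
for (const { name, v } of flat) {
  const t = totals.get(name) || { total: 0, count: 0 };
  t.total += v;
  t.count += 1;
  totals.set(name, t);
}
const averages = [...totals].map(([name, t]) => ({
  name,
  aveValue: t.total / t.count,
}));
console.log(averages);
```

The averages match the slide's output (391, 559, and 277.5), although the order of the groups may differ from what the server returns.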
  • 57. 57 friends of friends Use $lookup to perform "self-joins" for graph problems. Simple case: find the friends of someone's friends Can extend this to find cliques, paths, etc. Dataset: { "_id" : 1, "name" : "FLOYD", "friends" : [ "BILLIE", "MARGENE", "HERMINIA", "LACRESHA", "SHAUN", "INOCENCIA", "DEANA", "MARAGRET", "MICHELE", "KARLENE", "KASSANDRA", "JOAN", "HIRAM" ] } { "_id" : 2, "name" : "ELIDA", "friends" : [ "ALI", "KESHIA" ] } ...
  • 59. 59 don't forget your indexes… Running FOF.friendsOfFriends(1) 2016-01-26T10:19:41.201-0500 I COMMAND [conn6] command friendship.friends command: aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind: "$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField: "name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind: "$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, { $project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:42505581740 keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount: { r: 1124 } }, Database: { acquireCount: { r: 562 } }, Collection: { acquireCount: { r: 562 } } } protocol:op_command 48ms with indexes { "friends" : 1 } & { "name" : 1 }: 2016-01-26T10:17:45.167-0500 I COMMAND [conn6] command friendship.friends command: aggregate { aggregate: "friends", pipeline: [ { $match: { _id: 1.0 } }, { $unwind: "$friends" }, { $lookup: { from: "friends", localField: "friends", foreignField: "name", as: "friendsOfFriends" } }, { $unwind: "$friendsOfFriends" }, { $unwind: "$friendsOfFriends.friends" }, { $group: { _id: "$friendsOfFriends.friends" } }, { $project: { friendOfFriend: "$_id", _id: 0.0 } } ], cursor: {} } cursorid:39053867824 keyUpdates:0 writeConflicts:0 numYields:0 reslen:3722 locks:{ Global: { acquireCount: { r: 32 } }, Database: { acquireCount: { r: 16 } }, Collection: { acquireCount: { r: 16 } } } protocol:op_command 2ms
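The logged pipeline ($match, $unwind, $lookup, two more $unwind stages, $group) can be followed step by step in plain JavaScript. The sketch below is an emulation under stated assumptions: the four-person dataset is hypothetical (only the names and the document shape come from the slides), `friendsOfFriends` here is a stand-in and not the `FOF.friendsOfFriends` helper from the talk, and the `Map` keyed by name merely plays the role that the `{ "name" : 1 }` index plays for `$lookup`.

```javascript
// Hypothetical miniature dataset in the same shape as the slide's documents.
const people = [
  { _id: 1, name: "FLOYD", friends: ["BILLIE", "HIRAM"] },
  { _id: 2, name: "BILLIE", friends: ["FLOYD", "ELIDA"] },
  { _id: 3, name: "HIRAM", friends: ["ELIDA", "ALI"] },
  { _id: 4, name: "ELIDA", friends: ["ALI", "KESHIA"] },
];

// Lookup table by name: the analogue of an index on "name". Without it,
// every join step would have to scan all documents.
const byName = new Map(people.map((p) => [p.name, p]));

function friendsOfFriends(id) {
  const person = people.find((p) => p._id === id); // $match: { _id: id }
  if (!person) return [];
  const result = new Set();                         // $group de-duplicates
  for (const friendName of person.friends) {        // $unwind: "$friends"
    const friend = byName.get(friendName);          // $lookup on "name"
    if (!friend) continue;                          // no match: dropped by $unwind
    for (const fof of friend.friends) {             // $unwind friendsOfFriends.friends
      result.add(fof);
    }
  }
  return [...result];
}

const fofOfFloyd = friendsOfFriends(1);
console.log(fofOfFloyd);
```

Like the logged pipeline, this sketch does not filter out the starting person, so FLOYD appears among his own friends of friends.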
  • 60. 60 lots of new mathematical operators $stdDevSamp Calculates sample standard deviation. { $stdDevSamp: <array> } $stdDevPop Calculates population standard deviation. { $stdDevPop: <array> } $sqrt Calculates the square root. { $sqrt: <number> } $abs Returns the absolute value of a number. { $abs: <number> } $log Calculates the log of a number in the specified base. { $log: [ <number>, <base> ] } $log10 Calculates the log base 10 of a number. { $log10: <number> } $ln Calculates the natural log of a number. { $ln: <number> } $pow Raises a number to the specified exponent. { $pow: [ <number>, <exponent> ] } $exp Raises e to the specified exponent. { $exp: <number> } $trunc Truncates a number to its integer. { $trunc: <number> } $ceil Returns the smallest integer greater than or equal to the specified number. { $ceil: <number> } $floor Returns the largest integer less than or equal to the specified number. { $floor: <number> }
  • 61. 61 new array operators $slice Returns a subset of an array. { $slice: [ <array>, <n> ] } or { $slice: [ <array>, <position>, <n> ] } $arrayElemAt Returns the element at the specified array index.{ $arrayElemAt: [ <array>, <idx> ] } $concatArrays Concatenates arrays. { $concatArrays: [ <array1>, <array2>, ... ]} $isArray Determines if the operand is an array. { $isArray: [ <expression> ] } $filter Selects a subset of the array based on the condition. { $filter: { input: <array>, as: <string>, cond: <expression> } }
  • 64. 64 Framework Use Cases • Basic aggregation queries • Ad-hoc reporting • Real-time analytics • Visualizing and reshaping data
  • 65. Questions? Thanks for attending & happy aggregating Please complete survey jason.mimick@mongodb.com @jmimick