8. Dataset
Goal: aggregate the number of times each player was killed.
{
  "_id" : ObjectId("50fc77ee364c74eba1afe1e3"),
  "fragdate" : ISODate("2012-12-24T00:00:19.901Z"),
  "gameId" : 1221,
  "gameName" : "Christmas Blitz",
  "kill" : {
    "_id" : ObjectId("50acfd45712e8bc7832ea7cb"),
    "username" : "player002",
    "avatar" : "avatar.com/player002.png",
    "displayname" : "Sniper the Clown",
    "rank" : "Sniper",
    "motto" : "If you run, you'll just die tired."
  },
  "player" : {
    "userid" : 1,
    "username" : "ArmyD00d1221",
    "avatar" : "avatar.com/armyd00d1221.png",
    "displayname" : "Army Grunt"
  },
  "server" : "app01.fragzilla.com"
}
9. Report Details
Only aggregate kills on these three players:
• Sniper the Clown
• Kurious Killer
• My L1ttl3 P0wn13
Only on Dec 23, 2012, between 2pm and 10pm.
{
  "_id" : ObjectId("50fc77ee364c74eba1afe1e3"),
  "fragdate" : ISODate("2012-12-24T00:00:19.901Z"),
  "gameId" : 1221,
  "gameName" : "Christmas Blitz",
  "kill" : {
    "_id" : ObjectId("50acfd45712e8bc7832ea7cb"),
    "username" : "player002",
    "avatar" : "avatar.com/player002.png",
    "displayname" : "Sniper the Clown",
    "rank" : "Sniper",
    "motto" : "If you run, you'll just die tired."
  },
  "player" : {
    "userid" : 1,
    "username" : "ArmyD00d1221",
    "avatar" : "avatar.com/armyd00d1221.png",
    "displayname" : "Army Grunt"
  },
  "server" : "app01.fragzilla.com"
}
10. Relational DB
Kills: Id, fragDate, gameID, gameName, server, fkKilled, fkPlayer
Killed: Id, username, avatar, displayName, rank, motto
Player: Id, username, avatar, displayName, rank, motto
(Killed and Player could be the same table.)
11. Relational DB
Kills: Id, fragDate, gameID, gameName, server, fkKilled, fkPlayer
Killed: Id, username, avatar, displayName, rank, motto
Player: Id, username, avatar, displayName, rank, motto
(Killed and Player could be the same table.)

SELECT tk.fragDate, k.id, count(k.id)
FROM test.kills tk
JOIN players p ON tk.fkPlayer = p.id
JOIN killed k ON tk.fkKilled = k.id
WHERE k.id IN (1,2,3)
GROUP BY fragDate, k.id;
12. Sidenote: Exploration
• Software Engineering tends to have more
clearly defined goals
• Report Engineering tends to have more clearly
defined questions
17. Aggregation: Big Picture
Complexity: Mongo Queries → Aggregation Framework → Map/Reduce Implementations
• Somewhere between Mongo Queries and Map/Reduce implementations
• Best suited for totaling and averaging functions
• Similar functionality to the SQL GROUP BY clause
18. Anatomy of Aggregation Framework
db.collection.aggregate( Aggregate command
[ {do something},
{do something else}, Pipeline
Operators
{do even more stuff}
]
)
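As a concrete sketch of that shape, each stage is just a document in an array. The collection name (frags) and the stage contents below are illustrative assumptions, not taken from the talk:

```javascript
// Minimal sketch of the aggregate command shape shown above.
// The collection name ("frags") and the stage contents are
// illustrative assumptions, not from the slides.
const pipeline = [
  { $match: { server: "app01.fragzilla.com" } },        // do something
  { $group: { _id: "$gameName", frags: { $sum: 1 } } }, // do something else
  { $sort: { frags: -1 } }                              // do even more stuff
];
// In the mongo shell you would run: db.frags.aggregate(pipeline)
```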
19. Pipeline Operators
• Pipelines: transform documents from the
collection as they pass through
– grep e server.log | less
• Expressions: produce output documents
based on calculations performed on input
documents
24. Our Aggregation Query
$match:
Provides a query-like interface to filter
documents out of the aggregation
pipeline. The $match drops
documents that do not match the
condition from the aggregation
pipeline, and it passes documents that
match along the pipeline unaltered.
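A hedged sketch of such a $match stage for this report (the three kill _id values are placeholders for the real ObjectId values, and new Date stands in for the shell's ISODate helper):

```javascript
// Sketch of a $match stage for the report: only three killed players,
// only Dec 23, 2012, between 2pm and 10pm. The _id values below are
// placeholders, not the real ObjectIds.
const matchStage = {
  $match: {
    "kill._id": { $in: ["id-clown", "id-killer", "id-p0wn13"] },
    fragdate: {
      $gte: new Date("2012-12-23T14:00:00Z"), // 2pm
      $lt:  new Date("2012-12-23T22:00:00Z")  // 10pm
    }
  }
};
```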
25. Our Aggregation Query
$project: Reshapes a document
stream by renaming, adding, or
removing fields. Also use $project to
create computed values or sub-
objects
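A sketch of the reshaping described above, keeping only the killed player's displayname and the hour the frag occurred (field names assume the dataset shown earlier):

```javascript
// Sketch of a $project stage: keep the killed player's displayname
// and the hour of the frag; drop everything else (including _id).
const projectStage = {
  $project: {
    _id: 0,
    displayname: "$kill.displayname",
    eventhour: { $hour: "$fragdate" } // hour (0-23) the event occurred
  }
};
```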
26. Our Aggregation Query
$group
Groups documents together for the
purpose of calculating aggregate
values based on a collection of
documents. Practically, group often
supports tasks such as average page
views for each page in a website on a
daily basis.
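For this report, the grouping could be sketched like this: count kills per player per hour, with a compound _id subdocument and a { $sum: 1 } counter (field names assume a prior $project stage has produced displayname and eventhour):

```javascript
// Sketch of a $group stage: count kills per player per hour.
// Field names assume a prior $project stage.
const groupStage = {
  $group: {
    _id: { displayname: "$displayname", eventhour: "$eventhour" },
    numKills: { $sum: 1 } // add 1 for every document in the group
  }
};
```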
28. Our Aggregation Query
$sort
The $sort pipeline operator sorts all
input documents and returns them to
the pipeline in sorted order.
{ $sort : { <sort-key> } }
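Filling in the sort key for this report might look like the following sketch, where -1 is descending and 1 is ascending (field names assume the grouped output described earlier):

```javascript
// Sketch of a $sort stage: highest kill counts first (-1 = descending),
// then by hour (1 = ascending). Field names are assumptions based on
// the grouped documents described in the talk.
const sortStage = { $sort: { numKills: -1, "_id.eventhour": 1 } };
```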
29. Aggregation Output
Produces a document with two fields: result and ok.
{
  "result" : [
    {
      "_id" : {
        "displayname" : "My L1ttl3 P0wn13",
        "eventhour" : 21
      },
      "numKills" : 133
    },
    {
      "_id" : {
        "displayname" : "Kurious Killer",
        "eventhour" : 21
      },
      "numKills" : 130
    },
    // ******* Omitted for brevity *******
    {
      "_id" : {
        "displayname" : "Sniper the Clown",
        "eventhour" : 2
      },
      "numKills" : 6
    }
  ],
  "ok" : 1
}
37. Q/A/Comments
Will Button
willb@mylist.com
@wfbutton
Editor's notes
BREAK: Break-out and run the code for demo purposes.
Point out why you would want to do this: it's quick and easy for exploratory purposes, and also a quick-and-dirty way to validate your finished code for accurate numbers. Some customers prefer this format.
Think of $match as being similar to the query operators of a find query. The purpose of $match is to identify the documents that are relevant to your aggregation and shed the rest. In our example here, we're using two operators to match documents where the kill._id is one of three specified AND the fragdate is within the specified time window. Any documents not matching both of these criteria are discarded and not passed down to the next stage of the pipeline.
The $project operator gives you the opportunity to reformat and refine your data prior to passing it on. We're doing a couple of things: first, we're removing all the data from the document except the id and displayname from the "kill" subdocument. We're also bringing along the fragdate, but only the hour that the event occurred, since that is what we'll be aggregating on. All other data from the document is not passed on.
$group can be thought of as the main workhorse of the aggregation framework. This is where you'll calculate your aggregate values based on the documents in the pipeline. $group must specify an _id field, and it has to be called _id. This can be a dotted field path reference, a subdocument with multiple fields, or a constant value. If need be, a $project operator can rename the _id further down the pipeline. In our example, we're using a subdocument containing the displayname of the player who was killed and the eventhour for the hour when the event took place. Our aggregated value is numKills, meaning we want to track the number of times this player was killed in the documents being considered. To achieve that, we're using the $sum operator and specifying a value of 1 for a new field we're creating called "numKills". This has the effect of incrementing the value of "numKills" by 1 each time a matching document is found in the collection.
Had numKills been an existing numeric value, we could have summed those values by specifying the $sum operator with a value of "$numKills", which causes the aggregation framework to read the number found inside the numKills field. Similarly, we could have used $min, $max, $first, $last, or $avg in place of $sum to achieve different aggregation results. One thing to note: the $group operator currently performs its grouping in memory, so very large aggregations may be impacted as a result.
Last, but certainly not least: the $sort operator sorts our data in the order we specify. As parameters, it accepts an object specifying fields and 1 or -1 as the sort order (ascending or descending, respectively). This works just like the sort operator on a standard MongoDB query.
The output of the aggregation query is a document with two fields: result and ok. ok returns 1 if the query completed successfully, or an error code if it did not. The result field contains an array of documents returned by the pipeline.
If we take a closer look at the result array, we see documents with an _id field showing the displayname of the player and the hour being represented. A second field, numKills, shows the aggregate value indicating the number of times this player was killed during the match.
To recap, the aggregation query accepts a series of pipeline operators to modify and aggregate a collection. On the surface, it is that simple. In practice, its pipeline approach can produce a wide array of results.