This document provides an overview of MongoDB's new aggregation framework. It begins with terminology that maps relational database concepts to MongoDB. Examples show how to aggregate a collection of tweets to count friends and followers by location. The pipeline concept is introduced along with the common aggregation operators ($match, $project, $unwind, $group, $sort) and computed expressions. Finally, information is provided on downloading MongoDB and contacting the author.
1. DeNormalised London:
Aggregation Framework Overview
Chris Harris
Email : charris@10gen.com
Twitter : cj_harris5
Wednesday, 21 March 12
2. Terminology
RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key
3. Here is a "simple" SQL Model
mysql> select * from book;
+----+----------------------------------------------------------+
| id | title                                                    |
+----+----------------------------------------------------------+
|  1 | The Demon-Haunted World: Science as a Candle in the Dark |
|  2 | Cosmos                                                   |
|  3 | Programming in Scala                                     |
+----+----------------------------------------------------------+
3 rows in set (0.00 sec)
mysql> select * from bookauthor;
+---------+-----------+
| book_id | author_id |
+---------+-----------+
|       1 |         1 |
|       2 |         1 |
|       3 |         2 |
|       3 |         3 |
|       3 |         4 |
+---------+-----------+
5 rows in set (0.00 sec)
mysql> select * from author;
+----+-----------+------------+-------------+-------------+---------------+
| id | last_name | first_name | middle_name | nationality | year_of_birth |
+----+-----------+------------+-------------+-------------+---------------+
|  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
|  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
|  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
|  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
+----+-----------+------------+-------------+-------------+---------------+
4 rows in set (0.00 sec)
4. The Same Data in MongoDB
{
  "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
  "title" : "Programming in Scala",
  "author" : [
    {
      "first_name" : "Martin",
      "last_name" : "Odersky",
      "nationality" : "DE",
      "year_of_birth" : 1958
    },
    {
      "first_name" : "Lex",
      "last_name" : "Spoon"
    },
    {
      "first_name" : "Bill",
      "last_name" : "Venners"
    }
  ]
}
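To make the denormalisation step concrete, here is a minimal plain-JavaScript sketch (Node.js assumed; it does not talk to MongoDB) that joins the three tables' rows in memory and produces documents shaped like the one above. The function name `embedAuthors` is illustrative only. No `_id` is generated, because MongoDB assigns an `ObjectId` when a document without one is inserted.

```javascript
// Rows copied from the book, bookauthor and author tables above.
// NULL columns are left out, since a document simply omits absent fields.
const bookRows = [
  { id: 1, title: "The Demon-Haunted World: Science as a Candle in the Dark" },
  { id: 2, title: "Cosmos" },
  { id: 3, title: "Programming in Scala" },
];
const bookAuthorRows = [
  { book_id: 1, author_id: 1 },
  { book_id: 2, author_id: 1 },
  { book_id: 3, author_id: 2 },
  { book_id: 3, author_id: 3 },
  { book_id: 3, author_id: 4 },
];
const authorRows = [
  { id: 1, first_name: "Carl", last_name: "Sagan", middle_name: "Edward", year_of_birth: 1934 },
  { id: 2, first_name: "Martin", last_name: "Odersky", nationality: "DE", year_of_birth: 1958 },
  { id: 3, first_name: "Lex", last_name: "Spoon" },
  { id: 4, first_name: "Bill", last_name: "Venners" },
];

// Resolve the join table for each book and embed the matching authors as an
// array, dropping the surrogate ids that only the relational model needed.
function embedAuthors(books, links, authors) {
  const authorById = new Map(authors.map((a) => [a.id, a]));
  return books.map((book) => ({
    title: book.title,
    author: links
      .filter((link) => link.book_id === book.id)
      .map((link) => {
        const { id, ...fields } = authorById.get(link.author_id);
        return fields;
      }),
  }));
}

const bookDocs = embedAuthors(bookRows, bookAuthorRows, authorRows);
console.log(JSON.stringify(bookDocs[2], null, 2));
```

Printing `bookDocs[2]` shows the Programming in Scala document with its three embedded authors; Spoon and Venners carry only the fields that were non-NULL in the relational rows.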
5. What problem are we solving?
• Map/Reduce can be used for aggregation…
  • Currently used for totaling, averaging, etc.
• Map/Reduce is a big hammer
  • Simpler tasks should be easier
  • Shouldn't need to write JavaScript
  • Avoid the overhead of the JavaScript engine
• We're seeing requests for help in handling complex documents
  • Select only matching subdocuments or arrays
6. How will we solve the problem?
• New aggregation framework
  • Declarative framework (no JavaScript)
  • Describe a chain of operations to apply
• Expression evaluation
  • Return computed values
• Framework: new operations added easily
• C++ implementation
7. Aggregation - Pipelines
• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Members of a collection are passed through a pipeline to produce a result
• Analogous to a Unix shell pipeline, e.g. ps -ef | grep -i mongod
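The shell-pipe analogy can be sketched in plain JavaScript (Node.js assumed): each stage is a function from an array of documents to a new array, and the pipeline threads one stage's output into the next. This only models the idea; MongoDB's stages run inside the server, and `runPipeline` and the process list are hypothetical.

```javascript
// A stage is any function from an array of documents to a new array;
// runPipeline feeds each stage's output into the next, like "|" in a shell.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical process list standing in for the output of `ps -ef`.
const processes = [
  { pid: 101, cmd: "/usr/bin/mongod --dbpath /data/db" },
  { pid: 202, cmd: "/usr/sbin/sshd -D" },
  { pid: 303, cmd: "/usr/bin/MONGOD --port 27018" },
];

// Like `grep -i mongod`: keep documents whose cmd matches, ignoring case.
const grepMongod = (docs) => docs.filter((d) => /mongod/i.test(d.cmd));
// A second stage keeping only the pid field, a tiny projection.
const pidsOnly = (docs) => docs.map((d) => ({ pid: d.pid }));

const result = runPipeline(processes, [grepMongod, pidsOnly]);
console.log(result); // [ { pid: 101 }, { pid: 303 } ]
```

Because stages only agree on "array in, array out", they can be reordered or added without changing `runPipeline`, which mirrors the "new operations added easily" point above.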
8. Example - twitter
{
  "_id" : ObjectId("4f47b268fb1c80e141e9888c"),
  "user" : {
    "friends_count" : 73,
    "location" : "Brazil",
    "screen_name" : "Bia_cunha1",
    "name" : "Beatriz Helena Cunha",
    "followers_count" : 102
  }
}
• Find the number of followers and the number of friends by location
12. Example - twitter
db.tweets.aggregate([
  // Predicate: only users with at least one friend and one follower
  { $match: {
      "user.friends_count": { $gt: 0 },
      "user.followers_count": { $gt: 0 }
  } },
  // Parts of the document you want to project
  { $project: {
      location: "$user.location",
      friends: "$user.friends_count",
      followers: "$user.followers_count"
  } },
  // Function to apply to the result set
  { $group: {
      _id: "$location",
      friends: { $sum: "$friends" },
      followers: { $sum: "$followers" }
  } }
]);
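To check what the pipeline above should return without a server, its three stages can be mimicked over an in-memory array in plain JavaScript (Node.js 14+ assumed for `??`). The sample tweets are hypothetical. The real `$group` does not guarantee output order, whereas this sketch returns groups in first-seen order.

```javascript
// Hypothetical sample documents shaped like the tweet on slide 8.
const sampleTweets = [
  { user: { location: "Brazil", friends_count: 73, followers_count: 102 } },
  { user: { location: "Brazil", friends_count: 10, followers_count: 5 } },
  { user: { location: "London", friends_count: 40, followers_count: 0 } },
  { user: { location: "London", friends_count: 3, followers_count: 7 } },
];

// $match: keep users with at least one friend and one follower.
const matched = sampleTweets.filter(
  (t) => t.user.friends_count > 0 && t.user.followers_count > 0
);

// $project: lift the fields of interest to the top level.
const projected = matched.map((t) => ({
  location: t.user.location,
  friends: t.user.friends_count,
  followers: t.user.followers_count,
}));

// $group: one output document per location, summing both counts.
const groups = new Map();
for (const doc of projected) {
  const acc = groups.get(doc.location) ?? { _id: doc.location, friends: 0, followers: 0 };
  acc.friends += doc.friends;
  acc.followers += doc.followers;
  groups.set(doc.location, acc);
}
const byLocation = [...groups.values()];
console.log(byLocation);
```

The London user with zero followers is dropped at the `$match` step, so London's totals come from a single tweet (3 friends, 7 followers), while Brazil sums to 83 friends and 107 followers.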
14. Demo
Demo files are at https://gist.github.com/2036709
15. Projections
• $project can reshape results
  • Include or exclude fields
  • Computed fields
    • Arithmetic expressions
  • Pull fields from nested documents to the top
  • Push fields from the top down into new virtual documents
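As a sketch of those reshaping options, the block below writes one `$project` stage for the tweet document from slide 8, kept as plain data you could pass to `db.tweets.aggregate([projectStage])` in the shell, next to a plain-JavaScript function (Node.js assumed) that builds the same output shape so the result can be checked without a server. The output field names `total` and `counts` are hypothetical, and `_id` is shown as a string stand-in for the `ObjectId`.

```javascript
// The $project stage as it would be written in the shell; here it is only data.
const projectStage = {
  $project: {
    _id: 0, // exclude a field
    name: "$user.name", // pull a nested field to the top
    total: { $add: ["$user.friends_count", "$user.followers_count"] }, // computed field
    counts: {
      // push fields down into a new sub-document
      friends: "$user.friends_count",
      followers: "$user.followers_count",
    },
  },
};

// Plain-JavaScript stand-in producing the same shape for one document.
function projectTweet(doc) {
  return {
    name: doc.user.name,
    total: doc.user.friends_count + doc.user.followers_count,
    counts: {
      friends: doc.user.friends_count,
      followers: doc.user.followers_count,
    },
  };
}

const tweet = {
  _id: "4f47b268fb1c80e141e9888c",
  user: {
    friends_count: 73,
    location: "Brazil",
    screen_name: "Bia_cunha1",
    name: "Beatriz Helena Cunha",
    followers_count: 102,
  },
};
const reshaped = projectTweet(tweet);
console.log(reshaped);
```

The original `user` sub-document and `_id` do not appear in the output; only the listed and computed fields survive.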
16. Unwinding
• $unwind can "stream" arrays
  • Array values are doled out one at a time in the context of their surrounding documents
  • Makes it possible to filter out elements before returning
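The streaming behaviour can be sketched in plain JavaScript (Node.js 11+ assumed for `flatMap`) using the embedded-author book from slide 4. The `unwind` helper is hypothetical and only handles a top-level field that holds an array or is missing; it is not MongoDB's implementation.

```javascript
// Plain-JavaScript stand-in for { $unwind: "$<field>" } on a top-level field:
// each array element yields its own copy of the surrounding document, with
// the element in place of the array. Documents whose field is missing or an
// empty array produce no output, as $unwind does by default.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] ?? []).map((element) => ({ ...doc, [field]: element }))
  );
}

const scalaBook = {
  title: "Programming in Scala",
  author: [
    { first_name: "Martin", last_name: "Odersky", nationality: "DE", year_of_birth: 1958 },
    { first_name: "Lex", last_name: "Spoon" },
    { first_name: "Bill", last_name: "Venners" },
  ],
};

const unwound = unwind([scalaBook], "author");
// Filtering individual elements, roughly what
// [{ $unwind: "$author" }, { $match: { "author.nationality": "DE" } }] would do.
const german = unwound.filter((d) => d.author.nationality === "DE");
console.log(unwound.length); // 3
console.log(german.map((d) => d.author.last_name)); // [ 'Odersky' ]
```

Each unwound document still carries `title`, which is what "in the context of their surrounding documents" means in practice.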
17. Grouping
• $group aggregation expressions
  • Define a grouping key as the _id of the result
  • Total grouped column values: $sum
  • Average grouped column values: $avg
  • Collect grouped column values in an array or set: $push, $addToSet
  • Other functions
    • $min, $max, $first, $last
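The accumulators listed above can be mimicked for one grouping key in plain JavaScript (Node.js assumed); the input rows are hypothetical and the comment shows the `$group` stage being approximated. Two caveats: `$addToSet` does not promise any order for the set it builds, and `$first`/`$last` are only meaningful when input order is controlled, typically by a preceding `$sort`.

```javascript
// Hypothetical projected documents, one per tweet.
const rows = [
  { location: "Brazil", screen_name: "a", friends: 73 },
  { location: "London", screen_name: "b", friends: 12 },
  { location: "Brazil", screen_name: "c", friends: 10 },
  { location: "Brazil", screen_name: "a", friends: 20 },
];

// Approximates, for each location:
// { $group: { _id: "$location",
//     total: { $sum: "$friends" },       average: { $avg: "$friends" },
//     names: { $push: "$screen_name" },  distinctNames: { $addToSet: "$screen_name" },
//     fewest: { $min: "$friends" },      most: { $max: "$friends" },
//     firstSeen: { $first: "$friends" }, lastSeen: { $last: "$friends" } } }
function groupByLocation(docs) {
  // Bucket documents by the grouping key, preserving input order.
  const buckets = new Map();
  for (const doc of docs) {
    if (!buckets.has(doc.location)) buckets.set(doc.location, []);
    buckets.get(doc.location).push(doc);
  }
  // Apply each accumulator to every bucket.
  return [...buckets].map(([key, members]) => {
    const friends = members.map((m) => m.friends);
    const total = friends.reduce((sum, x) => sum + x, 0);
    return {
      _id: key,
      total,
      average: total / friends.length,
      names: members.map((m) => m.screen_name),
      distinctNames: [...new Set(members.map((m) => m.screen_name))],
      fewest: Math.min(...friends),
      most: Math.max(...friends),
      firstSeen: friends[0],
      lastSeen: friends[friends.length - 1],
    };
  });
}

const grouped = groupByLocation(rows);
console.log(grouped);
```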
18. Sorting
• $sort can sort documents
  • Sort specifications are the same as for cursor sort() today, e.g., $sort: { key1: 1, key2: -1, … }
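A sort specification is an ordered list of keys, each with a direction. The sketch below (plain JavaScript, Node.js assumed, hypothetical data) turns such a specification into an `Array.prototype.sort` comparator. It relies on JavaScript preserving the insertion order of non-numeric object keys, and it does not model MongoDB's ordering across different BSON types; `comparatorFor` is an illustrative name.

```javascript
// Builds a comparator from a MongoDB-style sort specification such as
// { location: 1, followers: -1 }: keys are compared in the order written,
// with 1 meaning ascending and -1 descending. Later keys only break ties.
function comparatorFor(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [key, direction] of keys) {
      if (a[key] < b[key]) return -direction;
      if (a[key] > b[key]) return direction;
    }
    return 0;
  };
}

// Hypothetical grouped results to order by location, then by followers.
const totals = [
  { location: "London", followers: 7 },
  { location: "Brazil", followers: 5 },
  { location: "Brazil", followers: 102 },
];
const sorted = [...totals].sort(comparatorFor({ location: 1, followers: -1 }));
console.log(sorted.map((d) => `${d.location}:${d.followers}`));
// [ 'Brazil:102', 'Brazil:5', 'London:7' ]
```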
19. Computed Expressions
• Available in $project operations
• Prefix expression language
  • $add: ["$field1", "$field2"]
  • $ifNull: ["$field1", "$field2"]
  • Nesting: $add: ["$field1", { $ifNull: ["$field2", "$field3"] }]
• Other functions…
  • $divide, $mod, $multiply
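To show how the prefix notation nests, here is a tiny evaluator in plain JavaScript (Node.js 14+ assumed) for the operators named above. It is a hypothetical teaching sketch, not MongoDB's C++ implementation: it supports only `$add`, `$multiply`, `$divide`, `$mod` and `$ifNull`, treats `null` and missing fields alike for `$ifNull`, and does not reproduce MongoDB's null propagation in arithmetic.

```javascript
// Resolves a dotted field path ("user.name") against a document;
// returns undefined when any step of the path is missing.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Evaluates a small subset of the expression language:
// "$path" strings are field references, plain values are literals, and
// { $op: [arg, ...] } applies an operator to its recursively evaluated args.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return getPath(doc, expr.slice(1));
  }
  if (expr === null || typeof expr !== "object" || Array.isArray(expr)) {
    return expr;
  }
  const [op, rawArgs] = Object.entries(expr)[0];
  const args = rawArgs.map((arg) => evaluate(arg, doc));
  switch (op) {
    case "$add":
      return args.reduce((sum, x) => sum + x, 0);
    case "$multiply":
      return args.reduce((product, x) => product * x, 1);
    case "$divide":
      return args[0] / args[1];
    case "$mod":
      return args[0] % args[1];
    case "$ifNull":
      return args[0] ?? args[1];
    default:
      throw new Error(`unsupported operator: ${op}`);
  }
}

// The nested example from the slide, with field2 null so $ifNull falls back.
const row = { field1: 10, field2: null, field3: 5 };
const nested = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };
console.log(evaluate(nested, row)); // 15
```

The recursion in `evaluate` is what makes nesting work: an argument can itself be an operator object, evaluated before its parent operator is applied.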
20. Computed Expressions
• String functions
  • $toUpper, $toLower, $substr
• Date field extraction
  • $year, $month, $dayOfMonth, $hour...
• Date arithmetic
• $ifNull
• Ternary conditional ($cond)
  • Return one of two values based on a predicate
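Several of these operators have direct plain-JavaScript counterparts, sketched below on a hypothetical document (Node.js assumed); each comment shows the MongoDB expression the line approximates. MongoDB's date operators evaluate in UTC, hence the `getUTC*` calls, and `$month` counts from 1. `$substr` counts bytes, so the `slice` equivalent only matches for ASCII strings. The `created_at` field is an assumption, not part of the slide 8 document.

```javascript
// Hypothetical tweet fields; created_at is an assumed timestamp.
const tweetDoc = {
  screen_name: "Bia_cunha1",
  created_at: new Date(Date.UTC(2012, 2, 21, 18, 30)), // 21 March 2012, 18:30 UTC
  followers_count: 102,
};

const computed = {
  // { $toUpper: "$screen_name" }
  upper: tweetDoc.screen_name.toUpperCase(),
  // { $substr: ["$screen_name", 0, 3] }  (start index, then length)
  prefix: tweetDoc.screen_name.slice(0, 3),
  // { $year: "$created_at" }
  year: tweetDoc.created_at.getUTCFullYear(),
  // { $month: "$created_at" }  (1-12, whereas getUTCMonth is 0-11)
  month: tweetDoc.created_at.getUTCMonth() + 1,
  // { $dayOfMonth: "$created_at" }
  day: tweetDoc.created_at.getUTCDate(),
  // { $hour: "$created_at" }
  hour: tweetDoc.created_at.getUTCHours(),
  // { $cond: [{ $gt: ["$followers_count", 100] }, "popular", "regular"] }
  tier: tweetDoc.followers_count > 100 ? "popular" : "regular",
};
console.log(computed);
```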
21. Download at mongodb.org
We're Hiring!
Chris Harris
Email : charris@10gen.com
Twitter : cj_harris5
Conferences and appearances: http://www.10gen.com/events
Meetups: http://www.meetup.com/London-MongoDB-User-Group