MongoDB Aggregation Framework

These are slides from our Big Data Warehouse Meetup in April. We talked about NoSQL databases: What they are, how they’re used and where they fit in existing enterprise data ecosystems.

Mike O’Brian from 10gen introduced the syntax and usage patterns for the new aggregation system in MongoDB and gave some demonstrations of aggregation using it. The new MongoDB aggregation framework makes it simple to do tasks such as counting, averaging, and finding minima or maxima while grouping by keys in a collection, complementing MongoDB’s built-in map/reduce capabilities.

For more information, visit our website at http://casertaconcepts.com/ or email us at info@casertaconcepts.com.

Published in: Technology, Business

  1. Aggregation Framework
  2. Quick Overview of
  3. Quick Overview of
     - Document-oriented
     - Schemaless
     - JSON-style documents
     - Rich queries
     - Scales horizontally
     db.users.find({last_name: "Smith", age: {$gt: 10}});
     SELECT * FROM users WHERE last_name = 'Smith' AND age > 10;
  4. Computing Aggregations in Databases
     - SQL-based RDBMS: JOIN, GROUP BY, AVG(), COUNT(), SUM(), FIRST(), LAST(), etc.
     - MongoDB 2.0: MapReduce
     - MongoDB 2.2+: MapReduce, Aggregation Framework
  5. MapReduce
     var map = function() {... emit(key, val);}
     var reduce = function(key, vals) {... return resultVal;}
     Data → Map() emit(k,v) → Sort(k) → Group(k) → Reduce(k,values) → k,v → Finalize(k,v) → k,v
     - MongoDB: map iterates on documents
     - Document is "this"
     - 1 at a time per shard
     - Input matches output
     - Can run multiple times
  6. What’s wrong with just using MapReduce?
     - Map/Reduce is very powerful, but often overkill
     - Lots of users relying on it for simple aggregation tasks
  7. What’s wrong with just using MapReduce?
     - Easy to screw up JavaScript
     - Debugging a M/R job sucks
     - Writing more JS for simple tasks should not be necessary
     (ಠ︿ಠ)
  8. Aggregation Framework
     - Declarative (no need to write JS)
     - Implemented directly in C++
     - Expression evaluation
     - Return computed values
     - Framework: we can extend it with new ops
  9. Input data (collection) → Filter → Project → Unwind → Group → Sort → Limit → Result (document)
  10. An aggregation command looks like:
      db.article.aggregate(
        { $project : { author : 1, tags : 1 } },
        { $unwind : "$tags" },
        { $group : { _id : "$tags", authors : { $addToSet : "$author" } } }
      );
  11. New helper method: .aggregate()
      db.article.aggregate(
        { $project : { author : 1, tags : 1 } },
        { $unwind : "$tags" },
        { $group : { _id : "$tags", authors : { $addToSet : "$author" } } }
      );
      Operator pipeline:
      db.runCommand({ aggregate : "article", pipeline : [ {$op1, $op2, ...} ] })
  12. Output document looks like this:
      {
        "result" : [
          { "_id" : "art", "authors" : [ "bill", "bob" ] },
          { "_id" : "sports", "authors" : [ "jane", "bob" ] },
          { "_id" : "food", "authors" : [ "jane", "bob" ] },
          { "_id" : "science", "authors" : [ "jane", "bill", "bob" ] }
        ],
        "ok" : 1
      }
      - result: array of pipeline output
      - ok: 1 for success, 0 otherwise
  13. Pipeline
      - Input to the start of the pipeline is a collection
      - Series of operators: each one filters or transforms its input
      - Passes output data to next operator in the pipeline
      - Output of the pipeline is the result document
      Kind of like UNIX:
      ps -ax | tee processes.txt | more
  14. Let’s do:
      1. Tour of the pipeline operators: $match, $unwind, $group, $project, $skip, $limit, $sort
      2. A couple of examples based on common SQL aggregation tasks
  15. $match: filters documents from pipeline with a query predicate
      Input:
      {author: "bob", pageViews: 5, title: "Lorem Ipsum..."}
      {author: "bill", pageViews: 3, title: "dolor sit amet..."}
      {author: "joe", pageViews: 52, title: "consectetur adipi..."}
      {author: "jane", pageViews: 51, title: "sed diam..."}
      {author: "bob", pageViews: 14, title: "magna aliquam..."}
      {author: "bob", pageViews: 53, title: "claritas est..."}
      Filtered with {$match: {author: "bob"}}:
      {author: "bob", pageViews: 5, title: "Lorem Ipsum..."}
      {author: "bob", pageViews: 14, title: "magna aliquam..."}
      {author: "bob", pageViews: 53, title: "claritas est..."}
      Filtered with {$match: {pageViews: {$gt: 50}}}:
      {author: "joe", pageViews: 52, title: "consectetur adipiscing..."}
      {author: "jane", pageViews: 51, title: "sed diam..."}
      {author: "bob", pageViews: 53, title: "claritas est..."}
  16. $unwind: produce a new document for each value in an input array
      Input:
      { "_id" : ObjectId("4f...146"), "author" : "bob", "tags" : [ "fun", "good", "awesome" ] }
      Explode the "tags" array with:
      { $unwind : "$tags" }
      Produces output:
      { _id : ObjectId("4f...146"), author : "bob", tags : "fun" },
      { _id : ObjectId("4f...146"), author : "bob", tags : "good" },
      { _id : ObjectId("4f...146"), author : "bob", tags : "awesome" }
  17. $group: bucket a subset of docs together, calculate an aggregated output doc from the bucket
      Output calculation operators: $sum, $max, $min, $avg, $first, $last, $addToSet, $push
      db.article.aggregate(
        { $group : { _id : "$author", viewsPerAuthor : { $sum : "$pageViews" } } }
      );
  18. $group (cont’d)
      db.article.aggregate(
        { $group : { _id : "$author", viewsPerAuthor : { $sum : "$pageViews" } } }
      );
      - _id: selects a field to use as bucket key for grouping. Also allowed here: dot notation (nested fields), a constant, a multi-key expression inside {...}
      - viewsPerAuthor: output field name
      - $sum: operation used to calculate the output value ($sum, $max, $avg, etc.)
  19. An example with $match and $group
      English: Find the sum of all prices of the orders placed by customer #4
      SQL:
      SELECT SUM(price) FROM orders WHERE customer_id = 4;
      MongoDB:
      db.orders.aggregate(
        { $match : { customer_id : 4 } },
        { $group : { _id : null, total : { $sum : "$price" } } }
      )
  20. An example with $unwind and $group
      English: For all tags used in blog posts, produce a list of authors that have posted under each tag
      SQL:
      SELECT tag, author FROM post_tags LEFT JOIN posts ON post_tags.post_id = posts.id GROUP BY tag, author;
      MongoDB:
      db.posts.aggregate(
        { $unwind : "$tags" },
        { $group : { _id : "$tags", authors : { $addToSet : "$author" } } }
      );
  21. More operators - Controlling Pipeline Input
      $skip, $limit, $sort
      Similar to .skip(), .limit(), .sort() in a regular Mongo query
  22. $sort: order input documents
      - Specified the same way as index keys: { $sort : { name : 1, age : -1 } }
      - Must be used in order to take advantage of $first/$last with $group
  23. $limit: limit the number of input documents
      {$limit : 5}
      $skip: skips over documents
      {$skip : 5}
  24. $project: reshape a document
      Use for:
      - Add, remove, pull up, push down, rename fields
      - Building computed fields
  25. $project (cont’d): include or exclude fields
      {$project : { title : 1, author : 1 } }
        Only pass on fields "title" and "author"
      {$project : { comments : 0 } }
        Exclude "comments" field, keep everything else
  26. $project (cont’d): moving + renaming fields
      {$project : {
        page_views : "$pageViews",
        catName : "$category.name",
        info : { published : "$ctime", update : "$mtime" }
      }}
      - Rename pageViews to page_views
      - Take nested field "category.name", move it into top-level field called "catName"
      - Populate a new sub-document into the output
  27. $project (cont’d): building a computed field
      db.article.aggregate(
        { $project : { name : 1, age_fixed : { $add : ["$age", 2] } } }
      );
      - Output (computed field): age_fixed
      - Expression: $add
      - Operands: "$age", 2
  28. $project (cont’d): lots of available expressions
      - Numeric: $add $sub $mod $divide $multiply
      - Logical: $eq $lte/$lt $gte/$gt $and $not $or $eq
      - Dates: $dayOfMonth $dayOfYear $dayOfWeek $second $minute $hour $week $month $isoDate
      - Strings: $substr $add $toLower $toUpper $strcasecmp
  29. Example: $sort → $limit → $project → $group
      English: Of the most recent 1000 blog posts, how many were posted within each calendar year?
      SQL:
      SELECT YEAR(pub_time) as pub_year, COUNT(*) FROM (SELECT pub_time FROM posts ORDER BY pub_time desc LIMIT 1000) GROUP BY pub_year;
      MongoDB:
      db.test.aggregate(
        { $sort : { pub_time : -1 } },
        { $limit : 1000 },
        { $project : { pub_year : { $year : ["$pub_time"] } } },
        { $group : { _id : "$pub_year", num_year : { $sum : 1 } } }
      )
  30. Some Usage Notes
      - In BSON, order matters, so computed fields always show up after regular fields
      - We use $ in front of field names to distinguish fields from string literals in expressions: "$name" vs. "name"
  31. Some Usage Notes
      - Use a $match, $sort and $limit first in pipeline if possible
      - Cumulative operators ($group): be aware of memory usage
      - Use $project to discard unneeded fields
      - Remember the 16MB output limit
  32. Aggregation vs. MapReduce
      - Framework is geared towards counting/accumulating
      - If you need something more exotic, use MapReduce
      - No 16MB constraint on output size with MapReduce
      - JS in M/R is not limited to any fixed set of expressions
  33. thanks! ✌(-‿-)✌ questions?
      $$$ BTW: we are hiring! http://10gen.com/jobs $$$
      hit me up: @mpobrien, github.com/mpobrien
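
The $unwind and $group example from the slides can be followed without a running server. The sketch below is plain JavaScript that only mimics the data flow of those two stages in memory; it is not MongoDB's implementation. The helper names unwind and groupAddToSet and the sample posts are hypothetical and chosen for illustration.

```javascript
// Hypothetical in-memory stand-ins for two pipeline stages. This is not
// MongoDB's implementation, only an illustration of how documents flow.

// Mimics { $unwind : "$<field>" }: one output document per array element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

// Mimics { $group : { _id : "$<keyField>",
//                     <outField> : { $addToSet : "$<valueField>" } } }.
function groupAddToSet(docs, keyField, valueField, outField) {
  const buckets = new Map();
  for (const doc of docs) {
    if (!buckets.has(doc[keyField])) {
      buckets.set(doc[keyField], new Set());
    }
    buckets.get(doc[keyField]).add(doc[valueField]);
  }
  return Array.from(buckets, ([key, values]) => ({
    _id: key,
    [outField]: Array.from(values),
  }));
}

// Made-up sample data, loosely modeled on the blog posts in the slides.
const posts = [
  { author: "bob", tags: ["art", "sports"] },
  { author: "jane", tags: ["sports", "food"] },
  { author: "bob", tags: ["food"] },
];

// Each stage consumes the previous stage's output, as in the pipeline.
const result = groupAddToSet(unwind(posts, "tags"), "tags", "author", "authors");
console.log(JSON.stringify(result));
```

Each function takes the previous stage's output as its input, which is the same chaining idea as the UNIX pipe comparison in the slides. In a real deployment the element order inside a set built by $addToSet is not something to rely on; the sketch simply keeps insertion order.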