
MongoDB Aggregation Framework



  1. MongoDB's New Aggregation Framework (Tyler Brock)
  2. Available now in MongoDB 2.1 (unstable)
  3. Map/Reduce
     • Map/Reduce is a big hammer
     • Used to perform complex analytics tasks on massive amounts of data
     • Users are currently using it for aggregation: totaling, averaging, etc.
  4. Problem
     • It should be easier to do simple aggregations
     • Shouldn't need to write JavaScript
     • Avoid the overhead of the JavaScript engine
  5. New Aggregation Framework
     • Declarative
     • No JavaScript required
     • C++ implementation
     • Higher performance than JavaScript
     • Expression evaluation
     • Return computed values
     • Framework: we can add new operations easily
  6. Pipeline
     • Series of operations
     • Members of a collection are passed through a pipeline to produce a result
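The pipeline idea can be modeled in a few lines of plain JavaScript. This is a minimal sketch, not the server's C++ implementation: it assumes each stage is an ordinary function from an array of documents to a new array, and the stages `onlyBob` and `titlesOnly` and the sample articles are hypothetical.

```javascript
// Minimal in-memory model of an aggregation pipeline (illustrative only,
// not how the MongoDB server implements it). A stage is any function that
// takes an array of documents and returns a new array of documents.
function runPipeline(docs, stages) {
  // Each stage receives the previous stage's output, in order.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Two hypothetical stages for the demo.
const onlyBob = (docs) => docs.filter((d) => d.author === "bob");
const titlesOnly = (docs) => docs.map((d) => ({ title: d.title }));

// Hypothetical sample documents.
const articles = [
  { title: "first post", author: "bob", pageViews: 5 },
  { title: "second post", author: "jane", pageViews: 6 },
  { title: "third post", author: "bob", pageViews: 7 },
];

const result = runPipeline(articles, [onlyBob, titlesOnly]);
console.log(result); // the titles of bob's two articles
```

Because each stage only sees its predecessor's output, reordering stages can change both the result and how many documents later stages have to process.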
  7. The Aggregation Command
     • Takes two arguments
     • aggregate: name of the collection
     • pipeline: array of pipeline operators, one document per stage
     db.runCommand( { aggregate : "article", pipeline : [ { $op1 : { ... } }, { $op2 : { ... } }, ... ] } );
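Each element of `pipeline` is its own stage document holding exactly one operator; MongoDB rejects a stage object with more than one field. The hypothetical `checkPipeline` helper below (plain JavaScript, not a MongoDB API) encodes only that shape rule, to show why packing several operators into one document is not a valid pipeline.

```javascript
// Hypothetical helper (not a MongoDB API): checks only the *shape* of an
// aggregate command document. It does not know which operators exist; it
// enforces that `pipeline` is an array of single-key, $-prefixed stages.
function checkPipeline(command) {
  const errors = [];
  if (typeof command.aggregate !== "string") {
    errors.push("aggregate must be a collection name (string)");
  }
  if (!Array.isArray(command.pipeline)) {
    errors.push("pipeline must be an array");
    return errors;
  }
  command.pipeline.forEach((stage, i) => {
    const keys = stage !== null && typeof stage === "object" ? Object.keys(stage) : [];
    if (keys.length !== 1) {
      errors.push(`stage ${i} must have exactly one operator, found ${keys.length}`);
    } else if (!keys[0].startsWith("$")) {
      errors.push(`stage ${i} operator "${keys[0]}" must start with $`);
    }
  });
  return errors;
}

const good = {
  aggregate: "article",
  pipeline: [{ $match: { author: "bob" } }, { $limit: 5 }],
};
// Two operators packed into one stage document: not a valid pipeline.
const bad = {
  aggregate: "article",
  pipeline: [{ $match: { author: "bob" }, $limit: 5 }],
};

console.log(checkPipeline(good)); // []
console.log(checkPipeline(bad)); // one error for stage 0
```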
  8. Aggregation helper
     db.article.aggregate( { $pipeline_op1 }, { $pipeline_op2 }, { $pipeline_op3 }, { $pipeline_op4 }, ... );
  9. Pipeline Operators
     • Old faves: $match, $sort, $limit, $skip
     • New hotness: $project, $unwind, $group
  10. $match
     • Uses a query predicate (like .find({…})) as a filter
     Sample document:
       { title : "this is my title", author : "bob", posted : new Date(1079895594000),
         pageViews : 5, tags : [ "fun", "good", "fun" ] }
     Examples:
       { $match : { author : "bob" } }
       { $match : { pageViews : { $gt : 50, $lte : 90 } } }
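To show what the filter does to the document stream, here is a simplified in-memory model in plain JavaScript. It is a sketch under stated assumptions: only top-level fields, plain equality (with the array-membership rule), and the four range operators are supported; `match`, `fieldMatches`, and the sample articles are hypothetical.

```javascript
// Simplified model of $match (illustrative only): top-level fields, plain
// equality, and the range operators $gt, $gte, $lt, $lte. Real MongoDB
// query semantics (dotted paths, type ordering, many more operators) are
// much richer than this.
const rangeOps = {
  $gt: (v, x) => v > x,
  $gte: (v, x) => v >= x,
  $lt: (v, x) => v < x,
  $lte: (v, x) => v <= x,
};

function isRangeCondition(condition) {
  if (condition === null || typeof condition !== "object" || Array.isArray(condition)) {
    return false;
  }
  const ops = Object.keys(condition);
  return ops.length > 0 && ops.every((op) => op.startsWith("$") && op in rangeOps);
}

function fieldMatches(value, condition) {
  if (isRangeCondition(condition)) {
    // Every operator in e.g. { $gt: 50, $lte: 90 } must hold.
    return (
      value !== undefined &&
      Object.entries(condition).every(([op, x]) => rangeOps[op](value, x))
    );
  }
  // Equality; an array field matches if any element is equal, as in MongoDB.
  return Array.isArray(value) ? value.includes(condition) : value === condition;
}

function match(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, cond]) => fieldMatches(doc[field], cond))
  );
}

// Hypothetical sample documents.
const articles = [
  { title: "a", author: "bob", pageViews: 5, tags: ["fun", "good"] },
  { title: "b", author: "jane", pageViews: 60, tags: ["good"] },
  { title: "c", author: "bob", pageViews: 75, tags: ["fun"] },
];

console.log(match(articles, { author: "bob" }).map((d) => d.title));
console.log(match(articles, { pageViews: { $gt: 50, $lte: 90 } }).map((d) => d.title));
```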
  11. $sort
     • Sorts input documents
     • Requires a sort key, specified like index keys (1 ascending, -1 descending)
       { $sort : { name : 1, age : -1 } }
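A compound sort key is compared field by field, in the order the fields appear in the spec. The sketch below models that in plain JavaScript; `sortDocs` and the sample people are hypothetical, and it assumes values of one type per field (MongoDB's cross-type ordering is not modeled).

```javascript
// Illustrative compound sort driven by a $sort spec such as
// { name: 1, age: -1 }: keys are compared in the order they appear in the
// spec (JavaScript keeps insertion order for non-numeric keys), 1 means
// ascending and -1 descending. Only same-typed numbers or strings are
// compared sensibly; MongoDB's cross-type ordering is not modeled.
function compareValues(a, b) {
  if (a === b) return 0;
  return a < b ? -1 : 1;
}

function sortDocs(docs, spec) {
  const keys = Object.entries(spec);
  // Copy first so the caller's array is left untouched.
  return [...docs].sort((x, y) => {
    for (const [field, direction] of keys) {
      const c = compareValues(x[field], y[field]);
      if (c !== 0) return c * direction;
    }
    return 0; // all sort keys equal
  });
}

// Hypothetical sample documents.
const people = [
  { name: "ann", age: 30 },
  { name: "bob", age: 25 },
  { name: "ann", age: 40 },
];
console.log(sortDocs(people, { name: 1, age: -1 }));
```

Ties on `name` fall through to `age`, which is why the two "ann" documents come out oldest first.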
  12. $limit and $skip
     • $limit: limits the number of documents passed on
       { $limit : 5 }
     • $skip: skips a number of documents
       { $skip : 5 }
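Both stages act on whatever reaches them, so their order in the pipeline matters: $skip then $limit behaves like paging, while the reverse can discard everything. A minimal sketch modeling them as array slices in plain JavaScript (the helpers `skip`, `limit`, and `apply` are hypothetical):

```javascript
// $skip and $limit modeled as array slicing (illustrative only).
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);

// Run stages in order, each on the previous stage's output.
const apply = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

// Hypothetical input: ten documents numbered 0..9.
const numbers = Array.from({ length: 10 }, (_, i) => ({ n: i }));

// Skipping first pages through the data...
const skipThenLimit = apply(numbers, [skip(5), limit(3)]); // n = 5, 6, 7
// ...while limiting first leaves only 3 documents, all of which are skipped.
const limitThenSkip = apply(numbers, [limit(3), skip(5)]); // no documents
```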
  13. $project
     • Can reshape a document: add, remove, rename, or move fields
     • Similar to .find()'s field selection syntax, but much more powerful
     • Can generate computed values
  14. $project (include and exclude fields)
     { $project : {
         title : 1,            /* include this field, if it exists */
         author : 1,           /* include this field, if it exists */
         "comments.author" : 1
     } }
     { $project : {
         title : 0,            /* exclude this field */
         author : 0            /* exclude this field */
     } }
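The two forms above are an inclusion projection and an exclusion projection. A minimal plain-JavaScript sketch of both modes, assuming top-level fields only (the dotted `"comments.author"` form is not handled) and keeping `_id` unless it is explicitly set to 0; `project` and the sample article are hypothetical.

```javascript
// Illustrative model of inclusion/exclusion projection on top-level fields.
// Follows MongoDB's rule that _id is kept unless set to 0. Dotted paths such
// as "comments.author" are NOT handled, and mixing 1s and 0s (other than
// for _id) is not rejected as MongoDB would.
function project(doc, spec) {
  const isOn = (v) => v === 1 || v === true;
  const isOff = (v) => v === 0 || v === false;
  const fields = Object.entries(spec).filter(([k]) => k !== "_id");
  const keepId = !isOff(spec._id);
  const out = {};
  if (fields.some(([, v]) => isOn(v))) {
    // Inclusion mode: copy _id plus the listed fields that exist.
    if (keepId && "_id" in doc) out._id = doc._id;
    for (const [field, v] of fields) {
      if (isOn(v) && field in doc) out[field] = doc[field];
    }
  } else {
    // Exclusion mode: copy everything except the fields set to 0.
    for (const [field, value] of Object.entries(doc)) {
      const keep = field === "_id" ? keepId : !isOff(spec[field]);
      if (keep) out[field] = value;
    }
  }
  return out;
}

// Hypothetical sample document.
const article = { _id: 1, title: "this is my title", author: "bob", pageViews: 5 };
console.log(project(article, { title: 1, author: 1 }));
console.log(project(article, { title: 0, author: 0 }));
```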
  15. $project (computed fields)
     { $project : {
         title : 1,  /* include this field, if it exists */
         doctoredPageViews : { $add : [ "$pageViews", 10 ] }
     } }
  16. Computed Fields
     • Prefix expression language
     • Add two fields
       { $add : [ "$field1", "$field2" ] }
     • Provide a value for a missing field
       { $ifNull : [ "$field1", "$field2" ] }
     • Nesting
       { $add : [ "$field1", { $ifNull : [ "$field2", "$field3" ] } ] }
     • Date field extraction
       • Get year, month, day, hour, etc. from a Date
     • Date arithmetic
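Because the expression language is prefix-style, an evaluator is a short recursive function: evaluate the arguments, then apply the operator. The plain-JavaScript sketch below supports only field references, `$add`, and `$ifNull`; `evaluate` and `getPath` are hypothetical names, and the null handling of `$add` is this sketch's own choice.

```javascript
// Tiny evaluator for the prefix expression language above (illustrative
// only): "$path" reads a (possibly dotted) field, { $add: [...] } sums its
// arguments, and { $ifNull: [a, b] } yields b when a is null or missing.
// Other values are literals. In this sketch $add returns null when any
// argument is null or missing.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return getPath(doc, expr.slice(1)); // field reference
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op] = Object.keys(expr);
    // Arguments are evaluated recursively, which is what makes nesting work.
    const args = expr[op].map((arg) => evaluate(arg, doc));
    switch (op) {
      case "$add":
        return args.some((a) => a == null) ? null : args.reduce((s, a) => s + a, 0);
      case "$ifNull":
        return args[0] == null ? args[1] : args[0];
      default:
        throw new Error(`${op} is not supported in this sketch`);
    }
  }
  return expr; // literal, e.g. 10
}

// Hypothetical sample document (field2 is deliberately missing).
const doc = { pageViews: 5, field1: 1, field3: 7, other: { foo: 5 } };
console.log(evaluate({ $add: ["$pageViews", 10] }, doc)); // 15
console.log(evaluate({ $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] }, doc)); // 8
```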
  17. $project (rename and pull fields up)
     { $project : {
         title : 1,
         page_views : "$pageViews",  /* rename this field */
         upgrade : "$other.foo"      /* move to top level */
     } }
  18. $project (push fields down)
     { $project : {
         title : 1,
         stats : {
             pv : "$pageViews"  /* rename this from the top level */
         }
     } }
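Renaming, pulling up, and pushing down all reduce to one idea: the output field gets whatever value a `"$path"` reference resolves to. A minimal plain-JavaScript sketch, assuming `1` refers to a top-level field and nested objects contain only `"$path"` rules; `reshape`, `subDocument`, and `getPath` are hypothetical helpers, and default `_id` handling is not modeled.

```javascript
// Illustrative reshaping projection: 1 copies a top-level field, a "$path"
// string copies another (possibly nested) field under the new name, which
// covers renaming and pulling fields up, and a nested object builds a
// sub-document from "$path" rules, which pushes fields down. The default
// _id handling and other rules are not modeled.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

function reshape(doc, spec) {
  const out = {};
  for (const [field, rule] of Object.entries(spec)) {
    let value;
    if (rule === 1) {
      value = doc[field]; // keep the field under its own name
    } else if (typeof rule === "string" && rule.startsWith("$")) {
      value = getPath(doc, rule.slice(1)); // rename, or pull up a nested field
    } else if (rule !== null && typeof rule === "object") {
      value = subDocument(doc, rule); // push fields down
    } else {
      throw new Error(`unsupported rule for ${field}`);
    }
    if (value !== undefined) out[field] = value; // skip missing fields
  }
  return out;
}

function subDocument(doc, rules) {
  const sub = {};
  for (const [field, path] of Object.entries(rules)) {
    const value = getPath(doc, path.slice(1)); // rules here must be "$path" strings
    if (value !== undefined) sub[field] = value;
  }
  return sub;
}

// Hypothetical sample document.
const article = { _id: 1, title: "this is my title", pageViews: 5, other: { foo: 5 } };
console.log(reshape(article, { title: 1, page_views: "$pageViews", upgrade: "$other.foo" }));
console.log(reshape(article, { title: 1, stats: { pv: "$pageViews" } }));
```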
  19. $unwind
     • Produces one document for each element of an array field; in each output document, the array is replaced by that single element
     Sample document:
       { title : "this is my title", author : "bob", posted : new Date(1079895594000), pageViews : 5,
         tags : [ "fun", "good", "awesome" ],
         comments : [ { author : "joe", text : "this is cool" }, { author : "sam", text : "this is bad" } ],
         other : { foo : 5 } }
  20. Result of unwinding tags:
     { ... tags : "fun" ... }
     { ... tags : "good" ... }
     { ... tags : "awesome" ... }
  21. $unwind
     db.article.aggregate(
       { $project : {
           author : 1,  /* include this field */
           title : 1,   /* include this field */
           tags : 1     /* include this field */
       } },
       { $unwind : "$tags" }
     );
  22. { "result" : [
         { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "fun" },
         { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "good" },
         { "_id" : ObjectId("4e6e4ef557b77501a49233f6"), "title" : "this is my title", "author" : "bob", "tags" : "fun" }
       ],
       "ok" : 1 }
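The unwinding step can be sketched in plain JavaScript as a nested loop that copies the document once per array element. This assumes a top-level array field and plain-object documents (no MongoDB involved); `unwind` is a hypothetical name, and the input mirrors the sample article whose result is shown above.

```javascript
// Illustrative $unwind for a top-level array field written as "$tags":
// one output document per array element, each a shallow copy of the input
// with the array replaced by that element. Documents whose field is missing
// or not an array are simply dropped here; MongoDB's handling of those
// cases has differed between versions, so that part is an assumption.
function unwind(docs, path) {
  const field = path.slice(1); // "$tags" -> "tags"
  const out = [];
  for (const doc of docs) {
    const values = doc[field];
    if (!Array.isArray(values)) continue;
    for (const value of values) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

const article = {
  _id: 1,
  title: "this is my title",
  author: "bob",
  tags: ["fun", "good", "fun"],
};
const unwound = unwind([article], "$tags");
console.log(unwound.map((d) => d.tags)); // [ 'fun', 'good', 'fun' ]
```

Every output document keeps the same `_id`, which matches the repeated ObjectId in the result above.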
  23. Grouping
     • $group aggregation expressions
       • Total of column values: $sum
       • Average of column values: $avg
       • Collect column values in an array: $push
     { $group : { _id : "$author", fieldname : { $aggfunc : "$field" } } }
  24. $group example
     db.article.aggregate(
       { $group : {
           _id : "$author",
           viewsPerAuthor : { $sum : "$pageViews" }
       } }
     );
  25. { "result" : [
         { "_id" : "jane", "viewsPerAuthor" : 6 },
         { "_id" : "dave", "viewsPerAuthor" : 7 },
         { "_id" : "bob", "viewsPerAuthor" : 5 }
       ],
       "ok" : 1 }
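The grouping step itself can be sketched with a JavaScript `Map`: one running total per distinct `_id` value. `groupSum` and its parameters are hypothetical names for this illustration, and the three input documents are made up, so the totals differ from the result above.

```javascript
// Illustrative $group with one $sum accumulator, using a Map keyed by the
// group value (first-seen order). MongoDB does not promise any particular
// output order for $group. Like $sum, non-numeric values are ignored.
function groupSum(docs, keyField, sumField, outField) {
  const totals = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const v = doc[sumField];
    const add = typeof v === "number" ? v : 0;
    totals.set(key, (totals.has(key) ? totals.get(key) : 0) + add);
  }
  return [...totals].map(([key, total]) => ({ _id: key, [outField]: total }));
}

// Hypothetical sample documents.
const articles = [
  { author: "bob", pageViews: 5 },
  { author: "jane", pageViews: 2 },
  { author: "jane", pageViews: 4 },
];
console.log(groupSum(articles, "author", "pageViews", "viewsPerAuthor"));
```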
  26. Group Aggregation Functions
     $addToSet, $avg, $first, $last, $max, $min, $push, $sum
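Each of these functions reduces the list of values gathered for one group to a single output value. The plain-JavaScript sketch below gives one simplified version of each; the `accumulators` object and the sample values are hypothetical, and only the edge cases named in the comments are modeled.

```javascript
// Illustrative versions of the eight $group accumulators, each applied to
// the list of values gathered for one group, in input order (so $first and
// $last are only meaningful after a $sort). $sum and $avg ignore
// non-numeric values, as MongoDB does; $addToSet here dedupes with ===, so
// it only behaves sensibly for scalars, and MongoDB does not guarantee the
// order of $addToSet results. A group always has at least one value, so
// $min and $max never see an empty list.
const numeric = (vs) => vs.filter((v) => typeof v === "number");

const accumulators = {
  $sum: (vs) => numeric(vs).reduce((a, b) => a + b, 0),
  $avg: (vs) => {
    const nums = numeric(vs);
    return nums.length === 0 ? null : nums.reduce((a, b) => a + b, 0) / nums.length;
  },
  $min: (vs) => vs.reduce((a, b) => (b < a ? b : a)),
  $max: (vs) => vs.reduce((a, b) => (b > a ? b : a)),
  $first: (vs) => vs[0],
  $last: (vs) => vs[vs.length - 1],
  $push: (vs) => [...vs],
  $addToSet: (vs) => [...new Set(vs)],
};

// Hypothetical pageViews values for one group.
const values = [5, 2, 9, 2];
for (const [name, fn] of Object.entries(accumulators)) {
  console.log(name, fn(values));
}
```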
  27. Pulling it all together
     Input:
       { title : "this is my title", author : "bob", posted : new Date(1079895594000),
         pageViews : 5, tags : [ "fun", "good", "fun" ] }
     Desired output:
       { tag : "fun", authors : [ ..., ..., ... ] }
       { tag : "good", authors : [ ..., ..., ... ] }
  28. db.article.aggregate(
       { $project : { author : 1, tags : 1 } },
       { $unwind : "$tags" },
       { $group : {
           _id : { tags : "$tags" },
           authors : { $addToSet : "$author" }
       } }
     );
  29. "result" : [
         { "_id" : { "tags" : "cool" }, "authors" : [ "jane", "dave" ] },
         { "_id" : { "tags" : "fun" }, "authors" : [ "dave", "bob" ] },
         { "_id" : { "tags" : "good" }, "authors" : [ "bob" ] },
         { "_id" : { "tags" : "awful" }, "authors" : [ "jane" ] }
       ]
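The three stages can be chained in one plain-JavaScript function to see the data flow end to end. This is a sketch, not the server's behavior: groups are keyed as `{ tags: ... }` to match the result shown, `authorsByTag` is a hypothetical name, and the sample articles are made up.

```javascript
// End-to-end in-memory model of the pipeline above (illustrative only):
// $project { author, tags } -> $unwind "$tags" -> $group by tag with
// $addToSet on author.
function authorsByTag(articles) {
  // $project: keep only _id, author and tags.
  const projected = articles.map(({ _id, author, tags }) => ({ _id, author, tags }));
  // $unwind: one document per tag (documents without a tags array drop out).
  const unwound = projected.flatMap((doc) =>
    Array.isArray(doc.tags) ? doc.tags.map((tag) => ({ ...doc, tags: tag })) : []
  );
  // $group + $addToSet: collect distinct authors per tag.
  const groups = new Map();
  for (const { tags: tag, author } of unwound) {
    if (!groups.has(tag)) groups.set(tag, new Set());
    groups.get(tag).add(author);
  }
  return [...groups].map(([tag, authors]) => ({ _id: { tags: tag }, authors: [...authors] }));
}

// Hypothetical sample documents.
const articles = [
  { _id: 1, author: "bob", tags: ["fun", "good", "fun"] },
  { _id: 2, author: "dave", tags: ["fun", "cool"] },
  { _id: 3, author: "jane", tags: ["cool", "awful"] },
];
console.log(JSON.stringify(authorsByTag(articles)));
```

The repeated "fun" tag on the first article produces two unwound documents, but $addToSet keeps "bob" only once.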
  30. Usage Tips
     • Use $match in a pipeline as early as possible
     • The query optimizer can then choose an index and avoid scanning the entire collection
  31. Driver Support
     • Initial version is a command
     • For any language, build the command as a database object (document) and execute it
       { aggregate : <collection>, pipeline : [ ] }
     • Beware of the command result size limit: the whole result comes back in a single reply document, so it must fit within the 16 MB BSON document size limit
  32. Sharding Support
     • Initial release will support sharding
     • mongos analyzes the pipeline and forwards the operations up to the first $group or $sort to the shards, then combines the shard results and runs the rest of the pipeline
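The split described on this slide can be modeled as cutting the stage array at the first $group or $sort. This plain-JavaScript sketch reads "up to" as exclusive, which is an assumption (the real mongos may push part of a $group down to the shards); `splitPipeline` and the sample pipeline are hypothetical.

```javascript
// Simplified model of the split described above (illustrative only):
// stages before the first $group or $sort are sent to every shard, and
// mongos runs the remaining stages on the combined shard output. Reading
// "up to" as exclusive is an assumption of this sketch.
function splitPipeline(pipeline) {
  const cut = pipeline.findIndex((stage) => "$group" in stage || "$sort" in stage);
  const at = cut === -1 ? pipeline.length : cut;
  return { shardPart: pipeline.slice(0, at), mergePart: pipeline.slice(at) };
}

// Hypothetical pipeline.
const pipeline = [
  { $match: { author: "bob" } },
  { $unwind: "$tags" },
  { $group: { _id: "$tags", n: { $sum: 1 } } },
  { $sort: { n: -1 } },
];
const { shardPart, mergePart } = splitPipeline(pipeline);
console.log(shardPart.map((s) => Object.keys(s)[0])); // [ '$match', '$unwind' ]
console.log(mergePart.map((s) => Object.keys(s)[0])); // [ '$group', '$sort' ]
```

Under this model, a $match placed first always runs on the shards, so less data has to be shipped to mongos for merging.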
  33. Common SQL
     • Distinct
       aggregate({ $group : { _id : "$author" } })
     • Count
       aggregate({ $group : { _id : null, count : { $sum : 1 } } })
     • Sum
       aggregate({ $group : { _id : null, total : { $sum : "$price" } } })
