MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the tradeoff of various data modeling strategies in MongoDB using a library as a sample application. You will learn how to work with documents, evolve your schema, and common schema design patterns.
5. Aggregation - PipelinesAggregation - Pipelines
• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Conceptually, the members of a collection
are passed through a pipeline to produce
a result
– Similar to a Unix command-line pipe
9. Pipeline OperationsPipeline Operations
• $project
– Uses a sample document to determine the
shape of the result (similar to .find()’s 2nd
optional argument)
• Include or exclude fields
• Compute new fields
– Arithmetic expressions, including built-in functions
– Pull fields from nested documents to the top
– Push fields from the top down into new virtual documents
10. Pipeline OperationsPipeline Operations
• $unwind
– Hands out array elements one at a time
{ $unwind : {"$myarray" } }
• $unwind “streams” arrays
– Array values are doled out one at time in the
context of their surrounding document
– Makes it possible to filter out elements before
returning
12. GroupingGrouping
• $group aggregation expressions
– Define a grouping key as the _id of the result
– Total grouped column values: $sum
– Average grouped column values: $avg
– Collect grouped column values in an array or
set: $push, $addToSet
– Other functions
• $min, $max, $first, $last
13. Pipeline OperationsPipeline Operations
• $sort
– Sort documents
– Sort specifications are the same as today,
e.g., $sort:{ key1: 1, key2: -1, …}
{ $sort : {“total”:-1} }
16. Computed ExpressionsComputed Expressions
• Available in $project operations
• Prefix expression language
– Add two fields: $add:[“$field1”, “$field2”]
– Provide a value for a missing field: $ifNull:
[“$field1”, “$field2”]
– Nesting: $add:[“$field1”, $ifNull:[“$field2”,
“$field3”]]
(continued)
17. Computed ExpressionsComputed Expressions
(continued)(continued)
• String functions
– toUpper, toLower, substr
• Date field extraction
– Get year, month, day, hour, etc, from ISODate
• Date arithmetic
• Null value substitution (like MySQL ifnull(),
Oracle nvl())
• Ternary conditional
– Return one of two values based on a predicate
• Other functions….
– And we can easily add more as required
18. Sample data
Original
Event
Data
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif
HTTP/1.0" 200 2326 “http://www.example.com/start.html" "Mozilla/4.08
[en] (Win98; I ;Nav)”
As JSON doc = {
_id: ObjectId('4f442120eb03305789000000'),
host: "127.0.0.1",
time: ISODate("2000-10-10T20:55:36Z"),
path: "/apache_pb.gif",
referer: “http://www.example.com/start.html",
user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)”
}
Insert to
MongoDB
db.logs.insert( doc )
19. Dynamic Queries
Find all
logs for
a URL
db.logs.find( { ‘path’ : ‘/index.html’ } )
Find all
logs for
a time
range
db.logs.find( { ‘time’ :
{ ‘$gte’ : new Date(2012,0),
‘$lt’ : new Date(2012,1) } } );
Find all
logs for
a host
over a
range of
dates
db.logs.find( {
‘host’ : ‘127.0.0.1’,
‘time’ : { ‘$gte’ : new Date(2012,0),
‘$lt’ : new Date(2012, 1) } } );