This deck consists of following operations in MongoDB: aggregation through aggregation pipeline, map reduce, operations, indexes and profiling of slow queries.
2. Agenda
● Recap of insertion and finding documents
● Aggregation Framework
o The Aggregation Pipeline
o Aggregation Operators
o Mapping between SQL and Aggregation
o Limitations of the Aggregation Framework
● Map-reduce Queries
● Performance
o Indexes
o Logging slow queries
o Profiling
4. Aggregation Pipeline
The MongoDB aggregation pipeline consists of stages. Documents pass
through each stage and output of each stage is input of next stage.
Some stages may generate new documents or filter out documents.
Pipeline stages can appear multiple times in the pipeline.
What is aggregation pipeline?
14. Limitations of aggregation framework
● Result Size Restrictions
● Memory Restrictions
● Limited use of index
● Sharded collections
15. Exercises
1. Count number of students in group by age
2. Find name of students group by age
3. Find average age of students group by city
4. Count number of students in group by age belonging to
Chennai.
17. What Documentation says?
Map-reduce is a data processing paradigm for
condensing large volumes of data into useful
aggregated results. For map-reduce operations,
MongoDB provides the mapReduce database
command.
21. map function
The map function is responsible for transforming each
input document into zero or more documents.
22. map function
• In the map function, reference the current document as this within the function.
• The map function should not access the database for any reason.
• The map function should be pure, or have no impact outside of the function (i.e. side
effects.)
• The map function may optionally call emit(key,value) any number of times to create
an output document associating key with value.
The following example will call emit either 0 or 1 time.
The
24. reduce function
● The reduce function should not access the database, even to perform read
operations.
● The reduce function should not affect the outside system.
● MongoDB will not call the reduce function for a key that has only a single value. The
values argument is an array whose elements are the value objects that are “mapped”
to the key.
● MongoDB can invoke the reduce function more than once for the same key. In this
case, the previous output from the reduce function for that key will become one of
the input values to the next reduce function invocation for that key.
● The reduce function must return an object whose type must be identical to the type
of the value emitted by the map function.
● The reduce function must be idempotent.
26. finalize function
The finalize function has the following prototype
It receives as its arguments a key value and the
reducedValue from the reduce function.
29. Index
Single field Index:
db.collectionName.createIndex({name: 1})
db.collection.createIndex( <key and index type specification>, <options> )
Given the following document in the friends collection:
{
"_id" : ObjectId(...),
"name" : "Alice",
"age" : 27
}
The following command creates an index on the name field:
db.friends.createIndex( { "name" : 1 })
30. Index
Compound Index:
Index structure holds references to multiple fields within a collection’s documents.
Consider following document in people collection:
{ "_id" : ObjectId(...),
"name" : "John Doe",
"dateOfBirth": Date(..)
"address": {
"street": "Main",
"zipcode": "53511",
"state": "WI"
}
}
db.people.createIndex({'name':1, 'dateOfBirth':1})
MongoDB imposes a limit of 31 fields for any compound index.
31. Index
Compound Index(cntd):
Prefixes in Compound indexes.
Index prefixes are the beginning subsets of indexed fields. For example, consider the following compound index:
{ "item": 1, "location": 1, "stock":1 }
The index has the following index prefixes:
{ item: 1 }
{ item: 1, location: 1 }
What about find({item: ‘Pen’, ‘stock’:{‘$gt’: 20}})
what about fields item and stock?
MongoDB can also use the index to support a query on item and stock fields since item field corresponds to a prefix.
However, the index would not be as efficient in supporting the query as would be an index on only item and stock.
However, MongoDB cannot use the index to support queries that include the following fields since without the item field, none
of the listed fields correspond to a prefix index:
the location field,
the stock field,
the location and stock field,
.
32. Index
Index on Embedded field:
Consider following document in people collection:
{
"_id" : ObjectId(...),
"name" : "John Doe",
"address": {
"street": "Main",
"zipcode": "53511",
"state": "WI"
}
}
db.people.createIndex({address.zipcode:1})
33. Index
Index on Embedded document:
Consider following document in people collection:
{
"_id" : ObjectId("5587afde264fb5b4e25b1556"),
"name" : "John Doe",
"address" : {
"city" : "New York",
"state" : "NY"
}
}
{
"_id" : ObjectId("5587affd264fb5b4e25b1557"),
"name" : "John",
"address" : {
"city" : "New Delhi",
"state" : "ND"
}
}
db.people.createIndex({address:1})
34. Index
Index on Embedded document(cntd.):
Important: When performing equality matches on embedded documents, field order matters.
i.e.
db.people.find({address:{'city': 'New York', state: 'NY'}}) will find result
{ "_id" : ObjectId("5587afde264fb5b4e25b1556"), "name" : "John Doe", "address" : { "city" :
"New York", "state" : "NY" } }
but
db.people.find({address:{'state':'NY', 'city': 'New York'}}) will not find any result
36. Profiling
Enabling Profiling:
db.setProfilingLevel(level)
db.setProfilingLevel(level, slowOpThresholdInMillis)
Profiling Levels:
The following profiling levels are available:
0 - the profiler is off, does not collect any data. mongod always writes operations longer than the slowOpThresholdMs
threshold to its log. This is the default profiler level.
1 - collects profiling data for slow operations only. By default slow operations are those slower than 100 milliseconds.
You can modify the threshold for “slow” operations with the slowOpThresholdMs runtime option or the setParameter
command. See the Specify the Threshold for Slow Operations section for more information.
2 - collects profiling data for all database operations.
38. Profiling
Viewing Profiling Data(cntd.):
To show five slowest queries run after a certain timestamp
db.system.profile.find({"ts": {"$gt": ISODate("2016-04-29T02:48:42.019Z")}},{ts:1, millis:1, command:1,
query:1, ns:1}).sort({millis:-1}).limit(5).pretty()