Speaker: Austin Zellner, Solutions Architect, MongoDB
Level: 200 (Intermediate)
Track: Data Analytics
$lookup is a pipeline stage in the aggregation framework that performs a left outer join. In this session, you will learn how to leverage $lookup in your applications and best practices for implementing features with $lookup.
What You Will Learn:
- Fundamentals of $lookup and its syntax.
- How to use $lookup stages in your aggregation pipelines.
- Best practices for using $lookup to implement application features.
3. #MDBW17
$LOOKUP
“Performs a left outer join to an unsharded collection in the same database to filter
in documents from the “joined” collection for processing.
The $lookup stage does an equality match between a field from the
input documents with a field from the documents of the “joined” collection.
To each input document, the $lookup stage adds a new array field
whose elements are the matching documents from the “joined” collection.
The $lookup stage passes these reshaped documents to the next stage.”
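As a rough illustration of the syntax the quote describes, the stage can be written as a Python dictionary of the kind a driver such as PyMongo accepts in a pipeline list. The collection and field names (`orders`, `customers`, `customer_id`, `customer_docs`) are hypothetical and do not come from the talk.

```python
# Sketch of a $lookup stage expressed as a plain Python dict.
# All collection and field names below are hypothetical.
lookup_stage = {
    "$lookup": {
        "from": "customers",          # the "joined" collection (same database)
        "localField": "customer_id",  # field on the input documents
        "foreignField": "_id",        # field on the joined collection's documents
        "as": "customer_docs",        # new array field added to each input document
    }
}

# The stage is one element of an aggregation pipeline (a list of stages).
pipeline = [lookup_stage]
```

With PyMongo, a list like this would typically be passed to `aggregate()` on the input collection (here, the assumed `orders` collection).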
SO WHAT’S THE POINT? DEV PRODUCTIVITY
Before $lookup, combining data from multiple collections had to be done on the client side…
After $lookup, you can combine data on the server side, saving development and execution time…
GROKKING $LOOKUP
• Go and read the code on GitHub
‒ Start up
‒ Outer loop -> for each document in the incoming set
o Inner loop -> for each document in the joined collection whose field matches
• Match? Yes – add to the output array
o Next
‒ Next
‒ Apply some optimizations and put the modified set into the out tray of the pipeline
‒ Clean up
‒ Clean up
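The loop structure above can be sketched in plain Python. This is a simplified emulation for building intuition, not the server's actual implementation; the function name `emulate_lookup` and the sample data are made up for this example.

```python
def emulate_lookup(input_docs, joined_docs, local_field, foreign_field, as_field):
    """Left outer join by nested loops; returns reshaped copies of input_docs."""
    output = []
    for doc in input_docs:                    # outer loop: incoming documents
        matches = []
        for candidate in joined_docs:         # inner loop: joined collection
            if candidate.get(foreign_field) == doc.get(local_field):
                matches.append(candidate)     # match: add to the output array
        reshaped = dict(doc)                  # copy so the input is left untouched
        reshaped[as_field] = matches          # stays an empty list if nothing matched
        output.append(reshaped)
    return output


# Hypothetical sample data.
orders = [{"_id": 1, "customer_id": "a"}, {"_id": 2, "customer_id": "z"}]
customers = [{"_id": "a", "name": "Ada"}, {"_id": "b", "name": "Bob"}]

joined = emulate_lookup(orders, customers, "customer_id", "_id", "customer_docs")
```

Because this is a left outer join, the second order is kept even though no customer matches it; its `customer_docs` array is simply empty.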
AGGREGATION BUDDIES
• Typically used with:
‒ $match – limit the records coming into the lookup
‒ $project – mutate the record shape
‒ $unwind – make it easier to work with the array that is generated
‒ $lookup – additional $lookup stages for linking against multiple collections to build more complex shapes
• Always place your $lookup AFTER your filtering operations!
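A pipeline that combines these stages might look like the following sketch. The `orders` and `customers` collections and every field name are assumptions made for illustration.

```python
# Hypothetical pipeline for an "orders" collection; all names are illustrative.
pipeline = [
    # 1. Filter first so fewer documents reach the join.
    {"$match": {"status": "shipped"}},
    # 2. Join each remaining order to its customer.
    {"$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }},
    # 3. Turn the one-element array into an embedded document.
    {"$unwind": "$customer"},
    # 4. Reshape the result to just the fields the application needs.
    {"$project": {"_id": 0, "order_total": 1, "customer_name": "$customer.name"}},
]

# Stage names in order, handy for a quick sanity check of the ordering rule.
stage_names = [next(iter(stage)) for stage in pipeline]
```

Note that a plain `$unwind` drops documents whose array is empty, so orders without a matching customer disappear at step 3 unless `preserveNullAndEmptyArrays` is set.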
SO WHAT’S THIS GOING TO COST ME
• Rough calculation of $lookup cost:
• Model the cost as the number of IOPS needed to read the set of documents that enter the first stage of the pipeline.
• For example, if the first stage is a $match, assume 1 IOPS for each document that meets the match criteria.
• Then add the IOPS for the $lookup: estimate the number of documents entering the $lookup stage and the average number of documents that match each join. The $lookup IOPS are the number of documents entering the $lookup stage * the average number of documents joined to each document.
• This assumes that there is an index supporting the $lookup and the first $match as well.
LET’S TRY THAT AGAIN
• $match: 1000 docs = 1000 IOPS
• $lookup: average 2 docs per order, so 2 * 1000 = 2000 IOPS
• Total: 3000 IOPS
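The same back-of-the-envelope arithmetic can be written as a small helper. The function name `estimate_lookup_iops` is invented for this sketch, and it carries the same assumption as the rule of thumb: both the initial $match and the $lookup are supported by indexes.

```python
def estimate_lookup_iops(docs_after_match, avg_joined_per_doc):
    """Rough IOPS estimate for a $match followed by a $lookup (both indexed)."""
    match_iops = docs_after_match                        # 1 IOPS per matched document
    lookup_iops = docs_after_match * avg_joined_per_doc  # joined documents read
    return match_iops + lookup_iops


# The worked example: 1000 matched orders, 2 joined documents per order on average.
total = estimate_lookup_iops(1000, 2)
```

This is only an order-of-magnitude model; it ignores caching and any later pipeline stages.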
DON’T FORGET ABOUT $GRAPHLOOKUP
• $graphLookup is just like $lookup, but for graph data
• There is a trick – you can apply more filters, and set maxDepth to 0
• This gives you a combo of a $match and $lookup in one punch
Try this at home!
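One possible reading of the trick is sketched below as a Python dict. It assumes that "more filters" refers to the `restrictSearchWithMatch` option of $graphLookup; the slide does not name the option, and the collection and field names are hypothetical.

```python
# Sketch of a non-recursive $graphLookup that also filters the joined documents.
# Collection and field names are hypothetical.
graph_lookup_stage = {
    "$graphLookup": {
        "from": "customers",
        "startWith": "$customer_id",        # value taken from each input document
        "connectFromField": "referred_by",  # required by the stage, not followed at depth 0
        "connectToField": "_id",            # field matched against the startWith value
        "as": "customer_docs",
        "maxDepth": 0,                      # no recursion: behaves like a plain $lookup
        "restrictSearchWithMatch": {"active": True},  # extra filter on joined documents
    }
}
```

With `maxDepth` set to 0, only the initial match against `startWith` is performed, so the stage acts as a join whose results are filtered in the same step.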
CONCEPTUAL APPROACH
• Continue to design schema to optimize for use cases
• When you have a use case that is solved by combining data from two collections, do the following:
‒ 1. Check to see if you really should just refactor your collection
‒ 2. Make sure you are adding indexes to support your $lookup
‒ 3. Create your aggregation leveraging $lookup
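To see why step 2 matters, here is a toy comparison in plain Python. A dictionary stands in for an index on the joined collection's match field; real MongoDB indexes are B-tree structures, so this is only an analogy, and all names and data are invented.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each field value to the documents holding it (a stand-in for an index)."""
    index = defaultdict(list)
    for doc in docs:
        index[doc.get(field)].append(doc)
    return index


def scan_lookup(docs, field, value):
    """Unindexed join step: every document in the joined collection is examined."""
    examined, matches = 0, []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


def indexed_lookup(index, value):
    """Indexed join step: only the matching documents are touched."""
    matches = index.get(value, [])
    return matches, len(matches)


# Hypothetical joined collection of 1000 customers.
customers = [{"_id": i, "name": f"customer-{i}"} for i in range(1000)]
index = build_index(customers, "_id")

scan_matches, scan_examined = scan_lookup(customers, "_id", 42)
idx_matches, idx_examined = indexed_lookup(index, 42)
```

Both approaches return the same document, but the scan examines the whole collection for every input document, which is the cost an index on the `foreignField` is meant to avoid.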
BEST PRACTICE
• Take a look back at old file-based design patterns: master file and utility files. $lookup is great for supporting more detail in reports and analytics.
• Use it to solve problems of reference and keep document sizes optimized
‒ Time series data, IoT
• $lookup should be a red flag from a design perspective, but when you are using it to combine well-designed, well-indexed schemas, you are good to go