Doing Joins in MongoDB: Best Practices for Using $lookup

#MDBW17
Senior Solution Architect, MongoDB
AUSTIN ZELLNER

WHY WE’RE HERE…$LOOKUP
• Fundamentals of $lookup and its syntax.
• How to use $lookup stages in your aggregation pipelines.
• Best practices for using $lookup to implement application features.

Finally, I can do joins in MongoDB…

JUST WHAT IS MONGODB UNDER THE HOOD?
[Diagram: server with a listener and an index mapping _id to file location; labels: Executes, JavaScript, C++, Node]

$LOOKUP
“Performs a left outer join to an unsharded collection in the same database to filter in documents from the “joined” collection for processing. The $lookup stage does an equality match between a field from the input documents with a field from the documents of the “joined” collection. To each input document, the $lookup stage adds a new array field whose elements are the matching documents from the “joined” collection. The $lookup stage passes these reshaped documents to the next stage.”
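A minimal in-memory sketch of the behavior quoted above, written in Python: every input document is kept (left outer join) and gains a new array holding the joined documents whose foreign field equals its local field. The function name `lookup` and its parameters are hypothetical; values are compared with Python's `==` rather than BSON comparison rules, and array-valued fields get no special treatment here. A missing local field is treated as null, so it matches joined documents whose foreign field is null or missing, mirroring the documented $lookup behavior.

```python
from typing import Any

Doc = dict[str, Any]


def lookup(input_docs: list[Doc], joined_docs: list[Doc],
           local_field: str, foreign_field: str, as_field: str) -> list[Doc]:
    """In-memory sketch of $lookup's equality-match, left-outer-join semantics."""
    output: list[Doc] = []
    for doc in input_docs:
        # A missing local field behaves like null, so it matches joined
        # documents whose foreign field is null or missing (dict.get -> None).
        key = doc.get(local_field)
        matches = [j for j in joined_docs if j.get(foreign_field) == key]
        # Left outer join: every input document is passed on, with a new
        # array field that is empty when nothing matched. The input dict
        # itself is not modified.
        output.append({**doc, as_field: matches})
    return output
```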

$LOOKUP COMMAND SYNTAX
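The equality-match form of $lookup takes four fields: `from`, `localField`, `foreignField`, and `as`. As a sketch, the helper below builds that stage as a plain Python dict, the shape a driver such as PyMongo passes to `aggregate()`; the helper name `lookup_stage` is hypothetical.

```python
from typing import Any


def lookup_stage(from_collection: str, local_field: str,
                 foreign_field: str, as_field: str) -> dict[str, Any]:
    """Build the equality-match form of a $lookup stage as a plain dict."""
    return {
        "$lookup": {
            "from": from_collection,        # collection to join (same database)
            "localField": local_field,      # field from the input documents
            "foreignField": foreign_field,  # field from the "from" collection's documents
            "as": as_field,                 # name of the new output array field
        }
    }
```

Because `from` is a Python keyword, the sketch names that parameter `from_collection`.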

EXAMPLE – SHOW INVENTORY FOR ORDERS
Orders
• ID
• Item
• Price
• Quantity
Inventory
• ID
• SKU
• Description
• Instock
Join: Orders.Item = Inventory.SKU

Orders
• ID
• Item
• Price
• Quantity
Where Inventory.SKU = Orders.Item
Inventory
• ID
• SKU
• Description
• Instock
Orders
• ID
• Item
• Price
• Quantity
• Inventory_docs[] (array of the matching Inventory documents)
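The orders/inventory example can be sketched end to end in Python. The sample documents are hypothetical, the field names (`item`, `sku`, `instock`, `inventory_docs`) are assumed lower-case versions of the slide's labels, and `apply_lookup` is a hypothetical helper that reads the stage spec and joins in memory instead of calling a server.

```python
from typing import Any

# Hypothetical sample data; field names are lower-cased versions of the slide's labels.
db_collections: dict[str, list[dict[str, Any]]] = {
    "orders": [
        {"_id": 1, "item": "almonds", "price": 12, "quantity": 2},
        {"_id": 2, "item": "pecans", "price": 20, "quantity": 1},
    ],
    "inventory": [
        {"_id": 1, "sku": "almonds", "description": "product 1", "instock": 120},
        {"_id": 2, "sku": "bread", "description": "product 2", "instock": 80},
    ],
}

# "Where Inventory.SKU = Orders.Item", expressed as a $lookup stage on orders.
stage = {"$lookup": {"from": "inventory", "localField": "item",
                     "foreignField": "sku", "as": "inventory_docs"}}


def apply_lookup(docs, stage, collections):
    """Hypothetical helper: run one equality-match $lookup stage in memory."""
    spec = stage["$lookup"]
    joined = collections[spec["from"]]
    return [
        {**doc, spec["as"]: [j for j in joined
                             if j.get(spec["foreignField"]) == doc.get(spec["localField"])]}
        for doc in docs
    ]


result = apply_lookup(db_collections["orders"], stage, db_collections)
# result[0]["inventory_docs"] holds the almonds document;
# result[1]["inventory_docs"] is [] because no inventory SKU is "pecans".
```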

SO WHAT’S THE POINT? DEV PRODUCTIVITY
Before $lookup, all combining of data from multiple collections had to be done on the client side…
After $lookup, you can combine data on the server side, saving dev and execution time…
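For contrast, here is a sketch of the client-side join that $lookup replaces: one query for the orders, a second query for the matching inventory using `$in`, then stitching in application code. Both queries are simulated with in-memory lists, and the function name `client_side_join` is hypothetical.

```python
from typing import Any

Doc = dict[str, Any]


def client_side_join(orders: list[Doc], inventory: list[Doc]) -> tuple[Doc, list[Doc]]:
    """Hypothetical sketch of an application-side join (the pre-$lookup pattern).

    `orders` stands for the result of the first query; `inventory` stands for
    the whole inventory collection, filtered here as the second query would be.
    """
    # Collect the join keys from the first query's results.
    skus = sorted({o["item"] for o in orders if "item" in o})
    # The filter the second round trip would send to the inventory collection.
    inventory_filter = {"sku": {"$in": skus}}
    wanted = set(skus)
    # Simulate that second query, grouping results by SKU for stitching.
    by_sku: dict[Any, list[Doc]] = {}
    for inv in inventory:
        if inv.get("sku") in wanted:
            by_sku.setdefault(inv["sku"], []).append(inv)
    # Stitch the two result sets together in application code.
    joined = [{**o, "inventory_docs": by_sku.get(o.get("item"), [])} for o in orders]
    return inventory_filter, joined
```

With $lookup, the second round trip and the stitching step move to the server.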

GROKKING $LOOKUP
• Go and read the $lookup source code on GitHub
‒ Start up
‒ Outer loop -> for each of the records in the incoming set
o Inner loop -> for each document in the joined collection that matches on the join field
• Match? Yes – add to the out array
o Next
‒ Next
‒ Apply some optimizations and put the modified set into the out tray of the pipeline for the next stage
‒ Clean up
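The outline above can be sketched as a naive nested-loop join in Python. This is a simplified model, not the server's code: the inner loop here scans the entire joined collection, which is the no-index worst case, and the function name and comparison counter are illustrative assumptions.

```python
from typing import Any

Doc = dict[str, Any]


def nested_loop_lookup(incoming: list[Doc], joined: list[Doc], local_field: str,
                       foreign_field: str, as_field: str) -> tuple[list[Doc], int]:
    """Naive nested-loop join following the slide's outline (no index).

    Returns the reshaped documents and how many comparisons were made."""
    out_tray: list[Doc] = []
    comparisons = 0
    for doc in incoming:                     # outer loop: each incoming document
        out_array: list[Doc] = []
        for candidate in joined:             # inner loop: scan the joined collection
            comparisons += 1
            if candidate.get(foreign_field) == doc.get(local_field):
                out_array.append(candidate)  # match? yes: add to the out array
        out_tray.append({**doc, as_field: out_array})
    return out_tray, comparisons
```

The comparison count grows as (incoming documents × joined documents), which is why filtering before the $lookup and indexing the join field matter.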

AGGREGATION BUDDIES
• Typically used with:
‒ $match – limit the records coming into the lookup
‒ $project – mutate the record shape
‒ $unwind – make it easier to work with the array that is generated
‒ $lookup – for linking against multiple collections to build more complex shapes
• Always place your $lookup AFTER your filtering operations!
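A sketch of a pipeline that follows this ordering, written as the list of stage dicts a Python driver would send. The collection and field names and the `quantity >= 2` filter are hypothetical.

```python
from typing import Any

# Hypothetical pipeline on an "orders" collection; names and filter are assumptions.
pipeline: list[dict[str, Any]] = [
    # $match first, so fewer documents reach the join.
    {"$match": {"quantity": {"$gte": 2}}},
    # $lookup: attach the matching inventory documents as an array.
    {"$lookup": {"from": "inventory", "localField": "item",
                 "foreignField": "sku", "as": "inventory_docs"}},
    # $unwind: one output document per element of inventory_docs.
    {"$unwind": "$inventory_docs"},
    # $project: reshape, pulling instock up from the joined document.
    {"$project": {"item": 1, "quantity": 1, "instock": "$inventory_docs.instock"}},
]


def stage_names(stages: list[dict[str, Any]]) -> list[str]:
    """Return the operator name of each stage, e.g. "$match"."""
    return [next(iter(stage)) for stage in stages]
```

With PyMongo this list would be passed as `db.orders.aggregate(pipeline)`. Note that `$unwind` drops documents whose `inventory_docs` array is empty unless `preserveNullAndEmptyArrays` is set to `true`.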

SO WHAT’S THIS GOING TO COST ME?
• Rough calculation of $lookup cost
• Model it as the # of IOPS needed to read the set of documents that enter the first stage of the pipeline, plus the IOPS for the $lookup.
• For example, if the first stage is a $match, then I would assume 1 IOPS for each document that meets the match criteria. I would then add the IOPS for the $lookup: estimate the number of documents entering the $lookup stage and the average number of documents that match each join. The $lookup IOPS would be the # of documents entering the $lookup stage * the average # of documents joined to each document. This assumes that indexes support both the first $match and the $lookup.

LET’S TRY THAT AGAIN
• $match: 1000 docs = 1000 IOPS
• $lookup: average 2 docs per order, so 2 * 1000 = 2000 IOPS
• So the total is 1000 + 2000 = 3000 IOPS
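The arithmetic above as a small Python function, under the same assumption as the cost model (indexes support both the first $match and the $lookup). The function name is hypothetical.

```python
def estimate_lookup_iops(docs_matched: int, docs_entering_lookup: int,
                         avg_joined_per_doc: float) -> float:
    """Rough cost model from the slides: one IOPS per document read by the
    first $match, plus one per document joined by the $lookup."""
    match_iops = docs_matched
    lookup_iops = docs_entering_lookup * avg_joined_per_doc
    return match_iops + lookup_iops


total = estimate_lookup_iops(1000, 1000, 2)  # 1000 + 2000 = 3000
```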

SERIOUSLY, INDEX…
Thanks, Guy Harrison!
http://guyharrison.net/
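To see why the index matters, here is a sketch in which a Python dict stands in for an index on the joined collection's foreign field: each incoming document does one probe instead of scanning the whole collection, and the documents read equal the number of matches, which is the term the IOPS model multiplies out. MongoDB's default indexes are B-trees, not hash tables, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Any

Doc = dict[str, Any]


def build_index(docs: list[Doc], field: str) -> dict[Any, list[Doc]]:
    """Hypothetical stand-in for an index: map each value of `field` to its documents.
    A missing field is indexed under None, matching the null semantics of $lookup."""
    index: defaultdict[Any, list[Doc]] = defaultdict(list)
    for d in docs:
        index[d.get(field)].append(d)
    return dict(index)


def indexed_lookup(incoming: list[Doc], index: dict[Any, list[Doc]],
                   local_field: str, as_field: str) -> tuple[list[Doc], int]:
    """One index probe per incoming document; counts the documents fetched."""
    out: list[Doc] = []
    docs_read = 0
    for doc in incoming:
        matches = list(index.get(doc.get(local_field), []))
        docs_read += len(matches)  # roughly one read per joined document
        out.append({**doc, as_field: matches})
    return out, docs_read
```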

DON’T FORGET ABOUT $GRAPHLOOKUP
• $graphLookup is like $lookup, but performs a recursive search over graph data
• There is a trick – you can apply more filters (restrictSearchWithMatch) and set the depth (maxDepth) to 0
• Gives you a combo of a $match and a $lookup in one punch
Try this at home!
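A sketch of the trick as a `$graphLookup` stage dict: `maxDepth: 0` stops the recursion after the first hop, so the stage behaves like an equality join, and `restrictSearchWithMatch` filters the joined documents in the same stage. The collection and field names and the `instock > 0` filter are hypothetical.

```python
from typing import Any

# Hypothetical stage on "orders"; collection, fields, and filter are assumptions.
graph_lookup_stage: dict[str, Any] = {
    "$graphLookup": {
        "from": "inventory",
        "startWith": "$item",         # value from the input document to match on
        "connectFromField": "sku",    # required, but no further hops at depth 0
        "connectToField": "sku",      # field in "inventory" compared to startWith
        "as": "inventory_docs",
        "maxDepth": 0,                # non-recursive: a single-hop join
        "restrictSearchWithMatch": {"instock": {"$gt": 0}},  # the extra $match
    }
}
```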

CONCEPTUAL APPROACH
• Continue to design your schema to optimize for your use cases
• When a use case is best solved by having data from two collections, do the following:
‒ 1. Check whether you should really just refactor your collection
‒ 2. Make sure you add indexes to support your $lookup
‒ 3. Create your aggregation leveraging $lookup

BEST PRACTICE
• Take a look back at old file-based design patterns: master files and utility files. $lookup is great for adding more detail to reports and analytics.
• Use it to resolve references so that document sizes stay optimized
‒ Time series data, IoT
• Relying on $lookup should be a red flag from a design perspective, but when you are using it to combine a well-designed, well-indexed schema, you are good to go
