3. 4
• it is way too broad to tackle with one solution
• data maps so well to the document model
• needs for agility, performance and scaling
• Many (e)retailers are already using MongoDB
• Let's define the best ways and places for it!
Retail solution
4. 5
• Holds complex JSON structures
• Dynamic Schema for Agility
• complex querying and in-place updating
• Secondary, compound and geo indexing
• full consistency, durability, atomic operations
• Near linear scaling via sharding
• Overall, MongoDB is a unique fit!
MongoDB is a great fit
6. 7
build your data to fit your application
Relational MongoDB
{ customer_id : 1,
name : "Mark Smith",
city : "San Francisco",
orders: [ {
order_number : 13,
store_id : 10,
date: “2014-01-03”,
products: [
{SKU: 24578234,
Qty: 3,
Unit_price: 350},
{SKU: 98762345,
Qty: 1,
Unit_Price: 110}
]
},
{ <...> }
]
}
CustomerID First Name Last Name City
0 John Doe New York
1 Mark Smith San Francisco
2 Jay Black Newark
3 Meagan White London
4 Edward Danields Boston
Order Number Store ID Product Customer ID
10 100 Tablet 0
11 101 Smartphone 0
12 101 Dishwasher 0
13 200 Sofa 1
14 200 Coffee table 1
15 201 Suit 2
13. 14
• Single view of a product, one central catalog service
• Read volume high and sustained, 100k reads / s
• Write volume spikes up during catalog update
• Advanced indexing and querying
• Geographical distribution and low latency
• No need for a cache layer, CDN for assets
Merchandising - principles
14. 15
Merchandising - requirements
Requirement Example Challenge MongoDB
Single-view of product Blended description and
hierarchy of product to
ensure availability on all
channels
Flexible document-oriented
storage
High sustained read
volume with low latency
Constant querying from
online users and sales
associates, requiring
immediate response
Fast indexed querying,
replication allows local copy
of catalog, sharding for
scaling
Spiky and real-time write
volume
Bulk update of full catalog
without impacting
production, real-time touch
update
Fast in-place updating, real-
time indexing, , sharding for
scaling
Advanced querying Find product based on
color, size, description
Ad-hoc querying on any
field, advanced secondary
and compound indexing
15. 16
Merchandising - Product Page
Product
images
General
Informatio
n
List of
Variants
External
Informatio
n
Localized
Description
16. 17
> db.item.findOne()
{ _id: "301671", // main item id
department: "Shoes",
category: "Shoes/Women/Pumps",
brand: "Guess",
thumbnail: "http://cdn…/pump.jpg",
image: "http://cdn…/pump1.jpg", // larger version of thumbnail
title: "Evening Platform Pumps",
description: "Those evening platform pumps put the perfect
finishing touches on your most glamourous night-on-the-town
outfit",
shortDescription: "Evening Platform Pumps",
style: "Designer",
type: "Platform",
rating: 4.5, // user rating
lastUpdated: Date("2014/04/01"), // last update time
… }
Merchandising - Item Model
17. 18
• Get item by id
db.definition.findOne( { _id: "301671" } )
• Get item from Product Ids
db.definition.findOne( { _id: { $in: ["301671", "301672" ] } } )
• Get items by department
db.definition.find({ department: "Shoes" })
• Get items by category prefix
db.definition.find( { category: /^Shoes/Women/ } )
• Indices
productId, department, category, lastUpdated
Merchandising - Item Definition
18. 19
> db.variant.findOne()
{
_id: "730223104376", // the sku
itemId: "301671", // references item id
thumbnail: "http://cdn…/pump-red.jpg", // variant
specific
image: "http://cdn…/pump-red.jpg",
size: 6.0,
color: "Red",
width: "B",
heelHeight: 5.0,
lastUpdated: Date("2014/04/01"), // last update time
…
}
Merchandising – Variant Model
19. 20
• Get variant from SKU
db.variation.find( { _id: "730223104376" } )
• Get all variants for a product, sorted by SKU
db.variation.find( { productId: "301671" } ).sort( { _id: 1 } )
• Indices
productId, lastUpdated
Merchandising – Variant Model
20. 22
Per store Pricing could result in billions of documents,
unless you build it in a modular way
Price: {
_id: "sku730223104376_store123",
currency: "USD",
price: 89.95,
lastUpdated: Date("2014/04/01"), // last update time
…
}
_id: concatenation of item and store.
Item: can be an item id or sku
Store: can be a store group or store id.
Indices: lastUpdated
Merchandising – per store Pricing
21. 23
• Get all prices for a given item
db.prices.find( { _id: /^p301671_/ )
• Get all prices for a given sku (price could be at item level)
db.prices.find( { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ])
• Get minimum and maximum prices for a sku
db.prices.aggregate( { match }, { $group: { _id: 1, min: { $min: price },
max: { $max : price} } })
• Get price for a sku and store id (returns up to 4 prices)
db.prices.find( { _id: { $in: [ "sku730223104376_store1234",
"sku730223104376_sgroup0",
"p301671_store1234",
"p301671_sgroup0"] , { price: 1 })
Merchandising – per store Pricing
22. 26
Merchandising – Browse and Search products
Browse by
category
Special
Lists
Filter by
attributes
Lists hundreds
of item
summaries
Ideally a single query is issued to the database
to obtain all items and metadata to display
23. 27
The previous page presents many challenges:
• Response within milliseconds for hundreds of items
• Faceted search on many attributes: category, brand, …
• Attributes at the variant level: color, size, etc, and the
variation's image should be shown
• thousands of variants for an item, need to de-duplicate
• Efficient sorting on several attributes: price, popularity
• Pagination feature which requires deterministic ordering
Merchandising – Browse and Search products
24. 28
Merchandising – Browse and Search products
Hundreds
of sizes
One Item
Dozens of
colors
A single item may have thousands of variants
25. 29
Merchandising – Browse and Search products
Images of the matching
variants are displayed
Hierarchy
Sort
parameter
Faceted
Search
26. 30
Merchandising – Traditional Architecture
Relational DB
System of Records
Full Text Search
Engine
Indexing
#1 obtain
search
results IDs
ApplicationCache
#2 obtain
objects by
ID
Pre-joined
into objects
27. 31
The traditional architecture issues:
• 3 different systems to maintain: RDBMS, Search
engine, Caching layer
• search returns a list of IDs to be looked up in the cache,
increases latency of response
• RDBMS schema is complex and static
• The search index is expensive to update
• Setup does not allow efficient pagination
Merchandising – Traditional Architecture
29. 33
The summary relies on the following parameters:
• department e.g. "Shoes"
• An indexed attribute
– Category path, e.g. "Shoes/Women/Pumps"
– Price range
– List of Item Attributes, e.g. Brand = Guess
– List of Variant Attributes, e.g. Color = red
• A non-indexed attribute
– List of Item Secondary Attributes, e.g. Style = Designer
– List of Variant Secondary Attributes, e.g. heel height = 4.0
• Sorting, e.g. Price Low to High
Merchandising – Summary Model
31. 35
• Get summary from item id
db.variation.find({ _id: "p301671" })
• Get summary's specific variation from SKU
db.variation.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )
• Get summary by department, sorted by rating
db.variation.find( { department: "Shoes" } ).sort( { rating: 1 } )
• Get summary with mix of parameters
db.variation.find( { department : "Shoes" ,
"vars.attrs" : { "color" : "Gray"} ,
"category" : ^/Shoes/Women/ ,
"price" : { "$gte" : 65.99 , "$lte" : 180.99 } } )
Merchandising - Summary Model
32. 36
Merchandising – Summary Model
• The following indices are used:
– department + attr + category + _id
– department + vars.attrs + category + _id
– department + category + _id
– department + price + _id
– department + rating + _id
• _id used for pagination
• Can take advantage of index intersection
• With several attributes specified (e.g. color=red
and size=6), which one is looked up?
33. 37
Facet samples:
{ "_id" : "Accessory Type=Hosiery" , "count" : 14}
{ "_id" : "Ladder Material=Steel" , "count" : 2}
{ "_id" : "Gold Karat=14k" , "count" : 10138}
{ "_id" : "Stone Color=Clear" , "count" : 1648}
{ "_id" : "Metal=White gold" , "count" : 10852}
Single operations to insert / update:
db.facet.update( { _id: "Accessory Type=Hosiery" },
{ $inc: 1 }, true, false)
The facet with lowest count is the most restrictive…
It should come first in the query!
Merchandising – Facet