Retail Reference Architecture Part 3: Scalable Insight Component Providing User History, Recommendations and Personalization

Retail Reference Architecture
with MongoDB
Antoine Girbal
Principal Solutions Engineer, MongoDB Inc.
@antoinegirbal

4
MongoDB Strategic Advantages
Horizontally Scalable
-Sharding
Agile
Flexible
High Performance &
Strong Consistency
Application
Highly
Available
-Replica Sets
{ customer: "roger",
date: new Date(),
comment: "Spirited Away",
tags: ["Tezuka", "Manga"]}

5
Documents let you build your data to fit
your application
Relational vs. MongoDB
{ customer_id : 1,
name : "Mark Smith",
city : "San Francisco",
orders: [ {
order_number : 13,
store_id : 10,
date: "2014-01-03",
products: [
{SKU: 24578234,
Qty: 3,
Unit_Price: 350},
{SKU: 98762345,
Qty: 1,
Unit_Price: 110}
]
},
{ <...> }
]
}
Relational tables:
CustomerID  First Name  Last Name  City
0           John        Doe        New York
1           Mark        Smith      San Francisco
2           Jay         Black      Newark
3           Meagan      White      London
4           Edward      Danields   Boston
Order Number  Store ID  Product       Customer ID
10            100       Tablet        0
11            101       Smartphone    0
12            101       Dishwasher    0
13            200       Sofa          1
14            200       Coffee table  1
15            201       Suit          2
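A minimal sketch, in plain JavaScript, of how rows from the two relational tables could be folded into one document per customer. The helper name `embedOrders` and the camelCase row fields are assumptions made for this illustration; they do not come from the original slides.

```javascript
// Rows as they might come out of the two relational tables (subset).
const customerRows = [
  { customerId: 0, firstName: "John", lastName: "Doe", city: "New York" },
  { customerId: 1, firstName: "Mark", lastName: "Smith", city: "San Francisco" },
];
const orderRows = [
  { orderNumber: 10, storeId: 100, product: "Tablet", customerId: 0 },
  { orderNumber: 11, storeId: 101, product: "Smartphone", customerId: 0 },
  { orderNumber: 13, storeId: 200, product: "Sofa", customerId: 1 },
];

// Hypothetical helper: embed each customer's orders as an array in the customer document.
function embedOrders(customers, orders) {
  return customers.map((c) => ({
    customer_id: c.customerId,
    name: `${c.firstName} ${c.lastName}`,
    city: c.city,
    orders: orders
      .filter((o) => o.customerId === c.customerId)
      .map((o) => ({ order_number: o.orderNumber, store_id: o.storeId, product: o.product })),
  }));
}

const customerDocs = embedOrders(customerRows, orderRows);
console.log(JSON.stringify(customerDocs[1], null, 2));
```

The application then reads a customer and its orders in a single lookup instead of a join.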

6
Notions
RDBMS      MongoDB
Database   Database
Table      Collection
Row        Document
Column     Field

8
Information
Management
Merchandising
Content
Inventory
Customer
Channel
Sales &
Fulfillment
Insight
Social
Architecture Overview
Customer
Channels
Amazon
Ebay
…
Stores
POS
Kiosk
…
Mobile
Smartphone
Tablet
Website
Contact
Center
API
Data and
Service
Integration
Social
Facebook
Twitter
…
Data
Warehouse
Analytics
Supply Chain
Management
System
Suppliers
3rd Party
In Network
Web
Servers
Application
Servers

9
Commerce Functional Components
Information
Layer
Look & Feel
Navigation
Customization
Personalization
Branding
Promotions
Chat
Ads
Customer's
Perspective
Research
Browse
Search
Select
Shopping Cart
Purchase
Checkout
Receive
Track
Use
Feedback
Maintain
Dialog
Assist
Market / Offer
Guide
Offer
Semantic
Search
Recommend
Rule-based
Decisions
Pricing
Coupons
Sell / Fulfill
Orders
Payments
Fraud
Detection
Fulfillment
Business Rules
Insight
Session
Capture
Activity
Monitoring
Customer Enterprise
Information
Management
Merchandising
Content
Inventory
Customer
Channel
Sales &
Fulfillment
Insight
Social

11
Merchandising
Merchandising
MongoDB
Product Variation
Product Hierarchy
Pricing
Promotions
Ratings & Reviews
Calendar
Semantic Search
Product Definition
Localization

12
• Single view of a product: Single scalable catalog service
used by all services and channels
• Read volume is high and sustained
• Write volume spikes up during catalog update, but also
allows real-time updating of a product
• Advanced indexing and querying is a requirement: find
product by SKU, category, color, etc
• Geographical distribution and low latency achieved
through replication
• Scaling achieved through sharding
Merchandising - principles

13
Merchandising - requirements
• Requirement: Single view of product
– Example challenge: Blended description and hierarchy of product to ensure availability on all channels
– MongoDB: Flexible document-oriented storage
• Requirement: High sustained read volume with low latency
– Example challenge: Constant querying from online users and sales associates, requiring immediate response
– MongoDB: Fast indexed querying, replication allows local copy of catalog, sharding for scaling
• Requirement: Spiky and real-time write volume
– Example challenge: Bulk update of full catalog without impacting production, real-time touch update
– MongoDB: Fast in-place updating, real-time indexing, sharding for scaling
• Requirement: Advanced querying
– Example challenge: Find product based on color, size, description
– MongoDB: Ad-hoc querying on any field, advanced secondary and compound indexing

14
Merchandising - Product Page
Product images
General Information
List of Variations
External Information
Localized Description

15
> db.definitions.findOne()
{ productId: "301671", // main product id
department: "Shoes",
category: "Shoes/Women/Pumps",
brand: "Guess",
thumbnail: "http://cdn…/pump.jpg",
image: "http://cdn…/pump1.jpg", // larger version of thumbnail
title: "Evening Platform Pumps",
description: "These evening platform pumps put the perfect finishing touches on your most glamorous night-on-the-town outfit",
shortDescription: "Evening Platform Pumps",
style: "Designer",
type: "Platform",
rating: 4.5, // user rating
lastUpdated: new Date("2014-04-01"), // last update time
… }
Merchandising - Product Definition

16
• Get item from Product Id
db.definitions.findOne( { productId: "301671" } )
• Get items from Product Ids
db.definitions.find( { productId: { $in: [ "301671", "301672" ] } } )
• Get items by department
db.definitions.find( { department: "Shoes" } )
• Get items by category prefix
db.definitions.find( { category: /^Shoes\/Women/ } )
• Indices
productId, department, category, lastUpdated
Merchandising - Product Definition
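The category prefix query depends on an anchored regular expression. Below is a small sketch of building that filter safely from a user-supplied category path. The helper names `escapeRegExp` and `categoryPrefixFilter` are assumptions for illustration only.

```javascript
// Escape regex metacharacters so the category path is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a filter matching every category under a prefix.
// The leading ^ anchors the pattern at the start of the category string.
function categoryPrefixFilter(prefix) {
  return { category: new RegExp("^" + escapeRegExp(prefix)) };
}

const womenShoes = categoryPrefixFilter("Shoes/Women");
console.log(womenShoes.category.test("Shoes/Women/Pumps"));
console.log(womenShoes.category.test("Shoes/Men/Boots"));
```

The resulting object could be passed to a call such as db.definitions.find(womenShoes). Building the pattern with new RegExp avoids having to escape the "/" characters by hand.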

17
> db.variations.findOne()
{
_id: "730223104376", // the sku
productId: "301671", // references product id
thumbnail: "http://cdn…/pump-red.jpg",
image: "http://cdn…/pump-red.jpg", // larger version of thumbnail
size: 6.0,
color: "Red",
width: "B",
heelHeight: 5.0,
…
}
Merchandising - Product Variation

18
• Get Variation from SKU
db.variations.find( { _id: "730223104376" } )
• Get all variations for a product, sorted by SKU
db.variations.find( { productId: "301671" } ).sort( { _id: 1 } )
• Indices
productId, lastUpdated
Merchandising - Product Variation

20
Price: {
_id: "sku730223104376_store123",
currency: "USD",
price: 89.95,
…
}
_id: concatenation of item and store.
Store: can be a store group or store id.
Item: can be an item id or sku
Indices: lastUpdated
Merchandising – Pricing

21
• Get all prices for a given item
db.prices.find( { _id: /^p301671_/ } )
• Get all prices for a given sku (price could be at item level)
db.prices.find( { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ] } } )
• Get minimum and maximum prices for a sku
db.prices.aggregate( [ { $match: { _id: { $in: [ /^sku730223104376_/, /^p301671_/ ] } } },
{ $group: { _id: 1, min: { $min: "$price" }, max: { $max: "$price" } } } ] )
• Get price for a sku and store id (returns up to 4 prices)
db.prices.find( { _id: { $in: [ "sku730223104376_store1234",
"sku730223104376_sgroup0",
"p301671_store1234",
"p301671_sgroup0" ] } }, { price: 1 } )
Merchandising - Pricing
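The last query can return up to four price documents, so the application has to pick one. The sketch below builds the candidate ids and selects the most specific match. The precedence order (sku before item, store before store group) and the helper names are assumptions; the slides do not state a precedence.

```javascript
// Build the four candidate _id values, most specific first (assumed precedence).
function priceCandidateIds(sku, productId, storeId, storeGroup) {
  return [
    `sku${sku}_store${storeId}`,
    `sku${sku}_sgroup${storeGroup}`,
    `p${productId}_store${storeId}`,
    `p${productId}_sgroup${storeGroup}`,
  ];
}

// Return the first candidate that exists among the documents the $in query returned.
function resolvePrice(candidateIds, priceDocs) {
  const byId = new Map(priceDocs.map((doc) => [doc._id, doc]));
  for (const id of candidateIds) {
    if (byId.has(id)) return byId.get(id);
  }
  return null; // no price defined at any level
}

const candidateIds = priceCandidateIds("730223104376", "301671", "1234", "0");
// Sample result set, invented for this example.
const returnedPrices = [
  { _id: "p301671_sgroup0", price: 99.95 },
  { _id: "sku730223104376_sgroup0", price: 89.95 },
];
const bestPrice = resolvePrice(candidateIds, returnedPrices);
console.log(bestPrice);
```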

22
• The hierarchy of items typically follows:
  Company
    Division
      Department: Women's shoe store
        Class: Pumps
          Item: Guess classic pump
            Variation: size 6 black
Merchandising – Product Hierarchy

24
Merchandising – Browse and Search products
Browse by
category
Special
Lists
Filter by
attributes
Lists hundreds
of item
summaries
Ideally a single query is issued to the database
to obtain all items and metadata to display

25
The previous page presents many challenges:
• Response is needed within milliseconds for hundreds of
items
• Faceted search on many attributes of an item:
department, brand, category, etc
• Attributes to match may be at the variation level: color,
size, etc, in which case the variation should be shown
• One item may have thousands of variations. Only one
item should be displayed even if many variations match
• Efficient sorting on several attributes: price, popularity
• Pagination feature which requires deterministic ordering

26
Hundreds
of sizes
One Item
Dozens of
colors
A single item may have thousands of variations

27
Images of the matching
variations are displayed
Hierarchy
Sort
parameter
Faceted
Search

28
Merchandising – Traditional Architecture
Relational DB
System of Records
Full Text Search
Engine
Indexing
#1 obtain
search
results IDs
Application
Cache
#2 obtain
objects by
ID
Pre-joined
into objects

29
The traditional architecture presents issues:
• 3 different systems to maintain: RDBMS, Search
engine, Caching layer
• A search returns a list of IDs which then are looked up in
the cache as a batch or one by one. It significantly
increases latency of response
• RDBMS schema is complex and static
• The search index needs to be refreshed at intervals
• Setup does not allow efficient pagination
Merchandising – Traditional Architecture

30
MongoDB Data Store
Merchandising - Architecture
Product
Summaries
Product
Definitions
Pricing
Promotions
Product
Variations
Ratings &
Reviews
#1 Obtain
results

31
The product index relies on the following parameters:
• The department (required): the main component of category, e.g. "Shoes"
• An indexed attribute (optional)
– Category path, e.g. "Shoes/Women/Pumps"
– Price range (based on online prices)
– List of Item Attributes, e.g. Brand = Guess
– List of Variation Attributes, e.g. Color = red
• A non-indexed attribute (optional)
– List of Item Secondary Attributes, e.g. Style = Designer
– List of Variation Secondary Attributes, e.g. heel height = 5.0
• As well as Sorting, e.g. Price Low to High
Merchandising – Product Summaries

32
> db.summaries.findOne()
{ "_id": "p39",
"title": "Evening Platform Pumps 39",
"department": "Shoes", "category": "Shoes/Women/Pumps",
"thumbnail": "http://cdn…/pump-small-39.jpg", "image": "http://cdn…/pump-39.jpg",
"price": 145.99,
"rating": 0.95,
"attrs": [ { "brand" : "Guess"}, … ],
"sattrs": [ { "style" : "Designer"} , { "type" : "Platform"}, …],
"vars": [
{ "sku": "sku2441",
"thumbnail": "http://cdn…/pump-small-39.jpg.Blue",
"image": "http://cdn…/pump-39.jpg.Blue",
"attrs": [ { "size": 6.0 }, { "color": "Blue" }, …],
"sattrs": [ { "width" : "B"} , { "heelHeight" : 5.0 }, …],
}, … Many more skus …
] }
Indices: vars.sku, department + attrs + category, department + vars.attrs + category,
department + category, department + price, department + rating
Merchandising – Product Summaries

33
• Get summary from item id
db.summaries.find( { _id: "p301671" } )
• Get summary's specific variation from SKU
db.summaries.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )
• Get summary by department, sorted by rating
db.summaries.find( { department: "Shoes" } ).sort( { rating: 1 } )
• Get summary with mix of parameters
db.summaries.find( { department : "Shoes" ,
"vars.attrs" : { "color" : "Gray" } ,
"category" : /^Shoes\/Women/ ,
"price" : { "$gte" : 65.99 , "$lte" : 180.99 } } )
Merchandising - Product Summaries
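A sketch of how an application might assemble the "mix of parameters" filter from optional facet selections. The function `buildSummaryQuery` and its parameter names are assumptions for illustration; only the resulting query shape follows the slide.

```javascript
// Hypothetical helper: build a summaries filter from the facets a shopper selected.
function buildSummaryQuery({ department, categoryPrefix, variationAttrs = {}, priceMin, priceMax }) {
  if (!department) throw new Error("department is required");
  const query = { department };
  if (categoryPrefix) {
    // Anchored, escaped prefix match on the category path.
    const escaped = categoryPrefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    query.category = new RegExp("^" + escaped);
  }
  // One clause per variation attribute, e.g. { "vars.attrs": { color: "Gray" } }.
  // Note: separate clauses may be satisfied by different variations of the same item.
  const attrClauses = Object.entries(variationAttrs).map(([name, value]) => ({
    "vars.attrs": { [name]: value },
  }));
  if (attrClauses.length > 0) query.$and = attrClauses;
  if (priceMin !== undefined || priceMax !== undefined) {
    query.price = {};
    if (priceMin !== undefined) query.price.$gte = priceMin;
    if (priceMax !== undefined) query.price.$lte = priceMax;
  }
  return query;
}

const summaryQuery = buildSummaryQuery({
  department: "Shoes",
  categoryPrefix: "Shoes/Women",
  variationAttrs: { color: "Gray" },
  priceMin: 65.99,
  priceMax: 180.99,
});
console.log(summaryQuery);
```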

34
Merchandising – Query stats
Department  Category  Price  Primary attribute  Time: average (ms)  Time: 90th (ms)  Time: 95th (ms)
1           0         0      0                  2                   3                3
1           1         0      0                  1                   2                2
1           0         1      0                  1                   2                3
1           1         1      0                  1                   2                2
1           0         0      1                  0                   1                2
1           1         0      1                  0                   1                1
1           0         1      1                  1                   2                2
1           1         1      1                  0                   1                1
1           0         0      2                  1                   3                3
1           1         0      2                  0                   2                2
1           0         1      2                  10                  20               35
1           1         1      2                  0                   1                1

36
Content
Content
MongoDB
Metadata
Asset Repository
Digital Right Mgt
Access Control
Processing /
Encoding

38
Inventory
Inventory
MongoDB
External Inventory
Internal Inventory
Regional Inventory
Purchase Orders
Fulfillment
Promotions

39
Demonstration Document Model
Definitions
• id: p0
Variations
• id: sku0
• pId: p0
Summary
• id: p0
• vars: [sku0,
sku1, …]
Stores
• id: s1
• Loc: [22, 33]
Inventory
• store: s1
• pId: p0
• vars:
[{sku: sku0, q: 3},
{sku: sku2, q: 2}]
Product

40
db.stores.findOne()
{ "_id" : ObjectId("53549fd3e4b0aaf5d6d07f35"),
"className" : "catalog.Store",
"storeId" : "store0",
"name" : "Bessemer store",
"address" : {
"addr1" : "1st Main St",
"city" : "Bessemer",
"state" : "AL",
"zip" : "12345",
"country" : "US"
},
"location" : [
-86.95444,
33.40178
]
… }
Inventory - Stores

41
• Get a store by storeId
db.stores.find({ storeId: "store0" })
• Get nearby stores sorted by distance
db.runCommand({ "geoNear" : "stores" , "near" : [ -82.800672 ,
40.090844] , "maxDistance" : 10.0 , "spherical" : true })
Inventory - Stores

42
> db.inventory.findOne()
{ "_id": "5354869f300487d20b2b011d",
"storeId": "store0",
"location": [
-86.95444,
33.40178
],
"productId": "p0",
"vars": [
{ "sku": "sku1", "q": 14 },
{ "sku": "sku3", "q": 7 },
{ "sku": "sku7", "q": 32 },
{ "sku": "sku14", "q": 65 },
...
]
}
Inventory - Quantities

43
• Get all items in a store
db.inventory.find({ storeId: "store100" })
• Get quantity for an item at a store
db.inventory.find({ storeId: "store100", productId: "p200" })
• Get quantity for a sku at a store
db.inventory.find(
{ storeId: "store100", productId: "p200", "vars.sku": "sku11736" },
{ "vars.$": 1 })
• Increment / decrement inventory for an item at a store
db.inventory.update(
{ storeId: "store100", productId: "p200", "vars.sku": "sku11736" },
{ $inc: { "vars.$.q": 20 } })
• Indices: productId, storeId + productId, location (geo) + productId
Inventory - Stores
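When decrementing stock, the filter can also require that enough units remain, so that the quantity never goes negative. The following is a sketch under that assumption. The helper `reserveStockUpdate` is hypothetical and only builds the filter and update documents.

```javascript
// Hypothetical helper: build a guarded decrement for one sku at one store.
function reserveStockUpdate(storeId, productId, sku, quantity) {
  if (!Number.isInteger(quantity) || quantity <= 0) {
    throw new RangeError("quantity must be a positive integer");
  }
  return {
    // $elemMatch makes both conditions apply to the same array element,
    // and that element is the one the positional "$" refers to in the update.
    filter: { storeId, productId, vars: { $elemMatch: { sku, q: { $gte: quantity } } } },
    update: { $inc: { "vars.$.q": -quantity } },
  };
}

const reservation = reserveStockUpdate("store100", "p200", "sku11736", 20);
console.log(JSON.stringify(reservation));
```

The two parts could be passed to a call such as db.inventory.update(reservation.filter, reservation.update). If no document is modified, the store did not have enough units of that sku.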

44
• Aggregate total quantity for an item
db.inventory.aggregate([
{ $match: { productId: "p200" }},
{ $unwind: "$vars" },
{ $group: { _id: "result", count: {$sum: 1} } }])
{ "_id" : "result", "count" : 101752 }
• Aggregate total quantity for a store
db.inventory.aggregate([
{ $match: { storeId: "store100" }},
{ $unwind: "$vars" },
{ $group: { _id: "result", count: {$sum: 1} } }])
{ "_id" : "result", "count" : 29347 }
Inventory - Stores
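Note that a `$group` stage with `{ $sum: 1 }` counts the unwound sku entries. To add up the units in stock, the quantity field `q` can be summed instead. The sketch below shows that variant of the pipeline together with an in-memory equivalent on invented sample data. The names `totalQuantityPipeline` and `totalQuantity` are assumptions.

```javascript
// Pipeline variant that sums the per-sku quantity instead of counting entries.
const totalQuantityPipeline = (productId) => [
  { $match: { productId } },
  { $unwind: "$vars" },
  { $group: { _id: "result", total: { $sum: "$vars.q" } } },
];

// In-memory equivalent, to show what the pipeline computes.
function totalQuantity(inventoryDocs, productId) {
  return inventoryDocs
    .filter((doc) => doc.productId === productId) // $match
    .flatMap((doc) => doc.vars)                   // $unwind
    .reduce((sum, v) => sum + v.q, 0);            // $group with $sum
}

// Sample documents invented for this example.
const sampleInventory = [
  { storeId: "store0", productId: "p0", vars: [{ sku: "sku1", q: 14 }, { sku: "sku3", q: 7 }] },
  { storeId: "store1", productId: "p0", vars: [{ sku: "sku1", q: 2 }] },
  { storeId: "store1", productId: "p1", vars: [{ sku: "sku9", q: 5 }] },
];
console.log(totalQuantity(sampleInventory, "p0")); // 23
```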

45
• Get inventory for an item near a point
db.runCommand(
{ "geoNear" : "inventory" , "near" : [ -82.800672 , 40.090844] ,
"maxDistance" : 10.0 , "spherical" : true, limit: 10,
query: { productId: "p200", "vars.sku": "sku11736" }})
• Get closest store with available sku
db.runCommand(
{ "geoNear" : "inventory" , "near" : [ -82.800672 , 40.090844] ,
"maxDistance" : 10.0 , "spherical" : true, limit: 10,
query: { productId: "p200",
vars: { $elemMatch: { "sku": "sku11736", q: { $gt: 0 } } } } })
Inventory - Stores

47
Customer
Customer
MongoDB
Profile
Market Segment
Demographics
Wish List
Preference
Inbox
Sales / Support
Chat
Content
Subscription

49
Channels
Channels
MongoDB
Location
Store
Assortment
Point of Sale
Channel Definition
Planogram

51
Sales & Fulfillment
Sales &
Fulfillment
MongoDB
Sales Transaction
Shipping
Tracking
Return & Exchange
Business Rule
Audit
Shopping Cart

53
Insight
Insight
MongoDB
Advertising metrics
Clickstream
Recommendations
Session Capture
Activity Logging
Geo Tracking
Product Analytics
Customer Insight
Application Logs

54
• Many user activities can be of interest:
– Search
– Product view, like or wish
– Shopping cart add / remove
– Sharing on social network
– Ad impression, Clickstream
• Those will be used to compute:
– Product Map (relationships, etc)
– User Preferences
– Recommendations
– Trends
Activity Logging – Data of interest

55
Activity logging - Architecture
MongoDB
HVDF
API
Activity Logging
User History
External
Analytics:
Hadoop,
Spark,
Storm,
…
User Preferences
Recommendations
Trends
Product Map
Apps
Internal
Analytics:
Aggregation,
MR
All user activity
is recorded
MongoDB –
Hadoop
Connector
Personalization

57
• You need to store and manage an incoming stream of data
samples (views, impressions, orders, …)
– High arrival rate of data from many sources
– Variable schema of arriving data
– You need to control retention period of data
• You need to compute derivative data sets based on these
samples
– Aggregations and statistics based on data
– Roll-up data into pre-computed reports and summaries
• You need low latency access to up-to-date data (user history)
– Flexible indexing of raw and derived data sets
– Rich querying based on time + meta-data fields in samples
Activity Logging – Problem statement

58
Activity logging - Requirements
• Requirement: Ingestion of 100ks of writes / sec
– MongoDB: Fast C++ process, multi-threads, multi-locks. Horizontal scaling via sharding. Sequential IO via time partitioning.
• Requirement: Flexible schema
– MongoDB: Dynamic schema, each document is independent. Data is stored in the same format and size as it is inserted.
• Requirement: Fast querying on varied fields, sorting
– MongoDB: Secondary Btree indexes can look up and sort the data in milliseconds.
• Requirement: Easy clean up of old data
– MongoDB: Deletes are typically as expensive as inserts. Time partitioning provides free deletes, since an old partition can be dropped as a whole.

59
Activity Logging using HVDF
HVDF (High Volume Data Feed):
• Open source reference implementation of high
volume writing with MongoDB
• REST API server written in Java with most
popular libraries
• Public project, issues can be logged
• Can be run as-is, or customized as needed

60
Feed
High volume data feed architecture
Channel
Sample Sample Sample Sample
Source
Source
Processor
Inline
Processing
Batch
Processing
Stream
Processing
The Channel is the
sequence of data
samples that a sensor
sends into the
platform.
Sources send
samples into
the Channel
Processors generate
derivative Channels from
other Channel data

61
HVDF -- High Volume Data Feed engine
HVDF – Reference implementation
REST
Service API
Processor
Plugins
Inline
Batch
Stream
Channel Data Storage
Raw
Channel
Data
Aggregated
Rollup T1
Aggregated
Rollup T2
Query Processor Streaming spout
Custom Stream
Processing Logic
Incoming Sample Stream
POST /feed/channel/data
GET /feed/channeldata?time=XXX&range=YYY
Real-time Queries

62
{ _id: ObjectId(),
geoCode: 1, // used to localize write operations
sessionId: "2373BB…",
device: { id: "1234",
type: "mobile/iphone",
userAgent: "Chrome/34.0.1847.131"
},
type: "VIEW|CART_ADD|CART_REMOVE|ORDER|…", // type of activity
itemId: "301671",
sku: "730223104376",
order: { id: "12520185",
… },
location: [ -86.95444, 33.40178 ],
tags: [ "smartphone", "iphone", … ], // associated tags
timeStamp: new Date("2014/04/01 …")
}
User Activity - Model

63
Dynamic schema for sample data
Sample 1
{
deviceId: XXXX,
time: Date(…),
type: "VIEW",
…
}
Channel
Sample 2
{
deviceId: XXXX,
time: Date(…),
type: "CART_ADD",
cartId: 123, …
}
Sample 3
{
deviceId: XXXX,
time: Date(…),
type: "FB_LIKE"
}
Each sample
can have
variable fields

64
Channels are sharded
Shard
Shard
Shard
Shard
Shard
Shard Key:
Customer_id
Sample
{
customer_id: XXXX,
time: Date(…),
type: "VIEW",
}
Channel
You choose how
to partition
samples
Samples can
have dynamic
schema
Scale
horizontally by
adding shards
Each shard is
highly available

65
Channels are time partitioned
Channel
Sample Sample Sample Sample Sample Sample Sample Sample
- 2 days - 1 Day Today
Partitioning
keeps indexes
manageable
This is where all
of the writes
happen
Older partitions
are read only for
best possible
concurrency
Queries are routed
only to needed
partitions
Partition 1 Partition 2 Partition N
Each partition is
a separate
collection
Efficient and
space reclaiming
purging of old
data
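A sketch of the time partitioning idea with one collection per day: writes go to today's partition, queries only touch the partitions covering their time range, and purging means dropping whole collections. The naming scheme (`<channel>_<yyyymmdd>`, in UTC) and the helper names are assumptions for illustration and are not taken from HVDF.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical naming scheme: one collection per channel and UTC day.
function partitionName(channel, date) {
  const day = date.toISOString().slice(0, 10).replace(/-/g, "");
  return `${channel}_${day}`;
}

// Partitions that a query over [from, to] needs to visit.
function partitionsForRange(channel, from, to) {
  const names = [];
  const start = Math.floor(from.getTime() / DAY_MS) * DAY_MS; // UTC midnight of the first day
  for (let t = start; t <= to.getTime(); t += DAY_MS) {
    names.push(partitionName(channel, new Date(t)));
  }
  return names;
}

// Partitions older than the retention period; each could be purged by dropping its collection.
function expiredPartitions(existingNames, channel, now, retentionDays) {
  const cutoff = partitionName(channel, new Date(now.getTime() - retentionDays * DAY_MS));
  return existingNames.filter((name) => name.startsWith(channel + "_") && name < cutoff);
}

console.log(partitionName("activity", new Date("2014-04-01T10:00:00Z")));
console.log(partitionsForRange("activity", new Date("2014-03-30T23:00:00Z"), new Date("2014-04-01T01:00:00Z")));
```

Because the date suffix sorts lexicographically, comparing names is enough to find partitions older than the cutoff.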

66
Dynamic queries on Channels
Channel
App
App
App
Indexes
Queries Pipelines Map-Reduce
Create custom
indexes on
Channels
Use full MongoDB
query language to
access samples
Use MongoDB
aggregation
pipelines to access
samples
Use MongoDB
inline map-reduce
to access samples
Full access to
field, text, and geo
indexing

67
North America - West
North America - East
Europe
Geographically distributed system
Channel
Source
Source
Source
Source
Source
Source
Sample
Sample
Sample
Sample
Geo shards per
location
Clients write to
local nodes
Single view of
channel available
globally

69
Insight – Useful Data
• Useful data for better shopping:
– User history (e.g. recently seen products)
– User statistics (e.g. total purchases, visits)
– User interests (e.g. likes videogames and SciFi)
– User social network
– Cross-selling: people who bought this item tended to
also buy these other items (e.g. bought an iPhone, then
bought an iPhone case)
– Up-selling: people who looked at this item eventually
bought these other items (alternative products that may be
better)

70
Example of real-time aggregation with Agg Framework
User Activity – Computing User Stats

72
Let's simplify each recorded activity to the following:
{ userId: 123, type: "order", itemId: 2, time }
To calculate the items bought by each user over a period of time, let's use
MongoDB's Map Reduce:
- Match activities of type "order" for the past 2 weeks
- map: emit the document keyed by userId
- reduce: push all itemIds into a list
- Output looks like { _id: userId, items: [2, 3, 8] }
User Activity –
Items frequently bought together
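The first job can be simulated in memory to show what it produces. This is a plain JavaScript sketch rather than an actual mapReduce call; the sample activities, the cutoff date and the name `itemsBoughtByUser` are invented for the example.

```javascript
// Simplified activity records (sample data).
const sampleActivities = [
  { userId: 1, type: "order", itemId: 2, time: new Date("2014-04-01") },
  { userId: 1, type: "order", itemId: 3, time: new Date("2014-04-02") },
  { userId: 1, type: "view", itemId: 9, time: new Date("2014-04-02") },
  { userId: 2, type: "order", itemId: 2, time: new Date("2014-04-03") },
  { userId: 2, type: "order", itemId: 3, time: new Date("2014-03-01") }, // outside the time window
];

function itemsBoughtByUser(activities, since) {
  const byUser = new Map();
  for (const a of activities) {
    if (a.type !== "order" || a.time < since) continue;  // match: recent orders only
    if (!byUser.has(a.userId)) byUser.set(a.userId, []); // map: key by userId
    byUser.get(a.userId).push(a.itemId);                 // reduce: collect itemIds in a list
  }
  return [...byUser].map(([userId, items]) => ({ _id: userId, items }));
}

const userItems = itemsBoughtByUser(sampleActivities, new Date("2014-03-25"));
console.log(JSON.stringify(userItems));
```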

73
Then run a 2nd map-reduce job that, for each of the previous results:
- map: emits every combination of 2 items, with the lowest
itemId first
- reduce: sums up the counts
- Output looks like { _id: { a: 2, b: 3 }, count: 36 }
User Activity – Items frequently bought together
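The second job, sketched in memory under the same caveat: this illustrates the logic and is not an actual mapReduce invocation. The input data and the name `countPairs` are assumptions.

```javascript
// Output of the first job (sample data).
const userBaskets = [
  { _id: 1, items: [2, 3, 8] },
  { _id: 2, items: [3, 2] },
  { _id: 3, items: [8] },
];

function countPairs(baskets) {
  const counts = new Map();
  for (const { items } of baskets) {
    // Deduplicate and sort so each pair is emitted once, lowest itemId first.
    const unique = [...new Set(items)].sort((x, y) => x - y);
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}|${unique[j]}`;      // map: emit the pair
        counts.set(key, (counts.get(key) || 0) + 1); // reduce: sum up
      }
    }
  }
  return [...counts].map(([key, count]) => {
    const [a, b] = key.split("|").map(Number);
    return { _id: { a, b }, count };
  });
}

const pairCounts = countPairs(userBaskets);
console.log(JSON.stringify(pairCounts));
```

Ordering each pair with the lowest itemId first ensures that (2, 3) and (3, 2) are counted under the same key.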

74
The output collection can then be queried per item id, sorted by
count, and cut off at a threshold.
This requires indexes on { _id.a, count } and { _id.b, count }.
You then obtain an affiliation collection with docs like:
{ itemId: 2, affil: [ { id: 3, weight: 36 }, { id: 8, weight: 23 } ] }
User Activity – Items frequently bought together
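A sketch of deriving the affiliation documents from the pair counts, again in plain JavaScript. The threshold value, the third sample pair and the name `buildAffinity` are assumptions.

```javascript
// Pair counts as produced by the second job (sample data).
const pairCountDocs = [
  { _id: { a: 2, b: 3 }, count: 36 },
  { _id: { a: 2, b: 8 }, count: 23 },
  { _id: { a: 3, b: 8 }, count: 4 },
];

function buildAffinity(pairDocs, threshold) {
  const byItem = new Map();
  const add = (itemId, otherId, weight) => {
    if (!byItem.has(itemId)) byItem.set(itemId, []);
    byItem.get(itemId).push({ id: otherId, weight });
  };
  for (const { _id: { a, b }, count } of pairDocs) {
    if (count < threshold) continue; // cut off weak relationships
    add(a, b, count);                // the relationship is recorded in both directions
    add(b, a, count);
  }
  return [...byItem].map(([itemId, affil]) => ({
    itemId,
    affil: affil.sort((x, y) => y.weight - x.weight), // strongest first
  }));
}

const affinityDocs = buildAffinity(pairCountDocs, 10);
console.log(JSON.stringify(affinityDocs.find((d) => d.itemId === 2)));
```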

75
Example of Hadoop integration
User Activity – Hadoop integration

77
Social
Social
MongoDB
Social Channels
User Network
Activity
Chat
Social Profiles
Community Mgt
Rewards /
Gamification

83
West DC
Primary
Primary
Primary
Shard
“West”
Shard
“Center”
Shard
“East”
Center DC East DC
Single View of Product Cluster Topology

84
(Cluster topology diagram, continued)
Primary node replicates data to all secondaries in the shard as fast as possible

85
(Cluster topology diagram, continued)
Center Shard contains all the data for stores in Center region

86
(Cluster topology diagram, continued)
Center Shard contains all the data for stores in Center region
Local writes enable very high throughput of updates

87
(Cluster topology diagram, continued)
Each region is able to see the data of all stores from its "local" DC.

88
(Cluster topology diagram, continued)
Two nodes in each DC for painless maintenance with zero downtime

89
(Cluster topology diagram, continued)
Even if a DC goes down, the database remains fully available thanks to automated failover

90
(Cluster topology diagram, continued)
The data set can grow and shards can be added without any rewrite of the application code

Thank You!
Antoine Girbal
Senior Solutions Engineer, MongoDB Inc.
@antoinegirbal
