Recommender Systems 102
Beyond the (usual) user-item matrix—implementation & results
DataScience SG Meetup Jan 2020
About me
§ Lead Data Scientist @ health-tech startup
- Early detection of preventable diseases
- Healthcare resource allocation
§ Previously: VP, Data Science @ Lazada
- E-commerce ML systems
- Facilitated integration with Alibaba
§ More at https://eugeneyan.com
RecSys
Overview
Figure 1. Obligatory (cliché) recsys representation
Definition: Use behavior data to predict what other
users will like based on user/item similarity
Topics*
§ Data Acquisition, Preparation, Split, etc.
§ Conventional Baseline
§ Applying Graph and NLP approaches
* Implementation and results discussed throughout
Laying the Groundwork
Data acquisition, preparation, train-val-split, etc.
Data
Acquisition
http://jmcauley.ucsd.edu/data/amazon/links.html
{
"asin": "0000031852",
"title": "Girls Ballet Tutu Zebra Hot Pink",
"price": 3.17,
"imUrl": "http://ecx.images-amazon.com/images/I/51fAmVkTbyL._SY300_.jpg",
"related":
{ "also_bought":[ "B00JHONN1S",
"B002BZX8Z6",
"B00D2K1M3O",
...
"B007R2RM8W"
],
"also_viewed":[ "B002BZX8Z6",
"B00JHONN1S",
"B008F0SU0Y",
...
"B00BFXLZ8M"
],
"bought_together":[ "B002BZX8Z6"
]
},
"salesRank":
{ "Toys & Games":211836
},
"brand": "Coxlures",
"categories":[
[ "Sports & Outdoors",
"Other Sports",
"Dance"
]
]
}
Parsing
JSON
§ Requires parsing JSON into tabular form
§ Fairly large, with the largest file having 142.8
million rows and 20 GB on disk
§ Cannot be fully loaded into RAM on a regular
laptop (16 GB RAM)
import csv
import logging

logger = logging.getLogger(__name__)

def parse_json_to_csv(read_path: str, write_path: str) -> None:
    """Stream records to disk row by row, so the file never has to fit in RAM."""
    csv_writer = csv.writer(open(write_path, 'w'))
    for i, d in enumerate(parse(read_path)):  # parse() yields one dict per line of the raw file
        if i == 0:
            csv_writer.writerow(d.keys())  # write header from the first record
        csv_writer.writerow([str(v).lower() for v in d.values()])
        if i % 10000 == 0:
            logger.info('Rows processed: {:,}'.format(i))
    logger.info('Csv saved to {}'.format(write_path))
Getting
product-
pairs
§ Evaluate string and convert to dictionary
§ Get product-pairs for each relationship
§ Explode each product-pair into a row
(see the sketch after Table 1)
product1 | product2 | relationship
--------------------------------------
B001T9NUFS | B003AVEU6G | also_viewed
0000031895 | B002R0FA24 | also_viewed
B007ZN5Y56 | B005C4Y4F6 | also_viewed
0000031909 | B00538F5OK | also_bought
B00CYBULSO | B00B608000 | also_bought
B004FOEEHC | B00D9C32NI | bought_together
Table 1. Product-pairs and relationships (sample)
Scoring
product-
pairs
§ Simple way: Assign 1.0 if product-pair has
any/multiple relationships, 0.0 otherwise
§ My approach: Score relationships differently*
(see the sketch below)
- Bought together: 1.2, Also bought: 1.0, Also viewed: 0.5
product1 | product2 | weight
--------------------------------
B001T9NUFS | B003AVEU6G | 0.5
0000031895 | B002R0FA24 | 0.5
B007ZN5Y56 | B005C4Y4F6 | 0.5
0000031909 | B00538F5OK | 1.0
B00CYBULSO | B00B608000 | 1.0
B004FOEEHC | B00D9C32NI | 1.2
Table 2. Product-pairs and weights (sample)
* Assume relationships are symmetrical
                | Electronics | Books
----------------|-------------|------------
Unique products | 418,749     | 1,948,370
Product-pairs   | 4,005,262   | 26,595,848
Sparsity        | 0.9999      | 0.9999

$$\text{Sparsity} = 1 - \frac{\text{Count(nonzero elements)}}{\text{Count(total elements)}}$$

Table 3. Unique products and sparsity for electronics and books
Train-Validation Split
Or how to create negative samples (at scale)
Splitting
the data
§ Random split: 2/3 train, 1/3 validation
§ Easy, right?
§ Not so fast! Our dataset only has positive
product-pairs—how do we validate?
Creating
negative
samples
§ Direct approach: Random sampling
- To create 1 million negative product-pairs, call
random 2 million times—very slow!
§ Hack: Add products to an array, shuffle, slice to
sample; re-shuffle when exhausted—fast!
(See the sketch below.)
products
----------
B001T9NUFS
0000031895
B007ZN5Y56
0000031909
B00CYBULSO
B004FOEEHC
Slicing the shuffled array two IDs at a time yields
negative product-pairs 1, 2, 3, ...
Matrix Factorization
Let’s start with a baseline
Batch MF
§ Common approach 1: Load matrix in
memory; apply Python package (e.g.,
scipy.svd, surprise, etc.)
§ Common approach 2: Run on cluster with
SparkML Alternating Least Squares
§ Very resource intensive!
- Is there a smarter way, given the sparse data?
Iterative
MF
§ Only load (or read from disk) product-pairs,
instead of entire matrix that contains zeros
§ Matrix factorization by iterating through
each product-pair
Iterative
MF
(numeric
labels)
for product_pair, label in train_set:
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)
    # Predict product-pair score (interaction term and sum)
    prediction = sum(product1_emb * product2_emb, dim=1)
    # Minimize loss
    loss = MeanSquaredErrorLoss(prediction, label)
    loss.backward()
    optimizer.step()
Iterative
MF
(binary
labels)
for product_pair, label in train_set:
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)
    # Predict product-pair score (interaction term and sum)
    prediction = sig(sum(product1_emb * product2_emb, dim=1))
    # Minimize loss
    loss = BinaryCrossEntropyLoss(prediction, label)
    loss.backward()
    optimizer.step()
Regularize!
for product_pair, label in train_set:
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)
    # Predict product-pair score (interaction term and sum)
    prediction = sig(sum(product1_emb * product2_emb, dim=1))
    # L2 penalty ('lambda' is a reserved word in Python, hence 'lambda_reg')
    l2_reg = lambda_reg * sum(embedding.weight ** 2)
    # Minimize loss
    loss = BinaryCrossEntropyLoss(prediction, label)
    loss += l2_reg
    loss.backward()
    optimizer.step()
Training
Schedule
Figure 2. Cosine Annealing training schedule
Results
(MF)
Binary labels
AUC-ROC = 0.8083
Time for 5 epochs = 45 min
Continuous labels
AUC-ROC = 0.9225
Time for 5 epochs = 45 min
Figure 3a and 3b. Precision recall curves for Matrix Factorization
(note the "Cliff of Death" drop in precision)
Learning
curve
(MF)
Figure 4. AUC-ROC across epochs for matrix factorization; each time the learning rate is
reset, the model seems to "forget", causing AUC-ROC to revert to ~0.5.
Also, a single epoch seems sufficient.
Matrix Factorization + bias
Incremental improvement on the baseline
Adding
bias
§ What if a product is generally popular or
unpopular?
§ Learn a bias factor (i.e., single number for
each product)
Results
(MF-bias)
Binary labels
AUC-ROC = 0.7951
Time for 5 epochs = 45 min
Continuous labels
AUC-ROC = 0.8319
Time for 5 epochs = 45 min
Figure 5a and 5b. Precision recall curves for Matrix Factorization with bias
More "production friendly"
Off the Beaten Path
Natural language processing (“NLP”) and Graphs in RecSys
Word2Vec
§ In 2013, two seminal papers by Tomas
Mikolov on Word2Vec ("w2v")
§ Demonstrated w2v could learn semantic
and syntactic word vector representations
§ TL;DR: Converts words into arrays of numbers
DeepWalk
§ Unsupervised learning of representations of
nodes (i.e., vertices) in a social network
§ Generate sequences from random walks
on (social) graph
§ Learn vector representations of nodes
(e.g., profiles, content)
How do
NLP and
Graphs
matter?
§ Create graph from product-pairs + weights
§ Generate sequences from graph (via
random walk)
§ Learn product embeddings (via word2vec)
§ Recommend based on embedding similarity
(e.g., cosine similarity, dot product); see the
sketch below
More groundwork
Generating graphs and sequences
Creating a
product
graph
§ We have product-pairs and weights
- These are our graph edges
§ Create a weighted graph with networkx
- Each graph edge is given a numerical weight,
instead of all edges having the same weight
(see the sketch after Table 2)
product1 | product2 | weight
--------------------------------
B001T9NUFS | B003AVEU6G | 0.5
0000031895 | B002R0FA24 | 0.5
B007ZN5Y56 | B005C4Y4F6 | 0.5
0000031909 | B00538F5OK | 1.0
B00CYBULSO | B00B608000 | 1.0
B004FOEEHC | B00D9C32NI | 1.2
Table 2. Product-pairs and weights
Random
Walks
§ Direct approach: Traverse networkx graph
- For 10 sequences of length 10 per starting node,
need to traverse 100 times
- 2 mil nodes for books graph = 200 mil queries
- Very slow and memory intensive
§ Hack: Work directly on transition probabilities
Random
Walks
(Nodes and
edges)
[Figure: a toy graph of five nodes connected by weighted edges]
Random
Walks
(Weighted-
adjacency
matrix)
         | Product1 | Product2 | Product3 | Product4 | Product5
---------|----------|----------|----------|----------|---------
Product1 |          | 1        | 1        | 3        |
Product2 | 1        |          |          |          | 1
Product3 | 1        |          |          | 2        |
Product4 | 3        |          | 2        |          |
Product5 |          | 1        |          |          |
Random
Walks
(Transition
matrix)
         | Product1 | Product2 | Product3 | Product4 | Product5
---------|----------|----------|----------|----------|---------
Product1 |          | .2       | .2       | .6       |
Product2 | .5       |          |          |          | .5
Product3 | .33      |          |          | .67      |
Product4 | .6       |          | .4       |          |
Product5 |          | 1.0      |          |          |
Transition-probability(Product3): its edge weights (1, 2) normalized to (.33, .67)
B001T9NUFS B003AVEU6G B005C4Y4F6 B007ZN5Y56 ... B007ZN5Y56
0000031895 B00538F5OK B004FOEEHC B001T9NUFS ... 0000031895
B005C4Y4F6 0000031909 B00CYBULSO B003AVEU6G ... B00D9C32NI
B00CYBULSO B001T9NUFS B002R0FA24 B00CYBULSO ... B007ZN5Y56
B004FOEEHC B00CYBULSO B001T9NUFS B002R0FA24 ... B00B608000
...
0000031909 B00B608000 B00D9C32NI B00CYBULSO ... B007ZN5Y56
Each row is one sequence of length 10; rows total
no. of nodes (420k) × samples per node (10).
Pre-canned Node2Vec
Readily available open-source implementations
Node2Vec
§ Seemed to work out of the box
- Just need to provide edges
- Uses networkx and gensim under the hood
§ But very memory intensive and slow
- Could not run to completion even with 64 GB RAM
https://github.com/aditya-grover/node2vec
Gensim Word2Vec
Using a trusted package as baseline
Gensim
w2v
§ Very easy to use
- Takes in a list of sequences
- Can be multithreaded
- CPU-only
§ Fastest to complete 5 epochs
Results
(gensim
w2v)
All products
AUC-ROC = 0.9082
Time for 5 epochs = 2.58 min
Seen products only
AUC-ROC = 0.9735
Time for 5 epochs = 2.58 min
Figure 6a and 6b. Precision recall curves for gensim.word2vec
(unseen products have no embeddings)
Building w2v from Scratch
To plot learning curves and extend it
Data
Loader
§ Input sequences instead of product-pairs
§ Implements two features from w2v papers
- Subsampling of frequent words
- Negative sampling
Data
Loader
(sub-
sampling)
§ Drop out words of higher frequency
- Frequency of 0.0026 = 0.0 dropout
- Frequency of 0.00746 = 0.5 dropout
- Frequency of 1.0 = 0.977 dropout
§ Accelerated learning and improved
vectors of rare words

$$P_{\text{drop}}(word) = 1 - \left(\sqrt{\frac{freq(word)}{0.001}} + 1\right) \times \frac{0.001}{freq(word)}$$
Data
Loader
(Negative
sampling)
§ Original skip-gram ends with SoftMax
- If vocab = 10k words, embedding dim = 128,
1.28 million weights to update—expensive!
- In RecSys, the "vocab" is in the millions
§ Negative sampling
- Only modify weights of negative pair samples
- If 6 pairs (1 pos, 5 neg) and 1 mil products, only
update 0.0006% of weights—very efficient!
PyTorch
Word2Vec
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipGram(nn.Module):
    def __init__(self, emb_size, emb_dim):
        super().__init__()
        self.center_embeddings = nn.Embedding(emb_size, emb_dim, sparse=True)
        self.context_embeddings = nn.Embedding(emb_size, emb_dim, sparse=True)

    def forward(self, center, context, neg_context):
        # Look up embeddings for center, context, and negative-context products
        emb_center = self.center_embeddings(center)
        emb_context = self.context_embeddings(context)
        emb_neg_context = self.context_embeddings(neg_context)
        # Get score for positive pairs (interaction term and sum)
        score = torch.sum(emb_center * emb_context, dim=1)
        score = -F.logsigmoid(score)
        # Get score for negative pairs (batch interaction term and sum)
        neg_score = torch.bmm(emb_neg_context, emb_center.unsqueeze(2)).squeeze()
        neg_score = -torch.sum(F.logsigmoid(-neg_score), dim=1)
        # Return combined loss
        return torch.mean(score + neg_score)
Results
(w2v)
All products
AUC-ROC = 0.9554
Time for 5 epochs = 23.63 min
Seen products only
AUC-ROC = 0.9855
Time for 5 epochs = 23.63 min
Figure 7a and 7b. Precision recall curves for PyTorch Word2Vec
Learning
curve
(w2v)
Figure 8. AUC-ROC across epochs for word2vec; a single epoch seems sufficient
Overall
results so
far
§ Improvement on gensim.word2vec and the
Alibaba paper

               | All products | Seen products only
---------------|--------------|-------------------
PyTorch MF     | 0.7951       | -
Gensim w2v     | 0.9082       | 0.9735
PyTorch w2v    | 0.9554       | 0.9855
Alibaba Paper* | 0.9327       | -

* Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (https://arxiv.org/abs/1803.02349)
Table 4. AUC-ROC across various implementations
Adding side info to w2v
To help solve the cold-start problem
Extending
w2v
§ For each product, we have information like
category, brand, price group, etc.
- Why not add this when learning embeddings?
§ Alibaba paper reported AUC-ROC
improvement from 0.9327 to 0.9575

B001T9NUFS -> B003AVEU6G -> B007ZN5Y56 ... -> B007ZN5Y56
Television     Sound bar     Lamp            Standing Fan
Sony           Sony          Philips         Dyson
500 – 600      200 – 300     50 – 75         300 – 400
Weighting
side info
§ Two versions were implemented (see the sketch below)
§ 1: Equal-weighted average of embeddings
§ 2: Learn a weight for each embedding and
apply a weighted average
Learning
curve
(w2v with
side info)
Figure 9. AUC-ROC across epochs for word2vec with side information
Why
doesn't it
work?!
§ Perhaps due to sparsity of metadata
- Of 418,749 electronics, metadata available for
162,023 (39%); of these, brand was 51% empty
§ But I assumed the weights of the (useless)
embeddings would be learnt—¯\_(ツ)_/¯
§ An example of more data ≠ better
Why w2v > MF?
Is it skip-gram? Or sequences?
Mixing it
up to pull
it apart
§ Why does w2v perform so much better?
§ For the fun of it, let's use the MF-bias model
with sequence data (as used in w2v)
Results &
learning
curve
Figure 10a and 10b. Precision recall curve and learning curve
for PyTorch MF-bias with sequences
All products
AUC-ROC = 0.9320
Time for 5 epochs = 70.39 min
Further Extensions
What Airbnb, Facebook, and Uber are doing
Embed
everything
§ Building user embeddings in the same vector
space as products (Airbnb)
- Train user embeddings based on interactions with
products (e.g., click, ignore, purchase)
§ Embed all discrete features and just learn
similarities (Facebook)
§ Graph Neural Networks for embeddings;
node neighbors as representation (Uber Eats)
Key Takeaways
Last two tables, I promise
Overall
results
(electronics)
                           | All products | Seen products only | Runtime (min)
---------------------------|--------------|--------------------|--------------
PyTorch MF                 | 0.7951       | -                  | 45
Gensim w2v                 | 0.9082       | 0.9735             | 2.58
PyTorch w2v                | 0.9554       | 0.9855             | 23.63
PyTorch w2v with side info | NA           | NA                 | NA
PyTorch MF with sequences  | 0.9320       | -                  | 70.39
Alibaba Paper*             | 0.9327       | -                  | -
* Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (https://arxiv.org/abs/1803.02349)
Table 5. AUC-ROC across various implementations (electronics)
Overall
results
(books)
                           | All products | Seen products only | Runtime (min)
---------------------------|--------------|--------------------|--------------
PyTorch MF                 | 0.4996       | -                  | 1353.12
Gensim w2v                 | 0.9701       | 0.9892             | 16.24
PyTorch w2v                | 0.9775       | -                  | 122.66
PyTorch w2v with side info | NA           | NA                 | NA
PyTorch MF with sequences  | 0.7196       | -                  | 1393.08
Table 6. AUC-ROC across various implementations (books)
§ Don’t just look at numeric metrics—plot some curves!
- Especially if you need some arbitrary threshold (i.e., classification)
§ Matrix Factorization is an okay-ish baseline
§ Word2vec is a great baseline
§ Training on sequences is epic
§ Don’t just look at numeric metrics—plot some curves!
- Especially if you need some arbitrary threshold (i.e., classification)
§ Matrix Factorization is an okay-ish baseline
§ Word2vec is a great baseline
§ Training on sequences is epic
§ Don’t just look at numeric metrics—plot some curves!
- Especially if you need some arbitrary threshold (i.e., classification)
§ Matrix Factorization is an okay-ish baseline
§ Word2vec is a great baseline
§ Training on sequences is epic
§ Don’t just look at numeric metrics—plot some curves!
- Especially if you need some arbitrary threshold (i.e., classification)
§ Matrix Factorization is an okay-ish baseline
§ Word2vec is a great baseline
§ Training on sequences is epic
Thank you!
eugene@eugeneyan.com
References
McAuley, J., Targett, C., Shi, Q., & Van Den Hengel, A. (2015, August). Image-based
recommendations on styles and substitutes. In Proceedings of the 38th International ACM
SIGIR Conference on Research and Development in Information Retrieval (pp. 43-52). ACM.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed
representations of words and phrases and their compositionality. In Advances in neural
information processing systems (pp. 3111-3119).
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781.
Perozzi, B., Al-Rfou, R., & Skiena, S. (2014, August). Deepwalk: Online learning of social
representations. In Proceedings of the 20th ACM SIGKDD international conference on
Knowledge discovery and data mining (pp. 701-710). ACM.
Grover, A., & Leskovec, J. (2016, August). node2vec: Scalable feature learning for networks.
In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery
and data mining (pp. 855-864). ACM.
References
Wang, J., Huang, P., Zhao, H., Zhang, Z., Zhao, B., & Lee, D. L. (2018, July). Billion-scale
commodity embedding for e-commerce recommendation in alibaba. In Proceedings of the
24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp.
839-848). ACM.
Grbovic, M., & Cheng, H. (2018, July). Real-time personalization using embeddings for search
ranking at airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining (pp. 311-320). ACM.
Wu, L. Y., Fisch, A., Chopra, S., Adams, K., Bordes, A., & Weston, J. (2018, April). Starspace:
Embed all the things!. In Thirty-Second AAAI Conference on Artificial Intelligence.
Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations,
https://eng.uber.com/uber-eats-graph-learning/, retrieved 10 Jan 2020

Weitere ähnliche Inhalte

Was ist angesagt?

Css color and background properties
Css color and background propertiesCss color and background properties
Css color and background properties
Jesus Obenita Jr.
 

Was ist angesagt? (20)

Graph Databases at Netflix
Graph Databases at NetflixGraph Databases at Netflix
Graph Databases at Netflix
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
 
Learn to Rank search results
Learn to Rank search resultsLearn to Rank search results
Learn to Rank search results
 
REST easy with API Platform
REST easy with API PlatformREST easy with API Platform
REST easy with API Platform
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
PostgreSQL Database Slides
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slides
 
Fraud and Risk in Big Data
Fraud and Risk in Big DataFraud and Risk in Big Data
Fraud and Risk in Big Data
 
Machine Learning by Analogy
Machine Learning by AnalogyMachine Learning by Analogy
Machine Learning by Analogy
 
Web Mining
Web Mining Web Mining
Web Mining
 
Book Recommendation Engine
Book Recommendation EngineBook Recommendation Engine
Book Recommendation Engine
 
Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
 
Evolution of the Graph Schema
Evolution of the Graph SchemaEvolution of the Graph Schema
Evolution of the Graph Schema
 
Css color and background properties
Css color and background propertiesCss color and background properties
Css color and background properties
 
Graph Neural Networks for Recommendations
Graph Neural Networks for RecommendationsGraph Neural Networks for Recommendations
Graph Neural Networks for Recommendations
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
 
CSS Flexbox (flexible box layout)
CSS Flexbox (flexible box layout)CSS Flexbox (flexible box layout)
CSS Flexbox (flexible box layout)
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Avro introduction
Avro introductionAvro introduction
Avro introduction
 
Bootstrap grids
Bootstrap gridsBootstrap grids
Bootstrap grids
 

Ähnlich wie Recommender Systems: Beyond the user-item matrix

PREMIER PRODUCTS, INC.Premier Products, Inc. manufactures te.docx
PREMIER PRODUCTS, INC.Premier Products, Inc. manufactures te.docxPREMIER PRODUCTS, INC.Premier Products, Inc. manufactures te.docx
PREMIER PRODUCTS, INC.Premier Products, Inc. manufactures te.docx
ChantellPantoja184
 
The projectAboveWay Sandwich - ProjectYou are a Master Black Belt .docx
The projectAboveWay Sandwich - ProjectYou are a Master Black Belt .docxThe projectAboveWay Sandwich - ProjectYou are a Master Black Belt .docx
The projectAboveWay Sandwich - ProjectYou are a Master Black Belt .docx
ssusera34210
 
Repositioning Assignment 1. All students are required to c.docx
Repositioning Assignment 1. All students are required to c.docxRepositioning Assignment 1. All students are required to c.docx
Repositioning Assignment 1. All students are required to c.docx
kellet1
 
Im posting this again because the answer wasnt correct.Please .pdf
Im posting this again because the answer wasnt correct.Please .pdfIm posting this again because the answer wasnt correct.Please .pdf
Im posting this again because the answer wasnt correct.Please .pdf
maheshkumar12354
 

Ähnlich wie Recommender Systems: Beyond the user-item matrix (20)

PREMIER PRODUCTS, INC.Premier Products, Inc. manufactures te.docx
PREMIER PRODUCTS, INC.Premier Products, Inc. manufactures te.docxPREMIER PRODUCTS, INC.Premier Products, Inc. manufactures te.docx
PREMIER PRODUCTS, INC.Premier Products, Inc. manufactures te.docx
 
Assumptions: Check yo'self before you wreck yourself
Assumptions: Check yo'self before you wreck yourselfAssumptions: Check yo'self before you wreck yourself
Assumptions: Check yo'self before you wreck yourself
 
The projectAboveWay Sandwich - ProjectYou are a Master Black Belt .docx
The projectAboveWay Sandwich - ProjectYou are a Master Black Belt .docxThe projectAboveWay Sandwich - ProjectYou are a Master Black Belt .docx
The projectAboveWay Sandwich - ProjectYou are a Master Black Belt .docx
 
IPO Framework PowerPoint Presentation Slides
IPO Framework PowerPoint Presentation SlidesIPO Framework PowerPoint Presentation Slides
IPO Framework PowerPoint Presentation Slides
 
Lesson 9--production[1]
Lesson 9--production[1]Lesson 9--production[1]
Lesson 9--production[1]
 
Repositioning Assignment 1. All students are required to c.docx
Repositioning Assignment 1. All students are required to c.docxRepositioning Assignment 1. All students are required to c.docx
Repositioning Assignment 1. All students are required to c.docx
 
Fast Distributed Online Classification
Fast Distributed Online Classification Fast Distributed Online Classification
Fast Distributed Online Classification
 
Introduction to Personalisation - Stephen Tucker
Introduction to Personalisation - Stephen TuckerIntroduction to Personalisation - Stephen Tucker
Introduction to Personalisation - Stephen Tucker
 
Recommend Products To Intsacart Customers
Recommend Products To Intsacart CustomersRecommend Products To Intsacart Customers
Recommend Products To Intsacart Customers
 
BIG MART SALES.pptx
BIG MART SALES.pptxBIG MART SALES.pptx
BIG MART SALES.pptx
 
BIG MART SALES PRIDICTION PROJECT.pptx
BIG MART SALES PRIDICTION PROJECT.pptxBIG MART SALES PRIDICTION PROJECT.pptx
BIG MART SALES PRIDICTION PROJECT.pptx
 
Retail products - machine learning recommendation engine
Retail products   - machine learning recommendation engineRetail products   - machine learning recommendation engine
Retail products - machine learning recommendation engine
 
Failure Rate Prediction with Deep Learning
Failure Rate Prediction with Deep LearningFailure Rate Prediction with Deep Learning
Failure Rate Prediction with Deep Learning
 
Im posting this again because the answer wasnt correct.Please .pdf
Im posting this again because the answer wasnt correct.Please .pdfIm posting this again because the answer wasnt correct.Please .pdf
Im posting this again because the answer wasnt correct.Please .pdf
 
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptxbigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
 
Deep recommendations in PyTorch
Deep recommendations in PyTorchDeep recommendations in PyTorch
Deep recommendations in PyTorch
 
APIs for catalogs
APIs for catalogsAPIs for catalogs
APIs for catalogs
 
Magento 2 Automatic Related Products Extension by itoris inc
Magento 2 Automatic Related Products Extension by itoris incMagento 2 Automatic Related Products Extension by itoris inc
Magento 2 Automatic Related Products Extension by itoris inc
 
A/B testing in Firebase. Intermediate and advanced approach
A/B testing in Firebase. Intermediate and advanced approachA/B testing in Firebase. Intermediate and advanced approach
A/B testing in Firebase. Intermediate and advanced approach
 
Promotion Analytics in Consumer Electronics - Module 1: Data
Promotion Analytics in Consumer Electronics - Module 1: DataPromotion Analytics in Consumer Electronics - Module 1: Data
Promotion Analytics in Consumer Electronics - Module 1: Data
 

Mehr von Eugene Yan Ziyou

Mehr von Eugene Yan Ziyou (20)

System design for recommendations and search
System design for recommendations and searchSystem design for recommendations and search
System design for recommendations and search
 
Predicting Hospital Bills at Pre-admission
Predicting Hospital Bills at Pre-admissionPredicting Hospital Bills at Pre-admission
Predicting Hospital Bills at Pre-admission
 
OLX Group Prod Tech 2019 Keynote: Asia's Tech Giants
OLX Group Prod Tech 2019 Keynote: Asia's Tech GiantsOLX Group Prod Tech 2019 Keynote: Asia's Tech Giants
OLX Group Prod Tech 2019 Keynote: Asia's Tech Giants
 
Data Science Challenges and Impact at Lazada (Big Data and Analytics Innovati...
Data Science Challenges and Impact at Lazada (Big Data and Analytics Innovati...Data Science Challenges and Impact at Lazada (Big Data and Analytics Innovati...
Data Science Challenges and Impact at Lazada (Big Data and Analytics Innovati...
 
INSEAD Sharing on Lazada Data Science and my Journey
INSEAD Sharing on Lazada Data Science and my JourneyINSEAD Sharing on Lazada Data Science and my Journey
INSEAD Sharing on Lazada Data Science and my Journey
 
SMU BIA Sharing on Data Science
SMU BIA Sharing on Data ScienceSMU BIA Sharing on Data Science
SMU BIA Sharing on Data Science
 
Culture at Lazada Data Science
Culture at Lazada Data ScienceCulture at Lazada Data Science
Culture at Lazada Data Science
 
Competition Improves Performance: Only when Competition Form matches Goal Ori...
Competition Improves Performance: Only when Competition Form matches Goal Ori...Competition Improves Performance: Only when Competition Form matches Goal Ori...
Competition Improves Performance: Only when Competition Form matches Goal Ori...
 
Sharing about my data science journey and what I do at Lazada
Sharing about my data science journey and what I do at LazadaSharing about my data science journey and what I do at Lazada
Sharing about my data science journey and what I do at Lazada
 
AXA x DSSG Meetup Sharing (Feb 2016)
AXA x DSSG Meetup Sharing (Feb 2016)AXA x DSSG Meetup Sharing (Feb 2016)
AXA x DSSG Meetup Sharing (Feb 2016)
 
Garuda Robotics x DataScience SG Meetup (Sep 2015)
Garuda Robotics x DataScience SG Meetup (Sep 2015)Garuda Robotics x DataScience SG Meetup (Sep 2015)
Garuda Robotics x DataScience SG Meetup (Sep 2015)
 
DataKind SG sharing of our first DataDive
DataKind SG sharing of our first DataDiveDataKind SG sharing of our first DataDive
DataKind SG sharing of our first DataDive
 
Social network analysis and growth recommendations for DataScience SG community
Social network analysis and growth recommendations for DataScience SG communitySocial network analysis and growth recommendations for DataScience SG community
Social network analysis and growth recommendations for DataScience SG community
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
 
Nielsen x DataScience SG Meetup (Apr 2015)
Nielsen x DataScience SG Meetup (Apr 2015)Nielsen x DataScience SG Meetup (Apr 2015)
Nielsen x DataScience SG Meetup (Apr 2015)
 
Statistical inference: Statistical Power, ANOVA, and Post Hoc tests
Statistical inference: Statistical Power, ANOVA, and Post Hoc testsStatistical inference: Statistical Power, ANOVA, and Post Hoc tests
Statistical inference: Statistical Power, ANOVA, and Post Hoc tests
 
Statistical inference: Hypothesis Testing and t-tests
Statistical inference: Hypothesis Testing and t-testsStatistical inference: Hypothesis Testing and t-tests
Statistical inference: Hypothesis Testing and t-tests
 
Statistical inference: Probability and Distribution
Statistical inference: Probability and DistributionStatistical inference: Probability and Distribution
Statistical inference: Probability and Distribution
 
A Study on the Relationship between Education and Income in the US
A Study on the Relationship between Education and Income in the USA Study on the Relationship between Education and Income in the US
A Study on the Relationship between Education and Income in the US
 
Diving into Twitter data on consumer electronic brands
Diving into Twitter data on consumer electronic brandsDiving into Twitter data on consumer electronic brands
Diving into Twitter data on consumer electronic brands
 

Kürzlich hochgeladen

Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 

Kürzlich hochgeladen (20)

Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 

Recommender Systems: Beyond the user-item matrix

  • 1. Recommender Systems 102 Beyond the (usual) user-item matrix—implementation & results DataScience SG Meetup Jan 2020
  • 2. About me § Lead Data Scientist @ health-tech startup - Early detection of preventable diseases - Healthcare resource allocation § Previously: VP, Data Science @ Lazada - E-commerce ML systems - Facilitated integration with Alibaba § More at https://eugeneyan.com
  • 3. RecSys Overview Figure 1. Obligatory (cliché) recsys representation
  • 4. Definition: Use behavior data to predict what other users will like based on user/item similarity
  • 5. Topics* § Data Acquisition, Preparation, Split, etc. § Conventional Baseline § Applying Graph and NLP approaches * Implementation and results discussed throughout
  • 6. Laying the Groundwork Data acquisition, preparation, train-val-split, etc.
  • 8. { "asin": "0000031852", "title": "Girls Ballet Tutu Zebra Hot Pink", "price": 3.17, "imUrl": "http://ecx.images-amazon.com/images/I/51fAmVkTbyL._SY300_.jpg", "related”: { "also_bought":[ "B00JHONN1S", "B002BZX8Z6", "B00D2K1M3O", ... "B007R2RM8W" ], "also_viewed":[ "B002BZX8Z6", "B00JHONN1S", "B008F0SU0Y", ... "B00BFXLZ8M" ], "bought_together":[ "B002BZX8Z6" ] }, "salesRank": { "Toys & Games":211836 }, "brand": "Coxlures", "categories":[ [ "Sports & Outdoors", "Other Sports", "Dance" ] ] }
  • 9. { "asin": "0000031852", "title": "Girls Ballet Tutu Zebra Hot Pink", "price": 3.17, "imUrl": "http://ecx.images-amazon.com/images/I/51fAmVkTbyL._SY300_.jpg", "related”: { "also_bought":[ "B00JHONN1S", "B002BZX8Z6", "B00D2K1M3O", ... "B007R2RM8W" ], "also_viewed":[ "B002BZX8Z6", "B00JHONN1S", "B008F0SU0Y", ... "B00BFXLZ8M" ], "bought_together":[ "B002BZX8Z6" ] }, "salesRank": { "Toys & Games":211836 }, "brand": "Coxlures", "categories":[ [ "Sports & Outdoors", "Other Sports", "Dance" ] ] }
  • 10. Parsing json § Require parsing json to tabular form § Fairly large, with the largest having 142.8 million rows and 20gb on disk § Not able to load into ram fully on regular laptop (16gb ram)
  • 11. def parse_json_to_csv(read_path: str, write_path: str) -> None: csv_writer = csv.writer(open(write_path, 'w')) i = 0 for d in parse(read_path): if i == 0: header = d.keys() csv_writer.writerow(header) csv_writer.writerow(d.values().lower()) i += 1 if i % 10000 == 0: logger.info('Rows processed: {:,}'.format(i)) logger.info('Csv saved to {}'.format(write_path))
  • 12. Getting product- pairs § Evaluate string and convert to dictionary § Get product-pairs for each relationship § Explode each product-pair into a row
  • 13. Getting product- pairs § Evaluate string and convert to dictionary § Get product-pairs for each relationship § Explode each product-pair into a row product1 | product2 | relationship -------------------------------------- B001T9NUFS | B003AVEU6G | also_viewed 0000031895 | B002R0FA24 | also_viewed B007ZN5Y56 | B005C4Y4F6 | also_viewed 0000031909 | B00538F5OK | also_bought B00CYBULSO | B00B608000 | also_bought B004FOEEHC | B00D9C32NI | bought_together Table 1. Product-pairs and relationships (sample)
  • 14. Scoring product- pairs § Simple way: Assign 1.0 if product-pair has any/multiple relationships, 0.0 otherwise § My approach: Score relationships differently* - Bought together: 1.2, Also bought: 1.0, Also viewed: 0.5
  • 15. Scoring product- pairs § Simple way: Assign 1.0 if product-pair has any/multiple relationships, 0.0 otherwise § My approach: Score relationships differently* - Bought together: 1.2, Also bought: 1.0, Also viewed: 0.5 product1 | product2 | weight -------------------------------- B001T9NUFS | B003AVEU6G | 0.5 0000031895 | B002R0FA24 | 0.5 B007ZN5Y56 | B005C4Y4F6 | 0.5 0000031909 | B00538F5OK | 1.0 B00CYBULSO | B00B608000 | 1.0 B004FOEEHC | B00D9C32NI | 1.2 Table 2. Product-pairs and weights (sample) * Assume relationships are symmetrical
  • 16. Electronics Books Unique products 418,749 1,948,370 Product-pairs 4,005,262 26,595,848 Sparsity 0.9999 0.9999 𝑆𝑝𝑎𝑟𝑠𝑖𝑡𝑦 = 1 − 𝐶𝑜𝑢𝑛𝑡(𝑛𝑜𝑛𝑧𝑒𝑟𝑜 𝑒𝑙𝑒𝑚𝑒𝑛𝑡𝑠) 𝐶𝑜𝑢𝑛𝑡(𝑡𝑜𝑡𝑎𝑙 𝑒𝑙𝑒𝑚𝑒𝑛𝑡𝑠) Table 3. Unique products and sparsity for electronics and books
  • 17. Train-Validation Split Or how to create negative samples (at scale)
  • 18. Splitting the data § Random split: 2/3 train, 1/3 validation § Easy, right? § But our dataset only consists of positive product-pairs—how do we validate?
  • 19. Splitting the data § Random split: 2/3 train, 1/3 validation § Easy, right? § Not so fast! Our dataset only has positive product-pairs—how do we validate?
  • 20. Creating negative samples § Direct approach: Random sampling - To create 1 million negative product-pairs, call random 2 million times—very slow! § Hack: Add products in array, shuffle, slice to sample; shuffle when exhausted—fast!
  • 21. Creating negative samples § Direct approach: Random sampling - To create 1 million negative product-pairs, call random 2 million times—very slow! § Hack: Add products in array, shuffle, slice to sample; re-shuffle when exhausted—fast!
  • 22. Creating negative samples § Direct approach: Random sampling - To create 1 million negative product-pairs, call random 2 million times—very slow! § Hack: Add products in array, shuffle, slice to sample; re-shuffle when exhausted—fast! products ---------- B001T9NUFS 0000031895 B007ZN5Y56 0000031909 B00CYBULSO B004FOEEHC Negative product-pair 1
  • 23. Creating negative samples § Direct approach: Random sampling - To create 1 million negative product-pairs, call random 2 million times—very slow! § Hack: Add products in array, shuffle, slice to sample; re-shuffle when exhausted—fast! products ---------- B001T9NUFS 0000031895 B007ZN5Y56 0000031909 B00CYBULSO B004FOEEHC Negative product-pair 2
  • 24. Creating negative samples § Direct approach: Random sampling - To create 1 million negative product-pairs, call random 2 million times—very slow! § Hack: Add products in array, shuffle, slice to sample; re-shuffle when exhausted—fast! products ---------- B001T9NUFS 0000031895 B007ZN5Y56 0000031909 B00CYBULSO B004FOEEHC Negative product-pair 3
  • 26. Batch MF § Common approach 1: Load matrix in memory; apply Python package (e.g., scipy.svd, surprise, etc.) § Common approach 2: Run on cluster with SparkML Alternating Least Squares § Very resource intensive! - Is there a smarter way, given the sparse data?
  • 27. Batch MF § Common approach 1: Load matrix in memory; apply Python package (e.g., scipy.svd, surprise, etc.) § Common approach 2: Run on cluster with SparkML Alternating Least Squares § Very resource intensive! - Is there a smarter way, given the sparse data?
• 28. Iterative MF
§ Only load (or read from disk) product-pairs, instead of the entire matrix that is mostly zeros
§ Matrix factorization by iterating through each product-pair
• 29. Iterative MF (numeric labels)
for (product1, product2), label in train_set:
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)

    # Predict product-pair score (interaction term and sum)
    prediction = torch.sum(product1_emb * product2_emb, dim=1)

    # Minimize mean squared error against the numeric label
    loss = F.mse_loss(prediction, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
• 33. Iterative MF (binary labels)
for (product1, product2), label in train_set:
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)

    # Predict product-pair score (interaction term, sum, then sigmoid)
    prediction = torch.sigmoid(torch.sum(product1_emb * product2_emb, dim=1))

    # Minimize binary cross-entropy
    loss = F.binary_cross_entropy(prediction, label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
• 34. Regularize!
for (product1, product2), label in train_set:
    # Get embedding for each product
    product1_emb = embedding(product1)
    product2_emb = embedding(product2)

    # Predict product-pair score (interaction term, sum, then sigmoid)
    prediction = torch.sigmoid(torch.sum(product1_emb * product2_emb, dim=1))

    # L2 penalty on the embedding weights (lambda_reg is the regularization strength)
    l2_reg = lambda_reg * torch.sum(embedding.weight ** 2)

    # Minimize binary cross-entropy plus the L2 penalty
    loss = F.binary_cross_entropy(prediction, label) + l2_reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
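Putting the loop slides together, a runnable PyTorch version might look like the sketch below. Assumptions of mine: products are integer-encoded, train_loader yields batches of product-pair indices with float binary labels, and the embedding dimension, learning rate, and lambda_reg are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MF(nn.Module):
    # Minimal iterative matrix factorization: one embedding table,
    # dot product of the two product embeddings as the pair score
    def __init__(self, n_products, emb_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(n_products, emb_dim)

    def forward(self, product1, product2):
        emb1 = self.embedding(product1)
        emb2 = self.embedding(product2)
        return torch.sum(emb1 * emb2, dim=1)  # interaction term and sum

model = MF(n_products=418_749)  # electronics vocabulary size from Table 3
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
lambda_reg = 1e-6  # illustrative L2 strength

for (product1, product2), label in train_loader:
    prediction = torch.sigmoid(model(product1, product2))
    loss = F.binary_cross_entropy(prediction, label)
    loss = loss + lambda_reg * model.embedding.weight.pow(2).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()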
  • 35. Training Schedule Figure 2. Cosine Annealing training schedule
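One way to get a schedule like Figure 2 in PyTorch is CosineAnnealingWarmRestarts, which decays the learning rate along a cosine curve and resets it at the start of each cycle. The cycle length T_0 is illustrative, and compute_loss is a hypothetical helper standing in for the training loop above:

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Learning rate follows a cosine decay, then restarts every T_0 steps
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=1000)

for (product1, product2), label in train_loader:
    loss = compute_loss(model, product1, product2, label)  # hypothetical helper
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the cosine schedule one step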
• 38. Results (MF)
Binary labels: AUC-ROC = 0.8083; time for 5 epochs = 45 min
Continuous labels: AUC-ROC = 0.9225; time for 5 epochs = 45 min
Figure 3a and 3b. Precision recall curves for Matrix Factorization
• 39. Results (MF)
Figure 3a and 3b. Precision recall curves for Matrix Factorization (note the "Cliff of Death")
• 40. Learning curve (MF)
Figure 4. AUC-ROC across epochs for matrix factorization. Each time the learning rate is reset, the model seems to "forget", causing AUC-ROC to revert to ~0.5. Also, a single epoch seems sufficient.
  • 41. Matrix Factorization + bias Incremental improvement on the baseline
• 42. Adding bias
§ What if a product is generally popular or unpopular?
§ Learn a bias factor (i.e., a single number for each product)
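A sketch of the change, assuming the MF model from earlier; the bias is just a second embedding of dimension 1, added to the dot product:

import torch
import torch.nn as nn

class MFBias(nn.Module):
    # MF plus a learned per-product bias (a single scalar per product)
    def __init__(self, n_products, emb_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(n_products, emb_dim)
        self.bias = nn.Embedding(n_products, 1)  # one number per product

    def forward(self, product1, product2):
        dot = torch.sum(self.embedding(product1) * self.embedding(product2), dim=1)
        return dot + self.bias(product1).squeeze(1) + self.bias(product2).squeeze(1)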
• 44. Results (MF-bias)
Binary labels: AUC-ROC = 0.7951; time for 5 epochs = 45 min
Continuous labels: AUC-ROC = 0.8319; time for 5 epochs = 45 min
Figure 5a and 5b. Precision recall curves for Matrix Factorization with bias
  • 45. Results (MF-bias) Figure 5a and 5b. Precision recall curves for Matrix Factorization with bias More “production friendly”
  • 46. Off the Beaten Path Natural language processing (“NLP”) and Graphs in RecSys
• 47. Word2Vec
§ In 2013, two seminal papers by Tomas Mikolov et al. introduced Word2Vec ("w2v")
§ Demonstrated that w2v could learn semantic and syntactic word vector representations
§ TL;DR: Converts words into numbers (arrays)
• 48. DeepWalk
§ Unsupervised learning of representations of nodes (i.e., vertices) in a social network
§ Generate sequences from random walks on the (social) graph
§ Learn vector representations of nodes (e.g., profiles, content)
• 53. How do NLP and Graphs matter?
§ Create graph from product-pairs + weights
§ Generate sequences from graph (via random walk)
§ Learn product embeddings (via word2vec)
§ Recommend based on embedding similarity (e.g., cosine similarity, dot product); see the sketch below
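The last step is simple once embeddings are learned. A brute-force cosine-similarity sketch of mine (in production you would use an approximate-nearest-neighbour index instead; the lookup dicts are illustrative):

import numpy as np

def recommend(product_id, embeddings, id2idx, idx2id, k=10):
    # embeddings: (n_products, emb_dim) array; id2idx/idx2id map IDs <-> rows
    vecs = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = vecs[id2idx[product_id]]
    scores = vecs @ query                 # cosine similarity to every product
    top = np.argsort(-scores)[1:k + 1]    # skip index 0, the product itself
    return [(idx2id[i], float(scores[i])) for i in top]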
• 55. Creating a product graph
§ We have product-pairs and weights
  - These are our graph edges
§ Create a weighted graph with networkx
  - Each graph edge is given a numerical weight, instead of all edges having the same weight

product1   | product2   | weight
---------------------------------
B001T9NUFS | B003AVEU6G | 0.5
0000031895 | B002R0FA24 | 0.5
B007ZN5Y56 | B005C4Y4F6 | 0.5
0000031909 | B00538F5OK | 1.0
B00CYBULSO | B00B608000 | 1.1
B004FOEEHC | B00D9C32NI | 1.2

Table 2. Product-pairs and weights
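A minimal networkx sketch, assuming the product-pairs sit in a CSV shaped like Table 2 (the path is illustrative):

import networkx as nx
import pandas as pd

df = pd.read_csv('product_pairs.csv')  # columns: product1, product2, weight

G = nx.Graph()
G.add_weighted_edges_from(df[['product1', 'product2', 'weight']].itertuples(index=False))

# Each edge now carries its relationship weight, e.g.
# G['B004FOEEHC']['B00D9C32NI']['weight'] == 1.2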
• 57. Random Walks
§ Direct approach: Traverse the networkx graph
  - For 10 sequences of length 10 per starting node, need to traverse 100 times
  - 2 mil nodes for the books graph = 200 mil queries
  - Very slow and memory intensive
§ Hack: Work directly on transition probabilities
• 59. Random Walks (weighted-adjacency matrix)

         | Product1 | Product2 | Product3 | Product4 | Product5
Product1 |          | 1        | 1        | 3        |
Product2 | 1        |          |          |          | 1
Product3 | 1        |          |          | 2        |
Product4 | 3        |          | 2        |          |
Product5 |          | 1        |          |          |
• 61. Random Walks (transition matrix)

         | Product1 | Product2 | Product3 | Product4 | Product5
Product1 |          | .2       | .2       | .6       |
Product2 | .5       |          |          |          | .5
Product3 | .33      |          |          | .67      |
Product4 | .6       |          | .4       |          |
Product5 |          | 1.0      |          |          |

Each row holds a node's transition probabilities, e.g., row 3 is Transition-probability(Product3)
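A sketch of walking via transition probabilities, assuming the weighted-adjacency matrix is in scipy CSR form and every node has at least one edge. This is my unvectorized illustration of the idea, not the talk's actual implementation:

import numpy as np
from scipy import sparse

def to_transition(adjacency):
    # Normalize a weighted-adjacency CSR matrix so each row sums to 1
    row_sums = np.asarray(adjacency.sum(axis=1)).ravel()
    inv = sparse.diags(1.0 / np.maximum(row_sums, 1e-12))
    return (inv @ adjacency).tocsr()

def random_walks(transition, n_walks=10, walk_len=10, seed=42):
    # Sample walks directly from transition probabilities instead of
    # traversing the networkx graph
    rng = np.random.default_rng(seed)
    n_nodes = transition.shape[0]
    walks = np.empty((n_nodes * n_walks, walk_len), dtype=np.int64)
    walks[:, 0] = np.repeat(np.arange(n_nodes), n_walks)  # n_walks per node
    for step in range(1, walk_len):
        for i in range(walks.shape[0]):
            row = transition.getrow(walks[i, step - 1])
            walks[i, step] = rng.choice(row.indices, p=row.data / row.data.sum())
    return walks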
• 62. Resulting sequences (random walks)

B001T9NUFS B003AVEU6G B005C4Y4F6 B007ZN5Y56 ... B007ZN5Y56
0000031895 B00538F5OK B004FOEEHC B001T9NUFS ... 0000031895
B005C4Y4F6 0000031909 B00CYBULSO B003AVEU6G ... B00D9C32NI
B00CYBULSO B001T9NUFS B002R0FA24 B00CYBULSO ... B007ZN5Y56
B004FOEEHC B00CYBULSO B001T9NUFS B002R0FA24 ... B00B608000
...
0000031909 B00B608000 B00D9C32NI B00CYBULSO ... B007ZN5Y56

Each row is one walk: columns = length of sequence (10); rows = no. of nodes (420k) × samples per node (10)
  • 63. Pre-canned Node2Vec Readily available open-sourced implementations
• 64. Node2Vec
§ Seemed to work out of the box
  - Just need to provide edges
  - Uses networkx and gensim under the hood
§ But very memory intensive and slow
  - Could not run to completion even on 64gb ram
https://github.com/aditya-grover/node2vec
  • 65. Gensim Word2Vec Using a trusted package as baseline
• 66. Gensim w2v
§ Very easy to use
  - Takes in a list of sequences
  - Can be multithreaded
  - CPU-only
§ Fastest to complete 5 epochs
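A sketch with the gensim 4.x API, feeding in the random-walk sequences (hyperparameters are illustrative; older gensim 3.x uses size/iter instead of vector_size/epochs):

from gensim.models import Word2Vec

# walks: list of product-ID sequences from the random walks above
model = Word2Vec(
    sentences=walks,   # each walk is treated as a "sentence"
    vector_size=128,   # embedding dimension
    window=5,          # context window over the walk
    sg=1,              # skip-gram
    negative=5,        # negative samples per positive pair
    min_count=1,
    workers=4,         # multithreaded, CPU-only
    epochs=5,
)

similar = model.wv.most_similar('B001T9NUFS', topn=10)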
• 68. Results (gensim w2v)
All products: AUC-ROC = 0.9082; time for 5 epochs = 2.58 min
Seen products only: AUC-ROC = 0.9735; time for 5 epochs = 2.58 min
Figure 6a and 6b. Precision recall curves for gensim.word2vec
  • 69. Results (gensim w2v) Figure 6a and 6b. Precision recall curves for gensim.word2vec Unseen products without embeddings
  • 70. Building w2v from Scratch To plot learning curves and extend it
• 71. Data Loader
§ Input sequences instead of product-pairs
§ Implements two features from the w2v papers
  - Subsampling of frequent words
  - Negative sampling
• 73. Data Loader (sub-sampling)
§ Drop out words of higher frequency
  - Frequency of 0.0026 = 0.0 dropout
  - Frequency of 0.00746 = 0.5 dropout
  - Frequency of 1.0 = 0.977 dropout
§ Accelerated learning and improved vectors of rare words

P(dropout | word) = 1 − (sqrt(freq(word) / 0.001) + 1) × (0.001 / freq(word))
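The formula as a small helper (threshold t = 0.001 as above; the function name is mine):

import math

def dropout_prob(freq, t=0.001):
    # Probability of dropping a product from a training sequence,
    # per the word2vec sub-sampling formula
    return max(0.0, 1 - (math.sqrt(freq / t) + 1) * (t / freq))

# Roughly matches the slide: dropout_prob(0.0026) ≈ 0.0, dropout_prob(0.00746) ≈ 0.5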
• 75. Data Loader (negative sampling)
§ Original skip-gram ends with SoftMax
  - If vocab = 10k words and embedding dim = 128, that's 1.28 million weights to update—expensive!
  - In RecSys, the "vocab" is in the millions
§ Negative sampling
  - Only modify the weights of the negative pair samples
  - If 6 pairs (1 pos, 5 neg) and 1 mil products, only update 0.0006% of weights—very efficient! (see the sketch below)
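The slides don't show how negatives are drawn; one common choice, from the word2vec paper, is sampling proportional to unigram count raised to the power 0.75. A sketch of mine, not necessarily the talk's implementation:

import numpy as np

def make_negative_sampler(product_counts, power=0.75, seed=42):
    # product_counts: array of occurrence counts, one per product index.
    # Returns a function that draws k negative product indices,
    # sampled proportional to count**0.75 as in the word2vec paper.
    rng = np.random.default_rng(seed)
    probs = product_counts.astype(float) ** power
    probs /= probs.sum()
    n_products = len(product_counts)

    def draw(k=5):
        return rng.choice(n_products, size=k, p=probs)

    return draw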
• 80. PyTorch Word2Vec
class SkipGram(nn.Module):
    def __init__(self, emb_size, emb_dim):
        super().__init__()
        self.center_embeddings = nn.Embedding(emb_size, emb_dim, sparse=True)
        self.context_embeddings = nn.Embedding(emb_size, emb_dim, sparse=True)

    def forward(self, center, context, neg_context):
        emb_center = self.center_embeddings(center)
        emb_context = self.context_embeddings(context)
        emb_neg_context = self.context_embeddings(neg_context)

        # Score for positive pairs (interaction term and sum)
        score = torch.sum(emb_center * emb_context, dim=1)
        score = -F.logsigmoid(score)

        # Score for negative pairs (batch interaction term and sum)
        neg_score = torch.bmm(emb_neg_context, emb_center.unsqueeze(2)).squeeze()
        neg_score = -torch.sum(F.logsigmoid(-neg_score), dim=1)

        # Return combined loss
        return torch.mean(score + neg_score)
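Because the embeddings are created with sparse=True, their gradients are sparse and plain Adam will refuse them; torch.optim.SparseAdam is the usual pairing. A minimal training-loop sketch (sizes and learning rate are illustrative; train_loader is assumed to yield batches of center, context, and negative-context indices):

model = SkipGram(emb_size=418_749, emb_dim=128)
optimizer = torch.optim.SparseAdam(model.parameters(), lr=0.003)

for center, context, neg_context in train_loader:
    loss = model(center, context, neg_context)  # forward returns the loss directly
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()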
• 82. Results (w2v)
All products: AUC-ROC = 0.9554; time for 5 epochs = 23.63 min
Seen products only: AUC-ROC = 0.9855; time for 5 epochs = 23.63 min
Figure 7a and 7b. Precision recall curves for PyTorch Word2Vec
  • 83. Learning curve (w2v) Figure 8. AUC-ROC across epochs for word2vec; a single epoch seems sufficient
• 84. Overall results so far
§ Improvement on gensim.word2vec and the Alibaba paper

               | All products | Seen products only
PyTorch MF     | 0.7951       | -
Gensim w2v     | 0.9082       | 0.9735
PyTorch w2v    | 0.9554       | 0.9855
Alibaba paper* | 0.9327       | -

* Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (https://arxiv.org/abs/1803.02349)
Table 4. AUC-ROC across various implementations
  • 85. Adding side info to w2v To help solve the cold start problem
• 89. Extending w2v
§ For each product, we have information like category, brand, price group, etc.
  - Why not add this when learning embeddings?
§ Alibaba paper reported AUC-ROC improvement from 0.9327 to 0.9575

B001T9NUFS -> B003AVEU6G -> B007ZN5Y56 -> ... -> B007ZN5Y56
Television     Sound bar     Lamp               Standing Fan
Sony           Sony          Phillips           Dyson
500 – 600      200 – 300     50 – 75            300 – 400
• 90. Weighting side info
§ Two versions were implemented
§ Version 1: Equal-weighted average of embeddings
§ Version 2: Learn a weightage for each embedding and apply a weighted average (see the sketch below)
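A sketch of both versions. The field names, sizes, and the softmax-weighting scheme are my illustration of the idea, not the exact implementation from the talk:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SideInfoEmbedding(nn.Module):
    # Combine product, category, and brand embeddings into one vector.
    # Version 1 averages them equally; version 2 learns a weight per source.
    def __init__(self, n_products, n_categories, n_brands,
                 emb_dim=128, learn_weights=False):
        super().__init__()
        self.product = nn.Embedding(n_products, emb_dim)
        self.category = nn.Embedding(n_categories, emb_dim)
        self.brand = nn.Embedding(n_brands, emb_dim)
        self.learn_weights = learn_weights
        self.weights = nn.Parameter(torch.zeros(3))  # softmax -> equal at init

    def forward(self, product_id, category_id, brand_id):
        embs = torch.stack([self.product(product_id),
                            self.category(category_id),
                            self.brand(brand_id)], dim=1)  # (batch, 3, emb_dim)
        if self.learn_weights:
            w = F.softmax(self.weights, dim=0).view(1, 3, 1)  # learned weightage
            return (embs * w).sum(dim=1)
        return embs.mean(dim=1)  # equal-weighted average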
  • 91. Learning curve (w2v with side info) Figure 9. AUC-ROC across epochs for word2vec with side information
• 93. Why doesn't it work?!
§ Perhaps due to sparsity of metadata
  - Of 418,749 electronics products, metadata was available for 162,023 (39%); of these, brand was 51% empty
§ But I assumed the weights of the (useless) embeddings would be learnt—¯\_(ツ)_/¯
§ An example of more data ≠ better
  • 94. Why w2v > MF? Is it skip-gram? Or sequences?
• 95. Mixing it up to pull it apart
§ Why does w2v perform so much better?
§ For the fun of it, let's use the MF-bias model with the sequence data (used in w2v)
• 96. Results & learning curve
All products: AUC-ROC = 0.9320; time for 5 epochs = 70.39 min
Figure 10a and 10b. Precision recall curve and learning curve for PyTorch MF-bias with sequences
  • 97. Further Extensions What Airbnb, Facebook, and Uber are doing
• 98. Embed everything
§ Build user embeddings in the same vector space as products (Airbnb)
  - Train user embeddings based on interactions with products (e.g., click, ignore, purchase)
§ Embed all discrete features and just learn similarities (Facebook)
§ Graph Neural Networks for embeddings; node neighbors as representation (Uber Eats)
  • 99. Key Takeaways Last two tables, I promise
• 100. Overall results (electronics)

                           | All products | Seen products only | Runtime (min)
PyTorch MF                 | 0.7951       | -                  | 45
Gensim w2v                 | 0.9082       | 0.9735             | 2.58
PyTorch w2v                | 0.9554       | 0.9855             | 23.63
PyTorch w2v with side info | NA           | NA                 | NA
PyTorch MF with sequences  | 0.9320       | -                  | 70.39
Alibaba paper*             | 0.9327       | -                  | -

* Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (https://arxiv.org/abs/1803.02349)
Table 5. AUC-ROC across various implementations (electronics)
• 101. Overall results (books)

                           | All products | Seen products only | Runtime (min)
PyTorch MF                 | 0.4996       | -                  | 1353.12
Gensim w2v                 | 0.9701       | 0.9892             | 16.24
PyTorch w2v                | 0.9775       | -                  | 122.66
PyTorch w2v with side info | NA           | NA                 | NA
PyTorch MF with sequences  | 0.7196       | -                  | 1393.08

Table 6. AUC-ROC across various implementations (books)
• 105.
§ Don't just look at numeric metrics—plot some curves!
  - Especially if you need some arbitrary threshold (i.e., classification)
§ Matrix Factorization is an okay-ish baseline
§ Word2vec is a great baseline
§ Training on sequences is epic
• 107. References
McAuley, J., Targett, C., Shi, Q., & Van Den Hengel, A. (2015). Image-based recommendations on styles and substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 43-52). ACM.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111-3119).
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Perozzi, B., Al-Rfou, R., & Skiena, S. (2014). DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 701-710). ACM.
Grover, A., & Leskovec, J. (2016). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 855-864). ACM.
• 108. References
Wang, J., Huang, P., Zhao, H., Zhang, Z., Zhao, B., & Lee, D. L. (2018). Billion-scale commodity embedding for e-commerce recommendation in Alibaba. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 839-848). ACM.
Grbovic, M., & Cheng, H. (2018). Real-time personalization using embeddings for search ranking at Airbnb. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 311-320). ACM.
Wu, L. Y., Fisch, A., Chopra, S., Adams, K., Bordes, A., & Weston, J. (2018). StarSpace: Embed all the things! In Thirty-Second AAAI Conference on Artificial Intelligence.
Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations, https://eng.uber.com/uber-eats-graph-learning/, retrieved 10 Jan 2020.