GRU4Rec v2
Recurrent neural networks with top-k gains for
session-based recommendations
Balázs Hidasi
@balazshidasi
At a glance
• Improvements on GRU4Rec [Hidasi et al., 2015]
 Session-based recommendations with RNNs
• General
 Negative sampling strategies
 Loss function design
• Specific
 Constrained embeddings
 Implementation details
• Offline tests: up to 35% improvement
• Online tests & observations
GRU4Rec overview
Context: session-based recommendations
• User identification
 Feasibility
 Privacy
 Regulations
• User intent
 Disjoint sessions
o Need
o Situation (context)
o "Irregularities"
• Session-based recommendations
• Permanent user cold-start
Preliminaries
• Next click prediction
• Top-N recommendation (ranking)
• Implicit feedback
Recurrent Neural Networks
• Basics
 Input: sequential information $\{x_t\}_{t=1}^{T}$
 Hidden state ($h_t$):
o representation of the sequence so far
o influenced by every element of the sequence up to $t$
 $h_t = f(W x_t + U h_{t-1} + b)$
• Gated RNNs (GRU, LSTM & others)
 Basic RNN is subject to the exploding/vanishing gradient problem
 Use $h_t = h_{t-1} + \Delta_t$ instead of rewriting the hidden state
 Information flow is controlled by gates
• Gated Recurrent Unit (GRU)
 Update gate (z)
 Reset gate (r)
 $z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$
 $r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$
 $\hat{h}_t = \tanh(W x_t + r_t \circ U h_{t-1} + b)$
 $h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \hat{h}_t$
(Diagram: GRU cell; the input IN and the previous hidden state h pass through the reset gate r and the update gate z to produce the new state and the output OUT. A minimal sketch of one GRU step follows.)
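For reference, a minimal NumPy sketch of one GRU step exactly as written above (parameter names and shapes are illustrative, not GRU4Rec's actual code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b, Wz, Uz, bz, Wr, Ur, br):
    """One GRU update following the equations above."""
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)          # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)          # reset gate
    h_hat = np.tanh(W @ x_t + r * (U @ h_prev) + b)   # candidate state
    return z * h_prev + (1.0 - z) * h_hat             # gated interpolation
```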
GRU4Rec
• GRU trained on session data, adapted to the
recommendation task
 Input: current item ID
 Hidden state: session representation
 Output: likelihood of being the next item
• Session-parallel mini-batches
 Mini-batch is defined over sessions
 Update with one step BPTT
o Lots of sessions are very short
o 2D mini-batching, updating on longer sequences (with
or without padding) didn’t improve accuracy
• Output sampling
 Computing scores for all items (100K – 1M) in every
step is slow
 One positive item (target) + several samples
 Fast solution: scores on mini-batch targets
o Targets of the other examples in the mini-batch are negative samples
for the current example
• Loss functions: cross-entropy, BPR, TOP1
(Diagram: network architecture; the current ItemID as a one-hot vector feeds the GRU layer, followed by the weighted output and f(), producing scores on items; the target is the next ItemID, also one-hot. A sketch of the session-parallel batching follows.)
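A toy sketch of the session-parallel mini-batch scheme described above (assumes every session has at least two events and that there are at least `batch_size` sessions; resetting the hidden state of refilled slots is omitted here):

```python
import numpy as np

def session_parallel_batches(sessions, batch_size):
    """Yield (input, target) item-ID arrays. Each mini-batch slot tracks
    one active session; when a session runs out, its slot is refilled
    with the next session (simplified: stops when no session is left)."""
    active = list(range(batch_size))   # session index held by each slot
    pos = [0] * batch_size             # cursor within each session
    next_sess = batch_size
    while True:
        inp = np.array([sessions[s][p] for s, p in zip(active, pos)])
        tgt = np.array([sessions[s][p + 1] for s, p in zip(active, pos)])
        yield inp, tgt
        for k in range(batch_size):
            pos[k] += 1
            if pos[k] + 1 >= len(sessions[active[k]]):  # session exhausted
                if next_sess >= len(sessions):
                    return
                active[k], pos[k] = next_sess, 0        # refill the slot
                next_sess += 1
```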
Negative sampling & loss function design
Negative sampling
• Training step
 Score all items
 Push target items forward (modify model parameters)
• Many training steps & many items
 Not scalable
 $O(S_I \cdot N^+)$
• Sample negative examples instead → $O(K \cdot N^+)$ ($S_I$: number of items, $N^+$: number of events, $K$: number of samples)
• Sampling probability
 Must be quick to calculate
 Two popular choices
o Uniform → many unnecessary steps
o Proportional to support → better in practice, fast start, but some
relations are not examined enough
 Optimal choice
o Data dependent
o Changing during training might help
Mini-batch based negative sampling
• Target items of the other examples in
the mini-batch → used as
negative samples
• Pros
 Efficient & simple implementation
on GPU
 Sampling probability proportional
to support
• Cons
 Number of samples is tied to the
batch size
o Mini-batch training favours smaller
batches
o Negative sampling favours larger
batches
 Sampling probability is always the
same
(Diagram: Sessions 1–5 as item sequences $i_{s,1}, i_{s,2}, \ldots$, rearranged into session-parallel input and target mini-batches; score vectors $y^1, y^2, y^3$ over items $i_1 \ldots i_8$, with one-hot rows marking each example's target, so the targets of the other examples serve as that example's negative samples. A sketch of the scoring trick follows.)
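The scoring trick fits in a couple of lines; a sketch under assumed shapes (not GRU4Rec's actual code; $W_y$ is the output weight matrix from the architecture slide):

```python
import numpy as np

def minibatch_scores(H, Wy, targets):
    """H: (B, d) hidden states, Wy: (n_items, d) output weights,
    targets: (B,) target item IDs. Row b holds example b's scores on
    every target in the batch: the diagonal entries are the positives,
    the off-diagonal entries the mini-batch negatives."""
    R = H @ Wy[targets].T    # (B, B) score matrix
    return R, np.diag(R)     # all scores, and each example's r_i
```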
Solution: add more samples!
• Add extra samples
• Shared between mini-batches
• Sampling probability: $p_i \propto \mathrm{supp}_i^{\alpha}$
 $\alpha = 0$ → uniform
 $\alpha = 1$ → popularity based
• Implementation trick
 Sampling interrupts GPU computations
 More efficient in parallel
 Sample store (cache)
o Precompute 10-100M samples
o Resample when all of them have been used
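A sketch of such a sample store (the class name and pool size are illustrative; assumes `supports` is an array of per-item supports):

```python
import numpy as np

class SampleStore:
    """Pre-computes a big pool of negatives with p_i ~ supp_i^alpha,
    so per-step sampling never interrupts the GPU computations."""
    def __init__(self, supports, alpha, pool_size=10_000_000):
        p = supports.astype(np.float64) ** alpha
        self.p = p / p.sum()
        self.pool_size = pool_size
        self._refill()

    def _refill(self):
        self.pool = np.random.choice(len(self.p), self.pool_size, p=self.p)
        self.ptr = 0

    def draw(self, n):
        if self.ptr + n > self.pool_size:   # used them all -> resample
            self._refill()
        out = self.pool[self.ptr:self.ptr + n]
        self.ptr += n
        return out
```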
Another look at the loss functions
• Listwise losses on the target+negative samples
 Cross-entropy + softmax
o Cross-entropy in itself is pointwise
o Target should have the maximal score
o $XE = -\log s_i$, where $s_i = \frac{e^{r_i}}{\sum_{j=1}^{K} e^{r_j}}$
o Unstable in previous implementation
– Rounding errors
– Fixed
 Average BPR-score
o BPR in itself is pairwise
o Target should have higher score than all negative samples
o $BPR = -\frac{1}{K} \sum_{j=1}^{K} \log \sigma(r_i - r_j)$
 TOP1 score
o Heuristic loss, idea is similar to the average BPR
o Score regularization part
o $TOP1 = \frac{1}{K} \sum_{j=1}^{K} \left[ \sigma(r_j - r_i) + \sigma(r_j^2) \right]$
• Earlier results
 Similar performance
 TOP1 is slightly better
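Written out for a single example, with `r_i` the target score and `r_neg` the $K$ negative scores (a NumPy sketch; the softmax here includes the target, one common reading of the formula, and the max-subtraction is the standard log-sum-exp stabilization, one way to address the rounding issue mentioned above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xe_loss(r_i, r_neg):
    """Cross-entropy with softmax over the target + negative scores."""
    r = np.concatenate(([r_i], r_neg))
    r = r - r.max()                    # stabilized log-sum-exp
    return np.log(np.exp(r).sum()) - r[0]

def bpr_loss(r_i, r_neg):
    """Average BPR over the K negative samples."""
    return -np.mean(np.log(sigmoid(r_i - r_neg)))

def top1_loss(r_i, r_neg):
    """TOP1: pairwise ranking terms plus score regularization."""
    return np.mean(sigmoid(r_neg - r_i) + sigmoid(r_neg ** 2))
```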
Unexpected behaviour of pairwise losses
• Unexpected behaviour
 Several negative samples → good results
 Many negative samples → bad results
• Gradient (BPR w.r.t. the target score)
 $\frac{\partial L_i}{\partial r_i} = \frac{1}{K} \sum_{j=1}^{K} \left( 1 - \sigma(r_i - r_j) \right)$
• Irrelevant negative sample
 Whose score is already lower than 𝑟𝑖
o Changes during training
 Doesn’t contribute to the optimization: $1 - \sigma(r_i - r_j) \approx 0$
 Number of irrelevant samples increases as training progresses
• Averaging with many negative samples → the gradient vanishes
 The target will be pushed up for a while
 Slows down as it approaches the top
 More samples: slows down earlier
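A toy numeric illustration of the dilution (numbers are made up: one negative still close to the target, the rest already far below it):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

r_i = 3.0
for K in (4, 64, 1024):
    r_neg = np.full(K, -3.0)   # irrelevant: 1 - sigmoid(6) ~ 0.0025 each
    r_neg[0] = 2.9             # the one sample that still matters
    grad = np.mean(1.0 - sigmoid(r_i - r_neg))
    print(K, round(float(grad), 5))  # the averaged gradient shrinks with K
```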
Pairwise-max loss functions
• The target score should be higher than the maximum
score amongst the negative samples
• BPR-max
 $P(r_i > r_{\mathrm{MAX}} \mid \theta) = \sum_{j=1}^{K} P(r_i > r_j \mid r_j = r_{\mathrm{MAX}}, \theta) \, P(r_j = r_{\mathrm{MAX}} \mid \theta)$
 Use continuous approximations
o $P(r_i > r_j \mid r_j = r_{\mathrm{MAX}}, \theta) = \sigma(r_i - r_j)$
o $P(r_j = r_{\mathrm{MAX}} \mid \theta) = \mathrm{softmax}\left(\{r_k\}_{k=1}^{K}\right)_j = \frac{e^{r_j}}{\sum_{k=1}^{K} e^{r_k}} = s_j$
– Softmax over the negative samples only
 Minimize negative log probability
 Add ℓ2 score regularization
• $L_i = -\log \sum_{j=1}^{K} s_j \, \sigma(r_i - r_j) + \lambda \sum_{j=1}^{K} r_j^2$
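A sketch of the loss for one example, following the formula above (the small epsilon inside the log is our addition for numerical safety):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_max_loss(r_i, r_neg, lam=1.0):
    """BPR-max: softmax weights s_j over the negatives, sigmoid
    pairwise terms, plus L2 regularization on the negative scores."""
    e = np.exp(r_neg - r_neg.max())
    s = e / e.sum()                                  # s_j
    core = -np.log(np.sum(s * sigmoid(r_i - r_neg)) + 1e-24)
    return core + lam * np.sum(r_neg ** 2)
```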
Gradient of pairwise max losses
• BPR-max (w.r.t. $r_i$)
 $\frac{\partial L_i}{\partial r_i} = \frac{\sum_{j=1}^{K} s_j \, \sigma(r_i - r_j) \left( 1 - \sigma(r_i - r_j) \right)}{\sum_{j=1}^{K} s_j \, \sigma(r_i - r_j)}$
 Weighted average of BPR gradients
o Relative importance of samples: $\frac{s_j \sigma(r_i - r_j)}{s_k \sigma(r_i - r_k)} = \frac{e^{r_j} + e^{r_j + r_k - r_i}}{e^{r_k} + e^{r_j + r_k - r_i}}$
o Smoothed softmax
– If $r_i \gg \max(r_j, r_k)$ → behaves like softmax
– Stronger smoothing otherwise
– Uniform → softmax as $r_i$ is pushed to the top
Number of samples: training times & performance
• Training times on GPU don't increase until the
parallelization limit is reached
• Significant accuracy improvements up to a certain point
• Diminishing returns after that point
• Both thresholds fall around the same place
The effect of the α parameter
• Data & loss dependent
 Cross-entropy: favours lower values
 BPR-max: 0.5 is usually a good choice (data dependent)
 There is always a popularity-based part of the samples
o The original mini-batch examples
o Removing these would result in a higher optimal $\alpha$
• CLASS dataset (charts for cross-entropy and BPR-max)
Constrained embeddings & offline tests
Unified item representations in GRU4Rec (1/2)
• Hidden state × $W_y$ → scores
 One vector for each item → "item feature matrix"
• Embedding
 A vector for each item → another "item feature matrix"
 Can be used as the input of the GRU layer instead of the one-hot
vector
 Slightly decreases offline metrics
• Constrained embeddings (unified item representations)
 Use the same matrix for both input and output embedding
 Unified representations → faster convergence
 Embedding size tied to the size of the last hidden state
Unified item representations in GRU4Rec (2/2)
• Model size: largest components scale with the items
 One-hot input: $X$
o $W_x^0, W_r^0, W_z^0, W_y$ → $S_I \times S_H$
o $U_h^0, U_r^0, U_z^0$ → $S_H \times S_H$
 Embedding input: $X/2$
o $E, W_y$ → $S_I \times S_H$
o $W_x^0, W_r^0, W_z^0$ → $S_E \times S_H$
o $U_h^0, U_r^0, U_z^0$ → $S_H \times S_H$
 Constrained embedding: $X/4$
o $W_y$ → $S_I \times S_H$
o $W_x^0, W_r^0, W_z^0, U_h^0, U_r^0, U_z^0$ → $S_H \times S_H$
(Diagrams: three input variants, each ending in scores = dot(H, $W_y$) where H is the GRU layer's hidden state; (1) one-hot ItemID fed directly to the GRU layer; (2) a separate embedding E[ItemIDs] as input; (3) constrained: the output matrix's own rows $W_y$[ItemIDs] as the input embedding. A sketch of the constrained variant follows.)
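A minimal sketch of the weight tying (shapes and the `gru_step_fn` hook are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d = 1000, 100
Wy = rng.normal(scale=0.1, size=(n_items, d))  # the single item matrix

def forward(item_ids, H, gru_step_fn):
    """Constrained embeddings: the same matrix Wy serves as the input
    embedding (row lookup) and as the output weights (dot product)."""
    x = Wy[item_ids]          # input embedding = rows of Wy
    H = gru_step_fn(x, H)     # any GRU update on the hidden state
    scores = H @ Wy.T         # output scores reuse the same matrix
    return H, scores
```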
Offline results
• Over item-kNN
 +25-52% in recall@20
 +35-55% in MRR@20
• Over the original GRU4Rec
 +18-35% in recall@20
 +27-37% in MRR@20
• BPR-max vs. (fixed) cross-entropy
 +2-6% improvement in 2 of 4 cases
 No statistically significant difference in the other 2 cases
• Constrained embedding
 Most cases: slightly worse MRR & better recall
 Huge improvements on the CLASS dataset (+18.74% in recall, +29.44% in MRR)
Online A/B tests
A/B test - video service (1/3)
• Setup
 Video page
 Recommended videos on a strip
 Autoplay functionality
 Recommendations are NOT recomputed if the user clicks on any of
the recommended videos or autoplay loads a new video
 User based A/B split
• Algorithms
 Original algorithm: previous recommendation logic
 GRU4Rec next best: N guesses for the next item
 GRU4Rec sequence: sequence of length N as the continuation of
the current session
o Greedy generation
A/B test - video service (2/3)
• Technical details
 User based A/B split
 GRU4Rec serving from a single GPU using a single thread
o Score computations for ~500K items in 1-2ms (next best)
 Constant retraining
o GRU4Rec: ~1.5 hours on ~30M events (including data collection and
preprocessing)
o Original logic: different parts with different frequency
 GRU4Rec falls back to the original logic if it can’t recommend or times
out
• KPIs (relative to the number of recommendation requests)
 Watch time
 Videos played
 Recommendations clicked
• Bots and power users are excluded from the KPI computations
A/B test - video service (3/3)
A/B test – long term effects (1/3)
• Watch time
(Charts: relative improvement in watchtime/rec for next best and sequence, roughly in the 0.96–1.06 band over time; absolute watchtime/rec for baseline, next best, and sequence.)
A/B test – long term effects (2/3)
• GRU4Rec: strong generalization capabilities
 Finds hidden gems
 Unlike counting based approaches
o Not obvious in offline only testing
• Feedback loop
 The baseline also trains on the feedback generated by the recommendations of the other groups
 Learns how to recommend hidden gems
• GRU4Rec maintains some lead
 New items are constantly uploaded
• Comparison of different countries (chart: increase in watchtime/rec per country)
A/B test – long term effects (3/3)
• Videos played
 Next best and sequence switched places
 Sequence mode: great for episodic content, can suffer
otherwise
 Next best mode: more diverse, better for non-episodic
content
 Feedback loop: next best learns some of the episodic
recommendations
(Chart: relative improvement in videos played/rec for next best and sequence, roughly 1.00–1.06 over time.)
A/B test - online marketplace
• Differences in setup
 On home page
 Next best mode only
 KPI: CTR
• Items have limited lifespan
 Will GRU4Rec keep its 19-20% lead?
(Chart: CTR over time, baseline vs. GRU4Rec.)
Thank you!
Q&A
Check out the code (free for research):
https://github.com/hidasib/GRU4Rec
Read the preprint: https://arxiv.org/abs/1706.03847