Session-Based Recommendations with Recurrent Neural Networks
(Balazs Hidasi, Alexandros Karatzoglou et al.)
Contents
 Backgrounds
 Factor model based approach in recommender system
 Neighborhood approach in recommender system
 Recurrent Neural Networks and GRU(Gated Recurrent Unit)
 RNN in session based recommender system
 Structure of proposed model
 Session based mini-batch
 Ranking Loss
 Experiments and Discussion
 Conclusion
 What makes this paper great?
Backgrounds
Factor model based recommender system
Represent users and items numerically in a latent space.
EX)
 Represent a user U as the vector $u = (0.7, 1.3, -0.5, 0.6)^T$
 Represent an item I as the vector $i = (2.05, 1.2, 2.6, 3.9)^T$
Targets (what we want to predict) are computed from the numerical representations of the user, the item, and other content information.
EX)
 Predicted rating of user U on item I:
$r_{u,i} = \mathrm{dot}(u, i) = u^T i = 0.7 \cdot 2.05 + 1.3 \cdot 1.2 - 0.5 \cdot 2.6 + 0.6 \cdot 3.9 = 4.035$
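As a minimal numeric check of the dot-product prediction above (a plain numpy sketch, not from the paper):

```python
import numpy as np

# Latent vectors for user U and item I, taken from the example above.
u = np.array([0.7, 1.3, -0.5, 0.6])
i = np.array([2.05, 1.2, 2.6, 3.9])

# The predicted rating is the dot product of the two latent vectors.
r_ui = u @ i
print(r_ui)  # ~4.035
```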
Neighborhood based recommender system
 The rating of user u on item I is calculated from how user u's neighbors rated item I
 Determining the neighborhood of a user is important
 Finding similar users for a given user is a big issue
 Targets are calculated as a similarity-weighted, normalized sum of the neighbors' ratings.
EX)
 Predicted rating of user U on item I:
$r_{u,i} = \frac{\sum_{\text{users}} (\text{similarity} \cdot \text{rating})}{\sum_{\text{users}} \text{similarity}} = \frac{0.7 \cdot 3 + 0.3 \cdot 4 + 0.05 \cdot 2}{0.7 + 0.3 + 0.05} \approx 3.24$
User   Similarity with user U   Rating on item I
U      70%                      3
B      30%                      4
C      5%                       2
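The same computation as a small Python sketch, using the numbers from the table above:

```python
# (similarity, rating on item I) for each neighbor in the table.
neighbors = [(0.70, 3), (0.30, 4), (0.05, 2)]

# Similarity-weighted, normalized sum of the neighbors' ratings.
num = sum(sim * rating for sim, rating in neighbors)
den = sum(sim for sim, _ in neighbors)
r_ui = num / den
print(round(r_ui, 2))  # 3.24
```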
Limits of the factor model in session-based recommender systems
The same user in different sessions is treated as a different user.
 Hard to construct a user profile
 Lack of user profiles
Neighborhood based recommender systems still work
 Computing similarities between items is based on co-occurrences of items in sessions (user profiles).
In session-based recommender systems, neighborhood methods are used extensively.
Recurrent Neural Network, GRU
A Recurrent Neural Network is a kind of network that takes sequential input
 text sentences, a series of actions of a user on the web
and produces arbitrary targets (mainly the next element/action in the sequential data)
 sentiment of a given sentence (given a text sentence)
 which page a user will visit next (given actions on the web)
Gated Recurrent Unit (GRU) (Cho et al., 2014)
 Designed, like LSTM, to solve the gradient vanishing/exploding problem in RNNs (the update equations are given below)
 Faster training than LSTM because it has fewer parameters than LSTM
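For reference, the standard GRU update equations from Cho et al. (2014), which the paper also uses ($\sigma$ is the logistic sigmoid, $\odot$ is elementwise multiplication):

```latex
z_t = \sigma(W_z x_t + U_z h_{t-1})               % update gate
r_t = \sigma(W_r x_t + U_r h_{t-1})               % reset gate
\hat{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))  % candidate state
h_t = (1 - z_t) h_{t-1} + z_t \hat{h}_t           % new hidden state
```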
Abstract view of Recurrent Neural Network (1)
 The RNN layer takes two inputs
 The given input
 The hidden state from the previous step (initially zero)
 The input and hidden state determine the state of the RNN layer
 The RNN layer gives two outputs
 The output
 The hidden state for the next step
 An RNN can span arbitrary lengths.
 We can train using only the last output, the whole output sequence, or some of them.
[Figure: an unrolled RNN. Input 1 and the initially-zero hidden state h1 enter the RNN layer, which emits Output 1 and h2; h2 and Input 2 feed the next step, emitting Output 2.]
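A minimal sketch of this unrolling in plain numpy (the weight matrices and sizes are arbitrary illustrations, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 4)) * 0.1   # input -> hidden
W_h = rng.normal(size=(4, 4)) * 0.1   # hidden -> hidden
W_o = rng.normal(size=(4, 4)) * 0.1   # hidden -> output

def rnn_step(x, h):
    """One abstract RNN step: (input, previous hidden) -> (output, next hidden)."""
    h_next = np.tanh(W_x @ x + W_h @ h)
    return W_o @ h_next, h_next

h = np.zeros(4)                        # hidden state is initially zero
outputs = []
for x in [rng.normal(size=4) for _ in range(5)]:  # arbitrary sequence length
    o, h = rnn_step(x, h)              # the hidden state threads through the steps
    outputs.append(o)                  # train on all outputs, only the last, etc.
```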
Abstract view of Recurrent Neural Network (2)
Training RNN (1)
EX)
 $loss = \sum_i (y_i - o_i)^2$
[Figure: an unrolled RNN trained on every output; each step's output $o_i$ is compared with a target $y_i$.]
Abstract view of Recurrent Neural Network (3)
Training RNN (2)
EX)
 $loss = (y - o_k)^2$
[Figure: an unrolled RNN trained on the last output only; $o_1, o_2, \dots, o_{k-1}$ are discarded and only $o_k$ is compared with the target $y$.]
Abstract view of Recurrent Neural Network (4)
[Figure: a deep RNN with three stacked RNN layers; each layer passes its hidden state forward in time and its outputs up to the next layer.]
Gradient vanishing/exploding in deep learning
This assumes familiarity with basic linear algebra.
 Repeated matrix-vector multiplication can be dangerous: $W x_t = x_{t+1}$
 Suppose that $x_0 = v_1 + v_2 + \dots + v_k$, where $v_1, v_2, \dots, v_k$ are eigenvectors of $W$
 This is true in most cases.
$W x_0 = \lambda_1 v_1 + \lambda_2 v_2 + \dots + \lambda_k v_k$, where $\lambda_j$ is the eigenvalue of $W$ for $v_j$
$x_n = W^n x_0 = \lambda_1^n v_1 + \lambda_2^n v_2 + \dots + \lambda_k^n v_k$
 If the largest |eigenvalue| > 1, $x_n$ goes to infinity
 If the largest |eigenvalue| < 1, $x_n$ goes to zero
 In both cases, training becomes infeasible
 LSTM, GRU and other RNN variants are designed to solve this problem while preserving long-term dependencies.
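A quick numerical illustration of both regimes (a numpy sketch):

```python
import numpy as np

x = np.ones(2)
for W, label in [(np.diag([1.1, 0.5]), "largest eigenvalue 1.1"),
                 (np.diag([0.9, 0.5]), "largest eigenvalue 0.9")]:
    v = x.copy()
    for _ in range(100):   # 100 repeated multiplications: W @ ... @ W @ x
        v = W @ v
    print(label, "->", np.linalg.norm(v))

# largest eigenvalue 1.1 -> ~1.4e4  (explodes)
# largest eigenvalue 0.9 -> ~2.7e-5 (vanishes)
```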
RNN in session-based recommender system
Structure of proposed model
 Input: the item sequence of a user
 the list of items the user has seen
 $i_{1,t_1}, i_{1,t_2}, \dots, i_{1,t_k}$
 Output:
 (the probability distribution over) the items the user will see
 $p_{1,t_2}, p_{1,t_3}, \dots, p_{1,t_{k+1}}$
 $i_{1,t_1}$ is an item id
 $p_{1,t_2}$ is the probability distribution over the item user 1 will see at time $t_2$
Ex) If $p_{1,t_2} = (0.2, 0.3, 0.1, 0.1, 0.3)$ over items 1 to 5, then the probability of seeing item 1 is 0.2, item 2 is 0.3, item 3 is 0.1, and so on.
[Figure: model architecture. One-hot encoded input vector -> embedding layer -> stacked GRU layers -> feedforward layer -> output scores on items.]
Structure of proposed model
One-hot vector
 The input vector has length equal to the number of items, and only the element corresponding to the active item is one.
Embedding
 Assigns a trainable vector to every item.
 In the experiments, the model with embedding performs worse (a sketch of the architecture follows below).
[Figure: the same model architecture as on the previous slide.]
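A minimal sketch of this architecture in PyTorch. The class and parameter names are my own, and the paper's reference implementation is Theano-based; this is an illustration, not the authors' code:

```python
import torch
import torch.nn as nn

class GRU4Rec(nn.Module):
    """Sketch: item id -> embedding -> stacked GRUs -> feedforward -> item scores."""
    def __init__(self, n_items, hidden_size=100, n_layers=1):
        super().__init__()
        self.embedding = nn.Embedding(n_items, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, n_items)  # feedforward layer -> scores

    def forward(self, item_ids, hidden=None):
        x = self.embedding(item_ids)     # (batch, seq) -> (batch, seq, hidden)
        x, hidden = self.gru(x, hidden)  # hidden carries the session state
        return self.out(x), hidden       # scores over all items for the next step
```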
Training the model (1)
Training the RNN using mini-batches.
EX)
 $loss = \frac{1}{3} \sum_{i} (o_2^{(i)} - y_2^{(i)})$, averaged over the three sessions in the mini-batch
[Figure: three sessions processed in parallel through the same RNN layer; each slot carries its own $h_1 \to h_2$, and each $o_2$ is compared with its $y_2$.]
Training the model (2)
 For simplicity, write the model as
$f(\text{input}, \text{hidden state}) = (\text{output}, \text{next hidden state})$
$f(i_{0,1}, h_1 = 0) = (o_{0,2}, h_2)$; update the parameters with $o_{0,2}$ and $i_{0,2}$,
...
$f(i_{0,k}, h_k) = (o_{0,k+1}, h_{k+1})$; update the parameters with $o_{0,k+1}$ and $i_{0,k+1}$.
As long as we remember only $h_k$ for each session, many sessions can be updated in parallel.
Training the model (3)
Designed to train sessions of various lengths in parallel,
without breaking sessions down into fragments (a sketch of this batching follows below).
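A sketch of session-parallel mini-batching as I understand it from the description above. The function name and the `reset` mask are my own naming; it assumes at least `batch_size` sessions, each a list of at least two item ids, and omits details such as output masking:

```python
def session_parallel_batches(sessions, batch_size):
    """Yield (inputs, targets, reset) one time step at a time.

    Each of the batch_size slots holds one session; when a session runs out
    of events, the next unstarted session takes its slot and reset marks
    that slot's hidden state to be zeroed before the next step.
    """
    next_sess = batch_size            # index of the next session to start
    slots = list(range(batch_size))   # which session each slot currently holds
    pos = [0] * batch_size            # current position within each session
    while True:
        reset = [False] * batch_size
        for b in range(batch_size):
            while pos[b] + 1 >= len(sessions[slots[b]]):  # session finished
                if next_sess >= len(sessions):
                    return            # simplification: stop when sessions run out
                slots[b], pos[b] = next_sess, 0
                next_sess += 1
                reset[b] = True       # zero this slot's hidden state
        inputs  = [sessions[slots[b]][pos[b]]     for b in range(batch_size)]
        targets = [sessions[slots[b]][pos[b] + 1] for b in range(batch_size)]
        yield inputs, targets, reset
        for b in range(batch_size):
            pos[b] += 1
```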
Ranking Loss
Idea:
 explicitly force the score of the positive sample to be higher than the scores of the negative samples.
Ordinary goal
 $u$ = a user
 $y_{u,i}$ = probability of user $u$ liking or seeing item $i$, $y_{u,i} \in [0,1]$
 maximize $\sum_i y_{u,i}$, $i \in$ items user $u$ has seen
 minimize $\sum_j y_{u,j}$, $j \in$ items user $u$ has not seen
Goal of ranking loss
 maximize $\sum_i \sum_j (y_{u,i} - y_{u,j})$
 $i \in$ items user $u$ has seen, $j \in$ items user $u$ has not seen
Ranking loss functions used in the model
Bayesian Personalized Ranking loss (BPR)
$N_S :=$ a sample of the items the user has not seen
$r_{s,k} :=$ predicted score that session $s$ sees item $k$, between 0 and 1
$i :=$ the item the user actually saw next
$\sigma(x) = \frac{1}{1 + \exp(-x)}$
$loss = -\frac{1}{|N_S|} \sum_{j \in N_S} \log \sigma(r_{s,i} - r_{s,j})$

Proposed loss function (TOP1)
With the same definitions as for BPR:
$loss = \frac{1}{|N_S|} \sum_{j \in N_S} \left( \sigma(r_{s,j} - r_{s,i}) + \sigma(r_{s,j}^2) \right)$
(The second term regularizes the scores of the negative items toward zero; note TOP1 has no leading minus sign, since both terms are already quantities to be minimized.)
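A sketch of the two losses in PyTorch, following the formulas above. `r_pos` (the score of the item actually seen next) and `r_neg` (the scores of the $|N_S|$ sampled negative items) are my own names:

```python
import torch

def bpr_loss(r_pos, r_neg):
    """BPR: -mean(log sigmoid(r_{s,i} - r_{s,j})) over sampled negatives j."""
    return -torch.log(torch.sigmoid(r_pos - r_neg)).mean()

def top1_loss(r_pos, r_neg):
    """TOP1: mean(sigmoid(r_{s,j} - r_{s,i}) + sigmoid(r_{s,j}^2))."""
    return (torch.sigmoid(r_neg - r_pos) + torch.sigmoid(r_neg ** 2)).mean()
```

In the paper, the negative samples for a session are taken from the items appearing at the other positions of the same mini-batch, so no separate sampling pass is needed.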
Experiments and Discussion
Datasets

Dataset      RecSys 2015 (RSC15)   OTT video (VIDEO)
# sessions   15,324                ~37k
# items      37,483                ~330k
# clicks     71,222                ~180k
Preprocessing
 Remove items from the test set that do not appear in the train set
 Remove sessions of length 1
 Do not split a session's sequence between the train set and the test set (a sketch of this filtering follows below)
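A rough sketch of this filtering (my own helper; sessions are lists of item ids):

```python
def preprocess(train, test):
    """Apply the filtering above: drop unseen items from test, then short sessions."""
    train_items = {item for s in train for item in s}
    # Remove items from the test set that never appear in the train set.
    test = [[item for item in s if item in train_items] for s in test]
    # Remove sessions of length 1: there is nothing to predict from one click.
    train = [s for s in train if len(s) > 1]
    test = [s for s in test if len(s) > 1]
    return train, test
```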
Evaluation measure
Recall@k
 I want this answer: Cat
 But the computer says my answer might be one of [Chicken, Dog, Horse]
 Then Recall@3 = 0
 I want this answer: Pizza
 But the computer says my answer might be one of [Chicken, Dog, Pizza]
 Then Recall@3 = 1
 The computer has candidates, sorts them, and takes the top k; Recall@k is 1 if the answer is among them and 0 otherwise.
MRR@k
 The computer tries to guess my hair color. It has 3 chances and says Red (1st), Black (2nd), Yellow (3rd).
 My hair color is black.
 MRR@3 = 1 / 2 = 0.5
 The computer has candidates, sorts them, takes the top k, and reports the reciprocal of the answer's rank.
 If the answer is outside the top k, MRR@k = 0
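Both measures are easy to state in code (a sketch; `ranked` is the model's candidate list sorted by score, for a single query):

```python
def recall_at_k(ranked, answer, k):
    """1 if the answer appears among the top-k ranked candidates, else 0."""
    return 1.0 if answer in ranked[:k] else 0.0

def mrr_at_k(ranked, answer, k):
    """Reciprocal rank of the answer within the top k; 0 if it is outside."""
    return 1.0 / (ranked.index(answer) + 1) if answer in ranked[:k] else 0.0

# The hair-color example above: guesses [Red, Black, Yellow], answer Black.
print(mrr_at_k(["Red", "Black", "Yellow"], "Black", 3))  # 0.5
```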
Recall@20 and MRR@20 using baseline methods

Baseline    RSC15                 VIDEO
            Recall@20   MRR@20    Recall@20   MRR@20
POP         0.0050      0.0012    0.0499      0.0117
S-POP       0.2672      0.1775    0.1301      0.0863
Item-KNN    0.5065      0.2048    0.5598      0.3381
BPR-MF      0.2574      0.0618    0.0692      0.0374
Recall@20 and MRR@20 for different types of single-layer GRU

(# Units is the length of the hidden state vector $h_i$.)

Loss function    # Units   RSC15                 VIDEO
                           Recall@20   MRR@20    Recall@20   MRR@20
TOP1             100       0.5853      0.2305    0.6141      0.3511
BPR              100       0.6069      0.2407    0.5999      0.3260
Cross-Entropy    100       0.6074      0.2430    0.6372      0.3720
TOP1             1000      0.6206      0.2693    0.6624      0.3891
BPR              1000      0.6322      0.2467    0.6311      0.3136
Cross-Entropy    1000      0.5777      0.2153    N/A         N/A
Discussion
A larger hidden state (more units) gives better performance.
 at 100 < at 1000 < at 10^4
Pointwise loss is unstable
 (It is unclear to me whether this means numerically unstable, i.e. overflow or underflow, or that the results are inconsistent.)
Deeper GRU layers improve performance.
Embedding is not good for this model.
What makes this paper great?
A new parallel training method for training RNNs (in recommender systems)
Devises a new ranking loss (other models could exploit this loss function)
Performance improvement
 20-25% improvement over the best baseline, Item-KNN
Designs a novel model framework that solves the session-based recommender system problem using an RNN
More related content

What's hot?

Adversarial Reinforced Learning for Unsupervised Domain Adaptation (taeseon ryu)
Machine Learning: Introduction to Neural Networks (Francesco Collova')

What's hot? (20)

Artificial Intelligence, Machine Learning and Deep Learning
Talk@rmit 09112017
Deep Learning for Personalized Search and Recommender Systems
Machine Learning Lecture 2 Basics
Overview of TensorFlow For Natural Language Processing
Neural Learning to Rank
Prediction of Exchange Rate Using Deep Neural Network
An overview of Hidden Markov Models (HMM)
Generative Adversarial Networks : Basic architecture and variants
Introduction To Applied Machine Learning
Week 3 Deep Learning And POS Tagging Hands-On
What Deep Learning Means for Artificial Intelligence
Artificial Intelligence Course: Linear models
Adversarial Reinforced Learning for Unsupervised Domain Adaptation
Ml10 dimensionality reduction-and_advanced_topics
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Machine Learning: Introduction to Neural Networks
Neural Learning to Rank
Workshop - Introduction to Machine Learning with R
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Similar to Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, Alexandros Karatzoglou et al)

DagdelenSiriwardaneY.. (butest)

Similar to Session-Based Recommendations with Recurrent Neural Networks (Balazs Hidasi, Alexandros Karatzoglou et al) (20)

Introduction to Deep Learning and Tensorflow
Deep Learning for Search
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
Deep Learning: R with Keras and TensorFlow
Methodological study of opinion mining and sentiment analysis techniques
Lesson_8_DeepLearning.pdf
Neural Learning to Rank
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
Deep Learning and TensorFlow
Deep Learning, Keras, and TensorFlow
Introduction to Deep Learning
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
New Approach of Preprocessing For Numeral Recognition
Gan seminar
Deep learning from a novice perspective
Android and Deep Learning
Illustrative Introductory Neural Networks
Diving into Deep Learning (Silicon Valley Code Camp 2017)
DagdelenSiriwardaneY..
Scala and Deep Learning