2. Table of Contents
Backgrounds
Factor model based approach in recommender systems
Neighborhood approach in recommender systems
Recurrent Neural Networks and GRU (Gated Recurrent Unit)
RNN in session-based recommender systems
Structure of proposed model
Session-based mini-batch
Ranking Loss
Experiments and Discussion
Conclusion
What makes this paper great?
4. Factor model based recommender system
Represent users and items numerically in a latent space.
Ex)
Represent a user U as the vector $u = (0.7, 1.3, -0.5, 0.6)^T$
Represent an item I as the vector $i = (2.05, 1.2, 2.6, 3.9)^T$
Targets (what we want to predict) are computed from the numerical representations of users, items, and other content information.
Ex)
Predicted rating of user U on item I:
$\hat{r}_{u,i} = \mathrm{dot}(u, i) = u^T i = 0.7 \cdot 2.05 + 1.3 \cdot 1.2 - 0.5 \cdot 2.6 + 0.6 \cdot 3.9 = 4.035$
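A minimal NumPy sketch of the slide's example:

```python
import numpy as np

# Latent vectors for user U and item I (the slide's example values).
u = np.array([0.7, 1.3, -0.5, 0.6])
i = np.array([2.05, 1.2, 2.6, 3.9])

# The predicted rating is the inner product of the two latent vectors.
r_ui = u @ i
print(r_ui)  # 4.035
```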
5. Neighborhood based recommender system
The rating of user u on item I is estimated from how user u's neighbors rated item I.
Determining the neighborhood of a user is important;
finding similar users for a given user is a big issue.
Targets are calculated as a weighted, normalized sum of the neighbors' ratings, where similarity is used as the weight.
Ex)
Predicted rating of user U on item I:
$\hat{r}_{u,i} = \frac{\sum_{v \in \text{neighbors}} \text{sim}(U, v) \cdot r_{v,I}}{\sum_{v \in \text{neighbors}} \text{sim}(U, v)} = \frac{0.7 \cdot 3 + 0.3 \cdot 4 + 0.05 \cdot 2}{0.7 + 0.3 + 0.05} = 3.24$
User   Similarity with user U   Rating on item I
A      70%                      3
B      30%                      4
C      5%                       2
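A minimal NumPy sketch of the same calculation, using the table above:

```python
import numpy as np

# Similarities of neighbors A, B, C with user U, and their ratings on item I.
similarity = np.array([0.70, 0.30, 0.05])
rating = np.array([3, 4, 2])

# Similarity-weighted, normalized sum of the neighbors' ratings.
r_ui = (similarity * rating).sum() / similarity.sum()
print(round(r_ui, 2))  # 3.24
```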
6. Limits of factor models in session-based recommender systems
The same user in different sessions is treated as a different user,
so it is hard to construct a user profile.
Despite the lack of user profiles, neighborhood-based recommender systems still work:
similarities between items are computed from co-occurrences of items in sessions (which play the role of user profiles).
In session-based recommender systems, neighborhood methods are therefore used extensively.
7. Recurrent Neural Networks and GRU
A Recurrent Neural Network is a kind of network that takes
sequential input
(text sentences, a series of actions of a user on the web)
and is trained toward arbitrary targets (mainly the next element/action in the sequential data):
the sentiment of a given sentence (given a text sentence)
which page a user will visit next (given actions on the web)
Gated Recurrent Unit (GRU) (Cho et al., 2014)
Designed, like LSTM, to solve the gradient vanishing/exploding problem in RNNs
Trains faster than LSTM because it has fewer parameters (a comparison sketch follows)
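A quick way to check the parameter-count claim with PyTorch (the layer sizes are arbitrary example values):

```python
import torch.nn as nn

def num_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

# Same input and hidden sizes for both recurrent layers.
gru = nn.GRU(input_size=100, hidden_size=100)
lstm = nn.LSTM(input_size=100, hidden_size=100)

# A GRU holds 3 weight matrices per connection, an LSTM 4, so the GRU
# is roughly 3/4 the size and correspondingly faster to train.
print(num_params(gru))   # 60600
print(num_params(lstm))  # 80800
```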
8. Abstract view of Recurrent Neural Networks (1)
An RNN layer takes two inputs:
the given input
the hidden state from the previous step (initially zero)
The input and the hidden state determine the state of the RNN layer.
An RNN layer gives two outputs:
the output
the hidden state passed to the next step
An RNN can span an arbitrary length.
We can train using only the last output, the whole output sequence, or some subset of them (a sketch follows after the figure).
[Figure: RNN unrolled over two steps — Input 1 → RNN layer (hidden state h1, initially zero) → Output 1; h1 feeds the next step: Input 2 → RNN layer (h2) → Output 2]
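A minimal sketch of the unrolled computation in the figure, using a plain PyTorch RNN cell (sizes are arbitrary; for a plain RNN cell the output coincides with the new hidden state):

```python
import torch
import torch.nn as nn

rnn_cell = nn.RNNCell(input_size=8, hidden_size=16)

inputs = [torch.randn(1, 8) for _ in range(2)]  # Input 1, Input 2
h = torch.zeros(1, 16)                          # hidden state, initially zero

outputs = []
for x in inputs:
    # Each step takes the current input and the previous hidden state,
    # and produces the hidden state that is passed to the next step.
    h = rnn_cell(x, h)
    outputs.append(h)  # train on the last output, all of them, or a subset
```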
12. Gradient vanishing/exploding in deep learning
(We presume familiarity with basic linear algebra.)
Repeated matrix-vector multiplication can be dangerous: $W x_t = x_{t+1}$.
Suppose that $x_0 = v_1 + v_2 + \cdots + v_k$, where $v_1, v_2, \ldots, v_k$ are eigenvectors of $W$ (this is true in most cases).
Then $W x_0 = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_k v_k$, where $\lambda_j$ is the eigenvalue of $W$ for $v_j$, and after $n$ steps
$x_n = W^n x_0 = \lambda_1^n v_1 + \lambda_2^n v_2 + \cdots + \lambda_k^n v_k$.
If the largest eigenvalue is greater than 1 in magnitude, $x_n$ goes to infinity.
If the largest eigenvalue is less than 1 in magnitude, $x_n$ goes to zero.
In both cases, training becomes infeasible.
LSTM, GRU, and other RNN variants are designed to solve this problem while preserving long-term dependencies.
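A small NumPy demonstration of both regimes (the matrix and initial vector are random; the rescaling just fixes the largest eigenvalue magnitude):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=10)

for scale in (1.1, 0.9):
    # Random matrix rescaled so its largest eigenvalue magnitude is `scale`.
    W = rng.normal(size=(10, 10))
    W *= scale / np.abs(np.linalg.eigvals(W)).max()

    x = v.copy()
    for _ in range(100):
        x = W @ x  # repeated matrix-vector multiplication
    print(scale, np.linalg.norm(x))  # blows up for 1.1, vanishes for 0.9
```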
14. Structure of proposed model
Input: the sequence of a user,
i.e., the list of items the user has seen:
$i_{1,t_1}, i_{1,t_2}, \ldots, i_{1,t_k}$
Output:
(the probability distributions over) the items the user will see next:
$p_{1,t_2}, p_{1,t_3}, \ldots, p_{1,t_{k+1}}$
$i_{1,t_1}$ is an item id.
$p_{1,t_2}$ is the probability distribution over the items user 1 will see at time $t_2$.
Ex)
Over items $(1, 2, 3, 4, 5)$, if $p_{1,t_2} = (0.2, 0.3, 0.1, 0.1, 0.3)$,
then the probability of seeing item 1 is 0.2, item 2 is 0.3, item 3 is 0.1, and so on.
[Figure: model pipeline — Input: one-hot encoded vector → Embedding layer → GRU layer → GRU layer → ... → GRU layer → Feedforward layer → Output: scores on items]
15. Structure of proposed model
One-hot vector:
an input vector whose length equals the number of items, in which only the element corresponding to the active item is one.
Embedding:
assign a trainable vector to every item.
(In this paper's experiments, the model with embedding performs worse; a sketch of the pipeline follows.)
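A minimal PyTorch sketch of this pipeline; class and variable names are mine, and the sketch uses the embedding variant purely for brevity, even though the paper finds one-hot input to work better:

```python
import torch
import torch.nn as nn

class SessionGRU(nn.Module):
    """GRU4Rec-style network: item id -> embedding -> stacked GRU ->
    feedforward layer -> a score for every item."""
    def __init__(self, n_items: int, emb_dim: int = 100,
                 hidden: int = 100, layers: int = 1):
        super().__init__()
        self.embedding = nn.Embedding(n_items, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, n_items)

    def forward(self, item_ids, h=None):
        # item_ids: (batch, seq_len) clicked-item ids of each session
        x = self.embedding(item_ids)
        x, h = self.gru(x, h)   # hidden state carries the session so far
        return self.out(x), h   # scores on all items at every step

model = SessionGRU(n_items=37483)
scores, h = model(torch.tensor([[3, 17, 256]]))  # one session of 3 clicks
probs = scores.softmax(dim=-1)  # distribution over the next item at each step
```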
22. Datasets
Dataset     RecSys 2015 (RSC15)   OTT video (VIDEO)
# sessions  15,324                ~37k
# items     37,483                ~330k
# clicks    71,222                ~180k
Preprocessing
Remove items from the test set that do not appear in the training set.
Remove sessions of length 1.
Do not split a session's click sequence between the training set and the test set (a sketch follows).
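A hedged pandas sketch of these steps; the column names `session_id` and `item_id` are assumptions, not the datasets' actual schema:

```python
import pandas as pd

def drop_short_sessions(df: pd.DataFrame) -> pd.DataFrame:
    # Remove sessions of length 1: they contain no next-item target.
    lengths = df.groupby("session_id")["item_id"].transform("size")
    return df[lengths > 1]

def preprocess(train: pd.DataFrame, test: pd.DataFrame):
    # Remove test clicks on items that never appear in the training set.
    test = test[test["item_id"].isin(train["item_id"])]
    # Sessions stay whole: the split is by session, so no click sequence
    # is divided between the training set and the test set.
    return drop_short_sessions(train), drop_short_sessions(test)
```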
23. Evaluation measures
Recall@k
I want this answer: Cat.
But the computer says my answer might be one of [Chicken, Dog, Horse].
Then Recall@3 = 0.
I want this answer: Pizza.
But the computer says my answer might be one of [Chicken, Dog, Pizza].
Then Recall@3 = 1.
The computer holds a set of candidates; it sorts them, takes the top k, and scores 1 if the correct answer is among them, 0 otherwise.
MRR@k
The computer tries to guess my hair color. It has 3 chances and says Red (1st), Black (2nd), Yellow (3rd).
My hair color is black, so MRR@3 = 1/2 = 0.5.
The computer holds a set of candidates; it sorts them, takes the top k, and the score is the reciprocal of the rank of the correct answer.
If the correct answer is outside the top k, MRR@k = 0. (Both measures are sketched below.)
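A minimal sketch of both measures (function names are mine), reproducing the slide's examples:

```python
def recall_at_k(ranked_candidates, answer, k=20):
    # 1 if the answer is among the top-k candidates, else 0.
    return int(answer in ranked_candidates[:k])

def mrr_at_k(ranked_candidates, answer, k=20):
    # Reciprocal rank of the answer within the top-k, 0 if it is outside.
    top_k = ranked_candidates[:k]
    return 1.0 / (top_k.index(answer) + 1) if answer in top_k else 0.0

print(recall_at_k(["chicken", "dog", "horse"], "cat", k=3))    # 0
print(recall_at_k(["chicken", "dog", "pizza"], "pizza", k=3))  # 1
print(mrr_at_k(["red", "black", "yellow"], "black", k=3))      # 0.5
```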
24. Recall@20 and MRR@20 using baseline methods
Baseline    RSC15                  VIDEO
            Recall@20   MRR@20     Recall@20   MRR@20
POP         0.0050      0.0012     0.0499      0.0117
S-POP       0.2672      0.1775     0.1301      0.0863
Item-KNN    0.5065      0.2048     0.5598      0.3381
BPR-MF      0.2574      0.0618     0.0692      0.0374
25. Recall@20 and MRR@20 for different types of single-layer GRU
(# Units is the length of the hidden state vector $h_i$.)
Loss function   # Units   RSC15                  VIDEO
                          Recall@20   MRR@20     Recall@20   MRR@20
TOP1            100       0.5853      0.2305     0.6141      0.3511
BPR             100       0.6069      0.2407     0.5999      0.3260
Cross-Entropy   100       0.6074      0.2430     0.6372      0.3720
TOP1            1000      0.6206      0.2693     0.6624      0.3891
BPR             1000      0.6322      0.2467     0.6311      0.3136
Cross-Entropy   1000      0.5777      0.2153     N/A         N/A
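For reference, the two ranking losses in the table, sketched in PyTorch from their definitions in the paper; the names r_pos (score of the actual next item) and r_neg (scores of sampled negative items) are mine, and Cross-Entropy is the usual softmax cross-entropy over item scores:

```python
import torch

def bpr_loss(r_pos: torch.Tensor, r_neg: torch.Tensor) -> torch.Tensor:
    # BPR: -log sigmoid(positive score - negative score),
    # averaged over the sampled negative items.
    return -torch.log(torch.sigmoid(r_pos.unsqueeze(1) - r_neg)).mean()

def top1_loss(r_pos: torch.Tensor, r_neg: torch.Tensor) -> torch.Tensor:
    # TOP1: push each negative score below the positive score, plus a
    # regularization term pulling the negative scores toward zero.
    rank = torch.sigmoid(r_neg - r_pos.unsqueeze(1))
    reg = torch.sigmoid(r_neg ** 2)
    return (rank + reg).mean()

# r_pos: (batch,) score of the actual next item of each session;
# r_neg: (batch, n_neg) scores of the sampled negative items.
```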
26. Discussion
A larger hidden state (more units) gives better performance:
at 100 < at 1000 < at 10^4.
Pointwise loss is unstable.
(I am not sure whether "unstable" means numerically unstable, i.e., overflow or underflow, or that the results are inconsistent.)
Deeper GRU layers improve performance.
Embedding does not help this model.
27. What makes this paper great?
A new parallel training method for RNNs (in recommender systems).
New ranking losses (I think other models can exploit these loss functions).
Performance improvement:
20-25% improvement over the best baseline, Item-KNN.
A novel model framework that solves the session-based recommendation problem with an RNN.