Slides of the presentation given at ECIR 2016 for the following paper:
Daniel Valcarce, Javier Parapar, Alvaro Barreiro: Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recommendation. ECIR 2016: 602-613
http://dx.doi.org/10.1007/978-3-319-30671-1_44
1. ECIR 2016, PADUA, ITALY
EFFICIENT PSEUDO-RELEVANCE FEEDBACK
METHODS FOR COLLABORATIVE FILTERING
RECOMMENDATION
Daniel Valcarce, Javier Parapar, Álvaro Barreiro
@dvalcarce @jparapar @AlvaroBarreiroG
Information Retrieval Lab
@IRLab_UDC
University of A Coruña
Spain
2. Outline
1. Pseudo-Relevance Feedback (PRF)
2. Collaborative Filtering (CF)
3. PRF Methods for CF
4. Experiments
5. Conclusions and Future Work
4. Pseudo-Relevance Feedback (I)
Pseudo-Relevance Feedback provides an automatic method for
query expansion:
Assumes that the top documents retrieved with the
original query are relevant (the pseudo-relevant set).
The query is expanded with the most representative terms
from this set.
The expanded query is expected to yield better results than
the original one.
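The steps above can be sketched as follows (a minimal illustration; the function name, term-frequency scoring and parameters are my own choices, not from the slides):

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, k=10, n_terms=5):
    """Pseudo-relevance feedback: assume the top-k retrieved documents
    are relevant and expand the query with their most frequent terms."""
    pseudo_relevant = ranked_docs[:k]      # pseudo-relevant set
    term_counts = Counter()
    for doc in pseudo_relevant:            # each doc is a list of terms
        term_counts.update(doc)
    expansion = [t for t, _ in term_counts.most_common()
                 if t not in query_terms][:n_terms]
    return query_terms + expansion
```

In practice the expansion terms would be scored with a term-weighting model rather than raw frequency, but the structure is the same.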
15. Recommender Systems
Notation:
The set of users U
The set of items I
The rating that user u gave to item i is r_{u,i}
The set of items rated by user u is denoted by I_u
The set of users that rated item i is denoted by U_i
The neighbourhood of user u is denoted by V_u
Top-N recommendation: create a ranked list containing
relevant and unknown items for each user u ∈ U.
16. Collaborative Filtering (I)
Collaborative Filtering (CF) employs the past interaction
between users and items to generate recommendations.
Idea: if a user who is similar to you likes an item, maybe you
will also like it.
Different input data:
Explicit feedback: ratings, reviews...
Implicit feedback: clicks, purchases...
It is perhaps the most popular approach to recommendation,
given the increasing amount of available information about users.
17. Collaborative Filtering (II)
Collaborative Filtering (CF) techniques can be classified in:
Model-based methods: learn a predictive model from the
user-item ratings.
- Matrix factorisation (e.g., SVD)
Neighbourhood-based (or memory-based) methods:
compute recommendations directly from a part of the
ratings.
- k-NN approaches
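A minimal user-based k-NN sketch of the neighbourhood-based family (the cosine similarity choice, function names and dict-based rating layout are illustrative assumptions of mine, not from the slides):

```python
import math

def cosine(ra, rb):
    """Cosine similarity between two users' rating dicts {item: rating}."""
    common = set(ra) & set(rb)
    num = sum(ra[i] * rb[i] for i in common)
    den = (math.sqrt(sum(v * v for v in ra.values()))
           * math.sqrt(sum(v * v for v in rb.values())))
    return num / den if den else 0.0

def knn_recommend(user, ratings, k=2, n=3):
    """Memory-based CF: score unseen items using the ratings of the
    k most similar users (the neighbourhood)."""
    sims = sorted(((cosine(ratings[user], ratings[v]), v)
                   for v in ratings if v != user), reverse=True)[:k]
    scores = {}
    for s, v in sims:
        for i, r in ratings[v].items():
            if i not in ratings[user]:        # only unknown items
                scores[i] = scores.get(i, 0.0) + s * r
    return sorted(scores, key=scores.get, reverse=True)[:n]
```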
21. Previous Work on Adapting PRF Methods to CF
Relevance-Based Language Models
Originally devised for PRF (Lavrenko & Croft, SIGIR 2001).
Adapted to CF (Parapar et al., Inf. Process. Manage. 2013).
Two models: RM1 and RM2.
High precision figures in recommendation.
... but high computational cost!

RM1: p(i|R_u) \propto \sum_{v \in V_u} p(v)\, p(i|v) \prod_{j \in I_u} p(j|v)

RM2: p(i|R_u) \propto p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(i|v)\, p(v)}{p(i)}\, p(j|v)
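A sketch of how RM2 could be computed, assuming maximum-likelihood p(i|v) and uniform priors p(v) and p(i) (all implementation choices here are mine, not from the slides). Note the sum over neighbours nested inside a product over the user's items, which is the source of the high cost:

```python
def rm2_scores(user_items, neighbours, ratings, all_items):
    """RM2 adapted to CF (sketch): score each candidate item i by
    p(i) * prod_{j in I_u} sum_{v in V_u} p(i|v) * p(v)/p(i) * p(j|v)."""
    def p_item_given_user(i, v):
        # Maximum-likelihood p(i|v) over v's rating mass.
        total = sum(ratings[v].values())
        return ratings[v].get(i, 0.0) / total if total else 0.0

    p_v = 1.0 / len(neighbours)    # uniform neighbour prior
    p_i = 1.0 / len(all_items)     # uniform item prior
    scores = {}
    for i in all_items:
        if i in user_items:        # recommend unknown items only
            continue
        score = p_i
        for j in user_items:
            score *= sum(p_item_given_user(i, v) * p_v / p_i
                         * p_item_given_user(j, v) for v in neighbours)
        scores[i] = score
    return scores
```

A real implementation would work in log space and smooth the estimates; the point of the sketch is the |I_u| x |V_u| work per candidate item.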
22. Our Proposals based on Rocchio's Framework
Rocchio's Weights:
p_{Rocchio}(i|u) = \frac{\sum_{v \in V_u} r_{v,i}}{|V_u|}

Robertson Selection Value:
p_{RSV}(i|u) = \frac{\sum_{v \in V_u} r_{v,i}}{|V_u|}\, p(i|V_u)

CHI-2:
p_{CHI-2}(i|u) = \frac{\left( p(i|V_u) - p(i|C) \right)^2}{p(i|C)}

Kullback-Leibler Divergence:
p_{KLD}(i|u) = p(i|V_u) \log \frac{p(i|V_u)}{p(i|C)}
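The four weighting functions can be sketched directly (the function names, the dict-based p(i|V_u) and p(i|C) inputs, and the rating layout are my own illustrative choices):

```python
import math

def p_rocchio(i, neighbours, ratings):
    """Rocchio's weights: mean rating of item i over the neighbourhood."""
    return sum(ratings[v].get(i, 0) for v in neighbours) / len(neighbours)

def p_rsv(i, neighbours, ratings, p_vu):
    """Robertson Selection Value: Rocchio's weight times p(i|Vu)."""
    return p_rocchio(i, neighbours, ratings) * p_vu[i]

def p_chi2(i, p_vu, p_c):
    """CHI-2: squared deviation of the neighbourhood model p(i|Vu)
    from the collection model p(i|C), normalised by p(i|C)."""
    return (p_vu[i] - p_c[i]) ** 2 / p_c[i]

def p_kld(i, p_vu, p_c):
    """Kullback-Leibler divergence contribution of item i."""
    return p_vu[i] * math.log(p_vu[i] / p_c[i]) if p_vu[i] > 0 else 0.0
```

Each function scores a single candidate item; ranking the user's unknown items by any of them yields the recommendation list.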
25. Probability Estimation
Maximum Likelihood Estimate under a Multinomial
Distribution over the ratings:

p_{mle}(i|V_u) = \frac{\sum_{v \in V_u} r_{v,i}}{\sum_{v \in V_u,\, j \in I} r_{v,j}}

p_{mle}(i|C) = \frac{\sum_{u \in U} r_{u,i}}{\sum_{u \in U,\, j \in I} r_{u,j}}
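A sketch of these two estimates (the dict-of-dicts rating layout and function name are assumptions of mine, not from the slides):

```python
def mle_models(neighbours, all_users, ratings):
    """Maximum-likelihood estimates of the neighbourhood model p(i|Vu)
    and the collection model p(i|C) from the rating sums."""
    def model(users):
        totals = {}
        for u in users:
            for i, r in ratings[u].items():
                totals[i] = totals.get(i, 0) + r   # numerator: sum of ratings of i
        mass = sum(totals.values())                # denominator: total rating mass
        return {i: t / mass for i, t in totals.items()}
    return model(neighbours), model(all_users)
```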
27. Neighbourhood Length Normalisation (I)
Neighbourhoods are computed using clustering algorithms:
Hard clustering: every user is in exactly one cluster. Clusters
may have different sizes. Example: k-means.
Soft clustering: each user has its own neighbours. When
we set k to a high value, we may find different numbers of
neighbours. Example: k-NN.
Idea: take the variability of neighbourhood lengths into account:
A big neighbourhood is equivalent to a query with many
results: the collection model is close to the target user.
A small neighbourhood implies that its neighbours are highly
specific: the collection is very different from the target user.
28. Neighbourhood Length Normalisation (II)
We bias the MLE to perform neighbourhood length
normalisation:

p_{nmle}(i|V_u) \overset{rank}{\propto} \frac{1}{|V_u|} \cdot \frac{\sum_{v \in V_u} r_{v,i}}{\sum_{v \in V_u,\, j \in I} r_{v,j}}

p_{nmle}(i|C) \overset{rank}{\propto} \frac{1}{|U|} \cdot \frac{\sum_{u \in U} r_{u,i}}{\sum_{u \in U,\, j \in I} r_{u,j}}
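The biased estimates differ from the plain MLE only by the 1/|V_u| and 1/|U| factors, as in this sketch (layout and names are my own illustrative assumptions):

```python
def nmle_models(neighbours, all_users, ratings):
    """Neighbourhood-length-normalised MLE (sketch): the rating-sum
    estimates biased by 1/|Vu| (neighbourhood) and 1/|U| (collection).
    The bias is rank-preserving within each model."""
    def model(users, norm):
        totals = {}
        for u in users:
            for i, r in ratings[u].items():
                totals[i] = totals.get(i, 0) + r
        mass = sum(totals.values())
        return {i: t / (mass * norm) for i, t in totals.items()}
    return (model(neighbours, len(neighbours)),
            model(all_users, len(all_users)))
```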
38. Conclusions
We proposed to use fast PRF methods (Rocchio's Weights, RSV,
KLD and CHI-2):
They are orders of magnitude faster than the Relevance
Models (up to 200x).
They generate quite accurate recommendations.
Good novelty and diversity figures with a better trade-off
than RM2.
They are parameter-free (apart from the clustering parameters).
40. Future Work
Other approaches for computing neighbourhoods:
Posterior Probability Clustering (a non-negative matrix
factorisation).
Normalised Cut (spectral clustering).
Explore other PRF methods:
Divergence Minimization Models.
Mixture Models.