Slides of the presentation given at ECIR 2016 for the following paper:
Daniel Valcarce, Javier Parapar, Alvaro Barreiro: Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recommendation. ECIR 2016: 602-613
http://dx.doi.org/10.1007/978-3-319-30671-1_44
1. ECIR 2016, PADUA, ITALY
EFFICIENT PSEUDO-RELEVANCE FEEDBACK
METHODS FOR COLLABORATIVE FILTERING
RECOMMENDATION
Daniel Valcarce, Javier Parapar, Álvaro Barreiro
@dvalcarce @jparapar @AlvaroBarreiroG
Information Retrieval Lab
@IRLab_UDC
University of A Coruña
Spain
2. Outline
1. Pseudo-Relevance Feedback (PRF)
2. Collaborative Filtering (CF)
3. PRF Methods for CF
4. Experiments
5. Conclusions and Future Work
4. Pseudo-Relevance Feedback (I)
Pseudo-Relevance Feedback provides an automatic method for
query expansion:
Assumes that the top documents retrieved with the
original query are relevant (the pseudo-relevant set).
The query is expanded with the most representative terms
from this set.
The expanded query is expected to yield better results than
the original one.
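The steps above can be sketched as follows (a minimal illustration; the function name, term-frequency scoring and parameters are my own choices, not from the slides):

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, k=10, n_terms=5):
    """Pseudo-relevance feedback: assume the top-k retrieved documents
    are relevant and expand the query with their most frequent terms."""
    pseudo_relevant = ranked_docs[:k]      # pseudo-relevant set
    term_counts = Counter()
    for doc in pseudo_relevant:            # each doc is a list of terms
        term_counts.update(doc)
    expansion = [t for t, _ in term_counts.most_common()
                 if t not in query_terms][:n_terms]
    return query_terms + expansion
```

In practice the expansion terms would be scored with a term-weighting model rather than raw frequency, but the structure is the same.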
15. Recommender Systems
Notation:
The set of users U
The set of items I
The rating that user u gave to item i is r_{u,i}
The set of items rated by user u is denoted by I_u
The set of users that rated item i is denoted by U_i
The neighbourhood of user u is denoted by V_u
Top-N recommendation: create a ranked list containing
relevant and unknown items for each user u ∈ U.
16. Collaborative Filtering (I)
Collaborative Filtering (CF) employs the past interaction
between users and items to generate recommendations.
Idea: if a user who is similar to you likes an item, maybe you
will also like it.
Different input data:
Explicit feedback: ratings, reviews...
Implicit feedback: clicks, purchases...
It is perhaps the most popular approach to recommendation,
given the increasing amount of available information about users.
17. Collaborative Filtering (II)
Collaborative Filtering (CF) techniques can be classified in:
Model-based methods: learn a predictive model from the
user-item ratings.
- Matrix factorisation (e.g., SVD)
Neighbourhood-based (or memory-based) methods:
compute recommendations directly from a part of the
ratings.
- k-NN approaches
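A minimal user-based k-NN sketch of the neighbourhood-based family (the cosine similarity choice, function names and dict-based rating layout are illustrative assumptions of mine, not from the slides):

```python
import math

def cosine(ra, rb):
    """Cosine similarity between two users' rating dicts {item: rating}."""
    common = set(ra) & set(rb)
    num = sum(ra[i] * rb[i] for i in common)
    den = (math.sqrt(sum(v * v for v in ra.values()))
           * math.sqrt(sum(v * v for v in rb.values())))
    return num / den if den else 0.0

def knn_recommend(user, ratings, k=2, n=3):
    """Memory-based CF: score unseen items using the ratings of the
    k most similar users (the neighbourhood)."""
    sims = sorted(((cosine(ratings[user], ratings[v]), v)
                   for v in ratings if v != user), reverse=True)[:k]
    scores = {}
    for s, v in sims:
        for i, r in ratings[v].items():
            if i not in ratings[user]:        # only unknown items
                scores[i] = scores.get(i, 0.0) + s * r
    return sorted(scores, key=scores.get, reverse=True)[:n]
```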
21. Previous Work on Adapting PRF Methods to CF
Relevance-Based Language Models
Originally devised for PRF (Lavrenko & Croft, SIGIR 2001).
Adapted to CF (Parapar et al., Inf. Process. Manage. 2013).
Two models: RM1 and RM2.
High precision figures in recommendation.
... but high computational cost!

RM1: p(i|R_u) \propto \sum_{v \in V_u} p(v)\, p(i|v) \prod_{j \in I_u} p(j|v)

RM2: p(i|R_u) \propto p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(i|v)\, p(v)}{p(i)}\, p(j|v)
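A sketch of how RM2 could be computed, assuming maximum-likelihood p(i|v) and uniform priors p(v) and p(i) (all implementation choices here are mine, not from the slides). Note the sum over neighbours nested inside a product over the user's items, which is the source of the high cost:

```python
def rm2_scores(user_items, neighbours, ratings, all_items):
    """RM2 adapted to CF (sketch): score each candidate item i by
    p(i) * prod_{j in I_u} sum_{v in V_u} p(i|v) * p(v)/p(i) * p(j|v)."""
    def p_item_given_user(i, v):
        # Maximum-likelihood p(i|v) over v's rating mass.
        total = sum(ratings[v].values())
        return ratings[v].get(i, 0.0) / total if total else 0.0

    p_v = 1.0 / len(neighbours)    # uniform neighbour prior
    p_i = 1.0 / len(all_items)     # uniform item prior
    scores = {}
    for i in all_items:
        if i in user_items:        # recommend unknown items only
            continue
        score = p_i
        for j in user_items:
            score *= sum(p_item_given_user(i, v) * p_v / p_i
                         * p_item_given_user(j, v) for v in neighbours)
        scores[i] = score
    return scores
```

A real implementation would work in log space and smooth the estimates; the point of the sketch is the |I_u| x |V_u| work per candidate item.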
22. Our Proposals based on Rocchio's Framework
Rocchio's Weights:
p_{Rocchio}(i|u) = \frac{\sum_{v \in V_u} r_{v,i}}{|V_u|}

Robertson Selection Value:
p_{RSV}(i|u) = \frac{\sum_{v \in V_u} r_{v,i}}{|V_u|}\, p(i|V_u)

CHI-2:
p_{CHI-2}(i|u) = \frac{\left( p(i|V_u) - p(i|C) \right)^2}{p(i|C)}

Kullback-Leibler Divergence:
p_{KLD}(i|u) = p(i|V_u) \log \frac{p(i|V_u)}{p(i|C)}
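The four weighting functions can be sketched directly (the function names, the dict-based p(i|V_u) and p(i|C) inputs, and the rating layout are my own illustrative choices):

```python
import math

def p_rocchio(i, neighbours, ratings):
    """Rocchio's weights: mean rating of item i over the neighbourhood."""
    return sum(ratings[v].get(i, 0) for v in neighbours) / len(neighbours)

def p_rsv(i, neighbours, ratings, p_vu):
    """Robertson Selection Value: Rocchio's weight times p(i|Vu)."""
    return p_rocchio(i, neighbours, ratings) * p_vu[i]

def p_chi2(i, p_vu, p_c):
    """CHI-2: squared deviation of the neighbourhood model p(i|Vu)
    from the collection model p(i|C), normalised by p(i|C)."""
    return (p_vu[i] - p_c[i]) ** 2 / p_c[i]

def p_kld(i, p_vu, p_c):
    """Kullback-Leibler divergence contribution of item i."""
    return p_vu[i] * math.log(p_vu[i] / p_c[i]) if p_vu[i] > 0 else 0.0
```

Each function scores a single candidate item; ranking the user's unknown items by any of them yields the recommendation list.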
25. Probability Estimation
Maximum Likelihood Estimate under a Multinomial
Distribution over the ratings:

p_{mle}(i|V_u) = \frac{\sum_{v \in V_u} r_{v,i}}{\sum_{v \in V_u,\, j \in I} r_{v,j}}

p_{mle}(i|C) = \frac{\sum_{u \in U} r_{u,i}}{\sum_{u \in U,\, j \in I} r_{u,j}}
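A sketch of these two estimates (the dict-of-dicts rating layout and function name are assumptions of mine, not from the slides):

```python
def mle_models(neighbours, all_users, ratings):
    """Maximum-likelihood estimates of the neighbourhood model p(i|Vu)
    and the collection model p(i|C) from the rating sums."""
    def model(users):
        totals = {}
        for u in users:
            for i, r in ratings[u].items():
                totals[i] = totals.get(i, 0) + r   # numerator: sum of ratings of i
        mass = sum(totals.values())                # denominator: total rating mass
        return {i: t / mass for i, t in totals.items()}
    return model(neighbours), model(all_users)
```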
27. Neighbourhood Length Normalisation (I)
Neighbourhoods are computed using clustering algorithms:
Hard clustering: every user is in exactly one cluster. Clusters
may have different sizes. Example: k-means.
Soft clustering: each user has its own neighbours. When
we set k to a high value, we may find different numbers of
neighbours. Example: k-NN.
Idea: take the variability of neighbourhood lengths into account:
A big neighbourhood is equivalent to a query with many
results: the collection model is close to the target user.
A small neighbourhood implies that its neighbours are highly
specific: the collection is very different from the target user.
28. Neighbourhood Length Normalisation (II)
We bias the MLE to perform neighbourhood length
normalisation:

p_{nmle}(i|V_u) \overset{rank}{\propto} \frac{1}{|V_u|} \cdot \frac{\sum_{v \in V_u} r_{v,i}}{\sum_{v \in V_u,\, j \in I} r_{v,j}}

p_{nmle}(i|C) \overset{rank}{\propto} \frac{1}{|U|} \cdot \frac{\sum_{u \in U} r_{u,i}}{\sum_{u \in U,\, j \in I} r_{u,j}}
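The biased estimates differ from the plain MLE only by the 1/|V_u| and 1/|U| factors, as in this sketch (layout and names are my own illustrative assumptions):

```python
def nmle_models(neighbours, all_users, ratings):
    """Neighbourhood-length-normalised MLE (sketch): the rating-sum
    estimates biased by 1/|Vu| (neighbourhood) and 1/|U| (collection).
    The bias is rank-preserving within each model."""
    def model(users, norm):
        totals = {}
        for u in users:
            for i, r in ratings[u].items():
                totals[i] = totals.get(i, 0) + r
        mass = sum(totals.values())
        return {i: t / (mass * norm) for i, t in totals.items()}
    return (model(neighbours, len(neighbours)),
            model(all_users, len(all_users)))
```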
38. Conclusions
We proposed to use fast PRF methods (Rocchio's Weights, RSV,
KLD and CHI-2):
They are orders of magnitude faster than the Relevance
Models (up to 200x).
They generate quite accurate recommendations.
Good novelty and diversity figures with a better trade-off
than RM2.
They are parameter-free (apart from the clustering parameters).
40. Future Work
Other approaches for computing neighbourhoods:
Posterior Probability Clustering (a non-negative matrix
factorisation).
Normalised Cut (spectral clustering).
Explore other PRF methods:
Divergence Minimization Models.
Mixture Models.