Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
1. @cataldomusto cataldo.musto@uniba.it
Fairness and Popularity Bias
in Recommender Systems:
an Empirical Evaluation
CATALDO MUSTO, PASQUALE LOPS, GIOVANNI SEMERARO
UNIVERSITÀ DEGLI STUDI DI BARI ‘ALDO MORO’ - ITALY
2. Recommender Systems
Technology able to push
relevant items (movies, news,
books, etc.) to the users
based on their preferences.
2
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
3. How to Evaluate a RecSys?
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Recommender Systems are
typically evaluated by using
accuracy metrics (Precision,
Recall, NDCG, HitRate, etc.)
Is there anything else we can
evaluate?
3
4. Popular Non-Accuracy Metrics
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Novelty Serendipity Fairness
4
5. Popular Non-Accuracy Metrics
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Novelty Serendipity Fairness
5
6. Fairness in AI
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
By definition, fairness refers to the
non-discrimination of certain groups
based on particular protected
attributes (gender, race, etc.)
Fair Al algorithms do not suffer of
these bias (or implement particular
strategies to mitigate them)
6
7. Fairness in RecSys
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
As regards RecSys, the concept of fairness
is multi-sided
Items Fairness: if all the items have equal
probability of being recommended
Users Fairness: if the recommendation list
reflects the actual preferences of the users
based on a target features (e.g., genre)
7
8. Problem: Popularity Bias
8
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Fairness of RecSys (and quality of recommendations) is
strongly affected by popularity bias
What does it mean?
Users tend to express preferences on popular
items, and this make popular items to be
recommended more frequently w.r.t. niche
ones [*].
[*] H. Abdollahpouri, M. Mansoury, R. Burke, B. Mobasher,
The unfairness of popularity biasin recommendation, arXiv
preprint arXiv:1907.13286 (2019).
9. Contribution
9
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
The goal of our study is to evaluate the
fairness-by-design of the most popular
recommendation paradigms.
11. Collaborative
Filtering
Paradigms: Collaborative Filtering
Exploits the preferences of the
community to generate
recommendations.
Intuition: to suggest items liked
by users similar to the target
one
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 11
12. Paradigms: Collaborative Filtering
target
user
neighbor
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 12
13. Paradigms: Collaborative Filtering
target
user
neighbor
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 13
14. Paradigms: Collaborative Filtering
target
user Problems: sparsity (cold start)
Neighbor ?
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 14
Neighbor ?
Neighbor ?
Neighbor ?
15. Paradigms: Collabor. Filtering (advanced)
15
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
16. Paradigms: Content-based RecSys
Exploit descriptive features
of the items (e.g. genre of a
book, director of a movie) to
generate recommendations.
Insight: to suggest items
similar to those the user
already liked
16
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
17. Paradigms: Content-based RecSys
user profile items
Recommendations
are generated by
matching the
features stored in
the user profile
with those
describing the
items to be
recommended.
17
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
18. Paradigms: (Recent) Content RecSys
18
Reecent methods exploit word
embedding techniques, which are based
on the distributional hypothesis
beer
wine
glass
spoon
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
19. 19
Matrix Revolutions
Donnie Darko
Grandi Magazzini
The Matrix
Paradigms: (Recent) Content RecSys
Reecent methods exploit word
embedding techniques, which are based
on the distributional hypothesis
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
20. Paradigms: Graph-based RecSys
Exploit a graph-based data
model connecting users,
items and descriptive
properties of the items (e.g.,
gathered from a knowledge
graph)
Insight: recommendation
seen as a path ranking
problem
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 20
21. Typically, the
relevance score for all
the item nodes is
calculated and the
top-N are returned as
recommendations
PageRank
Spreading Activation
Personalized PR
…
Paradigms: Graph-based RecSys
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021 21
22. Research Questions
22
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
1. Which recommendation paradigm is more
prone by design to popularity bias?
2. Which recommendation paradigm
generates more fair recommendation
lists?
23. Experimental Protocol
23
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Algorithm
Top-10
Recommmendations
Evaluation Metrics
Ratings
Content
Data are typically split into
training and test
24. Experimental Evaluation
24
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Two datasets
GoodBooks MovieLens1M
Users 53,424 6,040
Items 10,000 3,883
Ratings 6,000,000 1,000,209
%Positive 68.97% 57.91%
Sparsity 99.82% 96.42%
Content Plot Plot
25. Experimental Evaluation
25
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
GoodBooks MovieLens1M
Users 53,424 6,040
Items 10,000 3,883
Ratings 6,000,000 1,000,209
%Positive 68.97% 57.91%
Sparsity 99.82% 96.42%
Content Plot Plot
Two datasets
26. Experimental Evaluation
26
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Algorithms: Collaborative Filtering
User-to-User CF
Item-to-Item CF
FunkSVD
Biased Matrix Factorization
27. Experimental Evaluation
27
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Algorithms: Content-based RecSys
Vector Space Model +
TF/IDF
Word2Vec
Doc2Vec
Latent Semantic Indexing
28. Experimental Evaluation
28
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Algorithms: Graph-based RecSys
PageRank
Personalized PageRank
29. Experimental Evaluation
29
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Algorithms: Baselines
Random (upper bound)
Popularity-based (lower bound)
30. Evaluation Metrics
30
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Items Fairness: Catalogue Coverage
Items Fairness: Gini Index
Percentage of items in the catalogue recommended to at least one
users (the higher, the better)
Measures how unbalanced (in terms of frequency) is the
distribution of the recommendations to all the users. Values in
range [0,1], the lower the better. Gini Index close to 1 means that
just a few items appear in the recommendation lists.
31. Evaluation Metrics
31
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Users Fairness: ΔGAP
Gap between the average popularity of the items in the profile w.r.t. the average popularity
of the items in the recommendation list. A fair recommendation list reflects the average
popularity of the items in the profile.
Users are first split in groups (block-buster, diverse, niche users) and the average
popularity of each group is calculated
32. Evaluation Metrics
32
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Users Fairness: ΔGAP
Gap between the average popularity of the items in the profile w.r.t. the average popularity
of the items in the recommendation list. A fair recommendation list reflects the average
popularity of the items in the profile.
Users are first split in groups (block-buster, diverse, niche users) and the average
popularity of each group is calculated
ΔGAP=0 fair recommendation list
ΔGAP<0 under-estimation of the popularity
ΔGAP>0 over-estimation of the popularity
33. Results
33
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Catalogue Coverage
MovieLens 1M Goodbooks
Baselines Random 3,688 94.98% 10,000 100.00%
Popular 67 1.63% 243 2.43%
Collaborative
Filtering
U2U-CF 296 7.62% 2,210 22.10%
I2I-CF 471 12.13% 2,819 28.19%
Biased-MF 547 14.09% 1,830 18.30%
Content-based
Recsys
VSM/TF-IDF 444 11.43% 2,622 26.22%
Word2Vec 492 12.67% 3,081 30.81%
Doc2Vec 476 12.26% 2,987 29.87%
Graph-based
RecSys
PR 15 0.38% 11 0.11%
PPR 36 0.92% 22 0.22%
34. Results
34
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Catalogue Coverage
MovieLens 1M Goodbooks
Baselines Random 3,688 94.98% 10,000 100.00%
Popular 67 1.63% 243 2.43%
Collaborative
Filtering
U2U-CF 296 7.62% 2,210 22.10%
I2I-CF 471 12.13% 2,819 28.19%
Biased-MF 547 14.09% 1,830 18.30%
Content-based
Recsys
VSM/TF-IDF 444 11.43% 2,622 26.22%
Word2Vec 492 12.67% 3,081 30.81%
Doc2Vec 476 12.26% 2,987 29.87%
Graph-based
RecSys
PR 15 0.38% 11 0.11%
PPR 36 0.92% 22 0.22%
35. Results
35
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Catalogue Coverage
MovieLens 1M Goodbooks
Baselines Random 3,688 94.98% 10,000 100.00%
Popular 67 1.63% 243 2.43%
Collaborative
Filtering
U2U-CF 296 7.62% 2,210 22.10%
I2I-CF 471 12.13% 2,819 28.19%
Biased-MF 547 14.09% 1,830 18.30%
Content-based
Recsys
VSM/TF-IDF 444 11.43% 2,622 26.22%
Word2Vec 492 12.67% 3,081 30.81%
Doc2Vec 476 12.26% 2,987 29.87%
Graph-based
RecSys
PR 15 0.38% 11 0.11%
PPR 36 0.92% 22 0.22%
36. Take-Home Messages
36
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
1. Collaborative Filtering algorithms cover a larger
number of items when the dataset is less sparse.
2. When the sparsity is higher, content-based
recommendations are best option
3. Graph-based recommender systems behave worse
than popularity-based recsys
• All the algorithms are very prone to popularity bias (Gini
Index values very close to 1)
• Recommendations algorithms tipically under-estimate
the average popularity of the recommendations
• DGap are close to zero. Good fairness for the users, on
average
37. Results
37
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Gini Index
MovieLens1M GoodBooks
Baselines Random 0.185 0.334
Popular 0.995 0.998
Collaborative Filtering U2U-CF 0.989 0.973
I2I-CF 0.990 0.986
Biased-MF 0.984 0.987
Content-based Recsys VSM/TF-IDF 0.990 0.961
Word2Vec 0.985 0.956
Doc2Vec 0.987 0.959
Graph-based RecSys PR 0.996 0.999
PPR 0.995 0.998
38. Results
38
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
Gini Index
MovieLens1M GoodBooks
Baselines Random 0.185 0.334
Popular 0.995 0.998
Collaborative Filtering U2U-CF 0.989 0.973
I2I-CF 0.990 0.986
Biased-MF 0.984 0.987
Content-based Recsys VSM/TF-IDF 0.990 0.961
Word2Vec 0.985 0.956
Doc2Vec 0.987 0.959
Graph-based RecSys PR 0.996 0.999
PPR 0.995 0.998
39. Take-Home Messages
39
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
1. Collaborative Filtering algorithms cover a larger number
of items when the dataset is less sparse.
2. When the sparsity is higher, content-based
recommendations are best option
3. Graph-based recommender systems behave worse than
popularity-based recsys
4. All the algorithms are very prone to popularity bias (Gini
Index values very close to 1)
• Recommendations algorithms tipically under-estimate
the average popularity of the recommendations
• DGap are close to zero. Good fairness for the users, on
average
40. Results
40
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
ΔGAP – MovieLens Data
41. Results
41
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
ΔGAP – MovieLens Data
42. Results
42
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
ΔGAP – GoodBooks Data
43. Results
43
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
ΔGAP – GoodBooks Data
44. Take-Home Messages
44
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
1. Collaborative Filtering algorithms cover a larger number
of items when the dataset is less sparse.
2. When the sparsity is higher, content-based
recommendations are best option
3. Graph-based recommender systems behave worse than
popularity-based recsys
4. All the algorithms are very prone to popularity bias (Gini
Index values very close to 1)
5. Recommendations algorithms tipically under-estimate
the average popularity of the recommendations
6. ΔGap is close to zero. Good fairness for the users, on
average
45. Conclusions
45
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
This paper provides a benchmark of the fairness-by-design of some
popular recommendation paradigms
46. Conclusions
46
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021
This paper provides a benchmark of the fairness-by-design of some
popular recommendation paradigms
• Results showed that algorithms are strongly affected by popularity bias.
• Item coverage is not satisfying, recommendation lists are not fair (as
confirmed by Gini Index). Mitigation strategies are needed.
• ΔGAP values are more satisfying when more advanced techniques are
exploited
• Future Work. Evaluation of hybrid recommendation models and more
recent content-based recommendation techniques. Introduction of other
non-accuracy metrics (novelty, serendipity, etc.)
47. Thank you! cataldo.musto@uniba.it
@cataldomusto
Contacts
47
Cataldo Musto, Pasquale Lops, Giovanni Semeraro. Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation.
AIxIA 2021 – 20th International Conference of the Italian Association for Artificial Intelligence. Online Event. December 3, 2021