Evaluating decision-aware recommender systems

Short paper presentation at RecSys '17


  1. Evaluating Decision-Aware Recommender Systems
     Rus M. Mesas, Alejandro Bellogín
     Universidad Autónoma de Madrid, Spain
     RecSys, August 2017
  2. Main idea
     ▪ How to balance coverage and precision

        Method  Precision  Coverage  Best?
        R1      0.093      100%      ✓
        R2      0.094      97.8%
  3. Main idea
     ▪ How to balance coverage and precision

        Method  Precision  Coverage  Best?
        R1      0.093      100%      ✓
        R2      0.094      97.8%

        Method  Precision  Coverage  Best?
        R1      0.037      100%
        R2      0.133      100%
        R3      0.245      99.7%     ✓
  4. Main idea
     ▪ How to balance coverage and precision

        Method  Precision  Coverage  Best?
        R1      0.093      100%      ✓
        R2      0.094      97.8%

        Method  Precision  Coverage  Best?
        R1      0.037      100%
        R2      0.133      100%
        R3      0.245      99.7%     ✓

        Method  Precision  Coverage  Best?
        R1      0.093      100%
        R2      0.181      95.6%     ?
        R3      0.283      59.0%     ?
        R4      0.326      28.2%     ?
  5. Main idea
     ▪ How to balance coverage and precision
     ▪ To force different coverage levels, we allow recommenders to decide if a recommendation is worthy of being presented to the user or not
     (Figure: estimations)
  6. Balancing coverage and precision
     ▪ [Herlocker et al. 2004]: "there is no general coverage metric that, at the same time, gives more weight to relevant items when accounting for coverage, and combines coverage and accuracy measures"
     ▪ [Gunawardana & Shani 2015] leave the problem of balancing coverage and precision as an open issue in the area
  7. Combination metrics
  8. Our proposal: Correctness metric
     ▪ Adapted from Question Answering (see the sketch below):
       • Several questions are to be answered by a system
       • Each question has several options
       • Only one option is correct
       • If an answer is not given, it should not be counted as an incorrect answer
       • Hence, if two systems have the same number of correct answers but one has failed fewer questions (because it decided not to respond), it should be ranked better than the other
     A. Peñas & Á. Rodrigo. 2011. A Simple Measure to Assess Non-response. ACL.
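
The non-response measure referenced above is c@1, which credits unanswered questions in proportion to the accuracy obtained over the whole question set. A minimal sketch (function and argument names are illustrative, not from the paper):

```python
def c_at_1(n_correct, n_unanswered, n_total):
    """c@1 from Peñas & Rodrigo (2011): unanswered questions are credited
    in proportion to the overall accuracy n_correct / n_total."""
    return (n_correct + n_unanswered * (n_correct / n_total)) / n_total

# Two systems with 30 correct answers out of 100 questions: the one that
# abstained on 20 questions scores higher than the one that answered all.
print(c_at_1(30, 0, 100))   # 0.30
print(c_at_1(30, 20, 100))  # 0.36
```
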
  9. Correctness metric for recommendation
     ▪ Each recommendation algorithm is a system
     ▪ Each candidate item to be ranked is a question
     ▪ If an item is recommended, it could be relevant or not
     ▪ The same set of items is presented to each system
     (Example: a recommended list scored with Precision@5 vs. Correctness)
  10. Correctness metrics for recommendation
     ▪ Four instantiations (one is sketched below):
       • Based on users
       • Based on items
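
The exact definitions of the four instantiations are given in the paper; purely as an illustration, here is one plausible user-based variant built directly on the c@1 analogy above (the names and the aggregation over users are assumptions of this sketch):

```python
def user_correctness(recommended, relevant, candidates):
    """Correctness for a single user, following the c@1 analogy: candidate
    items the system abstains from recommending are credited in proportion
    to the fraction of relevant recommendations over all candidates."""
    n = len(candidates)
    answered = [i for i in candidates if i in recommended]
    correct = sum(1 for i in answered if i in relevant)
    abstained = n - len(answered)
    return (correct + abstained * (correct / n)) / n

def mean_user_correctness(per_user_data):
    """Average the per-user scores, analogous to user-averaged precision."""
    scores = [user_correctness(rec, rel, cand) for rec, rel, cand in per_user_data]
    return sum(scores) / len(scores)
```
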
  11. What about the decision-aware recommenders?
     (Figure: estimations)
  12. Decision-aware recommender systems
     ▪ Exploiting the confidence a system has in its own recommendations
     ▪ Not completely new:
       • Significance weighting
       • Support and confidence in case-based recommenders
     ▪ Focus on Collaborative Filtering algorithms:
       • Support of the prediction score of nearest-neighbour methods
       • Uncertainty in the prediction score of a probabilistic matrix factorisation algorithm
  13. Estimating confidence in decision-aware recommendation
     ▪ For user-based KNN: have at least n (out of k) neighbours participated in the rating estimation? (see the sketch below)
     ▪ For probabilistic MF: how uncertain is the prediction? (defined in the backup slides)
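
A minimal sketch of the neighbour-support rule for user-based KNN, assuming the usual similarity-weighted prediction; the data layout and names are illustrative assumptions, not the paper's implementation:

```python
def predict_with_support(user, item, neighbours, ratings, n_min):
    """User-based KNN prediction that abstains (returns None) unless at least
    n_min of the k selected neighbours have actually rated the target item.
    neighbours[user] is a list of (neighbour, similarity) pairs;
    ratings[v] is a dict mapping items to that neighbour's ratings."""
    raters = [(v, sim) for v, sim in neighbours[user] if item in ratings[v]]
    if len(raters) < n_min:
        return None  # not confident enough: the item is not recommended
    num = sum(sim * ratings[v][item] for v, sim in raters)
    den = sum(abs(sim) for _, sim in raters)
    return num / den if den > 0 else None
```

Raising n_min makes the decision rule stricter: fewer items clear the bar, trading coverage for (potentially) higher precision.
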
  14. Experimental setup
     ▪ Datasets:
       • MovieLens 100K, MovieLens 1M, Jester
       • Random 5-fold training/test split
     ▪ Evaluation:
       • Generate a ranking with every item in the test set
       • Metrics at cutoff 10: precision (P), user space coverage (USC), item space coverage (ISC), correctness (UC, RUC, IC, RIC), novelty (EPC), diversity (AggrDiv)
     ▪ Frameworks:
       • RankSys: evaluation metrics, KNN recommenders
       • RiVal: data splitting
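
For readers unfamiliar with the coverage metrics listed above, a common reading (assumed here; the paper may define them slightly differently) is that USC is the fraction of users who receive a recommendation list and ISC is the fraction of catalogue items that appear in at least one list. A minimal sketch:

```python
def user_space_coverage(rec_lists, all_users):
    """Fraction of users for whom the recommender produced a non-empty list."""
    return sum(1 for u in all_users if rec_lists.get(u)) / len(all_users)

def item_space_coverage(rec_lists, all_items):
    """Fraction of catalogue items appearing in at least one recommendation list."""
    recommended = {i for lst in rec_lists.values() for i in lst}
    return len(recommended & set(all_items)) / len(all_items)
```
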
  15. Performance: prediction uncertainty
  16. Impact on novelty and diversity
     ▪ Prediction uncertainty:
       • Stricter constraints (smaller allowed uncertainty) decrease novelty and diversity
  17. Conclusions
     ▪ We have proposed a family of metrics based on the assumption that it is better to avoid a recommendation than to provide a bad one
     ▪ We have shown that a balance between precision, coverage, diversity, and novelty is critical
     ▪ We have proposed two strategies to decide whether an item should be presented to the user
  18. Future work
     ▪ Extend the correctness metrics to combine other evaluation dimensions
     ▪ Find an objective way to discriminate between systems: which one is really the best?
     ▪ Consider the psychological aspect of the recommendation: the user expects to receive N recommendations (better bad than none?)
  19. Thank you
     Evaluating Decision-Aware Recommender Systems
     Rus M. Mesas, Alejandro Bellogín
     Universidad Autónoma de Madrid, Spain
     RecSys, August 2017
  20. Performance: prediction support
  21. Impact on novelty and diversity
     ▪ Prediction support:
       • Larger n decreases the diversity and novelty of the lists
       • More popular items are being recommended
  22. Motivation
     ▪ Typical evaluation: it is better to fail than to avoid making a recommendation
       • Assumption: not returning an item amounts to treating that item as not relevant
     ▪ In this work: a recommender system may decide not to recommend a specific item
       • We need a metric where "no recommendation" means neither relevant nor not relevant; if possible, it should mean "better than not relevant"
  23. Definition of uncertainty for PMF
     ▪ PMF: probabilistic matrix factorisation using a Bayesian approximation proposed in [Lim & Teh 2007]
     ▪ The standard deviation of each predicted rating is derived using mean-field variational inference (a usage sketch follows below)
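
As an illustration of how such a per-prediction standard deviation can drive the recommend/abstain decision, here is a sketch assuming independent Gaussian mean-field posteriors over the user and item factor vectors; the variance propagation and all names are assumptions of this sketch, not the derivation in [Lim & Teh 2007]:

```python
import numpy as np

def pmf_prediction_std(mu_u, s_u, mu_v, s_v):
    """Std. dev. of the predicted rating u^T v when u and v have independent
    Gaussian posteriors with means mu_u, mu_v and per-dimension stds s_u, s_v."""
    var = np.sum(mu_u**2 * s_v**2 + mu_v**2 * s_u**2 + s_u**2 * s_v**2)
    return float(np.sqrt(var))

def predict_if_confident(mu_u, s_u, mu_v, s_v, max_std):
    """Abstain (return None) when the prediction uncertainty exceeds max_std."""
    if pmf_prediction_std(mu_u, s_u, mu_v, s_v) > max_std:
        return None
    return float(mu_u @ mu_v)
```

A smaller max_std is a stricter constraint, which matches the observation on slide 16 that smaller allowed uncertainty reduces novelty and diversity.
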
