[2018 台灣人工智慧學校校友年會 (2018 Taiwan AI Academy Alumni Annual Meeting)] Practical experience in mining and evaluating information systems / 陳弘軒

陳弘軒 (Hung-Hsuan Chen) / Assistant Professor, Department of Computer Science and Information Engineering, National Central University

Published in: Data & Analytics
  1. Practical lessons in mining and evaluating information systems. Hung-Hsuan Chen, National Central University
  2. Data Analytics Research Team (DART) • Discover the problems or needs (need) • Have the programming and math skills, and the domain knowledge, to solve the problem (skill) • Have the passion to realize the plan (passion) https://ncu-dart.github.io/
  3. My background • An engineer wearing a scientist’s hat? • Deep learning and ensemble learning on recommender systems (2014 – 2018) • Academic search engine CiteSeerX (2008 – 2013) § 4M+ documents § 87M+ citations § 2M – 4M hits per day § 300K+ monthly downloads § 100K documents added monthly
  4. Outline • I will present 4 common pitfalls in training and evaluating recommender systems • These pitfalls appeared in many previous studies on recommender systems and information systems • Details are in the following paper: § Chen, H. H., Chung, C. A., Huang, H. C., & Tsui, W. (2017). Common pitfalls in training and evaluating recommender systems. ACM SIGKDD Explorations Newsletter, 19(1), 37-45.
  5. A typical flow to build a recommender system
  6. [Timeline figure, with time points t0, ts, t1, t2] No-recommendation period: the logs (e.g., clickstreams) of this period are used to train the initial recommendation algorithm Rorig. Rorig is then applied online, and the logs of this later period are used to train and compare the initial algorithm Rorig and the new algorithm Rnew: the earlier part supplies the data used to train Rnew and re-train Rorig, and the later part serves as the test data to compare Rorig and Rnew.
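A minimal sketch of this time-based split, assuming each log record is a dict with a 'timestamp' field and that the caller supplies the boundary times between the three periods; the function and field names are illustrative, not taken from the slides.

def split_logs(logs, rec_deployed_at, test_starts_at, test_ends_at):
    """Split click logs into the three periods on the timeline above.

    logs: iterable of dicts with a 'timestamp' key (an assumed schema).
    Returns (train_initial, train_new, test):
      - train_initial: no-recommendation period, used to train Rorig
      - train_new:     period with Rorig online, used to train Rnew and re-train Rorig
      - test:          held-out period, used to compare Rorig and Rnew
    """
    train_initial, train_new, test = [], [], []
    for record in logs:
        t = record["timestamp"]
        if t < rec_deployed_at:
            train_initial.append(record)
        elif t < test_starts_at:
            train_new.append(record)
        elif t < test_ends_at:
            test.append(record)
    return train_initial, train_new, test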
  7. Issue 1: the trained model may be biased toward highly reachable products
  8. Clicks resulting from in-page direct links: Day 1: 19.3150%; Day 2: 21.2812%. • If we use the clickstreams to generate the positive samples, then rearranging the layout of the pages or the link targets in the pages would likely change approximately 1/5 of the positive training instances.
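A sketch of how a percentage like the one above could be computed, assuming each click record carries the source page and the clicked product, and that a mapping from each page to its direct in-page links is available; these names are illustrative, not from the slides.

def direct_link_click_ratio(clicks, direct_links):
    """Fraction of clicks that simply follow an in-page direct link.

    clicks: list of (source_page, clicked_product) pairs (an assumed schema).
    direct_links: dict mapping a page to the set of products it links to directly.
    """
    if not clicks:
        return 0.0
    direct = sum(1 for page, product in clicks
                 if product in direct_links.get(page, set()))
    return direct / len(clicks)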
  9. Percentage of promoted products in the recommendation list
     Method:    MC   | CategoryTP | TotalTP | ICF-U2I | ICF-I2I | NMF-U2I | NMF-I2I
     train-all: 100% | 1.48%      | 1.84%   | 93.22%  | 1.40%   | 1.48%   | 1.34%
     train-sel: 1.08% | 0.86%     | 0.98%   | 14.46%  | 1.28%   | 1.32%   | 1.24%
     • When using train-all as the training data, several algorithms recommend many of the “promoted products” § We seem to learn the “layout” of the product page (i.e., the direct links from one product page to another) instead of the intrinsic relatedness between products
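A sketch of the metric reported in the table above: for each algorithm, the share of recommended items that belong to the set of promoted products. The data structures are assumptions made for illustration.

def promoted_share(recommendations, promoted_products):
    """Percentage of recommended items that are promoted products.

    recommendations: dict mapping a user (or context) to the list of product ids
                     an algorithm recommends (an assumed schema).
    promoted_products: set of product ids promoted on the site.
    """
    total = sum(len(items) for items in recommendations.values())
    if total == 0:
        return 0.0
    promoted = sum(1 for items in recommendations.values()
                   for item in items if item in promoted_products)
    return 100.0 * promoted / total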
  10. Lessons learned • The common wisdom that the clickstream represents a user’s interest/habit could be problematic § Clickstreams are highly influenced by the reachability of the products and the layouts of the product pages • A recommender system trained on raw clickstreams is likely to learn § the “layout” of the pages § the recommendation rules of the online recommender system • We need to select the training data more carefully
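One possible reading of “select the training data more carefully”, sketched under assumptions: drop positive samples whose click can be explained by an in-page direct link or by the online recommender’s own suggestions, so the model is less likely to learn the page layout or the online recommendation rules. The slides do not spell out how train-sel was actually constructed; the filtering rule and names below are illustrative.

def select_training_clicks(clicks, direct_links, online_recs=None):
    """Keep only clicks that are not trivially explained by the page layout.

    clicks: list of (source_page, clicked_product) pairs (an assumed schema).
    direct_links: dict mapping a page to the products it links to directly.
    online_recs: optional dict mapping a page to the products the online
                 recommender displayed on that page.
    """
    selected = []
    for page, product in clicks:
        if product in direct_links.get(page, set()):
            continue  # reachable via a direct link: likely layout-driven
        if online_recs and product in online_recs.get(page, set()):
            continue  # shown by the online recommender: echoes its rules
        selected.append((page, product))
    return selected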
  11. Issue 2: the online recommendation algorithm affects the distribution of the test data
  12. CTRs when using different online recommendation algorithms
  13. Lessons learned • Previous studies sometimes use all the available test data as the ground truth for evaluation • Unfortunately, such an evaluation process inevitably favors the algorithms that suggest products similar to those suggested by the online recommendation algorithm • We should carefully select the test dataset to perform a fairer evaluation
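One way such a selection could look, sketched under assumptions: when building the test set, discard clicks on products that the online recommendation algorithm had itself displayed to that user, so that offline evaluation does not automatically favor algorithms that mimic the online one. This is an illustrative filter, not the exact procedure from the slides.

def build_less_biased_test_set(test_clicks, shown_by_online_rec):
    """Drop test clicks that the online recommender itself triggered.

    test_clicks: list of (user, clicked_product) pairs (an assumed schema).
    shown_by_online_rec: dict mapping a user to the set of products the online
                         recommender displayed to that user in the test period.
    """
    return [(user, product) for user, product in test_clicks
            if product not in shown_by_online_rec.get(user, set())]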
  14. Issue 3: click-through rates are a mediocre proxy for recommendation revenue
  15. CTR vs. recommendation revenue [scatter plot of recommendation revenue against CTR] • Based on ~1 year of logs • The coefficient of determination is only 0.089 § A weak positive relationship
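For reference, a coefficient of determination like the one quoted above can be computed from two aligned series (e.g., one CTR value and one revenue value per day); for a simple linear fit it equals the squared Pearson correlation. The 0.089 figure comes from the authors’ logs, not from this sketch.

import numpy as np

def coefficient_of_determination(ctr, revenue):
    """R^2 of a simple linear fit of revenue on CTR.

    ctr, revenue: 1-D numeric sequences of equal length (e.g., daily values).
    For simple linear regression, R^2 equals the squared Pearson correlation.
    """
    ctr = np.asarray(ctr, dtype=float)
    revenue = np.asarray(revenue, dtype=float)
    r = np.corrcoef(ctr, revenue)[0, 1]
    return r ** 2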
  16. Lessons learned • Comparing recommendation algorithms based on user-centric metrics (e.g., CTR) may fail to capture the business owner’s satisfaction (e.g., revenue) • Unfortunately, studies on recommender systems mostly perform comparisons based on user-centric metrics • Even if a recommendation algorithm attracts many clicks, we cannot be sure that it will bring a large amount of revenue to the website
  17. Issue 4: evaluating recommendation revenue is not straightforward
  18. Comparing the number of purchases [two time series from 1/25 to 2/18: total orders and orders through the recommendation module] Green line: the channel with a recommendation panel; Blue line: the channel without a recommendation panel
  19. Lessons learned • Although a recommendation module may help users discover their needs, these users, even without the recommendations, may still be able to locate the desired products through other processes • It is not clear whether a recommendation module brings extra purchases or simply re-directs users from other purchasing processes to the recommendation module • A/B testing might be necessary
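A sketch of the A/B test suggested above: compare purchase (conversion) rates between a channel with the recommendation panel and one without, using a pooled two-proportion z-test. The counts in the usage note are placeholders; the slides show only the two order curves, not a specific test.

from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(purchases_a, visitors_a, purchases_b, visitors_b):
    """Two-sided z-test for a difference in purchase rates between two channels.

    Returns (z, p_value). Under a proper randomized split, a small p-value
    suggests the recommendation panel changes the purchase rate rather than
    merely re-directing purchases that would have happened anyway.
    """
    p_a = purchases_a / visitors_a
    p_b = purchases_b / visitors_b
    p_pool = (purchases_a + purchases_b) / (visitors_a + visitors_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Usage with made-up numbers:
# z, p = two_proportion_z_test(purchases_a=320, visitors_a=10000,
#                              purchases_b=280, visitors_b=10000)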
  20. Discussion • We discussed 4 pitfalls in training and evaluating recommender systems • The first two issues are due to the biased data collection of the training and test datasets • The third issue concerns the proper selection of evaluation metrics • The fourth issue concerns extra purchases vs. re-directed purchases attributable to the recommender system
  21. Questions? • Hung-Hsuan Chen • https://www.ncu.edu.tw/~hhchen/