Get on with it!
Recommender system industry
challenges move towards real-world,
online evaluation
Padova – March 24th, 2016
Andreas Lommatzsch - TU Berlin, Berlin, Germany
Jonas Seiler - plista, Berlin, Germany
Daniel Kohlsdorf - XING, Hamburg, Germany
CrowdRec - www.crowdrec.eu
Idomaar - http://rf.crowdrec.eu
• Andreas
Andreas Lommatzsch
Andreas.Lommatzsch@tu-berlin.de
http://www.dai-lab.de
• Jonas
Jonas Seiler
Jonas.Seiler@plista.com
http://www.plista.com
• Daniel
Daniel Kohlsdorf
Daniel.Kohlsdorf@xing.com
http://www.xing.com
Where are recommender
system challenges headed?
Direction 1:
Use info beyond the
user-item matrix.
Direction 2:
Online evaluation +
multiple metrics.
Moving towards real-world evaluation
Flickr credit: rodneycampbell
Why evaluate?
• Evaluation is crucial for the success of real-life systems
• How should we evaluate?
Precision and
Recall
Technical
complexity
Influence
on sales
Required hardware
resources
Business
models
Scalability
Diversity of the
presented results
User
satisfaction
Evaluation Settings
• A static collection of documents
• A set of queries
• A list of relevant documents defined by
experts for each query
Traditional Evaluation in IR
“The Cranfield paradigm”
Advantages
• Reproducible setting
• All researchers have exactly the same
information
• Optimized for measuring precision
Traditional Evaluation in IR
Weaknesses of traditional IR evaluation
• High costs for creating dataset
• Datasets are not up-to-date
• Domain-specific documents
• The expert-defined ground truth does not
consider individual user preferences
• Individual user preferences
• Context-awareness is not considered
• Technical aspects are ignored
Context is
everything
Industry and recsys challenges
• Challenges benefit both industry and academic research.
• We look at how industry challenges have evolved since
the Netflix Prize was awarded in 2009.
Traditional Evaluation in RecSys
Evaluation Settings
• Rating prediction on user-item matrices
• Large, sparse dataset
• Predict personalized ratings
• Cross-validation, RMSE
Advantages
• Reproducible setting
• Personalization
• Dataset is based on
real user ratings
“The Netflix paradigm”
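The rating-prediction setting above (cross-validation scored with RMSE) can be sketched as follows. This is a minimal illustration, not the Netflix Prize protocol: the `(user, item, rating)` tuple layout, the fold assignment, and the global-mean baseline predictor are all assumptions made for the example.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_rmse(ratings, k=5):
    """Average RMSE of a global-mean baseline over k folds.

    `ratings` is a list of (user, item, rating) tuples (hypothetical layout).
    Every k-th rating goes into the test fold; the rest is training data.
    """
    scores = []
    for fold in range(k):
        test = ratings[fold::k]
        train = [r for i, r in enumerate(ratings) if i % k != fold]
        # Baseline assumption: predict the mean training rating for everything.
        mean = sum(r[2] for r in train) / len(train)
        scores.append(rmse([mean] * len(test), [r[2] for r in test]))
    return sum(scores) / k
```

A real recommender would replace the global-mean baseline with a personalized predictor; the fold loop and the metric stay the same.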
Traditional Evaluation in RecSys
Weaknesses of traditional Recommender evaluation
• Static data
• Only one type of data - only user ratings
• User ratings are noisy
• Temporal aspects tend to be ignored
• Context-awareness is not considered
• Technical aspects are ignored
Challenges of Developing Applications
Challenges
• Data streams - continuous changes
• Big data
• Combine knowledge from different sources
• Context-Awareness
• Users expect personally relevant results
• Heterogeneous devices
• Technical complexity, real-time requirements
How to address these challenges in the Evaluation?
• Realistic evaluation setting
• Heterogeneous data sources
• Streams
• Dynamic user feedback
• Appropriate metrics
• Precision and User satisfaction
• Technical complexity
• Sales and Business models
• Online and Offline Evaluation
How to Set Up a Better Evaluation?
Approaches for a better Evaluation
• News recommendations
@ plista
• Job recommendations
@ XING
The plista Recommendation Scenario
Setting
● 250 ms response time
● 350 million AI/day
● In 10 Countries
Challenges
● News change
continuously
● Users do not log in
explicitly
● Seasonality,
context-dependent user
preferences
Offline
• Cross-validation
• Metric Optimization Engine
(https://github.com/Yelp/MOE)
• Integration into Spark
• How well does it correlate with
Online Evaluation?
• Time Complexity
Evaluation @ plista
Online
• A/B tests
• Limited by:
  • Caching memory
  • Computational resources
• MOE*
Offline
• Mean and variance estimation of parameter space with
Gaussian Process
• Evaluate the parameter setting with the highest Expected Improvement (EI),
Upper Confidence Interval ….
• REST API
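The Expected Improvement criterion named above can be written out in a few lines. The sketch below uses the standard closed form for a Gaussian posterior and plain Python only; it does not call MOE, and the function names and the candidate dictionary are hypothetical. It assumes the metric is to be maximized and that a Gaussian Process has already produced a posterior mean and standard deviation for each candidate.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected Improvement for maximization under a posterior N(mean, std^2)."""
    if std <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


def pick_next(candidates, best_so_far):
    """Return the candidate name with the highest EI.

    `candidates` maps a name to (posterior mean, posterior std); the values
    would come from a Gaussian Process fit, which is not shown here.
    """
    return max(
        candidates,
        key=lambda name: expected_improvement(*candidates[name], best_so_far),
    )
```

With a current best of 0.10, a candidate with a slightly lower mean but non-zero uncertainty can still be preferred over a certain candidate at 0.10, which is the exploration behaviour EI is meant to provide.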
Evaluation using MOE
Online
• A/B Tests are expensive
• Model non-stationarity
• Integrate out non-stationarity
to get mean EI
Evaluation using MOE
Provide an API enabling researchers to test their own ideas
• The CLEF-NewsREEL challenge
• A Challenge in CLEF (Conferences and Labs of the Evaluation Forum)
• 2 Tasks: Online and Offline Evaluation
The CLEF-NewsREEL challenge
How does the challenge work?
• Live streams consisting of impressions, requests, and
clicks; 5 publishers; approx. 6 million messages per day
• Technical requirements: 100 ms per request
• Live evaluation
based on CTR
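A CTR-based ranking of participants could be computed along these lines. This is a sketch under assumptions: the `(team, kind)` event layout is hypothetical, and CTR is taken here as clicks divided by answered recommendation requests, which the slide does not spell out.

```python
from collections import Counter


def ctr_per_team(events):
    """Click-through rate per team.

    `events` is an iterable of (team, kind) pairs, where kind is "request"
    (a recommendation request the team answered) or "click" (a user clicked
    one of that team's recommendations). Other kinds are ignored.
    """
    requests, clicks = Counter(), Counter()
    for team, kind in events:
        if kind == "request":
            requests[team] += 1
        elif kind == "click":
            clicks[team] += 1
    # Teams without any answered request have no defined CTR and are skipped.
    return {team: clicks[team] / n for team, n in requests.items() if n > 0}
```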
CLEF-NewsREEL
Online Task
Online vs. Offline Evaluation
• Technical aspects can be evaluated without user feedback
• Analyze the required resources and the response time
• Simulate the online evaluation by replaying a recorded
stream
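Replaying a recorded stream to check response times can be sketched as below. It is an in-process simplification, not the Idomaar framework: the `recommend` callable and the message format are hypothetical, the original timing between messages is not reproduced, and the 100 ms default mirrors the limit stated for the online task.

```python
import time


def replay(messages, recommend, limit_ms=100.0):
    """Feed recorded messages to `recommend` and time every call.

    Returns the number of requests, the slowest response in milliseconds,
    and how many responses exceeded `limit_ms`.
    """
    latencies = []
    for message in messages:
        start = time.perf_counter()
        recommend(message)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return {
        "requests": len(latencies),
        "max_ms": max(latencies, default=0.0),
        "too_slow": sum(1 for ms in latencies if ms > limit_ms),
    }
```

Because no user feedback is needed, the same recorded messages can be replayed against different algorithms or hardware setups and the resulting statistics compared directly.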
CLEF-NewsREEL
Offline Task
Challenge
• Realistic simulation of streams
• Reproducible setup of computing environments
Solution
• A framework simplifying
the setup of the evaluation
environment
• The Idomaar framework
developed in the CrowdRec project
CLEF-NewsREEL
Offline Task
http://rf.crowdrec.eu
More Information
• SIGIR forum Dec 2015 (Vol 49, #2)
http://sigir.org/files/forum/2015D/p129.pdf
Evaluate your algorithm online and offline in NewsREEL
• Register for the challenge!
http://crowdrec.eu/2015/11/clef-newsreel-2016/
(register until 22nd of April)
• Tutorials and Templates are provided at orp.plista.com
CLEF-NewsREEL
https://recsys.xing.com/
XING - RecSys Challenge
Job Recommendations @ XING
XING - Evaluation based on interaction
● On XING, users can give feedback on recommendations.
● The volume of user feedback is far lower than that of implicit measures.
● A/B Tests focus on clickthrough rate.
XING - RecSys Challenge, Scoring,
Space on Page
● Predict 30 items for each user.
● Score: weighted combination of the
precision
○ precisionAt(2)
○ precisionAt(4)
○ precisionAt(6)
○ precisionAt(20)
Top 6
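A weighted combination of precision at several cut-offs can be sketched as follows. The cut-offs 2, 4, 6 and 20 are taken from the slide; the weights are hypothetical placeholders, since the slide does not state them, and the function names are made up for this example.

```python
def precision_at(k, recommended, relevant):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k


# Hypothetical equal weights; the real challenge weights are not given here.
WEIGHTS = {2: 1.0, 4: 1.0, 6: 1.0, 20: 1.0}


def score(recommended, relevant, weights=WEIGHTS):
    """Weighted sum of precision@k over the configured cut-offs."""
    return sum(
        weight * precision_at(k, recommended, relevant)
        for k, weight in weights.items()
    )
```

Small cut-offs reward getting the first few visible slots right, which is why the position of an item in the 30-item list matters and not only its presence.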
XING - RecSys Challenge, User Data
• User ID
• Job Title
• Educational Degree
• Field of Study
• Location
XING - RecSys Challenge, User Data
• Number of past jobs
• Years of Experience
• Current career level
• Current discipline
• Current industry
XING - RecSys Challenge, Item Data
• Job title
• Desired career level
• Desired discipline
• Desired industry
XING - RecSys Challenge, Interaction Data
• Timestamp
• User
• Job
• Type:
• Deletion
• Click
• Bookmark
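One way to represent an interaction record with the fields listed above is shown below. The field names, the integer IDs, the Unix-seconds timestamp, and the choice to treat deletions as negative signals are assumptions for illustration; the actual dataset schema is not given on the slide.

```python
from dataclasses import dataclass
from enum import Enum


class InteractionType(Enum):
    DELETION = "deletion"
    CLICK = "click"
    BOOKMARK = "bookmark"


@dataclass(frozen=True)
class Interaction:
    timestamp: int  # assumed to be Unix seconds
    user_id: int    # anonymized user identifier (assumed integer)
    job_id: int     # anonymized job posting identifier (assumed integer)
    kind: InteractionType


def positives(interactions):
    """Keep clicks and bookmarks; deletions are assumed to signal disinterest."""
    return [i for i in interactions if i.kind is not InteractionType.DELETION]
```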
XING - RecSys Challenge, Anonymization
XING - RecSys Challenge, Future
• Live Challenge
• Users submit predicted future interactions
• The solution is recommended on the platform
• Participants get points for actual user clicks
[Diagram steps: Release to Challenge, Work On Predictions, Collect Clicks, Score]
How to set up a better Evaluation
• Consider different quality criteria
(prediction, technical, business models)
• Aggregate heterogeneous information sources
• Consider user feedback
• Use online and offline analyses
to understand users and their
requirements
Concluding ...
Participate in challenges based on real-life scenarios
• NewsREEL challenge
• RecSys 2016 challenge
=> Organize a challenge. Focus on real-life data.
More Information
• http://www.crowdrec.eu
• (http://www.clef-newsreel.org)
• http://orp.plista.com
• http://2016.recsyschallenge.com
• http://www.xing.com
Thank You
Questions?

ECIR Recommendation Challenges
