The plista Dataset

ACM RecSys 2013, Hong Kong

Authors: Benjamin Kille, Frank Hopfgartner, Torben Brodt, Tobias Heintz
Speaker: Torben Brodt
International News Recommender Systems Workshop and Challenge
October 13th, 2013
Introduction and Motivation
● Context: News Article Recommendation
Introduction and Motivation
● Do we need another recommendation data set?
we have ...
● What features are those data sets missing?
● What requirements does news article recommendation entail?
Introduction and Motivation
● Features that had not been available in existing data sets:
○ contextual features: device, operating system, browser, etc.
○ cross-domain features: 13 different news providers included
○ different interaction types: interactions with recommendations (clicks) as well as with news items (impressions)
○ content features: headline, URL, images, text snippets, etc.
Introduction and Motivation
● Additional requirements for recommending news articles:
○ real-time → recommendations must be provided within a short time interval (< 200 ms)
○ changing relevancy → items' relevancy decreases with time
○ dynamics → new news items are continuously being added
● Requirements inherent to existing recommender systems:
○ sparsity → users typically read only a few news articles
○ cold start → systems refrain from asking users to create profiles; as a result, most user profiles are small
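The "changing relevancy" requirement can be illustrated with a recency-weighted popularity score. This is only a sketch under stated assumptions: exponential decay is one common choice and is not taken from the slides, and the names `decayed_score`, `rank_items`, and the 6-hour half-life are hypothetical.

```python
def decayed_score(click_count, age_hours, half_life_hours=6.0):
    """Popularity discounted by age: the score halves every half_life_hours.

    The half-life value is an arbitrary assumption for illustration.
    """
    return click_count * 0.5 ** (age_hours / half_life_hours)


def rank_items(items, now_hours):
    """Sort (item_id, clicks, created_at_hours) tuples by decayed score, best first."""
    scored = [
        (decayed_score(clicks, now_hours - created_at), item_id)
        for item_id, clicks, created_at in items
    ]
    return [item_id for _, item_id in sorted(scored, reverse=True)]


# Hypothetical articles: an old, popular one and a fresh, less-clicked one.
articles = [("old", 100, 0.0), ("new", 40, 22.0)]
ranking = rank_items(articles, now_hours=24.0)
print(ranking)
```

With these made-up numbers the fresh article outranks the older one even though it has fewer clicks, which is the behaviour the requirement asks for.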
Dataset characteristics
{ // json
  "type": "impression",
  "context": {
    "simple": {
      "27": 418, // publisher
      "14": 31721, // widget
      ...
    },
    "lists": {
      "10": [100, 101] // channel
    }
    ...
  }
}
api specs hosted at http://orp.plista.com
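A record of this shape can be read with a few lines of Python. The sketch below is illustrative only: the slide's `//` comments are stripped because standard JSON does not allow them, the elided `...` entries are left out, and the lookup table `CONTEXT_NAMES` is hypothetical, holding just the three IDs annotated on the slide.

```python
import json

# Hypothetical lookup table: only the three IDs annotated on the slide.
CONTEXT_NAMES = {"27": "publisher", "14": "widget", "10": "channel"}

# The slide's example with comments and elided fields removed.
RAW = """
{
  "type": "impression",
  "context": {
    "simple": {"27": 418, "14": 31721},
    "lists": {"10": [100, 101]}
  }
}
"""


def named_context(record):
    """Merge 'simple' and 'lists' context into one dict keyed by readable names.

    IDs missing from CONTEXT_NAMES keep their numeric string key.
    """
    context = record.get("context", {})
    merged = {}
    for section in ("simple", "lists"):
        for key, value in context.get(section, {}).items():
            merged[CONTEXT_NAMES.get(key, key)] = value
    return merged


record = json.loads(RAW)
result = named_context(record)
print(record["type"], result)
```

The full attribute list is defined in the API specs referenced on the slide; the table here would need to be extended from that source.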
Dataset characteristics
● object types
○ impressions → users reading news articles
○ clicks → users clicking recommendations
○ creates → news articles being created
○ updates → news articles being updated

api specs hosted at http://orp.plista.com
Dataset usage
● Evaluation based on Click-Through-Rate (CTR)
● ~ 84 million impressions
● ~ 1 million clicks
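As a rough worked example, CTR is conventionally the number of clicks divided by the number of impressions. The sketch below applies that definition to the rounded totals on the slide; the function name is hypothetical, and the slide does not say whether every impression actually displayed recommendations, so the figure is only indicative.

```python
def click_through_rate(clicks, impressions):
    """Return clicks / impressions, or 0.0 when there are no impressions."""
    if impressions <= 0:
        return 0.0
    return clicks / impressions


# Rounded totals from the slide: ~1 million clicks, ~84 million impressions.
overall_ctr = click_through_rate(1_000_000, 84_000_000)
print(f"{overall_ctr:.2%}")  # prints 1.19%
```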
Dataset usage
● evaluating cross-portal news recommenders
● 10-36 % user overlap between different news portals
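The slides do not define how user overlap is measured. One plausible reading, sketched here purely as an assumption, is the share of the smaller portal's users who also appear on the other portal. The function name and the user IDs are made up for illustration.

```python
def user_overlap(users_a, users_b):
    """Share of the smaller portal's users who also appear on the other portal.

    This definition is an assumption; the slides only report the resulting range.
    """
    a, b = set(users_a), set(users_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical user IDs observed on two portals.
portal_a = {"u1", "u2", "u3", "u4"}
portal_b = {"u3", "u4", "u5", "u6", "u7"}
overlap = user_overlap(portal_a, portal_b)
print(overlap)  # 0.5
```

Other definitions, such as Jaccard similarity, would yield different numbers from the same data, so the definition should be fixed before comparing against the reported range.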
Dataset usage
● news portal comparisons
● do we observe similar user behaviour on news portals offering similar content?
Dataset usage
● evaluating contextual recommendation algorithms
● sensitive to
○ weekday
○ hour of day
○ ...
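Sensitivity to weekday and hour of day can be explored by bucketing events on those two features and computing CTR per bucket. The sketch below assumes each event carries a Unix timestamp in seconds and is interpreted in UTC; the event tuples and the function name are hypothetical and do not reflect the data set's actual field layout.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical events: (type, Unix timestamp in seconds).
EVENTS = [
    ("impression", 1381651200),  # 2013-10-13 08:00:00 UTC, a Sunday
    ("impression", 1381652000),
    ("click", 1381652100),
    ("impression", 1381737600),  # 2013-10-14 08:00:00 UTC, a Monday
]


def ctr_by_context(events):
    """Group events by (weekday, hour) in UTC and compute CTR per bucket.

    Weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    Buckets without impressions are skipped.
    """
    counts = defaultdict(lambda: {"impression": 0, "click": 0})
    for kind, ts in events:
        if kind not in ("impression", "click"):
            continue
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[(moment.weekday(), moment.hour)][kind] += 1
    return {
        bucket: c["click"] / c["impression"]
        for bucket, c in counts.items()
        if c["impression"] > 0
    }


ctr_buckets = ctr_by_context(EVENTS)
print(ctr_buckets)
```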
Dataset usage
When using the data set you may consider…
● … we identify users by session IDs
○ individual users may have several IDs
○ users sharing their device might be mapped to one ID
● … interactions (clicks, impressions) and content dynamics (creates, updates) differ between news portals
● … contents are restricted to German
● … preferences are represented on a binary scale (user read article, user clicked recommendation)
● … clicking on recommendations might not reveal the actual relevancy of an item
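The binary preference scale keyed by session ID can be sketched as a mapping from each session to the set of items it interacted with. The flattened tuple layout and the function name below are assumptions made for illustration; they are not the data set's actual record format.

```python
from collections import defaultdict

# Hypothetical flattened events: (session_id, item_id, event_type).
INTERACTIONS = [
    ("s1", "a10", "impression"),
    ("s1", "a11", "click"),
    ("s2", "a10", "impression"),
    ("s1", "a10", "impression"),  # repeat read, no extra information on a binary scale
]


def binary_preferences(interactions):
    """Map each session ID to the set of items it read or clicked.

    Repeated interactions collapse into one entry, mirroring the binary scale.
    """
    prefs = defaultdict(set)
    for session_id, item_id, kind in interactions:
        if kind in ("impression", "click"):
            prefs[session_id].add(item_id)
    return dict(prefs)


preferences = binary_preferences(INTERACTIONS)
print(preferences)
```

Because one person may appear under several session IDs and a shared device may merge several people into one, these sets approximate user profiles rather than represent them exactly.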
Conclusions
● we introduced a new data set intended to support recommender systems research
● we outlined novel features which existing data sets lacked
● we presented scenarios which can be evaluated using the data set
● we pointed to critical aspects which ought to be considered when working with the data set
Summary
● news articles
○ of ~13 publishers

● transactional data
○ Impressions
○ Clicks

● contextual data
○ of ~50 attributes

● cross domain application
The plista Dataset
@inproceedings{Kille:2013,
  title = {The plista Dataset},
  author = {Kille, Benjamin and Hopfgartner, Frank and Brodt, Torben and Heintz, Tobias},
  booktitle = {NRS'13: Proceedings of the International Workshop and Challenge on News Recommender Systems},
  year = {2013},
  month = {10},
  location = {Hong Kong, China},
  publisher = {ACM},
  pages = {14--21}
}
