User Engagement as Evaluation: a Ranking or a 
Regression Problem? 
Frederic Guillou, Romaric Gaudel, Jeremie Mary and Philippe Preux
Recsys - 10/10/2014 
F. Guillou, R. Gaudel; J. Mary  P. Preux A Ranking or a Regression Problem? October 10, 2014 1 / 17
Plan 
1 Introduction 
2 Recsys Challenge 
Description 
User Engagement as Evaluation 
The Retweet Effect
3 Method 
LambdaMART Model 
Regression Approach 
4 Experiments 
Experimental Results 
Relevant Features 
5 Conclusion 
Introduction 
Standard approaches for recommender systems
Try to predict ratings or interests by minimizing prediction error
Such approaches have flaws: good predictions do not necessarily mean good
recommendations
Learning to Rank (LTR) 
Use a set of query-item pairs described by a set of input features and 
a score 
Try to predict the most accurate ranking of items for a query 
Recsys Challenge / Description 
A specific recommendation task
Find items with the highest user engagement 
Rank a set of tweets according to their success (retweet, favorite) 
Use a regression or an LTR method?
Data
Tweets from the MovieTweetings dataset: tweets representing ratings on the
IMDb app
Input features are metadata about the user, the tweet and the movie 
Recsys Challenge / User Engagement as Evaluation 
The overall evaluation: average of the NDCG@10 calculated for each 
user 
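A minimal Python sketch of this evaluation, assuming an exponential gain (2^e - 1) and a log2 position discount; the slides do not say which NDCG variant the challenge used. The function names, the (user, score, engagement) row layout and the value assigned to users without any engagement (`empty_value`) are illustrative assumptions, not part of the original deck.

```python
import math
from collections import defaultdict


def dcg_at_k(gains, k=10):
    """DCG of engagement values that are already in ranked order."""
    # Assumed variant: gain 2^g - 1, discount 1 / log2(position + 1).
    return sum((2 ** g - 1) / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains, k=10, empty_value=0.0):
    """NDCG@k for one user; empty_value is returned when no tweet has engagement."""
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    if ideal == 0:
        return empty_value  # assumption: the slides do not give this convention
    return dcg_at_k(ranked_gains, k) / ideal


def mean_ndcg_per_user(rows, k=10, empty_value=0.0):
    """rows: iterable of (user_id, predicted_score, true_engagement)."""
    by_user = defaultdict(list)
    for user, score, engagement in rows:
        by_user[user].append((score, engagement))
    total = 0.0
    for items in by_user.values():
        # Rank each user's tweets by predicted score, best first.
        ranked = [e for _, e in sorted(items, key=lambda t: t[0], reverse=True)]
        total += ndcg_at_k(ranked, k, empty_value)
    return total / len(by_user)
```

Because the score is averaged over users, the convention chosen for users with no engagement at all can shift the overall value considerably on a dataset where most users fall in that group.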
Metric Value 
Tweets 170,285 
Unique users 22,079 
Unique items 13,618 
Tweets with 0 engagement 162,107 (95.2%) 
Unsuccessful users 17,502 (79.27%) 
Figure: Distribution of user engagement for successful tweets (bins: 1, 2, 3, 4, 5-14, 15-51)
Almost a binary classification problem?
Low number of successful tweets: most users only have tweets with
zero engagement
Even among successful tweets, the engagement is mostly 1
Recsys Challenge / User Engagement as Evaluation 
Figure: Distribution of the number of tweets per user (bins: 1, 2, 3-10, 11-20, more than 20)
Some cases where any ranking is equivalent: 
The user doesn't have any successful tweets 
The user has only one tweet or tweets with the exact same engagement 
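The cases above can be detected with a single test: a user only matters for the ranking metric if their tweets have at least two distinct engagement values. The sketch below is an illustration under that reading; the function name and the (user, engagement) row layout are assumptions.

```python
from collections import defaultdict


def informative_users(rows):
    """Return the users for whom some rankings score better than others.

    rows: iterable of (user_id, engagement).
    A user with a single tweet, with only zero-engagement tweets, or with
    tweets that all share the same engagement has one distinct value only,
    so every ordering of their tweets gets the same NDCG.
    """
    by_user = defaultdict(set)
    for user, engagement in rows:
        by_user[user].add(engagement)
    return {user for user, values in by_user.items() if len(values) > 1}
```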
Recsys Challenge / The Retweet Effect
Some of the tweets are retweets, not original tweets 
This has strong consequences: 
All these tweets have a strictly positive engagement 
The retweet count of the tweet is directly accessible through the 
original tweet metadata 
Metric Value 
Number of retweets 1,808 
Percentage among successful tweets 22.10%
Artificial successful users 28.44%
After removal of the retweet effect and cleaning users, 2,792 users
provide a clean ranking
Method / LambdaMART 
A listwise approach 
Consider the whole list of items as a single learning instance
Try directly to optimize a performance measure 
A combination of two previous algorithms 
MART: a pointwise approach based on a boosted tree model
Output is a linear combination of the outputs of a set of regression trees
Can be viewed as performing gradient descent in a function space using 
regression trees 
LambdaRank: a method based on neural networks 
Express gradients based on the ranks of documents 
Lambda terms: rules defining how to change the ranks of items in
order to optimize a performance measure
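To make the lambda terms concrete, here is a small Python sketch of the commonly described LambdaRank formulation: for each pair where one item is more relevant than the other, a RankNet-style pairwise weight is scaled by the NDCG change that swapping the two items would cause. This is an illustration, not the authors' implementation; the function name, the `sigma` parameter, the exponential gain and the use of untruncated NDCG are assumptions.

```python
import math


def lambda_terms(scores, relevances, sigma=1.0):
    """Per-item lambda terms for one query (here: one user's tweets).

    A positive value means the item should move up, a negative one down.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    position = {item: pos for pos, item in enumerate(order)}

    def discount(pos):
        return 1.0 / math.log2(pos + 2)

    def gain(rel):
        return 2 ** rel - 1

    ideal = sum(gain(r) * discount(p)
                for p, r in enumerate(sorted(relevances, reverse=True)))
    lambdas = [0.0] * n
    if ideal == 0:
        return lambdas  # no relevant item: nothing to push
    for i in range(n):
        for j in range(n):
            if relevances[i] <= relevances[j]:
                continue  # only pairs where i should be ranked above j
            # |change in NDCG| if items i and j swapped positions
            delta = abs((gain(relevances[i]) - gain(relevances[j]))
                        * (discount(position[i]) - discount(position[j]))) / ideal
            # pairwise weight: large when i is currently scored below j
            rho = sigma / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lambdas[i] += rho * delta
            lambdas[j] -= rho * delta
    return lambdas
```

In LambdaMART these per-item values play the role of the gradients that each new regression tree is fitted to.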
Method / LambdaMART 
Build the model 
Train on a modified dataset with only useful users
Very small dataset, with most users having only a few successful tweets
Lambda-MART score: 0.838
Lambda-MART cannot capture exactly the information about the
retweet effect
Method / Regression methods 
The retweet effect can be taken into account through three steps:
1 Remove the retweet count of the original tweet from the features
2 Modify the user engagement by subtracting the retweet count of the
original tweet
3 Add the retweet count of the original tweet to the result returned by
the learned model
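The three steps can be sketched as a wrapper around any regressor exposing `fit(X, y)` and `predict(X)`. This is a hedged illustration rather than the authors' code: the class names, the list-of-lists feature layout, the `retweet_index` parameter and the assumption that the original tweet's retweet count is 0 for non-retweets are all hypothetical. `MeanRegressor` is only a toy stand-in so that the example runs without external libraries.

```python
class MeanRegressor:
    """Toy stand-in for a real regressor: always predicts the training mean."""

    def fit(self, X, y):
        self.n_features_ = len(X[0])
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class RetweetWrapper:
    """Wrap a regressor so that it only has to learn the residual engagement."""

    def __init__(self, model, retweet_index):
        self.model = model
        self.retweet_index = retweet_index  # column holding the original tweet's retweet count

    def _split(self, row):
        row = list(row)  # copy, so the caller's data is not modified
        count = row.pop(self.retweet_index)
        return row, count

    def fit(self, X, y):
        features, targets = [], []
        for row, engagement in zip(X, y):
            rest, count = self._split(row)       # step 1: drop the retweet count feature
            features.append(rest)
            targets.append(engagement - count)   # step 2: subtract it from the target
        self.model.fit(features, targets)
        return self

    def predict(self, X):
        features, counts = [], []
        for row in X:
            rest, count = self._split(row)
            features.append(rest)
            counts.append(count)
        residuals = self.model.predict(features)
        return [r + c for r, c in zip(residuals, counts)]  # step 3: add it back
```

With this split, the inner model never sees the retweet count, and the known part of the engagement is restored exactly at prediction time.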
Pointwise approaches: build a regression model and try to predict
the score of items
Use of Linear Regression and Random Forests 
Lin / WrapLin : 0.806 / 0.843 
RF / WrapRF : 0.823 / 0.858 
Experiments / Results 
Table: NDCG@10 on the test dataset
Model                              NDCG@10
Retweet                            0.806
λ-MART                             0.838
RF / WrapRF                        0.823 / 0.858
Lin / WrapLin                      0.806 / 0.843
Mean(λ-MART, Retweet)              0.876
Mean(λ-MART, WrapRF, WrapLin)      0.874
OptAvg(λ-MART, WrapRF, WrapLin)    0.876
OptAvg(λ-MART, RF, Retweet)        0.878
The improvement through wrapping shows the importance of the
retweet effect
All the best NDCG scores use the LTR method LambdaMART
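The slides do not define Mean and OptAvg precisely. The sketch below assumes that Mean is an unweighted average of per-model scores and that OptAvg is a weighted average whose weights are tuned on held-out data. The min-max normalisation, the grid search, the function names and the `metric` callable are all illustrative assumptions.

```python
import itertools


def min_max(scores):
    """Rescale one model's scores to [0, 1] so that models are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def blend(model_scores, weights):
    """Weighted sum of normalised scores; equal weights give the plain mean."""
    normalised = [min_max(scores) for scores in model_scores]
    n_items = len(model_scores[0])
    return [sum(w * col[i] for w, col in zip(weights, normalised))
            for i in range(n_items)]


def best_weights(model_scores, metric, steps=10):
    """Grid-search weights summing to 1 that maximise metric(blended_scores)."""
    grid = [k / steps for k in range(steps + 1)]
    best = None
    for weights in itertools.product(grid, repeat=len(model_scores)):
        if abs(sum(weights) - 1.0) > 1e-9:
            continue
        value = metric(blend(model_scores, weights))
        if best is None or value > best[0]:
            best = (value, weights)
    return best[1], best[0]
```

In practice `metric` would be the per-user NDCG@10 computed on a validation split, so that the weights are not tuned on the test data.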
Experiments / Relevant Features with Random Forests 
Figure: Random Forest feature importances (horizontal axis from 0.0 to 0.6), three panels, features in the order shown on the slide
Panel 1 (all features): retweeted_fav, retweeted_retweet, listed_count, followers_count, retweeted_status, lang_ar, rating, votes, release_date, created_at
Panel 2 (without retweet features): followers_count, listed_count, favourites_count, lang_ar, rating, media, friends_count, user_mentions, time_tweet_scrap, created_at
Panel 3 (without user features): rating, lang_ar, time_tweet_movierelease, release_date, media, user_mentions, budget, time_tweet_scrap, lang_fa, votes
All features: features of the original tweet contribute the most 
Without retweet features: user features are the important ones
Without user features: the rating given to the movie by the user 
contributes the most 

Conclusion
Recommendation problems are Learning to Rank problems
The best approach builds upon an LTR strategy
We can combine LambdaMART and Random Forests through a linear combination
Scarcity of data makes the robustness of the model difficult to prove
The success of Twitter and IMDb should make data collection easier, so that
bigger datasets can be built and analysis and experiments can continue
Thank you for your attention
Questions?
Discussions
Temporal aspects
The difference in time interval between the train and test sets influences
prediction or evaluation
Possibly a large impact of time on some features:
A user has more chances to gather followers over a longer period of time
Some movies are released at a specific time of the year
A sequential approach?
The evolution of a user or the popularity of a movie can be seen as
sequential problems
NDCG does not capture this information ⇒ see metrics such as regret
Thank you for your attention
Now is the real question time