User Engagement as Evaluation: a Ranking or a Regression Problem?

Frederic Guillou, Romaric Gaudel, Jeremie Mary and Philippe Preux
Recsys - 10/10/2014

Plan
1 Introduction
2 Recsys Challenge
Description
User Engagement as Evaluation
The Retweet Effect
3 Method
LambdaMART Model
Regression Approach
4 Experiments
Experimental Results
Relevant Features
5 Conclusion

Introduction
Standard approaches for recommender systems
Try to predict ratings or interests by minimizing prediction error
Such approaches have flaws: good predictions do not mean good
recommendations
Learning to Rank (LTR)
Use a set of query-item pairs described by a set of input features and
a score
Try to predict the most accurate ranking of items for a query

Recsys Challenge / Description
A specific recommendation task
Find items with the highest user engagement
Rank a set of tweets according to their success (retweet, favorite)
Use a regression or an LTR method?
Data
Tweets from the MovieTweetings dataset: tweets representing ratings
given in the IMDb app
Input features are metadata about the user, the tweet and the movie

Recsys Challenge / User Engagement as Evaluation
The overall evaluation: average of the NDCG@10 calculated for each
user
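The evaluation described above can be sketched as follows. This is a minimal illustration, not the challenge's official scorer: the linear gain (engagement used directly as gain; 2^g - 1 is another common choice), the log2 discount, and the value given to users whose tweets all have zero engagement (`empty_value`) are assumptions, and all function names are hypothetical.

```python
import math


def dcg_at_k(gains, k=10):
    """DCG of gains listed in ranked order (assumed linear gain, log2 discount)."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains, k=10, empty_value=0.0):
    """NDCG@k for one user; ranked_gains are engagements in predicted order."""
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    if ideal == 0:
        # No successful tweet: NDCG is undefined, so return an assumed constant.
        return empty_value
    return dcg_at_k(ranked_gains, k) / ideal


def mean_ndcg(per_user_rankings, k=10, empty_value=0.0):
    """Average of the per-user NDCG@k values."""
    scores = [ndcg_at_k(g, k, empty_value) for g in per_user_rankings.values()]
    return sum(scores) / len(scores)


# Hypothetical toy data: engagement of each user's tweets, in predicted order.
toy = {"user_a": [3, 1, 0], "user_b": [0, 2], "user_c": [0, 0]}
overall = mean_ndcg(toy)
```

Whatever constant is chosen for users without successful tweets, it shifts every model's score by the same amount, so it does not change the comparison between models.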
Metric                      Value
Tweets                      170,285
Unique users                22,079
Unique items                13,618
Tweets with 0 engagement    162,107 (95.2%)
Unsuccessful users          17,502 (79.27%)
[Figure: Distribution of user engagement for successful tweets; legend labels: 1, 2, 3, 4, 5-14, 15-51, 185,496]
Almost a binary classification problem?
Low number of successful tweets: most users only have tweets with
zero engagement
Even among successful tweets, the engagement is mostly 1

Recsys Challenge / User Engagement as Evaluation
[Figure: Distribution of the number of tweets per user; legend labels: 1, 2, 3-10, 11-20, 20]
Some cases where any ranking is equivalent:
The user doesn't have any successful tweets
The user has only one tweet or tweets with the exact same engagement
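The cases listed above reduce to a single test: a user's ranking carries no information when all of that user's tweets share the same engagement value (all zero, a single tweet, or identical counts). A minimal sketch, with hypothetical function names and toy data:

```python
def ranking_is_trivial(engagements):
    """True when every ordering of the user's tweets gets the same score."""
    # Zero or one distinct engagement value covers all three cases:
    # no successful tweet, a single tweet, or identical engagement everywhere.
    return len(set(engagements)) <= 1


def informative_users(per_user_engagements):
    """Keep only the users for whom the ranking order can change the score."""
    return {
        user: values
        for user, values in per_user_engagements.items()
        if not ranking_is_trivial(values)
    }


# Hypothetical toy data: engagement values of each user's tweets.
toy = {"u1": [0, 0, 0], "u2": [5], "u3": [2, 2], "u4": [0, 1, 0]}
kept = informative_users(toy)
```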

Recsys Challenge / The Retweet Effect
Some of the tweets are retweets, not original tweets
This has strong consequences:
All these tweets have a strictly positive engagement
The retweet count of the tweet is directly accessible through the
original tweet metadata
Metric                                 Value
Number of retweets                     1,808
Percentage among successful tweets     22.10%
Artificial successful users            28.44%
After removal of the retweet effect and cleaning users, 2,792 users
provide a clean ranking

Method / LambdaMART
A listwise approach
Considers the whole list of items as a learning instance
Try directly to optimize a performance measure
A combination of two previous algorithms
MART: a pointwise approach based on boosted tree models
Output is a linear combination of the outputs of a set of regression trees
Can be viewed as performing gradient descent in a function space using
regression trees
LambdaRank: a method based on neural networks
Express gradients based on the ranks of documents
Lambda terms: rules defining how to change the ranks of items in
order to optimize a performance measure
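To make the lambda terms concrete, here is a minimal sketch of the usual LambdaRank formulation for one query (here, one user). For each pair where item i is more relevant than item j, the push is sigma / (1 + exp(sigma * (s_i - s_j))) scaled by the NDCG change obtained by swapping the two items. The linear gain, the untruncated NDCG, the `sigma` default, and the function names are assumptions; in LambdaMART these values serve as the targets that each new regression tree is fitted to.

```python
import math


def discount(pos):
    """Position discount of NDCG (positions start at 0)."""
    return 1.0 / math.log2(pos + 2)


def lambda_terms(scores, gains, sigma=1.0):
    """Per-item push for one query: positive means 'move this item up'."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = {item: pos for pos, item in enumerate(order)}
    ideal = sum(g * discount(p) for p, g in enumerate(sorted(gains, reverse=True)))
    lambdas = [0.0] * n
    if ideal == 0:
        return lambdas  # no relevant item: nothing to push
    for i in range(n):
        for j in range(n):
            if gains[i] <= gains[j]:
                continue  # only pairs where i should be ranked above j
            # Absolute NDCG change if items i and j swapped positions.
            delta = abs((gains[i] - gains[j])
                        * (discount(rank[i]) - discount(rank[j]))) / ideal
            # Large when the model currently scores j above i.
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            push = sigma * rho * delta
            lambdas[i] += push
            lambdas[j] -= push
    return lambdas


# Hypothetical toy query: the only relevant item is currently ranked last.
toy_lambdas = lambda_terms(scores=[0.1, 0.9], gains=[1, 0])
```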

Method / LambdaMART
Build the model
Train on a modified dataset with only useful users
Very small dataset, with most users having only a few successful tweets
Lambda-MART score: 0.838
Lambda-MART cannot exactly capture the information about the
retweet effect

Method / Regression methods
The retweet effect can be taken into account through three steps:
1 Remove the retweet count of the original tweet from the features
2 Modify the user engagement by subtracting the retweet count of the
original tweet
3 Add the retweet count of the original tweet to the result returned by
the learned model
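The three steps above can be sketched as a wrapper around any regressor that exposes `fit` and `predict`. This is an illustration under stated assumptions, not the authors' code: rows are plain lists, `retweet_col` is a hypothetical index of the original tweet's retweet count (assumed to be 0 for tweets that are not retweets), and `MeanRegressor` is a stand-in model used only to keep the example self-contained.

```python
class RetweetWrapper:
    """Wraps a regressor so that it learns engagement minus the retweet count."""

    def __init__(self, base_model, retweet_col):
        self.base_model = base_model
        self.retweet_col = retweet_col  # hypothetical column index

    def _split(self, rows):
        c = self.retweet_col
        retweets = [row[c] for row in rows]
        # Step 1: remove the retweet count from the features.
        features = [row[:c] + row[c + 1:] for row in rows]
        return features, retweets

    def fit(self, rows, engagement):
        features, retweets = self._split(rows)
        # Step 2: subtract the retweet count from the engagement target.
        residual = [y - r for y, r in zip(engagement, retweets)]
        self.base_model.fit(features, residual)
        return self

    def predict(self, rows):
        features, retweets = self._split(rows)
        # Step 3: add the retweet count back to the model's prediction.
        base = self.base_model.predict(features)
        return [p + r for p, r in zip(base, retweets)]


class MeanRegressor:
    """Stand-in model (assumption): always predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


# Hypothetical toy rows: [followers, retweet count of original tweet, has_media].
rows = [[10, 3, 1], [20, 0, 0], [5, 7, 1]]
engagement = [4, 1, 7]
model = RetweetWrapper(MeanRegressor(), retweet_col=1).fit(rows, engagement)
predictions = model.predict([[1, 5, 0]])
```

Any regressor with the same two methods, for example a linear regression or a random forest from a library such as scikit-learn, should be usable in place of the stand-in.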
Pointwise approaches: build a regression model and try to predict
the score of items
Use of Linear Regression and Random Forests
Lin / WrapLin : 0.806 / 0.843
RF / WrapRF : 0.823 / 0.858

Experiments / Results
Table: NDCG@10 on the test dataset
Model                                     NDCG@10
Retweet                                   0.806
Lambda-MART                               0.838
RF / WrapRF                               0.823 / 0.858
Lin / WrapLin                             0.806 / 0.843
Mean(Lambda-MART, Retweet)                0.876
Mean(Lambda-MART, WrapRF, WrapLin)        0.874
OptAvg(Lambda-MART, WrapRF, WrapLin)      0.876
OptAvg(Lambda-MART, RF, Retweet)          0.878
The improvement through wrapping shows the importance of the
retweet effect
All the best NDCG scores use the LTR method LambdaMART
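The Mean and OptAvg rows combine the scores of several models before ranking. The slides do not say how scores were rescaled or how the OptAvg weights were chosen, so the following is only one plausible sketch: scores are min-max scaled within a user's list (an assumption), then averaged with equal weights for Mean or with supplied weights for a weighted variant. All names are hypothetical.

```python
def min_max(scores):
    """Rescale one model's scores for a user to [0, 1] (assumed normalisation)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def blend(score_lists, weights=None):
    """Weighted average of several models' scores; equal weights by default."""
    if weights is None:
        weights = [1.0 / len(score_lists)] * len(score_lists)
    scaled = [min_max(s) for s in score_lists]
    n_items = len(scaled[0])
    return [sum(w * col[i] for w, col in zip(weights, scaled))
            for i in range(n_items)]


def ranking(scores):
    """Item indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Hypothetical scores for one user's three tweets.
ranker_scores = [0.2, 1.5, 0.9]   # output of a learned ranker
retweet_counts = [0, 0, 12]       # retweet count of the original tweet
order = ranking(blend([ranker_scores, retweet_counts]))
```

A weighted variant would pass explicit `weights`, for instance selected on held-out users by maximising the mean NDCG@10.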

Experiments / Relevant Features with Random Forests
[Figure: top-10 Random Forest feature importances in three settings (axis range 0.0 to 0.6); features listed in the order shown]
Panel 1 (all features): retweeted_fav, retweeted_retweet, listed_count, followers_count, retweeted_status, lang_ar, rating, votes, release_date, created_at
Panel 2 (without retweet): followers_count, listed_count, favourites_count, lang_ar, rating, media, friends_count, user_mentions, time_tweet_scrap, created_at
Panel 3 (without user features): rating, lang_ar, time_tweet_movierelease, release_date, media, user_mentions, budget, time_tweet_scrap, lang_fa, votes
All features: features of the original tweet contribute the most
Without retweet: user features are the most important ones
Without user features: the rating given to the movie by the user
contributes the most
