Recommender systems have become central to many online activities, such as watching movies, shopping for products, and connecting with friends on social networks. Analyzing user behavior and modeling user feedback (both explicit and implicit) are crucial for improving any online recommender system. Widely adopted recommender systems at LinkedIn, such as “People You May Know” and “Suggested Skills Endorsements”, are evolving by analyzing user behavior on impressed recommendation items.
In this paper, we address modeling impression discounting of recommended items, that is, how to model users’ no-action feedback on impressed recommended items. The main contributions of this paper include (1) a large-scale analysis of impression data from LinkedIn and the KDD Cup; (2) a novel anti-noise regression technique and its application to learning four impression discounting functions: linear decay, inverse decay, exponential decay, and quadratic decay; (3) the application of these impression discounting functions to LinkedIn’s “People You May Know” and “Suggested Skills Endorsements” recommender systems.
1. Modeling Impression Discounting In
Large-scale Recommender Systems
KDD 2014, presented by Mitul Tiwari
Pei Lee, Laks V.S. Lakshmanan
University of British Columbia
Vancouver, BC, Canada
Mitul Tiwari, Sam Shah
LinkedIn
Mountain View, CA, USA
2015/3/12
5. What is Impression Discounting?
Examples
People You May Know
Skills Endorsement
Problem Definition
6. Example: People You May Know
5 days ago: recommended, but not invited
1 day ago: recommended again, but not invited
Today: if we recommend again, would you invite or not?
If you are not likely to invite, we had better discount this recommendation.
7. Example: Skills Endorsement
If you are likely to endorse, we will show a skills suggestion again;
otherwise, we should leave this space for other skills suggestions
(Screenshots: a skill suggestion recommended, then recommended again)
8. Problem Definition
Impressions: recommendations shown to user
Conversion: positive action - invite, endorse, etc.
In natural language:
We have already impressed an item several times with no conversion. How much should we discount that item?
More formally:
For a user u and item i, given an impression history (T1, T2, …, Tn) between u and i, can we predict the conversion rate for u on i?
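As a minimal sketch of this prediction target (the function and argument names are illustrative, not from the paper), the discounted conversion estimate is just the recommender's base estimate scaled by a discount learned from the impression history:

```python
def predicted_conversion_rate(base_rate, impression_history, discount):
    """Estimate the conversion rate of item i for user u.

    base_rate: the recommender's undiscounted conversion estimate
    impression_history: timestamps (T1, ..., Tn) of prior impressions
    discount: a function mapping the history to a multiplier in (0, 1]
    """
    return base_rate * discount(impression_history)
```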
10. Outline
Define an impression
User, Item, Conversion Rate
Features
How many times has the user seen this item?
When did the user last see this item? …
Data analysis: impression features and conversion
relationship
Impression discounting model fitting
Experimental evaluation
12. Features on Impressed Items
ImpCount (frequency):
the number of historical impressions before the current
impression, associated with the same (user, item)
LastSeen (recency):
the day difference between the last impression and the
current impression, associated with the same (user, item)
Position:
the offset of the item in the user's recommendation list
UserFreq:
the interaction frequency of the user in a recommender system
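The two key behavioral features above can be derived from an impression history as follows (a sketch; the list-of-days representation and names are illustrative, not from the paper):

```python
def impression_features(timestamps, current_day):
    """Derive ImpCount (frequency) and LastSeen (recency) for one
    (user, item) pair. `timestamps` holds the sorted days of all
    impressions of the item shown to the user before the current one."""
    imp_count = len(timestamps)                                   # frequency
    last_seen = current_day - timestamps[-1] if timestamps else None  # recency
    return imp_count, last_seen
```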
13. Data Analysis: PYMK Dataset
Data: 1.09 billion impression tuples
Training dataset: 80% of data
0.55 billion unique impressions
0.87 billion total impressions
20 million invitations
Testing dataset: 20% of data
0.14 billion unique impressions (3.7%)
0.22 billion total impressions (2.4%)
5.2 million invitations
14. Skills Endorsement Dataset
Total dataset size: 190 million impression tuples
Training dataset: 80% of data
Testing dataset: 20% of data
15. Tencent SearchAds Dataset
Publicly released for KDD Cup 2012 by the Tencent search engine
Total size: 150 million impression sequences
CTR of an ad at the 1st, 2nd, and 3rd positions: 4.8%, 2.7%, and 1.4%
16. Data Analysis: Conversion Rate Changes with Impression Count
If we show an item to a user repeatedly, the conversion rate decreases
17. PYMK Invitation Rate Changes with Impression Count and Last Seen
The conversion rate decreases with both ImpCount and LastSeen
18. Endorsement Rate Changes with Impression Count and Last Seen
Similar observation: the conversion rate also decreases with both ImpCount and LastSeen
19. Impression Discounting Model Fitting
Workflow:
Model fitting process:
Fundamental discounting functions
Aggregation Model
Offline and online
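The four fundamental discounting function families named in the paper (linear, inverse, exponential, and quadratic decay) can be sketched directly; the exact parameterizations and the simple least-squares fit below are illustrative, not the paper's:

```python
import math

# Four fundamental discounting function families. x is a behavioral
# feature value (e.g. ImpCount); w and b are parameters fitted per feature.
def linear_decay(x, w, b):    return w * x + b
def inverse_decay(x, w, b):   return w / x + b       # assumes x > 0
def exp_decay(x, w, b):       return w * math.exp(-x) + b
def quadratic_decay(x, w, b): return w * x * x + b

def fit_linear(xs, ys):
    """Ordinary least-squares fit of the linear form, as a baseline.

    Returns the slope w and intercept b minimizing squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx
```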
23. Aggregation Models
Linear Aggregation
Multiplicative Aggregation
f(Xi) is one of the discounting functions given earlier
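The two aggregation schemes can be sketched as follows (a minimal illustration; function names are mine):

```python
def linear_aggregation(xs, fns, weights):
    # d(X) = sum_i w_i * f_i(x_i); the weights are learned by regression.
    return sum(w * f(x) for w, f, x in zip(weights, fns, xs))

def multiplicative_aggregation(xs, fns):
    # d(X) = prod_i f_i(x_i); no extra weights to learn.
    d = 1.0
    for f, x in zip(fns, xs):
        d *= f(x)
    return d
```

Each f_i here is the per-feature discounting function (e.g. inverse decay on ImpCount) fitted in the previous step.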
24. Anti-Noise Regression Model
The sparsity of observations increases variance in the conversion rate; this typically happens when the number of observations is relatively low.
25. Density Weighted Regression
Add a weight to the squared error
Original RMSE: sqrt((1/n) · Σi (f(Xi) − yi)²)
Density-weighted RMSE: sqrt((1/n) · Σi vi · (f(Xi) − yi)²)
vi is assessed by the number of observations in a small neighborhood of (Xi, yi)
27. Offline Evaluation: Precision @k
Precision at top k
Precision improvement for different behavior sets: more features yield higher precision
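The offline metric can be sketched in a few lines (a standard precision-at-k computation; names are illustrative):

```python
def precision_at_k(ranked_items, converted, k):
    """Fraction of the top-k recommended items that led to a conversion
    (e.g. an invitation or an endorsement)."""
    return sum(1 for item in ranked_items[:k] if item in converted) / k
```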
28. Online Evaluation: Bucket Test on PYMK
Use behavior set (LastSeen, ImpCount)
On LinkedIn PYMK online system
29. Conclusion
Impression Discounting Model
Learnt a discounting function to capture ignored impressions
Linear or multiplicative aggregation model
Anti-noise regression model
Offline and online evaluation (bucket testing)
Good afternoon! I am Mitul Tiwari. Today I am going to talk about Modeling Impression Discounting in Large-scale Recommender Systems. This is joint work with Pei Lee, Laks, and Sam Shah. Most of the work was done when Pei Lee was interning at LinkedIn last summer.
Let me start by giving some context. LinkedIn is the largest professional network, with more than 300M members, and it's growing very fast, with more than 2 members joining LinkedIn per second.
Here are a couple of examples of large scale recommender systems at LinkedIn.
People You May Know: recommending other people to connect with. Daily we process hundreds of terabytes of data and hundreds of billions of edges to recommend connections for each of 300M members.
Suggested Skills Endorsements is another example, where we recommend a (member, skill) combination for endorsement.
There are many such large-scale recommender systems at LinkedIn: for example, jobs recommendations, news article suggestions, companies to follow, groups to join, similar profiles, related search queries, etc.
Let me describe what we mean by impression discounting using a couple of examples and formulate the problem.
Let me start with the case of People You May Know, or PYMK. Let's say 5 days ago Deepak was recommended to me to invite and connect with on LinkedIn, and I ignored that recommendation.
And Deepak was recommended to me again in PYMK 1 day ago. Now, today, the PYMK system needs to figure out whether to recommend Deepak to me again or not.
Basically, if I am not likely to invite Deepak to connect, then it is better to discount this recommendation from my PYMK recommendations.
Note that this is just an example, since I am already connected to Deepak and I work very closely with him.
Here is another example: skills endorsement suggestions. We have very little real estate on the website to display endorsement suggestions. Suppose we recommended certain skills in the past for some of your connections and you did not take any action on those suggestions. If you are not likely to endorse those skills, then it's better if we suggest some other skills to endorse.
Let me define certain terms.
Impressions are recommendations that are seen by users.
Conversion is a positive action taken on a recommendation. For example, in case of PYMK, invitation to connect. Or in case of Skills Endorsements, endorsing a connection for a skill.
In the impression discounting problem, we are given a history of impressions, or recommendations seen by users, and we would like to discount or remove items from the recommendation set if they are not likely to lead to conversion.
Our goal is to build an impression discounting framework that can be used as a feature in the recommendation engine model or a plugin that can be applied on top of the existing recommendation engine.
Our proposed impression discounting model only relies on the historical impression records, and can be treated as a plugin for existing recommendation engines. We use the dotted rectangle to circle the newly built recommender system with impression discounting, which produces discounted impressions with a higher overall conversion rate.
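This plugin view can be sketched as a simple reranking step on top of the existing engine's output (a minimal illustration; all names are mine, and the discount function stands in for the learned aggregation model):

```python
def rerank_with_discounting(scored_items, histories, discount):
    """Impression discounting as a plugin: multiply each item's base
    recommendation score by the learned discount for that item's
    impression history with this user, then re-sort by the new score."""
    rescored = [(score * discount(histories.get(item, [])), item)
                for score, item in scored_items]
    return sorted(rescored, reverse=True)
```

Items the user has repeatedly ignored get pushed down, making room for fresher recommendations with a higher chance of conversion.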
Here is the outline of the rest of my talk. Let me define impression more in detail in terms of user, item, conversion, and behavior features.
Then I will describe the exploratory data analysis work to find out the correlation between impression features and conversion rate
Then will describe the impression discounting model fitting
And finally will conclude with offline and online experimental evaluations
An impression can be seen as a tuple with multiple fields: (1) user; (2) item; (3) conversion; (4) features, such as how many times this item was shown to the user and when it was last shown; (5) timestamp; (6) recommendation score of the item for the user.
We can derive various features from impressions, such as: (1) impression count: the number of times the item was seen by the user; (2) last seen: when the item was last seen; (3) position: at what position the item was seen; (4) user frequency: how frequently the user interacts with the recommender system.
Here are some data analysis results showing that the conversion rate drops as the number of times the same item is shown to the user increases.
The decrease in conversion rate is much steeper in the case of endorsements.
Also note that in the case of PYMK the decrease is more gradual. That means a recommendation in PYMK still has a chance of acceptance even after multiple impressions.
Given this data, and after doing correlation studies between various behavior features and the conversion rate, we would like to learn discounting functions that fit this data.
In the plot we show the various discounting functions that we use to fit the relationship between conversion rate and the number of times an item was shown to a user.
These are various discounting functions that we use to fit the data
After fitting the best discounting function for each of the feature, we combined these functions using linear or multiplicative aggregation
Note that in certain cases we didn't have a lot of data points. For example, very few items were shown to users more than 40 times, and there is a lot of variance because of the small number of data points.
We developed an anti-noise regression technique to give lower weight to such data points in the cost function.
We adopted this technique from density based clustering to give lower weight to certain observations in the cost function
Throw away information that you don't need; an analogy is denoising autoencoders in deep learning.
I talked about modeling impression discounting in large-scale recommender systems at LinkedIn, such as People You May Know and Skills Endorsement Suggestions.