This is part 1 of the tutorial Xavier and Deepak gave at RecSys 2016 this year. You can find the second part here: http://www.slideshare.net/xamat/recsys-2016-tutorial-lessons-learned-from-building-reallife-recommender-systems
2. What is a recommender system?
A recommender system recommends items to users to optimize a utility composed of one or more objectives
Almost every website is powered by a recommender system
3. Web Recommender Problem
User i, with user features x_i (demographics, browse history, geo-location, search history, topics of questions answered, topics interested in, …), visits
The algorithm selects item j, with item features x_j (keywords, content categories, author, …)
(i, j): response y_ij = interaction (click, share, like, answer, ask, follow, …) or no interaction
Which item should we select?
• The one with the highest predicted utility (exploit)
• The one most useful for improving the utility prediction model (explore)
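The exploit/explore choice above can be sketched with epsilon-greedy selection. This is one simple bandit-style policy; the talk does not say which policy LinkedIn or Quora actually use. The utilities dictionary, the `epsilon` value, and the function name are hypothetical.

```python
import random

def select_item(predicted_utility, epsilon, rng):
    """Epsilon-greedy choice between exploiting and exploring.

    predicted_utility: dict item_id -> predicted utility for the current user
    epsilon:           probability of exploring (an assumed tuning knob)
    rng:               random.Random instance, passed in for reproducibility
    """
    items = sorted(predicted_utility)  # fixed order so results are reproducible
    if rng.random() < epsilon:
        return rng.choice(items)  # explore: gather feedback on a random item
    return max(items, key=predicted_utility.get)  # exploit: highest predicted utility

utilities = {"item_a": 0.12, "item_b": 0.30, "item_c": 0.05}
print(select_item(utilities, epsilon=0.0, rng=random.Random(0)))  # item_b
```

A small `epsilon` spends a little utility now to improve the prediction model later. This is the trade-off the slide's two bullets describe.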
4. Today
• We are going to talk about recommender systems at LinkedIn and Quora
5. Agenda
• Recommender Systems at LinkedIn (Deepak)
• Context & Overview
• End-to-end recommender systems in practice:
• Examples: Jobs Recommendation, LinkedIn Feed
• Lessons Learned
• Recommender Systems at Quora (Xavier)
• Context & Overview
• Lessons Learned
• Conclusion (Xavier)
6.
Our vision: Create economic opportunity for every member of the global workforce
Our mission: Connect the world’s professionals to make them more productive and successful
Our core value: Members first!
9. Value proposition for Users (Members)
• CONNECT with your professional world
• STAY INFORMED through professional news and knowledge
• GET HIRED and build your career
12. Recommendation Problems (continued)
• Customer experience
• Recruiter (source candidates for recruiters)
• Sales Solution (close deals with companies)
• LinkedIn Learning (course recommendation)
• Recommend user segments in advertising
13. Recommendations: Delivery Mechanisms
• Pull Model: Serve the most relevant items when the user visits
• Desktop, mobile web, mobile app, tablet, …
• Push Model: Get in touch with the user to deliver recommendations (email, notifications)
• Higher relevance bar (do not spam or inundate users)
• Right message, right user, right time, right frequency, right channel
Done through ML and optimization
15. User Characteristics
• Profile information: title, seniority, skills, education, endorsements, presentations, …
• Behavioral: activities, search, …
• Edge features (ego-centric network): connection strength, content affinities, …
Professional profile of record
17. User Intent
• Why are you here?
• Hire, get hired, stay informed, grow network, nurture connections, sell, market, …
• Explicit (e.g., visiting the jobs homepage, a search query)
• Implicit (needs to be inferred, e.g., based on activities)
18. How to Scale Recommendations?
• Formulate objectives to optimize
• Optimize via ML models
• Incorporate both implicit and explicit signals about users and items
• Automate
19. Connecting long-term objectives to proxies that can be optimized by machines/algorithms
Long-term objectives (return visits to site, connections, quality job applies, …)
→ Short-term proxies (CTR, connection probability, apply probability, …)
→ Large-scale optimization via ML, UI changes, …
Cycle: Experiment → Learn → Innovate
20. Automation
Optimize proxies with a short feedback loop via machine learning
• INPUT SIGNALS: Whom? (user profile, user intent); What? (item filtering, understanding); context; interaction data
• MACHINE LEARNING → SCORE items: P(Click), P(Share), similarity, …
• RANK items: sort by score, multi-objective, business rules
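Here is a minimal sketch of the SCORE → RANK step. It assumes the multi-objective combination is a weighted sum of P(Click) and P(Share), and that the business rule is a simple eligibility flag. The weights, the field names, and the `blocked` flag are hypothetical and are not LinkedIn's actual rules.

```python
from dataclasses import dataclass

@dataclass
class ScoredItem:
    item_id: str
    p_click: float            # ML-predicted P(Click)
    p_share: float            # ML-predicted P(Share)
    blocked: bool = False     # hypothetical business-rule flag

def rank_items(items, w_click=1.0, w_share=3.0):
    """Combine per-objective scores, apply a business rule, sort by score.

    The weights are illustrative assumptions, not values from the talk.
    """
    def score(it):
        return w_click * it.p_click + w_share * it.p_share
    eligible = [it for it in items if not it.blocked]  # business rule: filter first
    return [it.item_id for it in sorted(eligible, key=score, reverse=True)]

candidates = [
    ScoredItem("a", p_click=0.10, p_share=0.01),
    ScoredItem("b", p_click=0.05, p_share=0.04),
    ScoredItem("c", p_click=0.30, p_share=0.00, blocked=True),
]
print(rank_items(candidates))  # ['b', 'a']
```

Item "c" has the highest P(Click), but the rule removes it. Item "b" beats "a" only because shares are weighted more than clicks.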
21. Under the Hood of a Typical Recommender System at LinkedIn
23. Objective: Job Applications
Predict the probability that
a user i would apply for a job j
given …
• User features
• Profile: Industry, skills, job positions, companies, education
• Network: Connection patterns
• Item (i.e., job) features
• Source: Poster, company, location
• Content: Keywords, title, skills
• Data about users’ past interactions with different types of items
• Items: Jobs, articles, updates, courses, comments
• Interactions: Apply, click, like, share, connect, follow
24. System Architecture
[Diagram: system architecture]
Components: User; Front End Service; Ranking Service (Ranking Library); Item Index; Search Index; Live Index Updater; Offline Index Builder; User Feature Stores; User DB; Item DB; User Activity Data Streams and Data Stream Processing; Offline Data Pipelines (ETL); User Feature Pipelines; Item Feature Pipelines; Model Training Pipelines; Experimentation platform
The diagram separates online and offline components. Offline tooling: Photon-ML; Apache Hadoop, Pig, Scalding, Spark, …
25. Feature Generation
• Types: user features, item features, activity features
• Processing methods: streaming, offline
Streaming example: skills required by a job
• A new job j is written to the Job DB and published to Kafka (a distributed data/event delivery and queueing system)
• The Skill Extraction Pipeline runs the skill extractor, an ML model that predicts p(job j requires skill s) based on the job description, …; skills are standardized
• The resulting metadata goes to the Live Index Updater, which updates the Item Index
• Data is also ETLed to Hadoop
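The streaming flow above can be mimicked in a few lines of Python. This is a sketch under loud assumptions. A `deque` stands in for the Kafka topic, and a dictionary stands in for the live item index. The "skill extractor" is a whole-word keyword match, not the trained ML model the slide describes. `SKILL_SYNONYMS`, `extract_skills`, and `process_stream` are hypothetical names.

```python
import re
from collections import deque

# Hypothetical standardization table mapping raw mentions to canonical skills.
SKILL_SYNONYMS = {
    "ml": "machine_learning",
    "machine learning": "machine_learning",
    "python": "python",
    "hadoop": "hadoop",
}

def extract_skills(description, threshold=0.5):
    """Toy stand-in for the skill-extraction ML model.

    Estimates p(job requires skill s) for each canonical skill s and keeps
    skills at or above `threshold`. Here p is a crude placeholder
    (1.0 if a known mention appears as a whole word, else 0.0).
    """
    text = description.lower()
    probs = {}
    for mention, skill in SKILL_SYNONYMS.items():
        found = re.search(r"\b" + re.escape(mention) + r"\b", text) is not None
        probs[skill] = max(probs.get(skill, 0.0), 1.0 if found else 0.0)
    return sorted(s for s, p in probs.items() if p >= threshold)

def process_stream(events, item_index):
    """Consume new-job events (a deque standing in for a Kafka topic) and
    write the standardized skills into an in-memory item index."""
    while events:
        job = events.popleft()
        item_index[job["job_id"]] = {"skills": extract_skills(job["description"])}

stream = deque([
    {"job_id": "j1", "description": "Data engineer: Python and Hadoop"},
    {"job_id": "j2", "description": "ML researcher"},
])
index = {}
process_stream(stream, index)
print(index)  # {'j1': {'skills': ['hadoop', 'python']}, 'j2': {'skills': ['machine_learning']}}
```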
26. Model Training
• Feature processing: raw user features and raw item features pass through DAGs of transformers. They produce the feature vector of user i, x_i, the feature vector of item j, z_j, and a matching feature vector m_ij (trees, similarities)
• Parameter learning: a global parameter vector θ, a parameter vector α_i for each user i, and a parameter vector β_j for each item j
p(i applies for j) = f(x_i, z_j, m_ij | θ, α_i, β_j)
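The slide leaves f unspecified. This sketch assumes one plausible instantiation for illustration: a logistic model whose score adds a global term θ · [x_i, z_j, m_ij], a per-user term α_i · z_j, and a per-item term β_j · x_i. The form is similar in spirit to GLMix-style models. All numbers are made up.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def p_apply(x_i, z_j, m_ij, theta, alpha_i, beta_j):
    """Assumed form of f: logistic link over a global term plus per-user and
    per-item terms (the slide does not specify f).

    theta:   global weights over the concatenated features x_i + z_j + m_ij
    alpha_i: user i's weights, applied to the item features z_j
    beta_j:  item j's weights, applied to the user features x_i
    """
    features = x_i + z_j + m_ij  # list concatenation
    if len(theta) != len(features):
        raise ValueError("theta must match the concatenated feature length")
    score = dot(theta, features) + dot(alpha_i, z_j) + dot(beta_j, x_i)
    return 1.0 / (1.0 + math.exp(-score))

x_i, z_j, m_ij = [1.0, 0.0], [0.5, 1.0], [0.8]
theta = [0.2, 0.1, 0.3, -0.4, 1.0]
print(p_apply(x_i, z_j, m_ij, theta, alpha_i=[0.5, 0.0], beta_j=[-0.2, 0.3]))
```

The per-user α_i lets heavy users get personalized weights on job attributes. The per-item β_j does the same for each job over member attributes. Users or jobs with little data fall back on the global θ.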
28. Online Ranking
[Diagram: the system architecture of slide 24 with the online ranking path highlighted: User → Front End Service → Ranking Service, which uses the user's features & parameters and the Item Index]
29. Online A/B Experiments
Experiment setting
- Select a user segment
- Allocate traffic to different
models
Result reporting
- Report experimental results
- Impact on a large number of metrics
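A common way to allocate experiment traffic is to hash a stable member ID together with the experiment name, so each member always sees the same model. The sketch below assumes that approach. The member schema, the segment predicate, and the experiment name are hypothetical. This is not the API of LinkedIn's experimentation platform.

```python
import hashlib

def assign_variant(member, experiment, allocation, in_segment):
    """Deterministically assign a member to a model variant.

    member:     dict with at least an "id" key (hypothetical schema)
    allocation: list of (variant_name, traffic_fraction) summing to 1
    in_segment: predicate selecting the experiment's user segment
    Returns None for members outside the segment.
    """
    if not in_segment(member):
        return None
    key = f"{experiment}:{member['id']}".encode()
    # First 32 bits of the hash, scaled to a bucket in [0, 1).
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) / 2**32
    cumulative = 0.0
    for variant, fraction in allocation:
        cumulative += fraction
        if bucket < cumulative:
            return variant
    return allocation[-1][0]  # guard against fractions summing to just under 1

members = [{"id": i, "country": "us" if i % 2 else "in"} for i in range(6)]
alloc = [("control", 0.9), ("treatment", 0.1)]
for m in members:
    print(m["id"], assign_variant(m, "feed-model-v2", alloc, lambda u: u["country"] == "us"))
```

Including the experiment name in the hash keeps assignments independent across concurrent experiments.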
32. Function of the Feed
• Deliver on the value propositions:
• Stay connected with your network (your network is your identity!)
• Ability to build your professional reputation
• Stay informed with relevant professional knowledge
• Discover opportunities
• Generate revenue (directly or indirectly)
33. Challenges of the Feed
• Heterogeneity of types:
• Organic content
• Articles by influencers, articles by network, shares by network, content by topic (follows), jobs, PYMK (People You May Know), group discussions, etc.
• Sponsored
• Sponsored updates, job ads, …
35. Impression Discounting
• Reduce the chance of showing the same item to the same user repeatedly
• Decay the score of an item based on the number of times the user has seen it before
• Using real-time feedback
• Discounting by user segments
and item types
[Figure: impression discounting curves, global (over all types) and for a few item types]
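Here is a minimal sketch of impression discounting. It assumes an exponential decay in the number of prior impressions, with a per-type rate and a global fallback curve. The curve shape and rates are illustrative assumptions. The slide only states that discounting uses real-time feedback and varies by user segment and item type.

```python
import math

# Hypothetical per-type decay rates; the exponential shape is also assumed.
DECAY_RATE = {"article": 0.5, "job": 0.2}
GLOBAL_RATE = 0.3  # fallback curve "over all types"

def discounted_score(score, item_type, times_seen):
    """Decay a relevance score by how often the user already saw the item."""
    rate = DECAY_RATE.get(item_type, GLOBAL_RATE)
    return score * math.exp(-rate * times_seen)

for n in range(4):
    print(n, round(discounted_score(0.8, "article", n), 3),
          round(discounted_score(0.8, "job", n), 3))
```

To discount by user segment as well, the lookup key could become a (segment, item type) pair.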
36. Diversification
• Users’ experience deteriorates when they are exposed to the same kind of item multiple times on the same page
• Decay relevance scores of repeat items from the same actor and of the same type
[Figure: discounting actor repetitions]
Group discussions          CTR drop
2 adjacent discussions     21%
3 adjacent discussions     48%
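One way to implement this decay is a greedy re-ranker. At each position, it penalizes candidates by how many already-placed updates share their actor or their type. The multiplicative `penalty` and the choice to count actor and type repeats separately are assumptions.

```python
def diversify(ranked, penalty=0.5):
    """Greedy re-ranking that decays an update's score for each already-placed
    update from the same actor and for each one of the same type.

    ranked:  list of (update_id, actor, update_type, relevance_score)
    penalty: multiplicative decay per repeat (illustrative assumption)
    """
    remaining = list(ranked)
    actor_count, type_count = {}, {}

    def adjusted(update):
        _, actor, utype, score = update
        repeats = actor_count.get(actor, 0) + type_count.get(utype, 0)
        return score * penalty ** repeats

    order = []
    while remaining:
        best = max(remaining, key=adjusted)
        remaining.remove(best)
        order.append(best[0])
        actor_count[best[1]] = actor_count.get(best[1], 0) + 1
        type_count[best[2]] = type_count.get(best[2], 0) + 1
    return order

updates = [
    ("d1", "group_a", "discussion", 0.90),
    ("d2", "group_b", "discussion", 0.85),
    ("a1", "alice", "article", 0.60),
]
print(diversify(updates))  # ['d1', 'a1', 'd2']
```

Without the penalty (`penalty=1.0`), the two group discussions would be adjacent. The table above associates that situation with a CTR drop.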
37. How to Combine Different Objectives
• The feed system serves updates based on relevance scores
• Adjust the serving strategy to optimize revenue while enforcing engagement (e.g., CTR) constraints
For user x and item i:
Rank by: eCPI(i|x) + SB * pCTR(i|x)
Maximize revenue such that engagement >= engagement target
– eCPI: estimated revenue for a given update
– For organic updates, eCPI = 0
– SB: shadow bid (intrinsic valuation of organic clicks to LinkedIn)
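The shadow bid SB can be tuned offline by replaying logged candidates. For each SB, rank by eCPI + SB · pCTR and measure revenue and engagement. Then keep the smallest SB that meets the engagement target. This grid-search sketch assumes one slot per user and made-up (eCPI, pCTR) pairs. It is not LinkedIn's actual tuning procedure.

```python
def serve(slates, sb):
    """Show the top-ranked candidate in each slate; return (revenue, engagement).

    slates: list of candidate lists, each candidate an (ecpi, pctr) pair;
            organic updates have ecpi == 0.
    """
    revenue = engagement = 0.0
    for candidates in slates:
        # Rank by eCPI(i|x) + SB * pCTR(i|x)
        ecpi, pctr = max(candidates, key=lambda c: c[0] + sb * c[1])
        revenue += ecpi
        engagement += pctr
    return revenue, engagement

def pick_shadow_bid(slates, engagement_target, grid):
    """Return (sb, revenue, engagement) for the smallest SB on the grid that
    meets the engagement target, or None if none does.

    With one slot per slate and SB >= 0, raising SB never lowers engagement
    and never raises revenue (ties aside), so the smallest feasible SB is the
    revenue-maximizing choice among the grid points.
    """
    for sb in sorted(grid):
        revenue, engagement = serve(slates, sb)
        if engagement >= engagement_target:
            return sb, revenue, engagement
    return None

slates = [
    [(0.05, 0.01), (0.0, 0.04)],  # sponsored vs. organic candidate for user 1
    [(0.03, 0.02), (0.0, 0.05)],  # sponsored vs. organic candidate for user 2
]
print(pick_shadow_bid(slates, engagement_target=0.05, grid=[0.0, 0.5, 1.5, 3.0]))
```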
39. Encouraging Viral loops: Some heuristics
• Value of share, comment, like > Value of click
• Rank using a linear combination of CTR and viral action rates
• Lose CTR but gain more viral actions (shares, likes, comments)
• Increasing viral actions increases unique user visits & feed sessions
• In many cases, a viral action triggers notifications to other members (e.g., a like/comment on a post written by your connection)
• Encourage users to share/comment/like more
• Boost scores of articles by users who share good content and who don’t share very often
• May lose some CTR in the short term but grows the cohort that shares on LinkedIn
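The linear combination of CTR and viral action rates can be written as a weighted sum of predicted action probabilities. The weights here are assumptions, chosen only so that a share, comment, or like outweighs a click.

```python
# Illustrative weights: the slide only says a share, comment, or like is worth
# more than a click; these particular values are assumptions.
ACTION_WEIGHTS = {"click": 1.0, "like": 2.0, "comment": 4.0, "share": 5.0}

def viral_score(action_probs, weights=ACTION_WEIGHTS):
    """Linear combination of predicted CTR and viral action rates."""
    return sum(weights[action] * p for action, p in action_probs.items())

update_a = {"click": 0.10}
update_b = {"click": 0.06, "like": 0.01, "share": 0.01}
ranked = sorted([("a", update_a), ("b", update_b)],
                key=lambda u: viral_score(u[1]), reverse=True)
print([name for name, _ in ranked])  # ['b', 'a']
```

Update "b" has the lower CTR but ranks first. This is the "lose CTR but gain more viral actions" trade-off from the slide.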
40. The Feed: A Three-Stage Ranker
• First stage: each update type (Update Type 1 … Update Type N) scores and orders its potential updates
• Second stage (blending results): rank-orders every update using an ML model
• Third stage (multiple objectives): adjusts for diversity, impression discounting, and the balance of objectives (engagement & revenue)
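The three stages can be sketched as a small pipeline. The sketch assumes three things: each candidate carries a type-specific `local_score` for stage 1, a single blending model (`ml_score`) orders updates in stage 2, and an `adjust` hook applies stage 3. All names and numbers are hypothetical.

```python
def three_stage_rank(candidates_by_type, per_type_k, ml_score, adjust):
    """Sketch of a three-stage feed ranker (structure only; models assumed).

    Stage 1: each update type orders its own candidates by a type-specific
             "local_score" and keeps its top per_type_k.
    Stage 2: all surviving updates are blended and ordered by one ML model.
    Stage 3: `adjust` applies final corrections (diversity, impression
             discounting, engagement/revenue balance) to the blended list.
    """
    stage1 = []
    for updates in candidates_by_type.values():
        ordered = sorted(updates, key=lambda u: u["local_score"], reverse=True)
        stage1.extend(ordered[:per_type_k])
    stage2 = sorted(stage1, key=ml_score, reverse=True)
    return adjust(stage2)

candidates = {
    "article": [{"id": "a1", "local_score": 0.9, "p_click": 0.05},
                {"id": "a2", "local_score": 0.2, "p_click": 0.30}],
    "job": [{"id": "j1", "local_score": 0.7, "p_click": 0.10}],
}
feed = three_stage_rank(candidates, per_type_k=1,
                        ml_score=lambda u: u["p_click"], adjust=lambda ups: ups)
print([u["id"] for u in feed])  # ['j1', 'a1']
```

Stage 1 keeps each type's own ranking cheap. The trade-off is that a candidate like "a2" can be dropped before the global model ever sees it.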
42. 1. Cost of a Bad Recommendation
• How should ML work when a few bad recommendations can hurt the brand?
• Maximize precision without hurting performance metrics significantly
• Collect negative feedback from users and the crowd; incorporate it within algorithms
• Create better product focus, filter unnecessary content from inventory
• E.g., unprofessional content on Feed
• Better insights/explanations associated with recommendations help build trust
43. 2. Data Tracking
• Proper data tracking and monitoring is not always easy!
• Data literacy and understanding across the organization (front-end, UI, SRE)
• Proper tooling and continuous monitoring are very important to scale the process
• Our philosophy: Loose coupling between FE and BE teams!
• FE (client) emits limited events along with trackingid
• BE emits more details and joins against trackingid
• Tracking events can have significant impact
• View-port tracking (what the user actually saw) for more informative negatives
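The FE/BE split above implies a join on the tracking ID when building training data. Here is a minimal sketch. It assumes a hypothetical schema in which the frontend emits `view` events (item entered the viewport) and `click` events, and the backend logs one record per served item.

```python
def build_training_examples(be_served, fe_events):
    """Join backend 'served' records with frontend events on tracking id.

    be_served: dicts with "tracking_id", "item_id", ... (hypothetical schema)
    fe_events: dicts with "tracking_id" and "event" in {"view", "click"},
               where "view" means the item entered the user's viewport.
    Clicked items become positives (label 1); viewed-but-not-clicked items
    become negatives (label 0); served-but-never-viewed items are dropped,
    since the user may never have seen them.
    """
    events_by_id = {}
    for e in fe_events:
        events_by_id.setdefault(e["tracking_id"], set()).add(e["event"])
    examples = []
    for rec in be_served:
        seen = events_by_id.get(rec["tracking_id"], set())
        if "click" in seen:
            examples.append((rec, 1))
        elif "view" in seen:
            examples.append((rec, 0))
    return examples

served = [{"tracking_id": "t1", "item_id": "i1"},
          {"tracking_id": "t2", "item_id": "i2"},
          {"tracking_id": "t3", "item_id": "i3"}]
fe = [{"tracking_id": "t1", "event": "view"},
      {"tracking_id": "t1", "event": "click"},
      {"tracking_id": "t2", "event": "view"}]
print([(r["item_id"], y) for r, y in build_training_examples(served, fe)])  # [('i1', 1), ('i2', 0)]
```

Item "i3" was served but never scrolled into view, so treating it as a negative would mislabel it. This is why viewport tracking yields more informative negatives.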
44. 3. Content Inventory
• A high-quality and comprehensive content inventory is as important as the recommendation algorithms
• Examples: Learning, Jobs, Feed
• Supply and demand analysis, gap analysis, proactively producing more high-quality content for the inventory
45. 4. A/B Testing with Network Interference
• Random treatment assignment suffers from spillover effects, which need to be adjusted for
• Treatment recommendations affect the control group as well
• A like/share in treatment may create a new item for ranking in control
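The slide does not say how LinkedIn adjusts for spillover. One commonly used mitigation is graph-cluster randomization, which assigns treatment to clusters of densely connected members rather than to individual members. This sketch assumes the clusters are precomputed and uses a hash-based bucket. `cluster_of` and the experiment name are hypothetical.

```python
import hashlib

def cluster_assign(member_id, cluster_of, experiment, treatment_fraction=0.5):
    """Graph-cluster randomization sketch.

    cluster_of: dict member_id -> cluster id, assumed to come from a
                partitioning of the connection graph computed elsewhere.
    Every member of a cluster gets the same arm, so most connections share an
    arm and fewer likes/shares cross from treatment into control.
    """
    key = f"{experiment}:{cluster_of[member_id]}".encode()
    bucket = int(hashlib.sha256(key).hexdigest()[:8], 16) / 2**32  # in [0, 1)
    return "treatment" if bucket < treatment_fraction else "control"

cluster_of = {"alice": "c1", "bob": "c1", "carol": "c2"}
for member in cluster_of:
    print(member, cluster_assign(member, cluster_of, "feed-exp"))
```

Whole clusters switch arms together. As a result, the effective sample size is closer to the number of clusters than to the number of members, and the analysis should account for that.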