Recsys2016 Tutorial by Xavier and Deepak

Lessons Learned from
Building real-life Recsys
Xavier Amatriain (Quora)
Deepak Agarwal (LinkedIn)

What is a recommender system?
A recommender system recommends items to
users to optimize a utility composed of one or
more objectives
Almost every website is powered by a
recommender system

Web Recommender Problem
User i, with user features xi (demographics, browse history, geo-location,
search history, topics of questions answered, topics interested in, …), visits
Algorithm selects item j, with item features xj (keywords, content categories,
author, ...)
(i, j): response yij
Interaction (click, share, like, answer, ask, follow, ...) / no interaction
Which item should we select?
• Exploit: the one with highest predicted utility
• Explore: the one most useful for improving the utility prediction model
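The explore/exploit choice above can be sketched in a few lines. This is only an illustration, not the scheme used by the authors: it uses epsilon-greedy, which explores by picking a random item, a crude stand-in for "the item most useful for improving the utility prediction model". The function name `select_item` and the parameter `epsilon` are assumptions made for this example.

```python
import random


def select_item(predicted_utility, epsilon=0.1, rng=random):
    """Pick an item id from a dict {item_id: predicted utility}.

    With probability epsilon, explore: pick an item uniformly at random so
    the model receives feedback on items it would not otherwise show.
    Otherwise exploit: pick the item with the highest predicted utility.
    Epsilon-greedy is an assumed, simplified exploration strategy.
    """
    if not predicted_utility:
        raise ValueError("no candidate items")
    if rng.random() < epsilon:
        # Explore: sorted() only makes the choice reproducible for a seeded rng.
        return rng.choice(sorted(predicted_utility))
    # Exploit: highest predicted utility wins.
    return max(predicted_utility, key=predicted_utility.get)
```

With `epsilon=0.0` the function always exploits; raising `epsilon` trades short-term utility for more feedback on uncertain items.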

Today
•  We are going to talk about recommender systems at LinkedIn and Quora

Agenda
•  Recommender Systems at LinkedIn (Deepak)
• Context & Overview
• End-to-end of recommender systems in practice:
• Examples --- Jobs Recommendation, LinkedIn Feed
• Lessons Learned
•  Recommender Systems at Quora (Xavier)
• Context & Overview
• Lessons Learned
•  Conclusion (Xavier)

Our vision
Create economic opportunity for every member of
the global workforce
Our mission
Connect the world’s professionals to make them
more productive and successful
Our core value
Members first!

Companies, Jobs, Skills, People, Schools, Knowledge

Value proposition for Users (Members)
CONNECT
with your
professional world
STAY INFORMED
through professional
news and knowledge
GET HIRED
and build
your career

Value proposition for Customers
HIRE MARKET SELL @WORK

Several Recommendation Problems
• Member experience
•  LinkedIn Feed
•  PYMK (People you may Know)
• Job recommendation
• …..

Recommendation Problems (continued)
• Customer experience
• Recruiter (source candidates for recruiters)
• Sales Solution (close deals with companies)
• LinkedIn Learning (course recommendation)
• Recommend user segments in advertising

Recommendations: Delivery Mechanisms
• Pull Model: Serve most relevant when the user visits
• Desktop, mobile web, mobile app, tablet,..
• Push Model: Get in touch with user to deliver
recommendations {Email, Notifications}
• Higher relevance bar (do not spam and inundate the users)
• Right message, right user, right time, right frequency, right channel
Done through ML and optimization

MATCH-MAKING: Know your items, your users and
their interactions

User Characteristics
Professional profile of record
• Profile information: title, seniority, skills, education, endorsements,
presentations, …
• Behavioral: activities, search, ...
• Edge features (ego-centric network): connection strength, content
affinities, ...

Item Features
• Articles: author, sharer, keywords, named entities, topics, category,
likes, comments, latent representation, etc.
• Jobs: company, title, skills, keywords, geo, …
• ...

User Intent
• Why are you here?
• Hire, get hired, stay informed, grow network, nurture connections, sell,
market, ...
• Explicit (e.g., visiting jobs homepage, search query)
• Implicit (needs to be inferred, e.g., based on activities)

How to Scale Recommendations?
•  Formulate objectives to optimize
•  Optimize via ML models
• incorporate both implicit and explicit signals about user and items
•  Automate

Connecting long-term objectives to proxies that can be optimized by
machines/algorithms
• Long-term objectives (return visits to site, connections, quality job
applies, ...)
• Short-term proxies (CTR, connection prob, apply prob, …)
• Large-scale optimization via ML, UI changes, ...
• Experiment, Learn, Innovate
Automation
Optimize proxies with short feedback loop via Machine Learning
INPUT SIGNALS
• Whom? User profile, user intent
• What? Item filtering, understanding
• Context
• Interaction data
MACHINE LEARNING
• SCORE items: P(Click), P(Share), similarity, …
• RANK items: sort by score, multi-objective, business rule

Under the Hood of a Typical Recommender System at
LinkedIn

Example Application: Job Recommendation

Objective: Job Applications
Predict the probability that
a user i would apply for a job j
given …
•  User features
•  Profile: Industry, skills, job positions, companies, education
•  Network: Connection patterns
•  Item (i.e., job) features
•  Source: Poster, company, location
•  Content: Keywords, title, skills
•  Data about users’ past interactions with different types of items
•  Items: Jobs, articles, updates, courses, comments
•  Interactions: Apply, click, like, share, connect, follow

System Architecture
(Architecture diagram, split into an Online part and an Offline part.
Components shown:)
• User, Front End Service, Ranking Service, Ranking Library,
Experimentation platform
• Item Index, Search Index, Live Index Updater, Offline Index Builder
• User Feature Stores, User DB, Item DB
• Data Stream Processing, User Activity Data Streams
• Offline Data Pipelines (fed by ETL): Item Feature Pipelines, User Feature
Pipelines, Model Training Pipelines
• Technologies named: Photon-ML, Apache Hadoop, Pig, Scalding, Spark, …

Feature Generation
•  Types: User features, item features, activity features
•  Processing methods: Streaming, offline
Streaming example: Skills required by a job
• Components in the diagram: new job j, Job DB, Kafka (distributed
data/event delivery and queueing system), Skill Extraction Pipeline,
Metadata, Live Index Updater, Item Index
• Skill extractor
- ML model
- Predict p(job j requires skill s)
based on job description, …
- Skills are standardized
• Data ETLed to Hadoop

Model Training
Feature Processing
• Raw user features and raw item features pass through DAGs of
transformers (trees, similarities)
• Feature vector of user i: xi
• Feature vector of item j: zj
• Matching feature vector: mij
Parameter Learning
p(i applies for j) = f( xi, zj, mij | θ, αi, βj )
• θ: global parameter vector
• αi: parameter vector for each user i
• βj: parameter vector for each item j
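To make the notation concrete, here is a minimal sketch of one possible form of f. The slides do not specify f, so the logistic link, and the choice to apply the per-user parameters to the item features and the per-item parameters to the user features, are assumptions made for illustration. `dot` and `apply_probability` are hypothetical names.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("dimension mismatch")
    return sum(a * b for a, b in zip(u, v))


def apply_probability(x_i, z_j, m_ij, theta, alpha_i, beta_j):
    """p(i applies for j) under an ASSUMED logistic form.

    theta   : global parameter vector over the concatenation [x_i, z_j, m_ij]
    alpha_i : parameter vector for user i, applied here to item features z_j
    beta_j  : parameter vector for item j, applied here to user features x_i
    """
    global_features = list(x_i) + list(z_j) + list(m_ij)
    logit = (
        dot(theta, global_features)  # effect shared by all users and items
        + dot(alpha_i, z_j)          # how this user reacts to item features
        + dot(beta_j, x_i)           # how this item appeals to user features
    )
    return 1.0 / (1.0 + math.exp(-logit))
```

With all parameters at zero the sketch returns 0.5; a user with few past interactions would have `alpha_i` close to zero and fall back on the global term.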

Model Deployment
(Diagram: the learned parameters of
p(i applies for j) = f( xi, zj, mij | θ, αi, βj )
are deployed for serving. Components shown: User Feature Stores, Live Index
Updater, Item Index.)
• Global parameter vector
• Parameter vector for each user i
• Parameter vector for each item j

Online Ranking
(Same architecture diagram as under System Architecture, here with the label
"User Features & Parameters". Components shown: User, Front End Service,
Ranking Service, Item Index, User Feature Stores, User DB, Item DB, Live
Index Updater, Data Stream Processing, User Activity Data Stream, Offline
Data Pipelines (ETL), User Feature Pipelines, Item Feature Pipelines, Model
Training Pipelines, Offline Index Builder.)

Online A/B Experiments
Experiment setting
- Select a user segment
- Allocate traffic to different
models
Result reporting
- Report experimental results
- Impact on a large number of metrics

Function of the Feed
•  Deliver on the Value Propositions:
•  Stay connected with your Network (your network is your identity!)
•  Ability to build your professional reputation
•  Stay informed with relevant professional knowledge
•  Discover opportunities
•  Generate revenue (directly or indirectly)

Challenges of the Feed
•  Heterogeneity of Types
•  Organic Content
•  Articles by Influencers, Articles by
Network, Shares by network, Content
by topic (follows), Jobs, PYMK, group
discussions, etc.
•  Sponsored
•  Sponsored updates, Jobs ads, ...

The Feed: Not all types are equal
Action rates per type (Normalized)

Impression Discounting
•  Reduce the chance of showing
the same item to the same
user repeatedly
•  Decay the score of an item
based on #times that the user
saw the item before
•  Using real-time feedback
•  Discounting by user segments
and item types
Global (over all types)
Impression discounting curves of a few item types
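A minimal sketch of impression discounting follows. The slides show discounting curves but not their functional form, so the exponential decay and the `decay` value are assumptions; in practice the curve could differ by user segment and item type, as the bullets note. `discounted_score` and `rank_with_discount` are hypothetical names.

```python
def discounted_score(score, impressions, decay=0.8):
    """Decay a relevance score by the number of prior impressions.

    Exponential decay is an assumed curve: each earlier impression of the
    same item to the same user multiplies the score by `decay`.
    """
    return score * decay ** impressions


def rank_with_discount(scores, impression_counts, decay=0.8):
    """Rank item ids by discounted score, highest first.

    scores            : {item_id: relevance score}
    impression_counts : {item_id: times this user already saw the item}
    """
    discounted = {
        item: discounted_score(s, impression_counts.get(item, 0), decay)
        for item, s in scores.items()
    }
    return sorted(discounted, key=discounted.get, reverse=True)
```

For example, an item scored 0.9 that the user has already seen three times drops to about 0.46 with `decay=0.8` and falls below an unseen item scored 0.8.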

Diversification
•  Users’ experience deteriorates when exposed to the same kind of items multiple
times on the same page
•  Decay relevance scores of repeat items from
the same actor and of the same type
(Chart: discounting actor repetitions)
Group discussions           CTR drop
2 adjacent discussions      21%
3 adjacent discussions      48%
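The decay of repeat items can be sketched as a greedy re-ranker. This is an illustration under assumptions, not the production method: the multiplicative decay factors `actor_decay` and `type_decay`, the function name `diversify`, and the candidate fields are all hypothetical.

```python
from collections import Counter


def diversify(candidates, actor_decay=0.7, type_decay=0.9):
    """Greedy re-ranking that decays scores of repeated actors and types.

    candidates: list of dicts with keys "id", "score", "actor", "type".
    At each step the candidate with the highest adjusted score is placed
    next; its actor and type then count against later candidates.
    """
    remaining = list(candidates)
    actor_seen, type_seen = Counter(), Counter()

    def adjusted(c):
        # Counters return 0 for unseen keys, so first occurrences are not decayed.
        return (c["score"]
                * actor_decay ** actor_seen[c["actor"]]
                * type_decay ** type_seen[c["type"]])

    ranked = []
    while remaining:
        best = max(remaining, key=adjusted)
        ranked.append(best["id"])
        actor_seen[best["actor"]] += 1
        type_seen[best["type"]] += 1
        remaining.remove(best)
    return ranked
```

Setting both decay factors to 1.0 reduces this to a plain sort by score, which makes the effect of diversification easy to compare in an experiment.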

How to Combine Different Objectives
•  The feed system serves updates based on relevance scores
•  Adjust the serving strategy to optimize revenue while enforcing
engagement (e.g. CTR) constraints

For user x, item i
Rank by: eCPI(i|x) + SB * pCTR(i|x)
maximize revenue
such that engagement >= engagement target
–  eCPI: estimated revenue for a given update
–  For organic updates, eCPI = 0
–  SB: shadow bid (intrinsic valuation of organic clicks to LinkedIn)
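The ranking rule above can be written directly as code. The sketch below only applies the formula eCPI(i|x) + SB * pCTR(i|x) for a given shadow bid; how SB is tuned to meet the engagement target is not shown. The function name `rank_updates` and the field names `ecpi` and `pctr` are hypothetical.

```python
def rank_updates(updates, shadow_bid):
    """Order feed updates by eCPI + SB * pCTR, highest first.

    updates    : list of dicts with keys "id", "ecpi", "pctr";
                 organic updates carry ecpi == 0.
    shadow_bid : SB, the value assigned to a click, in revenue units.
    """
    ranked = sorted(
        updates,
        key=lambda u: u["ecpi"] + shadow_bid * u["pctr"],
        reverse=True,
    )
    return [u["id"] for u in ranked]
```

A high shadow bid weights clicks heavily and so favors engaging organic updates; a low shadow bid lets sponsored updates with non-zero eCPI move up.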

Tradeoff Points and Efficient Frontier
(Plot: relative revenue gain vs. relative engagement gain, with the original
system (no optimization) at 0. Labeled points: conservative (high SB),
aggressive (low SB), more aggressive (very low SB). Annotation: better
efficient frontier.)

Encouraging Viral loops: Some heuristics
•  Value of share, comment, like > Value of click
• Rank by using linear combination of CTR and Viral Action Rates
• Lose CTR but gain more viral actions (shares, likes, comments)
• Increasing viral actions increases unique user visits & feed sessions
• Viral action triggers notification to actors in many cases (e.g., like/comment on a
post written by your connection)
• Encourage users to share/comment/like more
• Boost article scores by users who share good stuff and who don’t share very often
• May lose some CTR in short-term but increase cohort that shares on LinkedIn

The Feed: A three stage Ranker
• First stage: each type (Update Type 1, …, Update Type N) scores and orders
its potential updates
• Second stage (Blending Results): rank orders every update using ML model
• Third stage (Multiple Objective): adjusts for diversity, impression
discounting, balance of objectives: engagement & revenue

1. Cost of a Bad Recommendation
•  How does ML work where a few bad recommendations can hurt the brand?
• Maximize precision without hurting performance metrics significantly
• Collect negative feedback from users, crowd; incorporate within algorithms
• Create better product focus, filter unnecessary content from inventory
• E.g., unprofessional content on Feed
• Better insights/explanations associated with recommendations help build trust

2. Data Tracking
• Proper data tracking and monitoring is not always easy!
• Data literacy and understanding across organization (front-end, UI, SRE)
• Proper tooling, continuous monitoring very important to scale the process
• Our philosophy: Loose coupling between FE and BE teams!
• FE (client) emits limited events along with trackingid
• BE emits more details and joins against trackingid
• Tracking events can have significant impact
• View-port tracking (what the user actually saw) for more informative negatives
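A sketch of the join on the tracking id, with view-port events used to pick informative negatives, might look as follows. All field names (`tracking_id`, `event`) and event values (`viewport_impression`, `click`) are hypothetical; the slides name only the tracking id itself.

```python
def training_labels(fe_events, be_events):
    """Join backend serving records with front-end events by tracking id.

    fe_events: limited client events, dicts with "tracking_id" and "event".
    be_events: detailed backend records, dicts with "tracking_id" plus
               whatever the backend logged (item, score, model, ...).
    An item counts as a negative only if it entered the view-port and was
    not clicked; items served but never seen are skipped.
    """
    seen = {e["tracking_id"] for e in fe_events
            if e["event"] == "viewport_impression"}
    clicked = {e["tracking_id"] for e in fe_events if e["event"] == "click"}
    examples = []
    for served in be_events:
        tid = served["tracking_id"]
        if tid not in seen and tid not in clicked:
            continue  # served but never seen: not an informative negative
        examples.append({**served, "label": int(tid in clicked)})
    return examples
```

Keeping the client payload small and joining on the tracking id later means backend logging can change without a client release, which fits the loose coupling described above.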

3. Content Inventory
•  High quality and comprehensive content inventory as important as
recommendation algorithms
• Examples: Learning, Jobs, Feed
• Supply and demand analysis, gap analysis, proactively producing more high
quality content for inventory

4. A/B Testing with Network Interference
• Random treatment assignments (spillover effects, need to adjust)
• Treatment recommendations affect control group as well
• A like/share in treatment may create a new item when ranking in control
