Recommendation systems are now used across many applications, including multimedia content platforms, social networks, and e-commerce, to provide suggestions that are most likely to fulfill users’ needs, thereby improving the user experience. Academic research to date has largely focused on the performance of recommendation models in terms of ranking quality or accuracy measures, which often don’t translate directly into real-world improvements. In this talk, we present some of the most interesting challenges that we face in the personalization efforts at Netflix. The goal of this talk is to shine a light on challenging research problems in industrial recommendation systems and start a conversation about exciting areas of future research.
13. ○ Every person is unique with a variety of interests
… and sometimes multiple people use the same profile
○ Help people find what they want when they’re not sure what they want
○ Non-stationary, context-dependent, mood-dependent, ...
○ Large datasets but small data per member
… and potentially biased by the output of your system
○ Cold-start problems on all sides
○ More than just accuracy: diversity, novelty, freshness, fairness, ...
○ ...
No, personalization is hard!
17. Timeline
~2012: Deep Learning becomes popular in Machine Learning
~2017: Deep Learning becomes popular in Recommender Systems
What took so long?
~2019: Traditional methods do as well or better than Deep Learning for Recommender Systems
… Wait, what?
22. … isn’t always the best
[Figure: matrix factorization R ≈ U·V trained with a mean squared loss]
Also see [Dacrema et al., 2019], [Rendle et al., 2019], [Rendle et al., 2021]. Make sure you tune your baselines.
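To make the “tune your baselines” point concrete, here is a minimal sketch of the kind of plain matrix-factorization baseline the slide refers to: R ≈ U·V trained by SGD on a squared loss. The hyperparameters (`k`, `lr`, `reg`, `epochs`) are illustrative, not tuned values from the talk.

```python
import random

def factorize(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=50, seed=0):
    """SGD matrix factorization: learn U (users x k) and V (items x k)
    so that dot(U[u], V[i]) approximates rating r for each (u, i, r)."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(U[u][f] * V[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # gradient step on squared error with L2 regularization
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V
```

Well-tuned variants of exactly this kind of model are what [Rendle et al., 2019] found competitive with far more complex approaches.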
24. EASE: Embarrassingly Shallow Auto-Encoders [Steck, 2019]
[Figure: user–item matrix R multiplied by an item-by-item matrix X, with zeros forced on the diagonal of X, approximating R]
● Super-efficient model to train in a collaborative filtering setting, inspired by SLIM
● Learn an item-by-item matrix X such that R·X is close to R and diag(X) = 0
○ Avoids the trivial solution of the identity
● Closed-form solution
● More on that: auto-encoders that don’t overfit towards the identity
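The closed-form solution from [Steck, 2019] fits in a few lines of NumPy: invert the regularized Gram matrix, rescale its columns, and zero the diagonal. A minimal sketch (the value of `lam` is illustrative; in practice it needs tuning):

```python
import numpy as np

def ease(R, lam=100.0):
    """Closed-form EASE [Steck, 2019]: item-by-item weight matrix B
    minimizing ||R - R.B||^2 + lam*||B||^2 subject to diag(B) = 0."""
    G = R.T @ R + lam * np.eye(R.shape[1])  # regularized Gram matrix
    P = np.linalg.inv(G)
    B = -P / np.diag(P)       # B_ij = -P_ij / P_jj for the off-diagonal
    np.fill_diagonal(B, 0.0)  # enforce the zero-diagonal constraint
    return B
```

Scoring is then a single matrix product, `R @ B`, which is why training and serving are so cheap compared to iterative methods like SLIM.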
31. From Correlation to Causation
● Most recommendation algorithms are correlational
○ Some early recommendation algorithms literally computed correlations between users and items
● Did you watch a movie because we recommended it to you? Or because you liked it? Or both?
● If you had to watch a movie, would you like it? [Wang et al., 2020]
● Moving from p(Y|X) to p(Y|X, do(R))
(Figure from http://www.tylervigen.com/spurious-correlations)
32. Feedback loops
[Figure: a cycle — impression bias inflates plays → inflated item popularity → more impressions → more plays]
Feedback loops can cause biases to be reinforced by the recommendation system!
[Chaney et al., 2018]: simulations showing that this can reduce the usefulness of the system
They’re real: we have seen oscillations in the distribution of genre recommendations.
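The mechanics of the cycle are easy to reproduce in a toy simulation, in the spirit of (though far simpler than) the simulations in [Chaney et al., 2018]: a recommender that always shows the most-played item, whose impressions in turn generate plays. All parameters here are made up for illustration.

```python
import random

def simulate(num_items=50, rounds=200, seed=0):
    """Toy popularity feedback loop: each round, the impression goes to the
    currently most-played item; plays then reinforce its popularity."""
    rng = random.Random(seed)
    plays = [1] * num_items  # start from a uniform play count
    for _ in range(rounds):
        top = max(range(num_items), key=lambda i: plays[i])  # impression bias
        if rng.random() < 0.5:   # member plays the impressed item
            plays[top] += 1
        if rng.random() < 0.1:   # a little organic discovery elsewhere
            plays[rng.randrange(num_items)] += 1
    return plays
```

After a few hundred rounds, a single item dominates the play counts even though all items started out identical, which is the bias-reinforcement pattern the slide describes.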
38. Challenges in Causal Recommendations
● Handling unobserved confounders
● Coming up with the right causal graph
● High variance (especially for propensity-based estimators)
● Computational challenges (e.g. [Wong, 2020])
● Off-policy evaluation
● When and how to introduce exploration
40. Why contextual bandits for recommendations?
● Break feedback loops
● Want to explore to learn
● Uncertainty around member interests and new items
● Sparse and indirect feedback
● Changing trends
Early news example: [Li et al., 2010]
42. Recommendation as Contextual Bandit
● Environment: Netflix homepage
● Context: Member
● Arm: Display video at top of page
● Policy: Selects a video to recommend
● Reward: Member plays and enjoys video
[Figure: the video selector choosing which title to display at the top of the page]
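A drastically simplified sketch of that loop: contexts are members, arms are videos, and the reward is whether the member plays and enjoys the video. This is a tabular epsilon-greedy policy for illustration only; production systems use learned models over rich context features, and the class and parameter names here are invented.

```python
import random
from collections import defaultdict

class EpsilonGreedyBandit:
    """Tabular contextual bandit: per-(context, arm) reward averages,
    with epsilon-greedy exploration."""
    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = arms
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # (context, arm) -> pulls
        self.means = defaultdict(float)   # (context, arm) -> mean reward

    def select(self, context):
        if self.rng.random() < self.epsilon:  # explore
            return self.rng.choice(self.arms)
        # exploit: arm with the highest estimated reward for this context
        return max(self.arms, key=lambda a: self.means[(context, a)])

    def update(self, context, arm, reward):
        key = (context, arm)
        self.counts[key] += 1
        # incremental mean update
        self.means[key] += (reward - self.means[key]) / self.counts[key]
```

The `epsilon` exploration is what breaks the feedback loop from slide 32: some impressions go to arms the current estimates would never pick.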
44. Causality & Bandits [Dimakopoulou et al., 2021]
● Data collected from bandits is not IID
○ Bandits collect data adaptively
○ Initial noise may mean choosing an arm less often, which can keep its sample mean low
● Inverse Propensity Weighting? High variance
○ Take inspiration from Doubly Robust estimators
● Doubly Adaptive Thompson Sampling (DATS)
○ Thompson Sampling using the distribution of the Adaptive Doubly Robust estimator in place of the posterior
○ DATS performs better in practice and matches the TS regret bound
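For reference, this is the vanilla Beta-Bernoulli Thompson Sampling that DATS modifies: sample a plausible reward rate for each arm from its posterior, and pull the arm whose sample is highest. This sketch is the standard TS baseline, not the DATS estimator from [Dimakopoulou et al., 2021].

```python
import random

def thompson_sampling(true_probs, rounds=2000, seed=0):
    """Beta-Bernoulli Thompson Sampling over len(true_probs) arms;
    returns how many times each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # Beta(1, 1) uniform prior
    beta = [1.0] * k
    pulls = [0] * k
    for _ in range(rounds):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward      # posterior update on success
        beta[arm] += 1 - reward   # posterior update on failure
        pulls[arm] += 1
    return pulls
```

DATS keeps this sample-and-pull structure but replaces the Beta posterior with the distribution of an adaptively weighted doubly robust estimator, which corrects for the non-IID way the data was collected.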
45. Challenges with bandits in the real world
● Designing good exploration is an art
○ Especially to support future algorithm innovation
○ Challenging to do member-level A/B tests comparing fully on-policy bandits at high scale
● Bandits over large action spaces: rankings and slates
● Layers of bandits that influence each other
● Handling delayed rewards
46. Going Long-Term
● Want to maximize long-term member joy
● Involves many member visits, recommendation actions and delayed reward
● … sounds like Reinforcement Learning
47. How long?
● Within a page: RL to optimize a ranking or slate
● Within a session: RL to optimize multiple interactions in a session
● Across sessions: RL to optimize interactions across multiple sessions
48. Building simulators for evaluating recommenders [McInerney et al., 2021]
[Figure: simulator scopes — ranking, page-level, and whole system (Accordion)]
49. Many potential directions
● Embeddings for actions: List-wise [Zhao et al., 2017] or Page-wise recommendation [Zhao et al., 2018] based on [Dulac-Arnold et al., 2016]
● Adversarial model for user simulator: GAN-like model [Chen et al., 2019]
● Policy Gradient: Candidate generator using REINFORCE and TRPO [Chen et al., 2019]
● Multi-task: Additional model head or Actor-Critic [Xin et al., 2020], Auxiliary tasks for REINFORCE [Chen et al., 2021]
● Handling Diversity [Hansen et al., 2021], Slates [Ie et al., 2020], & Multiple Recommenders [Zhao et al., 2020]
● ...
51. What is your recommender trying to optimize?
● We want to optimize long-term member joy
● While accounting for:
○ Avoiding “trust busters”
○ Cold-starting
○ Fairness
○ Findability
○ ...
53. Layers of Metrics
Example case: Misaligned Metrics
● Training Objective: RMSE
● Offline Metric: NDCG on historical data
● Online Metric: Member Engagement in A/B test
● Goal: Joy
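The first two layers are easy to state precisely, which is part of why they get optimized even when misaligned with the goal. A minimal sketch of both (single-list NDCG; production metrics average over many lists and truncate at some rank k):

```python
import math

def rmse(preds, targets):
    """Root mean squared error: the training-objective layer."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds))

def ndcg(relevances):
    """NDCG for one ranked list: the offline-metric layer.
    `relevances` are true relevance grades listed in predicted rank order."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

The remaining layers, engagement in an A/B test and ultimately joy, have no such closed form, which is exactly where the misalignment creeps in.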
55. Recap [More et al., 2019]
● Bandit replay-style metrics can have high variance due to a low number of matches with large action spaces
● Use a ranking approach: good to rank high-reward arms near the top and low-reward arms near the bottom
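To see where the variance comes from, here is a sketch of a replay-style estimator in the spirit of [Li et al., 2010]: only logged events where the candidate policy agrees with the logged action contribute, so with a large action space very few events match.

```python
def replay_evaluate(policy, logs):
    """Replay-style offline evaluation: average reward over logged
    (context, action, reward) events where policy(context) matches the
    logged action. Returns (average reward, number of matches)."""
    total, matches = 0.0, 0
    for context, logged_action, reward in logs:
        if policy(context) == logged_action:
            total += reward
            matches += 1
    return (total / matches if matches else 0.0), matches
```

With K possible actions and uniform logging, only about 1/K of events match, so the estimate rests on a tiny sample — the high-variance problem the ranking-based metric of [More et al., 2019] is designed to avoid.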
56. Challenges in objectives
● Nuanced metrics:
○ Differences between what you want and what you can encapsulate in a metric
○ Where does enjoyment come from? How does that vary by person?
○ How do you measure that at scale?
● What about effects beyond the typical A/B time horizon?
● Incorporating fairness
○ Calibration to the distribution of user tastes [Steck, 2018]
○ Item cold-start [Zhu et al., 2021]
● Beyond algorithms: Ensuring a positive impact on society