Balancing Discovery and Continuation in Recommendations

Hossein Taghavi
With: Ashok Chandrashekar, Linas Baltrunas, and Justin Basilico
RecSysTV 2016

Outline
§ Background: Netflix recommendations
§ Recommending for different modes of watching
§ Case study: Continue Watching row
§ Conclusions

Evolution of Netflix
2006 → 2016

Netflix Scale
§ > 83M members
§ > 190 countries
§ > 1000 device types
§ > 3.7B hours of content
streamed every month
§ 36% of peak US
downstream traffic

§ Recommendations through
predicted star rating
§ Contest:
§ Accuracy measured by root
mean squared error (RMSE)
§ Improve by 10% = $1 million!
§ Data size:
§ 100M ratings (back then
“almost massive”)

Recommendation System: Ideal State
Turn on Netflix, and the absolute best content for you
would automatically start playing

Meanwhile…
Create a page of recommendations where the titles you are
most likely to watch and enjoy are shown on the most
visible parts of the page

Everything is a Recommendation
Title Ranking
Row Selection & Ordering
Recommendations are driven by machine learning algorithms
Over 80% of what members watch comes from our recommendations

How the Homepage is Built
§ The titles are organized as rows
§ Ordering of titles within rows depends on the row type
§ Selection and ordering of rows:
§ Personalized page generation
algorithm
§ Also some business rules and
constraints
§ Balance thematic coherence,
relevance, and diversity

Various Types of Member Interactions/Feedback
§ Plays
§ How long, pause, rewind, skip, etc.
§ Rating and social
§ Rate, like, share
§ Context
§ Time, location, device, language
§ Interactions
§ Scrolling, opening a title page,
search, list add

Building the Recommendations is Data Driven
§ Try an idea offline using historical
data to see if it would have made
better recommendations
§ Offline metrics: AUC, nDCG, Recall, …
§ If it did, deploy a live A/B test to see
if it performs well in Production
§ Primary metric: Member retention
(Diagram: Idea / Problem, Data, Algorithm, Model, Metrics, A/B Testing)
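
Two of the offline metrics named on this slide can be computed from a ranked list and the set of titles a member actually played. The following is a minimal sketch in plain Python, assuming binary relevance (played or not played). The function names and toy data are illustrative and do not come from the talk.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical toy data: a ranking produced by some model, and what was played.
ranked = ["a", "b", "c", "d"]
played = {"b", "d"}

recall_top2 = recall_at_k(ranked, played, 2)
ndcg_top4 = ndcg_at_k(ranked, played, 4)
```

A ranking that places every played title first scores an nDCG of 1.0, so the metric rewards putting the right titles in the most visible slots.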

For More Reading
§ Netflix tech blog:
§ bit.ly/beyondfivestars
§ bit.ly/learnapage
§ bit.ly/sparktimetravel

Building recommendation algorithms that are
balanced for different modes of watching

What Is the Most Likely Title You Will Watch?
The same one you watched last time!
§ A large portion of watching hours is spent in continue-watching
mode

Different Modes of Watching
§ Continuation: Resume a
recently watched TV show or movie
§ List: Play a title previously
added to My List
§ Rewatch: Rewatch a title
enjoyed in the past
§ Discovery: Discover a new
title to watch

Recommending for Different Modes:
Approach 1
§ Build one unified model for ranking the titles in each row
and one for ranking rows
§ Optimized for the likelihood of play/enjoyment from the page
§ Benefits:
§ Fewer models to maintain
§ Fewer A/B tests

Approach 1: Challenges
§ Members behave differently in different modes
§ Different row types are designed for different behaviors
§ Hard to capture and balance all that in one objective
§ E.g., simply ranking titles by likelihood of play will fill the page with
already-watched titles → Poor member experience
§ Recommendations for different modes have different
sensitivities to member actions
§ Continuation recs may react immediately to watching activities,
My List recs may react to My List add/remove activities, etc.

Approach 2: Dedicated Models + Blend
§ Build separate models for each mode
§ Blend the results on the page
§ Blending can be done through a model trained offline, or a
parameter tuned online
§ E.g., one or more dedicated rows for each mode
§ Pro:
§ More modular, provides more intuitive knobs for balancing
§ Con:
§ Less elegant, more maintenance

Case Study: Continue Watching Row

Continue Watching Row: The Past
§ CW row was shown on some devices
§ Videos sorted by recency of last watch
§ Row appearance on page by business rules
§ On the website, only a single CW title
§ A very significant fraction of plays are continuations
§ CW deserved a better treatment

Objective
§ Unify the CW row across devices
§ Optimize the row in two dimensions:
§ Row position on page
§ Place it higher when the member is more
likely to resume a video
§ Re-order the titles within the CW row
§ By their likelihood to be resumed in the
current session

Some Intuitive Patterns
§ Member may be more likely to want to
§ Resume a video if:
§ In the middle of binging a TV show
§ Partially watched a movie recently
§ Often watched it around this time of the day, location, or on the current
device
§ Discover a new title if:
§ Just finished a movie or completed all episodes of a show
§ Hasn’t watched anything recently
§ Is a relatively new member

Building a Recommendation Model for CW
§ Feature Brainstorm
§ Training Data
§ Models and Metrics
§ Implementation

Feature Ideas
§ Member-level:
§ Member’s subscription: tenure, country, language
§ How active has the member been recently
§ Member past ratings, genre preferences, etc.

Feature Ideas
§ Video and member’s previous interactions with it:
§ How recently was the video added to the catalog, watched, ...
§ How much of the movie/show was watched
§ Video metadata:
§ Type and genre of video, # episodes
§ E.g., kids titles may be re-watched more
§ What else is in the catalog
§ Popularity and relevance of the video
§ How often do members resume this video

Feature Ideas
§ Contextual:
§ Time of the day and day of the week
§ Location at various resolutions
§ Device

Title Ranking Model
§ Training data
§ Continuation sessions
§ Look at which of the recently watched titles were played
§ Model
§ Learn-to-rank: Linear/ensembles/…
§ Optimize for how well we rank the played title among other titles
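
One way to quantify "how well we rank the played title among other titles" is the reciprocal rank of the played title, averaged over continuation sessions. The sketch below assumes each session records a score per candidate title plus the title that was actually played. That data layout, the names, and the choice of metric are assumptions for illustration; the talk does not say which ranking metric was optimized.

```python
def reciprocal_rank(scores, played):
    """1 / rank of the played title when candidates are sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return 1.0 / (ordered.index(played) + 1)

def mean_reciprocal_rank(sessions):
    """Average reciprocal rank over (scores, played_title) pairs."""
    return sum(reciprocal_rank(s, p) for s, p in sessions) / len(sessions)

# Hypothetical continuation sessions: candidate scores and the title resumed.
sessions = [
    ({"show_a": 0.9, "movie_b": 0.4, "show_c": 0.1}, "show_a"),  # played title ranked 1st
    ({"show_a": 0.2, "movie_b": 0.7, "show_c": 0.5}, "show_c"),  # played title ranked 2nd
]
mrr = mean_reciprocal_rank(sessions)
```

A recency ordering can be evaluated with the same functions by using the negated time since last play as the score, which gives a like-for-like comparison between a learned ranker and a recency baseline.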

Title Ranking Model: Performance
§ Baseline: Ranking by recency of
last play
§ Recency rank was also an
important feature in the model
§ Metrics significantly higher than
the baseline
§ E.g. Significant lift in precision
§ A/B testing also showed
improvements

Row Placement Model
§ Objective
§ Estimate the likelihood of continuation vs. discovery
§ Map that likelihood to a position on the page
§ Simplification:
§ Fix two candidate positions on the page and apply a threshold
§ Tune the threshold to optimize some accuracy metric
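
The simplification above amounts to a single comparison. As a minimal sketch, assuming hypothetical row indices (0 for the top slot, 3 for the lower slot) and a placeholder threshold of 0.5, none of which are given in the talk:

```python
def cw_row_position(p_continuation, threshold=0.5, high_position=0, low_position=3):
    """Pick one of two fixed row slots from the estimated continuation probability.

    The default threshold and slot indices are illustrative placeholders.
    """
    return high_position if p_continuation >= threshold else low_position

likely_resume = cw_row_position(0.8)    # member probably wants to continue
likely_discover = cw_row_position(0.2)  # member probably wants something new
```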

Row Placement Model: Training
§ Training data
§ Randomly select sessions with plays globally
§ Model
§ Binary classification of continuation vs. discovery sessions
§ Evaluated using classification and ranking metrics

Row Placement Model: Performance
§ Metrics
§ Achieved high classification metrics for predicting continuation vs
discovery
§ Error types:
§ False positives → CW occupies top of the page unnecessarily
§ False negatives → Difficult for member to find the CW title
§ Placing the row
§ Threshold trades off FP and FN → Hard to tune offline
§ Tuned the threshold by A/B testing
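
The trade-off between the two error types can be seen by sweeping the threshold over labeled sessions. The sketch below uses made-up probabilities and labels purely for illustration. It shows why offline counts alone do not pick a threshold: they do not say how costly each error type is to the member.

```python
def confusion_counts(probs, labels, threshold):
    """Count false positives and false negatives at a given threshold.

    labels: 1 = continuation session, 0 = discovery session.
    """
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    return fp, fn

# Hypothetical model outputs and ground-truth session types.
probs = [0.9, 0.7, 0.6, 0.4, 0.2]
labels = [1, 0, 1, 1, 0]

# Maps each candidate threshold to (false positives, false negatives).
sweep = {t: confusion_counts(probs, labels, t) for t in (0.3, 0.5, 0.8)}
```

Raising the threshold removes false positives at the price of more false negatives.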

Reusing the Title Ranking Model
§ Use the title-level scores
§ Calibrate scores to get probability $P_t$ of continuation for each CW
title $t$
§ Aggregate into an overall probability of continuation
§ E.g., assuming independence:
$P_{CW} = 1 - \prod_{t \in CW} (1 - P_t)$
§ Pro: Avoids maintaining two separate models
§ Con: Not as accurate as a dedicated model
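
The independence-based aggregation is a one-liner in code. A minimal sketch, assuming the per-title probabilities are already calibrated; the function name and example values are illustrative (requires Python 3.8+ for `math.prod`):

```python
import math

def prob_any_continuation(title_probs):
    """P_CW = 1 - prod(1 - P_t), assuming titles are resumed independently."""
    return 1.0 - math.prod(1.0 - p for p in title_probs)

# Hypothetical calibrated probabilities for three titles in the CW row.
p_cw = prob_any_continuation([0.5, 0.2, 0.1])  # 1 - 0.5 * 0.8 * 0.9
```

With no titles in the row the product is empty and the result is 0.0, which matches the intuition that there is nothing to continue.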

Context Awareness
§ A title ranks highest at the same time of day and on the same device
as its last play
§ Experiment:
§ Played “Sid the Science Kid” on iPhone
§ Played “Narcos” on the website
→ Different ranking on iPhone and Web

Serving the CW Row in Production
§ Score cannot be precomputed → Real-time or near-real-time
§ Some features are context dependent
§ Row should refresh each time a member watches a title
§ Need to push updates to clients to keep the row fresh
§ Latency bottleneck: Data transfers from the cache to
computation backend
§ Requires careful backend engineering
§ Fallback strategy: If computation fails, can use recency ranking
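
The fallback strategy can be sketched as a wrapper around the scoring call. This is an illustration under assumptions, not the production design: the `id` and `last_watched` keys, the `score_fn` callable, and the broad exception handling are all hypothetical.

```python
def rank_cw_titles(titles, score_fn):
    """Rank CW titles by model score; fall back to recency if scoring fails.

    titles: list of dicts with hypothetical keys 'id' and 'last_watched' (timestamp).
    score_fn: hypothetical callable returning a model score for one title.
    """
    try:
        scores = {t["id"]: score_fn(t) for t in titles}
        return sorted(titles, key=lambda t: scores[t["id"]], reverse=True)
    except Exception:
        # Fallback: most recently watched first. A real service would likely
        # catch narrower errors (timeouts, missing features) and log them.
        return sorted(titles, key=lambda t: t["last_watched"], reverse=True)

titles = [
    {"id": "a", "last_watched": 100},
    {"id": "b", "last_watched": 200},
]

def healthy_scorer(title):
    return {"a": 0.9, "b": 0.1}[title["id"]]

def failing_scorer(title):
    raise RuntimeError("scoring backend unavailable")

model_order = [t["id"] for t in rank_cw_titles(titles, healthy_scorer)]
fallback_order = [t["id"] for t in rank_cw_titles(titles, failing_scorer)]
```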

Conclusions and Future Directions

Conclusions
§ Important to understand different modes of behavior
§ Continuation is a key driver of streaming hours
§ Improving CW recommendations improves member experience
§ A/B testing showed significant boost in user engagement
§ Future:
§ Incorporate the placement of CW row (and others) into the main
page construction model
§ When can we automatically start resuming a title?

Questions?
Upcoming blog post on this topic at: techblog.netflix.com
Job openings: jobs.netflix.com
