Sequential Decision Making in Recommendations
Jaya Kawale
Toronto Machine Learning Conference, Nov 2019
Quickly help members discover content they’ll love
Global Members, Personalized Tastes
158 Million Subscribers
~200 Countries
From what shows to recommend
Personalized images.
... to what images to select
... to reaching out to our members
Everything is a Recommendation!*
* Xavier Amatriain and Justin Basilico - https://medium.com/netflix-techblog/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429
Sequence of decisions
[Figure: timeline of member events over time: visit, click, play, visit, visit, play, thumbs up, my list add]
Optimize satisfaction
[Figure: member event timeline (visit, click, play, visit, visit, play) over time, with Satisfaction as the quantity to optimize]
How should recommendation algorithms take this into account?
Sequential Decision Making
● Recommendation
● Clinical Trials
● Network Routing
● Online Advertising
● AI for Games
● Hyperparameter Optimization
[Figure labels: Observations, Rewards]
Recommendation as a Sequential Decision Making Problem
[Figure: interaction loop between Learner and Environment, labeled Context, Action, Reward]
Why is it challenging?
… Because we don’t know!
● The current environment or state: We may not have full knowledge of the state we are in.
● Reward: Taking an action at some state results in some reward not known beforehand.
● The transition dynamics: No knowledge of how the environment or state changes due to a particular action.
Multi-Armed Bandits
● A gambler playing multiple slot machines with unknown reward distributions
● Which machine to play to maximize reward?
Multi-Armed Bandit For Recommendation
Exploration-exploitation tradeoff:
Recommend the optimal title given the evidence, i.e. exploit
OR
recommend other titles to gather feedback, i.e. explore.
Numerous Variants
● Different strategies: ε-Greedy, Thompson Sampling (TS), Upper Confidence Bound (UCB), etc.
● Different environments:
○ Stochastic and stationary: Reward is generated i.i.d. from a distribution specific to the action. No payoff drift.
○ Adversarial: No assumptions on how rewards are generated.
● Different objectives: Cumulative regret, tracking the best expert
● Continuous or discrete set of actions, finite vs infinite
● Extensions: Varying set of arms, Contextual Bandits, etc.
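Two of the strategies named above can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: it assumes a stochastic, stationary environment with Bernoulli (play / no play) rewards, the standard UCB1 index, and a uniform Beta(1, 1) prior for Thompson Sampling. The function names `ucb1_select` and `thompson_select` are hypothetical.

```python
import math
import random


def ucb1_select(counts, means):
    """UCB1: pick the arm with the highest optimistic estimate.

    counts[i] is how often arm i was played, means[i] its average reward.
    Arms that were never played are tried first.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [m + math.sqrt(2.0 * math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def thompson_select(successes, failures, rng):
    """Thompson Sampling for Bernoulli rewards (assumed Beta(1, 1) prior).

    Draw one sample per arm from its Beta posterior and play the arm
    whose sample is largest.
    """
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```

Both functions only choose an arm; the caller is expected to observe the reward and update the counts afterwards.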
Epsilon Greedy for MABs
● Explore (with probability ε): unbiased training data
● Exploit (with probability 1-ε): greedy, select optimal action
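The ε / 1-ε split can be sketched as follows. This is a minimal illustration under stated assumptions: each arm keeps a running mean reward, exploration picks uniformly at random, and the names `epsilon_greedy_select` and `update_estimate` are hypothetical.

```python
import random


def epsilon_greedy_select(estimates, epsilon, rng):
    """With probability epsilon explore uniformly, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: uniform choice, which yields unbiased training data.
        return rng.randrange(len(estimates))
    # Exploit: greedy choice of the arm with the best current estimate.
    return max(range(len(estimates)), key=estimates.__getitem__)


def update_estimate(estimate, count, reward):
    """Incrementally update the running mean reward of one arm."""
    count += 1
    estimate += (reward - estimate) / count
    return estimate, count
```

A usage loop would call `epsilon_greedy_select`, show the chosen title, observe the reward, and pass it to `update_estimate` for that arm.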
Greedy Exploit Policy
[Figure: Member Features and a Candidate Pool feed Model 1 through Model 4; each model outputs a Probability Of Play, and the highest-scoring candidate is the Winner]
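One way to read the greedy exploit policy is: score every candidate with its model and pick the candidate with the highest predicted probability of play. The sketch below assumes, purely for illustration, one logistic model per candidate title; the model form, the feature vectors, and the names `probability_of_play` and `greedy_winner` are hypothetical and not taken from the talk.

```python
import math


def probability_of_play(weights, bias, member_features):
    """Hypothetical per-title logistic model: P(play | member features)."""
    z = bias + sum(w * x for w, x in zip(weights, member_features))
    return 1.0 / (1.0 + math.exp(-z))


def greedy_winner(title_models, member_features):
    """Score every candidate and return the one with the highest probability.

    title_models maps a title id to a (weights, bias) pair.
    """
    scores = {
        title: probability_of_play(weights, bias, member_features)
        for title, (weights, bias) in title_models.items()
    }
    winner = max(scores, key=scores.get)
    return winner, scores
```

Usage: `greedy_winner({"title_a": ([1.0, 0.0], 0.0), "title_b": ([0.0, 1.0], 0.0)}, [2.0, -1.0])` selects `"title_a"`, because its predicted probability of play is higher for this member.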
Considerations for the greedy policy
● Explore
○ Bandwidth allocation and cost of exploration
○ New vs existing titles
● Exploit
○ Title availability
○ Frequency of model update
○ Incremental updates vs batch training
■ Non-stationarity of title popularities
Opportunity Cost
The Netflix homepage is expensive real estate:
- so many titles to promote
- so few opportunities to win a “moment of truth”
[Figure: Probability of Play over Days D1 to D5, with a "Promote?" decision at each day]
Lots of feedback loops...
p(Y|X, do(R))
Recommendation
Build policy, e.g. what R leads to max Y?
Some approaches
● Bandit approaches (with caveats)
● Counterfactual Risk Minimization [Swaminathan & Joachims, 2015]
● IPS Estimator for MF [Schnabel et al., 2016]
○ Train a debiasing model and reweight the data
● Causal Embeddings [Bonner & Vasile, 2018]
○ Jointly learn debiasing model and task model
○ Regularize the two towards each other
● Doubly-Robust MF [Wang et al., 2019]
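Several of these approaches build on inverse propensity scoring (IPS): logged rewards are reweighted by how much more (or less) likely the new policy is to take the logged action than the logging policy was. The sketch below is the generic IPS value estimate, not the specific method of any paper cited above. It assumes the logging probabilities were recorded and are non-zero; the log layout and the name `ips_estimate` are hypothetical.

```python
def ips_estimate(logs, target_policy):
    """Estimate the average reward of target_policy from logged data.

    logs: iterable of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability with which the logging policy
          chose `action` in `context` (assumed recorded and > 0).
    target_policy: function (context, action) -> probability under the
          policy we want to evaluate.
    """
    total = 0.0
    n = 0
    for context, action, reward, logging_prob in logs:
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
        n += 1
    return total / n
```

The estimate is unbiased only if the logging policy gave non-zero probability to every action the target policy can take, and its variance grows when the two policies differ strongly.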
Reinforcement Learning
Long-term Reward: Road to RL
● Maximize long-term user satisfaction rather than immediate plays, clicks, or duration.
Some Preliminaries
● State: everything we know about the user
● Action
● Transition: changing user preferences
● Reward
● Initial state: some starting point / user state
● Discount factor
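The discount factor controls how much future rewards count relative to immediate ones. A minimal sketch of the discounted return for one finite sequence of rewards, assuming a discount factor `gamma` between 0 and 1 (the function name `discounted_return` is hypothetical):

```python
def discounted_return(rewards, gamma):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a finite episode."""
    g = 0.0
    # Work backwards so each step is one multiply and one add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With `gamma` close to 0 the learner is myopic and mostly values the next interaction; with `gamma` close to 1 later rewards weigh almost as much as immediate ones.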
Policy Gradients
● Learn a policy that maximizes the cumulative future reward from time t.
● Maximization solved by gradient ascent w.r.t. some policy parameter.
● E.g. REINFORCE*
*Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, Ed H. Chi: Top-K Off-Policy
Correction for a REINFORCE Recommender System. WSDM 2019: 456-464
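The basic REINFORCE update can be sketched for the simplest case. This is plain on-policy REINFORCE for a stateless softmax policy over a handful of actions. It does not include the top-K or off-policy correction of the cited paper, and all names are hypothetical. For a softmax policy, the gradient of log π(a) with respect to the logit of action i is 1[i = a] - π(i).

```python
import math
import random


def softmax(theta):
    """Turn one logit per action into action probabilities."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def sample_action(theta, rng):
    """Sample an action from the softmax policy."""
    return rng.choices(range(len(theta)), weights=softmax(theta))[0]


def reinforce_update(theta, action, ret, lr):
    """One REINFORCE step: theta += lr * return * grad log pi(action)."""
    probs = softmax(theta)
    return [
        t + lr * ret * ((1.0 if i == action else 0.0) - p)
        for i, (t, p) in enumerate(zip(theta, probs))
    ]
```

A positive return raises the probability of the action that was taken and lowers the others; in practice a baseline is usually subtracted from the return to reduce variance.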
Deep Q-Learning
● Q-value: Optimal value for a state-action pair.
● Off-policy algorithm
● Directly learn the function to approximate the Q-value.
● Challenges in training and making it work in practice.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin A.
Riedmiller: Playing Atari with Deep Reinforcement Learning CoRR abs/1312.5602 (2013)
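The update behind Q-learning is easiest to see in the tabular case. The sketch below is tabular Q-learning, not the deep variant of the cited paper, which replaces the table with a neural network. The learning rate `alpha`, the discount `gamma`, and the name `q_learning_update` are assumptions for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.5, gamma=0.9):
    """Move Q(state, action) towards reward + gamma * max_a Q(next_state, a).

    q is a dict keyed by (state, action); missing entries count as 0.0 when
    q is a defaultdict(float). Returns the updated Q-value.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]
```

The target uses the best next action regardless of which action the behavior policy actually takes next, which is what makes the algorithm off-policy.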
Many more challenges...
● High-dimensional action space: Recommending a single item is O(|C|); typically we want to do ranking or page construction, which is combinatorial, e.g. Marginal Slates [Dimakopoulou et al., 2019] or SlateQ [Ie et al., 2019]
● Off-policy correction: Need to learn and evaluate from existing system actions, e.g. [Chen et al., 2019] or ReCap [More et al., 2019]
● Good simulators: Requires knowing user feedback on recommended items, e.g. [Rohde et al., 2018]
● Changing rewards: Every action may change our ‘ground truth’
● Changing action space: New actions (items) become available and need to be cold-started.
Thank you.
Jaya Kawale (jkawale@netflix.com)
Twitter: @jayakawale
