A/B Testing at Netflix:
Experimentation Platform
Steve Urban
experimentation@netflix.com
Importance of A/B Testing at Netflix
• Technology is just one part of the equation: a culture of experimentation is the other essential part
• All product ideas are subjected to the scientific method, with actual data supporting changes before they are rolled out to all users
• The effectiveness of any idea is measured without bias: the seniority of the person proposing the idea is irrelevant
Our Users
A/B testing enables product decisions throughout Netflix, and our users are spread across all departments:
• Data Scientists: Does this new ranking algorithm result in more plays?
• Product Managers: Does this new UI reduce the time it takes users to find content?
• Marketing: Which email campaign resulted in more new subscribers?
• Content: Which thumbnail image resulted in more streams of Daredevil?
• Engineers: Is the new implementation of this streaming algorithm more performant when internet connectivity is spotty?
• and so on...
A/B Testing Platform Objectives
• Being an internal tool is not an excuse for poor UX
• Given the diverse expertise of our users, workflows must be simple and effective while providing value
• Cover all generic test management scenarios
• Easily accommodate unique experimentation needs as they come up
• Ingest and combine real-time behavioral data and batch metadata from numerous sources
We’re Hiring
Netflix has a unique culture. Read about it here.
We’re looking for a Full-Stack Engineer to help across the board:
• Collaborate with users across Netflix to understand their UI needs
• Be part of a team of engineers and UX experts
• Tech stack: Java, React, Node
• Data visualization experience is a plus
We need a Server-Side Engineer with expertise designing distributed systems:
• Help design and rebuild our allocation engine
• Experience processing large datasets, including efficient incorporation of near real-time data
• Expertise with various Big Data databases
• Machine learning experience is a plus
WAIT, THAT’S NOT ENOUGH
I WANT TO GO DEEPER
Which Version is Better? A or B
Which set of recommendations is better? Given that I watched House of Cards... A or B
Hard to Answer Without Disciplined Experimentation: A? or B?
A/B Testing Process
Target Population
Hypothesis: Retention and/or engagement will improve with the new recommendation algorithm.
Process: Randomly group users into different buckets. Other than the test experience itself, all other factors are held constant.
• Control Group: continues to experience the current version (A)
• Test Group B: experiences version B
• Test Group C: experiences version C
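Randomly grouping users into buckets is often done by hashing a stable user identifier, so that a user keeps the same experience on every visit. The sketch below is a minimal illustration under that assumption; it is not Netflix's allocation engine, and `assign_cell`, the test ID `rec-algo-test`, and the cell labels are hypothetical. It performs plain uniform assignment, without the stratification discussed under Allocation & Stratification.

```python
import hashlib


def assign_cell(user_id: str, test_id: str, cells: list) -> str:
    """Map a user to one of `cells` uniformly and deterministically.

    Hashing (test_id, user_id) gives each test its own independent split,
    and the same user always lands in the same cell for a given test.
    """
    digest = hashlib.sha256(f"{test_id}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # 64-bit integer from the hash
    return cells[bucket % len(cells)]


# Hypothetical cells for the recommendation-algorithm test above.
cells = ["A (control)", "B", "C"]
counts = {c: 0 for c in cells}
for i in range(30_000):
    counts[assign_cell(f"user-{i}", "rec-algo-test", cells)] += 1
print(counts)  # roughly 10,000 users per cell
```

Salting the hash with the test ID keeps a user's assignment in one test independent of their assignment in any other test.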
A/B Testing Process Continued: Analyze & Compare Key Results
• Algorithm A (Control): viewing hours delta N/A, as this is what we are measuring the other options against
• Algorithm B: viewing hours delta +2.3%, statistically significant: yes. 2.3% better than the control, and we’re confident about it
• Algorithm C: viewing hours delta -5.7%, statistically significant: yes. Ouch! Don’t use this algorithm.
• ...
Data Driven Results
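The deck does not say which statistical test produces "statistically significant: yes". As one common choice, assumed here, a large-sample two-sample z-test on mean viewing hours (unequal variances) yields both the relative delta and the significance call. The function name, the 0.05 threshold, and the synthetic data are illustrative assumptions, not Netflix's method or results.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def compare_to_control(control, treatment, alpha=0.05):
    """Compare mean viewing hours of a test cell against the control cell.

    Uses a large-sample two-sided z-test with unequal variances. Returns
    (relative delta vs. control, p-value, significant at `alpha`).
    """
    mean_c, mean_t = fmean(control), fmean(treatment)
    # Standard error of the difference between the two sample means.
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    z = (mean_t - mean_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (mean_t - mean_c) / mean_c, p_value, p_value < alpha


# Synthetic per-user viewing hours (made-up numbers, seeded for repeatability).
rng = random.Random(7)
cell_a = [rng.gauss(10.0, 3.0) for _ in range(20_000)]   # control
cell_b = [rng.gauss(10.23, 3.0) for _ in range(20_000)]  # roughly +2.3%
cell_c = [rng.gauss(9.43, 3.0) for _ in range(20_000)]   # roughly -5.7%

for name, cell in [("B", cell_b), ("C", cell_c)]:
    delta, p, significant = compare_to_control(cell_a, cell)
    print(f"{name}: delta={delta:+.1%} p={p:.2g} significant={significant}")
```

A significant positive delta corresponds to the "Algorithm B" outcome; a significant negative delta corresponds to "Algorithm C".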
Technology Stack
[Diagram: the Experimentation Service. Other Netflix services call its REST API to allocate customers and retrieve allocations. Components: define experiments (experiment criteria, sampling, metadata); evaluate eligibility; persist/retrieve allocations; ad hoc queries; persist metrics; real-time analysis & monitoring of health metrics; visualize.]
Allocation & Stratification
● Randomly distribute and assign customers to a variant in the experiment using Stratified Sampling
● Start, Stop, and Track allocations in near real-time
"Random sampling" with enforcement of sample proportions across regions. Example across all US regions:
Percentage of Users*:
  North East: 22%
  South East: 13%
  South West: 17%
  ...
*Numerical values are for illustrative purposes only and are totally made up
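Stratified sampling can be illustrated by dealing users round-robin into cells separately within each stratum (here, a region), so every cell inherits the population's regional proportions. This is a minimal batch sketch under that assumption; the deck describes near real-time allocation, which would likely need incremental per-stratum counters instead. `stratified_allocate` and the regional counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_allocate(users, cells, seed=0):
    """Assign each user to a cell so that every cell mirrors the stratum mix.

    `users` maps user_id -> stratum (e.g. region). Within each stratum, users
    are shuffled and dealt round-robin across `cells`, so each cell gets the
    same count from every stratum, to within one user.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for user_id, stratum in users.items():
        by_stratum[stratum].append(user_id)

    assignment = {}
    for stratum in sorted(by_stratum):          # fixed order for reproducibility
        members = sorted(by_stratum[stratum])
        rng.shuffle(members)                    # random order within the stratum
        start = rng.randrange(len(cells))       # spread leftover users across cells
        for i, user_id in enumerate(members):
            assignment[user_id] = cells[(start + i) % len(cells)]
    return assignment


# Hypothetical regional user counts, echoing the made-up percentages above.
regions = {"North East": 220, "South East": 130, "South West": 170}
users = {f"{r}-{i}": r for r, n in regions.items() for i in range(n)}
alloc = stratified_allocate(users, ["A", "B", "C"])
print(Counter((users[u], cell) for u, cell in alloc.items()))
```

Compared with pure random assignment, this guarantees that no cell is accidentally over-weighted toward one region, which matters when regional behavior differs.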
Segmentation
● Divide a broad target population into subsets with similar properties
● Some tests are meant to measure impact on specific populations
● Must maintain scale and low latencies
Segmentation of the target population by specific properties, for example:
○ Haven’t used a tablet to access Netflix in n days
○ Used a game console to access Netflix within the last n days
○ Smart TV users
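Segment definitions like the ones above can be expressed as composable predicates over per-user metadata. The sketch assumes a hypothetical `UserProfile` record with last-use dates per device type; all field and function names are illustrative, not a Netflix API. Meeting the low-latency requirement would likely mean evaluating such rules against precomputed metadata, which this sketch ignores.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Optional


@dataclass
class UserProfile:
    """Hypothetical per-user metadata used to evaluate segment rules."""
    last_tablet_use: Optional[date]
    last_console_use: Optional[date]
    has_smart_tv: bool


Rule = Callable[[UserProfile], bool]


def no_tablet_in_last(n_days: int, today: date) -> Rule:
    """Haven't used a tablet to access Netflix in n days (or ever)."""
    cutoff = today - timedelta(days=n_days)
    return lambda u: u.last_tablet_use is None or u.last_tablet_use < cutoff


def console_within_last(n_days: int, today: date) -> Rule:
    """Used a game console to access Netflix within the last n days."""
    cutoff = today - timedelta(days=n_days)
    return lambda u: u.last_console_use is not None and u.last_console_use >= cutoff


def smart_tv_user(u: UserProfile) -> bool:
    """Smart TV users."""
    return u.has_smart_tv


def all_of(*rules: Rule) -> Rule:
    """Combine rules: a user is in the segment only if every rule matches."""
    return lambda u: all(rule(u) for rule in rules)


today = date(2016, 5, 1)
segment = all_of(no_tablet_in_last(30, today), console_within_last(7, today))
user = UserProfile(last_tablet_use=date(2016, 1, 15),
                   last_console_use=date(2016, 4, 28), has_smart_tv=False)
print(segment(user))  # True
```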
Test Health
● Not all test experiences perform equally, but we must ensure that differences aren’t caused by buggy implementations
● Issues can be device-specific, so we must monitor at device, test, and experience granularity
● The example below is greatly simplified: we need to create visualizations that effectively convey test health, internationally, across thousands of devices
Example:
  Control Cell: no errors/fallbacks
  Experience A: issue on TV UI detected
  Experience B: no errors/fallbacks
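Monitoring at device, test, and experience granularity can start with per-(device, cell) error/fallback rates compared against the control cell on the same device. The sketch below uses a simple ratio-plus-floor rule, which is an assumption for illustration; `flag_unhealthy_cells`, its thresholds, and the event tuple format are hypothetical, and a production system likely needs time windows and proper statistical alerting.

```python
from collections import defaultdict


def flag_unhealthy_cells(events, control_cell, max_ratio=2.0, min_rate=0.01, min_events=100):
    """Return sorted (device, cell) pairs whose error/fallback rate looks anomalous.

    `events` yields (device, cell, is_error_or_fallback) tuples. A non-control
    cell is flagged on a device when it has at least `min_events` events and its
    rate exceeds both `min_rate` and `max_ratio` times the control rate there.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for device, cell, failed in events:
        totals[(device, cell)] += 1
        errors[(device, cell)] += int(failed)

    flagged = []
    for (device, cell), n in totals.items():
        control_n = totals.get((device, control_cell), 0)
        if cell == control_cell or n < min_events or control_n == 0:
            continue
        rate = errors[(device, cell)] / n
        control_rate = errors[(device, control_cell)] / control_n
        if rate > max(min_rate, max_ratio * control_rate):
            flagged.append((device, cell))
    return sorted(flagged)


def synthetic(device, cell, n, n_failed):
    """Build n hypothetical events, the first n_failed of which failed."""
    return [(device, cell, i < n_failed) for i in range(n)]


events = (
    synthetic("tv", "Control", 1000, 5) + synthetic("tv", "Experience A", 1000, 80)
    + synthetic("tv", "Experience B", 1000, 6) + synthetic("phone", "Control", 1000, 4)
    + synthetic("phone", "Experience A", 1000, 5) + synthetic("phone", "Experience B", 1000, 3)
)
print(flag_unhealthy_cells(events, control_cell="Control"))  # [('tv', 'Experience A')]
```

Comparing against the control on the same device keeps a device that is simply flaky for everyone from being blamed on a test experience.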
ABlaze UI: Test Lifecycle Management
Initial Planning: Test Configuration Screens
● Determine hypothesis
● Implement each test experience
Schedule Test: Scheduler View
● Define real-time rules & conditions
● Consider potential conflicts
Monitor Test: Dashboard and Alert Views
● Monitor test health over time
  ○ Real-time analysis and alerting on metrics and allocations
● Pull test if bugs/issues present themselves
Hypothesis Evaluation: Comparison Views
● Interactive filtering, analysis, & visualization of data
● Call success or failure of test
Implement or Re-Test
● Devise plan to roll winning experience (if any) out to production
● Else, potentially revise hypothesis and retest
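The lifecycle above reads naturally as a small state machine. The stage names and allowed transitions below are one hypothetical interpretation (for example, that a pulled test returns to planning), not ABlaze's actual model.

```python
from enum import Enum, auto


class Stage(Enum):
    PLANNING = auto()     # Initial Planning: test configuration screens
    SCHEDULED = auto()    # Schedule Test: scheduler view
    MONITORING = auto()   # Monitor Test: dashboard and alert views
    PULLED = auto()       # test pulled because bugs/issues appeared
    EVALUATION = auto()   # Hypothesis Evaluation: comparison views
    ROLLOUT = auto()      # Implement: roll the winning experience out
    RETEST = auto()       # Re-Test: revise the hypothesis


# Hypothetical transition table for the lifecycle described above.
TRANSITIONS = {
    Stage.PLANNING: {Stage.SCHEDULED},
    Stage.SCHEDULED: {Stage.MONITORING},
    Stage.MONITORING: {Stage.EVALUATION, Stage.PULLED},
    Stage.PULLED: {Stage.PLANNING},
    Stage.EVALUATION: {Stage.ROLLOUT, Stage.RETEST},
    Stage.RETEST: {Stage.PLANNING},
    Stage.ROLLOUT: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move a test to `target`, rejecting transitions the lifecycle doesn't allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move a test from {current.name} to {target.name}")
    return target


stage = Stage.PLANNING
for nxt in (Stage.SCHEDULED, Stage.MONITORING, Stage.EVALUATION, Stage.RETEST, Stage.PLANNING):
    stage = advance(stage, nxt)
print(stage.name)  # PLANNING
```

Encoding the transitions explicitly lets a UI disable actions that make no sense for a test's current stage.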
Some Challenges
• Operate resiliently and at low latencies, despite:
  ○ Customer allocations taking place in real-time
  ○ The need for near real-time insights into test health over massive datasets
  ○ Data that is distributed across multiple clusters
• Data processing:
  ○ Joins across billions of rows of data from many sources can massively increase the number of rows
  ○ Efficient management of datasets to support interactive analysis, dashboards, etc.
  ○ Rich and flexible filtering to support interactive analysis
  ○ Extract forecasts and insights
• Oh, and make it as easy to use as possible for the users...
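The row explosion from joins follows from simple arithmetic: in an inner equi-join, each key contributes the product of its occurrence counts on the two sides. The sketch below computes the output size without materializing the join; `inner_join_size` is a hypothetical helper and the per-user counts are made-up illustrative numbers, not Netflix data.

```python
from collections import Counter


def inner_join_size(left_keys, right_keys):
    """Number of rows an inner equi-join would produce, without building it.

    Each key k contributes count_left(k) * count_right(k) rows, so keys that
    repeat on both sides multiply rather than add.
    """
    left, right = Counter(left_keys), Counter(right_keys)
    return sum(n * right[k] for k, n in left.items())  # missing keys count as 0


# Hypothetical: 1,000 users, 50 playback events and 20 metadata rows per user.
events = [user for user in range(1_000) for _ in range(50)]
metadata = [user for user in range(1_000) for _ in range(20)]
print(len(events) + len(metadata), "input rows")     # 70000 input rows
print(inner_join_size(events, metadata), "output rows")  # 1000000 output rows
```

Joining a third source with 10 rows per user would multiply the output by 10 again, which is one reason aggregating each source per key before joining can help.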

How to write a Business Continuity Plan
 

A/B Testing at Netflix: Experimentation Platform

  • 1. A/B Testing at Netflix: Experimentation Platform. Steve Urban, experimentation@netflix.com
  • 2. Importance of A/B Testing at Netflix
      • Technology is just one part of the equation: a culture of experimentation is the other essential part.
      • All product ideas are subjected to the scientific method, with actual data supporting changes before they are rolled out to all users.
      • The effectiveness of any idea is measured without bias: the seniority of the person proposing the idea is irrelevant.
  • 3. Our Users
      A/B testing enables product decisions throughout Netflix, and our users are spread across all departments:
      • Data Scientists: Does this new ranking algorithm result in more plays?
      • Product Managers: Does this new UI reduce the time users need to find content?
      • Marketing: Which email campaign resulted in more new subscribers?
      • Content: Which thumbnail image resulted in more streams of Daredevil?
      • Engineers: Is the new implementation of this streaming algorithm more performant when internet connectivity is spotty?
      • And so on...
  • 4. A/B Testing Platform Objectives
      • Being an internal tool is not an excuse for poor UX.
      • Given the diverse expertise of our users, workflows must be simple and effective while providing value.
      • Cover all generic test management scenarios.
      • Easily accommodate unique experimentation needs as they come up.
      • Ingest and combine real-time behavioral data and batch metadata from numerous sources.
  • 5. We’re Hiring
      Netflix has a unique culture.
      We’re looking for a Full-Stack Engineer to help across the board:
      • Collaborate with users across Netflix to understand their UI needs
      • Be part of a team of engineers and UX experts
      • Tech stack: Java, React, Node
      • Data visualization experience is a plus
      We need a Server-Side Engineer with expertise designing distributed systems:
      • Help design and rebuild our allocation engine
      • Experience processing large datasets, including efficient incorporation of near real-time data
      • Expertise with various Big Data databases
      • Machine learning experience is a plus
  • 6. WAIT, THAT’S NOT ENOUGH. I WANT TO GO DEEPER!
  • 7. Which Version Is Better? A or B
  • 8. Given that I watched House of Cards... which set of recommendations is better? A or B
  • 9. Hard to Answer Without Disciplined Experimentation: A? or B?
  • 10. A/B Testing Process
      • Target population: the users eligible for the test
      • Hypothesis: Retention and/or engagement will improve with a new recommendation algorithm.
      • Process: Randomly group users into different buckets. Apart from the experience being tested, all other factors are held constant.
          • Control group: continues to experience the current version (A)
          • Test group B: experiences version B
          • Test group C: experiences version C
  • 11. A/B Testing Process Continued: Analyze & Compare Key Results
      • Algorithm A (Control): viewing hours delta N/A, since this is the baseline all other options are measured against
      • Algorithm B: viewing hours delta +2.3%; statistically significant: yes. 2.3% better than the control, and we’re confident about it.
      • Algorithm C: viewing hours delta -5.7%; statistically significant: yes. Ouch! Don’t use this algorithm.
      • ...
  • 13. Technology Stack [architecture diagram]
      • Experimentation Service with a REST API: allocate customers, retrieve allocations
      • Other labeled components: define experiments, experiment criteria, sampling metadata, evaluate eligibility, persist/retrieve allocations, allocations, ad hoc queries, persist metrics, health metrics, real-time analysis & monitoring, visualize, other Netflix services
  • 14. Allocation & Stratification
      • Randomly distribute and assign customers to a variant in the experiment using stratified sampling: "random sampling" with enforcement of sample proportions across regions
      • Start, stop, and track allocations in near real time
      • Example, all US regions, percentage of users*: North East 22%, South East 13%, South West 17%, ...
      *Numerical values are for illustrative purposes only and are totally made up.
  • 15. Segmentation
      • Divide a broad target population into subsets with similar properties
      • Some tests are meant to measure impact on specific populations
      • Must maintain scale and low latencies
      • Example segments by specific properties:
          • Haven’t used a tablet to access Netflix in n days
          • Used a game console to access Netflix within the last n days
          • Smart TV users
  • 16. Test Health
      • Test experiences will not all perform equally, but we must ensure that differences are not caused by buggy implementations
      • Issues can be device-specific, so we must monitor at device, test, and experience granularity
      • The example below is super-simplified: we need to create visualizations that effectively convey test health, internationally, across thousands of devices
      • Example: Control cell: no errors/fallbacks. Experience A: issue on TV UI detected. Experience B: no errors/fallbacks.
  • 17. ABlaze UI: Test Lifecycle Management
      • Initial planning (test configuration screens): determine the hypothesis; implement each test experience
      • Schedule test (scheduler view): define real-time rules & conditions; consider potential conflicts
      • Monitor test (dashboard and alert views): monitor test health over time, with real-time analysis and alerting on metrics and allocations; pull the test if bugs/issues present themselves
      • Hypothesis evaluation (comparison views): interactive filtering, analysis, & visualization of data; call success or failure of the test
      • Implement or re-test: devise a plan to roll the winning experience (if any) out to production; otherwise, potentially revise the hypothesis and retest
  • 18. Some Challenges
      • Operate resiliently and at low latencies, despite:
          • Customer allocations taking place in real time
          • The need for near real-time insights into test health over massive datasets
          • Data that is distributed across multiple clusters
      • Data processing:
          • Joins across billions of rows of data from many sources can massively increase the number of rows
          • Efficiently manage datasets to support interactive analysis, dashboards, etc.
          • Rich and flexible filtering to support interactive analysis
          • Extract forecasts and insights
      • Oh, and make it as easy to use as possible for the users...
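Slide 10 describes randomly grouping users into buckets so that only the tested experience differs. The deck does not say how Netflix assigns buckets; the sketch below assumes a common approach, hashing a (test, user) pair so assignment is effectively uniform yet repeatable. The function name and the test/cell identifiers are hypothetical.

```python
import hashlib


def assign_bucket(user_id: str, test_id: str, cells: list[str]) -> str:
    """Deterministically map a user to one of the test cells (hypothetical helper).

    Hashing the (test_id, user_id) pair gives each user a stable, effectively
    uniform position, so the same user always sees the same experience within
    a test, while separate tests bucket users independently.
    """
    if not cells:
        raise ValueError("a test needs at least one cell")
    key = f"{test_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Read the first 8 bytes as an unsigned integer and reduce it modulo the
    # number of cells (the modulo bias is negligible for a handful of cells).
    position = int.from_bytes(digest[:8], "big")
    return cells[position % len(cells)]


if __name__ == "__main__":
    cells = ["A (control)", "B", "C"]
    counts = {cell: 0 for cell in cells}
    for i in range(30_000):
        counts[assign_bucket(f"user-{i}", "hypothetical-rec-test", cells)] += 1
    print(counts)  # each cell should hold roughly 10,000 users
```

Determinism matters here: a member who reloads the app must not flip between experiences, which pure per-request randomness would not guarantee.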
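Slide 11 reports each algorithm's viewing-hours delta against the control together with a "statistically significant" verdict. The deck does not name the statistical test; the sketch below assumes a large-sample z-test on the difference in means with a Welch-style standard error. The function name, the `alpha` default, and the synthetic data in the usage block are illustrative assumptions, not Netflix's method or data.

```python
import math
import random
from statistics import mean, variance


def compare_to_control(control: list[float], treatment: list[float],
                       alpha: float = 0.05) -> dict:
    """Compare a treatment cell's mean metric (e.g. viewing hours) with control.

    Returns the relative delta in percent and a two-sided p-value from a
    large-sample z-test; the standard error allows unequal variances.
    """
    mean_c, mean_t = mean(control), mean(treatment)
    std_err = math.sqrt(variance(control) / len(control)
                        + variance(treatment) / len(treatment))
    if std_err == 0:
        raise ValueError("both cells are constant; the test is undefined")
    z = (mean_t - mean_c) / std_err
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "delta_pct": 100.0 * (mean_t - mean_c) / mean_c,
        "p_value": p_value,
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic, made-up viewing hours per member (not real data).
    control = [rng.gauss(10.0, 3.0) for _ in range(5_000)]
    treatment = [rng.gauss(10.23, 3.0) for _ in range(5_000)]
    print(compare_to_control(control, treatment))
```

A real platform would also need to handle heavy-tailed metrics and repeated looks at the data, which this simple test ignores.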
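Slide 14 describes stratified sampling: "random sampling" with enforcement of sample proportions across regions. One simple way to get that property, assumed here rather than taken from the deck, is to shuffle users within each region (stratum) and deal them to cells round-robin, so every cell's regional mix matches the population to within one user per region. The function and region names are hypothetical.

```python
import random
from collections import defaultdict


def stratified_allocate(users, cells, seed=0):
    """Allocate users to cells, preserving each region's share in every cell.

    users: iterable of (user_id, region) pairs. Returns {user_id: cell}.
    """
    rng = random.Random(seed)
    by_region = defaultdict(list)
    for user_id, region in users:
        by_region[region].append(user_id)

    allocation = {}
    # Iterate regions in sorted order so a given seed is reproducible.
    for region in sorted(by_region):
        members = by_region[region]
        rng.shuffle(members)  # random order within the stratum
        # Random starting cell so leftover users are not always sent to cells[0].
        offset = rng.randrange(len(cells))
        for i, user_id in enumerate(members):
            allocation[user_id] = cells[(i + offset) % len(cells)]
    return allocation


if __name__ == "__main__":
    # Made-up population, echoing the slide's illustrative proportions.
    users = ([(f"ne-{i}", "North East") for i in range(2200)]
             + [(f"se-{i}", "South East") for i in range(1300)]
             + [(f"sw-{i}", "South West") for i in range(1700)])
    alloc = stratified_allocate(users, ["A", "B", "C"], seed=7)
    print(sum(1 for cell in alloc.values() if cell == "A"))
```

Compared with independent per-user randomization, this removes chance imbalances in regional mix between cells, which can otherwise masquerade as treatment effects.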
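Slide 15 lists segments such as "haven't used a tablet to access Netflix in n days" and "used a game console within the last n days". A minimal sketch, assuming a hypothetical member record holding the last access date per device type, can express segments as composable predicates. The `Member` fields, device names, and helper functions are all illustrative assumptions, and this in-memory filter ignores the scale and latency requirements the slide mentions.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable


@dataclass
class Member:
    member_id: str
    # Hypothetical field: most recent access date per device type.
    last_access: dict[str, date] = field(default_factory=dict)


Segment = Callable[[Member], bool]


def used_within(device: str, n_days: int, today: date) -> Segment:
    """True if the member used `device` within the last n_days."""
    cutoff = today - timedelta(days=n_days)
    return lambda m: device in m.last_access and m.last_access[device] >= cutoff


def not_used_within(device: str, n_days: int, today: date) -> Segment:
    """True if the member has NOT used `device` within the last n_days."""
    inner = used_within(device, n_days, today)
    return lambda m: not inner(m)


def all_of(*segments: Segment) -> Segment:
    """Intersection of segments."""
    return lambda m: all(s(m) for s in segments)


def select(members: list[Member], segment: Segment) -> list[str]:
    return [m.member_id for m in members if segment(m)]


if __name__ == "__main__":
    today = date(2024, 1, 31)
    members = [
        Member("m1", {"game_console": date(2024, 1, 20), "tablet": date(2023, 11, 1)}),
        Member("m2", {"game_console": date(2024, 1, 25), "tablet": date(2024, 1, 30)}),
    ]
    seg = all_of(used_within("game_console", 30, today),
                 not_used_within("tablet", 30, today))
    print(select(members, seg))
```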
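Slide 16 says issues can be device-specific, so health must be tracked at device, test, and experience granularity, as in the "issue on TV UI detected" example. The sketch below assumes a hypothetical event stream of (cell, device, had_error) tuples and flags any non-control cell whose error/fallback rate on a device exceeds the control cell's rate on that same device by a fixed margin. The event shape, cell names, and `max_excess` threshold are assumptions; a production monitor would likely use a statistical alerting rule rather than a fixed margin.

```python
from collections import Counter


def health_report(events, control_cell, max_excess=0.01):
    """Flag (cell, device) pairs whose error/fallback rate exceeds control's.

    events: iterable of (cell, device, had_error) tuples.
    Returns a sorted list of (cell, device, rate, control_rate).
    """
    totals, errors = Counter(), Counter()
    for cell, device, had_error in events:
        totals[(cell, device)] += 1
        errors[(cell, device)] += int(had_error)

    flagged = []
    for (cell, device), n in totals.items():
        if cell == control_cell:
            continue
        control_n = totals.get((control_cell, device), 0)
        if control_n == 0:
            continue  # no control baseline for this device yet
        rate = errors[(cell, device)] / n
        control_rate = errors[(control_cell, device)] / control_n
        if rate - control_rate > max_excess:
            flagged.append((cell, device, rate, control_rate))
    return sorted(flagged)
```

Comparing each cell against the control on the same device keeps a device that is flaky for everyone from being blamed on the test experience.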
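Slide 17 walks through a test's lifecycle in ABlaze: planning, scheduling, monitoring (with the option to pull a test), hypothesis evaluation, and then rollout or a revised retest. The sketch below models those stages as a small state machine. The state names are hypothetical, and the transition from a pulled test back to planning is an assumption the slide does not state.

```python
from enum import Enum


class LifecycleState(Enum):
    PLANNING = "planning"      # hypothesis and test experiences being configured
    SCHEDULED = "scheduled"    # real-time rules, conditions, conflicts checked
    RUNNING = "running"        # allocating members while health is monitored
    PULLED = "pulled"          # stopped because bugs/issues appeared
    EVALUATED = "evaluated"    # success or failure of the hypothesis called
    ROLLED_OUT = "rolled_out"  # winning experience shipped to production


# Allowed moves between stages. PULLED -> PLANNING is an assumption.
ALLOWED = {
    LifecycleState.PLANNING: {LifecycleState.SCHEDULED},
    LifecycleState.SCHEDULED: {LifecycleState.RUNNING},
    LifecycleState.RUNNING: {LifecycleState.PULLED, LifecycleState.EVALUATED},
    LifecycleState.PULLED: {LifecycleState.PLANNING},
    # After evaluation: roll out the winner, or revise the hypothesis and retest.
    LifecycleState.EVALUATED: {LifecycleState.ROLLED_OUT, LifecycleState.PLANNING},
    LifecycleState.ROLLED_OUT: set(),
}


def transition(current: LifecycleState, target: LifecycleState) -> LifecycleState:
    """Return the new state, or raise if the move skips a lifecycle stage."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move test from {current.value} to {target.value}")
    return target
```

Encoding the stages explicitly lets a UI disable actions that do not apply, for example evaluating a test that has never run.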