A/B Testing Data-Driven Algorithms in the Cloud
cloudacademy.com
7/25/2016
About us
Roberto Turrin, Sr. Data Scientist (PhD), @robytur
Luca Baroffio, Data Scientist (PhD), @lucabaroffio
Agenda
Data-driven algorithms
Evaluation
A/B testing
Challenges in A/B testing data-driven algorithms
A/B testing in the cloud
Data-driven A/B testing
Conclusions
Q&A
Data-driven algorithms
Decision problems that
can be modeled from data
Data-driven problems - I
Image recognition
Document classification
Speech-to-text
Spam/fraud detection
Stock price prediction
Content personalization
Market basket
Search suggestion
Playlist generation
Document clustering
User segmentation
Target Advertising
Data-driven problems - II
The problems listed on the previous slide fall into four task types:
Supervised: classification, regression (e.g., predicting a height of 170 cm)
Unsupervised: clustering (e.g., group A vs. group B), rule extraction (e.g., A, B → C)
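Data-driven algorithm pipeline
[Figure: pipeline with three stages: feature extraction (batch), training (batch), and prediction (real-time). A data set yields features, features yield ML models, and the models turn real-time data into information.]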
Evaluation: offline vs online
Offline evaluation - I
[Figure: the data-driven algorithm pipeline (feature extraction, training, prediction)]
Offline experiments are run on a snapshot of the collected data set.
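A minimal sketch of an offline experiment, assuming the snapshot is a list of (features, label) rows. The function names, the toy majority-class "model", and the synthetic spam/ham data are all hypothetical and only illustrate the hold-out idea:

```python
import random
from collections import Counter

def split_snapshot(rows, test_fraction=0.2, seed=0):
    """Shuffle a snapshot of logged rows and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def train_majority_model(train_rows):
    """Toy 'model': always predict the most frequent label seen in training."""
    counts = Counter(label for _, label in train_rows)
    most_common_label = counts.most_common(1)[0][0]
    return lambda features: most_common_label

def accuracy(model, test_rows):
    """Fraction of held-out rows whose label the model predicts correctly."""
    correct = sum(1 for features, label in test_rows if model(features) == label)
    return correct / len(test_rows)

# Synthetic snapshot (an assumption for illustration): 25 spam rows, 75 ham rows.
snapshot = [({"id": i}, "spam" if i % 4 == 0 else "ham") for i in range(100)]
train_rows, test_rows = split_snapshot(snapshot)
model = train_majority_model(train_rows)
offline_accuracy = accuracy(model, test_rows)
print(f"offline accuracy on held-out rows: {offline_accuracy:.2f}")
```

Because the test rows come from the same past snapshot, the score says nothing about how users would react to the new algorithm, which is the limitation the next slide lists.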
Offline evaluation - II
PROS
Quick
Large number of solutions can be tested
No impact on business
Applicable in most scenarios
CONS
They use past data
Risk of promoting imitation
Does not consider the impact of the algorithm on the user context
Not suitable for “unpredictable” data (e.g., stock price)
Online evaluation
[Figure: the data-driven algorithm pipeline (feature extraction, training, prediction)]
Online experiments use live user feedback.
Online: human-subject experiments - I
[Figure: controlled experiment comparing alternatives A and B]
Human-subject experiments work in a controlled environment.
Online: human-subject experiments - II
PROS
Feedback of real users affected by actual context
Multiple KPIs can be measured
CONS
A controlled environment must be implemented (back-end + front-end)
Environment is simulated
Non-biased users must be recruited
Not scaling: limited number of users
Few solutions can be tested
Users must be motivated
Medium running time
Online: live A/B testing - I
[Figure: alternatives A and B]
Live testing works in production.
Online: live A/B testing - II
PROS
Captures the real, full impact of the data-driven solution
CONS
Very few solutions can be tested
Long running time
Large traffic required
May affect business
Some KPIs are hard to measure
A/B testing: under the hood
Statistical hypothesis testing:
1. Formulate a hypothesis
2. Set up a testing campaign
3. Make use of statistics to evaluate the hypothesis
A/B testing: real-world similarities
Clinical trials
Product comparison
Quality assurance
Decision making
A/B testing: UI examples
[Figure: pairs of UI alternatives, A (“Control”) vs. B (“Variation”): two “ADD TO CART” buttons; “Register” vs. “Register (it’s FREE!)”; two placements of a “Start free trial” button next to placeholder text]
A/B testing: ingredients
Hypothesis formulation
• Everything starts with an idea
Define metrics:
• How to measure if something is “successful”?
Run a test, collect data and compute metrics
Compare the two alternatives
A/B testing: 1) hypothesis formulation
A red button is clicked more often than a blue button
Statistics lingo:
Null hypothesis:
There is no difference between
the red and the blue buttons
GOAL: reject the null hypothesis
If the null hypothesis is true:
• we fail to reject the null hypothesis
A/B testing: 2) define a metric
Choose a measure that reflects your goals
Examples:
Click-Through Rate (CTR)
Open rate, click rate
Conversion rate (# subs/# visitors)
Customer satisfaction
Returning rate
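As a sketch of how two of these metrics could be computed, assuming a hypothetical event log of (user_id, event_kind) pairs and plain lists of user ids. The data layout and function names are illustrative assumptions, not a specific analytics API:

```python
from collections import Counter

def click_through_rate(events):
    """CTR = number of clicks / number of views, from (user_id, kind) events."""
    counts = Counter(kind for _, kind in events)
    views = counts["view"]
    return counts["click"] / views if views else 0.0

def conversion_rate(subscriber_ids, visitor_ids):
    """Conversion rate = # subscribers / # visitors, counting each user once."""
    visitors = set(visitor_ids)
    if not visitors:
        return 0.0
    return len(set(subscriber_ids) & visitors) / len(visitors)

# Hypothetical log: four views, one click.
events = [("u1", "view"), ("u1", "click"), ("u2", "view"),
          ("u3", "view"), ("u4", "view")]
ctr = click_through_rate(events)
conversion = conversion_rate(["u1"], ["u1", "u2", "u3", "u4"])
print(f"CTR: {ctr:.0%}, conversion rate: {conversion:.0%}")
```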
A/B testing: 3) run a test
It may affect your business!
1. Create the two alternatives
2. Assign a subset of users to each alternative
3. Collect data and compute the metrics
A/B testing: 4) compare the two alternatives
A (blue “ADD TO CART” button): 1 view, 0 clicks → 0% CTR
B (red “ADD TO CART” button): 1 view, 1 click → 100% CTR
100% > 0%, the red button is better, right?
Not so fast…
A/B testing: confidence
What is the variability of our measure?
How confident are we in the outcome of the test?
Model our measure with a statistical distribution, e.g., a Gaussian distribution
E.g., the average click-through rate for the blue button is 20% ± 7%, where ± 7% is the confidence interval
A/B testing: confidence interval
A confidence interval is a range computed so that, with a given probability
(the confidence level), it contains the true value of your measure
The confidence interval depends on the confidence level
The higher the confidence level, the wider the confidence interval
E.g., the average click-through rate for the blue button is 20% ± 7% at a 90%
confidence level, where ± 7% is the confidence interval
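A minimal sketch of how such an interval could be computed, assuming the Gaussian (normal) approximation of a click-through rate mentioned above. The function name and the counts (20 clicks out of 100 views) are hypothetical; with those counts the 90% margin comes out near 6.6 percentage points, in the same ballpark as the illustrative ± 7%:

```python
from math import sqrt
from statistics import NormalDist

def ctr_confidence_interval(clicks, views, confidence=0.90):
    """Return (ctr, margin) so that the interval is ctr ± margin.

    Uses the normal approximation of a proportion, which is only
    reasonable when the number of views is not too small.
    """
    ctr = clicks / views
    # Two-sided critical value, e.g. about 1.645 for 90% and 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(ctr * (1 - ctr) / views)
    return ctr, margin

ctr, margin_90 = ctr_confidence_interval(20, 100, confidence=0.90)
_, margin_95 = ctr_confidence_interval(20, 100, confidence=0.95)
print(f"CTR {ctr:.0%} ± {margin_90:.1%} at 90% confidence")
print(f"CTR {ctr:.0%} ± {margin_95:.1%} at 95% confidence")
```

Raising the confidence level from 90% to 95% widens the margin, and collecting more views narrows it.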
A/B testing: comparing distributions
[Figure: probability density p(CTR) over CTR for the two “ADD TO CART” buttons, with 20% and 40% marked on the CTR axis]
20% ± 7% at a 90% confidence level
20% ± 10% at a 95% confidence level
A/B testing: rejecting the null hypothesis
[Figure: p(CTR) over CTR (20% and 40% marked) for the two “ADD TO CART” buttons; interval 20% ± 10% at a 95% confidence level]
The average CTR for the variation falls outside the CI → null hypothesis rejected!
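The slide compares the variation's CTR against the control's confidence interval. A closely related and commonly used alternative is a two-proportion z-test, sketched here as a swapped-in technique rather than the presenters' exact procedure. The function name and the click counts are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for 'CTR of A equals CTR of B'. Returns (z, p_value)."""
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Under the null hypothesis both buttons share one pooled CTR.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_error == 0:
        return 0.0, 1.0  # no variability observed, nothing to reject
    z = (ctr_b - ctr_a) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

alpha = 0.05  # accepted type-I error rate (95% confidence level)

# Blue button: 20 clicks / 100 views; red button: 40 clicks / 100 views.
z_large, p_large = two_proportion_z_test(20, 100, 40, 100)
print(f"100 views each: p = {p_large:.4f}, reject null: {p_large < alpha}")

# The '1 view, 0 clicks' vs. '1 view, 1 click' case from the earlier slide.
z_tiny, p_tiny = two_proportion_z_test(0, 1, 1, 1)
print(f"1 view each:    p = {p_tiny:.4f}, reject null: {p_tiny < alpha}")
```

With one view per button the difference between 0% and 100% is not significant, which is the "not so fast" point. The normal approximation is itself unreliable at such tiny sample sizes, so the second call is only illustrative.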
A/B testing: errors
Null hypothesis: there is no difference between the red and the blue buttons
                      | Null hypothesis ACCEPTED | Null hypothesis REJECTED
Null hypothesis TRUE  | True Negative: the buttons are the same, we acknowledge it | Type I error: the buttons are the same, we say the red one is better
Null hypothesis FALSE | Type II error: the red button is better, we say they are the same | True Positive: the red button is better, we acknowledge it
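The two error types can be made concrete with a small simulation, sketched here under assumptions: clicks are simulated as independent coin flips, the decision rule is a two-proportion z-test, and the true CTRs, sample sizes, and seed are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

def rejects_null(clicks_a, views_a, clicks_b, views_b, alpha=0.05):
    """Two-sided two-proportion z-test; True means 'null hypothesis rejected'."""
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_error == 0:
        return False  # no variability observed, nothing to reject
    z = (clicks_b / views_b - clicks_a / views_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha

def simulate_clicks(rng, true_ctr, views):
    """Each view turns into a click with probability true_ctr."""
    return sum(rng.random() < true_ctr for _ in range(views))

def rejection_rate(true_ctr_a, true_ctr_b, views=200, runs=500, seed=42):
    """Fraction of simulated experiments in which the null is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(runs):
        clicks_a = simulate_clicks(rng, true_ctr_a, views)
        clicks_b = simulate_clicks(rng, true_ctr_b, views)
        rejected += rejects_null(clicks_a, views, clicks_b, views)
    return rejected / runs

# Null hypothesis TRUE (both buttons at 20%): every rejection is a type I error.
type_1_rate = rejection_rate(0.20, 0.20)
# Null hypothesis FALSE (20% vs. 40%): every non-rejection is a type II error.
type_2_rate = 1 - rejection_rate(0.20, 0.40)
print(f"type I error rate:  {type_1_rate:.3f}")
print(f"type II error rate: {type_2_rate:.3f}")
```

When the buttons are truly identical, the rejection rate should land close to the chosen alpha of 0.05. That is what it means to accept a 5% type I error rate.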
A/B testing: comparing distributions
[Figure: p(CTR) over CTR (20% and 40% marked) for the two “ADD TO CART” buttons; interval 20% ± 7% at a 90% confidence level]
⍺: type-I error rate
β: type-II error rate
power = 1 - β
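One practical use of ⍺ and power is estimating how many views each variation needs before the test starts. The sketch below uses the standard normal-approximation formula for comparing two proportions. The function name and the example CTRs are assumptions for illustration:

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variation(ctr_control, ctr_variation, alpha=0.05, power=0.80):
    """Approximate views needed per variation for a two-sided test.

    alpha is the type-I error rate; power = 1 - beta, where beta is the
    type-II error rate.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    mean_ctr = (ctr_control + ctr_variation) / 2
    spread_null = sqrt(2 * mean_ctr * (1 - mean_ctr))
    spread_alt = sqrt(ctr_control * (1 - ctr_control)
                      + ctr_variation * (1 - ctr_variation))
    effect = ctr_control - ctr_variation
    return ceil((z_alpha * spread_null + z_beta * spread_alt) ** 2 / effect ** 2)

# A large effect (20% vs. 40%) needs far fewer views than a small one (20% vs. 22%).
n_large_effect = sample_size_per_variation(0.20, 0.40)
n_small_effect = sample_size_per_variation(0.20, 0.22)
print(f"20% vs. 40%: {n_large_effect} views per variation")
print(f"20% vs. 22%: {n_small_effect} views per variation")
```

Fixing the sample size up front also guards against stopping the test too early.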
A/B testing: tips and common mistakes
DO NOT run the two variations under different conditions
DO NOT stop the test too early
Pay attention to external factors
DO NOT blind test without a hypothesis
DO NOT stop after the first failures
Choose the right metric
Consider the impact on your business
Randomly split the population
Keep the assignment consistent
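One common way to get a split that is both random-looking and consistent is to hash the user id together with an experiment name. This is a sketch under assumptions: the function name, experiment label, and user ids are hypothetical.

```python
import hashlib

def assign_variation(user_id, experiment, variations=("A", "B")):
    """Deterministically map a user to a variation for a given experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # The hash behaves like a uniform random number, but it is stable:
    # the same user always lands in the same bucket for this experiment.
    bucket = int(digest, 16) % len(variations)
    return variations[bucket]

first_visit = assign_variation("user-123", "button-color")
second_visit = assign_variation("user-123", "button-color")
print(first_visit, second_visit)  # the same variation on both visits

# The population is split roughly in half.
assignments = [assign_variation(f"user-{i}", "button-color") for i in range(10000)]
share_a = assignments.count("A") / len(assignments)
print(f"share of users in A: {share_a:.2%}")
```

Including the experiment name in the hashed key keeps assignments in different experiments independent of each other.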
A/B testing data-driven algorithms - I
[Figure: alternatives A and B each run their own pipeline (feature extraction, training, prediction); example outputs are “People like you”, “Recommended users”, and “Targeted Ad.” widgets showing users such as Tom, Mike, Chris, and Lena]
A/B testing data-driven algorithms - II
CTR is not always the right metric
Search engine: ideally no click at all
Tweet suggestions: what users click is not necessarily what they want
E-commerce recommendations: users click to find products alternative to the one proposed
Find long-term metrics:
Retention/churn
Returning users
Time spent
Upgrading users
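As an illustration of a longer-term metric, the sketch below computes the share of returning users per variation from a hypothetical visit log of (user_id, day) pairs. Treating a user as "returning" when they visit on more than one distinct day is an assumption made for this example:

```python
from collections import defaultdict

def returning_user_rate(visits, variation_of):
    """Share of users per variation who visited on more than one distinct day."""
    days_by_user = defaultdict(set)
    for user_id, day in visits:
        days_by_user[user_id].add(day)

    totals = defaultdict(int)
    returning = defaultdict(int)
    for user_id, days in days_by_user.items():
        variation = variation_of[user_id]
        totals[variation] += 1
        if len(days) > 1:
            returning[variation] += 1
    return {variation: returning[variation] / totals[variation]
            for variation in totals}

# Hypothetical data: u3 visits twice on the same day, so u3 is not "returning".
visits = [("u1", 1), ("u1", 3), ("u2", 1), ("u3", 2), ("u3", 2),
          ("u4", 5), ("u4", 6), ("u5", 1), ("u5", 2)]
variation_of = {"u1": "A", "u2": "A", "u3": "B", "u4": "B", "u5": "B"}
rates = returning_user_rate(visits, variation_of)
print(rates)
```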
A/B testing data-driven algorithms - III
Multiple goals are addressed:
Relevance
Transparency
Diversity
Novelty
Coverage
Robustness
Consider all the steps of the pipeline
Do not vary UI and data-driven algorithm simultaneously
A/B testing in the cloud - I
Cloud computing makes A/B testing simpler:
1. Create multiple environments/modules with different features
2. Split traffic
• e.g., Google App Engine’s traffic splitting feature
Do the same with the serverless paradigm
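The idea behind traffic splitting can be sketched generically as weighted routing between deployed versions. This is not the Google App Engine API: the function name, version labels, and the 90/10 split are hypothetical, and the split is assumed to be a non-empty mapping whose shares sum to 1.

```python
import hashlib

def route_request(user_id, split):
    """Pick a deployed version for a user according to weighted shares.

    split maps version name -> share of traffic, e.g. {"v1": 0.9, "v2": 0.1}.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the user to a stable point in [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2 ** 64
    cumulative = 0.0
    for version, share in split.items():
        cumulative += share
        if point < cumulative:
            return version
    # Guard against floating-point rounding in the shares.
    return list(split)[-1]

split = {"v1": 0.9, "v2": 0.1}
routed = [route_request(f"user-{i}", split) for i in range(10000)]
share_v2 = routed.count("v2") / len(routed)
print(f"share of traffic on v2: {share_v2:.2%}")
```

A small share for the new version limits the business impact while the test is running, at the cost of a longer running time.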
A/B testing in the cloud - II
If unsure, use a third-party service
A/B testing as a service:
• AWS A/B testing service
• Google Analytics A/B testing feature
• Optimizely, VWO
A/B testing libraries:
• Sixpack, Planout, Clutch.io, Alephbet
Build your own
Data-driven algorithms to support A/B testing: multi-armed bandit - I
[Figure: A/B testing compared with a multi-armed bandit over time; alternatives are labeled A to G, and the bandit is driven by the data-driven pipeline (feature extraction, training, prediction)]
Data-driven algorithms to support A/B testing: multi-armed bandit - II
PROS
Increased average KPI
CONS
Longer time to reach statistical significance
Harder implementation
Harder to maintain consistency
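A minimal sketch of one multi-armed bandit strategy, epsilon-greedy, chosen here only as a simple example; the slides do not say which strategy is meant. The class name, the true CTRs, and the seeds are hypothetical:

```python
import random

class EpsilonGreedyBandit:
    """Mostly show the best-looking arm, but explore at random sometimes."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {arm: 0 for arm in arms}
        self.clicks = {arm: 0 for arm in arms}

    def estimated_ctr(self, arm):
        return self.clicks[arm] / self.shows[arm] if self.shows[arm] else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shows))   # exploration
        return max(self.shows, key=self.estimated_ctr)  # exploitation

    def update(self, arm, clicked):
        self.shows[arm] += 1
        self.clicks[arm] += int(clicked)

# Simulated users: arm B truly converts better than arm A (assumed CTRs).
true_ctr = {"A": 0.20, "B": 0.40}
users = random.Random(7)
bandit = EpsilonGreedyBandit(["A", "B"], epsilon=0.1, seed=1)
for _ in range(5000):
    arm = bandit.choose()
    bandit.update(arm, users.random() < true_ctr[arm])

print("shows per arm:", bandit.shows)
```

Traffic drifts toward the better arm while the experiment is still running, which is what raises the average KPI. The weaker arm collects data more slowly, so reaching statistical significance takes longer, and a user may see different arms on different visits.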
Main takeaways
Evaluate data-driven solutions both offline and online
Define the correct KPIs
Prefer long-term metrics to short-term conversions
Do not forget A/B testing is a statistical test; rely on some cloud services if you are not “confident”
Exploitation/exploration approaches can be an alternative to A/B testing
Conversion rate is not the only metric
Thank you for attending :)
cloudacademy.com
Q & A

A/B Testing Data-Driven Algorithms in the Cloud - Webinar

  • 1. A/B Tes(ng Data-Driven
 Algorithms in the Cloud cloudacademy.com 7/25/2016
  • 2. About us Roberto Turrin Luca Baroffio Sr. Data Scien8st (PhD) Data Scien8st (PhD) @robytur @lucabaroffio
  • 3. Agenda Data-driven algorithms Evalua8on A/B tes8ng Challenges in A/B tes8ng data-driven algorithms A/B tes8ng in the cloud Data-driven A/B tes8ng Conclusions Q&A
  • 4. Data-driven algorithms Decision problems that can be modeled from data
  • 5. Data-driven problems - I Image recogni8on Document classifica8on Speech-to-text Spam/fraud detec8on Stock price predic8on Content personaliza8on Market basket Search sugges8on Playlist genera8on Document clustering User segmenta8on Target Adver8sing
  • 6. Data-driven problems - II Image recogni8on Document classifica8on Speech-to-text Spam/fraud detec8on Stock price predic8on Content personaliza8on Market basket Search sugges8on Playlist genera8on Document clustering User segmenta8on Target Adver8sing classifica'on regression clustering rule extrac'on ? 170
 cm group A group B A, B C Supervised Unsupervised
  • 7. Data-driven algorithm pipeline (diagram): feature extraction (batch), training (batch), prediction (real-time); elements: data set, features, ML models, real-time data, information
  • 9. Offline evaluation - I (pipeline diagram): Offline experiments are run on a snapshot of the collected data set.
  • 10. Offline evaluation - I. PROS: Quick; Large number of solutions; No impact on business; Applicable in most scenarios. CONS: They use past data; Risk of promoting imitation; Not considering the impact of the algorithm on the user context; Not suitable for “unpredictable” data (e.g., stock price)
  • 11. Online evaluation (pipeline diagram): Online experiments use live user feedback.
  • 12. Online: human-subject experiments - I. Controlled experiment: A or B? Human-subject experiments work in a controlled environment.
  • 13. Online: human-subject experiments - II. PROS: Feedback of real users affected by actual context; Multiple KPIs can be measured. CONS: Implement controlled environment (back-end + front-end); Environment is simulated; Recruiting non-biased users; Not scaling: limited number of users; Few solutions can be tested; Motivate users; Medium running time
  • 14. Online: live A/B testing - I. A vs. B: live testing works in production.
  • 15. Online: live A/B testing - II. PROS: Capture real, full impact of the data-driven solution. CONS: Very few solutions can be tested; Long running time; Large traffic required; May affect business; Some KPIs are hard to measure
  • 16. A/B testing: under the hood. Statistical hypothesis testing: 1. Formulate a hypothesis 2. Set up a testing campaign 3. Make use of statistics to evaluate the hypothesis
  • 17. A/B testing: real-world similarities. Clinical trials; Product comparison; Quality assurance; Decision making
  • 18. A/B testing: UI examples. Two “ADD TO CART” buttons; “Register” vs. “Register (it’s FREE!)”; two versions of a placeholder-text page, each with a “Start free trial” button. A is the “Control”, B is the “Variation”.
  • 19. A/B testing: ingredients. Hypothesis formulation: everything starts with an idea. Define metrics: how to measure if something is “successful”? Run a test, collect data and compute metrics. Compare the two alternatives.
  • 20. A/B testing: 1) hypothesis formulation. Hypothesis: a red button is clicked more often than a blue button. Statistics lingo: the null hypothesis is that there is no difference between the red and the blue buttons. GOAL: reject the null hypothesis. If the null hypothesis is true: we fail to reject the null hypothesis.
  • 21. A/B testing: 2) define a metric. Choose a measure that reflects your goals. Examples: Click-Through Rate (CTR); Open rate, click rate; Conversion rate (# subs / # visitors); Customer satisfaction; Returning rate
  • 22. A/B testing: 3) run a test. It may affect your business! 1. Create the two alternatives 2. Assign a subset of users to each alternative 3. Collect data and compute the metrics
  • 23. A/B testing: 4) compare the two alternatives. Two “ADD TO CART” buttons, A and B: 1 view, 0 clicks -> 0% CTR vs. 1 view, 1 click -> 100% CTR. 100% > 0%, so the red button is better, right? Not so fast…
  • 24. A/B testing: confidence. What is the variability of our measure? How confident are we in the outcome of the test? Model the measure with a statistical distribution, e.g., a Gaussian distribution. E.g., the average click-through rate for the blue button is 20% ± 7% (the confidence interval).
  • 25. A/B testing: confidence interval. A confidence interval is a range defined so that there is a given probability that the value of your measure falls within it. The confidence interval depends on the confidence level: the higher the confidence level, the wider the confidence interval. E.g., the average click-through rate for the blue button is 20% ± 7% at a 90% confidence level.
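To make the confidence interval idea concrete, here is a minimal sketch that computes a normal-approximation (Wald) interval for a click-through rate. The function name `ctr_confidence_interval` and the click and view counts are hypothetical, chosen only for illustration; the slides do not state which interval formula or sample size lies behind the "20% ± 7%" example.

```python
import math
from statistics import NormalDist


def ctr_confidence_interval(clicks, views, confidence=0.90):
    """Return (ctr, margin) using the normal approximation to the binomial.

    Assumption: views is large enough for the approximation to be reasonable.
    """
    if views <= 0:
        raise ValueError("views must be positive")
    ctr = clicks / views
    # Two-sided critical value: roughly 1.645 at 90% and 1.96 at 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(ctr * (1 - ctr) / views)
    return ctr, margin


# Hypothetical counts: 20 clicks out of 100 views.
ctr, margin_90 = ctr_confidence_interval(20, 100, confidence=0.90)
_, margin_95 = ctr_confidence_interval(20, 100, confidence=0.95)
print(f"CTR = {ctr:.0%} ± {margin_90:.1%} (90%), ± {margin_95:.1%} (95%)")
```

With the same data, the 95% margin is larger than the 90% margin, which is the "higher confidence level, wider interval" behavior described in the slide.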
  • 26. A/B testing: comparing distributions (plot of p(CTR) against CTR for the two “ADD TO CART” buttons, with 20% and 40% marked on the CTR axis)
  • 27. A/B testing: comparing distributions (p(CTR) plot): 20% ± 7% at 90% confidence level
  • 28. A/B testing: comparing distributions (p(CTR) plot): 20% ± 10% at 95% confidence level
  • 29. A/B testing: rejecting the null hypothesis (p(CTR) plot): 20% ± 10% at 95% confidence level. The average CTR for the variation falls outside the confidence interval -> null hypothesis rejected!
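The comparison step can be sketched with a standard two-proportion z-test. This is a common textbook procedure and not necessarily the exact check drawn on the slide, which compares the variation's average CTR against the control's confidence interval. The function name `two_proportion_z_test` and all counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for the difference between two click-through rates.

    Returns (z, p_value). Null hypothesis: both variants share the same CTR.
    """
    if views_a <= 0 or views_b <= 0:
        raise ValueError("both variants need at least one view")
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        # Every view clicked, or none did: no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.05  # accepted type I error rate (an assumption for this example)

# The "1 view each" situation from slide 23: far too little data.
_, p_tiny = two_proportion_z_test(0, 1, 1, 1)
print("1 view each: reject null?", p_tiny < alpha)

# Hypothetical larger sample: 20% vs. 40% CTR over 100 views each.
_, p_large = two_proportion_z_test(20, 100, 40, 100)
print("100 views each: reject null?", p_large < alpha)
```

The normal approximation is itself unreliable at one view per variant; that case is included only to show that a 100% vs. 0% gap on a tiny sample does not reach significance.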
  • 30. A/B testing: errors. Null hypothesis: there is no difference between the red and the blue buttons. Null hypothesis TRUE and ACCEPTED: true negative (the buttons are the same, we acknowledge it). Null hypothesis TRUE and REJECTED: type I error (the buttons are the same, we say the red one is better). Null hypothesis FALSE and ACCEPTED: type II error (the red button is better, we say they are the same). Null hypothesis FALSE and REJECTED: true positive (the red button is better, we acknowledge it).
  • 33. A/B testing: comparing distributions (p(CTR) plot, 20% ± 7% at 90% confidence level): α is the type I error rate
  • 34. A/B testing: comparing distributions (p(CTR) plot, 20% ± 7% at 90% confidence level): β is the type II error rate
  • 35. A/B testing: comparing distributions (p(CTR) plot, 20% ± 7% at 90% confidence level): power = 1 - β
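One practical use of α, β and power is estimating how many users each variant needs before the test starts. The sketch below uses the usual normal-approximation formula for comparing two proportions. The function name `sample_size_per_group` and the default values (α = 0.05, power = 0.80) are assumptions for illustration and do not come from the slides.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_variation, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided test.

    alpha is the type I error rate; power is 1 - beta (beta = type II error rate).
    """
    if p_control == p_variation:
        raise ValueError("the two rates must differ to size a test")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_variation) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(
            p_control * (1 - p_control) + p_variation * (1 - p_variation)
        )
    ) ** 2
    return math.ceil(numerator / (p_control - p_variation) ** 2)


# A large expected lift (20% -> 40%) needs far fewer users than a small one.
print(sample_size_per_group(0.20, 0.40))
print(sample_size_per_group(0.20, 0.22))
```

Smaller expected differences, or a higher requested power, both push the required sample size up. This is one reason live A/B tests need large traffic and long running times.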
  • 36. A/B testing: tips and common mistakes. DO NOT run the two variations under different conditions. DO NOT stop the test too early. Pay attention to external factors. DO NOT blind test without a hypothesis. DO NOT stop after the first failures. Choose the right metric. Consider the impact on your business. Randomly split the population. Keep the assignment consistent.
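The last two tips, "randomly split the population" and "keep the assignment consistent", are often implemented together by hashing a stable user identifier. The following is a minimal sketch under that assumption. The function name `assign_variant`, the experiment name, and the user ID format are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to a variant.

    The same (experiment, user_id) pair always yields the same variant, so no
    assignment table has to be stored. Including the experiment name in the
    hashed key keeps assignments independent across experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The assignment is stable across calls and close to an even split.
first = assign_variant("user-123", "button-color")
assert first == assign_variant("user-123", "button-color")

counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "button-color")] += 1
print(counts)
```

Hashing gives a pseudo-random but repeatable split: a returning user sees the same variant on every visit, which keeps the two groups cleanly separated.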
  • 37. A/B testing data-driven algorithms - I (diagram): two variants, A and B, each with its own feature extraction, training and prediction pipeline. Examples: recommended users (“People like you”: Tom, Mike, Chris, Lena) and targeted ads.
  • 38. A/B testing data-driven algorithms - II. CTR is not always the right metric. Search engine: ideally no click at all. Tweet suggestions: what users click is not necessarily what they want. E-commerce recommendations: users click to find alternatives to the product proposed. Find long-term metrics: Retention/churn; Returning users; Time spent; Upgrading users
  • 39. A/B testing data-driven algorithms - III. Multiple goals are addressed: Relevance; Transparency; Diversity; Novelty; Coverage; Robustness. Consider all the steps of the pipeline. Do not vary the UI and the data-driven algorithm simultaneously.
  • 40. A/B testing in the cloud - I. Cloud computing makes A/B testing simpler: 1. Create multiple environments/modules with different features 2. Split traffic (e.g., with Google App Engine’s traffic splitting feature). Do the same with the serverless paradigm.
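When the platform does not split traffic for you, a weighted split between deployed versions can be sketched in a few lines. This is a platform-agnostic illustration and not the API of Google App Engine or any other cloud service. The function name `route`, the version labels `v1` and `v2`, and the 90/10 weights are all hypothetical.

```python
import hashlib


def route(client_id, splits):
    """Pick a version for a client according to weighted traffic shares.

    splits maps version name -> fraction of traffic; fractions must sum to 1.
    The choice is derived from a hash of client_id, so a client sticks to
    one version for as long as the weights stay unchanged.
    """
    if abs(sum(splits.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    cumulative = 0.0
    for version, share in splits.items():
        cumulative += share
        if point < cumulative:
            return version
    return list(splits)[-1]  # guard against floating-point rounding


# Send roughly 10% of clients to the new version.
weights = {"v1": 0.9, "v2": 0.1}
served = {"v1": 0, "v2": 0}
for i in range(10_000):
    served[route(f"client-{i}", weights)] += 1
print(served)
```

Starting with a small share for the new version limits the business impact if the variation performs badly. The trade-off is that the smaller group takes longer to collect enough data.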
  • 41. A/B testing in the cloud - II. If unsure, use a third-party service. A/B testing as a service: AWS A/B testing service; Google Analytics A/B testing feature; Optimizely, VWO. A/B testing libraries: Sixpack, Planout, Clutch.io, Alephbet. Or build your own.
  • 42. Data-driven algorithms to support A/B testing: multi-armed bandit - I (diagram comparing A/B testing and a multi-armed bandit over time, with alternatives labeled A to G and a feature extraction, training and prediction pipeline)
  • 43. Data-driven algorithms to support A/B testing: multi-armed bandit - II. PROS: Increased average KPI. CONS: Longer time to reach statistical significance; Harder implementation; Harder to maintain consistency
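The slides do not name a specific bandit strategy. As one simple example of the exploitation/exploration idea, here is an epsilon-greedy sketch run against simulated clicks. The class name `EpsilonGreedyBandit`, the helper `simulate`, the value of epsilon, and the "true" CTRs are all hypothetical.

```python
import random


class EpsilonGreedyBandit:
    """Mostly serve the best-looking arm; explore a random arm with prob. epsilon."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.pulls = {arm: 0 for arm in self.arms}
        self.rewards = {arm: 0.0 for arm in self.arms}

    def mean_reward(self, arm):
        return self.rewards[arm] / self.pulls[arm] if self.pulls[arm] else 0.0

    def choose(self):
        # Try every arm once before trusting any estimate.
        for arm in self.arms:
            if self.pulls[arm] == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)  # explore
        return max(self.arms, key=self.mean_reward)  # exploit

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.rewards[arm] += reward


def simulate(true_ctr, rounds=10_000, seed=0):
    """Run the bandit against simulated users with the given (made-up) CTRs."""
    rng = random.Random(seed)
    bandit = EpsilonGreedyBandit(true_ctr, epsilon=0.1, rng=rng)
    for _ in range(rounds):
        arm = bandit.choose()
        clicked = 1.0 if rng.random() < true_ctr[arm] else 0.0
        bandit.update(arm, clicked)
    return bandit


bandit = simulate({"A": 0.20, "B": 0.40})
print(bandit.pulls)
```

Because traffic shifts toward the better-looking arm, the average KPI during the experiment improves. The weaker arm collects data slowly, which is the "longer time to reach statistical significance" drawback. A user may also be served different arms on different visits, which is the consistency drawback.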
  • 44. Main takeaways: Evaluate data-driven solutions both offline and online. Define the correct KPIs. Prefer long-term metrics to short-term conversions. Do not forget that A/B testing is a statistical test; rely on cloud services if you are not “confident”. Exploitation/exploration approaches can be an alternative to A/B testing. Conversion rate is not the only metric.
  • 45. Thank you for attending :) cloudacademy.com Q & A