[PositConf 2023] How Data Scientists Broke A/B Testing (and How We Can Fix It)

Carl Vogel, Data Scientist
Questions?
pos.it/slido-A
A Completely True Story
Launch on Neutral
(But thanks anyway)
Existential Dread
(Get used to it)
A real PM
“If it’s something we really believe in, I’ll launch on a flat result … if it’s part of a broader strategy.”
“My features are hard as shit to build, but easy to tweak, so I’m not always worried about statistical significance.”
Another real PM
Not just NHST (null hypothesis significance testing)
Features aren’t IID
Path dependencies in feature roadmaps
We develop experiences by building up features over time, and it’s helpful to launch them incrementally
MDE (minimum detectable effect) is basically zero
Feature costs are nearly all sunk before the test
Any lift pays off
Not just NHST
Risk is mismeasured
Decision makers don’t think about Type I and II error rates, per se
They just want to make more money than they lose
Can I make good decisions about small to moderate effects quickly?
You can’t make reliable inferences about small to moderate effects quickly.
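To put rough numbers on that claim, here is a minimal sketch of the standard sample-size approximation for a two-sided, two-proportion z-test. The 5% baseline conversion rate, α = 0.05, and 80% power are illustrative assumptions, not figures from the talk, and the function name `n_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(baseline: float, relative_lift: float,
              alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users per arm for a two-sided, two-proportion z-test.

    Uses the unpooled normal approximation:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical 5% baseline conversion rate; shrinking lifts blow up the sample size.
for lift in (0.10, 0.05, 0.02, 0.01):
    print(f"{lift:.0%} relative lift: {n_per_arm(0.05, lift):,} users per arm")
```

Because the required sample scales with 1/(p₂ − p₁)², halving the lift roughly quadruples the users needed, which is why small effects can’t be pinned down quickly.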
Did they misuse the tool?
Or did we hand them the wrong one?
Non-Inferiority Designs
Non-inferiority designs
Let’s try not to wreck the place
[Figure: Superiority vs. Non-Inferiority]
• Inferiority margins (Δ) prompt us to ask:
• How much do we believe in this feature?
• How quickly will we improve on it?
• Stakeholders can give meaningful answers to these questions
• Compare this to the MDE/minimal lift, which is often made up
• Avoid meaningless minimum effect estimates
• Can power against a “no effect” alternative
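As a sketch of what a non-inferiority design looks like in practice, the code below runs a one-sided z-test of H₀: p_B − p_A ≤ −Δ against H₁: p_B − p_A > −Δ, and sizes the test by powering against a true difference of zero. It assumes Δ is expressed in absolute conversion-rate points and uses the unpooled normal (Wald) approximation; the function names and example counts are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def noninferiority_test(conv_a: int, n_a: int, conv_b: int, n_b: int,
                        margin: float, alpha: float = 0.05):
    """One-sided test of H0: p_b - p_a <= -margin vs. H1: p_b - p_a > -margin.

    `margin` is the inferiority margin (Delta) in absolute conversion-rate points.
    Returns (z statistic, one-sided p-value, whether B is declared non-inferior).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)  # unpooled Wald SE
    z = (p_b - p_a + margin) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha


def n_per_arm_noninferiority(baseline: float, margin: float,
                             alpha: float = 0.05, power: float = 0.8) -> int:
    """Users per arm needed to declare non-inferiority with the given power
    when the true difference is zero (the "no effect" alternative)."""
    z_sum = NormalDist().inv_cdf(1 - alpha) + NormalDist().inv_cdf(power)
    variance = 2 * baseline * (1 - baseline)
    return ceil(z_sum ** 2 * variance / margin ** 2)


# Hypothetical numbers: B converts slightly worse than A, margin of 0.3 points.
z, p, ok = noninferiority_test(5000, 100_000, 4950, 100_000, margin=0.003)
print(f"z = {z:.2f}, p = {p:.4f}, non-inferior: {ok}")
print(n_per_arm_noninferiority(0.05, margin=0.003), "users per arm")
```

Powering against a true difference of zero keeps the sample-size conversation anchored on Δ, a number stakeholders can reason about, rather than on a guessed minimum lift.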
What’s the rush?
The costs of long experiments
Time is money, folks
• Opportunity cost of time:
• Experimental features live on a roadmap; waiting for launch decisions delays development of subsequent features
• Opportunity cost of sampling:
• As long as the experiment runs, many users aren’t getting the best variant
• Maintenance costs:
• More experiments running means more complexity in the codebase, more effort, etc.
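The three cost buckets above can be turned into a back-of-the-envelope figure. This is a minimal sketch under loudly stated assumptions: a 50/50 split, an assumed relative lift standing in for the unknown true one, and flat per-day guesses for roadmap delay and maintenance. The function and parameter names are hypothetical.

```python
def cost_of_running(days: int, users_per_day: float, value_per_user: float,
                    assumed_relative_lift: float, share_in_worse_arm: float = 0.5,
                    delay_cost_per_day: float = 0.0,
                    maintenance_cost_per_day: float = 0.0) -> float:
    """Rough dollar cost of keeping an A/B test running for `days` days.

    - Sampling cost: users in the worse arm each forgo
      value_per_user * |assumed_relative_lift| (an assumption, since the
      true lift is unknown while the test runs).
    - Delay and maintenance costs are flat per-day estimates from the team.
    """
    sampling_cost_per_day = (users_per_day * share_in_worse_arm
                             * value_per_user * abs(assumed_relative_lift))
    return days * (sampling_cost_per_day + delay_cost_per_day
                   + maintenance_cost_per_day)


# Hypothetical: 10k users/day, $2 per user, 2% lift, $100/day of roadmap delay.
print(cost_of_running(30, 10_000, 2.0, 0.02, delay_cost_per_day=100.0))
```

Keeping the delay and maintenance terms as explicit inputs forces the team to state its impatience as a number instead of leaving it implicit.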
Value of Information Designs
When is data worth it?
Good things are worth waiting for
• Waiting is costly, but data is valuable.
• We should keep going as long as the value of more data exceeds the cost of more time
• Quantify our impatience as part of test design
[Figure: Expected Value vs. Cost of Data. Dollars ($0 to $80,000) plotted against test length (0 to 60); series: Exp. Value, Cost, Net Exp. Value]
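The chart’s logic (keep testing while the expected value of more data exceeds the cost of more time) can be sketched with a textbook normal-prior model. Assumptions, none of them from the talk: the per-user lift has a normal prior, the observed lift is normally distributed with known outcome variance, launching pays lift × launch_population, and waiting costs a flat amount per day. The closed-form EVSI uses E[max(X, 0)] = mΦ(m/τ) + τφ(m/τ) for X ~ N(m, τ²); every name and number here is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def evsi(days: int, prior_mean: float, prior_sd: float, outcome_sd: float,
         users_per_arm_per_day: int, launch_population: int) -> float:
    """Expected value of sample information for a launch/don't-launch decision.

    Assumes a normal prior on the per-user lift (in dollars), a normally
    distributed lift estimate, and a payoff of lift * launch_population if we
    launch (zero otherwise). We launch iff the posterior mean is positive.
    """
    if days <= 0:
        return 0.0
    n = users_per_arm_per_day * days
    sampling_var = 2 * outcome_sd ** 2 / n             # variance of the difference in means
    post_var = 1 / (1 / prior_sd ** 2 + 1 / sampling_var)
    tau = sqrt(prior_sd ** 2 - post_var)               # spread of the posterior mean before we see data
    z = prior_mean / tau
    value_with_data = prior_mean * STD_NORMAL.cdf(z) + tau * STD_NORMAL.pdf(z)
    value_without_data = max(prior_mean, 0.0)          # act on the prior alone
    return launch_population * (value_with_data - value_without_data)


def best_test_length(cost_per_day: float, max_days: int = 60, **model) -> int:
    """Test length (days) that maximizes EVSI minus the cost of waiting."""
    return max(range(max_days + 1),
               key=lambda d: evsi(d, **model) - cost_per_day * d)


# Hypothetical inputs: neutral prior, $0.05 prior sd on lift per user,
# $5 sd of revenue per user, 1,000 users per arm per day, 1M users after launch.
model = dict(prior_mean=0.0, prior_sd=0.05, outcome_sd=5.0,
             users_per_arm_per_day=1_000, launch_population=1_000_000)
days = best_test_length(cost_per_day=200.0, **model)
print(days, round(evsi(days, **model)))
```

With these hypothetical inputs the optimum lands inside the 60-day window: early days buy a lot of information cheaply, while later days mostly add cost. If the prior is already confident (a large positive `prior_mean`), EVSI is essentially zero and the best test length is zero days.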
Why is data valuable?
How dumb am I, in dollars?
• Before we have data, our range of potential lifts is wide
• Our best guess could be way off; we could make a big mistake
• Observing data narrows the range: even if our new guess is wrong, it won’t be wrong by as much.
• If the value of being less wrong (in expectation) exceeds the cost of waiting for the data, LFG!
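To make “how dumb am I, in dollars?” concrete, here is a sketch that prices the expected loss of acting on the current best guess, before and after one batch of data. It assumes a normal prior on the per-user lift, a normally distributed estimate with known standard error (the conjugate normal update), and a launch payoff proportional to the lift; under those assumptions the expected loss is s·φ(|m|/s) − |m|·Φ(−|m|/s). Names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def update(prior_mean: float, prior_sd: float, estimate: float, estimate_se: float):
    """Normal-normal conjugate update: precision-weighted average of prior and data."""
    w_prior, w_data = 1 / prior_sd ** 2, 1 / estimate_se ** 2
    mean = (w_prior * prior_mean + w_data * estimate) / (w_prior + w_data)
    return mean, sqrt(1 / (w_prior + w_data))


def expected_loss(mean: float, sd: float, population: int) -> float:
    """Expected dollars lost by launching iff mean > 0, relative to knowing the true lift."""
    z = abs(mean) / sd
    return population * (sd * STD_NORMAL.pdf(z) - abs(mean) * STD_NORMAL.cdf(-z))


def interval_95(mean: float, sd: float):
    """Central 95% credible interval for the lift."""
    half_width = STD_NORMAL.inv_cdf(0.975) * sd
    return mean - half_width, mean + half_width


# Hypothetical: wide prior on per-user lift, then one estimate of $0.02 with SE $0.02.
prior = (0.01, 0.05)
posterior = update(*prior, estimate=0.02, estimate_se=0.02)
for label, (m, s) in (("prior", prior), ("posterior", posterior)):
    lo, hi = interval_95(m, s)
    print(f"{label}: 95% interval [{lo:.3f}, {hi:.3f}], "
          f"expected loss ${expected_loss(m, s, 1_000_000):,.0f}")
```

The posterior interval is narrower and the expected loss drops, even though the posterior mean could still be wrong; that drop, in dollars, is what the data was worth.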
[Figure: Expected Value of Sample Information]
Sequential testing decisions
Don’t stop ’til you get enough
• We can do this again after collecting some data
• This changes the core decision from “is B > A?” to “should I stop or continue testing?”
• Good fit for A/B tests, where we collect data passively just by waiting
• Once more data isn’t worth it, launch the best observed variant; the inference problem is irrelevant (Claxton ’96)
• This is our best information, and it’s not worth getting more
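The stop-or-continue loop can be sketched with a myopic (one-step look-ahead) rule: before each batch, compare the EVSI of that single batch with its cost, stop when it no longer pays, and launch whichever variant the posterior favors. This one-step rule is a common simplification swapped in for illustration; the talk does not specify it, and a fully optimal rule would look further ahead (a dynamic program). The normal prior, known per-batch standard error, simulated data, and every name here are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def evsi_next_batch(mean: float, sd: float, batch_se: float, population: int) -> float:
    """EVSI of one more batch whose lift estimate has standard error batch_se."""
    post_var = 1 / (1 / sd ** 2 + 1 / batch_se ** 2)
    tau = sqrt(sd ** 2 - post_var)  # spread of the next posterior mean
    z = mean / tau
    return population * (mean * STD_NORMAL.cdf(z) + tau * STD_NORMAL.pdf(z) - max(mean, 0.0))


def run_sequential_test(true_lift: float, prior_mean: float, prior_sd: float,
                        batch_se: float, cost_per_batch: float, population: int,
                        max_batches: int = 100, seed: int = 0) -> dict:
    """Simulate a myopic value-of-information stopping rule on fake batch data."""
    rng = random.Random(seed)
    mean, sd = prior_mean, prior_sd
    for batch in range(max_batches):
        if evsi_next_batch(mean, sd, batch_se, population) < cost_per_batch:
            break  # more data isn't worth it: stop and decide
        estimate = rng.gauss(true_lift, batch_se)  # simulated batch result
        w_old, w_new = 1 / sd ** 2, 1 / batch_se ** 2
        mean = (w_old * mean + w_new * estimate) / (w_old + w_new)
        sd = sqrt(1 / (w_old + w_new))
    else:
        batch = max_batches  # never stopped early
    # No significance test: launch the variant the posterior currently favors.
    return {"batches": batch, "launch_b": mean > 0,
            "posterior_mean": mean, "posterior_sd": sd}


print(run_sequential_test(true_lift=0.02, prior_mean=0.0, prior_sd=0.05,
                          batch_se=0.05, cost_per_batch=500.0,
                          population=1_000_000))
```

At each checkpoint the only question is whether another batch is worth its cost; once it isn’t, the launch decision simply follows the posterior mean.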
Lessons
What’s the Problem?
Going back to basics
There’s no silver bullet
You may have other problems; you’ll need other solutions
Misuse of tools should prompt us to rethink the problem
What are we actually trying to solve?
What are the costs, benefits, and risks?
Are we solving the problem, or treating symptoms?
Launch-on-neutral, run-til-significant, peeking, etc. are symptoms, not the root problem
Lots of advanced techniques speed up tests but don’t address the reasons for impatience
Here, there, and everywhere
You’re soaking in it
This isn’t just about A/B testing
But it’s a domain where we have very familiar tools close at hand
What are we here for?
People who solve problems for people are the luckiest people in the world
This is the fun stuff
This is where we add value as data scientists
These problems aren’t solved
Try new stuff!
Carl Vogel
Principal Data Scientist
carl.vogel@babylist.com
Thanks!