Experimenting with Stats Engine
Pete Koomen
Co-founder, CTO, Optimizely
@koomen
pete@optimizely.com
opticon2017
Agenda
1. Why we built Stats Engine
2. How to make decisions with Stats Engine
3. How to scale your decision process
opticon2017
opticon2017
Why we built Stats Engine
The study followed 1,291 participants for 10 years.
No exercise: 438 with 128 deaths (29%)
Light exercise: 576 with 7 deaths (1%)
Moderate exercise: 262 with 8 deaths (3%)
Heavy exercise: 40 with 2 deaths (5%)
“Thank goodness a third person
didn't die, or public health
authorities would be banning
jogging.”
– Alex Hutchinson, Runner’s World
“A/A” results
The “T-test”
(a.k.a. “NHST”, a.k.a. “Student’s t-test”)
The T-test in a nutshell:
1. Run your experiment until you have reached the required sample size, and then stop.
2. Ask “What are the chances I’d have gotten these results in an A/A test?” (p-value)
3. If p-value < 5%, your results are significant.
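For concreteness, here is a minimal sketch of that fixed-horizon recipe using SciPy. The conversion rates, sample size, and variable names are illustrative, not Optimizely's:

```python
# Fixed-horizon t-test sketch (illustrative numbers).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5_000                                # pre-computed required sample size per arm

# Step 1: collect exactly n visitors per arm, then stop.
control   = rng.binomial(1, 0.10, n)     # 10% baseline conversion rate
variation = rng.binomial(1, 0.11, n)     # 11% conversion rate in the variation

# Step 2: "What are the chances I'd have gotten these results in an A/A test?"
t_stat, p_value = stats.ttest_ind(variation, control)

# Step 3: significant only if p < 5%.
print(f"p = {p_value:.4f}:", "significant" if p_value < 0.05 else "inconclusive")
```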
1908
Data is expensive.
Data is slow.
Practitioners are trained.

2017
Data is cheap.
Data is real-time.
Practitioners are everyone.

The T-test was designed for the world of 1908.
T-Test Pitfalls
1. Peeking
2. Multiple comparisons
1. Peeking
[Chart: a p-value tracked over time, from “Experiment Starts” to “Min Sample Size”. Successive peeks read “p-Value > 5%. Inconclusive.” until one reads “p-Value < 5%. Significant!”]
Why is this a problem?
There is a ~5% chance of seeing a false positive each time you peek.
[Chart: the same p-value timeline, with four peeks marked between “Experiment Starts” and “Min Sample Size”.]
4 peeks → ~18% chance of seeing a false positive.
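A small simulation makes the inflation concrete (assumed setup: true A/A tests on a 10% conversion rate, peeking after every 1,000 visitors per arm; all numbers are illustrative):

```python
# Peeking simulation: many A/A tests, four interim looks each.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_tests, peeks = 2_000, [1_000, 2_000, 3_000, 4_000]

false_positives = 0
for _ in range(n_tests):
    a = rng.binomial(1, 0.10, peeks[-1])   # identical arms: any "win" is a fluke
    b = rng.binomial(1, 0.10, peeks[-1])
    for k in peeks:
        if stats.ttest_ind(a[:k], b[:k]).pvalue < 0.05:
            false_positives += 1
            break                          # experimenter stops at "significance"

print(f"false positive rate with 4 peeks: {false_positives / n_tests:.1%}")
# Lands well above the nominal 5%. (The ~18% quoted above treats the four
# peeks as independent: 1 - 0.95**4 ≈ 18.5%; correlated peeks give a bit less.)
```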
The “T-test”
(a.k.a. “NHST”, a.k.a. “Student’s t-test”)
The T-test in a nutshell:
1. Run your experiment until you have reached the required sample size, and then stop.
2. Ask “What are the chances I’d have gotten these results in an A/A test?” (p-value)
3. If p-value < 5%, your results are significant.
[Slide: timestamps 1:45, 2:45, 3:45, 4:45, 5:45, i.e. peeking at the results every hour.]
Solution: Stats Engine uses sequential testing to compute an “always-valid” p-value.
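As a hedged illustration of what “always-valid” means, here is a sketch of the mixture sequential probability ratio test (mSPRT) described in the KDD 2017 paper cited later in this deck. The mixture variance tau2 and the demo numbers are made up, and Optimizely's production implementation certainly differs in detail:

```python
# mSPRT-style always-valid p-value sketch (illustrative, not Optimizely's code).
import numpy as np

def always_valid_pvalues(effect_estimates, obs_var, tau2=0.01):
    """effect_estimates[i]: running estimate of the lift after i+1 observations.
    obs_var: variance of a single observation of the difference."""
    n = np.arange(1, len(effect_estimates) + 1)
    V = obs_var / n                          # variance of each running estimate
    theta = np.asarray(effect_estimates)
    # Mixture likelihood ratio against H0 (no difference), with a N(0, tau2) prior.
    lam = np.sqrt(V / (V + tau2)) * np.exp(theta**2 * tau2 / (2 * V * (V + tau2)))
    # The p-value only ever decreases, so peeking at any time stays valid.
    return np.minimum.accumulate(np.minimum(1.0, 1.0 / lam))

# Demo: a real 3-point lift gets detected, and you may stop the moment p < 5%.
rng = np.random.default_rng(2)
a = rng.binomial(1, 0.10, 10_000)
b = rng.binomial(1, 0.13, 10_000)
running_diff = np.cumsum(b - a) / np.arange(1, 10_001)
p = always_valid_pvalues(running_diff, obs_var=0.10 * 0.90 + 0.13 * 0.87)
print("first visitor count with p < 5%:", int(np.argmax(p < 0.05)) + 1)
```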
2. Multiple Comparisons
© Randall Patrick Munroe, xkcd.com
[Grid: variations A, B, C, D, and Control crossed with metrics 1–5: twenty variation-vs-control comparisons running in one experiment.]
False Positive Rate = P( 10% Lift | No Real Improvement ), i.e. “How likely are my results if I assume there is no underlying difference between my variation and control?”
False Discovery Rate = P( No Real Improvement | 10% Lift ), i.e. “How likely is it that my results are a fluke?”
Solution: Stats Engine controls the False Discovery Rate by becoming more conservative when more metrics and variations are added to a test.
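The published approach (the same KDD 2017 paper) pairs always-valid p-values with Benjamini-Hochberg-style FDR control. Below is the textbook fixed-sample Benjamini-Hochberg procedure as a minimal sketch; Stats Engine uses a sequential variant, so treat this as the idea, not the implementation:

```python
# Benjamini-Hochberg sketch: FDR control over many simultaneous comparisons.
import numpy as np

def benjamini_hochberg(p_values, max_fdr=0.10):
    """Return a boolean mask of which comparisons to call significant."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    # Find the largest k with p_(k) <= (k/m) * max_fdr; reject the k smallest.
    below = p[order] <= (np.arange(1, m + 1) / m) * max_fdr
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    significant = np.zeros(m, dtype=bool)
    significant[order[:k]] = True
    return significant

# 20 comparisons (4 variations x 5 metrics): each p-value now has to clear a
# stricter, rank-dependent bar than the naive 5% cutoff.
print(benjamini_hochberg([0.001, 0.04, 0.2, 0.8] * 5))
```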
opticon2017
How to make decisions with Stats Engine
When should I stop an experiment?
Understanding resets
How do additional variations and metrics affect my experiment?
How do I trade off between risk and velocity?
opticon2017
How to make decisions with Stats Engine: When should I stop an experiment?
[Chart: a variation’s results with its “visitors remaining” estimate.]
👍 Use “visitors remaining” to decide whether continuing your experiment is worth it.
opticon2017
How to make decisions with Stats Engine: Understanding resets
“Peeking at A/B Tests: Why it matters, and what to do about it” KDD 2017
👍 Statistical Significance rises whenever there is strong evidence of a difference between variation and control.
“Peeking at A/B Tests: Why it matters, and what to do about it” KDD 2017
👍 Statistical Significance will “reset” when there is strong evidence of an underlying change.
👍 If your point estimate is near the edge of its confidence interval, consider running the experiment longer.
[Chart: a variation whose confidence interval spans -19.3% to -2.58%.]
opticon2017
How to make decisions with Stats Engine: How do additional variations and metrics affect my experiment?
False Positive Rate = P( 10% Lift | No Real Improvement ), i.e. “How likely are my results if I assume there is no underlying difference between my variation and control?”
False Discovery Rate = P( No Real Improvement | 10% Lift ), i.e. “How likely is it that my results are a fluke?”
Solution: Stats Engine controls the False Discovery Rate by becoming more conservative when more metrics and variations are added to a test.
Stats Engine treats each metric as a “signal”.
High Signal metrics are directly affected by the experiment.
Low Signal metrics are affected only indirectly, or not at all, by the experiment. (For example, a checkout-button test directly moves clicks on that button, but only indirectly moves overall revenue per visitor.)
False Positive Rate = P( 10% Lift | No Real Improvement ), i.e. “How likely are my results if I assume there is no underlying difference between my variation and control?”
False Discovery Rate = P( No Real Improvement | 10% Lift ), i.e. “How likely is it that my results are a fluke?”
Solution: Stats Engine controls the False Discovery Rate by becoming more conservative when more low signal metrics and variations are added to a test.
[Grid: variations A–D crossed with metrics 1–8, with the metrics grouped as Primary, Secondary, and Monitoring.]
👍 For maximum velocity, make your primary and secondary metrics “high signal” ones.
👍 Put “low signal” metrics in the monitoring group.
opticon2017
How to make decisions with Stats Engine: How do I trade off between risk and velocity?
Max False Discovery Rate
👍 Use your Statistical Significance threshold to control risk vs. velocity.
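A worked example, assuming the threshold maps directly to the “Max False Discovery Rate” shown above: a 90% Statistical Significance threshold caps the false discovery rate at 10%, so of the results you call winners, at most about 1 in 10 is expected to be a fluke. Raising the threshold to 95% halves that risk, but each experiment then needs more data to reach significance, which lowers velocity.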
opticon2017
How to scale your decision process
Risk vs. Velocity for Experimentation Programs
Getting organizational buy-in
Risk vs. Velocity for Experimentation Programs
👍 Define “risk classes” for your team’s experiments.
👍 Keep low-risk experiments “low touch”.
👍 Save data science analysis resources for high-risk experiments.
👍 Run high-risk experiments for 1+ conversion cycles to control for seasonality.
👍 Rerun high-risk experiments.
Getting organizational buy-in
👍 Decide how and when you’ll share experiment results with your organization.
👍 Write down your “decision process” and socialize it with the team.
opticon2017
Q&A
Pete Koomen
@koomen
pete@optimizely.com