Tale of Two Tests
Jimmy Jin
Statistician
Mei Luo
Strategic Customer Success Manager
Experimentation in
the Digital Age
• You want to run an experiment on the
background image on the homepage of your
e-commerce clothing site, Attic & Button
• As a practitioner, you would want...
Results in real time
Evaluate impact on
multiple KPIs
Run experiment with
minimal data inputs
A brief
refresher
Review of basic terms
• p-value: The probability of observing a result at least as extreme
as the one seen, if there is truly no difference
• False positive rate (Type 1 error rate): “How often will
the test detect an illusion?”
• Power (1 minus Type 2 error rate): “How often will the test
detect the real thing?”
Steps to doing a t-test
1. Calculate a required sample size for your A/B test
   • Depends on the minimum detectable effect (MDE)
2. Collect your data
3. Make a decision
Continuing past the prescribed sample size or
stopping early is NOT allowed.
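To make step 1 concrete, here is a minimal sketch of a fixed-horizon sample size calculation. It assumes a two-sided test on conversion rates using the usual normal approximation (a two-proportion z-test, which stands in for the t-test at these sample sizes). The function name, the 10% baseline rate, and the defaults of 5% significance and 80% power are illustrative assumptions, not values from the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation (hypothetical helper).

    baseline      -- conversion rate of the original, e.g. 0.10
    relative_mde  -- minimum detectable effect as a relative lift, e.g. 0.10
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    for mde in (0.20, 0.10, 0.05):
        print(f"MDE {mde:.0%}: {sample_size_per_arm(0.10, mde)} visitors per arm")
```

Halving the MDE roughly quadruples the required sample, which is why the choice of MDE matters so much in a fixed-horizon test.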
The peeking
problem
Activity 1: peek for yourselves
External demo!
False positive rate inflation in a t-test
Why is a t-test sensitive to this?
The Stats Engine solution (essentially)
Upshot: faster results
Results Dashboard
Significance increases as more data is collected
t-test comparison on results page data
Summary – The Peeking Problem
t-test:
• Peeking during a t-test increases the chance you’ll find a winning
result when none actually exists (a false positive)
Stats Engine:
• Sequential testing enables evaluation of experiment data as it is
collected. Tests can be stopped at any time with valid results.
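The inflation described above can be reproduced with a small simulation. The sketch below runs A/A tests (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares checking only at the end against checking at regular intervals and stopping at the first significant look. All parameter values are illustrative assumptions, and the test used is a pooled two-proportion z-test rather than anything specific to Stats Engine.

```python
import random
from math import sqrt


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms with n visitors each."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * p_pool * (1 - p_pool) / n)
    if se == 0:
        return 0.0
    return (conv_b - conv_a) / n / se


def simulate_aa_tests(n_tests=500, n_visitors=2000, looks=20,
                      rate=0.10, z_crit=1.96, seed=0):
    """Return (false positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    step = n_visitors // looks
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        flagged = False
        for n in range(1, n_visitors + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # A "peek": test the data collected so far.
            if n % step == 0 and abs(z_stat(conv_a, conv_b, n)) > z_crit:
                flagged = True
        peeking_hits += flagged
        fixed_hits += abs(z_stat(conv_a, conv_b, n_visitors)) > z_crit
    return peeking_hits / n_tests, fixed_hits / n_tests


if __name__ == "__main__":
    peeking, fixed = simulate_aa_tests()
    print(f"false positive rate, 20 peeks:   {peeking:.1%}")
    print(f"false positive rate, final only: {fixed:.1%}")
```

The final look counts as one of the peeks, so the peeking rate can never be lower than the fixed-horizon rate; in practice it comes out several times higher than the nominal 5%.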
The guessing
problem
The bane of t-testing
Missed connections
Activity 2: the lift is right
Guess the lift (and see the consequences) of 2 actual
Optimizely experiments.
What is the expected lift to
subscription conversion rate?
Original Variant
What is the expected lift to subscription
conversion rate?
• -5%
• -2%
• 2%
• 5%
+35.75%
What is the expected lift to add-to-cart rate?
Original Variant
• -10%
• -15%
• 20%
• 2%
+16.46%
What is the expected lift to the add-to-cart rate?
Why is power limited in fixed-horizon tests?
The sequential advantage
How is the Optimizely calculator different?
Summary – The Guessing Problem
t-test
• If you set a small MDE, tests will take longer to conclude. If you set a
large MDE, you may miss smaller improvements.
Stats Engine
• When the true lift exceeds your MDE, you’ll be able to call your test
faster.
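The t-test side of this trade-off can be illustrated by computing the power of a fixed-horizon test when the true lift differs from the MDE it was sized for. This is a sketch under the normal approximation for a two-sided two-proportion test. The function name, the 10% baseline, and the sample size of 14,751 visitors per arm (roughly what a 10% relative MDE calls for at 80% power under this approximation) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(baseline, relative_lift, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided fixed-horizon test (hypothetical helper)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)
    shift = abs(p2 - p1) / se  # true effect measured in standard errors
    norm = NormalDist()
    return norm.cdf(shift - z_alpha) + norm.cdf(-shift - z_alpha)


if __name__ == "__main__":
    n = 14751  # assumed sample size, planned around a 10% relative MDE
    for lift in (0.05, 0.10, 0.20):
        print(f"true lift {lift:.0%}: power {power_two_proportions(0.10, lift, n):.0%}")
```

If the true lift is half the MDE, the test is badly underpowered; if it is double, the fixed sample is far larger than needed, yet a fixed-horizon test still has to run to the end.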
The multiple
comparisons
problem
A higher risk of false positives
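A short calculation shows how quickly the risk grows. The sketch below assumes every comparison is tested independently at the same significance level and that no variation has a true effect; both are simplifying assumptions, and the function name is hypothetical.

```python
def chance_of_any_false_positive(n_comparisons, alpha=0.05):
    """Probability that at least one of n independent null comparisons
    comes out significant at level alpha."""
    return 1 - (1 - alpha) ** n_comparisons


if __name__ == "__main__":
    for m in (1, 5, 20):
        print(f"{m:>2} comparisons: {chance_of_any_false_positive(m):.1%}")
```

With 20 metric-variation combinations and no correction, at least one false positive is more likely than not.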
Built-in protections in Stats Engine
Ordinarily, corrections for multiple comparisons happen
after all tests have concluded.
In Optimizely, we perform these corrections in real time
so your results are protected no matter when you
check your experiment.
Activity 3: false positives
You conduct an experiment with many variations. Under
which scenario would you suspect more false positives?
1. You obtain 5 significant results.
2. You obtain 50 significant results.
false positives vs. false discoveries
False positive rate
P( significant | no true effect )
False discovery rate
P( no true effect | significant )
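One widely used way to control the false discovery rate is the Benjamini-Hochberg procedure. The sketch below is a plain, after-the-fact version of it; the slides do not say which procedure Stats Engine applies or how it is adapted to real-time results, so treat this as an illustration of FDR control in general. The p-values are made up for the example.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the result is declared
    significant while controlling the false discovery rate at `fdr`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose p-value is at most (k / m) * fdr.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for 10 metric/variation comparisons.
    p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
                0.060, 0.074, 0.205, 0.212, 0.216]
    naive = sum(p <= 0.05 for p in p_values)
    corrected = sum(benjamini_hochberg(p_values, fdr=0.05))
    print(f"significant without correction: {naive}")
    print(f"significant with FDR control:   {corrected}")
```

In this made-up example, five results pass an uncorrected 0.05 threshold but only two survive the correction, which is the sense in which FDR control becomes more conservative as comparisons are added.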
FDR corrections in real time
FDR tiering
Example: FDR tiering in an actual
experiment
Let’s walk through an actual experiment!
Summary – The Multiple Comparisons Problem
t-test
• Traditional statistics control the false positive rate, which does not
equate to the probability of making an incorrect business decision
Stats Engine
• Stats Engine controls for false discovery rate; as you add more metrics
to your experiment, Optimizely will become more conservative in
calling a winner or loser
Summary
3 Takeaways
Stats Engine allows you to...
• Monitor results in real time for faster experimentation,
without increased error rates
• Run fully powered experiments without guessing at
sample size calculations
• Evaluate impact on many metrics without
sacrificing accuracy
Q&A
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Full Stack Experimentation
Full Stack ExperimentationFull Stack Experimentation
Full Stack ExperimentationOptimizely
 
An Experimentation Framework: How to Position for Triple Digit Growth
An Experimentation Framework: How to Position for Triple Digit GrowthAn Experimentation Framework: How to Position for Triple Digit Growth
An Experimentation Framework: How to Position for Triple Digit GrowthOptimizely
 
Optimizely Demo Deck
Optimizely Demo DeckOptimizely Demo Deck
Optimizely Demo DeckMattharth
 
Optimizing Your B2B Demand Generation Machine
Optimizing Your B2B Demand Generation MachineOptimizing Your B2B Demand Generation Machine
Optimizing Your B2B Demand Generation MachineOptimizely
 
Optimism Webinar 1: Improving your digital experiences - what's next in 2019?
Optimism Webinar 1:  Improving your digital experiences - what's next in 2019?Optimism Webinar 1:  Improving your digital experiences - what's next in 2019?
Optimism Webinar 1: Improving your digital experiences - what's next in 2019?Optimizely
 
Under the Hood Webinar Series: B2B Experimentation & Personalization at Optim...
Under the Hood Webinar Series: B2B Experimentation & Personalization at Optim...Under the Hood Webinar Series: B2B Experimentation & Personalization at Optim...
Under the Hood Webinar Series: B2B Experimentation & Personalization at Optim...Optimizely
 
Mailchimp: Scaling Experimentation Across Teams
Mailchimp: Scaling Experimentation Across TeamsMailchimp: Scaling Experimentation Across Teams
Mailchimp: Scaling Experimentation Across TeamsOptimizely
 
Optimizely Experience Customer Story - Atlassian
Optimizely Experience Customer Story - AtlassianOptimizely Experience Customer Story - Atlassian
Optimizely Experience Customer Story - AtlassianOptimizely
 
What Your Customers Really Do Online: 5 Ways to Remove the Guesswork
What Your Customers Really Do Online: 5 Ways to Remove the GuessworkWhat Your Customers Really Do Online: 5 Ways to Remove the Guesswork
What Your Customers Really Do Online: 5 Ways to Remove the GuessworkOptimizely
 
Speed Matters - Strategies to Improve Your Site Performance
Speed Matters - Strategies to Improve Your Site PerformanceSpeed Matters - Strategies to Improve Your Site Performance
Speed Matters - Strategies to Improve Your Site PerformanceOptimizely
 
World Class Optimization: Benchmarking 1,000+ Companies
World Class Optimization: Benchmarking 1,000+ CompaniesWorld Class Optimization: Benchmarking 1,000+ Companies
World Class Optimization: Benchmarking 1,000+ CompaniesOptimizely
 
Opticon 2017 Driving Bottom Line Impact
Opticon 2017 Driving Bottom Line ImpactOpticon 2017 Driving Bottom Line Impact
Opticon 2017 Driving Bottom Line ImpactOptimizely
 
Making Your Hypothesis Work Harder to Inform Future Product Strategy
Making Your Hypothesis Work Harder to Inform Future Product StrategyMaking Your Hypothesis Work Harder to Inform Future Product Strategy
Making Your Hypothesis Work Harder to Inform Future Product StrategyOptimizely
 
Losing is the New Winning
Losing is the New WinningLosing is the New Winning
Losing is the New WinningOptimizely
 
Definition of A/B testing and Case Studies by Optimizely
Definition of A/B testing and Case Studies by OptimizelyDefinition of A/B testing and Case Studies by Optimizely
Definition of A/B testing and Case Studies by OptimizelyRusseWeb
 
Getting Started with Server-Side Testing
Getting Started with Server-Side TestingGetting Started with Server-Side Testing
Getting Started with Server-Side TestingOptimizely
 
Optimizely Under the Hood Series: Managing Experimentation at Scale
Optimizely Under the Hood  Series: Managing Experimentation at ScaleOptimizely Under the Hood  Series: Managing Experimentation at Scale
Optimizely Under the Hood Series: Managing Experimentation at ScaleOptimizely
 
Aeroméxico: Improving the Booking Experience with Data-First Personalization
Aeroméxico: Improving the Booking Experience with Data-First PersonalizationAeroméxico: Improving the Booking Experience with Data-First Personalization
Aeroméxico: Improving the Booking Experience with Data-First PersonalizationOptimizely
 
Optimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with StatisticsOptimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with StatisticsOptimizely
 

Was ist angesagt? (20)

Full Stack Experimentation
Full Stack ExperimentationFull Stack Experimentation
Full Stack Experimentation
 
An Experimentation Framework: How to Position for Triple Digit Growth
An Experimentation Framework: How to Position for Triple Digit GrowthAn Experimentation Framework: How to Position for Triple Digit Growth
An Experimentation Framework: How to Position for Triple Digit Growth
 
Optimizely Demo Deck
Optimizely Demo DeckOptimizely Demo Deck
Optimizely Demo Deck
 
Optimizing Your B2B Demand Generation Machine
Optimizing Your B2B Demand Generation MachineOptimizing Your B2B Demand Generation Machine
Optimizing Your B2B Demand Generation Machine
 
Optimism Webinar 1: Improving your digital experiences - what's next in 2019?
Optimism Webinar 1:  Improving your digital experiences - what's next in 2019?Optimism Webinar 1:  Improving your digital experiences - what's next in 2019?
Optimism Webinar 1: Improving your digital experiences - what's next in 2019?
 
Under the Hood Webinar Series: B2B Experimentation & Personalization at Optim...
Under the Hood Webinar Series: B2B Experimentation & Personalization at Optim...Under the Hood Webinar Series: B2B Experimentation & Personalization at Optim...
Under the Hood Webinar Series: B2B Experimentation & Personalization at Optim...
 
Mailchimp: Scaling Experimentation Across Teams
Mailchimp: Scaling Experimentation Across TeamsMailchimp: Scaling Experimentation Across Teams
Mailchimp: Scaling Experimentation Across Teams
 
Optimizely Experience Customer Story - Atlassian
Optimizely Experience Customer Story - AtlassianOptimizely Experience Customer Story - Atlassian
Optimizely Experience Customer Story - Atlassian
 
What Your Customers Really Do Online: 5 Ways to Remove the Guesswork
What Your Customers Really Do Online: 5 Ways to Remove the GuessworkWhat Your Customers Really Do Online: 5 Ways to Remove the Guesswork
What Your Customers Really Do Online: 5 Ways to Remove the Guesswork
 
Speed Matters - Strategies to Improve Your Site Performance
Speed Matters - Strategies to Improve Your Site PerformanceSpeed Matters - Strategies to Improve Your Site Performance
Speed Matters - Strategies to Improve Your Site Performance
 
World Class Optimization: Benchmarking 1,000+ Companies
World Class Optimization: Benchmarking 1,000+ CompaniesWorld Class Optimization: Benchmarking 1,000+ Companies
World Class Optimization: Benchmarking 1,000+ Companies
 
Opticon 2017 Driving Bottom Line Impact
Opticon 2017 Driving Bottom Line ImpactOpticon 2017 Driving Bottom Line Impact
Opticon 2017 Driving Bottom Line Impact
 
Making Your Hypothesis Work Harder to Inform Future Product Strategy
Making Your Hypothesis Work Harder to Inform Future Product StrategyMaking Your Hypothesis Work Harder to Inform Future Product Strategy
Making Your Hypothesis Work Harder to Inform Future Product Strategy
 
Losing is the New Winning
Losing is the New WinningLosing is the New Winning
Losing is the New Winning
 
Magento Meetup New Delhi- AB Testing
Magento Meetup New Delhi- AB TestingMagento Meetup New Delhi- AB Testing
Magento Meetup New Delhi- AB Testing
 
Definition of A/B testing and Case Studies by Optimizely
Definition of A/B testing and Case Studies by OptimizelyDefinition of A/B testing and Case Studies by Optimizely
Definition of A/B testing and Case Studies by Optimizely
 
Getting Started with Server-Side Testing
Getting Started with Server-Side TestingGetting Started with Server-Side Testing
Getting Started with Server-Side Testing
 
Optimizely Under the Hood Series: Managing Experimentation at Scale
Optimizely Under the Hood  Series: Managing Experimentation at ScaleOptimizely Under the Hood  Series: Managing Experimentation at Scale
Optimizely Under the Hood Series: Managing Experimentation at Scale
 
Aeroméxico: Improving the Booking Experience with Data-First Personalization
Aeroméxico: Improving the Booking Experience with Data-First PersonalizationAeroméxico: Improving the Booking Experience with Data-First Personalization
Aeroméxico: Improving the Booking Experience with Data-First Personalization
 
Optimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with StatisticsOptimizely Workshop: Take Action on Results with Statistics
Optimizely Workshop: Take Action on Results with Statistics
 

Ähnlich wie Tale of Two Tests

Webinar: Experimentation & Product Management by Indeed Product Lead
Webinar: Experimentation & Product Management by Indeed Product LeadWebinar: Experimentation & Product Management by Indeed Product Lead
Webinar: Experimentation & Product Management by Indeed Product LeadProduct School
 
Why learn Six Sigma, 4,28,15
Why learn Six Sigma, 4,28,15Why learn Six Sigma, 4,28,15
Why learn Six Sigma, 4,28,15James F. McCarthy
 
Patrick McKenzie Opticon 2014: Advanced A/B Testing
Patrick McKenzie Opticon 2014: Advanced A/B TestingPatrick McKenzie Opticon 2014: Advanced A/B Testing
Patrick McKenzie Opticon 2014: Advanced A/B TestingPatrick McKenzie
 
SAMPLE SIZE – The indispensable A/B test calculation that you’re not making
SAMPLE SIZE – The indispensable A/B test calculation that you’re not makingSAMPLE SIZE – The indispensable A/B test calculation that you’re not making
SAMPLE SIZE – The indispensable A/B test calculation that you’re not makingZack Notes
 
Yogurt -Weekly Predictions: The Future, Today
Yogurt -Weekly Predictions: The Future, TodayYogurt -Weekly Predictions: The Future, Today
Yogurt -Weekly Predictions: The Future, TodayJoosworks.com
 
Yogurt : Predicting the Unpredictable with 95% Accuracy
Yogurt : Predicting the Unpredictable with 95% AccuracyYogurt : Predicting the Unpredictable with 95% Accuracy
Yogurt : Predicting the Unpredictable with 95% AccuracyJoosworks.com
 
Opticon 2017 Experimenting with Stats Engine
Opticon 2017 Experimenting with Stats EngineOpticon 2017 Experimenting with Stats Engine
Opticon 2017 Experimenting with Stats EngineOptimizely
 
Data-Driven UI/UX Design with A/B Testing
Data-Driven UI/UX Design with A/B TestingData-Driven UI/UX Design with A/B Testing
Data-Driven UI/UX Design with A/B TestingJack Nguyen (Hung Tien)
 
Validation and hypothesis based product management by Abdallah Al-Khalidi
Validation and hypothesis based  product management by Abdallah Al-KhalidiValidation and hypothesis based  product management by Abdallah Al-Khalidi
Validation and hypothesis based product management by Abdallah Al-KhalidiAbdallah Al-Khalidi
 
Data Science and Goodhart's Law
Data Science and Goodhart's LawData Science and Goodhart's Law
Data Science and Goodhart's LawDomino Data Lab
 
The Necessity of the Measure Phase with Matt Hansen at StatStuff
The Necessity of the Measure Phase with Matt Hansen at StatStuffThe Necessity of the Measure Phase with Matt Hansen at StatStuff
The Necessity of the Measure Phase with Matt Hansen at StatStuffMatt Hansen
 
Testing As A Bottleneck - How Testing Slows Down Modern Development Processes...
Testing As A Bottleneck - How Testing Slows Down Modern Development Processes...Testing As A Bottleneck - How Testing Slows Down Modern Development Processes...
Testing As A Bottleneck - How Testing Slows Down Modern Development Processes...TEST Huddle
 
Debugging Intermittent Issues - A How To
Debugging Intermittent Issues - A How ToDebugging Intermittent Issues - A How To
Debugging Intermittent Issues - A How ToLloydMoore
 
10NTC - Data Superheroes - DiJulio
10NTC - Data Superheroes - DiJulio10NTC - Data Superheroes - DiJulio
10NTC - Data Superheroes - DiJuliosarahdijulio
 
UX STRAT Online 2020: Dr. Martin Tingley, Netflix
UX STRAT Online 2020: Dr. Martin Tingley, NetflixUX STRAT Online 2020: Dr. Martin Tingley, Netflix
UX STRAT Online 2020: Dr. Martin Tingley, NetflixUX STRAT
 
VSSML18. Evaluations
VSSML18. EvaluationsVSSML18. Evaluations
VSSML18. EvaluationsBigML, Inc
 

Ähnlich wie Tale of Two Tests (20)

The Finishing Line
The Finishing LineThe Finishing Line
The Finishing Line
 
Webinar: Experimentation & Product Management by Indeed Product Lead
Webinar: Experimentation & Product Management by Indeed Product LeadWebinar: Experimentation & Product Management by Indeed Product Lead
Webinar: Experimentation & Product Management by Indeed Product Lead
 
Why learn Six Sigma, 4,28,15
Why learn Six Sigma, 4,28,15Why learn Six Sigma, 4,28,15
Why learn Six Sigma, 4,28,15
 
Patrick McKenzie Opticon 2014: Advanced A/B Testing
Patrick McKenzie Opticon 2014: Advanced A/B TestingPatrick McKenzie Opticon 2014: Advanced A/B Testing
Patrick McKenzie Opticon 2014: Advanced A/B Testing
 
SAMPLE SIZE – The indispensable A/B test calculation that you’re not making
SAMPLE SIZE – The indispensable A/B test calculation that you’re not makingSAMPLE SIZE – The indispensable A/B test calculation that you’re not making
SAMPLE SIZE – The indispensable A/B test calculation that you’re not making
 
Yogurt -Weekly Predictions: The Future, Today
Yogurt -Weekly Predictions: The Future, TodayYogurt -Weekly Predictions: The Future, Today
Yogurt -Weekly Predictions: The Future, Today
 
Yogurt : Predicting the Unpredictable with 95% Accuracy
Yogurt : Predicting the Unpredictable with 95% AccuracyYogurt : Predicting the Unpredictable with 95% Accuracy
Yogurt : Predicting the Unpredictable with 95% Accuracy
 
Opticon 2017 Experimenting with Stats Engine
Opticon 2017 Experimenting with Stats EngineOpticon 2017 Experimenting with Stats Engine
Opticon 2017 Experimenting with Stats Engine
 
Data-Driven UI/UX Design with A/B Testing
Data-Driven UI/UX Design with A/B TestingData-Driven UI/UX Design with A/B Testing
Data-Driven UI/UX Design with A/B Testing
 
Validation and hypothesis based product management by Abdallah Al-Khalidi
Validation and hypothesis based  product management by Abdallah Al-KhalidiValidation and hypothesis based  product management by Abdallah Al-Khalidi
Validation and hypothesis based product management by Abdallah Al-Khalidi
 
Data Science and Goodhart's Law
Data Science and Goodhart's LawData Science and Goodhart's Law
Data Science and Goodhart's Law
 
The Necessity of the Measure Phase with Matt Hansen at StatStuff
The Necessity of the Measure Phase with Matt Hansen at StatStuffThe Necessity of the Measure Phase with Matt Hansen at StatStuff
The Necessity of the Measure Phase with Matt Hansen at StatStuff
 
Testing As A Bottleneck - How Testing Slows Down Modern Development Processes...
Testing As A Bottleneck - How Testing Slows Down Modern Development Processes...Testing As A Bottleneck - How Testing Slows Down Modern Development Processes...
Testing As A Bottleneck - How Testing Slows Down Modern Development Processes...
 
Value added testing (VAT)
Value added testing (VAT)Value added testing (VAT)
Value added testing (VAT)
 
Doing monitoring right
Doing monitoring rightDoing monitoring right
Doing monitoring right
 
Lean Six Sigma
Lean Six SigmaLean Six Sigma
Lean Six Sigma
 
Debugging Intermittent Issues - A How To
Debugging Intermittent Issues - A How ToDebugging Intermittent Issues - A How To
Debugging Intermittent Issues - A How To
 
10NTC - Data Superheroes - DiJulio
10NTC - Data Superheroes - DiJulio10NTC - Data Superheroes - DiJulio
10NTC - Data Superheroes - DiJulio
 
UX STRAT Online 2020: Dr. Martin Tingley, Netflix
UX STRAT Online 2020: Dr. Martin Tingley, NetflixUX STRAT Online 2020: Dr. Martin Tingley, Netflix
UX STRAT Online 2020: Dr. Martin Tingley, Netflix
 
VSSML18. Evaluations
VSSML18. EvaluationsVSSML18. Evaluations
VSSML18. Evaluations
 

Mehr von Optimizely

Clover Rings Up Digital Growth to Drive Experimentation
Clover Rings Up Digital Growth to Drive ExperimentationClover Rings Up Digital Growth to Drive Experimentation
Clover Rings Up Digital Growth to Drive ExperimentationOptimizely
 
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...Optimizely
 
Atlassian's Mystique CLI, Minimizing the Experiment Development Cycle
Atlassian's Mystique CLI, Minimizing the Experiment Development CycleAtlassian's Mystique CLI, Minimizing the Experiment Development Cycle
Atlassian's Mystique CLI, Minimizing the Experiment Development CycleOptimizely
 
Autotrader Case Study: Migrating from Home-Grown Testing to Best-in-Class Too...
Autotrader Case Study: Migrating from Home-Grown Testing to Best-in-Class Too...Autotrader Case Study: Migrating from Home-Grown Testing to Best-in-Class Too...
Autotrader Case Study: Migrating from Home-Grown Testing to Best-in-Class Too...Optimizely
 
Zillow + Optimizely: Building the Bridge to $20 Billion Revenue
Zillow + Optimizely: Building the Bridge to $20 Billion RevenueZillow + Optimizely: Building the Bridge to $20 Billion Revenue
Zillow + Optimizely: Building the Bridge to $20 Billion RevenueOptimizely
 
The Future of Optimizely for Technical Teams
The Future of Optimizely for Technical TeamsThe Future of Optimizely for Technical Teams
The Future of Optimizely for Technical TeamsOptimizely
 
Empowering Agents to Provide Service from Anywhere: Contact Centers in the Ti...
Empowering Agents to Provide Service from Anywhere: Contact Centers in the Ti...Empowering Agents to Provide Service from Anywhere: Contact Centers in the Ti...
Empowering Agents to Provide Service from Anywhere: Contact Centers in the Ti...Optimizely
 
Experimentation Everywhere: Create Exceptional Online Shopping Experiences an...
Experimentation Everywhere: Create Exceptional Online Shopping Experiences an...Experimentation Everywhere: Create Exceptional Online Shopping Experiences an...
Experimentation Everywhere: Create Exceptional Online Shopping Experiences an...Optimizely
 
Building an Experiment Pipeline for GitHub’s New Free Team Offering
Building an Experiment Pipeline for GitHub’s New Free Team OfferingBuilding an Experiment Pipeline for GitHub’s New Free Team Offering
Building an Experiment Pipeline for GitHub’s New Free Team OfferingOptimizely
 
AMC Networks Experiments Faster on the Server Side
AMC Networks Experiments Faster on the Server SideAMC Networks Experiments Faster on the Server Side
AMC Networks Experiments Faster on the Server SideOptimizely
 
Evolving Experimentation from CRO to Product Development
Evolving Experimentation from CRO to Product DevelopmentEvolving Experimentation from CRO to Product Development
Evolving Experimentation from CRO to Product DevelopmentOptimizely
 
Overcoming the Challenges of Experimentation on a Service Oriented Architecture
Overcoming the Challenges of Experimentation on a Service Oriented ArchitectureOvercoming the Challenges of Experimentation on a Service Oriented Architecture
Overcoming the Challenges of Experimentation on a Service Oriented ArchitectureOptimizely
 
How The Zebra Utilized Feature Experiments To Increase Carrier Card Engagemen...
How The Zebra Utilized Feature Experiments To Increase Carrier Card Engagemen...How The Zebra Utilized Feature Experiments To Increase Carrier Card Engagemen...
How The Zebra Utilized Feature Experiments To Increase Carrier Card Engagemen...Optimizely
 
Kick Your Assumptions: How Scholl's Test-Everything Culture Drives Revenue
Kick Your Assumptions: How Scholl's Test-Everything Culture Drives RevenueKick Your Assumptions: How Scholl's Test-Everything Culture Drives Revenue
Kick Your Assumptions: How Scholl's Test-Everything Culture Drives RevenueOptimizely
 
Experimentation through Clients' Eyes
Experimentation through Clients' EyesExperimentation through Clients' Eyes
Experimentation through Clients' EyesOptimizely
 
Shipping to Learn and Accelerate Growth with GitHub
Shipping to Learn and Accelerate Growth with GitHubShipping to Learn and Accelerate Growth with GitHub
Shipping to Learn and Accelerate Growth with GitHubOptimizely
 
Test Everything: TrustRadius Delivers Customer Value with Experimentation
Test Everything: TrustRadius Delivers Customer Value with ExperimentationTest Everything: TrustRadius Delivers Customer Value with Experimentation
Test Everything: TrustRadius Delivers Customer Value with ExperimentationOptimizely
 
Optimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature DeliveryOptimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature DeliveryOptimizely
 
The Future of Software Development
The Future of Software DevelopmentThe Future of Software Development
The Future of Software DevelopmentOptimizely
 
Practical Use Case: How Dosh Uses Feature Experiments To Accelerate Mobile De...
Practical Use Case: How Dosh Uses Feature Experiments To Accelerate Mobile De...Practical Use Case: How Dosh Uses Feature Experiments To Accelerate Mobile De...
Practical Use Case: How Dosh Uses Feature Experiments To Accelerate Mobile De...Optimizely
 

Mehr von Optimizely (20)

Clover Rings Up Digital Growth to Drive Experimentation
Clover Rings Up Digital Growth to Drive ExperimentationClover Rings Up Digital Growth to Drive Experimentation
Clover Rings Up Digital Growth to Drive Experimentation
 
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...
Make Every Touchpoint Count: How to Drive Revenue in an Increasingly Online W...
 
Atlassian's Mystique CLI, Minimizing the Experiment Development Cycle
Atlassian's Mystique CLI, Minimizing the Experiment Development CycleAtlassian's Mystique CLI, Minimizing the Experiment Development Cycle
Atlassian's Mystique CLI, Minimizing the Experiment Development Cycle
 
Autotrader Case Study: Migrating from Home-Grown Testing to Best-in-Class Too...
Autotrader Case Study: Migrating from Home-Grown Testing to Best-in-Class Too...Autotrader Case Study: Migrating from Home-Grown Testing to Best-in-Class Too...
Autotrader Case Study: Migrating from Home-Grown Testing to Best-in-Class Too...
 
Zillow + Optimizely: Building the Bridge to $20 Billion Revenue
Zillow + Optimizely: Building the Bridge to $20 Billion RevenueZillow + Optimizely: Building the Bridge to $20 Billion Revenue
Zillow + Optimizely: Building the Bridge to $20 Billion Revenue
 
The Future of Optimizely for Technical Teams
The Future of Optimizely for Technical TeamsThe Future of Optimizely for Technical Teams
The Future of Optimizely for Technical Teams
 
Empowering Agents to Provide Service from Anywhere: Contact Centers in the Ti...
Empowering Agents to Provide Service from Anywhere: Contact Centers in the Ti...Empowering Agents to Provide Service from Anywhere: Contact Centers in the Ti...
Empowering Agents to Provide Service from Anywhere: Contact Centers in the Ti...
 
Experimentation Everywhere: Create Exceptional Online Shopping Experiences an...
Experimentation Everywhere: Create Exceptional Online Shopping Experiences an...Experimentation Everywhere: Create Exceptional Online Shopping Experiences an...
Experimentation Everywhere: Create Exceptional Online Shopping Experiences an...
 
Building an Experiment Pipeline for GitHub’s New Free Team Offering
Building an Experiment Pipeline for GitHub’s New Free Team OfferingBuilding an Experiment Pipeline for GitHub’s New Free Team Offering
Building an Experiment Pipeline for GitHub’s New Free Team Offering
 
AMC Networks Experiments Faster on the Server Side
AMC Networks Experiments Faster on the Server SideAMC Networks Experiments Faster on the Server Side
AMC Networks Experiments Faster on the Server Side
 
Evolving Experimentation from CRO to Product Development
Evolving Experimentation from CRO to Product DevelopmentEvolving Experimentation from CRO to Product Development
Evolving Experimentation from CRO to Product Development
 
Overcoming the Challenges of Experimentation on a Service Oriented Architecture
Overcoming the Challenges of Experimentation on a Service Oriented ArchitectureOvercoming the Challenges of Experimentation on a Service Oriented Architecture
Overcoming the Challenges of Experimentation on a Service Oriented Architecture
 
How The Zebra Utilized Feature Experiments To Increase Carrier Card Engagemen...
How The Zebra Utilized Feature Experiments To Increase Carrier Card Engagemen...How The Zebra Utilized Feature Experiments To Increase Carrier Card Engagemen...
How The Zebra Utilized Feature Experiments To Increase Carrier Card Engagemen...
 
Kick Your Assumptions: How Scholl's Test-Everything Culture Drives Revenue
Kick Your Assumptions: How Scholl's Test-Everything Culture Drives RevenueKick Your Assumptions: How Scholl's Test-Everything Culture Drives Revenue
Kick Your Assumptions: How Scholl's Test-Everything Culture Drives Revenue
 
Experimentation through Clients' Eyes
Experimentation through Clients' EyesExperimentation through Clients' Eyes
Experimentation through Clients' Eyes
 
Shipping to Learn and Accelerate Growth with GitHub
Shipping to Learn and Accelerate Growth with GitHubShipping to Learn and Accelerate Growth with GitHub
Shipping to Learn and Accelerate Growth with GitHub
 
Test Everything: TrustRadius Delivers Customer Value with Experimentation
Test Everything: TrustRadius Delivers Customer Value with ExperimentationTest Everything: TrustRadius Delivers Customer Value with Experimentation
Test Everything: TrustRadius Delivers Customer Value with Experimentation
 
Optimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature DeliveryOptimizely Agent: Scaling Resilient Feature Delivery
Optimizely Agent: Scaling Resilient Feature Delivery
 
The Future of Software Development
The Future of Software DevelopmentThe Future of Software Development
The Future of Software Development
 
Practical Use Case: How Dosh Uses Feature Experiments To Accelerate Mobile De...
Practical Use Case: How Dosh Uses Feature Experiments To Accelerate Mobile De...Practical Use Case: How Dosh Uses Feature Experiments To Accelerate Mobile De...
Practical Use Case: How Dosh Uses Feature Experiments To Accelerate Mobile De...
 

Tale of Two Tests

  • 2. Tale of Two Tests Jimmy Jin Statistician Mei Luo Strategic Customer Success Manager
  • 3. Experimentation in the Digital Age • You want to run an experiment on the background image on the homepage of your e-commerce clothing site, Attic & Button • As a practitioner, you would want...
  • 4. Results in real time Evaluate impact on multiple KPIs Run experiment with minimal data inputs
  • 6. Review of basic terms • p-value: The probability of observing a result at least as extreme as the one seen if there is no true difference • False positive rate (Type 1 error rate): “How often will the test detect an illusion?” • Power (1 minus Type 2 error rate): “How often will the test detect the real thing?”
  • 7. Steps to doing a t-test 1. Calculate a required sample size for your A/B test (this depends on the minimum detectable effect, MDE) 2. Collect your data 3. Make a decision. Continuing past the prescribed sample size or stopping early is NOT allowed.
  • 9. Activity 1: peek for yourselves External demo!
  • 10. False positive rate inflation in a t-test
  • 11. Why is a t-test sensitive to this?
  • 12. The Stats Engine solution (essentially)
  • 14. Results Dashboard Significance increases as more data is collected
  • 15. t-test comparison on results page data
  • 16. Summary – The Peeking Problem t-test: • Peeking during a t-test increases the chance you’ll find a winning result when none actually exists (a false positive) Stats Engine: • Sequential testing enables evaluation of experiment data as it is collected. Tests can be stopped at any time with valid results.
  • 18. The bane of t-testing
  • 20. Activity 2: the lift is right Guess the lift (and see the consequences) of 2 actual Optimizely experiments.
  • 21. What is the expected lift to subscription conversion rate? Original Variant
  • 22. What is the expected lift to subscription conversion rate? Options: -5%, -2%, 2%, 5%. Observed lift: +35.75%
  • 23. What is the expected lift to add-to-cart rate? Original Variant
  • 24. What is the expected lift to the add-to-cart rate? Options: -10%, -15%, 20%, 2%. Observed lift: +16.46%
  • 25. Why is power limited in fixed-horizon tests?
  • 27. How is the Optimizely calculator different?
  • 28. Summary – The Guessing Problem t-test: • If you set a small MDE, tests will take longer to conclude. If you set a large MDE, you may miss smaller improvements. Stats Engine: • When the true lift exceeds your MDE, you’ll be able to call your test faster.
  • 30. A higher risk of false positives
  • 31. Built-in protections in Stats Engine Ordinarily, corrections for multiple comparisons happen after all tests have concluded. In Optimizely, we perform these corrections in real time so your results are protected no matter when you check your experiment.
  • 32. Activity 3: false positives You conduct an experiment with many variations. Under which scenario would you suspect more false positives? 1. You obtain 5 significant results. 2. You obtain 50 significant results.
  • 33. False positives vs. false discoveries • False positive rate: P(significant | no true effect) • False discovery rate: P(no true effect | significant)
  • 34. FDR corrections in real time
  • 36. Example: FDR tiering in an actual experiment Let’s walk through an actual experiment!
  • 37. Summary – The Multiple Comparisons Problem t-test: • Traditional statistics control the false positive rate, which is not the same as the probability of making an incorrect business decision Stats Engine: • Stats Engine controls the false discovery rate; as you add more metrics to your experiment, Optimizely will become more conservative in calling a winner or loser
  • 39. 3 Takeaways Stats Engine allows you to... • Monitor results in real time for faster experimentation, without increased error rates • Run fully powered experiments without guessing at sample size calculations • Evaluate impact on many metrics without sacrificing accuracy
  • 40. Q&A
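
Worked example for slides 7 and 28 (sample size and MDE). The sketch below is not from the deck: it uses the textbook normal-approximation formula for comparing two conversion rates, and the function name, the 10% baseline rate, the 5% significance level and the 80% power are all assumptions chosen for illustration. It is meant only to show why a smaller MDE means a longer test.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation for a fixed-horizon test.

    Textbook two-proportion formula (normal approximation), two-sided test.
    All parameter names and defaults are illustrative assumptions.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)       # rate if the MDE is exactly met
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    for mde in (0.20, 0.10, 0.05):
        n = sample_size_per_arm(baseline_rate=0.10, relative_mde=mde)
        print(f"relative MDE {mde:.0%}: about {n} visitors per variation")
```

Under these assumptions, halving the MDE roughly quadruples the required sample size, which is the trade-off described in the Guessing Problem summary.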
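
Worked example for slides 10 and 16 (the peeking problem). This is a small simulation sketch, not the external demo from the talk. It runs A/A experiments (no true difference) and compares two decision rules: test once at the planned sample size, or declare a winner the first time any interim look is significant. A two-proportion z-test stands in for the t-test here, and every name and parameter value (400 experiments, 1,000 visitors per arm, a look every 50 visitors, a 10% conversion rate) is an assumption.

```python
import math
import random
from statistics import NormalDist


def two_sided_p_value(conv_a, conv_b, n):
    """Two-proportion z-test p-value with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n - conv_a / n) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_peeking(num_experiments=400, visitors_per_arm=1000,
                     peek_every=50, rate=0.10, alpha=0.05, seed=0):
    """Return (fixed_horizon_fpr, peeking_fpr) over simulated A/A tests."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(num_experiments):
        conv_a = conv_b = 0
        significant_at_some_look = False
        for n in range(1, visitors_per_arm + 1):
            # Both arms share the same true rate, so any "winner" is false.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % peek_every == 0 and two_sided_p_value(conv_a, conv_b, n) < alpha:
                significant_at_some_look = True
        final_significant = two_sided_p_value(conv_a, conv_b, visitors_per_arm) < alpha
        fixed_hits += final_significant
        peeking_hits += significant_at_some_look or final_significant
    return fixed_hits / num_experiments, peeking_hits / num_experiments


if __name__ == "__main__":
    fixed, peeking = simulate_peeking()
    print(f"false positive rate, single look at the end: {fixed:.3f}")
    print(f"false positive rate, stop at first significant look: {peeking:.3f}")
```

The single-look rate should stay near the nominal 5%, while the stop-at-first-significant-look rate comes out well above it. That gap is the inflation the Peeking Problem summary warns about.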
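
Worked example for slides 32 to 37 (false positives vs. false discoveries). The sketch below first puts rough numbers on Activity 3 under the assumption, not stated on the slide, that 100 comparisons were run at a 5% significance level. It then shows the standard Benjamini-Hochberg procedure as one common way to control the false discovery rate. It illustrates FDR control in general and is not Optimizely's real-time implementation; the function names and the example p-values are made up for illustration.

```python
def worst_case_false_discovery_share(num_tests, num_significant, alpha=0.05):
    """Rough upper bound on the share of significant results that are false.

    If every null were true, about alpha * num_tests results would be
    significant by chance alone; compare that with what was observed.
    """
    expected_false_positives = alpha * num_tests
    return min(1.0, expected_false_positives / num_significant)


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard Benjamini-Hochberg step-up procedure at FDR level q.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value sits under its threshold.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Activity 3, assuming 100 comparisons at alpha = 0.05.
    for hits in (5, 50):
        share = worst_case_false_discovery_share(100, hits)
        print(f"{hits} significant results: up to {share:.0%} could be false")

    # Made-up p-values for ten metrics.
    p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
                0.060, 0.074, 0.205, 0.212, 0.216]
    naive = sum(p < 0.05 for p in p_values)
    corrected = sum(benjamini_hochberg(p_values, q=0.05))
    print(f"significant at p < 0.05: {naive}; after Benjamini-Hochberg: {corrected}")
```

With the assumed 100 comparisons, 5 significant results are about what chance alone would produce, so that is the scenario to be more suspicious of. In the made-up p-value example the correction keeps fewer winners than the naive threshold, which matches the slide's point that the procedure becomes more conservative as metrics are added.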