An Introduction to A-B Testing
Data Mining Group, Wang Ben (王犇, garfieldwang)
2014-10
Controlled experiment
example
example 
•Random variable
•Null hypothesis
•Z-score approximation
example
Hypothesis testing 
1. State the null and alternative hypotheses clearly (one-tailed or two-tailed test), e.g. a one-tailed test.
2. Determine the test size (significance level), e.g. test size α = 0.05, critical value = 1.645.
3. Decision-making: reject or do not reject the null hypothesis, e.g. test statistic = 2.25, p-value = 0.02 …
4. Draw a conclusion and interpret it substantively.
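The four steps above can be sketched in Python for a conversion-rate A-B test. This is a minimal sketch, assuming a pooled two-proportion z-test and a one-tailed alternative (variant B converts better than A); the function name and the conversion counts are hypothetical. For reference, under a standard normal null distribution a test statistic of 2.25 corresponds to a one-tailed p-value of about 0.012 (about 0.024 two-tailed).

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int, alpha: float = 0.05):
    """Pooled two-proportion z-test of H0: p_b <= p_a against H1: p_b > p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Step 1: under H0 both variants share one conversion rate, estimated by pooling.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    # Step 2: the test size alpha fixes the one-tailed critical value (1.645 for 0.05).
    critical = NormalDist().inv_cdf(1 - alpha)
    # Step 3: compute the test statistic and its one-tailed p-value, then decide.
    z = (p_b - p_a) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value, critical, z > critical


# Hypothetical counts: 200 of 10,000 users convert in A, 250 of 10,000 in B.
z, p, crit, reject = one_tailed_z_test(200, 10_000, 250, 10_000)
print(f"z={z:.2f}  p={p:.4f}  critical={crit:.3f}  reject H0: {reject}")
```

Step 4 stays a human judgment: a significant z says the observed difference is unlikely under the null hypothesis, not that it is large or worth shipping.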
Statistical Power
•Type I error (α): the probability of rejecting the null hypothesis when it is true
•Type II error (β): the probability of accepting (failing to reject) a false null hypothesis
•Power of a test (1-β): the probability that the test correctly rejects a false null hypothesis
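To make these definitions concrete, the sketch below approximates the power of a one-tailed two-proportion z-test under a normal approximation with unpooled standard errors. It is an illustration under those assumptions; the function name and the example rates are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_a: float, p_b: float, n_per_variant: int, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a one-tailed test of H1: p_b > p_a."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # reject H0 when the statistic exceeds this
    se = sqrt(p_a * (1 - p_a) / n_per_variant + p_b * (1 - p_b) / n_per_variant)
    shift = (p_b - p_a) / se  # mean of the test statistic under the alternative
    # Power: probability that the shifted statistic lands beyond the critical value.
    return 1 - std_normal.cdf(z_crit - shift)


# With no true difference, "power" collapses to the Type I error rate alpha.
print(round(approx_power(0.10, 0.10, 1_000), 3))
# For a fixed true lift (hypothetical 10% -> 11%), power grows with sample size.
for n in (1_000, 5_000, 20_000):
    print(n, round(approx_power(0.10, 0.11, n), 3))
```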
Determining sample size 
•Formula 1
Determining sample size
•The sample size is set by the point where the upper critical value of α on the null curve meets the value for β on the alternative curve
•80% power, 95% confidence level (Lehr's equation)
•Assumes that the distribution of the mean is normal
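Formula 1 itself is not preserved in this copy of the slides. As a hedged illustration, the standard normal-approximation sample size per variant for a two-sided test is n = 2(z_{1-α/2} + z_{1-β})² σ²/Δ², where σ is the metric's standard deviation and Δ is the smallest difference worth detecting. With α = 0.05 and 80% power the constant 2(1.96 + 0.84)² ≈ 15.7 is rounded up to 16, which gives Lehr's rule n ≈ 16σ²/Δ². The function names and example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_variant(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per variant for a two-sided test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)  # about 0.84 for 80% power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


def lehr_n_per_variant(sigma: float, delta: float) -> int:
    """Lehr's rule of thumb: the constant 2 * (1.96 + 0.84)**2 ~= 15.7 rounded up to 16."""
    return ceil(16 * sigma ** 2 / delta ** 2)


# Hypothetical metric with standard deviation 1.0; detect a shift of 0.5.
print(n_per_variant(1.0, 0.5), lehr_n_per_variant(1.0, 0.5))  # → 63 64
```

Lehr's version is slightly conservative because it rounds the constant up, which is convenient for back-of-the-envelope planning.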
Determining sample size 
•Formula 2
–When |skewness| > 1, use at least 355 × S² users for each variant, where S is the skewness of the metric
–This keeps the distribution of the sample mean close to normal
–Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. [from Wikipedia]
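A minimal sketch of Formula 2, assuming S is the Fisher-Pearson moment coefficient of skewness computed from per-user metric values. The function names and the revenue data are hypothetical.

```python
from math import ceil
from typing import Optional, Sequence


def skewness(values: Sequence[float]) -> float:
    """Fisher-Pearson moment coefficient of skewness, m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def min_users_per_variant(values: Sequence[float]) -> Optional[int]:
    """Rule of thumb 355 * S**2 users per variant, applied only when |S| > 1."""
    s = skewness(values)
    return ceil(355 * s ** 2) if abs(s) > 1 else None


# Hypothetical per-user revenue: 95 users spend nothing, 5 users spend 50.
revenue = [0.0] * 95 + [50.0] * 5
print(round(skewness(revenue), 2), min_users_per_variant(revenue))  # → 4.13 6054
```

Revenue-like metrics, where most users contribute zero and a few contribute a lot, are the typical case where this rule pushes the required sample size up.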
Rules - Small Changes Can Have a Big Impact on Key Metrics
Session success rate improved, time-to-success improved, +$10M annually
This kind of success is rare
Rules - Speed Matters a LOT
•At Bing, every 100 ms speedup improves revenue by 0.6% [3]
Rules - Reducing Abandonment Is Hard, Shifting Clicks Is Easy
•Local improvements are easy
•Global improvements are much harder
•Successes:
–significant improvements to relevance
–the anti-malware flight
More Tips 
•A-A test 
•Primacy & newness effects 
•Robots 
•Long-term goals
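An A-A test splits traffic between two identical variants, so every "significant" result is a false positive. The sketch below simulates many A-A tests on a Bernoulli conversion metric and checks that a two-sided pooled z-test at α = 0.05 flags roughly 5% of them. The conversion rate, sample sizes, seed and function name are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def aa_false_positive_rate(runs: int = 1_000, users: int = 500, rate: float = 0.1,
                           alpha: float = 0.05, seed: int = 7) -> float:
    """Fraction of simulated A-A tests declared significant by a two-sided z-test."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(runs):
        # Both arms draw from the same conversion rate, so H0 is true by construction.
        conv_a = sum(rng.random() < rate for _ in range(users))
        conv_b = sum(rng.random() < rate for _ in range(users))
        p_pool = (conv_a + conv_b) / (2 * users)
        se = sqrt(p_pool * (1 - p_pool) * 2 / users)
        if se > 0 and abs(conv_b - conv_a) / users / se > z_crit:
            false_positives += 1
    return false_positives / runs


print(aa_false_positive_rate())
```

In a real system the same check runs on live traffic: a false-positive rate far from α points to problems in assignment, logging or the analysis itself.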
Beyond A-B Testing
•Overlapping Experiment Infrastructure: More, Better, Faster
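Overlapping experiment infrastructure [4] lets one user take part in several experiments at once by grouping parameters into layers and dividing traffic independently in each layer. A minimal sketch of that idea, assuming assignment by hashing the user id salted with the layer name; this is an illustration, not the implementation described in [4], and all layer names, experiment names and bucket ranges are hypothetical.

```python
import hashlib
from typing import Optional, Sequence, Tuple


def bucket(user_id: str, layer: str, num_buckets: int = 1000) -> int:
    """Deterministic bucket in [0, num_buckets); the layer name salts the hash."""
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def assign(user_id: str, layer: str,
           experiments: Sequence[Tuple[str, int, int]]) -> Optional[str]:
    """Return the experiment whose [start, end) bucket range holds the user, if any.

    Ranges within one layer must not overlap, so a user is in at most one
    experiment per layer but can be in one experiment in every layer.
    """
    b = bucket(user_id, layer)
    for name, start, end in experiments:
        if start <= b < end:
            return name
    return None


# Hypothetical layers: UI experiments and ranking experiments overlap freely.
ui_layer = [("blue_button", 0, 100), ("big_font", 100, 200)]
ranking_layer = [("new_ranker", 0, 500)]
for uid in ("user-1", "user-2", "user-3"):
    print(uid, assign(uid, "ui", ui_layer), assign(uid, "ranking", ranking_layer))
```

Salting the hash with the layer name is what keeps assignments in different layers approximately independent; hashing the user id alone would put the same users in the low buckets of every layer.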
References
•[1] Jesse Farmer. Statistical Analysis and A/B Testing.
•[2] Ron Kohavi. Controlled Experiments on the Web: Survey and Practical Guide.
•[3] Ron Kohavi. Seven Rules of Thumb for Web Site Experimenters. KDD 2014.
•[4] Diane Tang. Overlapping Experiment Infrastructure: More, Better, Faster Experimentation. KDD 2010.
•[5] Charles DiMaggio. Power Tools for Epidemiologists. 2014.
•[6] Gerald van Belle. Statistical Rules of Thumb.
RTX: garfieldwang; mail: benwang177@gmail.com
Thanks