6. Hypothesis testing
1. State the null and alternative hypotheses clearly, and choose a one-tailed or two-tailed test (e.g. a one-tailed test).
2. Determine the test size (significance level), e.g. test size α = 0.05, which gives a one-tailed critical value of 1.645.
3. Decision-making: reject or do not reject the null hypothesis, e.g. test statistic = 2.25, p-value = 0.02 …
4. Draw a conclusion and interpret it substantively.
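Steps 1 to 3 can be sketched for a one-tailed two-proportion z-test. This is a minimal sketch, assuming a conversion-rate metric and the pooled normal approximation. The function names and the counts in the usage line are hypothetical, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def critical_value(alpha: float) -> float:
    """One-tailed critical value z_(1-alpha); about 1.645 for alpha = 0.05 (step 2)."""
    return STD_NORMAL.inv_cdf(1 - alpha)


def one_tailed_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int, alpha: float = 0.05):
    """Hypothetical helper. Step 1: H0: p_B <= p_A versus H1: p_B > p_A (one-tailed).

    Pooled two-proportion z-test; returns (z statistic, p-value, reject_h0).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)               # conversion rate assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se                                   # test statistic
    p_value = 1 - STD_NORMAL.cdf(z)                        # upper-tail probability
    reject = z > critical_value(alpha)                     # step 3: equivalent to p_value < alpha
    return z, p_value, reject


# Hypothetical counts: 1,000 of 10,000 users convert in A, 1,100 of 10,000 in B.
z, p_value, reject = one_tailed_z_test(1000, 10000, 1100, 10000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Step 4 stays with the analyst: a rejection only says the data are unlikely under H0; whether the size of the lift matters for the product is a substantive judgment.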
7. Statistical Power
• Type I error (α): the probability of rejecting the null hypothesis when it is true
• Type II error (β): the probability of failing to reject (i.e. accepting) a false null hypothesis
• Power of a test (1 - β): the probability that the test correctly leads to the rejection of a false null hypothesis
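The three quantities can be computed for a one-sided z-test of a mean with known standard deviation. This is a minimal sketch under that textbook assumption; `error_rates_one_sided_z` and its arguments are hypothetical names. With no true effect (delta = 0) the rejection probability collapses to α, which is a quick sanity check on the definitions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def error_rates_one_sided_z(delta: float, sigma: float, n: int, alpha: float = 0.05):
    """Hypothetical helper for H0: mu = mu0 vs H1: mu = mu0 + delta (delta > 0),
    known sigma, n observations. Returns (alpha, beta, power)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)   # reject H0 when Z > z_crit, so P(type I error) = alpha
    shift = delta * n ** 0.5 / sigma         # mean of the Z statistic when H1 is true
    beta = STD_NORMAL.cdf(z_crit - shift)    # P(type II error): Z stays below z_crit under H1
    return alpha, beta, 1 - beta             # power = 1 - beta


# Hypothetical effect of 0.2 standard deviations at three sample sizes.
for n in (25, 100, 400):
    a, b, power = error_rates_one_sided_z(delta=0.2, sigma=1.0, n=n)
    print(f"n={n}: alpha={a}, beta={b:.3f}, power={power:.3f}")
```

Power grows with n for a fixed effect, which is what the sample-size rules that follow exploit.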
9. Determining sample size
• The required sample size is found at the point where the upper critical value for α on the null curve and the value for β on the alternative curve meet
• 80% power, 95% confidence level (Lehr's equation)
• Assumes that the sampling distribution of the mean is normal
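Setting the α cut-off on the null curve equal to the β point on the alternative curve gives the usual normal-approximation sample size per group, n = 2(z_(1-α/2) + z_(1-β))² σ² / Δ², for a two-sided comparison of two means with common standard deviation σ and minimum detectable difference Δ. With α = 0.05 and 80% power, 2(1.96 + 0.84)² ≈ 15.7, which Lehr's equation rounds to 16, giving n ≈ 16σ²/Δ². The sketch below compares the two; the function names and the metric values are hypothetical.

```python
import math
from statistics import NormalDist


def normal_n_per_group(sigma: float, delta: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """n per group where the alpha cut-off on the null curve meets the beta point
    on the alternative curve (two-sided test, two groups, common sigma)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


def lehr_n_per_group(sigma: float, delta: float) -> int:
    """Lehr's rule of thumb (80% power, two-sided alpha = 0.05): n ~= 16 sigma^2 / delta^2."""
    return math.ceil(16 * sigma ** 2 / delta ** 2)


# Hypothetical metric: standard deviation 1.0, smallest difference worth detecting 0.5.
print(normal_n_per_group(1.0, 0.5), lehr_n_per_group(1.0, 0.5))  # prints: 63 64
```

Lehr's constant of 16 is slightly conservative, which is the point of a rule of thumb: it is easy to remember and never undershoots the normal-approximation answer at these settings.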
10. Determining sample size
• Formula 2
– When |skewness| > 1, use at least 355 × s² users for each variant, where s is the skewness of the metric
– so that the distribution of the mean is close to normal
– Skewness: a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. [from Wikipedia]
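Formula 2 can be checked directly from data. This is a minimal sketch, assuming the moment estimator of skewness (g1 = m3 / m2^1.5) and reading s as the skewness of the per-user metric; `min_users_per_variant` is a hypothetical helper and the sample values are illustrative only.

```python
import math


def sample_skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5


def min_users_per_variant(xs):
    """355 * s**2 rule; returns None when |s| <= 1, where the slide does not apply it."""
    s = sample_skewness(xs)
    if abs(s) <= 1:
        return None
    return math.ceil(355 * s ** 2)


# Illustrative, heavily right-skewed metric: most users spend 0, a few spend a lot.
spend = [0] * 9 + [10]
print(round(sample_skewness(spend), 3), min_users_per_variant(spend))  # prints: 2.667 2525
```

Because the requirement grows with s², a metric dominated by a few large values (revenue is the typical case) can need far more users per variant than a symmetric one.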
11. Rules - Small Changes Can Have a Big Impact on Key Metrics
Session success rate improved and time-to-success improved: +$10M annually
This kind of success is rare
13. Rules - Reducing Abandonment Is Hard, Shifting Clicks Is Easy
• Local improvements are easy
• Global improvements are much harder
• Successes:
– significant improvements to relevance
– the anti-malware flight
14. More Tips
• A/A test
• Primacy and newness effects
• Robots
• Long-term goals
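An A/A test splits traffic between two identical variants, so any "significant" difference is a false positive, and over many runs the rejection rate should sit near α. This is a minimal simulation sketch, assuming normally distributed per-user values with known σ = 1 and a two-sided z-test; the helper name, sample sizes and seed are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def aa_false_positive_rate(runs: int = 2000, n: int = 200, alpha: float = 0.05, seed: int = 7) -> float:
    """Fraction of simulated A/A tests that reject H0 with a two-sided z-test (sigma = 1 known)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(2 / n)                                   # std. error of the difference in means
    rejections = 0
    for _ in range(runs):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]    # both arms drawn from the same distribution
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (fmean(b) - fmean(a)) / se
        rejections += abs(z) > z_crit                  # True counts as 1
    return rejections / runs


print(aa_false_positive_rate())  # should land close to alpha = 0.05
```

In a real A/A test, a rejection rate well above α points to a problem in the experiment setup rather than a true effect.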
15. Beyond A/B Testing
• Overlapping Experiment Infrastructure: More, Better, Faster
16. References
• [1] Jesse Farmer. Statistical Analysis and A/B Testing.
• [2] Ron Kohavi. Controlled Experiments on the Web: Survey and Practical Guide.
• [3] Ron Kohavi. Seven Rules of Thumb for Web Site Experimenters. KDD 2014.
• [4] Diane Tang. Overlapping Experiment Infrastructure: More, Better, Faster Experimentation. KDD 2010.
• [5] Charles DiMaggio. Power Tools for Epidemiologists. 2014.
• [6] Gerald van Belle. Statistical Rules of Thumb.