Talks@Coursera - A/B Testing @ Internet Scale

Talks@Coursera

This tech talk will describe how to build an experiment platform that can handle large-scale experiments. The talk will also discuss several best practices in designing and analyzing online experiments learned from companies like Coursera, Microsoft and LinkedIn.

About the Speakers

Ya Xu has been working in the domain of online A/B testing for over 4 years. She currently leads a team of engineers and data scientists building a world-class online A/B testing platform at LinkedIn. She also spearheads taking LinkedIn's A/B testing culture to the next level by evangelizing best practices and pushing for broad-based platform adoption. She holds a Ph.D. in Statistics from Stanford University.

Chuong (Tom) Do currently leads a team of data engineers and analysts in the Analytics team at Coursera, which is responsible for data infrastructure and quantitative analysis in support of the product and business. He completed his Ph.D. in Computer Science at Stanford University in 2009 and worked as a scientist at the personal genetics company 23andMe until 2012. His research has collectively spanned the fields of machine learning, computational biology, and statistical genetics.

Published in: Engineering

  1. A/B Testing @ Internet Scale, Ya Xu, 8/12/2014 @ Coursera
  2. A/B Testing in One Slide: traffic is split 20%/80% between Control and Treatment (variants of a "Join now" call to action); collect results to determine which one is better.
  3. Outline: culture challenge (why A/B testing, what to A/B test); building a scalable experimentation system; best practices.
  4. Why A/B Testing
  5. Amazon Shopping Cart Recommendation: At Amazon, Greg Linden had the idea of showing recommendations based on the items in the shopping cart. Trade-offs: pro, cross-sell more items (increase average basket size); con, distract people from checking out (reduce conversion). HiPPO (Highest Paid Person's Opinion): stop the project. Sources: Greg Linden's blog, http://glinden.blogspot.com/2006/04/early-amazon-shopping-cart.html; http://www.exp-platform.com/Documents/2012-08%20Puzzling%20Outcomes%20KDD.pptx
  6. MSN Real Estate: variations A and B of a "Find a house" widget. MSN earns revenue every time a user clicks the search/find button. Source: http://www.exp-platform.com/Documents/2012-08%20Puzzling%20Outcomes%20KDD.pptx
  7. Take-away: Experiments are the only way to prove causality. Use A/B testing to guide product development, measure impact (assess ROI), and gain "real" customer feedback.
  8. What to A/B Test
  9. Ads CTR Drop: a sudden drop in click-through rate on the profile top ads on 11/11/2013.
  10. Root Cause: 5 pixels! A 5-pixel change in the navigation bar above the profile top ads.
  11. What to A/B Test: evaluating new ideas (visual changes, complete redesign of a web page, relevance algorithms, …); platform changes; code refactoring; bug fixes. Test everything!
  12. Startups vs. Big Websites: Do startups have enough users to A/B test? Startups typically look for larger effects, and detecting a 0.5% difference instead of a 5% difference takes 100 times more users, because the required sample size grows with 1/δ² for a difference δ. Startups should establish an A/B testing culture early.
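The "100 times more users" figure follows from the standard sample-size approximation for a two-sample z-test. Users needed per arm scale with 1/δ², so a tenfold smaller detectable difference needs 10² = 100 times the sample. Below is a minimal Python sketch. It assumes a conversion-rate metric with a hypothetical 10% baseline, a two-sided test at α = 0.05, and 80% power. The function name `users_per_arm` is illustrative.

```python
import math
from statistics import NormalDist


def users_per_arm(baseline_rate: float, relative_lift: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in EACH arm to detect a relative lift in a
    conversion rate with a two-sided two-sample z-test (normal approximation,
    equal-sized arms, same Bernoulli variance assumed in both arms)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # critical value, about 1.96
    z_power = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    delta = baseline_rate * relative_lift         # absolute difference to detect
    variance = baseline_rate * (1 - baseline_rate)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * variance / delta ** 2)


# Hypothetical 10% baseline conversion rate.
startup_effect = users_per_arm(0.10, 0.05)    # 5% relative lift
big_site_effect = users_per_arm(0.10, 0.005)  # 0.5% relative lift
print(startup_effect, big_site_effect, big_site_effect / startup_effect)
```

The printed ratio is roughly 100. The baseline rate cancels out of that ratio (up to rounding), so the 100x factor holds for any baseline.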
  13. A Scalable Experimentation System
  14. A/B Testing in 3 Steps: Design (what/whom to experiment on); Deploy (code deployment); Analyze (impact on metrics).
  15. A/B Testing Platform Architecture: (1) experiment management; (2) online infrastructure; (3) offline analysis. Example: Bing A/B.
  16. (1) Experiment Management: Define experiments (whom to target? how to split traffic?); start/stop an experiment. Important additions: define success criteria; power analysis.
  17. (2) Online Infrastructure: 1) hash & partition: random & consistent; 2) deploy: server-side, as a change to the default configuration (Bing) or to the default code path (LinkedIn); 3) data logging. Diagram: Hash(ID) maps each member onto a 0% to 100% range partitioned into Treatment1, Treatment2 (20% slices marked), and Control.
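The hash-and-partition step can be sketched as follows. Hashing a member ID, instead of storing a random draw, gives the two properties the slide asks for. Assignment is random across members and consistent for a given member on every request. This sketch assumes string member IDs, SHA-256 as the hash, 100 buckets, and an experiment-specific `hash_id` salt. All names (`bucket`, `assign`, `"exp-42"`) and the 20/20/60 allocation are hypothetical, not LinkedIn's or Bing's actual implementation.

```python
import hashlib
from collections import Counter
from typing import List, Optional, Tuple

NUM_BUCKETS = 100  # 1% granularity, chosen for illustration only


def bucket(member_id: str, hash_id: str) -> int:
    """Map (hash_id, member_id) to a bucket in [0, NUM_BUCKETS).

    Consistent: the same inputs always produce the same bucket.
    Random: SHA-256 output is effectively uniform, so members spread evenly.
    """
    digest = hashlib.sha256(f"{hash_id}:{member_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def assign(member_id: str, hash_id: str,
           allocation: List[Tuple[str, int]]) -> Optional[str]:
    """Walk the (variant, percent) slices in order; percents must sum to <= 100.
    Members whose bucket falls past the last slice are not in the experiment."""
    b = bucket(member_id, hash_id)
    upper = 0
    for variant, percent in allocation:
        upper += percent
        if b < upper:
            return variant
    return None


# Hypothetical split: two 20% treatments, control gets the remaining 60%.
allocation = [("treatment1", 20), ("treatment2", 20), ("control", 60)]
counts = Counter(assign(str(i), "exp-42", allocation) for i in range(20_000))
print(counts)
```

Changing `hash_id` reshuffles members independently of any other experiment. Ramping a treatment from 20% to 50% only widens its slice, so members already in treatment stay there.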
  18. Hash & Partition @ Scale (I): Pure bucket system (Google/Bing before 200X). Diagram: the 0% to 100% traffic range is divided among Exp. 1 and Exp. 2 (20% each) and Exp. 3 (60%, split into red, green, and yellow variants at 15%, 15%, and 30%). Drawbacks: does not scale; traffic management.
  19. Hash & Partition @ Scale (II): Fully overlapping system. Diagram: Exp. 1 (control, A1, B1) and Exp. 2 (control, A2, B2) each span the full 0% to 100% range. Each experiment gets 100% of traffic; a user is in "all" experiments simultaneously; randomization between experiments is independent (unique hashID per experiment); interactions between experiments cannot be avoided.
  20. Hash & Partition @ Scale (III): Hybrid of layers and domains. Centralized management (Bing): a central experimentation team creates and manages layers/domains. Decentralized management (LinkedIn): each experiment is its own "layer" by default, and the experimenter controls the hashID to create a "domain".
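The following is a minimal sketch of the decentralized scheme described above. It assumes that experiments which must never overlap share a hashID and take disjoint bucket ranges, forming a "domain". An experiment with its own hashID overlaps everything else independently, as in the fully overlapping system of slide 19. The experiment names, hashIDs, and bucket ranges are hypothetical. Treatment/control splits within each experiment are omitted for brevity.

```python
import hashlib
from typing import Set

NUM_BUCKETS = 100


def bucket(member_id: str, hash_id: str) -> int:
    """Deterministic bucket in [0, NUM_BUCKETS) for a member under a given hashID."""
    digest = hashlib.sha256(f"{hash_id}:{member_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


# Hypothetical experiments: (name, hash_id, first_bucket, end_bucket_exclusive).
EXPERIMENTS = [
    # exp_a and exp_b share hashID "domain-1" and use disjoint bucket ranges,
    # so they form a "domain": no member can be in both.
    ("exp_a", "domain-1", 0, 50),
    ("exp_b", "domain-1", 50, 100),
    # exp_c keeps its own hashID (the default "one layer per experiment"),
    # so its randomization is independent of, and overlaps with, exp_a/exp_b.
    ("exp_c", "exp_c", 0, 100),
]


def experiments_for(member_id: str) -> Set[str]:
    """Names of all experiments whose bucket range contains this member."""
    return {
        name
        for name, hash_id, start, end in EXPERIMENTS
        if start <= bucket(member_id, hash_id) < end
    }


print(experiments_for("member-1"))
```

Exclusivity costs traffic: exp_a and exp_b each see only half the members. Independent layers avoid that cost but let experiments interact, which is the trade-off the hybrid design balances.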
  21. Data Logging: triggers and trigger-based logging. Log whether a request is actually affected by the experiment, for both the factual and the counterfactual. Diagram: of all LinkedIn members (300MM+), only members visiting the contacts page are triggered.
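Here is a hypothetical request handler that makes trigger-based logging concrete. The trigger is logged at the point where the experiment could change what the member sees, for treatment and control alike. This sketch assumes a contacts-page experiment, a 50/50 hash split, and an in-memory list standing in for the real logging pipeline. None of these names come from the talk.

```python
import hashlib
from dataclasses import dataclass
from typing import List

EXPERIMENT = "contacts-page-redesign"  # hypothetical experiment name


@dataclass
class TriggerEvent:
    member_id: str
    experiment: str
    variant: str


def variant_for(member_id: str, experiment: str) -> str:
    """Hypothetical 50/50 split based on the first byte of a SHA-256 hash."""
    first_byte = hashlib.sha256(f"{experiment}:{member_id}".encode("utf-8")).digest()[0]
    return "treatment" if first_byte < 128 else "control"


def render_contacts_page(member_id: str, log: List[TriggerEvent]) -> str:
    """Handler for the only page the experiment changes.

    The trigger is logged before branching and for BOTH variants: treatment
    members see the change (factual), while control members are logged at the
    exact point where they would have seen it (counterfactual).
    """
    variant = variant_for(member_id, EXPERIMENT)
    log.append(TriggerEvent(member_id, EXPERIMENT, variant))
    return "new layout" if variant == "treatment" else "old layout"


trigger_log: List[TriggerEvent] = []
pages = [render_contacts_page(f"member-{i}", trigger_log) for i in range(1000)]
# Members who never visit the contacts page produce no events, so the analysis
# can be restricted to the triggered population instead of all members.
print(len(trigger_log), sum(e.variant == "treatment" for e in trigger_log))
```

Restricting analysis to triggered members keeps the measured effect from being diluted by the many members who never reach the changed page.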
  22. (3) Automated Offline Analysis: Large-scale data processing, e.g. daily at LinkedIn: 200+ experiments, 700+ metrics, billions of experiment trigger events. Statistical analysis: metrics design; statistical significance tests (p-value, confidence interval); deep-dive slicing & dicing capability. Monitoring & alerting: data quality; early termination.
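The significance-test step can be sketched as a large-sample two-sided z-test on the difference in means. Here it is computed from per-variant aggregates (count, sum, sum of squares) that a daily batch job could produce and merge across days. This is a minimal sketch: the `Summary` type, the merge step, and the toy data are assumptions for illustration, not a description of LinkedIn's pipeline.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist
from typing import Iterable, Tuple


@dataclass
class Summary:
    """Per-variant aggregates a daily batch job could emit and later add together."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, other: "Summary") -> "Summary":
        return Summary(self.n + other.n, self.total + other.total,
                       self.total_sq + other.total_sq)

    def mean(self) -> float:
        return self.total / self.n

    def variance(self) -> float:
        # Sample variance from sums; fine for illustration, but production code
        # would use a numerically stable formulation for very large n.
        return (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)


def summarize(values: Iterable[float]) -> Summary:
    values = list(values)
    return Summary(len(values), sum(values), sum(v * v for v in values))


def z_test(treatment: Summary, control: Summary,
           alpha: float = 0.05) -> Tuple[float, float, Tuple[float, float]]:
    """Two-sided large-sample z-test on the difference in means.
    Returns (delta, p_value, confidence_interval)."""
    delta = treatment.mean() - control.mean()
    se = math.sqrt(treatment.variance() / treatment.n + control.variance() / control.n)
    std_normal = NormalDist()
    p_value = 2 * (1 - std_normal.cdf(abs(delta) / se))
    half_width = std_normal.inv_cdf(1 - alpha / 2) * se
    return delta, p_value, (delta - half_width, delta + half_width)


# Hypothetical conversions (1 = converted) split across two days of data.
treatment = summarize([1] * 30 + [0] * 20).add(summarize([1] * 30 + [0] * 20))
control = summarize([1] * 25 + [0] * 25).add(summarize([1] * 25 + [0] * 25))
print(z_test(treatment, control))
```

With 60% vs. 50% conversion on only 100 members per arm, the 0.1 delta is not significant: the confidence interval contains zero. This illustrates that a large-looking delta can still be noise.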
  23. Best Practices
  24. Example: Unified Search
  25. What to Experiment? Measure one change at a time. Diagram: Unified Search experiments 1 + 2 + … + N on 50% of En-US traffic vs. pre-unified search on the other 50% of En-US.
  26. What to Measure? Success metrics summarize whether the treatment is better. Puzzling example: the key metrics for Bing are number of searches and revenue. A ranking bug in an experiment produced poor search results, yet the number of searches went up 10% and revenue went up 30%. Success metrics should reflect long-term impact.
  27. Scientific Experiment Design: How long should the experiment run? How much traffic should be allocated to treatment? Story: site speed matters. Bing: +100 ms = -0.6% revenue; Amazon: +100 ms = -1.0% revenue; Google: +100 ms = -0.2% queries. But not for Etsy.com? "Faster results better? … meh"
  28. Power: the chance of detecting a difference when there really is one. Two reasons your feature doesn't move metrics: (1) no "real" impact; (2) not enough power. Properly power up your experiment!
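Power analysis also answers slide 27's question of how long to run an experiment. Compute power for the traffic you will accumulate and run until it is high enough. Below is a minimal sketch for a two-sided two-sample z-test on a conversion rate. It assumes a hypothetical 10% baseline, a 2% relative lift, and 20,000 distinct new members per arm per day (no repeat visitors). The function name `power` is illustrative.

```python
import math
from statistics import NormalDist


def power(n_per_arm: int, baseline_rate: float, relative_lift: float,
          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test on a conversion rate.

    Normal approximation with equal arms; the tiny probability of rejecting in
    the wrong direction is ignored.
    """
    std_normal = NormalDist()
    delta = baseline_rate * relative_lift
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return std_normal.cdf(abs(delta) / se - std_normal.inv_cdf(1 - alpha / 2))


# Hypothetical: 10% baseline, 2% relative lift, 20,000 new members per arm per day.
for days in (1, 7, 14, 28):
    print(days, round(power(20_000 * days, 0.10, 0.02), 3))
```

In this hypothetical, power is still below 80% after two weeks and passes it before four weeks. An underpowered run that shows "no movement" therefore says little about whether the feature has a real impact.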
  29. Statistical Significance: Which experiment has a bigger impact?

      Metric      Experiment 1   Experiment 2
      Pageviews   1.5%           12.9%
      Revenue     0.8%           2.4%

  30. The same table, with Experiment 1's deltas marked "Stat. significant".
  31. Statistical Significance: Must consider statistical significance. A 12.9% delta can still be noise! Identify signal from noise and focus on the "real" movers; ensure results are reproducible.
  32. Multiple Testing: the famous xkcd comic on jelly beans.
  33. Multiple Testing Concerns: Multiple ramps: pre-decide which ramp to base the decision on (e.g. 50/50). Multiple "peeks": rely on "full"-week results. Multiple variants: choose the best, then rerun to see if it replicates. Multiple metrics.
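The "multiple peeks" concern can be illustrated by simulating A/A tests (no true effect) and checking significance every day. This is a minimal sketch with several assumptions for illustration: equal daily traffic, a normally distributed test statistic, and 14 daily looks at a two-sided 5% threshold. Under these assumptions, stopping at the first significant peek produces far more false positives than the nominal 5%, which is why the slide recommends relying on full-week results.

```python
import math
import random
from typing import Tuple


def false_positive_rates(num_experiments: int = 2000, num_days: int = 14,
                         z_crit: float = 1.96, seed: int = 7) -> Tuple[float, float]:
    """Simulate A/A tests with equal traffic per day.

    Under the null, the cumulative z-statistic after day k is S_k / sqrt(k),
    where S_k is a running sum of k independent standard normal increments.
    Returns (false-positive rate when only the final day is tested,
             false-positive rate when stopping at the first significant day).
    """
    rng = random.Random(seed)
    final_hits = 0
    peek_hits = 0
    for _ in range(num_experiments):
        running_sum = 0.0
        z = 0.0
        significant_at_some_peek = False
        for day in range(1, num_days + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / math.sqrt(day)
            if abs(z) > z_crit:
                significant_at_some_peek = True
        if abs(z) > z_crit:  # z now holds the final day's statistic
            final_hits += 1
        if significant_at_some_peek:
            peek_hits += 1
    return final_hits / num_experiments, peek_hits / num_experiments


final_rate, peek_rate = false_positive_rates()
print(f"test once at the end: {final_rate:.3f}, stop at first significant peek: {peek_rate:.3f}")
```

Every run that is significant on the final day is also significant at some peek, so peeking can only add false positives. The simulation shows how many it adds.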
  34. An irrelevant metric is statistically significant. What to do? Ask which metric it is and how "significant" it is (p-value). Use tiered thresholds: 1st-order metrics (directly impacted by the experiment), p-value < 0.05; 2nd-order metrics (maybe impacted by the experiment), p-value < 0.01; all other metrics, p-value < 0.001. Watch out for multiple testing: with 100 metrics, how many would you see statistically significant at p < 0.05 even if your experiment does NOTHING? 5.
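The "5 out of 100" answer is the expected number of false positives: each of 100 null metrics crosses p < 0.05 with probability 0.05. The minimal sketch below compares that flat rule with the tiered thresholds above. Two assumptions apply. The mapping of tiers to thresholds is a reading of the slide's diagram. The split of 100 metrics into 5 first-order, 20 second-order, and 75 other metrics is hypothetical. The "any false positive" probability also assumes independent metrics, which real metrics are not.

```python
from typing import Dict


def expected_false_positives(metric_counts: Dict[str, int],
                             alphas: Dict[str, float]) -> float:
    """Expected number of metrics flagged significant when the experiment has
    no effect: each null metric crosses its threshold with probability alpha."""
    return sum(count * alphas[tier] for tier, count in metric_counts.items())


def prob_any_false_positive(metric_counts: Dict[str, int],
                            alphas: Dict[str, float]) -> float:
    """P(at least one false positive), assuming independent metrics.
    Real metrics are correlated, so treat this as a rough guide only."""
    p_none = 1.0
    for tier, count in metric_counts.items():
        p_none *= (1 - alphas[tier]) ** count
    return 1 - p_none


# One flat threshold for all 100 metrics, as in the slide's question.
flat_counts = {"all": 100}
flat_alphas = {"all": 0.05}

# Tiered thresholds from the slide; the split of 100 metrics is hypothetical.
tiered_counts = {"first_order": 5, "second_order": 20, "other": 75}
tiered_alphas = {"first_order": 0.05, "second_order": 0.01, "other": 0.001}

for label, counts, alphas in [("flat", flat_counts, flat_alphas),
                              ("tiered", tiered_counts, tiered_alphas)]:
    print(label, expected_false_positives(counts, alphas),
          round(prob_any_false_positive(counts, alphas), 3))
```

The flat rule reproduces the slide's answer of 5 expected false positives, with at least one false positive almost certain. In this hypothetical split, the tiered thresholds cut the expectation to about 0.5.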
  35. References:
      • Tang, Diane, et al. "Overlapping Experiment Infrastructure: More, Better, Faster Experimentation." Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2010.
      • Kohavi, Ron, et al. "Online Controlled Experiments at Large Scale." Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2013.
      • LinkedIn Engineering blog post: http://engineering.linkedin.com/ab-testing/xlnt-platform-driving-ab-testing-linkedin
      Additional resources: RecSys'14 A/B testing workshop.
