Online Controlled Experiments:
Lessons from Running at Large Scale
Slides at http://bit.ly/CH2017Kohavi, @RonnyK
Ron Kohavi, Distinguished Engineer, General Manager,
Analysis and Experimentation, Microsoft
Joint work with many members of the A&E/ExP platform team
2017
The Life of a Great Idea – True Bing Story
➢An idea was proposed in early 2012 to change the way ad titles were displayed on Bing
➢Move ad text to the title line to make it longer
Ronny Kohavi (@Ronnyk) 2
Control: existing display. Treatment: new idea called Long Ad Titles.
The Life of a Great Idea (cont)
➢It’s one of hundreds of ideas proposed and seems “…meh…”
➢Implementation was delayed in Feb, March, April, May, and June:
➢Multiple features were stack-ranked as more valuable
➢It wasn’t clear if it was going to be done by the end of the year
➢An engineer thought: this is trivial to implement.
He implemented the idea in a few days, and started a controlled experiment (A/B test)
➢An alert fired that something is wrong with revenue, as Bing was making too much money.
Such alerts have been very useful to detect bugs (such as logging revenue twice)
➢But there was no bug. The idea increased Bing’s revenue by 12% (over $120M at the time)
without hurting guardrail metrics
➢We are terrible at assessing the value of ideas.
Few ideas generate over $100M in incremental revenue (as this idea), but
the best revenue-generating idea in Bing’s history was badly rated and delayed for months!
[Timeline graphic: Feb, March, April, May, June, captioned “…meh…”]
Agenda
➢Experimentation at scale
Experiment lifecycle critical when your system is used to run 15,000 treatments/year
➢Three real examples: you’re the decision maker
Examples chosen to share lessons
➢Five important lessons
A/B/n Tests in One Slide
➢Concept is trivial
▪Randomly split traffic between two (or more) versions
oA (Control)
oB (Treatment)
▪Collect metrics of interest
▪Analyze
➢Sample of real users
▪Not WEIRD (Western, Educated, Industrialized, Rich,
and Democratic) like many academic research samples
➢A/B test is the simplest controlled experiment
▪A/B/n refers to multiple treatments (often used and encouraged: try control + two or three treatments)
▪MVT refers to multivariable designs (rarely used by our teams)
➢Must run statistical tests to confirm differences are not due to chance
➢Best scientific way to prove causality, i.e., the changes in metrics are caused by changes
introduced in the treatment(s)
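A minimal sketch of the random split, assuming hash-based bucketing on a user id. The slides do not say how the platform assigns users; `assign_variant`, the user ids, and the experiment name are hypothetical.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to a variant (hypothetical helper)."""
    # Hash the experiment name together with the user id so that different
    # experiments split the same users independently of each other.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Illustrative traffic: 10,000 made-up user ids.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "long-ad-titles")] += 1
print(counts)
```

Hashing keeps a returning user in the same variant without storing the assignment, and passing more variants gives an A/B/n split.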
Team and Mission
Analysis and Experimentation team at Microsoft:
▪Mission: Accelerate innovation through trustworthy analysis and experimentation.
▪Empower the HiPPO (Highest Paid Person’s Opinion) with data
▪ExP platform used by many groups at Microsoft, including Bing, MSN, Office (client and online), OneNote,
Exchange, Xbox, Cortana, Skype, and Photos
▪90+ people
o 50 developers: build the Experimentation Platform and Analysis Tools
o 30 data scientists (center of excellence model)
o 10 Program Managers
o 2 overhead (me, admin)
Team includes people who worked at Amazon, Facebook,
Google, LinkedIn
Scale
➢We run about 300 experiment treatments per week
(These are “real” useful treatments, not 3x10x10 MVT = 300.)
➢Typical treatment is exposed to millions of users, sometimes tens of millions.
➢Bing scale
▪There is no single Bing. Since a user is exposed to over 15 concurrent experiments,
they get one of 5^15 ≈ 30 billion variants. Debugging takes on a new meaning
▪Until 2014, the system was limiting usage as it scaled.
Now limits come from engineers’ ability to code new ideas
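The variant count can be checked directly. This assumes, as the 5^15 on the slide implies, that each of the 15 concurrent experiments has 5 variants.

```python
variants_per_experiment = 5   # assumption implied by the 5^15 on the slide
concurrent_experiments = 15

# Each user sees one variant from every concurrent experiment, so the
# number of distinct combinations is the product across experiments.
combinations = variants_per_experiment ** concurrent_experiments
print(f"{combinations:,}")  # 30,517,578,125
```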
The Experimentation Platform
➢Experimentation Platform provides full experiment-lifecycle management
▪Experimenter sets up experiment (several design templates) and hits “Start”
▪Pre-experiment “gates” have to pass (specific to team, such as perf test, basic correctness)
▪System finds a good split to control/treatment (“seedfinder”).
Tries hundreds of splits, evaluates them on last week of data, picks the best
▪System initiates experiment at low percentage and/or at a single Data Center.
Computes near-real-time “cheap” metric and aborts in 15 minutes if there is a problem
▪System waits for several hours and computes more metrics.
If guardrails are crossed, auto shut down; otherwise, ramps up to desired percentage (e.g.,
10-20% of users)
▪After a day, system computes many more metrics (e.g., thousand+) and sends e-mail alerts
about interesting movements (e.g., time-to-success on browser-X is down D%)
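The “seedfinder” step can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the split is a hash of seed and user id, balance is judged on a single metric, and the data is synthetic. All function names are hypothetical.

```python
import hashlib
import random
from statistics import mean

def split(user_id: int, seed: int) -> int:
    """Return 0 (control) or 1 (treatment) for this user under this seed."""
    digest = hashlib.sha256(f"{seed}:{user_id}".encode("utf-8")).digest()
    return digest[0] % 2

def imbalance(metric_by_user: dict, seed: int) -> float:
    """Absolute pre-experiment difference in the metric between the two groups."""
    groups = ([], [])
    for user_id, value in metric_by_user.items():
        groups[split(user_id, seed)].append(value)
    return abs(mean(groups[0]) - mean(groups[1]))

def find_seed(metric_by_user: dict, candidate_seeds) -> int:
    """Pick the seed whose split was most balanced on historical data."""
    return min(candidate_seeds, key=lambda seed: imbalance(metric_by_user, seed))

# Synthetic stand-in for "last week of data": one metric value per user.
rng = random.Random(0)
last_week = {user_id: rng.expovariate(1 / 3.0) for user_id in range(1_000)}
seeds = range(200)  # "hundreds of splits"
best = find_seed(last_week, seeds)
print(best, imbalance(last_week, best))
```

Choosing the split that looked most balanced before the experiment starts reduces the chance that a pre-existing difference between groups is mistaken for a treatment effect.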
Real Examples
➢Three experiments that ran at Microsoft
➢Each provides interesting lessons
➢All had enough users for statistical validity
➢For each experiment, we provide the OEC, the Overall Evaluation Criterion
▪ This is the criterion to determine which variant is the winner
➢Let’s see how many you get right
▪Everyone please stand up
▪You will be given three options and you will answer by raising your left hand, raising your right hand, or leaving
both hands down (details per example)
▪If you get it wrong, please sit down
➢Since there are 3 choices for each question, random guessing implies
100%/3^3 ≈ 4% will get all three questions right.
Let’s see how much better than random we can get in this room
Example 1:
SERP Truncation
➢SERP is a Search Engine Result Page
(shown on the right)
➢OEC: Clickthrough Rate on 1st SERP per query
(ignore issues with click/back, page 2, etc.)
➢Version A: show 10 algorithmic results
➢Version B: show 8 algorithmic results by removing
the last two results
➢All else same: task pane, ads, related searches, etc.
➢Version B is slightly faster (fewer results means
less HTML, but server-side computed same set)
• Raise your left hand if you think A Wins (10 results)
• Raise your right hand if you think B Wins (8 results)
• Don’t raise your hand if they are about the same
SERP Truncation
➢[Intentionally left blank]
Example 2: Windows Search Box
The search box is in the lower left part of the taskbar for most of the 500M machines running
Windows 10
Here are two variants:
OEC (Overall Evaluation Criterion): user engagement--more searches (and thus Bing revenue)
• Raise your left hand if you think the Left version wins (stat-sig)
• Raise your right hand if you think the Right version wins (stat-sig)
• Don’t raise your hand if they are about the same
Windows Search Box (cont)
➢[Intentionally left blank]
Another Search Box Tweak
➢Those of you running Windows 10 Fall Creators update released October 2017 will see a
change to the search box background
➢From
➢To
➢It’s another multi-million dollar winning treatment
➢Annoying at first, but Windows NPS score improved with this
Example 3: Bing Ads with Site Links
➢Should Bing add “site links” to ads, which allow advertisers to offer several destinations
on ads?
➢OEC: Revenue, with ads constrained to the same vertical pixels on average
➢Pro adding: richer ads, users better informed where they land
➢Cons: Constraint means on average 4 “A” ads vs. 3 “B” ads
Variant B is 5 msec slower (compute + higher page weight)
• Raise your left hand if you think Left version wins
• Raise your right hand if you think Right version wins
• Don’t raise your hand if they are about the same
Bing Ads with Site Links
➢[Intentionally left blank]
Example 3B: Underlining Links
➢Does underlining increase or decrease clickthrough-rate?
➢OEC: Clickthrough Rate on 1st SERP per query
A B
• Raise your left hand if you think A Wins (left, with underlines)
• Raise your right hand if you think B Wins (right, without underlines)
• Don’t raise your hand if they are about the same
Underlines
➢[Intentionally left blank]
Lesson #1: Agree on a good OEC
➢OEC = Overall Evaluation Criterion
▪Getting agreement on the OEC in the org is a huge step forward
▪OEC should be defined using short-term metrics that predict
long-term value (and hard to game)
▪Think about customer lifetime value, not immediate revenue
Ex: Amazon e-mail – account for unsubscribes
▪Look for success indicators/leading indicators, avoid vanity metrics
▪Read Doug Hubbard’s How to Measure Anything
▪Funnels use Pirate metrics: AARRR
acquisition, activation, retention, revenue, and referral
▪Criterion could be weighted sum of factors, such as
Conversion/action, time to action, visit frequency
Lesson #1 (cont)
➢Microsoft support example with time on site
➢Bing example
▪Bing optimizes for long-term query share (% of queries in market) and long-term revenue
▪Short term it’s easy to make money by showing more ads, but we know it increases
abandonment.
▪Selecting ads is a constrained optimization problem:
given an agreed avg pixels/query, optimize revenue
▪For query share, queries/user may seem like a good metric, but it’s terrible!
oDegrading search results will cause users to search more
oSessions/user is a much better metric (see http://bit.ly/expPuzzling).
Example of Bad OEC
➢Example showed that moving the bottom-middle call-to-action left raised its clicks by 109%
➢It’s a local metric and trivial to move
by cannibalizing other links
➢Problem: next week, the team
responsible for the bottom right
call-to-action will move it all the way
left and report 150% increase
➢The OEC could be global to page and
used consistently. Something like
$\left(\sum_i \mathrm{click}_i \cdot \mathrm{value}_i\right)$ per user
(i is in the set of clickable elements)
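A small sketch of such a page-global OEC, assuming clicks are logged as (user, element) pairs. The element names, their values, and the click logs are made up for illustration.

```python
# Hypothetical per-element values; a real page would set these by agreement.
ELEMENT_VALUE = {"buy_button": 5.0, "learn_more": 1.0, "footer_link": 0.2}

def oec_per_user(click_log, n_users: int) -> float:
    """Sum of click_i * value_i over all clickable elements, per exposed user.

    click_log: iterable of (user_id, element) pairs for one variant.
    n_users:   number of users exposed to the variant, including non-clickers.
    """
    total_value = sum(ELEMENT_VALUE.get(element, 0.0) for _, element in click_log)
    return total_value / n_users

# Four users per variant. In the treatment one click moves from "learn_more"
# to "footer_link": the footer link's clicks double, but the page-level OEC drops.
control_clicks = [(1, "learn_more"), (2, "buy_button"), (3, "footer_link")]
treatment_clicks = [(1, "footer_link"), (2, "buy_button"), (3, "footer_link")]
print(oec_per_user(control_clicks, 4), oec_per_user(treatment_clicks, 4))
```

Because every element feeds the same sum, a team cannot raise the OEC just by pulling clicks away from a more valuable link.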
Bad OEC Example
➢Your data scientist makes an observation:
2% of queries end up with “No results.”
➢Manager: must reduce.
Assigns a team to minimize “no results” metric
➢Metric improves, but results for query
brochure paper
are crap (or in this case, paper to clean crap)
➢Sometimes it *is* better to show “No Results.”
This is a good example of gaming the OEC.
Real example from my Amazon Prime Now search
https://twitter.com/ronnyk/status/713949552823263234
Bad OEC
➢Office Online tested new design for homepage
➢Overall Evaluation Criterion (OEC) was clicks on the Buy Button [shown in red boxes]
➢Why is this bad?
Control
Treatment
Bad OEC
➢Treatment had a drop in the OEC (clicks on buy) of 64%!
➢Not having the price shown in the Control led more people to click to determine the price
➢The OEC assumes conversion downstream is the same
➢Make fewer assumptions and measure what
you really need: actual sales
Lesson #2: Most Ideas Fail
➢Features are built because teams believe they are useful.
But most experiments show that features fail to move the metrics they were
designed to improve
➢Based on experiments at Microsoft (paper)
▪1/3 of ideas were positive and statistically significant
▪1/3 of ideas were flat: no statistically significant difference
▪1/3 of ideas were negative and statistically significant
➢At Bing (well optimized), the success rate is lower: 10-20%.
For Bing Sessions/user, our holy grail metric, 1 out of 5,000 experiments improves it
➢Integrating Bing with Facebook/Twitter in 3rd pane cost more than $25M in dev costs and
was abandoned due to lack of value
➢We joke that our job is to tell clients that their new baby is ugly
➢The low success rate has been documented many times across multiple companies
When running controlled experiments, you will be humbled!
Key Lesson Given the Success Rate
➢Experiment often
▪To have a great idea, have a lot of them -- Thomas Edison
▪If you have to kiss a lot of frogs to find a prince,
find more frogs and kiss them faster and faster
-- Mike Moran, Do it Wrong Quickly
➢Try radical ideas. You may be surprised
▪Doubly true if it’s cheap to implement
▪If you're not prepared to be wrong, you'll never come up
with anything original – Sir Ken Robinson, TED 2006 (#1 TED talk)
Avoid the temptation to try and build optimal features through
extensive planning without early testing of ideas
Lesson #3: Small Changes can have a
Big Impact to Key Metrics
➢Tiny changes with big impact are the bread-and-butter of talks at conferences
▪Opening example with Bing Ads in this talk, worth over $120M annually
▪Changed text in Windows search box: $5M+
▪Site links in ads: $50M annually
▪Changed text color for fonts in Bing: over $10M annually
▪100msec improvement to Bing server perf: $18M annually
▪Opening mail link in new tab on MSN: 5% increase to clicks/user
▪Credit card offer on Amazon shopping cart: 10s of millions of dollars annually
▪Overriding DOM routines to avoid malware in Bing: millions of dollars annually
➢But the reality is that these are rare gems: few among tens of thousands of experiments
Lesson #4: Changes Rarely have a
Big Positive Impact to Key Metrics
➢As Al Pacino says in the movie Any Given Sunday:
winning is done inch by inch
➢Most progress is made by small continuous improvements: 0.1%-1% after a lot of work
➢Bing’s relevance team, several hundred developers running thousands of experiments
every year, improve the OEC 2% annually
(2% is the sum of OEC improvements in controlled experiments)
➢Bing’s ads team improves revenues about 15-25% per year, but it is extremely rare to see
an idea or a new machine learning model that improves revenue by >2%
Bing Ads Revenue per Search(*): Inch by Inch
[Chart: revenue per search over time; y-axis from $20 to $100 in $5 steps]
➢Emarketer estimates Bing revenue grew 55% from 2014 to 2016
➢About every month a “package” is shipped, the result of many experiments
➢Improvements are typically small (sometimes lower revenue, impacted by space budget)
➢Seasonality (note Dec spikes) and other changes (e.g., algo relevance) have large impact
[Chart annotations, in extraction order: New models 0.1%; July lift 2.6%; Aug lift 8.6%; Sep lift 1.4%; Oct lift 6.1%; Jan lift 2.5%; Feb lift 0.9%; Mar lift 0.2%; Apr lift 2.5%; May lift 0.6%; Rollback/cleanup; June lift 1.05%; New models 0.2%; Oct lift 0.15%; Jul lift 2.2%; Aug lift 4.2%; Nov lift 3.6%; Jan lift -3.1%; Feb lift -1.2%; Mar lift 3.6%; June lift 1.1%; Sep lift 6.8%; Apr lift 1%; May lift 0.5%; Jul lift -0.6%; Aug lift 1.4%; Sept lift 1.6%; Oct lift 1.9%]
(*) Numbers have been perturbed for obvious reasons
Twyman’s Law
➢In the book Exploring Data: An Introduction to Data Analysis for Social Scientists, the
authors wrote that Twyman’s law is
“perhaps the most important single law in the whole of data analysis.”
➢If something is “amazing,” find the flaw! Examples:
▪If you have a mandatory birth date field and people think it’s unnecessary, you’ll find lots of
11/11/11 or 01/01/01
▪If you have an optional drop down, do not default to the first alphabetical entry, or you’ll have lots of:
jobs = Astronaut
▪Traffic to many US web sites doubled between 1-2AM on Sunday Nov 5th,
relative to the same hour a week prior. Why?
➢If you see a massive improvement to your OEC, call Twyman’s law and find the flaw.
Triple check things before you celebrate. See http://bit.ly/twymanLaw
Any figure that looks interesting or different is usually wrong
Lesson #5: Validate the
Experimentation System
➢Software that shows p-values with many digits of precision leads users to
trust it, but the statistics or implementation behind it could be buggy
➢Getting numbers is easy; getting numbers you can trust is hard
➢Example: Three good books on A/B testing get the stats wrong
(see my Amazon reviews)
➢Recommendation:
▪Check for Bots, which can cause significant skews.
At Bing over 50% of traffic is bot generated!
▪Run A/A tests: if the system is operating correctly, the system should find a stat-sig difference
only about 5% of the time. Is the p-value distribution uniform (next slide)?
▪Run SRM checks (see slide)
Example A/A test
➢P-value distribution for metrics in A/A tests should be uniform
➢Do 1,000 A/A tests, and check if the distribution is uniform
➢When we got this for some Skype metrics, we had to correct things (delta method)
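The A/A check can be sketched with a simulation. This is a minimal illustration assuming independent per-user values and a large-sample z-test; it does not cover the delta-method correction mentioned for the Skype metrics, and the data is synthetic.

```python
import math
import random
from statistics import mean, variance

def two_sample_p_value(a, b) -> float:
    """Two-sided p-value from a large-sample z-test on the difference in means."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

def run_aa_tests(n_tests=1000, users_per_arm=200, seed=0):
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_tests):
        # Both arms draw from the same distribution: there is no real effect.
        a = [rng.expovariate(1.0) for _ in range(users_per_arm)]
        b = [rng.expovariate(1.0) for _ in range(users_per_arm)]
        p_values.append(two_sample_p_value(a, b))
    return p_values

p_values = run_aa_tests()
# Share of A/A tests flagged as stat-sig at the 0.05 level.
false_positive_rate = sum(p < 0.05 for p in p_values) / len(p_values)
# Ten equal-width bins; a roughly flat histogram suggests uniform p-values.
histogram = [0] * 10
for p in p_values:
    histogram[min(int(p * 10), 9)] += 1
print(false_positive_rate, histogram)
```

A false-positive rate far from 5%, or a histogram that piles up at one end, points to a problem in the variance computation or the pipeline rather than to a real effect.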
Lesson #5 (cont)
➢SRM = Sample Ratio Mismatch
➢For an experiment with equal percentages assigned to Control/Treatment,
you should have approximately the same number of users in each
➢Real example:
▪Control: 821,588 users, Treatment: 815,482 users
▪Ratio: 50.2% (should have been 50%)
▪Should I be worried?
➢Absolutely
▪The p-value is 1.8e-6, so the probability of this split (or more extreme) happening by
chance is less than 1 in 500,000 (the Null hypothesis is true by design)
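The SRM check on the numbers above can be reproduced with a chi-squared goodness-of-fit test. This is a sketch assuming that is the test behind the reported p-value (the slides do not name it); `srm_p_value` is a hypothetical helper.

```python
import math

def srm_p_value(control_users: int, treatment_users: int,
                expected_ratio: float = 0.5) -> float:
    """p-value for the observed split under the designed control share."""
    total = control_users + treatment_users
    expected_control = total * expected_ratio
    expected_treatment = total * (1 - expected_ratio)
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

p = srm_p_value(821_588, 815_482)
print(f"control share = {821_588 / (821_588 + 815_482):.1%}, p = {p:.1e}")
```

With over 1.6 million users, even a 50.2% share is far outside what random assignment would produce, which is why the slide treats it as a red flag.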
Summary
➢Think about the OEC. Make sure the org agrees what to optimize
➢It is hard to assess the value of ideas
▪ Listen to your customers – Get the data
▪ Prepare to be humbled: data trumps intuition
➢ Compute the statistics carefully
▪ Getting numbers is easy. Getting a number you can trust is harder
➢Experiment often
▪ Triple your experiment rate and you triple your success (and failure) rate.
Fail fast & often in order to succeed
▪ Accelerate innovation by lowering the cost of experimenting
➢See http://exp-platform.com for papers
➢These slides at http://bit.ly/CH2017Kohavi
The less data, the stronger the opinions
Are you Hiring?

 
Presentation Emerce Conversion_Tom vanden Berg_April 11th 2019
Presentation Emerce Conversion_Tom vanden Berg_April 11th 2019Presentation Emerce Conversion_Tom vanden Berg_April 11th 2019
Presentation Emerce Conversion_Tom vanden Berg_April 11th 2019Online Dialogue
 
Validation in every Organisation - Ton Weseling - CRO Elite keynote London - ...
Validation in every Organisation - Ton Weseling - CRO Elite keynote London - ...Validation in every Organisation - Ton Weseling - CRO Elite keynote London - ...
Validation in every Organisation - Ton Weseling - CRO Elite keynote London - ...Online Dialogue
 
Dialogue Donderdag #28 Joost Baalbergen, Eline van Baal en Roos van Dam
Dialogue Donderdag #28 Joost Baalbergen, Eline van Baal en Roos van DamDialogue Donderdag #28 Joost Baalbergen, Eline van Baal en Roos van Dam
Dialogue Donderdag #28 Joost Baalbergen, Eline van Baal en Roos van DamOnline Dialogue
 
Dialogue Donderdag #28 Gerben Langendijk
Dialogue Donderdag #28 Gerben LangendijkDialogue Donderdag #28 Gerben Langendijk
Dialogue Donderdag #28 Gerben LangendijkOnline Dialogue
 
Joost Fromberg at the Global Digital Marketing Summit '18
Joost Fromberg at the Global Digital Marketing Summit '18Joost Fromberg at the Global Digital Marketing Summit '18
Joost Fromberg at the Global Digital Marketing Summit '18Online Dialogue
 
Amy Cueva, Mad*Pow (USA) - Conversion Hotel 2017 - keynote
Amy Cueva, Mad*Pow (USA) - Conversion Hotel 2017 - keynoteAmy Cueva, Mad*Pow (USA) - Conversion Hotel 2017 - keynote
Amy Cueva, Mad*Pow (USA) - Conversion Hotel 2017 - keynoteOnline Dialogue
 
Els Aerts, AGConsult (BE) - Conversion Hotel 2017 - keynote
Els Aerts, AGConsult (BE) - Conversion Hotel 2017 - keynoteEls Aerts, AGConsult (BE) - Conversion Hotel 2017 - keynote
Els Aerts, AGConsult (BE) - Conversion Hotel 2017 - keynoteOnline Dialogue
 
Joost de Schepper, Spotify (SWE) - Conversion Hotel 2017 - keynote
Joost de Schepper, Spotify (SWE) - Conversion Hotel 2017 - keynoteJoost de Schepper, Spotify (SWE) - Conversion Hotel 2017 - keynote
Joost de Schepper, Spotify (SWE) - Conversion Hotel 2017 - keynoteOnline Dialogue
 
Keynote #gconvert17 - Bart Schutz of Online Dialogue - Online Persuasion - Sy...
Keynote #gconvert17 - Bart Schutz of Online Dialogue - Online Persuasion - Sy...Keynote #gconvert17 - Bart Schutz of Online Dialogue - Online Persuasion - Sy...
Keynote #gconvert17 - Bart Schutz of Online Dialogue - Online Persuasion - Sy...Online Dialogue
 
Workshop congres verborgen psycholoog
Workshop congres  verborgen psycholoogWorkshop congres  verborgen psycholoog
Workshop congres verborgen psycholoogOnline Dialogue
 
Keynote #DAC17 - Bart Schutz - Marshmallows and the psychology behind evidenc...
Keynote #DAC17 - Bart Schutz - Marshmallows and the psychology behind evidenc...Keynote #DAC17 - Bart Schutz - Marshmallows and the psychology behind evidenc...
Keynote #DAC17 - Bart Schutz - Marshmallows and the psychology behind evidenc...Online Dialogue
 
Keynote on Online Experiments by Ton Wesseling - Online Dialogue at the Web A...
Keynote on Online Experiments by Ton Wesseling - Online Dialogue at the Web A...Keynote on Online Experiments by Ton Wesseling - Online Dialogue at the Web A...
Keynote on Online Experiments by Ton Wesseling - Online Dialogue at the Web A...Online Dialogue
 
Evidence everything - Bart Schutz - Growth Marketing Summit 2017 keynote
Evidence everything - Bart Schutz - Growth Marketing Summit 2017 keynoteEvidence everything - Bart Schutz - Growth Marketing Summit 2017 keynote
Evidence everything - Bart Schutz - Growth Marketing Summit 2017 keynoteOnline Dialogue
 
Ecommerce Conversion World, London March 23 2017 - Ton Wesseling keynote
Ecommerce Conversion World, London March 23 2017 - Ton Wesseling keynoteEcommerce Conversion World, London March 23 2017 - Ton Wesseling keynote
Ecommerce Conversion World, London March 23 2017 - Ton Wesseling keynoteOnline Dialogue
 
Conversionboost - Bart Schutz
Conversionboost - Bart SchutzConversionboost - Bart Schutz
Conversionboost - Bart SchutzOnline Dialogue
 

More from Online Dialogue (20)

Introduction to BOOM! - Behavioral online optimization method
Introduction to BOOM! - Behavioral online optimization methodIntroduction to BOOM! - Behavioral online optimization method
Introduction to BOOM! - Behavioral online optimization method
 
CRO & scrum combineren
CRO & scrum combinerenCRO & scrum combineren
CRO & scrum combineren
 
GAUC 2020 - presentatie Hans en Reinier
GAUC 2020 - presentatie Hans en ReinierGAUC 2020 - presentatie Hans en Reinier
GAUC 2020 - presentatie Hans en Reinier
 
Online Dialogue Digital Analytics Congres 2019
Online Dialogue Digital Analytics Congres 2019Online Dialogue Digital Analytics Congres 2019
Online Dialogue Digital Analytics Congres 2019
 
Presentation Emerce Conversion_Bart Schutz_April 11th 2019
Presentation Emerce Conversion_Bart Schutz_April 11th 2019Presentation Emerce Conversion_Bart Schutz_April 11th 2019
Presentation Emerce Conversion_Bart Schutz_April 11th 2019
 
Presentation Emerce Conversion_Tom vanden Berg_April 11th 2019
Presentation Emerce Conversion_Tom vanden Berg_April 11th 2019Presentation Emerce Conversion_Tom vanden Berg_April 11th 2019
Presentation Emerce Conversion_Tom vanden Berg_April 11th 2019
 
Validation in every Organisation - Ton Weseling - CRO Elite keynote London - ...
Validation in every Organisation - Ton Weseling - CRO Elite keynote London - ...Validation in every Organisation - Ton Weseling - CRO Elite keynote London - ...
Validation in every Organisation - Ton Weseling - CRO Elite keynote London - ...
 
Dialogue Donderdag #28 Joost Baalbergen, Eline van Baal en Roos van Dam
Dialogue Donderdag #28 Joost Baalbergen, Eline van Baal en Roos van DamDialogue Donderdag #28 Joost Baalbergen, Eline van Baal en Roos van Dam
Dialogue Donderdag #28 Joost Baalbergen, Eline van Baal en Roos van Dam
 
Dialogue Donderdag #28 Gerben Langendijk
Dialogue Donderdag #28 Gerben LangendijkDialogue Donderdag #28 Gerben Langendijk
Dialogue Donderdag #28 Gerben Langendijk
 
Joost Fromberg at the Global Digital Marketing Summit '18
Joost Fromberg at the Global Digital Marketing Summit '18Joost Fromberg at the Global Digital Marketing Summit '18
Joost Fromberg at the Global Digital Marketing Summit '18
 
Amy Cueva, Mad*Pow (USA) - Conversion Hotel 2017 - keynote
Amy Cueva, Mad*Pow (USA) - Conversion Hotel 2017 - keynoteAmy Cueva, Mad*Pow (USA) - Conversion Hotel 2017 - keynote
Amy Cueva, Mad*Pow (USA) - Conversion Hotel 2017 - keynote
 
Els Aerts, AGConsult (BE) - Conversion Hotel 2017 - keynote
Els Aerts, AGConsult (BE) - Conversion Hotel 2017 - keynoteEls Aerts, AGConsult (BE) - Conversion Hotel 2017 - keynote
Els Aerts, AGConsult (BE) - Conversion Hotel 2017 - keynote
 
Joost de Schepper, Spotify (SWE) - Conversion Hotel 2017 - keynote
Joost de Schepper, Spotify (SWE) - Conversion Hotel 2017 - keynoteJoost de Schepper, Spotify (SWE) - Conversion Hotel 2017 - keynote
Joost de Schepper, Spotify (SWE) - Conversion Hotel 2017 - keynote
 
Keynote #gconvert17 - Bart Schutz of Online Dialogue - Online Persuasion - Sy...
Keynote #gconvert17 - Bart Schutz of Online Dialogue - Online Persuasion - Sy...Keynote #gconvert17 - Bart Schutz of Online Dialogue - Online Persuasion - Sy...
Keynote #gconvert17 - Bart Schutz of Online Dialogue - Online Persuasion - Sy...
 
Workshop congres verborgen psycholoog
Workshop congres  verborgen psycholoogWorkshop congres  verborgen psycholoog
Workshop congres verborgen psycholoog
 
Keynote #DAC17 - Bart Schutz - Marshmallows and the psychology behind evidenc...
Keynote #DAC17 - Bart Schutz - Marshmallows and the psychology behind evidenc...Keynote #DAC17 - Bart Schutz - Marshmallows and the psychology behind evidenc...
Keynote #DAC17 - Bart Schutz - Marshmallows and the psychology behind evidenc...
 
Keynote on Online Experiments by Ton Wesseling - Online Dialogue at the Web A...
Keynote on Online Experiments by Ton Wesseling - Online Dialogue at the Web A...Keynote on Online Experiments by Ton Wesseling - Online Dialogue at the Web A...
Keynote on Online Experiments by Ton Wesseling - Online Dialogue at the Web A...
 
Evidence everything - Bart Schutz - Growth Marketing Summit 2017 keynote
Evidence everything - Bart Schutz - Growth Marketing Summit 2017 keynoteEvidence everything - Bart Schutz - Growth Marketing Summit 2017 keynote
Evidence everything - Bart Schutz - Growth Marketing Summit 2017 keynote
 
Ecommerce Conversion World, London March 23 2017 - Ton Wesseling keynote
Ecommerce Conversion World, London March 23 2017 - Ton Wesseling keynoteEcommerce Conversion World, London March 23 2017 - Ton Wesseling keynote
Ecommerce Conversion World, London March 23 2017 - Ton Wesseling keynote
 
Conversionboost - Bart Schutz
Conversionboost - Bart SchutzConversionboost - Bart Schutz
Conversionboost - Bart Schutz
 

Recently uploaded

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 

Recently uploaded (20)

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 

Ronny Kohavi, Microsoft (USA) - Conversion Hotel 2017 - keynote

  • 1. Online Controlled Experiments: Lessons from Running at Large Scale Slides at http://bit.ly/CH2017Kohavi, @RonnyK Ron Kohavi, Distinguished Engineer, General Manager, Analysis and Experimentation, Microsoft Joint work with many members of the A&E/ExP platform team 2017
  • 2. The Life of a Great Idea – True Bing Story ➢An idea was proposed in early 2012 to change the way ad titles were displayed on Bing ➢Move ad text to the title line to make it longer [Screenshots: Control, the existing display; Treatment, the new idea called Long Ad Titles]
  • 3. The Life of a Great Idea (cont) ➢It’s one of hundreds of ideas proposed and seems …meh… ➢Implementation was delayed in: Feb, March, April, May, June ➢Multiple features were stack-ranked as more valuable ➢It wasn’t clear if it was going to be done by the end of the year ➢An engineer thought: this is trivial to implement. He implemented the idea in a few days, and started a controlled experiment (A/B test) ➢An alert fired that something was wrong with revenue, as Bing was making too much money. Such alerts have been very useful to detect bugs (such as logging revenue twice) ➢But there was no bug. The idea increased Bing’s revenue by 12% (over $120M at the time) without hurting guardrail metrics ➢We are terrible at assessing the value of ideas. Few ideas generate over $100M in incremental revenue (as this idea did), but the best revenue-generating idea in Bing’s history was badly rated and delayed for months!
  • 4. Agenda ➢Experimentation at scale Experiment lifecycle critical when your system is used to run 15,000 treatments/year ➢Three real examples: you’re the decision maker Examples chosen to share lessons ➢Five important lessons
  • 5. A/B/n Tests in One Slide ➢Concept is trivial ▪Randomly split traffic between two (or more) versions o A (Control) o B (Treatment) ▪Collect metrics of interest ▪Analyze ➢Sample of real users ▪Not WEIRD (Western, Educated, Industrialized, Rich, and Democratic) like many academic research samples ➢A/B test is the simplest controlled experiment ▪A/B/n refers to multiple treatments (often used and encouraged: try control + two or three treatments) ▪MVT refers to multivariable designs (rarely used by our teams) ➢Must run statistical tests to confirm differences are not due to chance ➢Best scientific way to prove causality, i.e., the changes in metrics are caused by changes introduced in the treatment(s)
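The three steps on slide 5 (random split, collect metrics, analyze) can be sketched in a few lines. This is a minimal illustration, not the ExP platform's implementation: the hash-based `assign_variant` scheme, the function names, and the choice of a pooled two-proportion z-test are all assumptions made for the example.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to a variant by hashing (hypothetical scheme)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided p-value under the normal approximation: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    print(assign_variant("user-42", "long-ad-titles"))
    # Made-up counts, only to show the call shape.
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Hashing the user ID together with the experiment name keeps a user in the same variant across visits while letting different experiments split users independently.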
  • 6. Team and Mission Analysis and Experimentation team at Microsoft: ▪Mission: Accelerate innovation through trustworthy analysis and experimentation. ▪Empower the HiPPO (Highest Paid Person’s Opinion) with data ▪ExP platform used by many groups at Microsoft, including Bing, MSN, Office (client and online), OneNote, Exchange, Xbox, Cortana, Skype, and Photos ▪90+ people o 50 developers: build the Experimentation Platform and Analysis Tools o 30 data scientists (center of excellence model) o 10 Program Managers o 2 overhead (me, admin) Team includes people who worked at Amazon, Facebook, Google, LinkedIn
  • 7. Scale ➢We run about 300 experiment treatments per week (These are “real” useful treatments, not 3x10x10 MVT = 300.) ➢Typical treatment is exposed to millions of users, sometimes tens of millions. ➢Bing scale ▪There is no single Bing. Since a user is exposed to over 15 concurrent experiments, they get one of 5^15 ≈ 30 billion variants. Debugging takes on a new meaning ▪Until 2014, the system was limiting usage as it scaled. Now limits come from engineers’ ability to code new ideas
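The "30 billion variants" figure on slide 7 is a product of independent choices. A quick check, assuming (as the slide's 5^15 implies) five possible variants in each of 15 concurrent experiments:

```python
# Assumption taken from the slide's own arithmetic: 5 variants per experiment,
# 15 experiments running concurrently and assigned independently.
variants_per_experiment = 5
concurrent_experiments = 15

combinations = variants_per_experiment ** concurrent_experiments
print(f"{combinations:,}")  # prints 30,517,578,125
```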
  • 8. The Experimentation Platform ➢Experimentation Platform provides full experiment-lifecycle management ▪Experimenter sets up experiment (several design templates) and hits “Start” ▪Pre-experiment “gates” have to pass (specific to team, such as perf test, basic correctness) ▪System finds a good split to control/treatment (“seedfinder”). Tries hundreds of splits, evaluates them on last week of data, picks the best ▪System initiates experiment at low percentage and/or at a single Data Center. Computes near-real-time “cheap” metric and aborts in 15 minutes if there is a problem ▪System waits for several hours and computes more metrics. If guardrails are crossed, auto shut down; otherwise, ramps up to desired percentage (e.g., 10-20% of users) ▪After a day, system computes many more metrics (e.g., thousand+) and sends e-mail alerts about interesting movements (e.g., time-to-success on browser-X is down D%)
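The "seedfinder" step on slide 8 (try hundreds of splits, evaluate them on last week's data, pick the best) can be sketched as follows. The slide does not say how a split is scored, so this sketch assumes one plausible criterion: the smallest pre-experiment difference in a single historical metric between the two groups. The names `bucket`, `imbalance`, and `find_seed` are hypothetical.

```python
import hashlib
import statistics


def bucket(user_id: str, seed: int) -> int:
    """Hash a user into bucket 0 (control) or 1 (treatment) for a given seed."""
    digest = hashlib.md5(f"{seed}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 2


def imbalance(history: dict[str, float], seed: int) -> float:
    """Absolute difference in the mean historical metric between the two buckets."""
    groups = ([], [])
    for user_id, value in history.items():
        groups[bucket(user_id, seed)].append(value)
    if not groups[0] or not groups[1]:
        return float("inf")  # a split that leaves one side empty is unusable
    return abs(statistics.fmean(groups[0]) - statistics.fmean(groups[1]))


def find_seed(history: dict[str, float], candidates: range) -> int:
    """Return the candidate seed whose split is best balanced on historical data."""
    return min(candidates, key=lambda seed: imbalance(history, seed))
```

A call such as `find_seed(last_week_metric_by_user, range(300))` would return the seed to use for the real experiment. A production system would presumably balance many metrics at once rather than one.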
  • 9. Real Examples ➢Three experiments that ran at Microsoft ➢Each provides interesting lessons ➢All had enough users for statistical validity ➢For each experiment, we provide the OEC, the Overall Evaluation Criterion ▪This is the criterion to determine which variant is the winner ➢Let’s see how many you get right ▪Everyone please stand up ▪You will be given three options and you will answer by raising your left hand, raising your right hand, or leaving both hands down (details per example) ▪If you get it wrong, please sit down ➢Since there are 3 choices for each question, random guessing implies 100%/3^3 ≈ 4% will get all three questions right. Let’s see how much better than random we can get in this room
  • 10. Example 1: SERP Truncation ➢SERP is a Search Engine Result Page (shown on the right) ➢OEC: Clickthrough Rate on 1st SERP per query (ignore issues with click/back, page 2, etc.) ➢Version A: show 10 algorithmic results ➢Version B: show 8 algorithmic results by removing the last two results ➢All else same: task pane, ads, related searches, etc. ➢Version B is slightly faster (fewer results means less HTML, but the server-side computed the same set) • Raise your left hand if you think A Wins (10 results) • Raise your right hand if you think B Wins (8 results) • Don’t raise your hand if they are about the same
  • 11. SERP Truncation ➢[Intentionally left blank]
  • 12. Windows Search Box ➢The search box is in the lower left part of the taskbar for most of the 500M machines running Windows 10 ➢Here are two variants [screenshots] ➢OEC (Overall Evaluation Criterion): user engagement, i.e., more searches (and thus Bing revenue) • Raise your left hand if you think the Left version wins (stat-sig) • Raise your right hand if you think the Right version wins (stat-sig) • Don’t raise your hand if they are about the same
  • 14. Another Search Box Tweak ➢Those of you running the Windows 10 Fall Creators Update, released October 2017, will see a change to the search box background ➢From [screenshot] ➢To [screenshot] ➢It’s another multi-million dollar winning treatment ➢Annoying at first, but Windows NPS improved with this
  • 15. Example 3: Bing Ads with Site Links ➢Should Bing add “site links” to ads, which allow advertisers to offer several destinations on ads? ➢OEC: Revenue, ads constrained to the same vertical pixels on avg ➢Pro adding: richer ads, users better informed where they land ➢Cons: Constraint means on average 4 “A” ads vs. 3 “B” ads. Variant B is 5 msec slower (compute + higher page weight) • Raise your left hand if you think Left version wins • Raise your right hand if you think Right version wins • Don’t raise your hand if they are about the same
  • 16. Bing Ads with Site Links ➢[Intentionally left blank]
  • 17. Example 3B: Underlining Links ➢Does underlining increase or decrease clickthrough-rate?
  • 18. Example 3B: Underlining Links ➢Does underlining increase or decrease clickthrough-rate? ➢OEC: Clickthrough Rate on 1st SERP per query • Raise your left hand if you think A Wins (left, with underlines) • Raise your right hand if you think B Wins (right, without underlines) • Don’t raise your hand if they are about the same
  • 20. Agenda ➢Experimentation at scale Experiment lifecycle critical when your system is used to run 15,000 treatments/year ➢Three real examples: you’re the decision maker Examples chosen to share lessons ➢Five important lessons
  • 21. Lesson #1: Agree on a good OEC ➢OEC = Overall Evaluation Criterion ▪Getting agreement on the OEC in the org is a huge step forward ▪OEC should be defined using short-term metrics that predict long-term value (and are hard to game) ▪Think about customer lifetime value, not immediate revenue. Ex: Amazon e-mail – account for unsubscribes ▪Look for success indicators/leading indicators, avoid vanity metrics ▪Read Doug Hubbard’s How to Measure Anything ▪Funnels use Pirate metrics: AARRR (acquisition, activation, retention, revenue, and referral) ▪Criterion could be a weighted sum of factors, such as conversion/action, time to action, visit frequency
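Slide 21's last bullet, an OEC built as a weighted sum of factors, might look like the sketch below. The factor names echo the slide, but the weights, the sign convention (a negative weight for time to action, since lower is better), and the assumption that factors are already normalized to comparable scales are illustrative choices, not values from the talk.

```python
def weighted_oec(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-user factor values into one score as a weighted sum."""
    return sum(weight * metrics[name] for name, weight in weights.items())


# Hypothetical weights; an organization would have to agree on these.
WEIGHTS = {
    "conversion": 2.0,
    "time_to_action": -0.5,   # negative: a longer time to action lowers the score
    "visit_frequency": 1.0,
}

if __name__ == "__main__":
    # Made-up, pre-normalized factor values for two variants.
    control = {"conversion": 0.50, "time_to_action": 0.40, "visit_frequency": 0.20}
    treatment = {"conversion": 0.55, "time_to_action": 0.30, "visit_frequency": 0.20}
    print(weighted_oec(control, WEIGHTS), weighted_oec(treatment, WEIGHTS))
```

Fixing the weights before the experiment runs is the point: the winner is then decided by a criterion the organization agreed on in advance.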
  • 22. Lesson #1 (cont) ➢Microsoft support example with time on site ➢Bing example ▪Bing optimizes for long-term query share (% of queries in market) and long-term revenue ▪Short term, it’s easy to make money by showing more ads, but we know it increases abandonment. ▪Selecting ads is a constrained optimization problem: given an agreed avg pixels/query, optimize revenue ▪For query share, queries/user may seem like a good metric, but it’s terrible! o Degrading search results will cause users to search more o Sessions/user is a much better metric (see http://bit.ly/expPuzzling).
  • 23. Example of Bad OEC ➢Example showed that moving the bottom-middle call-to-action left raised its clicks by 109% ➢It’s a local metric and trivial to move by cannibalizing other links ➢Problem: next week, the team responsible for the bottom-right call-to-action will move it all the way left and report a 150% increase ➢The OEC could be global to the page and used consistently. Something like $\sum_i \mathrm{click}_i \times \mathrm{value}_i$ per user (i is in the set of clickable elements)
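The page-level criterion suggested on slide 23, value-weighted clicks summed over all clickable elements and divided by the number of users, can be sketched as below. The element names, the values, and the click-log shape (a list of `(user, element)` pairs) are hypothetical.

```python
def value_weighted_clicks_per_user(clicks, element_value, users_in_variant):
    """Sum value_i over every logged click on element i, then divide by users."""
    total = sum(element_value.get(element, 0.0) for _user, element in clicks)
    return total / users_in_variant


if __name__ == "__main__":
    # Hypothetical values: both call-to-action links are worth the same.
    values = {"cta_middle": 1.0, "cta_right": 1.0}
    control = [("u1", "cta_middle"), ("u2", "cta_right")]
    # Treatment moves one link; its clicks double by cannibalizing the other link.
    treatment = [("u1", "cta_middle"), ("u2", "cta_middle")]
    print(value_weighted_clicks_per_user(control, values, 2))
    print(value_weighted_clicks_per_user(treatment, values, 2))
```

In this made-up log the local metric (clicks on one link) doubles, while the page-level score is identical for both variants, which is the cannibalization the slide warns about.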
  • 24. Bad OEC Example ➢Your data scientist makes an observation: 2% of queries end up with “No results.” ➢Manager: must reduce. Assigns a team to minimize the “no results” metric ➢Metric improves, but results for the query “brochure paper” are crap (or in this case, paper to clean crap) ➢Sometimes it *is* better to show “No Results.” This is a good example of gaming the OEC. Real example from my Amazon Prime Now search https://twitter.com/ronnyk/status/713949552823263234
  • 25. Bad OEC ➢Office Online tested new design for homepage [Screenshots: Control and Treatment] ➢Overall Evaluation Criterion (OEC) was clicks on the Buy Button [shown in red boxes] ➢Why is this bad?
  • 26. Bad OEC ➢Treatment had a drop in the OEC (clicks on buy) of 64%! ➢Not having the price shown in the Control led more people to click to determine the price ➢The OEC assumes conversion downstream is the same ➢Make fewer assumptions and measure what you really need: actual sales
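To see why clicks on the Buy button can mislead (slide 26), compare the relative change in clicks with the relative change in actual sales. All counts below are invented for illustration: the click counts are chosen only so that the click drop matches the slide's 64%, and the talk does not report the sales outcome.

```python
def relative_change(new: float, old: float) -> float:
    """Relative change of new versus old, e.g. -0.64 for a 64% drop."""
    return (new - old) / old


# Made-up counts for equal-sized groups; NOT data from the experiment.
control = {"visitors": 10_000, "buy_clicks": 1_000, "sales": 100}
treatment = {"visitors": 10_000, "buy_clicks": 360, "sales": 110}

click_change = relative_change(treatment["buy_clicks"], control["buy_clicks"])
sales_change = relative_change(treatment["sales"], control["sales"])
print(f"clicks on buy: {click_change:+.0%}, actual sales: {sales_change:+.0%}")
```

With these hypothetical counts the click-based OEC falls sharply while sales rise, so the two criteria would pick different winners. Measuring sales directly removes the assumption that downstream conversion is the same in both variants.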
  • 27. Lesson #2: Most Ideas Fail ➢Features are built because teams believe they are useful. But most experiments show that features fail to move the metrics they were designed to improve ➢Based on experiments at Microsoft (paper) ▪1/3 of ideas were positive and statistically significant ▪1/3 of ideas were flat: no statistically significant difference ▪1/3 of ideas were negative and statistically significant ➢At Bing (well optimized), the success rate is lower: 10-20%. For Bing Sessions/user, our holy grail metric, 1 out of 5,000 experiments improves it ➢Integrating Bing with Facebook/Twitter in the 3rd pane cost more than $25M in dev costs and was abandoned due to lack of value ➢We joke that our job is to tell clients that their new baby is ugly ➢The low success rate has been documented many times across multiple companies ➢When running controlled experiments, you will be humbled!
  • 28. Key Lesson Given the Success Rate ➢Experiment often ▪To have a great idea, have a lot of them -- Thomas Edison ▪If you have to kiss a lot of frogs to find a prince, find more frogs and kiss them faster and faster -- Mike Moran, Do it Wrong Quickly ➢Try radical ideas. You may be surprised ▪Doubly true if it’s cheap to implement ▪If you're not prepared to be wrong, you'll never come up with anything original -- Sir Ken Robinson, TED 2006 (#1 TED talk) ➢Avoid the temptation to try to build optimal features through extensive planning without early testing of ideas
  • 29. Lesson #3: Small Changes can have a Big Impact to Key Metrics ➢Tiny changes with big impact are the bread-and-butter of talks at conferences ▪Opening example with Bing Ads in this talk, worth over $120M annually ▪Changed text in Windows search box: $5M+ ▪Site links in ads: $50M annually ▪Changed text color for fonts in Bing: over $10M annually ▪100msec improvement to Bing server perf: $18M annually ▪Opening mail link in new tab on MSN: 5% increase to clicks/user ▪Credit card offer on Amazon shopping cart: tens of millions of dollars annually ▪Overriding DOM routines to avoid malware in Bing: millions of dollars annually ➢But the reality is that these are rare gems: few among tens of thousands of experiments
  • 30. Lesson #4: Changes Rarely have a Big Positive Impact to Key Metrics ➢As Al Pacino says in the movie Any Given Sunday: winning is done inch by inch ➢Most progress is made by small continuous improvements: 0.1%-1% after a lot of work ➢Bing’s relevance team, several hundred developers running thousands of experiments every year, improve the OEC 2% annually (2% is the sum of OEC improvements in controlled experiments) ➢Bing’s ads team improves revenues about 15-25% per year, but it is extremely rare to see an idea or a new machine learning model that improves revenue by >2%
  • 31. Bing Ads Revenue per Search(*) – Inch by Inch ➢eMarketer estimates Bing revenue grew 55% from 2014 to 2016 ➢About every month a “package” is shipped, the result of many experiments ➢Improvements are typically small (sometimes lower revenue, impacted by space budget) ➢Seasonality (note Dec spikes) and other changes (e.g., algo relevance) have large impact [Chart: revenue per search over time, y-axis from $20 to $100, with these annotations] New models: 0.1% · July lift: 2.6% · Aug lift: 8.6% · Sep lift: 1.4% · Oct lift: 6.1% · Jan lift: 2.5% · Feb lift: 0.9% · Mar lift: 0.2% · Apr lift: 2.5% · May lift: 0.6% · Rollback/cleanup · June lift: 1.05% · New models: 0.2% · Oct lift: 0.15% · Jul lift: 2.2% · Aug lift: 4.2% · Nov lift: 3.6% · Jan lift: -3.1% · Feb lift: -1.2% · Mar lift: 3.6% · June lift: 1.1% · Sep lift: 6.8% · Apr lift: 1% · May lift: 0.5% · Jul lift: -0.6% · Aug lift: 1.4% · Sept lift: 1.6% · Oct lift: 1.9% (*) Numbers have been perturbed for obvious reasons
  • 32. Twyman’s Law ➢In the book Exploring Data: An Introduction to Data Analysis for Social Scientists, the authors wrote that Twyman’s law is “perhaps the most important single law in the whole of data analysis.” ➢If something is “amazing,” find the flaw! Examples ▪If you have a mandatory birth date field and people think it’s unnecessary, you’ll find lots of 11/11/11 or 01/01/01 ▪If you have an optional drop down, do not default to the first alphabetical entry, or you’ll have lots of: jobs = Astronaut ▪Traffic to many US web sites doubled between 1-2AM on Sunday Nov 5th, relative to the same hour a week prior. Why? ➢If you see a massive improvement to your OEC, call Twyman’s law and find the flaw. Triple check things before you celebrate. See http://bit.ly/twymanLaw ➢Any figure that looks interesting or different is usually wrong
  • 33. Lesson #5: Validate the Experimentation System ➢Software that shows p-values with many digits of precision leads users to trust it, but the statistics or implementation behind it could be buggy ➢Getting numbers is easy; getting numbers you can trust is hard ➢Example: Three good books on A/B testing get the stats wrong (see my Amazon reviews) ➢Recommendation: ▪Check for bots, which can cause significant skews. At Bing over 50% of traffic is bot generated! ▪Run A/A tests: if the system is operating correctly, it should find a stat-sig difference only about 5% of the time. Is the p-value distribution uniform (next slide)? ▪Run SRM checks (see slide)
  • 34. Example A/A test ➢P-value distribution for metrics in A/A tests should be uniform ➢Do 1,000 A/A tests, and check if the distribution is uniform ➢When we got this for some Skype metrics, we had to correct things (delta method)
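The A/A check on this slide can be sketched in a few lines. This is a minimal simulation, not the ExP platform's implementation: the synthetic per-user metric (independent standard normal values), the 200 users per arm, the z-test, and every function name are assumptions made for illustration. Metrics that violate the independence assumption (the Skype case that needed the delta method) would not look this clean.

```python
import math
import random


def two_sided_p_value(sample_a, sample_b):
    """Two-sample z-test p-value (normal approximation, unequal variances)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / n_a
    mean_b = sum(sample_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n_b - 1)
    z = (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)
    # Two-sided tail probability of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def run_aa_tests(num_tests=1000, users_per_arm=200, seed=0):
    """Simulate A/A tests: both arms draw from the same distribution."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(num_tests):
        arm_a = [rng.gauss(0.0, 1.0) for _ in range(users_per_arm)]
        arm_b = [rng.gauss(0.0, 1.0) for _ in range(users_per_arm)]
        p_values.append(two_sided_p_value(arm_a, arm_b))
    return p_values


def decile_counts(p_values):
    """Histogram of p-values in ten equal-width bins over [0, 1]."""
    counts = [0] * 10
    for p in p_values:
        counts[min(int(p * 10), 9)] += 1
    return counts


p_values = run_aa_tests()
false_positive_rate = sum(p < 0.05 for p in p_values) / len(p_values)
print(f"stat-sig at 0.05: {false_positive_rate:.1%}")
print("decile counts:", decile_counts(p_values))
```

If the system is healthy, roughly 5% of the runs fall below 0.05 and each decile holds about a tenth of the p-values. A spike near 0 or a pile-up elsewhere is the cue to investigate the variance computation.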
  • 35. Lesson #5 (cont) ➢SRM = Sample Ratio Mismatch ➢For an experiment with equal percentages assigned to Control/Treatment, you should have approximately the same number of users in each ➢Real example: ▪Control: 821,588 users, Treatment: 815,482 users ▪Ratio: 50.2% in Control (should have been 50%) ▪Should I be worried? ➢Absolutely ▪The p-value is 1.8e-6, so the probability of this split (or more extreme) happening by chance is less than 1 in 500,000 (the null hypothesis is true by design)
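The SRM p-value on this slide can be checked with a chi-square goodness-of-fit test on the two user counts. The sketch below is one way to do it using only the standard library; the function name `srm_p_value` and its parameters are hypothetical, and the slide does not say which test the ExP platform actually runs.

```python
import math


def srm_p_value(control_users, treatment_users, expected_control_share=0.5):
    """Chi-square (1 degree of freedom) test that the observed split
    matches the designed split between Control and Treatment."""
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi_square = (
        (control_users - expected_control) ** 2 / expected_control
        + (treatment_users - expected_treatment) ** 2 / expected_treatment
    )
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))


# Counts taken from the slide's real example.
p = srm_p_value(821_588, 815_482)
print(f"SRM p-value: {p:.1e}")
```

With the slide's counts this should come out close to the quoted 1.8e-6, far below any reasonable alert threshold, so the experiment's results should not be trusted until the cause of the mismatch is found.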
  • 36. Summary ➢Think about the OEC. Make sure the org agrees on what to optimize ➢It is hard to assess the value of ideas ▪Listen to your customers – Get the data ▪Prepare to be humbled: data trumps intuition ➢Compute the statistics carefully ▪Getting numbers is easy. Getting a number you can trust is harder ➢Experiment often ▪Triple your experiment rate and you triple your success (and failure) rate. Fail fast & often in order to succeed ▪Accelerate innovation by lowering the cost of experimenting ➢See http://exp-platform.com for papers ➢These slides at http://bit.ly/CH2017Kohavi ➢The less data, the stronger the opinions