Surveys show that, on average, only 1 out of 7 A/B tests run by e-commerce companies turns out to be successful. Lukasz Twardowski, CEO of UseItBetter, explains how some of the most successful online businesses master this process, turning it into an iterative, evidence-led experimentation-at-scale programme.
9. EXERCISE 1. Provide the benchmark:
The industry average hit rate for A/B testing = 14%.
Just 1 out of 7 A/B tests is successful!
http://conversionxl.com/ab-tests-fail/
10. King Kong (1933, Dir. Merian Cooper, Ernest Schoedsack)
How to be the greatest monkey in the biz if infinity is not an option?
15. The currency in which you pay for A/B tests is traffic. The more you have, the more tests you can run. Never waste what you have.
16. Shop Direct, a 100+ year old company, scaled to 101 experiments a month in two years.
Etsy, a startup launched in 2005, does 25 releases a day, most of them A/B tests.
http://www.slideshare.net/danmckinley/design-for-continuous-experimentation
17. Zero Tests Per Month.
“Here’s the test idea, numbers and execution. Can we proceed?”
“Let’s meet to discuss. Maybe next week?”
“Looks good. Will check with Z and get back to you.”
“So here’s the test idea, numbers…”
“Sorry, had other priorities. Can we meet next week?”
“Sure! (D***!)”
“Have you checked with Z?” “Have you…?” “Have you…?”
18. Ground rules: 1. Test ideas are subject to prioritization, not approval.
19. EXERCISE 3. Magic formula:
evidence × opportunity size × strategy = priority
The worst idea gets tested if resources are available.
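The magic formula can be sketched as a simple scoring function. A minimal sketch, assuming each factor is rated on a hypothetical 1–5 scale; the backlog entries below are made-up illustrations, not examples from the deck:

```python
def priority(evidence: int, opportunity: int, strategy: int) -> int:
    """Score a test idea: evidence x opportunity size x strategy fit.

    Each factor is rated 1 (weak) to 5 (strong); the scale is an
    illustrative assumption.
    """
    return evidence * opportunity * strategy

# Rank a hypothetical backlog. Ideas are subject to prioritization,
# not approval: even the lowest-scored idea gets tested if resources allow.
backlog = {
    "checkout copy tweak": priority(evidence=4, opportunity=5, strategy=5),
    "new footer links":    priority(evidence=1, opportunity=1, strategy=2),
}
ranked = sorted(backlog, key=backlog.get, reverse=True)
```

Multiplying rather than adding the factors makes a missing ingredient (no evidence, tiny opportunity, or no strategic fit) drag the whole score down, which matches the rule that the worst idea waits rather than gets vetoed.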
20. 101 Tests Per Month.
“Ok then, we’ll do this, this and that test. Others will wait.”
“Guys, our strategy shifted to checkout optimization.”
“Guys, we need to increase basket value.”
“Now this and that one… And this…”
“These two would work…”
“Xmas is coming!” “DO NOTHING!”
“…this, this and that…”
26. If 1 out of 7 tests wins, what about the other 6? 5 of them will be inconclusive.
27. Most tests are inconclusive because:
a) too few users were using the changed feature for it to reach statistical significance.
b) the changed feature had little to do with the metrics used to evaluate the test.
c) there were multiple changes in the same test and their effects cancelled each other out.
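Point (a) can be made concrete with a quick calculation: the same observed lift is inconclusive at a small sample and significant at a large one. A sketch using a plain two-proportion z-test; the conversion numbers and sample sizes are hypothetical, not from the deck:

```python
from math import sqrt, erf

def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Phi(z) via the error function; 2 * (1 - Phi(|z|)) is the two-sided p.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Same 10% relative lift (5.0% -> 5.5% conversion), different traffic:
small = z_test_p_value(conv_a=50, n_a=1_000, conv_b=55, n_b=1_000)
large = z_test_p_value(conv_a=5_000, n_a=100_000, conv_b=5_500, n_b=100_000)
# With 1,000 users per arm the test is inconclusive (p well above 0.05);
# with 100,000 per arm the very same lift is clearly significant.
```

This is why traffic is the currency of A/B testing: a change that only a fraction of users ever touch may simply never accumulate enough data to resolve.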
28. EXERCISE 4. Complete the sentence:
A/B testing is NOT about __________. (Answer: making money.)
You do it to find out what works and how well.
30. Cheat: Experiment to test significance.
Test results show that… removing a feature… slowing down the website… didn’t reduce conversion.
31. Cheat: test significance.
Test results show that… we shouldn’t waste time on that.
32. Cheat: One change per test. Order matters.
All at once (select products, produce videos, upload, add links, launch test) → INCONCLUSIVE.
One change at a time: add links → select products → produce videos → …
33. Cheat: Measure against your hypothesis.
Test results show that…
… adding videos had no impact on conversion. → INCONCLUSIVE
… people don’t click “watch video” links. → CONCLUSIVE
41. A/B Testing Flow: Fail Fast Approach.
A/B test is launched. → Test results come back negative. → The idea gets killed, next test is launched.
42. One failed test doesn’t make collecting underpants a bad idea.
43. Example of A/B Testing Flow at Spotify: Prepare for failure.
Pre-test research is done. → Users’ behaviors are logged. → A/B test is launched. → Users are surveyed alongside the test. → Test results come back negative. → Survey responses give a clue why. → Respondents’ logs give another clue. → Respondents are emailed to clarify the issue. → The issue is solved, the test relaunched.
Courtesy of @bendressler, researcher at Spotify.
44. The real price you pay for not researching why tests fail is the death of great ideas.
45. Evidence-Led Flow:
Insight and Evidence (User Testing, Voice of Customer, Qual/Quant Analytics) →
Hypothesis: “I predict that doing B will change X by Y% because of Z.” →
Hypothesis-Based A/B Testing →
Metrics-Based Evaluation: Are metrics good? Accepted / Rejected →
Hypothesis check: What really happened?
56. 1. Never waste your traffic. 2. Many small changes are better than one big change. 3. Even the smallest change needs an insight. 4. Prepare for failure. 5. It’s OK to fail if you know why you failed. 6. Iterate. 7. Be honest.
57. Disclaimer
For the sake of this presentation, I assumed that the results of the 7 tests I referred to had been correctly read by people familiar with terms like statistical significance, confidence intervals, and p-values.
Otherwise, it’s likely that the one winning test was just a phantom.
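The phantom risk the disclaimer warns about can be quantified. Assuming the 7 tests were each read at a conventional 0.05 significance level (an assumption, since the deck doesn’t state the threshold) and none of the ideas had any real effect, chance alone would still hand you at least one “winner” surprisingly often:

```python
# Family-wise false-positive risk across 7 independent tests,
# assuming the conventional alpha = 0.05 per test.
alpha, n_tests = 0.05, 7
p_at_least_one_false_winner = 1 - (1 - alpha) ** n_tests
# Roughly a 30% chance that the single "winning" test out of 7
# is a phantom produced by noise rather than a real effect.
```

This is why correct reading of results matters as much as the hit rate itself: at a 14% hit rate, the expected one winner per seven tests is uncomfortably close to what pure noise would deliver.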
58. THE FINAL EXERCISE. Get in touch:
Łukasz Twardowski
https://linkedin.com/in/twardowski