Surveys show that, on average, only 1 out of 7 A/B tests run by e-commerce companies turns out to be successful. Lukasz Twardowski, CEO of UseItBetter, explains how some of the most successful online businesses master this process, turning it into an iterative, evidence-led experimentation-at-scale programme.
9. EXERCISE 1. Provide the benchmark:
The industry average hit rate for A/B testing = 14%.
Just 1 out of 7 A/B tests is successful!
http://conversionxl.com/ab-tests-fail/
10. King Kong (1933, Dir. Merian Cooper, Ernest Schoedsack)
How to be the greatest monkey in the biz if infinity is not an option?
15. The currency in which you pay for A/B tests is traffic. The more you have, the more tests you can run. Never waste what you have.
16. Shop Direct, a 100+ year old company, scaled to 101 experiments a month in two years.
Etsy, a startup launched in 2005, does 25 releases a day, most of them A/B tests.
http://www.slideshare.net/danmckinley/design-for-continuous-experimentation
17. Zero Tests Per Month.
“Here’s the test idea, numbers and execution. Can we proceed?”
“Let’s meet to discuss. Maybe next week?”
“Looks good. Will check with Z and get back to you.”
“So here’s the test idea, numbers…”
“Sorry, had other priorities. Can we meet next week?”
“Sure! (D***!)”
“Have you checked with Z?” “Have you…?” “Have you…?”
18. Ground rules: 1. Test ideas are subject to prioritization, not approval.
19. EXERCISE 3. Magic formula:
evidence × opportunity size × strategy = priority
The worst idea gets tested if resources are available.
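The magic formula can be sketched as a simple scoring function. A minimal sketch, assuming each factor is rated on a hypothetical 1–5 scale; the backlog entries below are made-up illustrations, not examples from the deck:

```python
def priority(evidence: int, opportunity: int, strategy: int) -> int:
    """Score a test idea: evidence x opportunity size x strategy fit.

    Each factor is rated 1 (weak) to 5 (strong); the scale is an
    illustrative assumption.
    """
    return evidence * opportunity * strategy

# Rank a hypothetical backlog. Ideas are subject to prioritization,
# not approval: even the lowest-scored idea gets tested if resources allow.
backlog = {
    "checkout copy tweak": priority(evidence=4, opportunity=5, strategy=5),
    "new footer links":    priority(evidence=1, opportunity=1, strategy=2),
}
ranked = sorted(backlog, key=backlog.get, reverse=True)
```

Multiplying rather than adding the factors makes a missing ingredient (no evidence, tiny opportunity, or no strategic fit) drag the whole score down, which matches the rule that the worst idea waits rather than gets vetoed.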
20. 101 Tests Per Month.
“Ok then, we’ll do this, this and that test. Others will wait.”
“Guys, our strategy shifted to checkout optimization.”
“Guys, we need to increase basket value.”
“Now this and that one… And this…”
“These two would work…”
“Xmas is coming!” “DO NOTHING!”
“…this, this and that…”
26. If 1 out of 7 tests wins, what about the other 6? 5 of them will be inconclusive.
27. Most tests are inconclusive because:
a) too few users were using the changed feature for it to reach statistical significance.
b) the changed feature had little to do with the metrics used to evaluate the test.
c) there were multiple changes in the same test and their effects cancelled each other out.
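Point (a) can be made concrete with a quick calculation: the same observed lift is inconclusive at a small sample and significant at a large one. A sketch using a plain two-proportion z-test; the conversion numbers and sample sizes are hypothetical, not from the deck:

```python
from math import sqrt, erf

def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Phi(z) via the error function; 2 * (1 - Phi(|z|)) is the two-sided p.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Same 10% relative lift (5.0% -> 5.5% conversion), different traffic:
small = z_test_p_value(conv_a=50, n_a=1_000, conv_b=55, n_b=1_000)
large = z_test_p_value(conv_a=5_000, n_a=100_000, conv_b=5_500, n_b=100_000)
# With 1,000 users per arm the test is inconclusive (p well above 0.05);
# with 100,000 per arm the very same lift is clearly significant.
```

This is why traffic is the currency of A/B testing: a change that only a fraction of users ever touch may simply never accumulate enough data to resolve.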
28. EXERCISE 4. Complete the sentence:
A/B testing is NOT about __________. (Answer: making money.)
You do it to find out what works and how well.
30. Cheat: Experiment to test significance.
Test results show that… removing a feature… slowing down the website… didn’t reduce conversion.
31. Cheat: test significance.
Test results show that… we shouldn’t waste time on that.
32. Cheat: One change per test. Order matters.
All at once (select products, produce videos, upload, add links, launch test) → INCONCLUSIVE.
One change at a time: add links → select products → produce videos → …
33. Cheat: Measure against your hypothesis.
Test results show that…
… adding videos had no impact on conversion. → INCONCLUSIVE
… people don’t click “watch video” links. → CONCLUSIVE
41. A/B Testing Flow: Fail Fast Approach.
A/B test is launched. → Test results come back negative. → The idea gets killed, next test is launched.
42. One failed test doesn’t make collecting underpants a bad idea.
43. Example of A/B Testing Flow at Spotify: Prepare for failure.
Pre-test research is done. → Users’ behaviors are logged. → A/B test is launched. → Users are surveyed alongside the test. → Test results come back negative. → Survey responses give a clue why. → Respondents’ logs give another clue. → Respondents are emailed to clarify the issue. → The issue is solved, the test relaunched.
Courtesy of @bendressler, researcher at Spotify.
44. The real price you pay for not researching why tests fail is the death of great ideas.
45. Evidence-Led Flow:
Insight and Evidence (User Testing, Voice of Customer, Qual/Quant Analytics) →
Hypothesis: “I predict that doing B will change X by Y% because of Z.” →
Hypothesis-Based A/B Testing →
Metrics-Based Evaluation: Are metrics good? Accepted / Rejected →
Hypothesis check: What really happened?
56. 1. Never waste your traffic. 2. Many small changes are better than one big change. 3. Even the smallest change needs an insight. 4. Prepare for failure. 5. It’s OK to fail if you know why you failed. 6. Iterate. 7. Be honest.
57. Disclaimer
For the sake of this presentation, I assumed that the results of the 7 tests I referred to had been correctly read by people familiar with terms like statistical significance, confidence intervals, and p-values.
Otherwise, it’s likely that the one winning test was just a phantom.
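The phantom risk the disclaimer warns about can be quantified. Assuming the 7 tests were each read at a conventional 0.05 significance level (an assumption, since the deck doesn’t state the threshold) and none of the ideas had any real effect, chance alone would still hand you at least one “winner” surprisingly often:

```python
# Family-wise false-positive risk across 7 independent tests,
# assuming the conventional alpha = 0.05 per test.
alpha, n_tests = 0.05, 7
p_at_least_one_false_winner = 1 - (1 - alpha) ** n_tests
# Roughly a 30% chance that the single "winning" test out of 7
# is a phantom produced by noise rather than a real effect.
```

This is why correct reading of results matters as much as the hit rate itself: at a 14% hit rate, the expected one winner per seven tests is uncomfortably close to what pure noise would deliver.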
58. THE FINAL EXERCISE. Get in touch:
Łukasz Twardowski
https://linkedin.com/in/twardowski