We have been trained and encouraged to focus on p-values and statistical significance in every aspect of testing, from PPC to CRO. In this talk, Richard is going to challenge your preconceptions, show how scientific accuracy isn't necessarily the same as commercial success, and demonstrate strategies that are better than waiting for your variation to be declared a winner by your testing platforms. The way you approach data-driven decision-making will never be the same.
22. The game:
1. Players can add new adverts, pause adverts and reactivate adverts each turn
2. In between turns, active adverts get impressions and clicks
3. http://www.eanalytica.com/ad-testing/
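The rules above can be sketched as a tiny simulation. This is only my reading of steps 1–2, not the talk's actual game code (see the notebook link): the `Advert` class, the even impression split, and the example CTRs are all assumptions.

```python
import random

class Advert:
    """One advert: the player sees only observed impressions and clicks,
    never the hidden true CTR."""
    def __init__(self, true_ctr):
        self.true_ctr = true_ctr   # hidden from the player
        self.impressions = 0
        self.clicks = 0
        self.active = True

def run_turn(adverts, impressions_per_turn=1000, rng=random):
    """Step 2 of the game: split the turn's impressions evenly between the
    active adverts and draw clicks from each advert's hidden CTR."""
    active = [ad for ad in adverts if ad.active]
    if not active:
        return
    share = impressions_per_turn // len(active)
    for ad in active:
        ad.impressions += share
        # each impression is an independent Bernoulli trial
        ad.clicks += sum(rng.random() < ad.true_ctr for _ in range(share))

ads = [Advert(0.02), Advert(0.05)]
run_turn(ads)
```

A strategy then plays by pausing/adding adverts between calls to `run_turn`.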
23. Things that might be important that are not modelled:
1. That you might be an ad writing or landing page genius
2. Getting it wrong might lead you down a blind alley
24. This is useful because it is way simpler than real life
But maybe it is too simple and misses something important?
25. Different strategies:
1. Do nothing
2. Pause a random advert and add a new advert
3. Cheat
4. Run a chi-squared test
5. Run a g-test
6. Just pick the advert with the best observed CTR
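Strategies 4 and 5 only need the observed click/no-click counts for two adverts. A stdlib-only sketch of both tests on made-up counts (in practice you would likely reach for `scipy.stats.chi2_contingency`, which also computes the g-test via `lambda_="log-likelihood"`):

```python
import math

def two_ctr_tests(clicks_a, imps_a, clicks_b, imps_b):
    """Pearson chi-squared test and g-test (both df=1, no continuity
    correction) comparing the CTRs of two adverts.
    Returns (chi2_p_value, g_p_value)."""
    table = [[clicks_a, imps_a - clicks_a],
             [clicks_b, imps_b - clicks_b]]
    total = imps_a + imps_b
    chi2 = g = 0.0
    for j in range(2):
        col_total = table[0][j] + table[1][j]
        for row in table:
            expected = sum(row) * col_total / total
            chi2 += (row[j] - expected) ** 2 / expected
            g += 2 * row[j] * math.log(row[j] / expected)
    # with one degree of freedom, the chi-squared survival function
    # reduces to the complementary error function
    sf = lambda stat: math.erfc(math.sqrt(stat / 2))
    return sf(chi2), sf(g)

# hypothetical counts: 30 vs 50 clicks from 1000 impressions each
p_chi2, p_g = two_ctr_tests(30, 1000, 50, 1000)
```

Both p-values land around 0.02 here, so the two tests usually agree; the interesting question in the game is whether acting on them beats simpler strategies.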
31. I was really surprised by this and had to go back and double-check my code
32. A very quick introduction to multi-armed bandits
33. Explore vs Exploit
You can exploit what appears to be the best option at the time
Or you can explore to see if something else is a better option
34. There are lots of clever ways of optimising this balance.
A simple way: explore X% of the time, and exploit the best option the remaining (100-X)% of the time.
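The X% / (100-X)% rule described here is usually called epsilon-greedy. A minimal sketch, assuming each advert is a plain dict of observed counts (the field names are my own, not from the talk's code):

```python
import random

def epsilon_greedy(adverts, epsilon=0.1, rng=random):
    """Return the advert that should get the next impression.
    With probability epsilon, explore (pick uniformly at random);
    otherwise, exploit the advert with the best observed CTR."""
    if rng.random() < epsilon:
        return rng.choice(adverts)
    def observed_ctr(ad):
        # unseen adverts get +inf so each gets tried at least once
        return ad["clicks"] / ad["impressions"] if ad["impressions"] else float("inf")
    return max(adverts, key=observed_ctr)

# two adverts with made-up observed stats
ads = [{"impressions": 100, "clicks": 2},
       {"impressions": 100, "clicks": 5}]
best = epsilon_greedy(ads, epsilon=0.0)   # epsilon=0: pure exploitation
```

With epsilon=0 this degenerates into the "pick the best observed CTR" strategy; raising epsilon keeps some traffic flowing to the other variations.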
36. And we can use this to crank up the number of variations in the test!
38. All of this was done with 1000 impressions per week, i.e. 1000 impressions split between the variations before running any test.
This isn’t a huge number, but budgets this small (or smaller) are common.
41. All this is based on the idea of continuous testing, where creating new variations is cheap.
This is mostly true for PPC text ads.
If it is true for landing pages and conversion rate optimisation, then your organisation is doing very well!
42. Suppose you have a fixed window for testing, after which the winning variation will be around for a long time
43. Then the most important thing is to end up with the best-performing variation at the end.
This is not the same as getting the best performance during the test.
45. Just picking the best-performing variation is again a strong strategy
46. But if we imagine that creating new variations is costly, then maybe it isn’t so good.
47. For a 10-week test, the “Pick Best” strategy requires 11 different test variations.
Other methods get nearly the same end result with 8 or fewer test variations.
48. New game:
There is no limit to how long tests can run
There is a limit to how many new variations can be used
65. Areas for further investigation:
1. Changing click through rates
2. Weekly seasonality
3. Cost of creating new variations
66. Key Actions
Don’t worry too much about statistical significance testing
Do worry about how you can generate and deploy more test variations
Think more about decision theory than trying to mimic what a scientist does
67. Game code and example strategies:
http://www.eanalytica.com/SearchLove-Notebook/