LivePerson Developers is proud to host a meetup about A/B testing by Shlomo Lahav, Chief Scientist at LivePerson.
The lecture will focus on testing and the ability to draw conclusions, especially on the web.
- What is an A/B test?
- How to construct an A/B test properly?
- What are the metrics that can be used?
- Can the results be misleading?
- Errors: bias and statistical errors
- First and second type errors
- Measuring lift: why lift is a biased measure
- Is it possible to change the test settings during the test?
- How to run multivariate testing effectively?
4. Possible solutions
• A model that describes the results and evaluates the marginal effect of the alternatives
• Test the alternatives side by side while all the rest is equal
5. Example
• The problem: testing two different layouts of a web page (A and B)
• Population: visitors/visits
• Performance: conversion rate
• Alternatives: two different layouts
• Objective: to find the better layout and assess the performance difference
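The setup on this slide can be sketched as a small simulation. The conversion rates, visit count, and seed below are hypothetical, chosen only to resemble the example, not taken from the talk:

```python
import random

def run_ab_test(p_a, p_b, n_visits, seed=0):
    """Simulate an A/B test: each visit is randomly allocated to
    layout A or B and converts with that layout's true rate."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0}       # visits per alternative
    conversions = {"A": 0, "B": 0}  # conversions per alternative
    for _ in range(n_visits):
        arm = "A" if rng.random() < 0.5 else "B"  # fair 50/50 split
        counts[arm] += 1
        p = p_a if arm == "A" else p_b
        if rng.random() < p:
            conversions[arm] += 1
    # Observed conversion rate per alternative
    return {arm: conversions[arm] / counts[arm] for arm in counts}

rates = run_ab_test(p_a=0.09, p_b=0.12, n_visits=20_000)
```

With enough visits, the observed rates land near the true rates, but never exactly on them; that sampling noise is what the rest of the talk is about.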
6. What does "all the rest being equal" mean?
• Fairness: for every member of the population, the probability of being allocated to A is the same.
• For each member, any other decision is independent of the test allocation (A/B).
• Observations are independent.
7. Population: Visitor vs. visit

Population | Measurement                      | Issues
Visitor    | Visit conversion rate            | Independence between observations is violated
Visitor    | Lifetime conversions per visitor |
Visit      | Visit conversion rate            | A visitor may be exposed to both A and B (in different visits)
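A common way to avoid the last issue, a visitor seeing both A and B across visits, is deterministic hash-based allocation. This is a general technique, not necessarily what the talk proposes; the function name and split parameter are illustrative:

```python
import hashlib

def allocate(visitor_id: str, share_a: float = 0.5) -> str:
    """Deterministically allocate a visitor so that repeat visits
    always see the same layout (no mixing of A and B per visitor)."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2 ** 64
    return "A" if bucket < share_a else "B"

# The same visitor gets the same arm on every visit
assert allocate("visitor-42") == allocate("visitor-42")
```

Because the hash is effectively uniform, fairness is preserved across the population while each individual's allocation stays fixed.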
8. Errors
• When we compare a test alternative to the control alternative:
• False positive – declaring the test alternative the winner by mistake
• False negative – declaring the control the winner by mistake
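These two error types are usually controlled jointly when sizing the test. A standard two-proportion sample-size approximation can make the trade-off concrete; this is textbook material, not from the talk, and the baseline rate and lift below are hypothetical:

```python
from statistics import NormalDist

def required_sample_size(p_base, lift_abs, alpha=0.05, power=0.8):
    """Approximate visits per arm so the false-positive rate is alpha
    and the false-negative rate is 1 - power, when detecting an
    absolute difference lift_abs over baseline rate p_base."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # controls false positives
    z_b = NormalDist().inv_cdf(power)          # controls false negatives
    p_avg = p_base + lift_abs / 2
    var = 2 * p_avg * (1 - p_avg)              # pooled-variance shortcut
    return int((z_a + z_b) ** 2 * var / lift_abs ** 2) + 1

n = required_sample_size(p_base=0.10, lift_abs=0.02)
```

Tightening either error rate (smaller alpha, higher power) pushes the required sample size up; the two errors cannot both be driven to zero with finite traffic.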
9. When do we end the test?
• After a predefined period/number of observations.
• When the difference is significant.
11. Example
• We want to test two alternatives and select the better one.
• The results are: CR(A)=9.21%, CR(B)=11.93%. The win of B is statistically significant (p-value<5%).
• We need to estimate the gain of B vs. A.
• Is our estimate of 2.72% a fair estimate?
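The significance claim can be checked with a standard two-proportion z-test. The slide gives only the rates, so the sample sizes below (10,000 visits per arm) are hypothetical, chosen to reproduce CR(A)=9.21% and CR(B)=11.93% exactly:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test for CR(B) - CR(A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, p_value

# Hypothetical counts matching the slide's rates
diff, p = two_proportion_z_test(conv_a=921, n_a=10_000,
                                conv_b=1193, n_b=10_000)
```

The test answers "is there a difference?"; whether 2.72% is a fair *estimate* of that difference is a separate question, which the next slide takes up.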
13. Selection bias
• An A/B test is conducted between A1, A2, …, An.
• After the test is completed, we select Ak.
• Should we expect Ak to perform as it did during the test?
• Does the test outcome (the rank of k) affect our expectation?
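A small Monte Carlo makes the bias visible. All arms below share the same true rate, yet the arm we select for having the best *observed* rate still looks better than 10%; every number in this sketch is hypothetical:

```python
import random

def winners_observed_rate(n_arms=5, visits=500, reps=500,
                          p_true=0.10, seed=1):
    """All arms have identical true rates; each replication we still
    pick the best observed arm and average its observed rate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        observed = []
        for _ in range(n_arms):
            conversions = sum(rng.random() < p_true for _ in range(visits))
            observed.append(conversions / visits)
        total += max(observed)  # rate of the selected "winner"
    return total / reps

winner_rate = winners_observed_rate()
# winner_rate exceeds the true rate of 0.10: selection bias
```

The winner was partly selected *because* its noise happened to be favorable, so its in-test performance overstates what it will do after the test.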
14. What else can go wrong?
• Independence is not maintained (traffic, changes, etc.)
• Fairness is handled by random allocation, which can be biased due to chance.
• The significance level is usually higher than planned (continuous evaluation), which results in a higher false-positive rate.
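The last point ("peeking") can also be demonstrated by simulation. In an A/A setting with no real difference, checking significance after every batch and stopping at the first hit inflates the false-positive rate well above the nominal 5%; all parameters below are hypothetical:

```python
import random
from math import sqrt

def peeking_false_positive_rate(reps=200, batches=20, batch_size=200,
                                p=0.10, z_crit=1.96, seed=2):
    """A/A test: peek after every batch and stop at the first
    'significant' z score. Returns the observed false-positive rate."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(reps):
        conv_a = conv_b = n = 0
        for _ in range(batches):
            for _ in range(batch_size):
                n += 1
                if rng.random() < p:
                    conv_a += 1
                if rng.random() < p:
                    conv_b += 1
            p_a, p_b = conv_a / n, conv_b / n
            pool = (conv_a + conv_b) / (2 * n)
            se = sqrt(max(pool * (1 - pool) * (2 / n), 1e-12))
            if abs(p_b - p_a) / se > z_crit:
                false_positives += 1  # declared a winner that does not exist
                break
    return false_positives / reps

rate = peeking_false_positive_rate()
```

Each peek is a fresh chance to cross the 1.96 threshold by luck, so the more often we look, the more often a non-existent winner is declared.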
15. How to control the traffic split?
• By percentage or round robin?
• Can we change the split?
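The two splitting strategies on this slide can be sketched side by side; both functions are illustrative, not an implementation from the talk:

```python
import itertools
import random

def percentage_split(visits, share_a=0.5, seed=0):
    """Allocate each visit to A independently with probability share_a."""
    rng = random.Random(seed)
    return ["A" if rng.random() < share_a else "B" for _ in visits]

def round_robin_split(visits):
    """Alternate A, B, A, B, ... across visits (deterministic split)."""
    arms = itertools.cycle(["A", "B"])
    return [next(arms) for _ in visits]
```

Round robin guarantees an exact split but is predictable and can correlate with traffic patterns; a random percentage split only hits the target share in expectation, which is one source of the chance bias mentioned on the previous slide.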
16. Another example
• We need to test two design layouts in multiple locations, while each location has a different conversion rate.
• Different populations – use lifts and accumulate the lifts.
• How do we calculate the lift: A over B or B over A?
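The direction question matters because relative lift is not symmetric: B over A and A over B give different magnitudes. The rates below are hypothetical, chosen only to show the asymmetry:

```python
import math

def lift(p_new, p_base):
    """Relative lift of p_new over p_base."""
    return p_new / p_base - 1.0

cr_a, cr_b = 0.08, 0.10  # hypothetical conversion rates for one location

lift_b_over_a = lift(cr_b, cr_a)  # +25%
lift_a_over_b = lift(cr_a, cr_b)  # -20%: not the mirror of +25%
# Log-lift is symmetric: log(cr_b/cr_a) == -log(cr_a/cr_b)
log_lift = math.log(cr_b / cr_a)
```

One common workaround is accumulating log-lifts instead of raw lifts, since the logarithm treats the two directions symmetrically.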