A/B test with problematic data
Ben Paul
May 20, 2015
Background
• It has previously been shown that user experience on our site is better if users first answer a few
questions about their preferences.
• We are testing a new landing page to determine if it will cause more users to answer at least one
question about their preferences.
• If the new landing page causes any statistically significant increase in conversion rate (percentage of
users who complete at least one question), then it will be considered a success.
Hypotheses
• The new landing page will cause a statistically significant increase in conversion rate.
Method
• Randomly assign 50% of users to a control group that will be shown the old landing page and the other
50% of users to a treatment group that will be shown the new landing page.
• Track whether each user answers at least one question or not.
• Run a z-test to determine if the treatment group had a greater conversion rate than the control group,
with the conventional cutoff for statistical significance of p < 0.05, two-tailed.
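Although a power analysis was not part of the success criterion, it is worth noting how the required sample size could be estimated up front. The sketch below uses power.prop.test with an assumed baseline conversion rate of about 10% and an illustrative minimum effect of interest of one percentage point; both values are assumptions for illustration, not from the brief.
# sketch: per-group sample size needed to detect a one-point lift
# (the 10% baseline and one-point effect are illustrative assumptions)
power.prop.test(p1 = 0.10, p2 = 0.11, sig.level = 0.05, power = 0.80)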
Analysis
Set up environment
library("plyr")
library("dplyr", warn.conflicts = FALSE) # I m aware of the plyr/dplyr conflicts
library("scales")
knitr::opts_chunk$set(comment = NA) # remove hashes in output
Read data
dat <- read.csv("data/takehome.csv")
Clean data
Handle data types Check that data types are appropriate.
summary(dat)
str(dat)
user_id ts ab
Min. :2.325e+04 Min. :1.357e+09 control : 90815
1st Qu.:2.488e+09 1st Qu.:1.357e+09 treatment:100333
Median :4.997e+09 Median :1.357e+09
Mean :4.998e+09 Mean :1.357e+09
3rd Qu.:7.508e+09 3rd Qu.:1.357e+09
Max. :1.000e+10 Max. :1.357e+09
landing_page converted
new_page:95574 Min. :0.0000
old_page:95574 1st Qu.:0.0000
Median :0.0000
Mean :0.1011
3rd Qu.:0.0000
Max. :1.0000
'data.frame': 191148 obs. of 5 variables:
$ user_id : num 9.64e+09 2.46e+09 9.67e+09 2.25e+09 7.81e+09 ...
$ ts : num 1.36e+09 1.36e+09 1.36e+09 1.36e+09 1.36e+09 ...
$ ab : Factor w/ 2 levels "control","treatment": 2 2 1 2 1 1 1 2 2 1 ...
$ landing_page: Factor w/ 2 levels "new_page","old_page": 1 1 2 1 2 2 2 1 2 2 ...
$ converted : int 0 0 0 0 0 1 1 0 0 0 ...
Data types appear to be appropriate. The independent variables “ab” and “landing_page” each have
two levels, corresponding to the control condition (“control”/“old_page”) and the treatment condition
(“treatment”/“new_page”).
The dependent variable “converted” is an integer with just two possible values representing whether the user
answered at least one question (1) or not (0). Let’s ensure that it has no other values:
unique(dat$converted)
[1] 0 1
The dependent variable has no other values besides 0 and 1, so no cleaning is required.
In summary, there are no problematic data types or values apparent from initial inspection.
Handle duplicates The documentation indicated that each user should be assigned to just one condition,
either the control group (ab = “control”), which was shown the old landing page (landing_page = “old_page”),
or the treatment group (ab = “treatment”), which was shown the new landing page (landing_page =
“new_page”).
Therefore, each user_id should have just one row in the data set, with information about the one condition
they were assigned as well as the one landing page they were shown. If any user has more than one row,
something may have gone wrong and we will need to explore the data to determine how to handle it. Let’s
start by determining if this is an issue.
# find user_ids with multiple rows
dat$multi_obs <- (duplicated(dat$user_id) | duplicated(dat$user_id, fromLast = TRUE))
# print the number of rows with this issue
dat[dat$multi_obs, ] %>% nrow
[1] 9528
# print the percentage of rows that have this issue
percent((dat[dat$multi_obs, ] %>% nrow) / (dat %>% nrow))
[1] "4.98%"
These calculations show that some users do have multiple rows. These multi-observation users account for
9,528 observations, or 5% of all observations. This is concerning.
To understand this issue more fully, the next step will be to visually inspect a sample of multi-observation
users’ data.
# print a sample of multi-observation users' data
dat[dat$multi_obs, ] %>%
  arrange(user_id, ts) %>% # show each user's data chronologically
  head(30) %>%
  mutate(
    # convert timestamps to human-readable form
    ts = ts %>% as.POSIXct(origin = "1970-01-01", tz = "GMT")
  )
user_id ts ab landing_page converted multi_obs
1 203042 2013-01-01 02:56:48 treatment new_page 0 TRUE
2 203042 2013-01-01 02:56:49 treatment old_page 1 TRUE
3 2394489 2013-01-01 11:23:54 treatment new_page 0 TRUE
4 2394489 2013-01-01 11:23:55 treatment old_page 1 TRUE
5 2695427 2013-01-01 18:37:58 treatment new_page 0 TRUE
6 2695427 2013-01-01 18:37:59 treatment old_page 0 TRUE
7 3789396 2013-01-01 01:05:13 treatment new_page 0 TRUE
8 3789396 2013-01-01 01:05:14 treatment old_page 0 TRUE
9 6213582 2013-01-01 12:43:13 treatment new_page 0 TRUE
10 6213582 2013-01-01 12:43:14 treatment old_page 0 TRUE
11 7647078 2013-01-01 20:04:34 treatment new_page 0 TRUE
12 7647078 2013-01-01 20:04:35 treatment old_page 1 TRUE
13 11584819 2013-01-01 12:53:41 treatment new_page 0 TRUE
14 11584819 2013-01-01 12:53:42 treatment old_page 0 TRUE
15 11803291 2013-01-01 21:33:00 treatment new_page 0 TRUE
16 11803291 2013-01-01 21:33:01 treatment old_page 0 TRUE
17 22522327 2013-01-01 12:45:08 treatment new_page 0 TRUE
18 22522327 2013-01-01 12:45:09 treatment old_page 0 TRUE
19 22577434 2013-01-01 06:13:05 treatment new_page 0 TRUE
20 22577434 2013-01-01 06:13:06 treatment old_page 0 TRUE
21 24144768 2013-01-01 21:42:04 treatment new_page 0 TRUE
22 24144768 2013-01-01 21:42:05 treatment old_page 0 TRUE
23 25758261 2013-01-01 14:52:11 treatment new_page 0 TRUE
24 25758261 2013-01-01 14:52:12 treatment old_page 0 TRUE
25 29616796 2013-01-01 02:17:18 treatment new_page 0 TRUE
26 29616796 2013-01-01 02:17:19 treatment old_page 0 TRUE
27 32617932 2013-01-01 21:50:20 treatment new_page 0 TRUE
28 32617932 2013-01-01 21:50:21 treatment old_page 1 TRUE
29 32786569 2013-01-01 07:48:23 treatment new_page 0 TRUE
30 32786569 2013-01-01 07:48:24 treatment old_page 1 TRUE
In this sample of multi-observation users, it appears that such users see the new page first and then land on
the old page one second later. Inspection of all multi-observation user data verified this.
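That full-data inspection can also be expressed as a quick programmatic check. A minimal sketch (assuming, as the sample suggests, that each multi-observation user has exactly two rows):
# verify for all multi-observation users: exactly two rows,
# new_page first, old_page second, about one second apart
dat[dat$multi_obs, ] %>%
  arrange(user_id, ts) %>%
  group_by(user_id) %>%
  summarize(
    two_rows = n() == 2,
    new_then_old = first(landing_page) == "new_page" &
      last(landing_page) == "old_page",
    one_second_apart = diff(range(ts)) <= 1
  ) %>%
  summarize(all_verified = all(two_rows & new_then_old & one_second_apart))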
Inspection of this sample also raised the question of whether multi-observation users are primarily in the
treatment group. Analysis of all multi-observation user data (below) confirmed that 99.9% of multi-observation
users were assigned to the treatment group, and therefore should have been shown only the new page. However,
what actually happened is that multi-observation users saw the new page for one second before ultimately
landing on the old page, which was intended for the control group. This behavior does not match the intended
experimental design.
The sample data also suggest that multi-observation users never convert on the new page, which would
make sense since it was shown for just one second before they landed on the old page. Analysis of all
multi-observation user data (below) confirmed that none of these users converted on the new page.
# calculate percentage of multi-observation users assigned only to the treatment group
multi_summary <- dat[dat$multi_obs, ] %>%
  group_by(user_id) %>%
  # 1 if all of a user's rows are labeled "treatment", else 0
  summarize(all_treatment = as.numeric(all(ab == "treatment")))
percent(sum(multi_summary$all_treatment) / nrow(multi_summary))
[1] "99.9%"
# count the number of times multi-observation users converted on the new page
dat[dat$multi_obs, ] %>%
  filter(landing_page == "new_page", converted == 1) %>%
  nrow
[1] 0
The calculations above demonstrate that, as previously discussed, 99.9% of multi-observation users were in
the treatment group, but none of them converted from the new landing page.
It would be possible to correct such users’ data by changing their label from “treatment” to “control” and
by removing the data from when they loaded the new page for a second. However, their responses may
have been influenced by a glitch in the website, which would not be generalizable to the wider audience for
which these changes are intended. In addition, they were not exposed to the experimental design as intended.
Therefore, their data would be difficult to interpret and should be removed altogether.
Note that the decision to remove their data entirely would be defensible only if multi-observation users
represented a random subset of the population under test. If multi-observation users represent a non-random
subset (e.g., people who use Internet Explorer), it would not be wise to delete their data, as it would limit the
generalizability of the results (e.g., results would then only apply to people who don’t use Internet Explorer).
Therefore, if the glitch affected a non-random subset of users, I would advise running more users through the
study after fixing the glitch.
For the sake of this assignment, I will assume this is due to a random glitch and we can remove their data.
dat <- dat[!dat$multi_obs, ]
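As an optional robustness check (a sketch, not part of the original analysis), one could instead keep these users, relabel them as control since they ultimately saw and responded to the old page, and confirm that the conclusion is unchanged:
# sensitivity check: relabel multi-observation users as control,
# keep only the old_page row they actually responded to, and rerun
dat_alt <- read.csv("data/takehome.csv")
dat_alt$multi_obs <- duplicated(dat_alt$user_id) |
  duplicated(dat_alt$user_id, fromLast = TRUE)
dat_alt <- dat_alt %>%
  filter(!multi_obs | landing_page == "old_page") %>%
  mutate(ab = ifelse(multi_obs, "control", as.character(ab)))
prop.test(table(dat_alt$ab, dat_alt$converted))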
Check for further experimental errors As previously mentioned, users in the control group should
only see the old page, and users in the treatment group should only see the new page.
Therefore, now that users with multiple observations have been removed, we should check whether any remaining
users saw the wrong page for their condition; if so, we will need to decide how to handle them.
# check that treatment and control groups saw their corresponding pages
table(dat$ab, dat$landing_page)
new_page old_page
control 0 90809
treatment 90811 0
The table indicates that we have fully removed the problematic users; each condition is now associated with
the correct landing page.
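If this analysis is rerun on refreshed data, the same check can be made programmatic so that a regression fails loudly rather than relying on visual inspection; a minimal sketch:
# assert that neither condition saw the other condition's page
page_table <- table(dat$ab, dat$landing_page)
stopifnot(
  page_table["control", "new_page"] == 0,
  page_table["treatment", "old_page"] == 0
)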
Analyze data
Now that the data has been cleaned, we can conduct a z-test to determine if there was an effect of experimental
condition on conversion rate.
tbl <- table(dat$ab, dat$converted)
res <- tbl %>% prop.test # aka z-test
names(res$estimate) <- c("control", "treatment") # make results readable
# invert point estimates to show conversion rate rather than non-conversion rate
rates <- (1 - res$estimate)
# confidence interval of the difference in conversion rates
diff.conf.int <- res$conf.int
# to help with interpretation, also calculate the conversion rate confidence
# interval for each group separately
control.conf.int <- prop.test(tbl["control", "1"], sum(tbl["control", ])) %>%
  .$conf.int
treatment.conf.int <- prop.test(tbl["treatment", "1"], sum(tbl["treatment", ])) %>%
  .$conf.int
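As a cross-check on prop.test (which, applied to a 2x2 table, is equivalent to a two-proportion z-test up to a continuity correction), the z-statistic can also be computed by hand. This is a sketch for verification only; it omits the continuity correction, so its p-value will differ slightly from res$p.value unless prop.test is called with correct = FALSE.
# manual two-proportion z-test (no continuity correction)
x <- c(control = tbl["control", "1"], treatment = tbl["treatment", "1"])
n <- c(control = sum(tbl["control", ]), treatment = sum(tbl["treatment", ]))
p_pool <- sum(x) / sum(n)                      # pooled conversion rate
se <- sqrt(p_pool * (1 - p_pool) * sum(1 / n)) # pooled standard error
z <- (x["treatment"] / n["treatment"] - x["control"] / n["control"]) / se
2 * pnorm(-abs(z))                             # two-tailed p-value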
Results
Examine results.
control.conf.int %>% round(3) %>% percent
[1] "9.8%" "10.2%"
treatment.conf.int %>% round(3) %>% percent
[1] "10.5%" "10.9%"
rates %>% round(3) %>% sapply(percent)
control treatment
"10%" "10.7%"
diff.conf.int %>% round(3) %>% percent
[1] "0.3%" "0.9%"
res["p.value"]
$p.value
[1] 1.104298e-05
The conversion rate of the old page is 10.0% (95% confidence interval, 9.8% - 10.2%). The conversion rate of
the new page is 10.7% (95% confidence interval, 10.5% - 10.9%). The new page has a higher conversion rate
than the old page (95% confidence interval of difference, 0.3% - 0.9%), p < 0.001.
If the decision to remove the problematic users was correct, then we can say with 95% confidence that the
new page's conversion rate is 0.3 - 0.9 percentage points higher than the old page's (roughly 3 - 9% higher in
relative terms).
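For reference, that relative range can be computed directly from the quantities already in hand; a one-line sketch:
# difference CI expressed relative to the control conversion rate
percent(round(diff.conf.int / rates["control"], 2))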
Discussion
Given the higher conversion rate of the new landing page, I would recommend we switch all users over to it
and monitor whether the conversion rate increases as expected.
Regarding the discrepancy between our data and the third party’s data, I believe our data is more accurate
because we have cleaned problematic observations from it. There is no reason to believe that the third party
cleaned the data, although I would contact them to confirm this.
I would explain the discrepancy to the project manager by stating that some people were mislabeled as having
seen the new page, when really they saw the old page. Acme’s system isn’t set up to catch these problems,
but as a result of her request we were able to find and delete the bad data, uncovering the significant results
that she suspected were there all along.
To protect future experiments, it would be important to understand why these glitches occurred. Therefore, I
would discuss the issue with developers and quality assurance analysts and try to reproduce the problematic
behavior. If I’m not able to, I would offer an incentive to anyone in the company who could. (This strategy
has been successful for me in my current company: employees will actually race to reproduce an issue to earn
a gold star.) Once the conditions for reproduction are identified, we can determine how to prevent this glitch
in the future.
I would also suggest we set up monitoring in similar experiments to ensure that these problematic conditions
don’t occur again. In particular, (a) each user should have just one observation, and (b) each experimental
condition should be associated with the expected behavior (e.g., the treatment condition should be associated
with only the new page and the control condition with only the old page). A first step would be to set up a
daily email indicating whether (a) and (b) are satisfied. As we grow more confident
in the system, we could have it only email us if (a) and (b) are not satisfied.
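A minimal sketch of such a check, assuming future data keep the same schema as takehome.csv (send_alert is a hypothetical notification helper, not an existing function):
# daily data-quality check for conditions (a) and (b)
check_experiment <- function(dat) {
  one_row_per_user <- !any(duplicated(dat$user_id))                        # condition (a)
  expected <- c(control = "old_page", treatment = "new_page")
  pages_match <- all(dat$landing_page == expected[as.character(dat$ab)])   # condition (b)
  c(one_row_per_user = one_row_per_user, pages_match = pages_match)
}
# if (!all(check_experiment(dat))) send_alert()  # send_alert is hypothetical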
Whenever problems arise, we should analyze what went wrong, explore whether we need to delete or correct
the relevant data, and continue to implement more safeguards to prevent similar problems in the future.