Estimating Causal Impact of Recommendation Systems Using Natural Experiments

Estimating the causal impact of recommendation systems
AMIT SHARMA, JAKE HOFMAN, DUNCAN WATTS
MICROSOFT RESEARCH, NEW YORK
2nd International Conference on Computational Social Science

How much do they
change user behavior?

Naively, up to 30% of traffic
comes from recommendations
“Burton Snowboard, a sports retailer, reported
that personalized product recommendations
have driven nearly 25% of total sales since it
began offering them in 2008. Prior to this,
Burton’s customer recommendations consisted
of items from its list of top-selling products.”

Almost surely an over-estimate of the
actual effect, because of correlated
demand between a product and its
recommendations.

Example: product browsing on
Amazon.com

Counterfactual browsing: no
recommendations

Problem: Correlated demand may
drive page visits, even without
recommendations

The problem of correlated
demand
[Figure: causal diagram with nodes "Demand for winter accessories", "Visits to winter hat", and "Rec. visits to winter gloves".]

Goal: Estimate the extra activity caused
by a recommender system that would
not have happened otherwise
[Figure: observed click-throughs, split into "Causal" and "Convenience", compared with click-throughs without the recommender ("Convenience" only, size unknown).]

Ideal experiment: A/B Test
Treatment (A) Control (B)
But experiments:
• may be costly
• may hamper user experience
• require full access to the system

Using natural variations to
simulate an experiment

Studying sudden spikes,
“shocks” to demand for a book
[Carmi et al. 2012]

The same author’s recommended
book may also have a shock

Past work: Controlling for
correlated demand
Uses statistical models to control for confounds:
• Carmi et al. [2012], Oestreicher-Singer and Sundararajan [2012], and Lin [2013] construct “complementary sets” of similar, non-recommended products.
• Garfinkel et al. [2006] and Broder et al. [2015] compare to model-predicted clicks without recommendations.
But:
1. These assumptions are hard to verify.
2. Finding examples of valid shocks requires ingenuity and restricts researchers to very specific categories.

Shock-IV: A simpler, more robust
method for estimating causal impact.

Distinguishing between
recommendation and direct traffic
All visits to a product:
• Recommender visits
• Direct visits (proxy for unobserved demand)
◦ Search visits
◦ Direct browsing

The Shock-IV strategy:
Searching for valid shocks

The Shock-IV strategy: Filtering
out invalid shocks

Search for products that receive a
sudden shock in their traffic while direct
traffic to their recommended products
remains constant.

Why does it work? Shock as an
instrumental variable
[Figure: causal diagram with nodes "Sudden shock", "Demand", "Focal visits (X)", "Rec. visits (Y)", and "Direct visits (Y)".]

Computing the causal
estimate
Causal CTR (ρ) = (Increase in recommendation clicks) / (Increase in visits to focal product)*
*Same as the Wald estimator for instrumental variables
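As a rough illustration, the ratio above can be computed from counts taken before and during a shock. This is a minimal sketch: the function name, the argument names, and the choice of the pre-shock period as the baseline are assumptions made for the example, not details given in the slides.

```python
def causal_ctr(rec_clicks_before, rec_clicks_during,
               focal_visits_before, focal_visits_during):
    """Wald-style ratio: change in recommendation clicks divided by
    change in visits to the focal product.

    Assumes the pre-shock period is the baseline and that both periods
    cover the same length of time (hypothetical setup).
    """
    delta_clicks = rec_clicks_during - rec_clicks_before
    delta_visits = focal_visits_during - focal_visits_before
    if delta_visits <= 0:
        raise ValueError("a shock requires an increase in focal visits")
    return delta_clicks / delta_visits


# Made-up counts: focal visits jump from 100 to 1100 and
# recommendation clicks rise from 20 to 50.
rho = causal_ctr(20, 50, 100, 1100)
print(f"causal CTR estimate: {rho:.3f}")
```

The estimate is only meaningful when direct traffic to the recommended product stays stable during the shock, which is the condition the Shock-IV filter checks.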

The shock-IV strategy: In
equations

Application to Amazon.com,
using Bing toolbar logs
Sept 2013-May 2014

Recreating sequence of page
visits by a user

Timestamp | URL | User action
2014-01-20 09:04:10 | http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders | User searches for George Saunders
2014-01-20 09:04:15 | http://www.amazon.com/dp/0812984250/ref=sr_1_1 | User clicks on the first search result
2014-01-20 09:05:01 | http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2 | User clicks on the second recommendation
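The three log entries above suggest how a visit's source might be read from its URL. The sketch below is illustrative only: the meaning of the `sr_` and `pd_sim` prefixes in the `ref=` tag is inferred from these three examples and their annotations, and the function name and returned labels are hypothetical.

```python
import re
from urllib.parse import urlparse, parse_qs

PRODUCT_RE = re.compile(r"/dp/([A-Z0-9]{10})")  # product ID after /dp/
REF_RE = re.compile(r"/ref=([^/?]+)")           # referral tag in the path


def classify_visit(url):
    """Label a logged URL as a search, a click from search results,
    a recommendation click, or another kind of visit.

    Prefix meanings are assumptions drawn from the example log.
    """
    parsed = urlparse(url)
    ref_match = REF_RE.search(parsed.path)
    ref = ref_match.group(1) if ref_match else ""
    product_match = PRODUCT_RE.search(parsed.path)
    product = product_match.group(1) if product_match else None

    if parsed.path.startswith("/s/"):
        query = parse_qs(parsed.query).get("field-keywords", [""])[0]
        return {"type": "search", "product": None, "query": query}
    if product and ref.startswith("sr_"):
        return {"type": "search_click", "product": product}
    if product and ref.startswith("pd_sim"):
        return {"type": "recommendation_click", "product": product}
    if product:
        return {"type": "direct", "product": product}
    return {"type": "other", "product": None}


log = [
    "http://www.amazon.com/s/ref=nb_sb_noss_1?field-keywords=George%20saunders",
    "http://www.amazon.com/dp/0812984250/ref=sr_1_1",
    "http://www.amazon.com/dp/1573225797/ref=pd_sim_b_2",
]
for url in log:
    print(classify_visit(url))
```

Under the traffic split used by Shock-IV, search clicks and direct browsing would both count as direct visits, while recommendation clicks form the recommender traffic.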

I. Traffic shows weekly and seasonal
patterns, nearly tripling during the holidays

II. 30% of all pageviews come
through recommendations

III. Books and eBooks are the
most popular categories by far

IV. Apparel and shoes see a
substantially higher fraction of
visits through recommendations

Shock-IV: Finding shocks in
user visit data
We look for focal products with large and sudden
increases in views relative to typical traffic.
Size of shock exceeds:
◦ 5 times the median traffic
◦ 5 times the previous day's traffic and 5 times the mean of the last 7 days
Shocked product has:
◦ Visits from at least 10 unique users during the shock
◦ Non-zero visits on at least five of the seven days before and after the shock
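The criteria above can be sketched as a filter over a product's daily visit counts. Several details here are assumptions made for the example: the median is taken over the whole series, the "last 7 days" are the seven days immediately before the candidate day, and the activity requirement is applied separately to the seven days before and the seven days after. The function and parameter names are hypothetical.

```python
import statistics


def find_shock_days(visits, unique_users, factor=5, min_users=10,
                    window=7, min_active_days=5):
    """Return indices of days that satisfy the shock criteria.

    visits[t] is the number of visits on day t; unique_users[t] is the
    number of distinct users on day t.
    """
    median_traffic = statistics.median(visits)
    shock_days = []
    # Only days with a full window on both sides are considered.
    for t in range(window, len(visits) - window):
        before = visits[t - window:t]
        after = visits[t + 1:t + 1 + window]
        is_large = (
            visits[t] > factor * median_traffic
            and visits[t] > factor * visits[t - 1]
            and visits[t] > factor * statistics.mean(before)
        )
        enough_users = unique_users[t] >= min_users
        active_before = sum(1 for v in before if v > 0) >= min_active_days
        active_after = sum(1 for v in after if v > 0) >= min_active_days
        if is_large and enough_users and active_before and active_after:
            shock_days.append(t)
    return shock_days


# Made-up series: steady traffic around 10 visits with one spike on day 7.
visits = [10, 12, 9, 11, 10, 8, 12, 100, 15, 12, 10, 9, 11, 10, 12]
users = [8, 9, 7, 9, 8, 6, 9, 40, 11, 9, 8, 7, 9, 8, 9]
print(find_shock_days(visits, users))
```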

Shock-IV: Ensuring exclusion
restriction
Recommended product (Y) should have constant
direct visits during the time of the shock.
(1-β): Ratio of the maximum 14-day variation in direct visits to a
recommended product to the size of the shock for the focal
product.
◦ β = 1: Direct traffic to Y is stable relative to the shock to the focal product.
◦ β = 0: Direct traffic to Y varies no less than the shock to the focal product.
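One way to turn this definition into a check is sketched below. The slides do not say how "variation" or "size of the shock" are measured, so the example assumes variation is the maximum minus the minimum of direct visits to Y over a 14-day window, that the shock size is supplied by the caller, and that β is clipped at 0. All names are hypothetical.

```python
def stability_beta(direct_visits_y, shock_size):
    """Compute beta = 1 - (variation in direct visits to Y) / (shock size).

    direct_visits_y: daily direct visits to the recommended product over
    the 14-day window around the shock (assumed window placement).
    shock_size: size of the shock to the focal product, in visits.
    """
    if shock_size <= 0:
        raise ValueError("shock_size must be positive")
    variation = max(direct_visits_y) - min(direct_visits_y)
    return max(0.0, 1.0 - variation / shock_size)


def passes_exclusion_check(direct_visits_y, shock_size, threshold=0.7):
    """Accept the shock only if direct traffic to Y is stable enough."""
    return stability_beta(direct_visits_y, shock_size) >= threshold


# Made-up data: direct visits to Y hover between 5 and 7 per day
# while the focal product receives a shock of 100 visits.
direct_y = [5, 6, 5, 7, 6, 5, 6, 5, 6, 7, 5, 6, 5, 6]
print(stability_beta(direct_y, 100))
print(passes_exclusion_check(direct_y, 100))
```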

How to choose β?
[Figure: regions labeled "Accept" and "Reject".]
Select β = 0.7

Using this method, we obtain
more than 4,000 natural experiments!

Estimating the causal
clickthrough rate (𝜌)

Causal click-through rate by
product category

Estimating fraction of observed
click-throughs that are causal
Compare the number of estimated causal clicks to
all observed recommendation clicks (non-shock
period).
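The comparison above amounts to a simple ratio. In this minimal sketch, the number of causal clicks is estimated by applying a causal CTR to the focal product's visits in a non-shock period; that way of estimating causal clicks, and all names and numbers, are assumptions made for illustration.

```python
def causal_fraction(causal_ctr, focal_visits, observed_rec_clicks):
    """Fraction of observed recommendation clicks estimated to be causal.

    causal_ctr: causal click-through rate estimated from shocks.
    focal_visits: visits to the focal product in the non-shock period.
    observed_rec_clicks: recommendation clicks observed in that period.
    """
    if observed_rec_clicks <= 0:
        raise ValueError("observed_rec_clicks must be positive")
    estimated_causal_clicks = causal_ctr * focal_visits
    return estimated_causal_clicks / observed_rec_clicks


# Made-up numbers: a causal CTR of 0.03 over 1000 visits implies about
# 30 causal clicks, compared with 120 observed recommendation clicks.
print(causal_fraction(0.03, 1000, 120))
```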

Only a quarter of the observed
click-throughs are causal
At β = 0.7, only 25% of
recommendation traffic is
caused by the recommender.

Generalization?
Shocks may be due to
discounts or sales
Lower CTR may be due to
the holiday season

Local average treatment effect
(LATE), not fully generalizable
Shocked products are not a representative sample
of all products, nor are the users who participate in
them.
• Shock-IV method covers roughly one-fifth of all
products with at least 10 visits on any single day.
• Our results are robust to sale or holiday effects.
• Causal estimates are consistent with
experimental findings (e.g., Belluf et al. [2012])

More generally…
A robust, scalable method for causal inference.
◦ Causal CTR for Amazon’s recommender system much less
than the naïve observational CTR.
◦ Can be applied to other domains, such as online ads.
Data mining for instruments
I. Allows us to study a much larger sample of
natural experiments, while being able to test for
exclusion restriction directly.
II. Can be used for finding potential instruments.

Thank you!
AMIT SHARMA
MICROSOFT RESEARCH
http://www.amitsharma.in
Sharma, A., Hofman, J. M., & Watts, D. J. (2015). Estimating the causal impact of
recommendation systems from observational data. In Proceedings of the Sixteenth ACM
Conference on Economics and Computation.
Shock-IV: A robust, scalable method for estimating
causal impact from observational data, with testable
assumptions.
Naïve observational estimates of CTR for
recommendation systems may be big overestimates.
