This document discusses the need for causal inference techniques when evaluating the impact of computing systems on people's lives. It provides several examples where naively using predictive metrics or overall aggregated statistics can obscure important effects. Causal inference methods that consider context and attempt to isolate causal effects, such as through conditioning on relevant variables, are needed to properly understand how systems influence users and society.
1. The Impact of Computing Systems: Causal inference in practice
Amit Sharma
Microsoft Research
www.amitsharma.in
Twitter: @amt_shrma
Email: amshar@microsoft.com
Summer School on Human-Centered AI
http://www.hcixb.org/
2. I. How little we know about the systems we build
II. How can causal inference help?
6. What is the impact of these systems on our lives?
Efficiency, Convenience, Inclusion, Fairness, Accountability, Transparency
7. What will be the impact of computing systems on their lives?
9. (New?) social science of a world mediated by computing systems
Programming
Data science
Machine learning
Sensors and Systems
Sociology
Psychology
Ethics
Political Science
Economics
Development Studies
10. Many different communities
• Human Computer Interaction (HCI)
• Human Factors in Computing Systems (CHI)
• Computer Supported Cooperative Work (CSCW)
• Science and Technology Studies (STS)
• Computational Social Science (CSS)
• Information & Communication Technology and Development (ICTD)
• Computing and Sustainable Societies (COMPASS)
12. My path
“Intelligent systems that help people”
• Recommendation systems
• Social networking platforms
Prediction: Can we predict what you’ll be interested in?
• “How much do recommender systems shape people’s decisions?”
• “How much does a social NewsFeed influence people’s information access?”
• “How do the recommender systems affect sellers on a platform?”
• “How do you know that recommendations are having a positive impact?”
Causation: Can we estimate the effect of our recommendations?
13. I. How little we know about the systems we build
II. How can causal inference help?
14. 1. What’s the right decision?
Use the social feed to predict a user's future activity (e.g., Likes).
• Future Likes = f(items in social feed) + ε
Highly predictive model.
“Would changing what a person sees in their feed change what they Like?”
a) Yes
b) No
c) Maybe, maybe not
15. Prediction != Decision-making
Would changing what people see in the feed affect what a user likes?
Maybe, maybe not (!)
[Diagram: two explanations for why items in the social feed predict items liked by a user. Predictability due to feed influence: Items in Social Feed → Items liked by a user. Predictability due to homophily: Homophily → both Items in Social Feed and Items liked by a user.]
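To make the homophily explanation concrete, here is a minimal simulation under made-up parameters (the probabilities 0.9/0.1 and 0.8/0.2 and the names `simulate` and `like_rate_gap` are assumptions for illustration). A hidden taste drives both what friends put in the feed and what the user Likes, and the feed itself has no causal effect. Observationally, the feed is strongly predictive; when the feed item is assigned by a coin flip (an intervention), the gap vanishes.

```python
import random


def simulate(n=20000, seed=0):
    """Return (observational, randomized) lists of (item_shown, user_liked) pairs."""
    rng = random.Random(seed)
    observational, randomized = [], []
    for _ in range(n):
        taste = rng.random() < 0.5  # hidden preference shared with friends (homophily)
        # The user's Like depends only on taste; the feed has no causal effect.
        p_like = 0.8 if taste else 0.2
        # Observational world: friends share the taste, so the feed mirrors it.
        shown = rng.random() < (0.9 if taste else 0.1)
        observational.append((shown, rng.random() < p_like))
        # Randomized world: the feed item is assigned by a coin flip instead.
        shown_rand = rng.random() < 0.5
        randomized.append((shown_rand, rng.random() < p_like))
    return observational, randomized


def like_rate_gap(pairs):
    """P(liked | item shown) - P(liked | item not shown)."""
    liked_if_shown = [liked for shown, liked in pairs if shown]
    liked_if_hidden = [liked for shown, liked in pairs if not shown]
    return (sum(liked_if_shown) / len(liked_if_shown)
            - sum(liked_if_hidden) / len(liked_if_hidden))


observational, randomized = simulate()
print(f"observational gap: {like_rate_gap(observational):.2f}")
print(f"randomized gap:    {like_rate_gap(randomized):.2f}")
```

In expectation the observational gap is 0.74 − 0.26 = 0.48 and the randomized gap is 0, so the printed values land near those numbers: a highly predictive model, yet changing the feed would change nothing.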
17. Comparing old versus new algorithm
Old Algorithm (A): 50/1000 (5%)
New Algorithm (B): 54/1000 (5.4%)
18. Change in Success Rate by activity level
Low-activity users: Old Algorithm (A) 10/400 (2.5%), New Algorithm (B) 4/200 (2%)
High-activity users: Old Algorithm (A) 40/600 (6.7%), New Algorithm (B) 50/800 (6.25%)
[Chart: success rate (SR) for each algorithm and activity level]
19. Is Algorithm A better?
Which algorithm will you choose?
                               Old Algorithm (A)   New Algorithm (B)
CTR for low-activity users     10/400 (2.5%)       4/200 (2%)
CTR for high-activity users    40/600 (6.7%)       50/800 (6.25%)
Total CTR                      50/1000 (5%)        54/1000 (5.4%)
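The reversal in this table can be checked with exact arithmetic. The sketch below uses the slide's counts (the dictionary layout and function names are illustrative choices) and shows that A has the higher CTR within each activity level while B has the higher CTR overall. The reason is the mix: 800 of B's 1000 impressions went to high-activity users, who click more, versus 600 of A's.

```python
from fractions import Fraction

# (clicks, impressions) from the slide, keyed by (algorithm, activity level).
COUNTS = {
    ("A", "low"): (10, 400),
    ("A", "high"): (40, 600),
    ("B", "low"): (4, 200),
    ("B", "high"): (50, 800),
}


def stratum_ctr(alg, level):
    clicks, shown = COUNTS[(alg, level)]
    return Fraction(clicks, shown)


def total_ctr(alg):
    """Pool clicks and impressions across activity levels."""
    clicks = sum(c for (a, _level), (c, _n) in COUNTS.items() if a == alg)
    shown = sum(n for (a, _level), (_c, n) in COUNTS.items() if a == alg)
    return Fraction(clicks, shown)


def better(ctr_a, ctr_b):
    return "A" if ctr_a > ctr_b else "B"


for level in ("low", "high"):
    print(f"{level}-activity users: {better(stratum_ctr('A', level), stratum_ctr('B', level))} wins")
print(f"all users: {better(total_ctr('A'), total_ctr('B'))} wins")
# low-activity users: A wins
# high-activity users: A wins
# all users: B wins
```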
20. Is Algorithm A still better?
Simpson’s paradox
                                       Old Algorithm (A)   New Algorithm (B)
CTR, low-activity, low-income users    1/200 (0.5%)        4/100 (4%)
CTR, low-activity, high-income users   9/200 (4.5%)        0/100 (0%)
CTR, high-activity, low-income users   10/500 (2%)         45/600 (7.5%)
CTR, high-activity, high-income users  30/100 (30%)        5/200 (2.5%)
Total CTR                              50/1000 (5%)        54/1000 (5.4%)
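Splitting by income flips the answer again. The sketch below (names are illustrative; counts are the slide's) first checks that the finer cells re-aggregate to the coarser tables, then compares A and B within each cell: B wins for low-income users and A wins for high-income users at both activity levels. Which algorithm "wins" depends on which variables you condition on.

```python
from fractions import Fraction

# (clicks, impressions) per (algorithm, activity level, income), from the slide.
CELLS = {
    ("A", "low", "low-income"): (1, 200),
    ("A", "low", "high-income"): (9, 200),
    ("A", "high", "low-income"): (10, 500),
    ("A", "high", "high-income"): (30, 100),
    ("B", "low", "low-income"): (4, 100),
    ("B", "low", "high-income"): (0, 100),
    ("B", "high", "low-income"): (45, 600),
    ("B", "high", "high-income"): (5, 200),
}


def cell_ctr(alg, activity, income):
    clicks, shown = CELLS[(alg, activity, income)]
    return Fraction(clicks, shown)


def pooled(alg, activity=None):
    """Re-aggregate cells (optionally within one activity level) into (clicks, impressions)."""
    rows = [v for k, v in CELLS.items()
            if k[0] == alg and (activity is None or k[1] == activity)]
    return sum(c for c, _ in rows), sum(n for _, n in rows)


for activity in ("low", "high"):
    for income in ("low-income", "high-income"):
        a, b = cell_ctr("A", activity, income), cell_ctr("B", activity, income)
        print(f"{activity}-activity, {income}: {'A' if a > b else 'B'} wins")
# low-activity, low-income: B wins
# low-activity, high-income: A wins
# high-activity, low-income: B wins
# high-activity, high-income: A wins
```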
21. E.g., Algorithm A could have been shown at different times than B.
There could be other hidden variables that causally affect the outcome.
Answer (as usual): Maybe, maybe not.
22. Example: Simpson’s paradox in Reddit
Average comment length decreases over time.
But for each yearly cohort of users, comment length increases over time.
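One way this pattern can arise is a shifting cohort mix: newer cohorts start with shorter comments and make up a growing share of active users. The sketch below uses entirely hypothetical cohort sizes, starting lengths (in arbitrary length units) and growth rate, not Reddit data, to show every cohort's comments getting longer while the user-weighted overall average falls.

```python
# Hypothetical cohorts: join year -> (comment length in the join year, active users).
COHORTS = {2010: (100, 100), 2011: (80, 300), 2012: (60, 900), 2013: (40, 2700)}
GROWTH_PER_YEAR = 5  # hypothetical: each cohort's comments get 5 units longer per year


def cohort_length(join_year, year):
    first_length, _users = COHORTS[join_year]
    return first_length + GROWTH_PER_YEAR * (year - join_year)


def overall_average(year):
    """User-weighted average comment length across all cohorts active in `year`."""
    active = [(j, users) for j, (_length, users) in COHORTS.items() if j <= year]
    total_length = sum(cohort_length(j, year) * users for j, users in active)
    return total_length / sum(users for _j, users in active)


print([round(overall_average(y), 2) for y in range(2010, 2014)])
# [100.0, 86.25, 69.62, 51.25]
print([cohort_length(2010, y) for y in range(2010, 2014)])
# [100, 105, 110, 115]
```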
26. From interventions to algorithmic interventions
What is the effect of a taxi-app’s matching algorithm on people’s incomes?
What is the effect of algorithmic screening on a patient’s health?
What is the influence of an online social feed on a person’s behavior?
27. A practical definition
Definition: X causes Y iff changing X leads to a change in Y, keeping everything else constant.
The causal effect is the magnitude by which Y is changed by a unit change in X.
This is called the “interventionist” interpretation of causality.
http://plato.stanford.edu/entries/causation-mani/
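The definition can be written directly as code for a toy structural model. In this sketch (the model Z → X, Z → Y, X → Y and its coefficients are assumptions chosen for illustration), the causal effect is computed by raising X by one unit while every background variable is held fixed, which recovers the true coefficient of 2. A regression slope on passively observed data instead mixes in the confounder Z and comes out near 3.5 (analytically 2 + 3 · Cov(X, Z)/Var(X) = 2 + 3 · 1/2).

```python
import random

TRUE_EFFECT = 2.0  # hypothetical: Y changes by 2 per unit change in X


def draw_background(rng):
    """Everything 'else' in the definition: a confounder Z and noise terms."""
    return {"z": rng.gauss(0, 1), "u_x": rng.gauss(0, 1), "u_y": rng.gauss(0, 1)}


def x_of(bg):
    return bg["z"] + bg["u_x"]  # Z -> X


def y_of(x, bg):
    return TRUE_EFFECT * x + 3.0 * bg["z"] + bg["u_y"]  # X -> Y and Z -> Y


def interventional_effect(n=10000, seed=1):
    """Average change in Y when X is raised by one unit, background held fixed."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        bg = draw_background(rng)
        x = x_of(bg)
        total += y_of(x + 1, bg) - y_of(x, bg)
    return total / n


def observational_slope(n=10000, seed=1):
    """Least-squares slope of Y on X from passively observed data."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        bg = draw_background(rng)
        xs.append(x_of(bg))
        ys.append(y_of(xs[-1], bg))
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var = sum((a - mean_x) ** 2 for a in xs)
    return cov / var


print(round(interventional_effect(), 6))  # 2.0
print(round(observational_slope(), 2))
```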
29. Powerful statistical frameworks
For more details, check out a KDD tutorial on causal inference by Emre Kiciman and me:
https://causalinference.gitlab.io/kdd-tutorial/
37. So how about comparing with a similar user instead of a random one?
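A minimal sketch of "compare with a similar user" is nearest-neighbour matching on an observed confounder. The data below are simulated under assumptions (activity level in [0, 1) drives both exposure to Algorithm B and clicking, and B's true effect is a hypothetical +0.05 in click probability). The naive treated-vs-untreated difference is inflated to roughly 0.11 because B's users are more active; matching each B user to the untreated user with the closest activity level brings the estimate back near 0.05. This only removes bias from the variable matched on.

```python
import bisect
import random


def make_users(n=40000, seed=2):
    """Hypothetical users: activity drives both exposure to Algorithm B and clicking."""
    rng = random.Random(seed)
    users = []
    for _ in range(n):
        activity = rng.random()                      # confounder in [0, 1)
        saw_b = rng.random() < 0.2 + 0.6 * activity  # active users see B more often
        p_click = 0.05 + 0.3 * activity + (0.05 if saw_b else 0.0)  # assumed effect: +0.05
        users.append((activity, saw_b, int(rng.random() < p_click)))
    return users


def naive_difference(users):
    treated = [c for _a, t, c in users if t]
    control = [c for _a, t, c in users if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)


def matched_difference(users):
    """Effect on B's users: compare each with the control user of nearest activity."""
    controls = sorted((a, c) for a, t, c in users if not t)
    keys = [a for a, _c in controls]
    diffs = []
    for a, t, c in users:
        if not t:
            continue
        i = bisect.bisect_left(keys, a)
        # The closest control is just before or at the insertion point.
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(keys)]
        best = min(neighbours, key=lambda j: abs(keys[j] - a))
        diffs.append(c - controls[best][1])
    return sum(diffs) / len(diffs)


users = make_users()
print(f"naive: {naive_difference(users):.3f}, matched: {matched_difference(users):.3f}")
```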
38. Continuing example: Effect of Algorithm on CTR
Does new Algorithm B increase CTR for recommendations on Windows Store, compared to old Algorithm A?
39. Previous example: Effect of Algorithm on CTR
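If activity level were the only variable affecting both which algorithm a user saw and whether they clicked (an assumption; the income breakdown in the Simpson’s paradox table shows it is not), the backdoor adjustment formula P(click | do(alg)) = Σ_level P(level) · P(click | alg, level) would estimate each algorithm's CTR on a common population. The sketch applies it to the slide's counts, estimating P(level) from all 2000 impressions (an illustrative choice of reference population). The result is 13/240 ≈ 5.42% for A versus 199/4000 ≈ 4.98% for B, so A comes out ahead once activity is adjusted for.

```python
from fractions import Fraction

# (clicks, impressions) from the slide, keyed by (algorithm, activity level).
COUNTS = {
    ("A", "low"): (10, 400),
    ("A", "high"): (40, 600),
    ("B", "low"): (4, 200),
    ("B", "high"): (50, 800),
}
LEVELS = ("low", "high")


def level_weights():
    """P(activity level), estimated from impressions pooled over both algorithms."""
    totals = {
        level: sum(n for (_alg, lvl), (_c, n) in COUNTS.items() if lvl == level)
        for level in LEVELS
    }
    grand_total = sum(totals.values())
    return {level: Fraction(totals[level], grand_total) for level in LEVELS}


def adjusted_ctr(alg):
    """Backdoor adjustment: sum over levels of P(level) * CTR(alg, level)."""
    weights = level_weights()
    return sum(weights[level] * Fraction(*COUNTS[(alg, level)]) for level in LEVELS)


for alg in ("A", "B"):
    print(alg, adjusted_ctr(alg))
# A 13/240
# B 199/4000
```

Adjusting for income as well would change the answer again, so the estimate is only as good as the assumed causal graph.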
45. I. How little we know about the systems we build
II. How can causal inference help?
46. Example 1: Causal effect of a social news feed
Amit Sharma, Dan Cosley (2016). Distinguishing Between Personal Preferences and Social
Influence in Online Activity Feeds (Honorable Mention for Best Paper Award). Proceedings of
the 19th ACM Conference on Computer-Supported Cooperative Work & Social Computing.
47. Example 1: Causal effect of a social news feed
[Figure: Ego Network (user u with friends f1 to f5) alongside Non-Friends (user u with non-friends n1 to n5)]
48. Example 2: Is a search engine fair to all its users?
Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, Emine Yilmaz (2017).
Auditing Search Engines for Differential Satisfaction Across Demographics. Proceedings of the 26th International
Conference on World Wide Web (Industry Track).
49. Tricky: straightforward optimization can lead to differential performance
• The search engine uses a standard metric: time spent on the clicked result page as an indicator of satisfaction.
• Goal: estimate the difference in user satisfaction between two demographic groups.
• Suppose older users issue more “retirement planning” queries.
[Figure: two age groups, Age: >50 years and Age: <30 years, labelled 80% users and 10% users …]
50. Overall metrics can hide differential satisfaction
• Average user satisfaction for “retirement planning” may be high.
But,
• Average satisfaction for younger users=0.7
• Average satisfaction for older users=0.2
52. Pitfalls with Overall Metrics
• Overall metrics conflate two separate effects:
  • Natural demographic variation caused by differing traits among demographic groups, e.g.:
    • Different queries issued
    • Different information needs for the same query
    • Even for the same satisfaction, demographic A tends to click more than demographic B
  • Systemic difference in user satisfaction due to the search engine
53. Utilize work from causal inference
[Causal diagram with nodes: Demographics, Information Need, Query, Search Results, User satisfaction, Metric]
54. I. Context Matching: selecting for activity with near-identical context
[Causal diagram with nodes: Demographics, Information Need, Query, Search Results, User satisfaction, Metric, and Context]
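A simplified sketch of the context-matching idea: compare the two age groups only within the same query, then average those within-query gaps, so that differences in which queries each group issues do not masquerade as differences in satisfaction. The log records are hypothetical, and matching on the exact query string alone is an assumption for illustration, not the procedure from the cited paper. Here the naive gap is 0.275, but the context-matched gap is only 0.05.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log records: (query, age group, satisfaction metric in [0, 1]).
LOGS = [
    ("retirement planning", "older", 0.2),
    ("retirement planning", "older", 0.3),
    ("retirement planning", "older", 0.4),
    ("retirement planning", "younger", 0.4),
    ("movie times", "older", 0.8),
    ("movie times", "younger", 0.7),
    ("movie times", "younger", 0.8),
    ("movie times", "younger", 0.9),
]


def naive_gap(logs):
    """Younger minus older average satisfaction, ignoring which queries were issued."""
    by_group = defaultdict(list)
    for _query, group, score in logs:
        by_group[group].append(score)
    return mean(by_group["younger"]) - mean(by_group["older"])


def context_matched_gap(logs):
    """Average within-query gap, over queries issued by both groups."""
    cells = defaultdict(list)
    for query, group, score in logs:
        cells[(query, group)].append(score)
    gaps = []
    for query in sorted({q for q, _g in cells}):
        if (query, "younger") in cells and (query, "older") in cells:
            gaps.append(mean(cells[(query, "younger")]) - mean(cells[(query, "older")]))
    return mean(gaps)


print(round(naive_gap(LOGS), 3), round(context_matched_gap(LOGS), 3))
# 0.275 0.05
```

This sketch does not address the other pitfall listed above, that groups may click differently at the same satisfaction level; that needs a model of the metric itself.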
58. Confounding: Observed click-throughs may be due to correlated demand
[Diagram nodes: Demand for The Road, Visits to The Road, Rec. visits to No Country for Old Men, Demand for No Country for Old Men]
59. Observational click-through rate overestimates causal effect
Amit Sharma, Jake M Hofman, Duncan J Watts (2018). Split-door criterion: Identification of causal effects
through auxiliary outcomes. The Annals of Applied Statistics.
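A toy potential-outcomes simulation shows why: some visitors to The Road already want No Country for Old Men (correlated demand) and would have visited it anyway, yet every click on the recommendation gets credited to the recommender. All probabilities below are made up, and this illustrates the bias only; it is not the split-door criterion from the cited paper. The simulation can compute the causal rate only because it generates both outcomes for each visitor; in real logs the "would have visited anyway" outcome is never observed. Analytically, the observational rate is 0.3 · 0.5 + 0.7 · 0.02 = 0.164 while the causal rate is 0.3 · 0.5 · 0.4 + 0.7 · 0.02 = 0.074.

```python
import random


def simulate_visitors(n=50000, seed=4):
    """Hypothetical visitors to The Road: (clicked_rec, would_visit_anyway) pairs."""
    rng = random.Random(seed)
    visitors = []
    for _ in range(n):
        # Correlated demand: some visitors already want No Country for Old Men.
        interested = rng.random() < 0.3
        clicks_rec = rng.random() < (0.5 if interested else 0.02)
        # Would they have visited No Country for Old Men with no recommendation?
        visits_anyway = interested and rng.random() < 0.6
        visitors.append((clicks_rec, visits_anyway))
    return visitors


def observational_ctr(visitors):
    """Every click on the recommendation is credited to the recommender."""
    return sum(clicked for clicked, _anyway in visitors) / len(visitors)


def causal_ctr(visitors):
    """Only visits that would not have happened without the recommendation."""
    return sum(clicked and not anyway for clicked, anyway in visitors) / len(visitors)


visitors = simulate_visitors()
print(f"observational: {observational_ctr(visitors):.3f}, causal: {causal_ctr(visitors):.3f}")
```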
60. Example 4: Prioritizing tuberculosis patients
for followup
• TB is the leading infectious cause of death globally
• TB treatment takes 6 months or more
• Poor adherence to treatment increases risk of relapse, drug
resistance, and death
• India’s government TB program has used Directly Observed Treatment (DOT) to monitor adherence, but DOT is effort-intensive for patients and providers
Jackson A Killian, Bryan Wilder, Amit Sharma, Vinod Choudhary, Bistra Dilkina, Milind Tambe (2019). Learning to
Prescribe Interventions for Tuberculosis Patients using Digital Adherence Data. Proc. KDD 2019.
62. Background: How 99Dots works
The combination of Caller ID and the numbers called shows that doses are in the patient’s hands.
* Slide content sourced from Everwell.
63. Two questions
• “How to help health workers reprioritize their interventions?”
• “Looking at a week’s data, can we predict adherence for the next week?”
64. Machine learning task
• Input (t-7,t)
• demographic features (age, gender, location)
• Call details (number of calls, time of calls, days between calls, etc.)
• Output (t, t+7)
• Number of calls in the next week
We obtain an AUC of nearly 0.85.
65. Tale of two worlds
• A person makes no calls in week 1, receives an intervention, and starts making calls in week 2
• A person makes no calls in week 1, receives an intervention, and still makes no calls in week 2
66. A causal model for interventions
[Causal diagram with nodes: Person’s Behavior (t-1), Call to 99Dots (t-1), Health worker’s intervention, Person’s Behavior (t), Call to 99Dots (t)]
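Why the intervention node matters for prediction: in historical data, patients with identical week-1 behavior end up in different "worlds" depending on whether a health worker intervened. A model that sees only call history learns the pooled rate. The sketch simulates hypothetical patients who all made no calls in week 1; the 50% intervention rate and the 0.7/0.2 call probabilities are assumptions, not 99Dots figures. The pooled rate (about 0.45) overstates the chance of resuming calls for a patient who will not be followed up (about 0.2).

```python
import random


def simulate_week2(n=20000, seed=3, p_intervene=0.5):
    """Hypothetical patients who all made no calls in week 1.

    Returns (intervened, called_in_week_2) pairs. The intervention rate and
    call probabilities are assumptions, not numbers from 99Dots.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        intervened = rng.random() < p_intervene
        p_call = 0.7 if intervened else 0.2  # assumed effect of a health-worker follow-up
        rows.append((intervened, rng.random() < p_call))
    return rows


def call_rate(rows, intervened=None):
    """Share of patients who called in week 2, optionally within one 'world'."""
    selected = [called for i, called in rows if intervened is None or i == intervened]
    return sum(selected) / len(selected)


rows = simulate_week2()
print(f"pooled: {call_rate(rows):.2f}, "
      f"with follow-up: {call_rate(rows, True):.2f}, "
      f"without: {call_rate(rows, False):.2f}")
```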
67. Domain-based filtering solution
• 99Dots records suggested attention level for each patient
• High: 4 or more calls missed in the last week
• Medium: 1 to 3 calls missed in the last week
• Low: No missed calls
Medium -> High?
• Given last week’s data, can we predict whether a person moves from Medium to High attention?
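The Medium → High task can be set up by mapping weekly missed calls to attention levels and labelling each Medium week by whether the next week is High. A minimal sketch, assuming the thresholds High for 4 or more missed calls, Medium for 1 to 3 and Low for none. The function names and the toy history are hypothetical; a real model would add the call-detail and demographic features listed earlier.

```python
def attention_level(missed_calls):
    """Assumed thresholds: High for 4+ missed calls, Medium for 1 to 3, Low for none."""
    if missed_calls >= 4:
        return "High"
    if missed_calls >= 1:
        return "Medium"
    return "Low"


def medium_to_high_examples(weekly_missed):
    """Build (patient, missed calls this week, moved_to_high) training examples.

    `weekly_missed` maps a patient id to a list of missed-call counts per week.
    Only weeks in which the patient is at Medium attention produce an example.
    """
    examples = []
    for patient, weeks in weekly_missed.items():
        for this_week, next_week in zip(weeks, weeks[1:]):
            if attention_level(this_week) == "Medium":
                examples.append((patient, this_week, attention_level(next_week) == "High"))
    return examples


history = {"p1": [0, 2, 5], "p2": [3, 1, 0]}  # hypothetical patients
print(medium_to_high_examples(history))
# [('p1', 2, True), ('p2', 3, False), ('p2', 1, False)]
```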
68. The model is more complex and has lower accuracy, but it is able to save more missed doses
69. Example 5: What is the effect of peer support on mental health forums?
70. Talklife: thousands of “counselling” conversations online
• A social network for peer support
• People experiencing mental distress
can post on Talklife and get support
from their peers.
• Global network, but also has Indian
users
• Can we identify patterns of
successful peer support
conversations?
“Moments of cognitive change”
Yada Pruksachatkun, Sachin R. Pendse, Amit Sharma (2019). Moments of Change: Analyzing Peer-Based Cognitive
Support in Online Mental Health Forums. Proceedings of the 2019 CHI Conference on Human Factors in Computing
Systems.
72. Summary
People + Computing
• Our lives are being mediated by computing systems, often using
predictive models.
• The impact can shape the future of our society!
• But their impact is far from obvious.
• Naïve prediction metrics can lead us astray.
Need causal reasoning + understanding context
73. Thank you
Amit Sharma
@amt_shrma
www.amitsharma.in