Bayesian Nonparametric Models for Treatment
Effect Heterogeneity: Parameterization, Prior
Choice, and Posterior Summarization
Jared S. Murray – The University of Texas at Austin.
Joint work with Carlos Carvalho, P. Richard Hahn, David Yeager, et al.
December 2019
National Study of Learning Mindsets
• National Study of Learning Mindsets (Yeager et al., 2019): a randomized controlled trial of a low-cost mindset intervention
• Probability sample; 65 schools in this analysis (>11,000 9th grade students)
• Specifically designed to assess treatment effect heterogeneity
by factors including baseline levels of growth mindset beliefs
and overall school quality/achievement
National Study of Learning Mindsets
Desiderata for our analysis:
• Avoid model selection/specification search
• School-level effect heterogeneity explained and unexplained by
covariates
• Individual-level effect heterogeneity explained by covariates
• Interpretable model summaries for communicating results
• Uncertainty!
We can get all of these by modifying Bayesian causal forests (Hahn, Murray, and Carvalho, 2017).
Our (generic) assumptions
Strong ignorability:
Yi(0), Yi(1) ⊥⊥ Zi | Xi = xi,
Positivity:
0 < Pr(Zi = 1 | Xi = xi) < 1
for all i. Then
P(Y(z) | x) = P(Y | Z = z, x),
and the conditional average treatment effect (CATE) is
τ(xi) := E(Yi(1) − Yi(0) | xi)
      = E(Yi | xi, Zi = 1) − E(Yi | xi, Zi = 0).
Parameterizing Nonparametric Models of Causal Effects
Let’s forget covariates for a second and pretend we have a simple
RCT.
A simple model:
(Yi | Zi = 0) iid∼ N(µ0, σ²)
(Yi | Zi = 1) iid∼ N(µ1, σ²)
where the estimand of interest is τ ≡ µ1 − µ0.
If µj ∼ N(ϕj, δj) independently (with δj the prior variances), then τ ∼ N(ϕ1 − ϕ0, δ0 + δ1)
Often we have stronger prior information about τ than µ1 or µ0 – in
particular, we expect it to be small.
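A quick Monte Carlo check of this induced prior (the ϕj and δj values below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
phi0, phi1 = 0.0, 0.0        # prior means for mu_0, mu_1
delta0, delta1 = 1.0, 1.0    # prior *variances* for mu_0, mu_1

mu0 = rng.normal(phi0, np.sqrt(delta0), size=200_000)
mu1 = rng.normal(phi1, np.sqrt(delta1), size=200_000)
tau = mu1 - mu0              # induced prior on the treatment effect

print(tau.mean(), tau.var()) # ~ phi1 - phi0 = 0 and delta0 + delta1 = 2
```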
Parameterizing Nonparametric Models of Causal Effects
A more natural parameterization:
(Yi | Zi = 0) iid∼ N(µ, σ²)
(Yi | Zi = 1) iid∼ N(µ + τ, σ²)
where the estimand of interest is still τ.
Now we can express prior beliefs on τ directly.
Parameterizing Nonparametric Models of Causal Effects
How does this relate to models for heterogeneous treatment effects?
Consider (mostly) separate models for treatment arms:
yi = f_{zi}(xi) + ϵi,   ϵi ∼ P
Independent priors on f0, f1 → prior on τ(x) ≡ f1(x) − f0(x) has larger
variance than prior on f0 or f1
No direct prior control – simple forms for f0, f1 can combine into a complex τ.
Every variable in x is necessarily a potential effect modifier.
Parameterizing Nonparametric Models of Causal Effects
What about the “just another covariate” parameterization?
yi = f(xi, zi) + ϵi,   ϵi ∼ P
Then the heterogeneous treatment effects are given by
τ(x) ≡ f(x, 1) − f(x, 0),
every variable in x is still a potential effect modifier, and we have no direct prior control for most priors on f...
Parameterizing Nonparametric Models of Causal Effects
Set f(xi, zi) = µ(xi) + τ(wi)zi, where w is a subset of x:
yi = µ(xi) + τ(wi)zi + ϵi, ϵi ∼ P
The heterogeneous treatment effects are given by τ(w) directly, and
can be easily regularized.
In Hahn et al. (2017) we use independent Bayesian additive regression tree (BART) priors on µ and τ (“Bayesian causal forests”)
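To make the parameterization concrete, here is a minimal simulation sketch of the structure BCF targets: a complicated prognostic function µ alongside a simple, weakly heterogeneous τ. The functional forms below are invented for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
x = rng.normal(size=(n, 5))              # covariates; take w = x[:, 0]
z = rng.binomial(1, 0.5, size=n)         # randomized treatment

mu = np.sin(x[:, 0]) + x[:, 1] * x[:, 2] # complex prognostic function (hypothetical)
tau = 0.2 + 0.1 * (x[:, 0] > 0)          # simple effect function of w (hypothetical)

y = mu + tau * z + rng.normal(size=n)    # y_i = mu(x_i) + tau(w_i) z_i + eps_i
```

Independent BART priors on µ(·) and τ(·) then let us shrink τ toward homogeneity without constraining µ.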
Building Functions with Regression Trees
Represent each function µ and τ as the sum of many small
regression trees:
f(x) = ∑h g(x; Th, Mh)
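A minimal sketch of how such a sum of trees is evaluated; the tree and its leaf values below are hypothetical:

```python
import numpy as np

# One regression tree (T_h, M_h): internal nodes store (feature, threshold);
# leaves store the scalar parameters in M_h.
tree = {"feature": 0, "threshold": 0.5,
        "left": {"leaf": -0.02},
        "right": {"feature": 2, "threshold": 1.0,
                  "left": {"leaf": 0.01}, "right": {"leaf": 0.04}}}

def g(x, node):
    """Evaluate a single tree g(x; T_h, M_h) by routing x to a leaf."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

def f(x, trees):
    """Sum-of-trees representation: f(x) = sum over h of g(x; T_h, M_h)."""
    return sum(g(x, t) for t in trees)

print(f(np.array([0.7, 0.0, 1.5]), [tree, tree]))  # 0.08
```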
Tweaking priors for causal effects
Several adjustments to the BART prior on τ:
• Higher probability on smaller τ trees (than BART defaults)
• Higher probability on “stumps”, trees that never split (all stumps
= homogeneous effects)
• A N⁺(0, v) hyperprior on the scale of the leaf parameters in τ
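The first two adjustments act through the BART branching prior, under which a node at depth d splits with probability α(1 + d)^(−β). A sketch of the effect; the (0.25, 3) values for the τ trees are meant to be illustrative of the BCF defaults, not authoritative:

```python
import numpy as np

def p_split(depth, alpha, beta):
    """BART branching prior: Pr(a node at this depth splits)."""
    return alpha * (1.0 + depth) ** (-beta)

for name, (alpha, beta) in {"BART default": (0.95, 2.0),
                            "tau trees":    (0.25, 3.0)}.items():
    # A stump is a tree whose root never splits (a homogeneous effect).
    print(f"{name}: Pr(stump) = {1.0 - p_split(0, alpha, beta):.2f}")

# Half-normal N+(0, v) hyperprior on the scale of the tau leaf parameters:
rng = np.random.default_rng(2)
leaf_scale = abs(rng.normal(0.0, np.sqrt(1.0)))  # one draw, v = 1 for illustration
```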
What about observational data?
What changes with (measured) confounding?
Not much, but we should include an estimated propensity score as a
covariate:
yi = µ(xi, π̂(xi)) + τ(wi)zi + ϵi,   ϵi ∼ P
Mitigates regularization induced confounding (RIC); see Hahn et al. (2017) for details.
There is lots of independent empirical evidence that this is important (cf. Dorie et al. (2019), Wendling et al. (2019)).
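A sketch of the mechanics, with simulated data and a logistic regression standing in for whichever propensity model one prefers:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(3)
x = rng.normal(size=(1_000, 4))
z = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))   # treatment depends on x: confounding

# Estimate pi_hat(x) and append it to the inputs of the prognostic function mu:
pi_hat = LogisticRegressionCV(cv=5).fit(x, z).predict_proba(x)[:, 1]
x_mu = np.column_stack([x, pi_hat])               # mu is then fit on (x, pi_hat(x))
```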
The National Study of Learning Mindsets
Analysis of National Study of Learning Mindsets
• A new analysis of effects on math GPA in the overall population of students
• Interesting moderators are baseline level of mindset norms,
school achievement, and minority composition
• Many, many other controls
The usual approach is the multilevel linear model...
Multilevel BCF
Replaces the linear pieces of a multilevel model with functions that
have BART priors:
yij = αj + µ(xij) + [ϕj + τ(wij)] zij + ϵij,   ϵij ∼ N(0, σ²)
αj, ϕj have standard random effect priors (normal with half-t
hyperpriors on scale)
w = school achievement, baseline mindset norms, minority
composition
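A simulation sketch of this multilevel structure (all functional forms, scales, and sizes below are invented for illustration; in the real model the random-effect scales get half-t hyperpriors rather than fixed values):

```python
import numpy as np

rng = np.random.default_rng(4)
J, n_j = 65, 170                            # schools, students per school
school = np.repeat(np.arange(J), n_j)

alpha = rng.normal(0.0, 0.20, size=J)       # school intercepts alpha_j
phi = rng.normal(0.0, 0.03, size=J)         # unexplained school-level effect shifts phi_j

x = rng.normal(size=school.size)            # student-level covariate (stand-in)
w = rng.normal(size=J)[school]              # school-level moderator, e.g. achievement
z = rng.binomial(1, 0.5, size=school.size)

mu = np.sin(x)                              # hypothetical prognostic function
tau = 0.03 + 0.02 * w                       # hypothetical moderated effect tau(w)

y = alpha[school] + mu + (phi[school] + tau) * z + rng.normal(0.0, 0.5, size=school.size)
```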
We fit this model, now what?
How do we present our results? Average treatment effect (ATE),
school ATE, plots of (school) ATE by xj....
Two questions from our collaborators:
• For which groups was the treatment most/least effective?
• What is the impact of posited treatment effect modifiers,
holding other modifiers constant?
Subgroup finding as a decision problem
The action γ is choosing subgroups, here represented by a recursive
partition (tree)
• Minimize the posterior expected loss
γ̂ = argmin_{γ̃ ∈ Γ} Eτ[d(τ, γ̃) + p(γ̃) | Y, x]
where p() is a complexity penalty and d() is squared error
averaged over a distribution for x.
• With ˆγ in hand we can look at the joint posterior distribution of
subgroup ATEs
Relaxing complexity penalties in the loss function → growing a
deeper tree
(Hahn et al. (2017); Sivaganesan et al. (2017))
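Because d is squared error, the posterior expected loss decomposes into an irreducible posterior-variance term plus the penalized fit to the posterior mean of τ, so γ̂ can be computed by fitting a cost-complexity-pruned tree to the posterior mean. A minimal sketch with simulated stand-ins for the posterior draws; sklearn's ccp_alpha plays the role of p(γ̃):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 3))                     # moderators w
tau_draws = (0.03 + 0.04 * (X[:, 0] > 0)          # stand-in posterior draws of tau(x_i)
             + 0.01 * rng.normal(size=(400, 500)))

# Fit the subgroup tree gamma-hat to the posterior mean; relaxing ccp_alpha
# (the complexity penalty) grows a deeper tree.
gamma_hat = DecisionTreeRegressor(ccp_alpha=1e-4).fit(X, tau_draws.mean(axis=0))

# With gamma-hat in hand, examine the joint posterior of subgroup ATEs:
leaf = gamma_hat.apply(X)
for g in np.unique(leaf):
    ate = tau_draws[:, leaf == g].mean(axis=1)    # posterior draws of that subgroup's ATE
    print(g, round(ate.mean(), 3), np.round(np.quantile(ate, [0.025, 0.975]), 3))
```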
[Figure: subgroup tree from the decision-theoretic summary. First split on school achievement (≤ 0.67 vs. > 0.67); lower-achieving schools split again on baseline norms (≤ 0.53 vs. > 0.53). Subgroup CATEs: lower achieving / low norm 0.032 (n = 3253); lower achieving / high norm 0.073 (n = 3265); high achieving 0.016 (n = 5023).]

[Figure: posterior of the difference in subgroup ATEs, high norm vs. low norm among lower-achieving schools: Pr(diff > 0) = 0.93.]

[Figure: lower achieving / high norm vs. high achieving: Pr(diff > 0) = 0.81. Relaxing the complexity penalty grows a deeper tree, splitting the lower-achieving / low-norm group on achievement at 0.38: low achieving / low norm CATE = 0.010 (n = 1208), mid achieving / low norm CATE = 0.045 (n = 2045).]
We fit this model, now what?
How do we present our results? ATE, school ATE, plots of school ATE
by variables....
Two questions we got from our collaborators:
• For which groups was the treatment most/least effective?
• What was the impact of posited treatment effect modifiers,
holding other modifiers constant?
We could generate posterior distributions of individual-level partial effects manually and try to summarize these, or approximate τ by a proxy with simple partial effects (i.e., an additive function)
Counterfactual treatment effect predictions
• How do estimated treatment effects change in lower-achieving/low-norm schools if norms increase, holding school minority composition & achievement constant?
• Not a formal causal mediation/moderation analysis (roughly, we would need “no unmeasured moderators correlated with norms”)
[Figure: the subgroup tree from above with the lower-achieving groups highlighted, and posterior CATE densities for the low-norm/lower-achieving group under counterfactual increases in norms (+10%, +0.5 IQR, +1 IQR).]
Norms increase   CATE (95% interval)
Original         0.032 (−0.011, 0.076)
+10%             0.050 (0.005, 0.097)
+Half IQR        0.051 (0.005, 0.099)
+Full IQR        0.059 (0.009, 0.114)
1 IQR = 0.6 extra problems on the worksheet task.
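A sketch of the computation behind this table: shift the norms moderator for the target subgroup, push the shifted values through each posterior draw of τ, and summarize the resulting subgroup ATE. The τ below is a hypothetical linear stand-in, not the fitted forest:

```python
import numpy as np

rng = np.random.default_rng(6)
norms = rng.normal(3.0, 0.4, size=200)            # baseline norms in the target subgroup
slopes = rng.normal(0.05, 0.02, size=(1_000, 1))  # stand-in posterior draws

def tau_draws(norms_vals):
    """Posterior draws of tau at given norm values (hypothetical linear form)."""
    return 0.03 + slopes * (norms_vals - 3.0)     # shape (n_draws, n_schools)

iqr = np.subtract(*np.percentile(norms, [75, 25]))
for label, shift in [("Original", 0.0), ("+Half IQR", 0.5 * iqr), ("+Full IQR", iqr)]:
    ate = tau_draws(norms + shift).mean(axis=1)   # subgroup ATE per posterior draw
    lo, hi = np.quantile(ate, [0.025, 0.975])
    print(f"{label:>9}: {ate.mean():.3f} ({lo:.3f}, {hi:.3f})")
```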
An Alternative: Interpretable Summaries
The goal: Examine the “best” (in a user-defined sense) simple
approximation to the “true” τ(x)
Given samples of τ,
1. Consider a class of simple/interpretable approximations Γ to τ
2. Make inference on
γ = argmin_{γ̃ ∈ Γ} d(τ, γ̃) + p(γ̃)
for an appropriate distance function d and complexity penalty
p(γ)
Get draws of γ by solving the optimization for each draw of τ
Also get discrepancy metrics, like a pseudo-R²: Cor²[γ(xi), τ(xi)]
(See Woody, Carvalho, and Murray (2019) for this approach in predictive models.)
Approximate partial effects of treatment effect modifiers
Here we use:
• An additive approximation γ(x)
• d(τ, γ) = ∑_{i=1}^{n} (τ(xi) − γ(xi))²
• A smoothness penalty p(γ)
Don’t average draws to get a point estimate! Treat
d(τ, γ̃) + p(γ̃) = ∑_{i=1}^{n} (τ(xi) − γ̃(xi))² + p(γ̃)
as a loss function, and minimize the posterior expected loss:
γ̂ = argmin_{γ̃ ∈ Γ} Eτ[d(τ, γ̃) + p(γ̃) | Y, x]
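A sketch of both steps. Since d is squared error, the posterior expected loss again decomposes so that γ̂ is the penalized additive fit to the posterior mean of τ, while refitting to each posterior draw gives uncertainty for γ and the pseudo-R². The per-coordinate splines with a ridge penalty below are one convenient additive-with-smoothness-penalty choice, not necessarily the paper's exact machinery, and the posterior draws are simulated stand-ins:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 3))                       # moderators
tau_draws = (0.04 + 0.03 * np.tanh(X[:, 0])         # stand-in posterior draws of tau(x_i)
             + 0.01 * rng.normal(size=(200, 500)))

def fit_gamma(target):
    """Additive approximation: per-coordinate splines + ridge (smoothness) penalty."""
    return make_pipeline(SplineTransformer(n_knots=8), Ridge(alpha=1.0)).fit(X, target)

gamma_hat = fit_gamma(tau_draws.mean(axis=0))       # minimizes posterior expected loss

# Refit per draw for uncertainty and the pseudo-R^2 = Cor^2[gamma(x_i), tau(x_i)]:
r2 = np.array([np.corrcoef(fit_gamma(t).predict(X), t)[0, 1] ** 2 for t in tau_draws])
print(np.round(np.quantile(r2, [0.025, 0.5, 0.975]), 3))
```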
[Figure: approximate additive partial effects (gamma_j) of school achievement and of baseline mindset norms, with the posterior density of the approximation R² (2000 draws); partial effect of minority composition not shown.]

[Figure: the same summaries for a second fit, with a more concentrated approximation R² posterior; partial effect of minority composition not shown.]
Thank you!
• Yeager, David S., Paul Hanselman, Gregory M. Walton, Jared S. Murray, Robert Crosnoe, Chandra Muller, Elizabeth Tipton, et al. “A national experiment reveals where a growth mindset improves achievement.” Nature 573, no. 7774 (2019): 364–369.
• Hahn, P. Richard, Jared S. Murray, and Carlos M. Carvalho. “Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects” (2017). arXiv:1706.09523.
• Woody, Spencer, Carlos M. Carvalho, and Jared S. Murray. “Model interpretation through lower-dimensional posterior summarization” (2019). arXiv:1905.07103.