Smooth extensions to BART, with applications to women’s healthcare practice and policy

Jennifer E. Starling
Department of Statistics and Data Sciences
The University of Texas at Austin
December 8, 2019
In collaboration with Patricia A. Lohr, Abigail R.A. Aiken, Jared S. Murray,
Carlos M. Carvalho, and James G. Scott

Scientific Problem: Early Medical Abortion Regimens
Goal: Model the treatment effect of a new early medical abortion (EMA)
regimen over gestation.
• EMA is available through 9 weeks of gestation
• 200 mg oral mifepristone, then 800 micrograms vaginal misoprostol
Control: 24-48 hour wait between administrations, over two clinic visits
Treatment: Simultaneous administration, no more than 15 minutes apart

British Pregnancy Advisory Service
We are working with British Pregnancy Advisory Service (BPAS) to help clinicians
give better advice to their patients.
• BPAS is the leading abortion provider in the UK, providing a third of
all procedures
• 1 in 3 women in the UK will have an abortion by age 45
• 69 centers in England, Scotland, and Wales
• Treated over 79,000 women last year

What do clinicians think about?
Clinicians advise patients to guide them in their choice between
simultaneous and interval administration.
Known:
• Simultaneous significantly lowers barriers to access.
• Simultaneous is 97% as effective on average.
Unknown:
• Does the effectiveness difference change for any subgroups of patients?
• Does the effectiveness difference change over gestational age? If so, is
this trajectory different for some groups of patients?

Observational Data from British Pregnancy Advisory Service
Retrospective cohort study using data from British Pregnancy Advisory
Service from May 2015 to April 2016.
• 28,895 unique patients, from 4.5 to 9 weeks of gestation
• 85% of patients self-assign to simultaneous (n = 24,541) vs. interval
(n = 4,354)
• Binary outcome, where failure is surgical intervention or continued
pregnancy
• Covariates: age, BMI, ethnicity, previous abortions, previous births,
previous Cesarean Sections, and previous miscarriages.

Previous work on simultaneous versus interval EMA
Our previous work (Lohr 2018) addressed questions of general
effectiveness.
• Fast turnaround required so that BPAS could quickly counsel
patients
• Logistic regression on propensity scores and gestational age buckets
• Found that simultaneous was 97% as effective, without a significant
drop over gestation
• Lacked smooth, nuanced estimates over gestation
• No subgroup analysis
The National Institute for Health and Care Excellence (NICE) in the UK is using this work
to rewrite the UK’s abortion standards of care.

EMA Modeling Goals
Our method accomplishes the following.
• Smooth treatment effect estimation over gestation
• Detection of heterogeneous treatment effects, while shrinking towards
homogeneity when appropriate
• Uncertainty quantification for treatment effects
• Regularization to avoid bias of treatment effect estimates

Smoothness over a single covariate
[Figure: two panels, "Not Smooth" and "Smooth", each showing CATE against the target covariate for Subgroup 1, Subgroup 2, and Subgroup 3.]

BART model and prior
BART is a Bayesian ‘sum-of-trees’ model introduced by Chipman,
George, and McCulloch (2010). The BART model statement is:
\[ y_i = f(x_i) + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2) \]
\[ f(x) = \sum_{j=1}^{m} g(x, T_j, M_j) \]
The BART prior is composed of priors on $\sigma^2$, terminal node values $\mu_{lj}$, and
tree structures $T_j$.
The BART model is fit using an iterative MCMC algorithm called ‘Bayesian
backfitting.’ The $\mu_{lj}$ and $\sigma^2$ updates are easy because the priors are conjugate.
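To make the "easy" conjugate update concrete, here is a minimal sketch of the leaf-value step inside one backfitting sweep. It assumes a zero-mean normal prior on each leaf value and treats the partial residuals falling in the leaf as normal draws around that value; the function and argument names are hypothetical and are not taken from any BART package.

```python
import math
import random


def leaf_posterior(residuals, sigma2, prior_var):
    """Posterior mean and variance of one leaf value mu.

    Assumes mu ~ N(0, prior_var) and each partial residual r_i ~ N(mu, sigma2).
    """
    n = len(residuals)
    post_var = 1.0 / (n / sigma2 + 1.0 / prior_var)
    post_mean = post_var * sum(residuals) / sigma2
    return post_mean, post_var


def draw_leaf_value(residuals, sigma2, prior_var, rng):
    """Gibbs draw of the leaf value from its normal full conditional."""
    mean, var = leaf_posterior(residuals, sigma2, prior_var)
    return rng.gauss(mean, math.sqrt(var))


if __name__ == "__main__":
    rng = random.Random(0)
    # Partial residuals of the observations that land in this leaf (made-up numbers).
    leaf_residuals = [0.8, 1.1, 0.9, 1.3]
    print(leaf_posterior(leaf_residuals, sigma2=1.0, prior_var=0.25))
    print(draw_leaf_value(leaf_residuals, sigma2=1.0, prior_var=0.25, rng=rng))
```

With few observations in a leaf, the posterior mean is pulled strongly toward zero, which is how the prior regularizes each tree's contribution.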

Introducing BART with Targeted Smoothing (tsBART)
tsBART (Starling, 2019) extends BART to estimate a response which
evolves smoothly over one covariate.
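[Figure: two example trees.]
BART tree Tj: the root splits on x1 < 0.9; one branch ends in leaf µ1j and the other splits again on x2 < 0.4, ending in leaves µ2j and µ3j.
tsBART tree Tj: the same splits, but the leaves hold functions µ1j(t), µ2j(t), and µ3j(t).
Leaf/End node parameters
µj = (µ1j(t), µ2j(t), µ3j(t))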

tsBART model and prior
tsBART’s prior differs from the BART prior as follows.
• The model estimates a function that evolves smoothly over a covariate t.
• Replace the scalar $\mu_{lj}$ leaf node parameters with functions $\mu_{lj}(t)$.
• These functions are assigned a Gaussian process prior
\[ \mu_{lj}(t) \stackrel{\text{iid}}{\sim} \mathrm{GP}\left(m_0(t), C_\theta(t, t')\right), \qquad \theta = (\tau, l) \]
for centering function $m_0(t)$ and covariance function $C_\theta(t, t')$.
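The sketch below draws one leaf function from a Gaussian process prior on a grid of t values, to show what "smooth over t" means in practice. It is illustrative only: it assumes a zero centering function and a squared-exponential covariance with scale tau and length-scale l, which the slide does not specify, and all function names are hypothetical.

```python
import math
import random


def sq_exp_cov(t1, t2, tau=1.0, length=1.0):
    """Squared-exponential covariance (an assumed form for C_theta)."""
    return tau ** 2 * math.exp(-((t1 - t2) ** 2) / (2.0 * length ** 2))


def covariance_matrix(t_grid, tau, length, jitter):
    """Covariance over the grid, with a small diagonal jitter for numerical stability."""
    n = len(t_grid)
    return [
        [sq_exp_cov(t_grid[i], t_grid[j], tau, length) + (jitter if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_leaf_function(t_grid, tau=1.0, length=1.0, jitter=1e-6, rng=None):
    """One draw of mu(t) on the grid from a zero-mean GP prior."""
    rng = rng or random.Random(0)
    low = cholesky(covariance_matrix(t_grid, tau, length, jitter))
    z = [rng.gauss(0.0, 1.0) for _ in t_grid]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(t_grid))]


if __name__ == "__main__":
    grid = [4.5 + 0.5 * i for i in range(10)]  # e.g. gestational age in weeks
    print(draw_leaf_function(grid, length=2.0))  # longer length-scale: smoother draw
    print(draw_leaf_function(grid, length=0.3))  # shorter length-scale: wigglier draw
```

A longer length-scale makes neighboring values of t more strongly correlated, so the drawn function changes more slowly across the grid.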

Review of Bayesian Causal Forests (bcf)
Models response surface as
\[ E(Y_i \mid x_i, Z_i = z_i) = \underbrace{\mu(x_i, \hat{\pi}(x_i))}_{\text{BART}_1} + \underbrace{\tau(x_i)}_{\text{BART}_2} \, z_i \]
Treatment effect is
\[ \tau(x_i) = f(x_i, 1) - f(x_i, 0) \]
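As a short worked step, write $f(x_i, z_i)$ for the response surface $E(Y_i \mid x_i, Z_i = z_i)$ (a naming assumption, since the slide does not define $f$ explicitly). Substituting $z_i = 1$ and $z_i = 0$ into the model shows why the second forest is exactly the treatment effect:

```latex
\begin{align*}
f(x_i, 1) - f(x_i, 0)
  &= \left[\mu(x_i, \hat{\pi}(x_i)) + \tau(x_i) \cdot 1\right]
   - \left[\mu(x_i, \hat{\pi}(x_i)) + \tau(x_i) \cdot 0\right] \\
  &= \tau(x_i)
\end{align*}
```

The prognostic term cancels, so any prior placed on the second forest acts directly on the treatment effect.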

Causal assumptions for BCF
• SUTVA (Stable Unit Treatment Value Assumption)
• No interference between units, i.e., the response of an observation
depends only on its own treatment, not on the treatment of other
observations around it
• Strong ignorability, consisting of two conditions:
• No unmeasured confounders: Yi(0), Yi(1) ⊥ Zi | ti, xi
• Enough overlap to estimate treatment effects everywhere in covariate
space: 0 < Pr(Zi = 1 | ti, xi) < 1
Under these conditions, E[Yi(z) | ti, xi] = E[Yi | ti, xi, Zi = z], so we can
express the causal estimand as:
τ(ti, xi) = E[Yi | ti, xi, Zi = 1] − E[Yi | ti, xi, Zi = 0]

Benefits of BCF
BCF accomplishes a few causal inference goals:
• Separate regularization for heterogeneous treatment effects.
• Less shrinkage of control covariates to get deconfounding.
• More shrinkage of treatment fit towards homogeneity.
• Inclusion of propensity score estimates in control fit to mitigate bias
due to regularization-induced confounding (RIC).

Introducing Targeted Smooth Bayesian Causal Forests (tsbcf)
\[ y_i = \underbrace{\mu(t_i, x_i, \hat{\pi}(x_i))}_{\text{tsBART}_1} + \underbrace{\tau(t_i, x_i)}_{\text{tsBART}_2} \, z_i + \epsilon_i, \qquad \epsilon_i \stackrel{\text{iid}}{\sim} N(0, \sigma^2) \]
Regularize tsBART1 using the default BART depth and split parameters and
number of trees.
Regularize tsBART2 with fewer trees, a stronger depth penalty, and a lower
splitting probability.
Uses the same causal assumptions as bcf (SUTVA and strong
ignorability).
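To illustrate what "stronger depth penalty and lower splitting probability" does, the sketch below evaluates the standard BART tree prior, in which a node at depth d splits with probability alpha * (1 + d)^(-beta). The numeric settings are assumptions: alpha = 0.95, beta = 2 are the usual BART defaults, and alpha = 0.25, beta = 3 are the treatment-forest values reported for BCF by Hahn et al.; the settings tsbcf actually uses may differ.

```python
def split_probability(depth, alpha, beta):
    """Prior probability that a node at the given depth splits (BART tree prior)."""
    return alpha * (1.0 + depth) ** (-beta)


# Assumed settings, not confirmed tsbcf defaults.
CONTROL_PRIOR = {"alpha": 0.95, "beta": 2.0}    # prognostic forest (default BART)
TREATMENT_PRIOR = {"alpha": 0.25, "beta": 3.0}  # treatment forest (more shrinkage)


def split_table(max_depth=3):
    """Rows of (depth, control split probability, treatment split probability)."""
    return [
        (d, split_probability(d, **CONTROL_PRIOR), split_probability(d, **TREATMENT_PRIOR))
        for d in range(max_depth + 1)
    ]


if __name__ == "__main__":
    for depth, control, treatment in split_table():
        print(f"depth {depth}: control {control:.4f}, treatment {treatment:.4f}")
```

Under these assumed values a treatment tree is unlikely to split even at the root, which is what shrinks the treatment fit toward a homogeneous effect.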

Probit model for binary responses
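In both the tsBART and tsbcf models, we can fit binary responses using
a probit link function.
Let $y_i$ be the latent Gaussian variable, so the observed binary response is $c_i = 1$ if $y_i > 0$, else $c_i = 0$.
Then we can write $P(c_i = 1 \mid x_i, t_i) = \Phi(f(x_i, t_i))$, or in latent form,
\[ c_i = \begin{cases} 1 & \text{if } y_i > 0 \\ 0 & \text{if } y_i \le 0 \end{cases} \]
\[ y_i = f(x_i, t_i) + \epsilon_i, \qquad f(x_i, t_i) \sim \text{tsBART}, \qquad \epsilon_i \sim N(0, 1) \]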
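The equivalence between the latent form and the probit probability can be checked by simulation. The sketch below is a hypothetical illustration with a fixed value of f rather than a fitted tsBART function: it thresholds latent normal draws at zero and compares the success frequency with Φ(f).

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate_binary(f_value, n_draws, rng):
    """Draw latent y = f + eps with eps ~ N(0, 1) and threshold at zero."""
    return [1 if f_value + rng.gauss(0.0, 1.0) > 0 else 0 for _ in range(n_draws)]


def empirical_success_rate(f_value, n_draws=20000, seed=1):
    """Share of simulated binary responses equal to 1."""
    rng = random.Random(seed)
    return sum(simulate_binary(f_value, n_draws, rng)) / n_draws


if __name__ == "__main__":
    f_value = 0.5  # stands in for f(x_i, t_i) at one observation
    print("Phi(f):        ", normal_cdf(f_value))
    print("simulated rate:", empirical_success_rate(f_value))
```

The two printed numbers should agree up to Monte Carlo error, which is why sampling the latent variable is enough to fit the binary model.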

Probit model nuances in the tsbcf case
Counterfactual probabilities of success are
\[ \omega_i(0) = \Phi\left(\mu(t_i, x_i, \hat{\pi}_i)\right) \]
\[ \omega_i(1) = \Phi\left(\mu(t_i, x_i, \hat{\pi}_i) + \tau(t_i, x_i)\right) \]
For observation i, causal estimands on the probability scale include
• Absolute risk difference: $\Delta_i = \omega_i(1) - \omega_i(0)$
• Relative risk: $RR_i = \omega_i(1) / \omega_i(0)$
• Odds ratio: $OR_i = \dfrac{\omega_i(1) / (1 - \omega_i(1))}{\omega_i(0) / (1 - \omega_i(0))}$
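As a minimal sketch of how these estimands follow from one pair of probit-scale values, the function below maps hypothetical values of µ and τ to the three quantities. In practice this would be applied to each posterior draw; the input numbers here are made up for illustration and are not results from the EMA analysis.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def causal_estimands(mu, tau):
    """Probability-scale estimands from probit-scale prognostic (mu) and treatment (tau) terms."""
    w0 = normal_cdf(mu)        # success probability under control
    w1 = normal_cdf(mu + tau)  # success probability under treatment
    return {
        "risk_difference": w1 - w0,
        "relative_risk": w1 / w0,
        "odds_ratio": (w1 / (1.0 - w1)) / (w0 / (1.0 - w0)),
    }


if __name__ == "__main__":
    # Hypothetical probit-scale values for one observation.
    for name, value in causal_estimands(mu=2.0, tau=-0.2).items():
        print(f"{name}: {value:.4f}")
```

Because Φ is nonlinear, the same τ gives different risk differences at different values of µ, so the estimands have to be computed per observation and then summarized.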

EMA overall relative effectiveness
[Figure: Panel A shows relative effectiveness by gestational age (weeks); Panel B shows NNT by gestational age (weeks).]

EMA subgroups
For a slice of patients where gestational age is between 7 and 9 weeks:
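[Figure: subgroup tree of relative effectiveness. Split rules shown: age >= 29; ethnicity = White, Not Reported, Other; prevBirth >= 2; age >= 22; prevBirth >= 1. The root node has relative effectiveness 0.95 (100% of patients). The six terminal nodes have relative effectiveness 0.93 (18% of patients), 0.94 (12%), 0.94 (7%), 0.95 (19%), 0.96 (18%), and 0.97 (26%). Values shown in parentheses beneath the terminal nodes: 15, 17, 17, 25, 25, 34.]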

Posterior relative effectiveness for all terminal nodes
[Figure: posterior densities of relative effectiveness for Leaf 1 through Leaf 6.]

EMA individual relative effectiveness
Individual estimated average relative effectiveness:
[Figure: Panel A shows relative effectiveness by age (years); Panel B shows relative effectiveness by number of previous births.]

EMA sensitivity to tuning smoothness parameters
[Figure: relative effectiveness by gestational age (weeks) under three smoothness settings: Default, Smoother, and Wigglier.]

Summary
• The tsbcf method lets us estimate smooth heterogeneous treatment
effects in observational data, with careful regularization to avoid
biased treatment effects.
• Smooth estimates are crucial in giving clinicians tools for effectively
advising patients
• We find clinically relevant subgroups to support clinicians in giving
personalized advice
• A tablet app can effectively support clinical practice

Shiny app for clinicians

Paper and software
Preprint for tsbcf at http://arxiv.org/abs/1905.09405/
R packages are available at
• github.com/jestarling/tsbart.
• github.com/jestarling/tsbcf.
The shiny app is available at
jestarling.shinyapps.io/tsbcf-shiny-app.

Selected references
H. A. Chipman, E. I. George, and R. E. McCulloch. BART: Bayesian additive regression trees. The Annals of Applied Statistics, 2010.
P. R. Hahn, J. Murray, and C. M. Carvalho. Bayesian regression tree models for causal inference: Regularization, confounding, and heterogeneous effects. http://dx.doi.org/10.2139/ssrn.3048177, 2017.
J. Starling, J. Murray, C. Carvalho, R. Bukowski, and J. Scott. BART with targeted smoothing: An analysis of patient-specific stillbirth risk. Annals of Applied Statistics, 2019.

Why do we care so much about regularization?
Estimating treatment effects using any kind of regression from
observational data is complicated because
• The minimal set of sufficient control variables is rarely known
• The number of available control variables is often large relative to
sample size
Regularizing is a tool to help us reliably estimate treatment effects.

Avoiding biased treatment effect estimates
Naive regularization is not an easy fix - it must be done carefully!
• Under some scenarios (strong confounding, targeted selection) the
prognostic fit µ(x, ˆπ(x)) is approximately a monotone-ish,
easy-to-learn function of the propensity score π(x).
• Without the propensity score as an input, this function might be hard
for regularized models to learn from x alone.
• The result is that variability due to the control covariates is
misattributed to the treatment effect:
• We over-regularize µ(x, ˆπ(x))
• Treatment effect estimates τ(x) are too large

Targeted Selection
Treatment is assigned based on some function of the control covariates
(an estimate of the expected outcome without treatment), and the
probability of treatment π is generally increasing or decreasing as a
function of this estimate.
Targeted selection occurs when, for every x, the propensity score function
E(Z | x) = π(µ(x), x) is approximately monotone in µ(x).
Common in practice, such as in medical settings when risk factors for
adverse outcomes are known. Clinicians are more likely to assign
treatment to patients with worse risk factors.
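A small simulation makes the definition concrete. In the sketch below, both the prognostic function µ(x) and the form of the propensity are hypothetical choices made only for illustration; the one property that matters is that the treatment probability increases monotonically with µ(x).

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prognostic(x1, x2):
    """Hypothetical mu(x): expected outcome without treatment."""
    return x1 + 0.5 * x2


def propensity(x1, x2):
    """Treatment probability, increasing in mu(x) and bounded inside (0.1, 0.9)."""
    return 0.1 + 0.8 * normal_cdf(prognostic(x1, x2))


def simulate(n, seed=0):
    """Rows of (mu, pi, z) for n simulated patients."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pi = propensity(x1, x2)
        z = 1 if rng.random() < pi else 0
        rows.append((prognostic(x1, x2), pi, z))
    return rows


if __name__ == "__main__":
    rows = sorted(simulate(2000))
    half = len(rows) // 2
    low_share = sum(z for _, _, z in rows[:half]) / half
    high_share = sum(z for _, _, z in rows[half:]) / (len(rows) - half)
    print("treated share, lower-mu half: ", low_share)
    print("treated share, higher-mu half:", high_share)
```

Patients with a higher prognostic value are treated more often, which mirrors clinicians steering treatment toward patients with worse risk factors.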

Propensity score function is monotone in µ(x)

Regularization Induced Confounding (RIC)
Strong confounding and targeted selection → prognostic fit µ(x) is
approximately a monotone function of propensity scores π(x) alone.
This can happen where π(x) (and so µ(x)) is difficult to learn via regression
trees, because it takes many axis-aligned splits to approximate a ‘shelf’
across a diagonal. The BART prior penalizes this kind of complexity.
The model can then misattribute the variability in µ(x) to the treatment Z,
and so underestimate the control effect and inflate the treatment effect.
RIC can be reliably recreated by simulating data with targeted selection.
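The mechanism can be reproduced in a much simpler setting than BART. The sketch below is a linear analogy, not the tree-based model from the talk: it simulates targeted selection with a single covariate, then fits a regression in which only the control coefficient is ridge-penalized. All names and parameter values are hypothetical. As the penalty grows, the control effect is shrunk toward zero and the treatment estimate absorbs it.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def simulate(n=5000, tau=1.0, beta=2.0, seed=0):
    """Outcome y = tau*z + beta*x + noise, with treatment more likely when x is large."""
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = 1.0 if rng.random() < normal_cdf(1.5 * x) else 0.0
        xs.append(x)
        zs.append(z)
        ys.append(tau * z + beta * x + rng.gauss(0.0, 1.0))
    return xs, zs, ys


def centered(values):
    """Subtract the mean, which removes the unpenalized intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def treatment_estimate(xs, zs, ys, penalty):
    """Minimize sum (y - tau*z - beta*x)^2 + penalty * beta^2 and return tau-hat."""
    x, z, y = centered(xs), centered(zs), centered(ys)
    szz = sum(a * a for a in z)
    sxx = sum(a * a for a in x)
    szx = sum(a * b for a, b in zip(z, x))
    szy = sum(a * b for a, b in zip(z, y))
    sxy = sum(a * b for a, b in zip(x, y))
    # Solve the 2x2 penalized normal equations for tau by Cramer's rule.
    a22 = sxx + penalty
    det = szz * a22 - szx * szx
    return (a22 * szy - szx * sxy) / det


if __name__ == "__main__":
    xs, zs, ys = simulate()
    for penalty in (0.0, 1e3, 1e9):
        print(f"penalty {penalty:g}: tau-hat = {treatment_estimate(xs, zs, ys, penalty):.3f}")
```

With no penalty the estimate is close to the true effect of 1; with a heavy penalty on the control term it is badly inflated, which is the same direction of bias described above for over-regularized µ(x).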

RIC Example of axis-aligned splits
Here, µ(x) = 1 above the diagonal, and −1 below. If these two regions
correspond to different rates of treatment, then a regularized model is likely to
overestimate the treatment effect.

Mitigating RIC
Including an estimate of the propensity score π(x) as a control covariate
dramatically reduces RIC.
This makes sure that π(x) is penalized equitably with changes in the
treatment variable Z.
From a Bayesian perspective, this is a variable transformation of
covariates, so we don’t need to worry about including uncertainty from
the estimation of π(x).
