Sponsored content in contextual bandits.
Deconfounding Targeting Not At Random
MIUE 2023
Hubert Drążkowski
GRAPE|FAME, Warsaw University of Technology
September 22, 2023
Motivational examples
Recommender systems
• Suggest the best ads/movies a ∈ {a_1, a_2, ..., a_K}
• Users X_1, X_2, ..., X_T
• Design of the study {n_{a_1}, n_{a_2}, ..., n_{a_K}}, with Σ_i n_{a_i} = T
• Measured satisfaction {R_t(a_1), ..., R_t(a_K)}_{t=1}^T
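As a toy illustration of this setup, the sketch below simulates such a study; the arm count, per-arm design sizes, and reward model are invented for the example, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 3, 1000                      # arms (ads/movies) and users
# Hypothetical design: fixed per-arm sample sizes n_{a_i} summing to T
n_per_arm = np.array([400, 300, 300])
assert n_per_arm.sum() == T

# Contexts X_1..X_T and one potential reward per arm (simulated here)
X = rng.normal(size=T)
true_means = np.array([0.2, 0.5, 0.3])
R = true_means[None, :] + 0.1 * X[:, None] + rng.normal(scale=0.1, size=(T, K))

# Assign users to arms according to the design and record satisfaction;
# only the chosen arm's reward is ever observed (bandit feedback)
A = np.repeat(np.arange(K), n_per_arm)
rng.shuffle(A)
observed = R[np.arange(T), A]
print(observed.mean())
```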
The framework
Exploration vs exploitation
• Allocating limited resources under uncertainty
• Sequential manner
• Partial feedback (bandit feedback)
• Adaptive (non-iid) data
• Maximizing cumulative gain
• Current actions do not change future environment
The elements of the bandit model
(see Lattimore and Szepesvári (2020))
• Context X_t ∈ X
  • X_t ∼ D_X
• Actions A_t ∈ A = {a_1, ..., a_K}
  • A_t ∼ π_t(a|x)
• Policy π ∈ Π
  • π = {π_t}_{t=1}^T
  • π_t : X → P(A), where P(A) := {q ∈ [0, 1]^K : Σ_{a∈A} q_a = 1}
• Rewards R_t ∈ ℝ_+
  • (R(a_1), R(a_2), ..., R(a_K)) and R_t = Σ_{k=1}^K 1(A_t = a_k) R(a_k)
  • R_t ∼ D_{R|A,X}
• History H_t ∈ ℋ_t
  • H_t = σ({(X_s, A_s, R_s)}_{s=1}^t)
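These elements fit together as a simple interaction loop. A minimal sketch, assuming an illustrative uniform policy and invented reward means (neither is the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 3, 1000

# Minimal contextual-bandit loop: at each round t, observe X_t, sample
# A_t ~ pi_t(.|X_t), and observe only the chosen arm's reward.
def pi_t(x, history):
    """A policy maps a context to a distribution over the K actions.
    Here: uniform exploration, ignoring the history (illustrative only)."""
    return np.full(K, 1.0 / K)

history = []                                    # H_t = {(X_s, A_s, R_s)}_{s=1}^t
for t in range(T):
    x = rng.normal()                            # X_t ~ D_X
    probs = pi_t(x, history)
    a = rng.choice(K, p=probs)                  # A_t ~ pi_t(.|x)
    rewards = np.array([0.2, 0.5, 0.3]) + 0.1 * rng.normal(size=K)
    r = rewards[a]                              # bandit feedback: only R_t(A_t)
    history.append((x, a, r))

print(len(history))
```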

Details
• In short, (X_t, A_t, R_t) ∼ D(π_t)
• We know π_t(a|x) (the propensity score)
• We don't know D_{X, R⃗}
• We have 1(A_t = a) ⊥⊥ R(a) | X_t
• We want to choose π to maximize E_{D(π)}[ Σ_{t=1}^T R_t(A_t) ]
The flow of information
Inverse Gap Weighting
1 Authoritarian Sponsor
2 Deconfounding
3 Experiment
4 Conclusions
Authoritarian Sponsor model
• The act of sponsoring
  • Recommender systems: marketing campaigns, testing products
  • Healthcare: funding experiments, lobbying doctors
• The sponsor (e, h) intervenes in an authoritarian manner:
  A_t = S_t Ã_t + (1 − S_t) Ā_t,
  S_t ∈ {0, 1}, S_t ∼ e_t(·|X),
  Ā_t ∼ π_t(·|X), Ã_t ∼ h_t(·|X),
  h̄_t(a|x) = e_t(1|x) h_t(a|x) + e_t(0|x) π_t(a|x).
• The lack of knowledge about the sponsor's policy (e, h)
  • Not sharing technology or strategy
  • Lost in human-to-algorithm translation
  • Hard-to-model processes, like auctions
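The intervention mechanism above can be sketched in a few lines. All policies here (the learner's softmax, the sponsor's gate probability and pushed arm) are invented placeholders, chosen only to make the mixture identity concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3

def learner_policy(x):            # pi_t(.|x): softmax over toy arm scores
    scores = np.array([0.0, x, -x])
    p = np.exp(scores - scores.max())
    return p / p.sum()

def sponsor_action_policy(x):     # h_t(.|x): sponsor pushes arm 0 (illustrative)
    return np.array([0.8, 0.1, 0.1])

def sponsor_gate(x):              # e_t(1|x): probability the sponsor intervenes
    return 0.3

def draw_action(x):
    s = rng.random() < sponsor_gate(x)           # S_t ~ e_t(.|X)
    probs = sponsor_action_policy(x) if s else learner_policy(x)
    return rng.choice(K, p=probs), s             # A_t = S_t*Ã_t + (1-S_t)*Ā_t

# Effective mixture policy h̄_t(a|x) = e_t(1|x) h_t(a|x) + e_t(0|x) pi_t(a|x)
def effective_policy(x):
    e1 = sponsor_gate(x)
    return e1 * sponsor_action_policy(x) + (1 - e1) * learner_policy(x)

a, s = draw_action(0.5)
print(a, s, effective_policy(0.5))
```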
Targeting mechanisms
Introducing an unobserved confounder Z
1 Targeting Completely At Random (TCAR):
  • S(X) = S, h(a|X, R, Z) = h(a)
  • analogous to MCAR
2 Targeting At Random (TAR):
  • S(X) = S(X), h(a|X, R, Z) = h(a|X)
  • analogous to MAR
3 Targeting Not At Random (TNAR):
  • h(a|X, R, Z) depends on (R, Z), so R(a) ̸⊥⊥ A | X, S = 1
  • analogous to MNAR
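The three mechanisms differ only in which arguments the sponsor's action policy actually uses. A hypothetical sketch (the logistic forms are invented; only the argument signatures matter):

```python
import numpy as np

K = 2

def h_tcar(x, r, z):              # TCAR: ignores context, rewards, confounder
    return np.full(K, 1 / K)

def h_tar(x, r, z):               # TAR: depends on the observed context only
    p = 1 / (1 + np.exp(-x))
    return np.array([p, 1 - p])

def h_tnar(x, r, z):              # TNAR: also depends on the unobserved z,
    p = 1 / (1 + np.exp(-(x + 2 * z)))   # so conditioning on X alone no longer
    return np.array([p, 1 - p])          # breaks the A–R(a) dependence on S=1
```

Under h_tnar, Z influences both the sponsored action and (as a confounder) the reward, which is exactly why R(a) ̸⊥⊥ A | X on the sponsor's rounds.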
Causal interpretation
Figure 1: TCAR
Figure 2: TAR
Figure 3: TNAR
Authoritarian Sponsor Deconfounding Experiment Conclusions References
Data fusion
(see Colnet et al. (2020))
                    RCT   OS
Internal validity    ✓
External validity          ✓
Propensity score     ✓     ?

Table 1: Differences and similarities between data sources
                    RCT   OS   Learner   Sponsor
Internal validity    ✓          ✓
External validity    ∼    ✓     ∼         ✓
Propensity score     ✓    ?     ✓         ?

Table 2: Differences and similarities between data sources
• Unsolved challenge: sampling in interaction!
CATE
• CATE:
  τ_{a_1,a_2}(x) = E_{D_{R|A,X=x}}[R(a_1) − R(a_2)]  and  τ̂_{a_1,a_2}(x) = μ̂_{a_1}(x) − μ̂_{a_2}(x)
• Assumptions
  • SUTVA: R_t = Σ_{a∈A} 1(A_t = a) R_t(a)
  • Ignorability: 1(A_t = a) ⊥⊥ R(a) | X_t, S_t = 0
  • Ignorability of the study participation: R_t(a) ⊥⊥ S_t | X_t
  • TNAR: R(a) ̸⊥⊥ A | X, S = 1
• Biased CATE on the sponsor sample:
  ρ_{a_1,a_2}(x) = E[R | A = a_1, X = x, S = 1] − E[R | A = a_2, X = x, S = 1]
• Bias measurement:
  η_{a_1,a_2}(x) = τ_{a_1,a_2}(x) − ρ_{a_1,a_2}(x)
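A minimal version of the plug-in estimate τ̂ = μ̂_{a_1} − μ̂_{a_0} on an unconfounded sample (a T-learner with quadratic fits; the data-generating process below is invented for the example, with true τ(x) = 1 + 2x + 0.75x²):

```python
import numpy as np

rng = np.random.default_rng(3)

# Unconfounded sample (S=0): fit one outcome model per arm and difference them
n = 4000
x = rng.uniform(-1, 1, size=n)
a = rng.integers(0, 2, size=n)
r = 1 + a + x + 2 * a * x + 0.5 * x**2 + 0.75 * a * x**2 + 0.5 * rng.normal(size=n)

def fit_mu(xs, rs, deg=2):
    """Quadratic regression as a stand-in for any outcome model mu_a-hat."""
    return np.poly1d(np.polyfit(xs, rs, deg))

mu0 = fit_mu(x[a == 0], r[a == 0])
mu1 = fit_mu(x[a == 1], r[a == 1])

def tau_hat(x0):
    return mu1(x0) - mu0(x0)      # tau-hat = mu1-hat - mu0-hat

print(tau_hat(0.0), tau_hat(1.0))
```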
Two-step deconfounding
(see Kallus et al. (2018)), A = {a_0, a_1}
1 On the observational (sponsor) sample, use a metalearner to obtain ρ̂_{a_1,a_0}(X).
2 Postulate a function q_t(X, a_0) such that E[q_t(X, a_0) R | X = x, S = 0] = τ_{a_1,a_0}(x), where
  q_t(X, a_0) = 1(A = a_1)/π_t(a_1|X) − 1(A = a_0)/π_t(a_0|X).
3 Using q_t(X, a_0), apply the definition of η_{a_1,a_0}(x) to adjust the ρ̂ term by solving an optimization problem on the unconfounded sample:
  η̂_{a_1,a_0} = arg min_η Σ_{t: S_t=0} ( q_t(x_t, a_0) r_t − ρ̂_{a_1,a_0}(x_t) − η(x_t) )².
4 Finally, τ̂_{a_1,a_0}(x) = ρ̂_{a_1,a_0}(x) + η̂_{a_1,a_0}(x).
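The four steps can be sketched end to end on toy data. Everything below is illustrative: quadratic regressions stand in for the metalearner, a hidden u plays the confounder, and the learner's propensity is a known 1/2:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6000
true_tau = lambda x: 1 + 2 * x + 0.75 * x**2
fit = lambda xs, rs: np.poly1d(np.polyfit(xs, rs, 2))

# Confounded sponsor sample (S=1): unobserved u shifts both action and reward
x1 = rng.normal(size=n)
u1 = rng.normal(size=n)
a1 = (x1 + 2 * u1 + rng.normal(size=n) > 0).astype(int)
r1 = 1 + a1 + x1 + 2*a1*x1 + 0.5*x1**2 + 0.75*a1*x1**2 + 2*u1

# Step 1: naive (biased) CATE rho-hat on the confounded sample
rho_hat = lambda x: fit(x1[a1 == 1], r1[a1 == 1])(x) - fit(x1[a1 == 0], r1[a1 == 0])(x)

# Unconfounded sample (S=0): learner randomizes with known propensity 1/2
x0 = rng.normal(size=n)
a0 = rng.integers(0, 2, size=n)
r0 = 1 + a0 + x0 + 2*a0*x0 + 0.5*x0**2 + 0.75*a0*x0**2 + rng.normal(size=n)

# Step 2: q weights, so that E[q R | X, S=0] = tau(X)
q = a0 / 0.5 - (1 - a0) / 0.5

# Step 3: regress the residual q*r - rho_hat(x) on x to estimate eta
eta_hat = fit(x0, q * r0 - rho_hat(x0))

# Step 4: corrected CATE
tau_hat = lambda x: rho_hat(x) + eta_hat(x)
print(tau_hat(0.0), true_tau(0.0))
```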
Deconfounded CATE IGW (D-CATE-IGW)
• Let b = arg max_a μ̂_a(x_t). Then

  π(a|x) = 1 / (K + γ_m (μ̂^m_b(x) − μ̂^m_a(x))) = 1 / (K + γ_m τ̂_{b,a}(x))   for a ≠ b,
  π(b|x) = 1 − Σ_{c≠b} π(c|x).

• Each round/epoch, deconfound the CATE
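The inverse-gap-weighting step itself is a few lines; a sketch, with an invented exploration parameter gamma and toy reward estimates (in D-CATE-IGW the gaps would be the deconfounded τ̂_{b,a}(x)):

```python
import numpy as np

def igw_probs(mu_hat, gamma):
    """Inverse gap weighting: turn per-arm reward estimates at a context x
    into action probabilities. Larger gaps to the leader -> less play."""
    K = len(mu_hat)
    b = int(np.argmax(mu_hat))
    gaps = mu_hat[b] - mu_hat          # tau_hat(b, a; x) for every arm a
    p = 1.0 / (K + gamma * gaps)       # the a != b case
    p[b] = 0.0
    p[b] = 1.0 - p.sum()               # the leader gets the remaining mass
    return p

p = igw_probs(np.array([0.1, 0.5, 0.3]), gamma=10.0)
print(p)
```

Since each non-leader probability is at most 1/K, the leader's remainder is always positive, so this is a valid distribution for any gamma ≥ 0.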
Setup I
• S_t ∼ Bern(ρ)
• No overlap scenario:
  X_t | S_t = 0 ∼ Unif([−1, 1]),  U_t | S_t = 0 ∼ N(0, 1)
• Full overlap scenario:
  X_t | S_t = 0 ∼ N(0, 1),  U_t | S_t = 0 ∼ N(0, 1)
• Sponsor sample:
  (X_t, U_t) | {A_t, S_t = 1} ∼ N( (0, 0), [[1, (2A_t − 1)σ_A], [(2A_t − 1)σ_A, 1]] )
• σ_A ∈ {0.6, 0.9}, ρ ∈ {0.3, 0.6}
• Rewards:
  R_t(A_t) = 1 + A_t + X_t + 2 A_t X_t + (1/2) X_t² + (3/4) A_t X_t² + 2 U_t + (1/2) ε_t,
  where ε_t ∼ N(0, 1), so τ(X_t) = (3/4) X_t² + 2 X_t + 1.
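A sketch of this data-generating process (full-overlap scenario). One detail is assumed, since the slide leaves it open: how A_t is drawn on sponsor rounds; here it is a symmetric coin flip, with (X_t, U_t) then drawn from the stated A_t-dependent bivariate normal:

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(T=5000, rho=0.3, sigma_a=0.6):
    s = rng.random(T) < rho                      # S_t ~ Bern(rho)
    x = np.empty(T); u = np.empty(T); a = np.empty(T, dtype=int)

    # Learner rounds (S=0): independent context/confounder, random actions
    n0 = int((~s).sum())
    x[~s] = rng.normal(size=n0)
    u[~s] = rng.normal(size=n0)
    a[~s] = rng.integers(0, 2, size=n0)

    # Sponsor rounds (S=1): (X, U) correlated, sign of the correlation
    # (2A_t - 1) * sigma_a depends on the sponsored action
    n1 = int(s.sum())
    a[s] = rng.integers(0, 2, size=n1)           # assumption: symmetric draw
    rho_xu = (2 * a[s] - 1) * sigma_a
    z1 = rng.normal(size=n1); z2 = rng.normal(size=n1)
    x[s] = z1
    u[s] = rho_xu * z1 + np.sqrt(1 - rho_xu**2) * z2

    eps = rng.normal(size=T)
    r = 1 + a + x + 2*a*x + 0.5*x**2 + 0.75*a*x**2 + 2*u + 0.5*eps
    return x, u, a, r, s

x, u, a, r, s = simulate()
# On sponsor rounds, X and U are correlated given A, confounding the rewards
print(np.corrcoef(x[s & (a == 1)], u[s & (a == 1)])[0, 1])
```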
Result I
Figure 4: Normed cumulative regret for different scenarios
Result II
Figure 5: True and estimated CATE values for different scenarios
Contribution
1 A pioneering model for sponsored content in the contextual bandit framework
2 Bandits treated not as experimental studies but as observational studies
3 A confounding scenario and a deconfounding application
4 D-CATE-IGW works
Future research
• Theoretical
  • Mathematically model the complicated sampling, especially the flow of information
  • A consistency proof for the CATE estimator in this scenario
  • High-probability regret bounds for D-CATE-IGW: P(REWARD(π) ≥ BOUND(δ)) ≥ 1 − δ
• Empirical
  • More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
  • Other deconfounding methods (see Wu and Yang (2022))
  • A more comprehensive empirical study
Expansion
• Policy evaluation:
  V(π) = E_X E_{A∼π(·|X)} E_{R|A,X}[R],
  estimating V̂_t(π) on {(X_s, A_s, R_s)}_{s=1}^t ∼ D(h̄)
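For the policy-evaluation direction, the standard starting point when propensities are known is inverse propensity scoring (IPS). A toy sketch, with invented logging and target policies (this is the classical estimator, not the paper's proposal for the sponsored setting):

```python
import numpy as np

rng = np.random.default_rng(6)
K, T = 2, 20000

def logging_policy(x):            # behaviour policy with known propensities
    return np.array([0.5, 0.5])

def target_policy(x):             # the policy pi whose value V(pi) we want
    return np.array([0.2, 0.8]) if x > 0 else np.array([0.8, 0.2])

x = rng.normal(size=T)
a = np.array([rng.choice(K, p=logging_policy(xi)) for xi in x])
r = (a == (x > 0)).astype(float)  # the arm matching the sign of x pays 1

# IPS: V_hat = mean of pi(A|X)/b(A|X) * R over the logged triples
w = np.array([target_policy(xi)[ai] / logging_policy(xi)[ai]
              for xi, ai in zip(x, a)])
v_hat = float(np.mean(w * r))
print(v_hat)                      # approaches E_pi[R] = 0.8 for large T
```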
Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology
Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 24 / 26
Authoritarian Sponsor Deconfounding Experiment Conclusions References
Future research
• Theoretical
• Mathematically model the complicated sampling. Especially the flow of information
• Consistency proof of CATE estimator in this scenario
• High probability regret bounds on D-CATE-IGW P(REWARD(π)  BOUND(δ))  1 − δ
• Empirical
• More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
• Other deconfounding methods (see Wu and Yang (2022))
• A more comprehensive empirical study
Expansion
• Policy evaluation
V (π) = EX EA∼π(·|X)ER|A,X [R]
b
Vt(π) on {(Xs, As, Rs)}t
s=1 ∼ D(H
 )
Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology
Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 24 / 26
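The policy-evaluation expansion above — estimating V(π) = E_X E_{A∼π(·|X)} E_{R|A,X}[R] from logged bandit data — can be sketched with an inverse-propensity-weighted estimator, which is well defined here because the deck assumes the logged propensities π_t(a|x) are known. This is only a minimal illustration of such a V̂_t(π), not the deck's estimator; all names are hypothetical.

```python
import numpy as np

def ipw_value_estimate(X, A, R, logged_prop, target_policy):
    """IPW estimate of V(pi) from logged data {(X_s, A_s, R_s)}.

    X: contexts; A: logged actions; R: observed rewards;
    logged_prop: probability the logging policy assigned to each logged action;
    target_policy: function x -> probability vector pi(.|x) over the K arms.
    """
    t = len(R)
    # pi(A_s | X_s): how likely the target policy is to repeat each logged action
    pi_a = np.array([target_policy(X[s])[A[s]] for s in range(t)])
    # importance-weighted average of the logged rewards
    return float(np.mean(pi_a / np.asarray(logged_prop) * np.asarray(R)))
```

Under the sponsor model of the talk, the logged propensity would be the mixture π̃_t(a|x) induced by the sponsor's interventions rather than the learner's own π_t(a|x), which is exactly where the deconfounding question enters.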
The beginning ...
References
Colnet, B., I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.-P. Vert, J. Josse, and S. Yang (2020). Causal inference methods for combining randomized trials and observational studies: a review. arXiv preprint arXiv:2011.08047.
Kallus, N., A. M. Puli, and U. Shalit (2018). Removing hidden confounding by experimental grounding. Advances in Neural Information Processing Systems 31.
Künzel, S. R., J. S. Sekhon, P. J. Bickel, and B. Yu (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences 116(10), 4156–4165.
Lattimore, T. and C. Szepesvári (2020). Bandit Algorithms. Cambridge University Press.
Wu, L. and S. Yang (2022). Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. In Conference on Causal Learning and Reasoning, pp. 904–926. PMLR.

Sponsored content in contextual bandits. Deconfounding targeting not at random

  • 1. Authoritarian Sponsor Deconfounding Experiment Conclusions References Sponsored content in contextual bandits. Deconfounding Targeting Not At Random MIUE 2023 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology September 22, 2023 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 1 / 26
  • 2. Authoritarian Sponsor Deconfounding Experiment Conclusions References Motivational examples Recommender systems • Suggest best ads/movies a ∈ {a1, a2, ...aK } • Users X1, X2, ...., XT • Design of the study {na1 , na2 , ..., naK }, P i nai = T • Measured satisfaction {Rt(a1), ...Rt(aK )}T t=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 2 / 26
  • 3. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 4. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 5. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 6. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 7. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 8. Authoritarian Sponsor Deconfounding Experiment Conclusions References The framework Exploration vs exploitation • Allocating limited resources under uncertainty • Sequential manner • Partial feedback (bandit feedback) • Adaptive (non-iid) data • Maximizing cumulative gain • Current actions do not change future environment Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 3 / 26
  • 9. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 10. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 11. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 12. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 13. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 14. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 15. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 16. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 17. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 18. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 19. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 20. Authoritarian Sponsor Deconfounding Experiment Conclusions References The elements of the bandit model (see Lattimore and Szepesvári (2020)) • Context Xt ∈ X • Xt ∼ DX • Actions At ∈ A = {a1, ..., K} • At ∼ πt (a|x) • Policy π ∈ Π • π = {πt }T t=1 • πt : X 7→ P(A), where P(A) := {q ∈ [0, 1]K : P a∈A qa = 1} • Rewards Rt ∈ R+ • (R(a1), R(a2), ..., R(ak )) and Rt = PK k=1 1(At = ak )R(ak ) • Rt ∼ DR|A,X • History Ht ∈ Ht • Ht = σ {(Xs , As , Rs )}t s=1 Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 4 / 26
  • 21. Authoritarian Sponsor Deconfounding Experiment Conclusions References Details • In short (Xt, At, Rt) ∼ D(πt) • We know πt(a|x) (propensity score) • We don’t know DX,⃗ R • We have 1(At = a) ⊥ ⊥ R(a)|Xt • We want to maximize with π ED(π) T X t=1 Rt(At) # Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 5 / 26
The flow of information
[Figure: information-flow diagram]
Inverse Gap Weighting
[Figure: inverse gap weighting illustration]
Outline
1 Authoritarian Sponsor
2 Deconfounding
3 Experiment
4 Conclusions
Authoritarian Sponsor model
• The act of sponsoring
  • Recommender systems: marketing campaigns, testing products
  • Healthcare: funding experiments, lobbying doctors
• The sponsor (e, h) intervenes in an authoritarian manner:
  A_t = S_t Ã_t + (1 − S_t) Ā_t, S_t ∈ {0, 1}, S_t ∼ e_t(·|X),
  Ā_t ∼ π_t(·|X), Ã_t ∼ h_t(·|X),
  so the effective policy is π̃_t(a|x) = e_t(1|x) h_t(a|x) + e_t(0|x) π_t(a|x).
• The sponsor's policy (e, h) is unknown:
  • not sharing technology or strategy
  • lost in human-to-algorithm translation
  • hard-to-model processes such as auctions
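The override mechanism can be sketched directly. A minimal simulation, with made-up gate, sponsor, and learner policies (the concrete numbers are illustrative assumptions, not from the slides); the empirical action frequencies recover the effective mixture policy:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3

def learner_policy(x):   # pi_t(.|x): uniform for illustration
    return np.full(K, 1.0 / K)

def sponsor_gate(x):     # e_t(1|x): probability the sponsor overrides
    return 0.3

def sponsor_policy(x):   # h_t(.|x): sponsor pushes arm 0
    return np.array([0.8, 0.1, 0.1])

def observed_action(x):
    s = rng.binomial(1, sponsor_gate(x))      # S_t ~ e_t(.|X)
    probs = sponsor_policy(x) if s else learner_policy(x)
    a = rng.choice(K, p=probs)                # A_t = S_t*A~_t + (1-S_t)*A-_t
    return s, a

def effective_policy(x):
    # pi~_t(a|x) = e_t(1|x) h_t(a|x) + e_t(0|x) pi_t(a|x)
    e1 = sponsor_gate(x)
    return e1 * sponsor_policy(x) + (1 - e1) * learner_policy(x)

draws = np.array([observed_action(0.0)[1] for _ in range(20000)])
freq = np.bincount(draws, minlength=K) / len(draws)
assert np.allclose(freq, effective_policy(0.0), atol=0.02)
```

The learner observes A_t but, crucially, neither S_t nor (e, h), which is what makes the logged data behave like an observational study.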
Targeting mechanisms
Introducing an unobserved confounder Z:
1 Targeting Completely At Random (TCAR)
  • S(X) = S, h(a|X, R, Z) = h(a)
  • analogous to MCAR
2 Targeting At Random (TAR)
  • S(X) = S(X), h(a|X, R, Z) = h(a|X)
  • analogous to MAR
3 Targeting Not At Random (TNAR)
  • h(a|X, R, Z) depends on (R, Z) ⇒ R(a) ̸⊥⊥ A | X, S = 1
  • analogous to MNAR
Causal interpretation
Figure 1: TCAR. Figure 2: TAR. Figure 3: TNAR.
Data fusion (see Colnet et al. (2020))

Table 1: Differences and similarities between data sources

                     RCT     OS
  Internal validity  ✓       ✗
  External validity  ✗       ✓
  Propensity score   known   ?
Data fusion (see Colnet et al. (2020))

Table 2: Differences and similarities between data sources

                     RCT     OS      Learner  Sponsor
  Internal validity  ✓       ✗       ✓        ✗
  External validity  ✗       ✓       ∼        ∼
  Propensity score   known   ?       known    ?

• Unsolved challenge: sampling in interaction!
CATE
• CATE: τ_{a1,a2}(x) = E_{D_{R|A,X=x}}[R(a1) − R(a2)], estimated by τ̂_{a1,a2}(x) = μ̂_{a1}(x) − μ̂_{a2}(x)
• Assumptions:
  • SUTVA: R_t = Σ_{a∈A} 1(A_t = a) R_t(a)
  • Ignorability: 1(A_t = a) ⊥⊥ R(a) | X_t, S_t = 0
  • Ignorability of study participation: R_t(a) ⊥⊥ S_t | X_t
  • TNAR: R(a) ̸⊥⊥ A | X, S = 1
• Biased CATE on the sponsored sample:
  ρ_{a1,a2}(x) = E[R | A = a1, X = x, S = 1] − E[R | A = a2, X = x, S = 1]
• Bias measurement: η_{a1,a2}(x) = τ_{a1,a2}(x) − ρ_{a1,a2}(x)
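Why ρ differs from τ is easy to see numerically. A minimal sketch with an invented data-generating process (constant true CATE, sponsor targeting on the unobserved confounder U; all numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
tau_true = 1.0                    # constant true CATE for simplicity

u = rng.normal(size=n)            # unobserved confounder
# Sponsored sample: the arm choice leaks information about U.
a = (u + 0.5 * rng.normal(size=n) > 0).astype(int)
r = tau_true * a + 2.0 * u + 0.5 * rng.normal(size=n)

# Naive contrast on the sponsored (S=1) sample: rho, not tau.
rho = r[a == 1].mean() - r[a == 0].mean()
eta = tau_true - rho              # the bias term eta = tau - rho
```

Here the naive estimate ρ picks up the U-channel (arm 1 is shown to high-U users), so η is far from zero even though the true effect is constant.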
Two-step deconfounding (see Kallus et al. (2018)), A = {a0, a1}
1 On the observational sample, use a metalearner to obtain ρ̂_{a1,a0}(X).
2 Postulate a function q_t(X, a0) such that E[q_t(X, a0) R | X = x, S = 0] = τ_{a1,a0}(x), where
  q_t(X, a0) = 1(A = a1)/π_t(a1|X) − 1(A = a0)/π_t(a0|X).
3 Using q_t(X, a0), apply the definition of η_{a1,a0}(x) to adjust the ρ̂ term by solving an optimization problem on the unconfounded sample:
  η̂_{a1,a0}(X) = argmin_η Σ_{t: S_t = 0} (q_t(x_t, a0) r_t − ρ̂_{a1,a0}(x_t) − η(x_t))².
4 Finally, τ̂_{a1,a0}(x) = ρ̂_{a1,a0}(x) + η̂_{a1,a0}(x).
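The four steps above can be sketched end to end. A minimal numpy-only version, under simplifying assumptions not taken from the slides: a linear T-learner in step 1, a linear class for η, a known propensity π = 1/2 on the unconfounded sample, and an invented DGP with τ(x) = 1 + 2x:

```python
import numpy as np

rng = np.random.default_rng(3)

def tau(x):                        # ground-truth CATE (hypothetical DGP)
    return 1.0 + 2.0 * x

def reward(x, a, u):
    return 1.0 + x + tau(x) * a + 2.0 * u + 0.1 * rng.normal(size=x.shape)

# Confounded (sponsored, S=1) sample: A depends on the unobserved U.
n = 50_000
xc, uc = rng.normal(size=n), rng.normal(size=n)
ac = (uc > 0).astype(int)
rc = reward(xc, ac, uc)

# Step 1: metalearner (here a T-learner with linear fits) for rho_hat.
fit1 = np.polyfit(xc[ac == 1], rc[ac == 1], 1)
fit0 = np.polyfit(xc[ac == 0], rc[ac == 0], 1)
rho_hat = lambda x: np.polyval(fit1, x) - np.polyval(fit0, x)

# Unconfounded (learner, S=0) sample with known propensity pi = 0.5.
xe, ue = rng.normal(size=n), rng.normal(size=n)
ae = rng.binomial(1, 0.5, size=n)
re = reward(xe, ae, ue)

# Step 2: IPW contrast q_t = 1(A=a1)/pi(a1|X) - 1(A=a0)/pi(a0|X).
q = np.where(ae == 1, 2.0, -2.0)

# Step 3: least-squares fit of the correction eta_hat on the S=0 data.
eta_fit = np.polyfit(xe, q * re - rho_hat(xe), 1)
eta_hat = lambda x: np.polyval(eta_fit, x)

# Step 4: deconfounded CATE.
tau_hat = lambda x: rho_hat(x) + eta_hat(x)

grid = np.linspace(-1, 1, 5)
assert np.max(np.abs(tau_hat(grid) - tau(grid))) < 0.2   # corrected
assert np.max(np.abs(rho_hat(grid) - tau(grid))) > 1.0   # naive is biased
```

The point of the construction: ρ̂ is precise but biased (cheap confounded data), the q-weighted pseudo-outcome is unbiased but noisy, and fitting only the residual η combines the two.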
Deconfounded CATE IGW (D-CATE-IGW)
• Let b = argmax_a μ̂_a(x_t). Then

  π(a|x) = 1 / (K + γ_m (μ̂^m_b(x) − μ̂^m_a(x)))   for a ≠ b
         = 1 − Σ_{c≠b} π(c|x)                      for a = b,

  i.e. for a ≠ b, π(a|x) = 1 / (K + γ_m τ̂_{b,a}(x)).
• Each round/epoch, deconfound the CATE.
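The inverse-gap-weighting map from estimates to a probability vector is short enough to write out. A sketch (function name and the example numbers are illustrative):

```python
import numpy as np

def igw(mu_hat, gamma):
    """Inverse gap weighting: arms with a larger estimated gap to the
    greedy arm b get proportionally less exploration probability."""
    K = len(mu_hat)
    b = int(np.argmax(mu_hat))          # greedy arm b = argmax_a mu_hat_a(x)
    gaps = mu_hat[b] - mu_hat           # tau_hat_{b,a}(x) >= 0
    p = 1.0 / (K + gamma * gaps)        # the a != b branch
    p[b] = 0.0
    p[b] = 1.0 - p.sum()                # remaining mass goes to b
    return p

p = igw(np.array([0.1, 0.5, 0.3]), gamma=10.0)
```

Since 1/(K + γ·gap) ≤ 1/K for each of the K − 1 non-greedy arms, the leftover mass for b is always nonnegative, so this is a valid distribution; larger γ_m means greedier play.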
Setup I
• S_t ∼ Bern(ρ)
• No-overlap scenario: X_t | S_t = 0 ∼ Unif([−1, 1]), U_t | S_t = 0 ∼ N(0, 1)
• Full-overlap scenario: X_t | S_t = 0 ∼ N(0, 1), U_t | S_t = 0 ∼ N(0, 1)
• On the sponsored sample,
  (X_t, U_t) | {A_t, S_t = 1} ∼ N( (0, 0), [[1, (2A_t − 1)σ_A], [(2A_t − 1)σ_A, 1]] )
• σ_A ∈ {0.6, 0.9}, ρ ∈ {0.3, 0.6}
• Rewards: R_t(A_t) = 1 + A_t + X_t + 2 A_t X_t + X_t²/2 + (3/4) A_t X_t² + 2 U_t + ε_t/2, where ε ∼ N(0, 1)
• True CATE: τ(X_t) = (3/4) X_t² + 2 X_t + 1
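The generating process above can be simulated directly. A sketch, with one simplifying assumption the slide leaves unspecified: the arm A_t is drawn as a fair coin on both slices; the correlated (X, U) pair on the sponsored slice is built by a Cholesky-style construction:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(n, rho=0.3, sigma_a=0.6, overlap=True):
    """Draw (X, U, S, A, R) from the Setup I generating process."""
    s = rng.binomial(1, rho, size=n)        # S_t ~ Bern(rho)
    a = rng.binomial(1, 0.5, size=n)        # arm assignment (simplifying assumption)
    x, u = np.empty(n), np.empty(n)

    m0 = s == 0                             # learner slice
    x[m0] = rng.normal(size=m0.sum()) if overlap else rng.uniform(-1, 1, size=m0.sum())
    u[m0] = rng.normal(size=m0.sum())

    m1 = ~m0                                # sponsored slice: Corr(X, U) = (2A-1)*sigma_a
    c = (2 * a[m1] - 1) * sigma_a
    z1, z2 = rng.normal(size=m1.sum()), rng.normal(size=m1.sum())
    x[m1] = z1
    u[m1] = c * z1 + np.sqrt(1 - c**2) * z2

    eps = rng.normal(size=n)
    r = 1 + a + x + 2*a*x + x**2/2 + 3*a*x**2/4 + 2*u + eps/2
    return x, u, s, a, r

x, u, s, a, r = simulate(100_000)
corr1 = np.corrcoef(x[(s == 1) & (a == 1)], u[(s == 1) & (a == 1)])[0, 1]
```

On the sponsored slice the arm flips the sign of the X–U correlation, which is exactly the TNAR confounding the experiment is designed to exhibit; elsewhere X and U are independent.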
Result I
Figure 4: Normed cumulative regret for different scenarios
Result II
Figure 5: True and estimated CATE values for different scenarios
Contribution
1 A pioneering model for sponsored content in the contextual bandit framework
2 Bandits treated not as experimental studies but as observational studies
3 A confounding scenario and an application of deconfounding
4 D-CATE-IGW works
Future research
• Theoretical
  • Mathematically model the complicated sampling, especially the flow of information
  • Consistency proof of the CATE estimator in this scenario
  • High-probability regret bounds for D-CATE-IGW: P(REWARD(π) ≥ BOUND(δ)) ≥ 1 − δ
• Empirical
  • More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
  • Other deconfounding methods (see Wu and Yang (2022))
  • A more comprehensive empirical study
• Expansion
  • Policy evaluation: V(π) = E_X E_{A∼π(·|X)} E_{R|A,X}[R], with V̂_t(π) computed on {(X_s, A_s, R_s)}_{s=1}^t drawn under the effective (sponsor-mixed) policy
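A standard off-policy route to V̂_t(π) is inverse propensity scoring against the logging policy. A minimal context-free sketch (the arm means and both policies are invented for illustration; in the sponsored setting the logging distribution would be the effective sponsor-mixed policy, which is precisely what the learner does not fully know):

```python
import numpy as np

rng = np.random.default_rng(5)
K, n = 3, 200_000

means = np.array([0.2, 0.5, 0.8])    # E[R | A = a], context-free for brevity
logging = np.array([0.5, 0.3, 0.2])  # policy that generated the log
target = np.array([0.1, 0.1, 0.8])   # pi whose value we want

a = rng.choice(K, size=n, p=logging)         # logged actions
r = means[a] + 0.1 * rng.normal(size=n)      # logged rewards

# IPS estimator: V_hat(pi) = mean( pi(A|X) / logging(A|X) * R ).
v_hat = np.mean(target[a] / logging[a] * r)
v_true = float(target @ means)
```

The estimator is unbiased when the logging propensities are known and positive on every arm the target policy plays, which is exactly why the sponsor's unknown propensities make this an open problem here.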
The beginning ...
References
Colnet, B., I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.-P. Vert, J. Josse, and S. Yang (2020). Causal inference methods for combining randomized trials and observational studies: a review. arXiv preprint arXiv:2011.08047.
Kallus, N., A. M. Puli, and U. Shalit (2018). Removing hidden confounding by experimental grounding. Advances in Neural Information Processing Systems 31.
Künzel, S. R., J. S. Sekhon, P. J. Bickel, and B. Yu (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences 116(10), 4156–4165.
Lattimore, T. and C. Szepesvári (2020). Bandit Algorithms. Cambridge University Press.
Wu, L. and S. Yang (2022). Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. In Conference on Causal Learning and Reasoning, pp. 904–926. PMLR.