Authoritarian Sponsor Deconfounding Experiment Conclusions References
Sponsored content in contextual bandits.
Deconfounding Targeting Not At Random
MIUE 2023
Hubert Drążkowski
GRAPE|FAME, Warsaw University of Technology
September 22, 2023
Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology
Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 1 / 26
Motivational examples
Recommender systems
• Suggest best ads/movies $a \in \{a_1, a_2, \ldots, a_K\}$
• Users $X_1, X_2, \ldots, X_T$
• Design of the study $\{n_{a_1}, n_{a_2}, \ldots, n_{a_K}\}$, $\sum_i n_{a_i} = T$
• Measured satisfaction $\{R_t(a_1), \ldots, R_t(a_K)\}_{t=1}^{T}$
The framework
Exploration vs exploitation
• Allocating limited resources under uncertainty
• Sequential manner
• Partial feedback (bandit feedback)
• Adaptive (non-iid) data
• Maximizing cumulative gain
• Current actions do not change future environment
The elements of the bandit model
(see Lattimore and Szepesvári (2020))
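• Context $X_t \in \mathcal{X}$
  • $X_t \sim D_X$
• Actions $A_t \in \mathcal{A} = \{a_1, \ldots, a_K\}$
  • $A_t \sim \pi_t(a \mid x)$
• Policy $\pi \in \Pi$
  • $\pi = \{\pi_t\}_{t=1}^{T}$
  • $\pi_t : \mathcal{X} \mapsto \mathcal{P}(\mathcal{A})$, where $\mathcal{P}(\mathcal{A}) := \{q \in [0, 1]^K : \sum_{a \in \mathcal{A}} q_a = 1\}$
• Rewards $R_t \in \mathbb{R}_+$
  • $(R(a_1), R(a_2), \ldots, R(a_K))$ and $R_t = \sum_{k=1}^{K} 1(A_t = a_k) R(a_k)$
  • $R_t \sim D_{R \mid A, X}$
• History $H_t \in \mathcal{H}_t$
  • $H_t = \sigma\left(\{(X_s, A_s, R_s)\}_{s=1}^{t}\right)$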
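
The elements above can be read as one interaction loop. The following is a minimal sketch, not the author's implementation: the uniform policy, the uniform context distribution, and the reward model in `potential_rewards` are all hypothetical choices made only so that the loop runs.

```python
import random


def uniform_policy(x, actions):
    """Hypothetical pi_t(.|x): uniform over the actions, ignoring the context."""
    return {a: 1.0 / len(actions) for a in actions}


def potential_rewards(x, actions, rng):
    """Hypothetical reward model: one nonnegative potential reward R(a) per action."""
    return {a: max(0.0, x * (i + 1) + rng.gauss(0.0, 0.1))
            for i, a in enumerate(actions)}


def run_bandit(T, actions, policy, seed=0):
    rng = random.Random(seed)
    history = []  # the observed triples (X_s, A_s, R_s) that generate H_t
    for _ in range(T):
        x = rng.uniform(0.0, 1.0)            # X_t ~ D_X (assumed uniform here)
        probs = policy(x, actions)           # pi_t(.|x), a point in P(A)
        a = rng.choices(actions, weights=[probs[b] for b in actions])[0]
        r_all = potential_rewards(x, actions, rng)  # (R(a_1), ..., R(a_K))
        # Bandit feedback: R_t = sum_k 1(A_t = a_k) R(a_k); other rewards stay hidden.
        r = sum((a == b) * r_all[b] for b in actions)
        history.append((x, a, r))
    return history


history = run_bandit(50, ["a1", "a2", "a3"], uniform_policy)
```

Only the reward of the chosen action enters `history`, which is what makes the feedback partial.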
Details
• In short $(X_t, A_t, R_t) \sim D(\pi_t)$
• We know $\pi_t(a \mid x)$ (propensity score)
• We don't know $D_{X, \vec{R}}$
• We have $1(A_t = a) \perp\!\!\!\perp R(a) \mid X_t$
• We want to maximize with $\pi$
$$\mathbb{E}_{D(\pi)}\left[\sum_{t=1}^{T} R_t(A_t)\right]$$
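
A short worked step may help show why a known propensity score together with the conditional independence above is useful. This is the standard inverse-propensity identity, written here as a sketch under the extra assumption that $\pi_t(a \mid x) > 0$; it is not taken from the slides.

```latex
\begin{align*}
\mathbb{E}\left[\frac{1(A_t = a)}{\pi_t(a \mid X_t)} R_t \,\Big|\, X_t = x\right]
  &= \frac{\mathbb{E}\left[1(A_t = a)\, R(a) \mid X_t = x\right]}{\pi_t(a \mid x)}
     && \text{since } 1(A_t = a) R_t = 1(A_t = a) R(a) \\
  &= \frac{\mathbb{E}\left[1(A_t = a) \mid X_t = x\right]\,
           \mathbb{E}\left[R(a) \mid X_t = x\right]}{\pi_t(a \mid x)}
     && \text{by } 1(A_t = a) \perp\!\!\!\perp R(a) \mid X_t \\
  &= \mathbb{E}\left[R(a) \mid X_t = x\right]
     && \text{since } \mathbb{E}\left[1(A_t = a) \mid X_t = x\right] = \pi_t(a \mid x).
\end{align*}
```

So the reward of an action that was not played can still be estimated without bias from the logged data, as long as the propensity is known and positive.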
The flow of information
Inverse Gap Weighting
1 Authoritarian Sponsor
2 Deconfounding
3 Experiment
4 Conclusions
Authoritarian Sponsor model
• The act of sponsoring
• Recommender system - marketing campaigns, testing products
• Healthcare - funding experiments, lobbying doctors
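• The sponsor $(\text{€}, \tilde{H})$ intervenes in an authoritarian manner
$$A_t = S_t \tilde{A}_t + (1 - S_t) \bar{A}_t,$$
$$S_t \in \{0, 1\}, \quad S_t \sim \text{€}(\cdot \mid X),$$
$$\bar{A}_t \sim \pi_t(\cdot \mid X), \quad \tilde{A}_t \sim \tilde{H}_t(\cdot \mid X),$$
$$H_t(a \mid x) = \text{€}_t(1 \mid x)\, \tilde{H}_t(a \mid x) + \text{€}_t(0 \mid x)\, \pi_t(a \mid x).$$
• The lack of knowledge about sponsor's policy $(\text{€}, \tilde{H})$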
• Not sharing technology or strategy
• Lost in human to algorithm translation
• Hard to model process like auctions
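
The intervention $A_t = S_t \tilde{A}_t + (1 - S_t) \bar{A}_t$ can be sketched as a two-stage draw. This is an illustrative sketch only: the three policy functions and their numeric values are hypothetical, and in the setting of the talk the learner would not get to see the sponsor's two policies.

```python
import random


def select(x):
    """Hypothetical sponsor selection policy: P(S_t = 1 | x)."""
    return 0.3


def sponsor(x):
    """Hypothetical sponsor action policy over two actions."""
    return {"a0": 0.1, "a1": 0.9}


def learner(x):
    """Hypothetical learner policy pi_t(.|x)."""
    return {"a0": 0.5, "a1": 0.5}


def mixture_prob(a, x, select, sponsor, learner):
    """P(A_t = a | x) = P(S=1|x) * sponsor(a|x) + P(S=0|x) * pi_t(a|x)."""
    s1 = select(x)
    return s1 * sponsor(x)[a] + (1.0 - s1) * learner(x)[a]


def draw_action(x, select, sponsor, learner, rng):
    """Return (A_t, S_t) with A_t = S_t * A~_t + (1 - S_t) * A-_t."""
    def sample(probs):
        actions = list(probs)
        return rng.choices(actions, weights=[probs[a] for a in actions])[0]

    s = 1 if rng.random() < select(x) else 0
    a_sponsor = sample(sponsor(x))  # A~_t, the sponsor's pick
    a_learner = sample(learner(x))  # A-_t, the learner's pick
    return (a_sponsor if s == 1 else a_learner), s


rng = random.Random(0)
action, sponsored = draw_action(0.0, select, sponsor, learner, rng)
```

With these numbers the played action is `a1` with probability 0.3 * 0.9 + 0.7 * 0.5 = 0.62, which is what `mixture_prob` returns.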
Targeting mechanisms
Introducing an unobserved confounder Z
1 Targeting Completely At Random (TCAR):
  • $S(X) = S$, $\tilde{H}(a \mid X, R, Z) = \tilde{H}(a)$
  • kind of like MCAR
2 Targeting At Random (TAR)
  • $S(X) = S(X)$, $\tilde{H}(a \mid X, R, Z) = \tilde{H}(a \mid X)$
  • kind of like MAR
3 Targeting Not At Random (TNAR)
  • $\tilde{H}(a \mid X, R, Z) \Rightarrow R(a) \not\perp\!\!\!\perp A \mid X, S = 1.$
  • kind of like MNAR
Causal interpretation
Figure 1: TCAR
Figure 2: TAR
Figure 3: TNAR
Data fusion
(see Colnet et al. (2020))
RCT OS
Internal validity
External validity
Propensity score ?
Table 1: Differences and similarities between data sources
Data fusion
(see Colnet et al. (2020))
RCT OS Learner Sponsor
Internal validity
External validity ∼ ∼
Propensity score ? ?
Table 2: Differences and similarities between data sources
• Unsolved challenge: sampling in interaction!
CATE
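• CATE
$$\tau_{a_1,a_2}(x) = \mathbb{E}_{D_{R \mid A, X = x}}[R(a_1) - R(a_2)] \quad \text{and} \quad \hat{\tau}_{a_1,a_2}(x) = \hat{\mu}_{a_1}(x) - \hat{\mu}_{a_2}(x)$$
• Assumptions
  • SUTVA: $R_t = \sum_{a \in \mathcal{A}} 1(A_t = a) R_t(a)$,
  • Ignorability: $1(A_t = a) \perp\!\!\!\perp R(a) \mid X_t, S_t = 0$
  • Ignorability of the study participation: $R_t(a) \perp\!\!\!\perp S_t \mid X_t$
  • TNAR: $R(a) \not\perp\!\!\!\perp A \mid X, S = 1.$
• Biased CATE on sponsor sample
$$\rho_{a_1,a_2}(x) = \mathbb{E}[R \mid A = a_1, X = x, S = 1] - \mathbb{E}[R \mid A = a_2, X = x, S = 1].$$
• Bias measurement
$$\eta_{a_1,a_2}(x) = \tau_{a_1,a_2}(x) - \rho_{a_1,a_2}(x)$$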
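
A small exact calculation can illustrate how $\rho$ departs from $\tau$ under TNAR. This is a toy sketch, not the experiment of the talk: the context is held fixed and dropped, the unobserved confounder $Z$ is binary, and the reward function and sponsor probabilities are hypothetical.

```python
def true_cate(p_z, reward):
    """tau = E[R(1) - R(0)], averaging over the unobserved confounder Z."""
    return sum(p_z[z] * (reward(1, z) - reward(0, z)) for z in p_z)


def sponsor_sample_cate(p_z, p_a1_given_z, reward):
    """rho = E[R | A=1, S=1] - E[R | A=0, S=1], by exact enumeration over Z."""
    def cond_mean(a):
        # Joint weight P(Z=z, A=a | S=1) when the sponsor's choice depends on Z.
        w = {z: p_z[z] * (p_a1_given_z[z] if a == 1 else 1.0 - p_a1_given_z[z])
             for z in p_z}
        return sum(w[z] * reward(a, z) for z in p_z) / sum(w.values())

    return cond_mean(1) - cond_mean(0)


# Hypothetical numbers: Z raises the reward, and the sponsor prefers a=1 when Z=1.
p_z = {0: 0.5, 1: 0.5}
p_a1_given_z = {0: 0.1, 1: 0.9}


def reward(a, z):
    return a + 2 * z


tau = true_cate(p_z, reward)                          # true effect
rho = sponsor_sample_cate(p_z, p_a1_given_z, reward)  # confounded contrast
eta = tau - rho                                       # bias term
```

Here the true effect is 1 while the sponsor-sample contrast is about 2.6, so the bias term $\eta$ is about -1.6: the sponsor's preference for treating high-$Z$ users inflates the naive estimate.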
Two step deconfounding
(see Kallus et al. (2018)), A = {a0, a1}
1 On the observational sample data use a metalearner to obtain $\hat{\rho}_{a_1,a_0}(X)$.
2 Postulate a function $q(X, a_0)$ such that $\mathbb{E}[q_t(X, a_0) R \mid X = x, S = 0] = \tau_{a_1,a_0}(x)$, where
$$q_t(X, a_0) = \frac{1(A = a_1)}{\pi_t(a_1 \mid X)} - \frac{1(A = a_0)}{\pi_t(a_0 \mid X)}.$$
3 Using $q_t(X, a_0)$ apply the definition of $\eta_{a_1,a_0}(x)$ to adjust the $\hat{\rho}$ term by solving an optimization problem on the unconfounded sample:
$$\hat{\eta}_{a_1,a_0}(X) = \arg\min_{\eta} \sum_{t : S_t = 0} \left(q_t(x_t, a_0) r_t - \hat{\rho}_{a_1,a_0}(x_t) - \eta(x_t)\right)^2.$$
4 Finally $\hat{\tau}_{a_1,a_0}(x) = \hat{\rho}_{a_1,a_0}(x) + \hat{\eta}_{a_1,a_0}(x)$.
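
Steps 2 to 4 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the author's code: the context is scalar, $\eta$ is restricted to a linear function fitted by ordinary least squares, and $\hat{\rho}$ is passed in as an already fitted function `rho_hat`. The names `fit_line` and `deconfound` are hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y ~ c0 + c1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    c1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - c1 * mx, c1


def deconfound(rho_hat, unconfounded, propensity):
    """Fit a linear eta_hat on the S=0 sample and return tau_hat = rho_hat + eta_hat.

    unconfounded: list of (x, a, r) with a in {0, 1}, drawn by the learner (S=0).
    propensity(a, x): the known pi_t(a|x).
    """
    xs, targets = [], []
    for x, a, r in unconfounded:
        # q_t = 1(A=a1)/pi_t(a1|x) - 1(A=a0)/pi_t(a0|x)
        q = (a == 1) / propensity(1, x) - (a == 0) / propensity(0, x)
        xs.append(x)
        targets.append(q * r - rho_hat(x))  # regression target for eta
    c0, c1 = fit_line(xs, targets)          # eta_hat(x) = c0 + c1 * x
    return lambda x: rho_hat(x) + c0 + c1 * x


# Toy check with hypothetical functions: true tau(x) = 1 + 2x, biased rho_hat(x) = 3 + x.
grid = [i / 10 for i in range(-10, 11)]
sample = [(x, a, a * (1 + 2 * x)) for x in grid for a in (0, 1)]
tau_hat = deconfound(lambda x: 3 + x, sample, lambda a, x: 0.5)
```

In this noiseless, balanced toy sample the fitted correction is $\hat{\eta}(x) = -2 + x$, so `tau_hat` recovers $1 + 2x$ even though `rho_hat` was off by a context-dependent amount.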
Deconfounded CATE IGW (D-CATE-IGW)
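• Let $b = \arg\max_a \hat{\mu}_a(x_t)$.
$$\pi(a \mid x) = \begin{cases} \dfrac{1}{K + \gamma_m\left(\hat{\mu}^m_b(x) - \hat{\mu}^m_a(x)\right)} & \text{for } a \neq b \\ 1 - \sum_{c \neq b} \pi(c \mid x) & \text{for } a = b \end{cases} = \begin{cases} \dfrac{1}{K + \gamma_m \hat{\tau}_{b,a}(x)} & \text{for } a \neq b \\ 1 - \sum_{c \neq b} \pi(c \mid x) & \text{for } a = b \end{cases},$$
• Each round/epoch deconfound the CATE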
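
The policy above only needs the estimated gaps $\hat{\tau}_{b,a}(x)$ between the greedy action $b$ and every other action. A minimal sketch follows, assuming the gaps are nonnegative so that every probability stays in $[0, 1]$; the function name `igw_from_gaps` and the numbers are hypothetical.

```python
def igw_from_gaps(best, gaps, gamma):
    """Inverse gap weighting at a fixed context x.

    best:  the greedy action b = argmax_a mu_hat_a(x)
    gaps:  dict mapping every other action a to tau_hat_{b,a}(x) >= 0
    gamma: the scaling parameter gamma_m of the current round/epoch
    """
    K = len(gaps) + 1
    probs = {a: 1.0 / (K + gamma * gap) for a, gap in gaps.items()}
    probs[best] = 1.0 - sum(probs.values())  # remaining mass goes to b
    return probs


probs = igw_from_gaps("a1", {"a0": 1.0, "a2": 2.0}, gamma=10.0)
```

With $K = 3$ and $\gamma_m = 10$, action `a0` gets probability 1/13 and `a2` gets 1/23, so a larger estimated gap means less exploration of that action, and a larger $\gamma_m$ shifts mass towards the greedy action.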
Setup I
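• $S_t \sim \mathrm{Bern}(\rho)$
• No overlap scenario
$$X_t \mid S_t = 0 \sim \mathrm{Unif}([-1, 1]), \qquad U_t \mid S_t = 0 \sim N(0, 1).$$
• Full overlap scenario
$$X_t \mid S_t = 0 \sim N(0, 1), \qquad U_t \mid S_t = 0 \sim N(0, 1).$$
$$\begin{pmatrix} X_t \\ U_t \end{pmatrix} \Bigm| \{A_t, S_t = 1\} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & (2A_t - 1)\sigma_A \\ (2A_t - 1)\sigma_A & 1 \end{pmatrix}\right),$$
• $\sigma_A \in \{0.6, 0.9\}$
• $\rho \in \{0.3, 0.6\}$
$$R_t(A_t) = 1 + A_t + X_t + 2A_tX_t + \tfrac{1}{2}X_t^2 + \tfrac{3}{4}A_tX_t^2 + 2U_t + \tfrac{1}{2}\epsilon_t,$$
where $\epsilon \sim N(0, 1)$, $\tau(X_t) = \tfrac{3}{4}X_t^2 + 2X_t + 1$.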
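
The data-generating process of this setup can be sketched as below. This is an illustrative sketch, not the experiment code: the function names are hypothetical, and the slide does not say how $A_t$ itself is drawn on the sponsor sample, so `draw_confounded` takes the action as given.

```python
import math
import random


def draw_unconfounded(rng, overlap="full"):
    """(X_t, U_t) given S_t = 0: X is N(0,1) under full overlap, Unif[-1,1] otherwise."""
    x = rng.gauss(0.0, 1.0) if overlap == "full" else rng.uniform(-1.0, 1.0)
    return x, rng.gauss(0.0, 1.0)


def draw_confounded(rng, a, sigma_a):
    """(X_t, U_t) given A_t = a, S_t = 1: unit variances, correlation (2a - 1) * sigma_a."""
    c = (2 * a - 1) * sigma_a
    x = rng.gauss(0.0, 1.0)
    u = c * x + math.sqrt(1.0 - c * c) * rng.gauss(0.0, 1.0)
    return x, u


def reward(a, x, u, eps):
    """R_t(A_t) from the slide; u is the unobserved confounder, eps the noise."""
    return 1 + a + x + 2 * a * x + 0.5 * x**2 + 0.75 * a * x**2 + 2 * u + 0.5 * eps


def tau(x):
    """True CATE: reward(1, ...) - reward(0, ...) at fixed x, u, eps."""
    return 0.75 * x**2 + 2 * x + 1


rng = random.Random(0)
x, u = draw_confounded(rng, a=1, sigma_a=0.6)
r = reward(1, x, u, rng.gauss(0.0, 1.0))
```

Because $U_t$ enters the reward with coefficient 2 and is correlated with $X_t$ in a direction that depends on $A_t$, a regression of $R_t$ on $(X_t, A_t)$ alone is biased on the sponsor sample.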
Result I
Figure 4: Normed cumulative regret for different scenarios
Result II
Figure 5: True and estimated CATE values for different scenarios
Contribution
1 Pioneering model for sponsored content in contextual bandits framework
2 Bandits not as experimental studies, but as observational studies
3 Confounding scenario and deconfounding application
4 D-CATE-IGW works
Future research
• Theoretical
  • Mathematically model the complicated sampling, especially the flow of information
  • Consistency proof of CATE estimator in this scenario
  • High probability regret bounds on D-CATE-IGW: $P(\mathrm{REWARD}(\pi) \geq \mathrm{BOUND}(\delta)) \geq 1 - \delta$
• Empirical
  • More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
  • Other deconfounding methods (see Wu and Yang (2022))
  • A more comprehensive empirical study
• Expansion
  • Policy evaluation
$$V(\pi) = \mathbb{E}_X \mathbb{E}_{A \sim \pi(\cdot \mid X)} \mathbb{E}_{R \mid A, X}[R]$$
$$\hat{V}_t(\pi) \text{ on } \{(X_s, A_s, R_s)\}_{s=1}^{t} \sim D(H)$$
Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology
Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 24 / 26
Authoritarian Sponsor Deconfounding Experiment Conclusions References
Future research
• Theoretical
• Mathematically model the complicated sampling. Especially the flow of information
• Consistency proof of CATE estimator in this scenario
• High probability regret bounds on D-CATE-IGW P(REWARD(π)  BOUND(δ))  1 − δ
• Empirical
• More metalearners (X-learner, R-learner) (see Künzel et al. (2019))
• Other deconfounding methods (see Wu and Yang (2022))
• A more comprehensive empirical study
Expansion
• Policy evaluation
V (π) = EX EA∼π(·|X)ER|A,X [R]
b
Vt(π) on {(Xs, As, Rs)}t
s=1 ∼ D(H
 )
Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology
Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 24 / 26
The beginning ...
References
Colnet, B., I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.-P. Vert, J. Josse, and S. Yang (2020). Causal inference methods for combining randomized trials and observational studies: a review. arXiv preprint arXiv:2011.08047.
Kallus, N., A. M. Puli, and U. Shalit (2018). Removing hidden confounding by experimental grounding. Advances in Neural Information Processing Systems 31.
Künzel, S. R., J. S. Sekhon, P. J. Bickel, and B. Yu (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences 116(10), 4156–4165.
Lattimore, T. and C. Szepesvári (2020). Bandit algorithms. Cambridge University Press.
Wu, L. and S. Yang (2022). Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. In Conference on Causal Learning and Reasoning, pp. 904–926. PMLR.
Sponsored content in contextual bandits. Deconfounding targeting not at random

  • 1. Sponsored content in contextual bandits. Deconfounding Targeting Not At Random
      MIUE 2023
      Hubert Drążkowski, GRAPE|FAME, Warsaw University of Technology
      September 22, 2023
      Sections: Authoritarian Sponsor, Deconfounding, Experiment, Conclusions, References
  • 2. Motivational examples
      Recommender systems
      • Suggest best ads/movies $a \in \{a_1, a_2, \dots, a_K\}$
      • Users $X_1, X_2, \dots, X_T$
      • Design of the study $\{n_{a_1}, n_{a_2}, \dots, n_{a_K}\}$, $\sum_i n_{a_i} = T$
      • Measured satisfaction $\{R_t(a_1), \dots, R_t(a_K)\}_{t=1}^{T}$
  • 3. The framework
      Exploration vs exploitation
      • Allocating limited resources under uncertainty
      • Sequential manner
      • Partial feedback (bandit feedback)
      • Adaptive (non-iid) data
      • Maximizing cumulative gain
      • Current actions do not change future environment
  • 9. The elements of the bandit model (see Lattimore and Szepesvári (2020))
      • Context $X_t \in \mathcal{X}$
          • $X_t \sim D_X$
      • Actions $A_t \in \mathcal{A} = \{a_1, \dots, a_K\}$
          • $A_t \sim \pi_t(a \mid x)$
      • Policy $\pi \in \Pi$
          • $\pi = \{\pi_t\}_{t=1}^{T}$
          • $\pi_t : \mathcal{X} \mapsto \mathcal{P}(\mathcal{A})$, where $\mathcal{P}(\mathcal{A}) := \{q \in [0, 1]^K : \sum_{a \in \mathcal{A}} q_a = 1\}$
      • Rewards $R_t \in \mathbb{R}_+$
          • $(R(a_1), R(a_2), \dots, R(a_K))$ and $R_t = \sum_{k=1}^{K} 1(A_t = a_k) R(a_k)$
          • $R_t \sim D_{R \mid A, X}$
      • History $H_t \in \mathcal{H}_t$
          • $H_t = \sigma\left(\{(X_s, A_s, R_s)\}_{s=1}^{t}\right)$
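The elements listed on this slide can be read as one interaction loop. The sketch below is only an illustration: the context distribution (standard normal), the uniform policy, the toy reward and the names `run_bandit`, `uniform_policy` and `toy_reward` are assumptions made here, not part of the talk.

```python
import numpy as np

def run_bandit(policy, reward_fn, T, K, rng):
    """Simulate T rounds and return the history [(x_s, a_s, r_s)]."""
    history = []
    for _ in range(T):
        x = rng.normal()                    # context X_t ~ D_X (assumed N(0, 1) here)
        probs = policy(x, history)          # pi_t(.|x): a point in the simplex P(A)
        a = int(rng.choice(K, p=probs))     # A_t ~ pi_t(.|x)
        r = reward_fn(x, a, rng)            # bandit feedback: only R_t(A_t) is observed
        history.append((x, a, r))
    return history

def uniform_policy(x, history, K=3):
    """Hypothetical policy that ignores context and history."""
    return np.full(K, 1.0 / K)

def toy_reward(x, a, rng):
    """Hypothetical nonnegative reward, so that R_t lies in R_+."""
    return abs(a + x + rng.normal())

rng = np.random.default_rng(0)
hist = run_bandit(uniform_policy, toy_reward, T=100, K=3, rng=rng)
```

A learning policy would use `history` to update its probabilities each round, which is what makes the collected data adaptive rather than iid.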
  • 21. Details
      • In short $(X_t, A_t, R_t) \sim D(\pi_t)$
      • We know $\pi_t(a \mid x)$ (propensity score)
      • We don't know $D_{X, \vec{R}}$
      • We have $1(A_t = a) \perp\!\!\!\perp R(a) \mid X_t$
      • We want to maximize with $\pi$: $E_{D(\pi)}\left[\sum_{t=1}^{T} R_t(A_t)\right]$
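A short worked step shows why the known propensity score and the conditional independence matter together. Assuming $\pi_t(a \mid x) > 0$ and treating the past history as fixed (both assumptions of this sketch), inverse propensity weighting recovers the mean potential reward of any action from bandit feedback alone:

```latex
\begin{aligned}
E\!\left[\frac{1(A_t = a)\, R_t}{\pi_t(a \mid X_t)} \,\middle|\, X_t = x\right]
  &= \frac{E\left[1(A_t = a)\, R(a) \mid X_t = x\right]}{\pi_t(a \mid x)}
     && \text{since } 1(A_t = a) R_t = 1(A_t = a) R(a) \\
  &= \frac{E\left[1(A_t = a) \mid X_t = x\right]\, E\left[R(a) \mid X_t = x\right]}{\pi_t(a \mid x)}
     && \text{by } 1(A_t = a) \perp\!\!\!\perp R(a) \mid X_t \\
  &= E\left[R(a) \mid X_t = x\right]
     && \text{since } E\left[1(A_t = a) \mid X_t = x\right] = \pi_t(a \mid x).
\end{aligned}
```

The second equality is exactly the step that fails once an unknown sponsor chooses some of the actions.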
  • 26. The flow of information (figure)
  • 27. Inverse Gap Weighting (figure)
  • 28. Outline: 1 Authoritarian Sponsor, 2 Deconfounding, 3 Experiment, 4 Conclusions
  • 29. Authoritarian Sponsor model
      • The act of sponsoring
          • Recommender system - marketing campaigns, testing products
          • Healthcare - funding experiments, lobbying doctors
      • The sponsor (€, H) intervenes in an authoritarian manner
          $A_t = S_t \tilde{A}_t + (1 - S_t) \bar{A}_t$, $S_t \in \{0, 1\}$, $S_t \sim €(\cdot \mid X)$
          $\bar{A}_t \sim \pi_t(\cdot \mid X)$, $\tilde{A}_t \sim H_t(\cdot \mid X)$
          so the action actually played has distribution
          $P(A_t = a \mid X = x) = €_t(1 \mid x)\, H_t(a \mid x) + €_t(0 \mid x)\, \pi_t(a \mid x)$.
      • The lack of knowledge about sponsor's policy (€, H)
          • Not sharing technology or strategy
          • Lost in human to algorithm translation
          • Hard to model process like auctions
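The override mechanism on this slide can be sketched in a few lines. The function names `sample_action` and `mixture_probs`, and the constant example policies, are hypothetical; the sketch also assumes the sponsor indicator is drawn independently of both candidate actions given the context.

```python
import numpy as np

def sample_action(x, learner_probs, sponsor_probs, sponsor_rate, rng):
    """Draw (S_t, A_t): the sponsor overrides the learner when S_t = 1."""
    s = int(rng.random() < sponsor_rate(x))            # S_t ~ Bernoulli(sponsor_rate(x))
    probs = sponsor_probs(x) if s else learner_probs(x)
    a = int(rng.choice(len(probs), p=probs))           # only the action that is played is drawn
    return s, a

def mixture_probs(x, learner_probs, sponsor_probs, sponsor_rate):
    """Distribution of the played action: rate * sponsor + (1 - rate) * learner."""
    e1 = sponsor_rate(x)
    return e1 * np.asarray(sponsor_probs(x)) + (1.0 - e1) * np.asarray(learner_probs(x))

# Hypothetical example: two actions, constant policies, 40% sponsored rounds.
learner = lambda x: [0.5, 0.5]
sponsor = lambda x: [0.9, 0.1]
rate = lambda x: 0.4

rng = np.random.default_rng(0)
s, a = sample_action(0.0, learner, sponsor, rate, rng)
mix = mixture_probs(0.0, learner, sponsor, rate)
```

Drawing only the action that is played is equivalent in distribution to drawing both candidates and selecting one with the indicator. In the setting of the talk the learner sees the played action but not `sponsor` or `rate`, so `mixture_probs` is not available to it.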
  • 32. Targeting mechanisms
      Introducing an unobserved confounder Z
      1 Targeting Completely At Random (TCAR):
          • $S(X) = S$, $H(a \mid X, R, Z) = H(a)$
          • kind of like MCAR (missing completely at random)
      2 Targeting At Random (TAR)
          • $S(X) = S(X)$, $H(a \mid X, R, Z) = H(a \mid X)$
          • kind of like MAR (missing at random)
      3 Targeting Not At Random (TNAR)
          • $H(a \mid X, R, Z) \Rightarrow R(a) \not\perp\!\!\!\perp A \mid X, S = 1$.
          • kind of like MNAR (missing not at random)
  • 35. Causal interpretation
      Figure 1: TCAR
      Figure 2: TAR
      Figure 3: TNAR
  • 36. Outline: 1 Authoritarian Sponsor, 2 Deconfounding, 3 Experiment, 4 Conclusions
  • 37. Data fusion (see Colnet et al. (2020))
      Table 1: Differences and similarities between data sources
      Columns: RCT (randomized controlled trial), OS (observational study)
      Rows: Internal validity, External validity, Propensity score
      Only one cell mark survives in the transcript: a "?" in the Propensity score row.
  • 38. Data fusion (see Colnet et al. (2020))
      Table 2: Differences and similarities between data sources
      Columns: RCT, OS, Learner, Sponsor
      Rows: Internal validity, External validity, Propensity score
      Surviving cell marks: "∼ ∼" in the External validity row and "? ?" in the Propensity score row.
      • Unsolved challenge: sampling in interaction!
  • 40. CATE
      • CATE $\tau_{a_1, a_2}(x) = E_{D_{R \mid A, X = x}}[R(a_1) - R(a_2)]$ and $\hat{\tau}_{a_1, a_2}(x) = \hat{\mu}_{a_1}(x) - \hat{\mu}_{a_2}(x)$
      • Assumptions
          • SUTVA: $R_t = \sum_{a \in \mathcal{A}} 1(A_t = a) R_t(a)$
          • Ignorability: $1(A_t = a) \perp\!\!\!\perp R(a) \mid X_t, S_t = 0$
          • Ignorability of the study participation: $R_t(a) \perp\!\!\!\perp S_t \mid X_t$
          • TNAR: $R(a) \not\perp\!\!\!\perp A \mid X, S = 1$.
      • Biased CATE on sponsor sample
          $\rho_{a_1, a_2}(x) = E[R \mid A = a_1, X = x, S = 1] - E[R \mid A = a_2, X = x, S = 1]$.
      • Bias measurement
          $\eta_{a_1, a_2}(x) = \tau_{a_1, a_2}(x) - \rho_{a_1, a_2}(x)$
  • 43. Two step deconfounding (see Kallus et al. (2018)), $\mathcal{A} = \{a_0, a_1\}$
      1 On the observational sample data use a metalearner to obtain $\hat{\rho}_{a_1, a_0}(X)$.
      2 Postulate a function $q(X, a_0)$ such that $E[q_t(X, a_0) R \mid X = x, S = 0] = \tau_{a_1, a_0}(x)$, where
          $q_t(X, a_0) = \dfrac{1(A = a_1)}{\pi_t(a_1 \mid X)} - \dfrac{1(A = a_0)}{\pi_t(a_0 \mid X)}$.
      3 Using $q_t(X, a_0)$ apply the definition of $\eta_{a_1, a_0}(x)$ to adjust the $\hat{\rho}$ term by solving an optimization problem on the unconfounded sample:
          $\hat{\eta}_{a_1, a_0}(X) = \arg\min_{\eta} \sum_{t : S_t = 0} \left(q_t(x_t, a_0) r_t - \hat{\rho}_{a_1, a_0}(x_t) - \eta(x_t)\right)^2$.
      4 Finally $\hat{\tau}_{a_1, a_0}(x) = \hat{\rho}_{a_1, a_0}(x) + \hat{\eta}_{a_1, a_0}(x)$.
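The four steps can be sketched with ordinary least squares. Several choices below are assumptions of this sketch rather than details from the slides: the metalearner is a T-learner with quadratic features, the correction $\eta$ is restricted to be linear in $x$, and the names `two_step_cate`, `poly_features` and `fit_ls` are hypothetical. The example data are noise-free and built so that the weighted reward equals the true effect exactly, which makes the result easy to check.

```python
import numpy as np

def poly_features(x, degree=2):
    """Design matrix with columns 1, x, ..., x^degree."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def fit_ls(features, y):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return coef

def two_step_cate(x_obs, a_obs, r_obs, x_exp, a_exp, r_exp, prop_exp):
    """prop_exp[t] is the known propensity pi_t(a1 | x_t) on the S = 0 rounds."""
    # Step 1: T-learner on the confounded (sponsor) sample gives rho_hat.
    feats = poly_features(x_obs)
    rho_coef = fit_ls(feats[a_obs == 1], r_obs[a_obs == 1]) \
             - fit_ls(feats[a_obs == 0], r_obs[a_obs == 0])
    # Step 2: inverse propensity weights q_t on the unconfounded sample.
    q = np.where(a_exp == 1, 1.0 / prop_exp, -1.0 / (1.0 - prop_exp))
    # Step 3: regress q_t * r_t - rho_hat(x_t) on x_t to get eta_hat (assumed linear).
    resid = q * r_exp - poly_features(x_exp) @ rho_coef
    eta_coef = fit_ls(poly_features(x_exp, degree=1), resid)
    # Step 4: tau_hat = rho_hat + eta_hat.
    return lambda x: poly_features(x) @ rho_coef + poly_features(x, degree=1) @ eta_coef

def true_tau(x):
    return 0.75 * x**2 + 2.0 * x + 1.0

# Noise-free sanity check: the sponsor sample carries an extra bias (1 + x) on arm 1.
x_obs = np.linspace(-1.0, 1.0, 20)
a_obs = np.arange(20) % 2
r_obs = (a_obs - 0.5) * true_tau(x_obs) + a_obs * (1.0 + x_obs)
x_exp = np.linspace(-2.0, 2.0, 30)
a_exp = np.arange(30) % 2
r_exp = (a_exp - 0.5) * true_tau(x_exp)
prop_exp = np.full(30, 0.5)

tau_hat = two_step_cate(x_obs, a_obs, r_obs, x_exp, a_exp, r_exp, prop_exp)
```

With real, noisy rewards the weighted target in step 3 has high variance, so the correction is only as good as the amount of unconfounded data and the chosen function class for $\eta$ allow.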
  • 47. Deconfounded CATE IGW (D-CATE-IGW)
      • Let $b = \arg\max_a \hat{\mu}_a(x_t)$.
          $\pi(a \mid x) = \begin{cases} \dfrac{1}{K + \gamma_m \left(\hat{\mu}^m_b(x) - \hat{\mu}^m_a(x)\right)} & \text{for } a \neq b \\ 1 - \sum_{c \neq b} \pi(c \mid x) & \text{for } a = b \end{cases} = \begin{cases} \dfrac{1}{K + \gamma_m \hat{\tau}_{b,a}(x)} & \text{for } a \neq b \\ 1 - \sum_{c \neq b} \pi(c \mid x) & \text{for } a = b \end{cases}$
      • Each round/epoch deconfound the CATE
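The inverse gap weighting rule on this slide translates directly into code. This is a minimal sketch: `igw_probs` is a hypothetical name, and `mu_hat` stands for whatever reward estimates (deconfounded or not) are available for the current context.

```python
import numpy as np

def igw_probs(mu_hat, gamma):
    """Inverse gap weighting probabilities over K actions for one context."""
    mu_hat = np.asarray(mu_hat, dtype=float)
    K = mu_hat.size
    b = int(np.argmax(mu_hat))            # greedy action b
    gaps = mu_hat[b] - mu_hat             # estimated gaps tau_hat_{b,a}(x), zero for a = b
    probs = 1.0 / (K + gamma * gaps)      # formula for a != b
    probs[b] = 0.0
    probs[b] = 1.0 - probs.sum()          # remaining mass goes to the greedy action
    return probs

example = igw_probs([1.0, 0.5, 0.0], gamma=2.0)   # -> [0.55, 0.25, 0.2]
flat = igw_probs([1.0, 0.5, 0.0], gamma=0.0)      # gamma = 0 gives the uniform policy
```

Larger `gamma` shifts mass toward the greedy action, so the schedule $\gamma_m$ controls how quickly exploration is withdrawn across epochs.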
  • 48. Outline: 1 Authoritarian Sponsor, 2 Deconfounding, 3 Experiment, 4 Conclusions
  • 49. Setup I
      • $S_t \sim \mathrm{Bern}(\rho)$
      • No overlap scenario: $X_t \mid S_t = 0 \sim \mathrm{Unif}([-1, 1])$, $U_t \mid S_t = 0 \sim N(0, 1)$.
      • Full overlap scenario: $X_t \mid S_t = 0 \sim N(0, 1)$, $U_t \mid S_t = 0 \sim N(0, 1)$.
          $\begin{pmatrix} X_t \\ U_t \end{pmatrix} \Big| \{A_t, S_t = 1\} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & (2A_t - 1)\sigma_A \\ (2A_t - 1)\sigma_A & 1 \end{pmatrix}\right)$
      • $\sigma_A \in \{0.6, 0.9\}$
      • $\rho \in \{0.3, 0.6\}$
          $R_t(A_t) = 1 + A_t + X_t + 2 A_t X_t + \tfrac{1}{2} X_t^2 + \tfrac{3}{4} A_t X_t^2 + 2 U_t + \tfrac{1}{2} \epsilon_t$, where $\epsilon \sim N(0, 1)$,
          $\tau(X_t) = \tfrac{3}{4} X_t^2 + 2 X_t + 1$.
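A data generator for this design might look as follows. The slide does not say how actions are chosen, so the sketch assumes a fair coin on sponsored rounds and a fixed probability `learner_p1` on learner rounds; these, and the names `draw_round`, `reward` and `true_cate`, are hypothetical.

```python
import numpy as np

def reward(a, x, u, eps):
    """Reward equation from the slide."""
    return 1 + a + x + 2 * a * x + 0.5 * x**2 + 0.75 * a * x**2 + 2 * u + 0.5 * eps

def true_cate(x):
    """tau(x) = R(1) - R(0) with U and epsilon held fixed."""
    return 0.75 * x**2 + 2 * x + 1

def draw_round(rng, rho, sigma_a, full_overlap=True, learner_p1=0.5):
    """Return one round (s, a, x, u, r)."""
    s = int(rng.random() < rho)                       # S_t ~ Bern(rho)
    if s == 1:
        a = int(rng.random() < 0.5)                   # assumption: sponsor picks A_t by a fair coin
        c = (2 * a - 1) * sigma_a                     # correlation of (X_t, U_t) depends on A_t
        x, u = rng.multivariate_normal([0.0, 0.0], [[1.0, c], [c, 1.0]])
    else:
        a = int(rng.random() < learner_p1)            # assumption: stand-in for the learner policy
        x = rng.normal() if full_overlap else rng.uniform(-1.0, 1.0)
        u = rng.normal()                              # U_t independent of X_t when S_t = 0
    r = reward(a, x, u, rng.normal())
    return s, a, x, u, r

rng = np.random.default_rng(1)
rounds = [draw_round(rng, rho=0.3, sigma_a=0.6, full_overlap=False) for _ in range(200)]
```

In this design the sponsor-sample bias has a closed form. Since $E[U_t \mid X_t = x, A_t = a, S_t = 1] = (2a - 1)\sigma_A x$ and $U_t$ enters the reward with coefficient 2, the naive contrast is $\rho(x) = \tau(x) + 4\sigma_A x$, so the correction is $\eta(x) = -4\sigma_A x$, which is linear in $x$.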
  • 53. Result I
      Figure 4: Normed cumulative regret for different scenarios
  • 54. Result II
      Figure 5: True and estimated CATE values for different scenarios
  • 55. Outline: 1 Authoritarian Sponsor, 2 Deconfounding, 3 Experiment, 4 Conclusions
  • 56. Contribution
      1 Pioneering model for sponsored content in contextual bandits framework
      2 Bandits not as experimental studies, but as observational studies
      3 Confounding scenario and deconfounding application
      4 D-CATE-IGW works
  • 57. Authoritarian Sponsor Deconfounding Experiment Conclusions References Future research • Theoretical • Mathematically model the complicated sampling. Especially the flow of information • Consistency proof of CATE estimator in this scenario • High probability regret bounds on D-CATE-IGW P(REWARD(π) BOUND(δ)) 1 − δ • Empirical • More metalearners (X-learner, R-learner) (see Künzel et al. (2019)) • Other deconfounding methods (see Wu and Yang (2022)) • A more comprehensive empirical study Expansion • Policy evaluation V (π) = EX EA∼π(·|X)ER|A,X [R] b Vt(π) on {(Xs, As, Rs)}t s=1 ∼ D(H ) Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 24 / 26
  • 58. Authoritarian Sponsor Deconfounding Experiment Conclusions References Future research • Theoretical • Mathematically model the complicated sampling. Especially the flow of information • Consistency proof of CATE estimator in this scenario • High probability regret bounds on D-CATE-IGW P(REWARD(π) BOUND(δ)) 1 − δ • Empirical • More metalearners (X-learner, R-learner) (see Künzel et al. (2019)) • Other deconfounding methods (see Wu and Yang (2022)) • A more comprehensive empirical study Expansion • Policy evaluation V (π) = EX EA∼π(·|X)ER|A,X [R] b Vt(π) on {(Xs, As, Rs)}t s=1 ∼ D(H ) Hubert Drążkowski GRAPE|FAME, Warsaw University of Technology Sponsored content in contextual bandits. Deconfounding Targeting Not At Random 24 / 26
  • 67. The beginning ...
  • 68. References
      Colnet, B., I. Mayer, G. Chen, A. Dieng, R. Li, G. Varoquaux, J.-P. Vert, J. Josse, and S. Yang (2020). Causal inference methods for combining randomized trials and observational studies: a review. arXiv preprint arXiv:2011.08047.
      Kallus, N., A. M. Puli, and U. Shalit (2018). Removing hidden confounding by experimental grounding. Advances in Neural Information Processing Systems 31.
      Künzel, S. R., J. S. Sekhon, P. J. Bickel, and B. Yu (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences 116(10), 4156–4165.
      Lattimore, T. and C. Szepesvári (2020). Bandit Algorithms. Cambridge University Press.
      Wu, L. and S. Yang (2022). Integrative R-learner of heterogeneous treatment effects combining experimental and observational studies. In Conference on Causal Learning and Reasoning, pp. 904–926. PMLR.