Introduction of
“Fairness in Learning: Classic and
Contextual Bandits”
authored by Matthew Joseph, Michael Kearns, Jamie
Morgenstern, and Aaron Roth
NIPS2016-Yomi
January 19, 2017
Presenter: Kazuto Fukuchi
Fairness in Machine Learning
Consequential decisions made with machine learning may lead to
unfair treatment
E.g., Google’s ad suggestion system [Sweeney 13]
• African descent names: “Arrested?” (negative ad)
• European descent names: “Located” (neutral ad)
Fairness in the contextual bandit problem
Individual fairness
𝐾 persons
• Choose one person on whom to take an action
• E.g., granting a loan, hiring, admission, etc.
When can we preferentially choose one person?
Only if that person has the highest ability
There is no other reason for a preferential choice
E.g., Payback 90% > Payback 60%
Contextual Bandit Problem
Each round $t$
1. Obtain a context $x_j^t$ for each arm $j$
2. Choose one arm $i^t \in [K]$
3. Observe reward $r_{i^t}^t$ s.t. $\mathbb{E}[r_j^t] = f_j(x_j^t)$ and $r_j^t \in [0,1]$ a.s.
$K$ arms: $f_1, f_2, f_3, f_4, f_5$ (unknown to the learner)
Goal: Maximize the expected cumulative reward
$\mathbb{E}\left[\sum_t r_{i^t}^t\right] = \mathbb{E}\left[\sum_t f_{i^t}(x_{i^t}^t)\right]$
Example: Linear Contextual Bandit
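Define
$C = \{ f_\theta : f_\theta(x) = \langle \theta, x \rangle,\ \theta \in \mathbb{R}^d,\ \|\theta\| \le 1 \}$
$\mathcal{X} = \{ x \in \mathbb{R}^d : \|x\| \le 1 \}$
• Suppose $f_j = f_{\theta_j} \in C$, $x_j^t \in \mathcal{X}$
E.g., Online recommendation
• $\theta_j$: Feature of a product $j$
• $x_j^t$: Feature of a user $t$ regarding the product $j$
• Score of a user $t$ for a product $j$ is the inner product $\langle x_j^t, \theta_j \rangle$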
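The protocol and the linear example can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the paper's setup: the helper names (`unit_ball_nonneg`, `run_uniform_policy`), the Bernoulli reward model, and the restriction to non-negative vectors (so that $\langle \theta, x \rangle \in [0,1]$) are all hypothetical choices made for illustration.

```python
import math
import random

def unit_ball_nonneg(d, rng):
    """Random vector with non-negative entries and Euclidean norm < 1 (illustrative assumption)."""
    v = [rng.random() for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    scale = rng.random() / norm if norm > 0 else 0.0
    return [c * scale for c in v]

def inner(theta, x):
    """Inner product <theta, x>."""
    return sum(a * b for a, b in zip(theta, x))

def run_uniform_policy(thetas, T, rng):
    """Play T rounds, choosing an arm uniformly at random; return the total reward."""
    d = len(thetas[0])
    total = 0.0
    for _ in range(T):
        contexts = [unit_ball_nonneg(d, rng) for _ in thetas]  # 1. one context per arm
        i = rng.randrange(len(thetas))                          # 2. choose an arm
        mean = inner(thetas[i], contexts[i])                    # f_i(x_i^t), hidden from the learner
        reward = 1.0 if rng.random() < mean else 0.0            # 3. Bernoulli reward in [0, 1]
        total += reward
    return total

rng = random.Random(0)
thetas = [unit_ball_nonneg(3, rng) for _ in range(5)]  # K = 5 arms, d = 3
total = run_uniform_policy(thetas, T=1000, rng=rng)
```

The uniform policy never uses the contexts; it only serves as a baseline that a learning policy would try to beat.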
Example: Classic Bandit
• Expected reward is $\mathbb{E}[r_j^t] = \mu_j$
• Set $f_j(x_j^t) = \mu_j$ for any $x_j^t$
• Then, the contextual bandit reduces to the classic bandit
Arms with means $\mu_1, \mu_2, \mu_3, \mu_4, \mu_5$
Regret
• History $h^t$: a record of $t - 1$ experiences
  • contexts, arms chosen, and rewards observed
• A policy $\pi$: mapping from $x^t$ and $h^t$ to a distribution on arms $[K]$
• Probability of choosing arm $j$ given $h^t$ at round $t$: $\pi_{j|h^t}^t$
Regret: Reward lost compared to the optimal policy
$\mathrm{Regret}(x^1, \dots, x^T) = \sum_t \max_j f_j(x_j^t) - \mathbb{E}_{i^t \sim \pi^t}\left[\sum_t f_{i^t}(x_{i^t}^t)\right]$
The policy has regret bound $R(T)$ if $\max_{x^1, \dots, x^T} \mathrm{Regret}(x^1, \dots, x^T) \le R(T)$
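As a rough illustration of the regret definition, the sketch below computes the regret of a policy whose per-round choice probabilities are already given. The function name `expected_regret` and the list-of-lists inputs are assumptions made for this example; a real policy would compute its probabilities from the history.

```python
def expected_regret(mean_rewards, policies):
    """Regret of a policy over T rounds.

    mean_rewards[t][j] is f_j(x_j^t); policies[t][j] is the probability
    that the policy chooses arm j at round t.
    """
    regret = 0.0
    for f_t, pi_t in zip(mean_rewards, policies):
        best = max(f_t)                                   # max_j f_j(x_j^t)
        chosen = sum(p * f for p, f in zip(pi_t, f_t))    # E_{i ~ pi^t} f_i(x_i^t)
        regret += best - chosen
    return regret

# Two arms with payback 90% and 60%, uniform policy for two rounds:
# each round loses 0.9 - 0.75 = 0.15, so the regret is about 0.3.
example = expected_regret([[0.9, 0.6], [0.9, 0.6]], [[0.5, 0.5], [0.5, 0.5]])
```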
Fairness Constraint
It is unfair to preferentially choose one individual without an
acceptable reason
A policy $\pi$ is $\delta$-fair if, with probability $1 - \delta$, for all rounds $t$ and all pairs of arms $j, j'$,
$\pi_{j|h}^t > \pi_{j'|h}^t$ only if $f_j(x_j^t) > f_{j'}(x_{j'}^t)$.
• $\pi_{j|h}^t$: probability of choosing arm $j$ at round $t$
• $f_j(x_j^t) > f_{j'}(x_{j'}^t)$: the quality of the preferred individual is higher than that of the other
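The condition inside the definition can be checked directly for a single round when the true expected rewards are known. This is only an illustrative check with a hypothetical function name (`is_fair_round`); a learner cannot run it, since $f_1, \dots, f_K$ are unknown.

```python
def is_fair_round(pi, f):
    """True if pi[j] > pi[jp] holds only when f[j] > f[jp], for every pair of arms."""
    k = len(pi)
    return all(
        f[j] > f[jp]
        for j in range(k)
        for jp in range(k)
        if pi[j] > pi[jp]
    )

fair_uniform = is_fair_round([0.5, 0.5], [0.9, 0.6])    # uniform choice is always fair
fair_better = is_fair_round([0.7, 0.3], [0.9, 0.6])     # prefers the better arm
unfair_worse = is_fair_round([0.3, 0.7], [0.9, 0.6])    # prefers the worse arm
unfair_tie = is_fair_round([0.7, 0.3], [0.8, 0.8])      # prefers one of two equal arms
```

Note that preferring one of two equally good arms is also unfair under this definition, which is why uncertainty forces uniform choice.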
Intuition of Fairness Constraint
• Optimal policy is fair
• But we can’t get the optimal policy because $f_1, \dots, f_K$ are unknown
[Figure: arms split into a left group and a right group, left > right]
• Left group: can’t distinguish which arm has the highest expected reward
• Right group: expected reward is lower than that of the left group with high probability
The fairness constraint forces the learner to choose an arm from the left
group uniformly at random
Fairness in Classic Bandit
• Consider confidence bounds of the expected rewards
• Choose uniformly from the chained group
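[Figure: confidence intervals of the expected rewards for Arm 1 to Arm 5. Arms whose intervals are linked through overlaps form the chained group; the expected reward of the remaining arms is lower than that of the arms in the chained group.]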
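The chaining step can be sketched as follows. This is a minimal sketch, assuming the chained group is the set of arms linked by overlapping confidence intervals to the arm with the highest upper bound; the function name `chained_group` and the example intervals are hypothetical, and the computation of the confidence bounds themselves is left out.

```python
import random

def chained_group(intervals):
    """Indices of arms chained, via overlapping intervals, to the arm with the highest upper bound.

    intervals[j] = (lower_j, upper_j) is the confidence interval of arm j.
    """
    top = max(range(len(intervals)), key=lambda j: intervals[j][1])
    active = {top}
    changed = True
    while changed:  # keep adding arms that overlap any arm already in the group
        changed = False
        for j, (lo, hi) in enumerate(intervals):
            if j in active:
                continue
            if any(lo <= intervals[a][1] and intervals[a][0] <= hi for a in active):
                active.add(j)
                changed = True
    return sorted(active)

# Arms 0, 1, 2 overlap in a chain; arms 3 and 4 are clearly worse.
intervals = [(0.7, 0.9), (0.6, 0.75), (0.5, 0.65), (0.1, 0.3), (0.05, 0.2)]
group = chained_group(intervals)                 # [0, 1, 2]
chosen_arm = random.Random(0).choice(group)      # uniform choice within the chained group
```

Arm 2 does not overlap arm 0 directly, yet it stays in the group because it overlaps arm 1; this transitivity is what separates chaining from a plain overlap test.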
Fair Algorithm for Classic Bandit
Regret Upper Bound
If $\delta < \frac{1}{\sqrt{T}}$, then FairBandits has regret
$R(T) = O\left(\sqrt{k^3 T \ln\frac{Tk}{\delta}}\right)$
• $T = \Omega(k^3)$ rounds are required to obtain non-trivial regret, i.e., $\frac{R(T)}{T} \ll 1$
• Non-fair case: $O(\sqrt{kT})$
  • $k$ becomes $k^3$ under the fairness constraint
• Dependence on $T$ is optimal
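A short worked step shows where the $k^3$ round requirement comes from. It assumes the upper bound reads $R(T) = O\left(\sqrt{k^3 T \ln\frac{Tk}{\delta}}\right)$, with the square root matching the non-fair $O(\sqrt{kT})$ rate; that reading is an assumption where the slide text is garbled.

```latex
\frac{R(T)}{T}
  = O\!\left(\frac{\sqrt{k^3 T \ln\frac{Tk}{\delta}}}{T}\right)
  = O\!\left(\sqrt{\frac{k^3 \ln\frac{Tk}{\delta}}{T}}\right),
\qquad \text{which is} \ll 1 \text{ once } T \gg k^3 \ln\frac{Tk}{\delta}.
```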
Regret Lower Bound
Any fair algorithm experiences constant per-round regret for at
least
$T = \Omega\left(k^3 \ln\frac{1}{\delta}\right)$
rounds
• constant per-round regret = trivial regret
• To achieve non-trivial regret, we need at least $k^3$ rounds
• Thus, $\Theta(k^3)$ rounds are necessary and sufficient
Fairness in Contextual Bandit
KWIK learnable = Fair bandit learnable
KWIK (Knows What It Knows) learning
• Online regression
• Learner outputs either a prediction $\hat{y}^t \in [0,1]$ or $\hat{y}^t = \bot$
  • $\bot$ denotes “I Don’t Know”
• Only when $\hat{y}^t = \bot$, the learner observes feedback $y^t$ s.t. $\mathbb{E}[y^t] = f(x^t)$
[Figure: the learner receives a feature $x^t$ and outputs either $\hat{y}^t \in [0,1]$ (accurately predictable) or “I Don’t Know”]
KWIK learnable
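A class $C$ is $(\epsilon, \delta)$-KWIK learnable with bound $m(\epsilon, \delta)$ if, for any $f \in C$,
1. $\hat{y}^t \in \{\bot\} \cup [f(x^t) - \epsilon, f(x^t) + \epsilon]$ for all $t$ w.p. $1 - \delta$
2. $\sum_{t=1}^{\infty} \mathbb{I}[\hat{y}^t = \bot] \le m(\epsilon, \delta)$
Intuition
• Prediction is accurate whenever $\hat{y}^t \ne \bot$
• The learner answers $\bot$ only a small number of times
  • number of $\bot$ answers is at most $m(\epsilon, \delta)$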
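A very small KWIK learner makes the two conditions concrete. The sketch below handles only the simplest class, a constant function (a single unknown mean, as in the classic bandit), and is not taken from the slides: the class name `CoinKWIK`, the use of `None` for $\bot$, and the sample size from Hoeffding's inequality are assumptions of this example.

```python
import math
import random

class CoinKWIK:
    """KWIK learner for one unknown mean in [0, 1]; None plays the role of ⊥."""

    def __init__(self, eps, delta):
        # Hoeffding: after m >= ln(2/delta) / (2 eps^2) samples, the empirical
        # mean is within eps of the true mean with probability >= 1 - delta.
        self.m = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
        self.samples = []

    def predict(self):
        if len(self.samples) < self.m:
            return None  # "I don't know"
        return sum(self.samples) / len(self.samples)

    def observe(self, y):
        self.samples.append(y)

rng = random.Random(0)
learner = CoinKWIK(eps=0.1, delta=0.05)
unknown_count = 0
for _ in range(1000):
    y_hat = learner.predict()
    if y_hat is None:
        unknown_count += 1
        # Feedback is observed only when the learner answered ⊥.
        learner.observe(1.0 if rng.random() < 0.7 else 0.0)
```

Here the number of $\bot$ answers equals `learner.m`, which plays the role of $m(\epsilon, \delta)$ for this toy class.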
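KWIK Learnability Implies Fair Bandit Learnability
Suppose $C$ is $(\epsilon, \delta)$-KWIK learnable with bound $m(\epsilon, \delta)$
Then, there is a $\delta$-fair algorithm for $f_j \in C$ s.t.
$R(T) = O\left(\max\left(k^2\, m\left(\epsilon^*, \frac{\min(\delta, 1/T)}{T^2 k}\right),\ k^3 \ln\frac{k}{\delta}\right)\right)$
for $\delta \le \frac{1}{\sqrt{T}}$, where
$\epsilon^* = \arg\min_{\epsilon} \max\left(\epsilon T,\ k\, m\left(\epsilon, \frac{\min(\delta, 1/T)}{T^2 k}\right)\right)$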
Linear Contextual Bandit Case
• Let
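$C = \{ f_\theta : f_\theta(x) = \langle \theta, x \rangle,\ \theta \in \mathbb{R}^d,\ \|\theta\| \le 1 \}$
$\mathcal{X} = \{ x \in \mathbb{R}^d : \|x\| \le 1 \}$
• Then,
$R(T) = O\left(\max\left(T^{4/5} k^{6/5} d^{3/5},\ k^3 \ln\frac{k}{\delta}\right)\right)$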
KWIK to Fair
Intuition of KWIKToFair
• Predict the expected rewards using a KWIK algorithm for each
arm
• If none of the outputs of the KWIK algorithms is $\bot$
  • The same strategy as in the classic bandit is applicable
[Figure: intervals of width $2\epsilon^*$ around the predicted expected rewards $f_j(x_j^t)$ for Arm 1 to Arm 5]
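One round of this idea can be sketched as below. This is a hedged sketch, not the paper's KWIKToFair: the function name `kwik_to_fair_candidates` is hypothetical, `None` stands for $\bot$, and the handling of the $\bot$ case (fall back to all arms, which is trivially fair under uniform choice) is an assumption the slide does not spell out.

```python
def kwik_to_fair_candidates(predictions, eps):
    """Arms to choose uniformly from, given one KWIK prediction per arm (None = ⊥)."""
    k = len(predictions)
    if any(p is None for p in predictions):
        # Assumption of this sketch: when some learner says "I don't know",
        # every arm remains a candidate.
        return list(range(k))
    # Each accurate prediction gives an interval of width 2 * eps.
    intervals = [(p - eps, p + eps) for p in predictions]
    top = max(range(k), key=lambda j: intervals[j][1])
    active, frontier = {top}, [top]
    while frontier:  # chain overlapping intervals, as in the classic bandit
        a = frontier.pop()
        for j in range(k):
            overlaps = (intervals[j][0] <= intervals[a][1]
                        and intervals[a][0] <= intervals[j][1])
            if j not in active and overlaps:
                active.add(j)
                frontier.append(j)
    return sorted(active)

with_unknown = kwik_to_fair_candidates([0.8, 0.75, 0.3, None], eps=0.05)
all_known = kwik_to_fair_candidates([0.8, 0.75, 0.3, 0.1], eps=0.05)
```

With all predictions available, only arms 0 and 1 remain candidates in this example, because their intervals overlap while the other two lie clearly below.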
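Fair Bandit Learnability Implies KWIK Learnability
Suppose
• There is a $\delta$-fair algorithm for $f_j \in C$ with regret $R(T, \delta)$
• There exist $f \in C$ and $x(\ell) \in \mathcal{X}$ s.t. $f(x(\ell)) = \ell\epsilon$ for $\ell = 1, \dots, \frac{1}{\epsilon}$
Then, there is an $(\epsilon, \delta)$-KWIK learning algorithm for $C$ whose bound $m(\epsilon, \delta)$ is the solution of
$\frac{m(\epsilon, \delta)\,\epsilon}{4} = R\left(m(\epsilon, \delta), \frac{\epsilon\delta}{2T}\right)$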
An Exponential Separation Between Fair and Unfair Learning
• Boolean conjunctions: Let $x \in \{0,1\}^d$
$C = \{ f \mid f(x) = x_{i_1} \wedge \dots \wedge x_{i_k},\ 0 \le k \le d,\ i_1, \dots, i_k \in [d] \}$
• Boolean conjunctions without fairness constraint
$R(T) = O(k^2 d)$
• For such $C$, the KWIK bound is at least $m(\epsilon, \delta) = \Omega(2^d)$
• For $\delta < \frac{1}{2T}$, the worst-case regret bound of any $\delta$-fair algorithm is
$R(T) = \Omega(2^d)$
Fair to KWIK
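Intuition of FairToKWIK
• Divide the range of $f(x^t)$ into intervals of width $\epsilon^*$ (grid points $0, \epsilon^*, 2\epsilon^*, \dots$ with reference contexts $x(0), x(1), x(2), x(3), x(4)$)
• Using the fair algorithm, compare each $x(\ell)$ (left arm) with $x^t$ (right arm): is $f(x(\ell)) > f(x^t)$ or $f(x(\ell)) < f(x^t)$?
  • $p_{\ell,1}$: probability of choosing the left arm
  • $p_{\ell,2}$: probability of choosing the right arm
• If $p_{\ell,1} \ne p_{\ell,2}$ for all $\ell \ne 3$, then $x^t$ is in the red area of the figure: output $3\epsilon^*$
• Otherwise, output $\bot$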
Conclusions
• Fairness in the contextual bandit problem and the classic bandit
problem
• $\delta$-fair: with probability $1 - \delta$, $\pi_{j|h}^t > \pi_{j'|h}^t$ only if $f_j(x_j^t) > f_{j'}(x_{j'}^t)$
Results
• Classic bandits: the number of rounds necessary and sufficient to achieve
non-trivial regret is $\Theta(k^3)$
• Contextual bandits: tight relationship with Knows What It
Knows (KWIK) learning

Introduction of “Fairness in Learning: Classic and Contextual Bandits”

  • 1. Introduction of “Fairness in Learning: Classic and Contextual Bandits”, authored by Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Aaron Roth. NIPS2016-Yomi, January 19, 2017. Presenter: Kazuto Fukuchi
  • 2. Fairness in Machine Learning. Consequential decisions made using machine learning may lead to unfair treatment. E.g., Google’s ad suggestion system [Sweeney 13]: African descent names: “Arrested?” (negative ad); European descent names: “Located” (neutral ad). This talk: fairness in the contextual bandit problem.
  • 3. Individual fairness. There are $K$ persons. • Choose one person on whom to conduct an action • E.g., lending a loan, hiring, admission, etc. When can we preferentially choose one person? Only if that person has the largest ability; there is no other reason for preferential choice. Example: payback 90% > payback 60%.
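  • 4. Contextual Bandit Problem. $K$ arms with functions $f_1, \dots, f_K$ that are unknown to the learner (the figure shows $f_1, \dots, f_5$). Each round $t$: 1. Obtain a context $x_j^t$ for each arm $j$ 2. Choose one arm $i^t \in [K]$ 3. Observe reward $r_{i^t}^t$ s.t. $\mathbb{E}[r_j^t] = f_j(x_j^t)$ and $r_j^t \in [0,1]$ a.s. Goal: maximize the expected cumulative reward $\mathbb{E}\left[\sum_t r_{i^t}^t\right] = \mathbb{E}\left[\sum_t f_{i^t}(x_{i^t}^t)\right]$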
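The three steps of each round can be sketched as a small simulation loop. This is only an illustration under stated assumptions: rewards are drawn as Bernoulli variables with mean $f_j(x_j^t)$ (one simple way to satisfy $r_j^t \in [0,1]$), and the names `run_contextual_bandit`, `draw_context`, and `choose_arm` are hypothetical, not taken from the slides.

```python
import random

def run_contextual_bandit(fs, draw_context, choose_arm, T, seed=0):
    """Play T rounds of the protocol and return the realized cumulative reward.

    fs           : list of functions f_j (unknown to the learner, known to the simulator)
    draw_context : hypothetical callback (rng, j) -> context x_j^t
    choose_arm   : hypothetical callback (contexts, history, rng) -> arm index
    """
    rng = random.Random(seed)
    history = []  # one (contexts, chosen arm, observed reward) triple per round
    total = 0.0
    for _ in range(T):
        # Step 1: obtain a context for each arm.
        contexts = [draw_context(rng, j) for j in range(len(fs))]
        # Step 2: the learner chooses one arm using contexts and history only.
        i = choose_arm(contexts, history, rng)
        # Step 3: observe a reward in [0, 1] with mean f_i(x_i) (Bernoulli assumption).
        mean = fs[i](contexts[i])
        reward = 1.0 if rng.random() < mean else 0.0
        history.append((contexts, i, reward))
        total += reward
    return total
```

Only the reward of the chosen arm is revealed, which is why the learner has to balance exploration against exploitation.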
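  • 5. Example: Linear Contextual Bandit. Define $C = \{f_\theta : f_\theta(x) = \langle \theta, x \rangle,\ \theta \in \mathbb{R}^d,\ \|\theta\| \le 1\}$ and $\mathcal{X} = \{x \in \mathbb{R}^d : \|x\| \le 1\}$ • Suppose $f_j = f_{\theta_j} \in C$ and $x_j^t \in \mathcal{X}$. E.g., online recommendation • $\theta_j$: feature of a product $j$ • $x_j^t$: feature of a user $t$ regarding the product $j$ • The score of a user $t$ for a product $j$ is the inner product $\langle x_j^t, \theta_j \rangle$
  • 6. Example: Classic Bandit • The expected reward is $\mathbb{E}[r_j^t] = \mu_j$ • Set $f_j(x_j^t) = \mu_j$ for any $x_j^t$ • Then the contextual bandit reduces to the classic bandit with arm means $\mu_1, \dots, \mu_K$ (the figure shows $\mu_1, \dots, \mu_5$)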
  • 7. Regret • History $h^t$: a record of $t - 1$ experiences • contexts, arm chosen, and reward observed • A policy $\pi$: a mapping from $x^t$ and $h^t$ to a distribution on arms $[K]$ • $\pi^t_{j|h^t}$: probability of choosing arm $j$ with $h^t$ at round $t$. Regret: reward lost compared to the optimal policy, $\mathrm{Regret}(x^1, \dots, x^T) = \sum_t \max_j f_j(x_j^t) - \mathbb{E}_{i^t \sim \pi^t}\left[\sum_t f_{i^t}(x_{i^t}^t)\right]$. A policy has regret bound $R(T)$ if $\max_{x^1, \dots, x^T} \mathrm{Regret}(x^1, \dots, x^T) \le R(T)$
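A minimal sketch of the regret computation for a fixed context sequence, assuming the per-round choice probabilities are given explicitly as lists (so the dependence on the history is already resolved). The function name `regret` and its argument layout are illustrative choices.

```python
def regret(fs, contexts_per_round, policies_per_round):
    """Sum over rounds of (best expected reward) minus (policy's expected reward).

    fs                 : list of functions f_j
    contexts_per_round : for each round, a list of contexts x_j^t (one per arm)
    policies_per_round : for each round, a list of probabilities pi_j^t (one per arm)
    """
    total = 0.0
    for contexts, probs in zip(contexts_per_round, policies_per_round):
        means = [f(x) for f, x in zip(fs, contexts)]         # f_j(x_j^t) for every arm
        best = max(means)                                    # reward of the optimal policy
        expected = sum(p * m for p, m in zip(probs, means))  # E_{i ~ pi^t} f_i(x_i^t)
        total += best - expected
    return total
```

For example, with two arms whose expected rewards are 0.9 and 0.6, a policy that always chooses uniformly loses 0.9 - 0.75 = 0.15 per round.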
  • 8. Fairness Constraint. It is unfair to preferentially choose one individual without an acceptable reason. A policy $\pi$ is $\delta$-fair if, with probability at least $1 - \delta$, $\pi^t_{j|h^t} > \pi^t_{j'|h^t}$ only if $f_j(x_j^t) > f_{j'}(x_{j'}^t)$. Here $\pi^t_{j|h^t}$ is the probability of choosing arm $j$ at round $t$, and $f_j(x_j^t) > f_{j'}(x_{j'}^t)$ means that the quality of the favored individual is larger than that of the other.
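The pairwise condition can be checked directly when the true expected rewards are known, which is only possible in a simulation. The sketch below tests a single round; the helper name `is_fair_round` is hypothetical, and the "with probability $1 - \delta$" part of the definition is not modeled here.

```python
def is_fair_round(probs, means):
    """Return True if no arm is favored over an arm of equal or higher quality.

    probs : choice probabilities pi_j for one round
    means : true expected rewards f_j(x_j) for the same round
    """
    k = len(probs)
    for j in range(k):
        for j2 in range(k):
            # Favoring j over j2 is allowed only if j is strictly better.
            if probs[j] > probs[j2] and not means[j] > means[j2]:
                return False
    return True
```

Note that the uniform distribution is always fair, and that favoring the strictly better arm is fair, while favoring one of two equally good arms is not.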
  • 9. Intuition of Fairness Constraint • The optimal policy is fair • But we can’t obtain the optimal policy because $f_1, \dots, f_K$ are unknown. Figure: for the arms in the left group, we can’t distinguish which arm has the higher expected reward; for the remaining arms, the expected reward is lower than that of the left group with high probability. The fairness constraint forces the learner to choose an arm from the left group uniformly at random.
  • 10. Fairness in Classic Bandit • Consider confidence bounds on the expected rewards • Choose uniformly from the chained group. Figure: confidence intervals of the expected rewards of Arm 1 to Arm 5, with the chained arms marked; the expected reward of an arm outside the chained group is lower than that of the arms in the chained group.
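One way to compute a chained group is sketched below, assuming that two arms are linked when their confidence intervals overlap and that the chain starts from the arm with the largest upper confidence bound. The function name `chained_group` and the interval representation as `(lower, upper)` pairs are illustrative assumptions.

```python
def chained_group(intervals):
    """Return the sorted indices of arms chained to the arm with the largest upper bound.

    intervals : list of (lower, upper) confidence intervals, one per arm
    """
    top = max(range(len(intervals)), key=lambda j: intervals[j][1])
    chain = {top}
    changed = True
    while changed:  # keep adding arms until the chain stops growing
        changed = False
        for j, (lo, hi) in enumerate(intervals):
            if j in chain:
                continue
            # Arm j joins if its interval overlaps the interval of any chained arm.
            if any(lo <= intervals[c][1] and intervals[c][0] <= hi for c in chain):
                chain.add(j)
                changed = True
    return sorted(chain)

# Arms 0, 1, 2 overlap pairwise in sequence; arms 3 and 4 lie strictly below them.
group = chained_group([(0.7, 0.9), (0.6, 0.75), (0.5, 0.65), (0.1, 0.3), (0.0, 0.2)])
```

Arm 2 does not overlap arm 0 directly, but it is still chained through arm 1, so the learner cannot yet favor arm 0 over it.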
  • 11. Fair Algorithm for Classic Bandit
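The transcript does not include the algorithm itself, so the following is only a rough sketch of a fair algorithm in the spirit of the previous slide: keep a confidence interval per arm, play uniformly among the arms chained to the arm with the largest upper bound, and shrink the interval of the played arm. The Hoeffding-style interval width, the Bernoulli rewards, and all names (`fair_bandits_sketch`, `mus`) are assumptions made for illustration.

```python
import math
import random

def fair_bandits_sketch(mus, T, delta, seed=0):
    """Simulate T rounds on arms with Bernoulli means `mus`; return (pulls, lower, upper)."""
    rng = random.Random(seed)
    k = len(mus)
    counts, sums = [0] * k, [0.0] * k
    lower, upper = [0.0] * k, [1.0] * k  # initially nothing is known about any arm
    pulls = []
    for t in range(1, T + 1):
        # Active set: arms chained (via overlapping intervals) to the top arm.
        top = max(range(k), key=lambda j: upper[j])
        active = {top}
        grew = True
        while grew:
            grew = False
            for j in range(k):
                if j not in active and any(
                    lower[j] <= upper[c] and lower[c] <= upper[j] for c in active
                ):
                    active.add(j)
                    grew = True
        i = rng.choice(sorted(active))  # uniform choice inside the chained group
        reward = 1.0 if rng.random() < mus[i] else 0.0
        counts[i] += 1
        sums[i] += reward
        # Assumed confidence width; clipped so intervals stay inside [0, 1].
        width = math.sqrt(math.log((math.pi * t) ** 2 / (3 * delta)) / (2 * counts[i]))
        mean = sums[i] / counts[i]
        lower[i], upper[i] = max(0.0, mean - width), min(1.0, mean + width)
        pulls.append(i)
    return pulls, lower, upper
```

While all intervals overlap, every arm is played with equal probability; an arm is dropped only once its interval separates from the chain, which is the source of the extra dependence on $k$ discussed in the regret bounds.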
  • 12. Regret Upper Bound. If $\delta < 1/\sqrt{T}$, then FairBandits has regret $R(T) = O\left(\sqrt{k^3 T \ln\frac{Tk}{\delta}}\right)$ • $T = \Omega(k^3)$ rounds are required to obtain non-trivial regret, i.e., $R(T)/T \ll 1$ • Non-fair case: $O(\sqrt{kT})$ • $k$ becomes $k^3$ under the fairness constraint • The dependence on $T$ is optimal
  • 13. Regret Lower Bound. Any fair algorithm experiences constant per-round regret for at least $T = \Omega\left(k^3 \ln\frac{1}{\delta}\right)$ rounds • Constant per-round regret means the regret is still trivial, i.e., $R(T)/T$ is not $\ll 1$ • To achieve non-trivial regret, we need at least on the order of $k^3$ rounds • Thus, on the order of $k^3$ rounds is both necessary and sufficient
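  • 14. Fairness in Contextual Bandit. KWIK learnable = Fair bandit learnable. KWIK (Knows What It Knows) learning • Online regression • The learner outputs either a prediction $\hat{y}^t \in [0,1]$ or $\hat{y}^t = \bot$ • $\bot$ denotes “I Don’t Know” • Only when $\hat{y}^t = \bot$, the learner observes feedback $y^t$ s.t. $\mathbb{E}[y^t] = f(x^t)$. Figure: the feature $x^t$ is given to the learner, which outputs $\hat{y}^t \in [0,1]$ if $x^t$ is accurately predictable and “I Don’t Know” otherwise.
  • 15. KWIK learnable. A class $C$ is $(\epsilon, \delta)$-KWIK learnable with bound $m(\epsilon, \delta)$ if, for any $f \in C$: 1. $\hat{y}^t \in \{\bot\} \cup [f(x^t) - \epsilon, f(x^t) + \epsilon]$ for all $t$ w.p. $1 - \delta$ 2. $\sum_{t=1}^{\infty} \mathbb{I}[\hat{y}^t = \bot] \le m(\epsilon, \delta)$. Intuitions • The prediction is accurate whenever $\hat{y}^t \ne \bot$ • The learner answers $\bot$ only a small number of times • The number of $\bot$ answers is at most $m(\epsilon, \delta)$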
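A minimal sketch of a KWIK learner for the simplest possible setting: a finite set of contexts, with the mean of each context estimated separately. It answers ⊥ (represented by `None`) until it has enough samples for a Hoeffding bound with a union bound over contexts to give accuracy $\epsilon$ with probability $1 - \delta$. The class name `TabularKWIK`, the finite-context assumption, and the assumption that feedback lies in $[0,1]$ are illustrative and not part of the slides.

```python
import math

class TabularKWIK:
    """KWIK learner for a finite context set; None plays the role of the ⊥ output."""

    def __init__(self, contexts, eps, delta):
        # Hoeffding: n >= ln(2|X|/delta) / (2 eps^2) samples per context suffice
        # for all |X| estimates to be eps-accurate with probability >= 1 - delta.
        self.needed = math.ceil(math.log(2 * len(contexts) / delta) / (2 * eps ** 2))
        self.samples = {x: [] for x in contexts}

    def predict(self, x):
        obs = self.samples[x]
        if len(obs) < self.needed:
            return None              # "I Don't Know": feedback will be requested
        return sum(obs) / len(obs)   # empirical mean, accurate w.h.p.

    def observe(self, x, y):
        # Called only after predict(x) returned None, matching the KWIK protocol.
        self.samples[x].append(y)

    def kwik_bound(self):
        # Total number of ⊥ answers never exceeds this value.
        return self.needed * len(self.samples)

learner = TabularKWIK(["a", "b"], eps=0.5, delta=0.5)
first = learner.predict("a")         # not enough samples yet
for _ in range(learner.needed):
    learner.observe("a", 1.0)
later = learner.predict("a")
```

Because feedback arrives only after a ⊥ answer, the learner says ⊥ exactly `needed` times per context and then commits to a fixed estimate, so its number of ⊥ answers is bounded as condition 2 requires.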
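  • 16. KWIK Learnability Implies Fair Bandit Learnability. Suppose $C$ is $(\epsilon, \delta)$-KWIK learnable with bound $m(\epsilon, \delta)$. Then, for $\delta \le 1/\sqrt{T}$, there is a $\delta$-fair algorithm for $f_j \in C$ s.t. $R(T) = O\left(\max\left\{k^2\, m\left(\epsilon^*, \frac{\min\{\delta, 1/T\}}{T^2 k}\right),\ k^3 \ln\frac{k}{\delta}\right\}\right)$, where $\epsilon^* = \arg\min_\epsilon \max\left\{\epsilon T,\ k\, m\left(\epsilon, \frac{\min\{\delta, 1/T\}}{T^2 k}\right)\right\}$
  • 17. Linear Contextual Bandit Case • Let $C = \{f_\theta : f_\theta(x) = \langle \theta, x \rangle,\ \theta \in \mathbb{R}^d,\ \|\theta\| \le 1\}$ and $\mathcal{X} = \{x \in \mathbb{R}^d : \|x\| \le 1\}$ • Then, $R(T) = O\left(\max\left\{T^{4/5} k^{6/5} d^{3/5},\ k^3 \ln\frac{k}{\delta}\right\}\right)$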
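As a rough check of the exponents in the linear case, the following worked calculation assumes that the linear class has a KWIK bound of order $d^3/\epsilon^4$ up to logarithmic factors (an assumption that is not stated on the slide) and plugs it into the general bound of slide 16, ignoring the dependence on the confidence parameter. Since $\epsilon T$ increases and $k\, m(\epsilon)$ decreases in $\epsilon$, the maximum of the two is minimized where they are equal.

```latex
\begin{align*}
m(\epsilon) &\approx \frac{d^3}{\epsilon^4} \\
\epsilon^* T &= k\, m(\epsilon^*) = \frac{k d^3}{(\epsilon^*)^4}
  \;\Longrightarrow\; \epsilon^* = \left(\frac{k d^3}{T}\right)^{1/5} \\
k^2\, m(\epsilon^*) &= k \cdot \epsilon^* T
  = k \cdot k^{1/5} d^{3/5} T^{4/5}
  = T^{4/5} k^{6/5} d^{3/5}
\end{align*}
```

This reproduces the first term of the maximum in the stated regret bound.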
  • 19. Intuition of KWIKToFair • Predict the expected rewards using a KWIK algorithm for each arm • If none of the KWIK outputs is $\bot$, the same strategy as in the classic bandit is applicable. Figure: predicted expected rewards $f_j(x_j^t)$ for Arm 1 to Arm 5, each with an interval of width $2\epsilon^*$.
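A sketch of how one round of such a reduction could select its candidate arms, under two assumptions that the slide does not spell out: when any KWIK learner answers ⊥ (`None` here), all arms are candidates, and otherwise the intervals $[\hat{y}_j - \epsilon^*, \hat{y}_j + \epsilon^*]$ are chained exactly as in the classic case. The learner then plays uniformly among the returned arms. The function name `candidate_arms` is hypothetical.

```python
def candidate_arms(predictions, eps):
    """Return the arms to choose uniformly from in one round.

    predictions : one KWIK output per arm, either a float in [0, 1] or None for ⊥
    eps         : accuracy of the KWIK predictions (the eps* of the slides)
    """
    k = len(predictions)
    if any(p is None for p in predictions):
        # Some reward is unknown: no arm can be safely favored (assumed rule).
        return list(range(k))
    # Each prediction yields an interval of width 2 * eps around it.
    intervals = [(p - eps, p + eps) for p in predictions]
    top = max(range(k), key=lambda j: intervals[j][1])
    chain = {top}
    grew = True
    while grew:
        grew = False
        for j, (lo, hi) in enumerate(intervals):
            if j not in chain and any(
                lo <= intervals[c][1] and intervals[c][0] <= hi for c in chain
            ):
                chain.add(j)
                grew = True
    return sorted(chain)
```

Rounds in which some learner answers ⊥ are the costly ones, and their number is limited by the KWIK bound $m(\epsilon, \delta)$ of each arm's learner.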
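  • 20. Fair Bandit Learnability Implies KWIK Learnability. Suppose • There is a $\delta$-fair algorithm for $f_j \in C$ with regret $R(T, \delta)$ • There exist $f \in C$ and $x(\ell) \in \mathcal{X}$ s.t. $f(x(\ell)) = \ell\epsilon$ for $\ell = 1, \dots, \lceil 1/\epsilon \rceil$. Then, there is an $(\epsilon, \delta)$-KWIK learning algorithm for $C$ whose bound $m(\epsilon, \delta)$ is the solution of $m(\epsilon, \delta)\,\frac{\epsilon}{4} = R\left(m(\epsilon, \delta), \frac{\epsilon\delta}{2T}\right)$
  • 21. An Exponential Separation Between Fair and Unfair Learning • Boolean conjunctions: let $x \in \{0,1\}^d$ and $C = \{f \mid f(x) = x_{i_1} \wedge \dots \wedge x_{i_k},\ 0 \le k \le d,\ i_1, \dots, i_k \in [d]\}$ • Boolean conjunctions without the fairness constraint: $R(T) = O(k^2 d)$ • For such $C$, the KWIK bound is at least $m(\epsilon, \delta) = \Omega(2^d)$ • For $\delta < \frac{1}{2T}$, the worst-case regret bound is $R(T) = \Omega(2^d)$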
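  • 23. Intuition of FairToKWIK • Divide the range of $f(x^t)$ into intervals of width $\epsilon^*$, with grid points $0, \epsilon^*, 2\epsilon^*, \dots$ attained at $x(0), x(1), x(2), \dots$ • Using the fair algorithm, compare $x^t$ with each $x(\ell)$ (is $f(x^t)$ larger or smaller?), where $p_{\ell,1}$ is the probability of choosing the left arm and $p_{\ell,2}$ is the probability of choosing the right arm. In the pictured example, if $p_{\ell,1} \ne p_{\ell,2}$ for all $\ell \ne 3$, then $x^t$ is in the red area and the output is $3\epsilon^*$. Otherwise, output $\bot$.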
  • 24. Conclusions • Fairness in the contextual bandit problem and the classic bandit problem • $\delta$-fair: with probability $1 - \delta$, $\pi^t_{j|h^t} > \pi^t_{j'|h^t}$ only if $f_j(x_j^t) > f_{j'}(x_{j'}^t)$. Results • Classic bandits: the number of rounds necessary and sufficient to achieve non-trivial regret is $\Theta(k^3)$ • Contextual bandits: tight relationship with Knows What It Knows (KWIK) learning