SlideShare ist ein Scribd-Unternehmen logo
1 von 26
METAGAMING:
Bandits with simple regret and small
budget
Chen-Wei Chou, Ping-Chiang Chou,
Chang-Shing Lee, David Lupien St-Pierre,
Olivier Teytaud, Mei-Hui Wang, Li-Wen Wu
and Shi-Jim Yen
Outline:
- what is a bandit problem ?
- what is a strategic bandit problem ?
- is a strategic bandit different from a bandit ?
- algorithms
- results
What is a bandit problem ?
A finite number of time steps
A (finite) number of options,
each of them equipped with a (unknown) proba distribution
At each time step:
- you choose one option
- you get a reward, distributed according to its proba distribution
At the end:
- you choose one option (you can not change anymore...)
- your reward is the expected reward associated to this option
What is a bandit problem ?
A finite number of time steps
A (finite) number of options,
each of them equipped with a (unknown) proba distribution
At each time step:
- you choose one option
- you get a reward, distributed according to its proba distribution
At the end:
- you choose one option (you can not change anymore...)
- your reward is the expected reward associated to this option
Here we collect
information
What is a bandit problem ?
A finite number of time steps
A (finite) number of options,
each of them equipped with a (unknown) proba distribution
At each time step:
- you choose one option
- you get a reward, distributed according to its proba distribution
At the end:
- you choose one option (you can not change anymore...)
- your reward is the expected reward associated to this option
Here we use
information for
the final choice
What is a bandit problem ?
A finite number of time steps
A (finite) number of options,
each of them equipped with a (unknown) proba distribution
At each time step:
- you choose one option
- you get a reward, distributed according to its proba distribution
At the end:
- you choose one option (you can not change anymore...)
- your reward is the expected reward associated to this option
Here, we
explore
What is a bandit problem ?
A finite number of time steps
A (finite) number of options,
each of them equipped with a (unknown) proba distribution
At each time step:
- you choose one option
- you get a reward, distributed according to its proba distribution
At the end:
- you choose one option (you can not change anymore...)
- your reward is the expected reward associated to this option
Here, we take no risk
What is a bandit problem ?
A finite number of time steps
A (finite) number of options,
each of them equipped with a (unknown) proba distribution
At each time step (exploration):
- you choose one option
- you get a reward, distributed according to its proba distribution
At the end (recommendation):
- you choose one option (you can not change anymore...)
- your reward is the expected reward associated to this option
Which kind of bandit ?
- in the bandit literature, options are
also termed “arms”
- here the criterion is the expected reward
of the option chosen at the end
(sometimes it is the sum
of the rewards during exploration)
- we presented here stochastic bandits
(a probability distribution
per option) ==> next slide is different
And adversarial bandit ?
A finite number of time steps
A (finite) number of options for player 1,
and a finite number of options for player 2.
An unknown probability distribution for each pair of options
At each time step:
- you choose one option for P1 and one option for P2
- you get a reward, distributed according to the
corresponding proba distribution
At the end:
- you choose one **probabilistic** option for P1
(you can not change anymore...)
- your reward is the expected reward associated to this option,
for the worst choice by P2
What is meta-gaming ?
What is “strategic choice” ?
Strategic choices:
- decisions once and for all, at a high level
- ≠ from tactical level
Meta-gaming: choice at a strategic level, in games:
- choosings cards, in card games
- choosing handicap positioning, in Go
==> once and for all, at the beginning of the game
Example of stochastic bandit
(i.e. 1P strategic choice)
Game of Go handicap bandit problem, at each time step:
- you choose one handicap positioning
- then you simulate one game from this position
==> only one player has a strategic choice
==> stochastic bandit
Example of adversarial bandit
(i.e. 2P strategic choice)
Urban Rivals bandit problem, at each time step:
- you choose
- one set of cards for you (P1)
- one set of cards for P2
- then you simulate one Urban Rivals game from this position
PLAYER 1:
PLAYER 2:
==> two players have a strategic choice
==> adversarial bandit
Is a strategic bandit problem
different from
a classical bandit problem ?
No difference in nature
Just a much
smaller budget
Algorithms
Reminder:
- two algorithms needed:
- one for choosing during N exploration steps
- one for choosing during 1 recommendation step
- two settings
- one-player case
- two-player case
Algorithms for exploration
Uniform: test all options uniformly
Bernstein races:
- uniformly among non discarded options,
- discard options with statistical tests
Successive reject:
- uniformly among non discarded options,
- discard periodically the worst option
UCB: choose option with best average result + bonus
for options weakly sampled,
Adaptive-UCB-E: a variant of UCB aimed at removing
hyper-parameters
EXP3: empirically best option + random perturbation
Algorithms for recommendation
Empirically Best Arm: choose empirically best option
Most Played Arm: choose most simulated option
Successive reject:: the only non discarded option
UCB: choose option with best average result + bonus
for options weakly sampled.
LCB: choose option with best average result + malus for
options weakly sampled.
Empirical distribution of play: an option has its
frequency (during exploration) as probability (for
recommendation)
TEXP3: idem, but discard low probability options
Experimental results
Big boring tables of results
are in the paper.
Only a sample of most clear
results here.
One player case
Killall Go stones positionning
One player case
Killall Go stones positionning
Uncertainty
should
have
malus in
recommend.
One player case
Killall Go stones positionning
EXP3 for
2player
case
Experimental results: TEXP3
outperforms EXP3 by far
2-player case, game =
Urban Rivals (free online card game)
Do you know killall-Go ?
Black has stones in advance (e.g. 8 in 13x13).
If white makes life, white wins.
If black kills everything, black wins.
Black choose stones
positioning
(strategic decisions).
Left: human is Black and chooses E3 C4.
Right: computer is Black and chooses D3 D5.
White won both.
Human said that the computer choice D3 D5 is good.
Killall Go, H8 (left) H9 (right)
Left: Human Pro Player (5P) as black has 8 handicap stones.
White (computer) makes life and wins.
Right: Human Pro Player (5P) as black has 9 handicap stones
and kills everything and wins.
CONCLUSIONS
1 player case:
UCB for exploration,
LCB or MPA for recommendation
2 player case:
TEXP3 performs best.
Killall-Go
Win against pro with H2 in 7x7 Killall-Go as white.
Loss against pro with H2 in 7x7 Killall-Go as black.
13x13: Computer won as white with H8, lost with H9.
13x13: Computer lost as black with H8 and with H9.
Further work:
Structured bandit: some options are close to each other.
Batoo: Go with strategic choice for both players; nice test case.
Industry: choosing investments for power grid simulations – in progress.

Weitere ähnliche Inhalte

Andere mochten auch

Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s...
Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s...Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s...
Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s...Olivier Teytaud
 
Noisy Optimization combining Bandits and Evolutionary Algorithms
Noisy Optimization combining Bandits and Evolutionary AlgorithmsNoisy Optimization combining Bandits and Evolutionary Algorithms
Noisy Optimization combining Bandits and Evolutionary AlgorithmsOlivier Teytaud
 
Tools for Discrete Time Control; Application to Power Systems
Tools for Discrete Time Control; Application to Power SystemsTools for Discrete Time Control; Application to Power Systems
Tools for Discrete Time Control; Application to Power SystemsOlivier Teytaud
 
Games with partial information
Games with partial informationGames with partial information
Games with partial informationOlivier Teytaud
 
The game of Go and energy; two nice computational intelligence problems (with...
The game of Go and energy; two nice computational intelligence problems (with...The game of Go and energy; two nice computational intelligence problems (with...
The game of Go and energy; two nice computational intelligence problems (with...Olivier Teytaud
 
Optimization of power systems - old and new tools
Optimization of power systems - old and new toolsOptimization of power systems - old and new tools
Optimization of power systems - old and new toolsOlivier Teytaud
 
Artificial intelligence and blind Go
Artificial intelligence and blind GoArtificial intelligence and blind Go
Artificial intelligence and blind GoOlivier Teytaud
 
Energy Management (production side)
Energy Management (production side)Energy Management (production side)
Energy Management (production side)Olivier Teytaud
 

Andere mochten auch (11)

Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s...
Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s...Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s...
Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and s...
 
Noisy Optimization combining Bandits and Evolutionary Algorithms
Noisy Optimization combining Bandits and Evolutionary AlgorithmsNoisy Optimization combining Bandits and Evolutionary Algorithms
Noisy Optimization combining Bandits and Evolutionary Algorithms
 
Tools for Discrete Time Control; Application to Power Systems
Tools for Discrete Time Control; Application to Power SystemsTools for Discrete Time Control; Application to Power Systems
Tools for Discrete Time Control; Application to Power Systems
 
Games with partial information
Games with partial informationGames with partial information
Games with partial information
 
Grenoble
GrenobleGrenoble
Grenoble
 
Theory of games
Theory of gamesTheory of games
Theory of games
 
The game of Go and energy; two nice computational intelligence problems (with...
The game of Go and energy; two nice computational intelligence problems (with...The game of Go and energy; two nice computational intelligence problems (with...
The game of Go and energy; two nice computational intelligence problems (with...
 
Optimization of power systems - old and new tools
Optimization of power systems - old and new toolsOptimization of power systems - old and new tools
Optimization of power systems - old and new tools
 
Artificial intelligence and blind Go
Artificial intelligence and blind GoArtificial intelligence and blind Go
Artificial intelligence and blind Go
 
Energy Management (production side)
Energy Management (production side)Energy Management (production side)
Energy Management (production side)
 
Openoffice and Linux
Openoffice and LinuxOpenoffice and Linux
Openoffice and Linux
 

Ähnlich wie Choosing between several options in uncertain environments

Game Theory SV.docx
Game Theory SV.docxGame Theory SV.docx
Game Theory SV.docxsnehil35
 
Oligopoly and Game Theory
Oligopoly and Game TheoryOligopoly and Game Theory
Oligopoly and Game Theorytutor2u
 
Simple regret bandit algorithms for unstructured noisy optimization
Simple regret bandit algorithms for unstructured noisy optimizationSimple regret bandit algorithms for unstructured noisy optimization
Simple regret bandit algorithms for unstructured noisy optimizationOlivier Teytaud
 
Module 3 Game Theory (1).pptx
Module 3 Game Theory (1).pptxModule 3 Game Theory (1).pptx
Module 3 Game Theory (1).pptxDrNavaneethaKumar
 
navingameppt-191018085333.pdf
navingameppt-191018085333.pdfnavingameppt-191018085333.pdf
navingameppt-191018085333.pdfDebadattaPanda4
 
An introduction to Game Theory
An introduction to Game TheoryAn introduction to Game Theory
An introduction to Game TheoryPaul Trafford
 
ch_5 Game playing Min max and Alpha Beta pruning.ppt
ch_5 Game playing Min max and Alpha Beta pruning.pptch_5 Game playing Min max and Alpha Beta pruning.ppt
ch_5 Game playing Min max and Alpha Beta pruning.pptSanGeet25
 
21CSC206T_UNIT3.pptx.pdf ARITIFICIAL INTELLIGENCE
21CSC206T_UNIT3.pptx.pdf ARITIFICIAL INTELLIGENCE21CSC206T_UNIT3.pptx.pdf ARITIFICIAL INTELLIGENCE
21CSC206T_UNIT3.pptx.pdf ARITIFICIAL INTELLIGENCEudayvanand
 
Topic 3- Cooperation and Collective Action
Topic 3- Cooperation and Collective ActionTopic 3- Cooperation and Collective Action
Topic 3- Cooperation and Collective ActionJohn Bradford
 
AI3391 Artificial Intelligence UNIT III Notes_merged.pdf
AI3391 Artificial Intelligence UNIT III Notes_merged.pdfAI3391 Artificial Intelligence UNIT III Notes_merged.pdf
AI3391 Artificial Intelligence UNIT III Notes_merged.pdfAsst.prof M.Gokilavani
 
Misscommunication and you
Misscommunication and youMisscommunication and you
Misscommunication and youtrixobird
 
Applied Data Science for monetization: pitfalls, common misconceptions, and n...
Applied Data Science for monetization: pitfalls, common misconceptions, and n...Applied Data Science for monetization: pitfalls, common misconceptions, and n...
Applied Data Science for monetization: pitfalls, common misconceptions, and n...DevGAMM Conference
 
OR PPT 280322 maximin final - nikhil tiwari.pptx
OR PPT 280322 maximin final - nikhil tiwari.pptxOR PPT 280322 maximin final - nikhil tiwari.pptx
OR PPT 280322 maximin final - nikhil tiwari.pptxVivekSaurabh7
 
Game theory.ppt for Micro Economics content
Game theory.ppt for Micro Economics contentGame theory.ppt for Micro Economics content
Game theory.ppt for Micro Economics contentDrDeeptiSharma12
 

Ähnlich wie Choosing between several options in uncertain environments (20)

file1
file1file1
file1
 
Game Theory SV.docx
Game Theory SV.docxGame Theory SV.docx
Game Theory SV.docx
 
Oligopoly and Game Theory
Oligopoly and Game TheoryOligopoly and Game Theory
Oligopoly and Game Theory
 
Simple regret bandit algorithms for unstructured noisy optimization
Simple regret bandit algorithms for unstructured noisy optimizationSimple regret bandit algorithms for unstructured noisy optimization
Simple regret bandit algorithms for unstructured noisy optimization
 
Module 3 Game Theory (1).pptx
Module 3 Game Theory (1).pptxModule 3 Game Theory (1).pptx
Module 3 Game Theory (1).pptx
 
navingameppt-191018085333.pdf
navingameppt-191018085333.pdfnavingameppt-191018085333.pdf
navingameppt-191018085333.pdf
 
An introduction to Game Theory
An introduction to Game TheoryAn introduction to Game Theory
An introduction to Game Theory
 
game THEORY ppt
game THEORY pptgame THEORY ppt
game THEORY ppt
 
ch_5 Game playing Min max and Alpha Beta pruning.ppt
ch_5 Game playing Min max and Alpha Beta pruning.pptch_5 Game playing Min max and Alpha Beta pruning.ppt
ch_5 Game playing Min max and Alpha Beta pruning.ppt
 
cai
caicai
cai
 
21CSC206T_UNIT3.pptx.pdf ARITIFICIAL INTELLIGENCE
21CSC206T_UNIT3.pptx.pdf ARITIFICIAL INTELLIGENCE21CSC206T_UNIT3.pptx.pdf ARITIFICIAL INTELLIGENCE
21CSC206T_UNIT3.pptx.pdf ARITIFICIAL INTELLIGENCE
 
Topic 3- Cooperation and Collective Action
Topic 3- Cooperation and Collective ActionTopic 3- Cooperation and Collective Action
Topic 3- Cooperation and Collective Action
 
adversial search.pptx
adversial search.pptxadversial search.pptx
adversial search.pptx
 
AI3391 Artificial Intelligence UNIT III Notes_merged.pdf
AI3391 Artificial Intelligence UNIT III Notes_merged.pdfAI3391 Artificial Intelligence UNIT III Notes_merged.pdf
AI3391 Artificial Intelligence UNIT III Notes_merged.pdf
 
Misscommunication and you
Misscommunication and youMisscommunication and you
Misscommunication and you
 
Adversarial search
Adversarial search Adversarial search
Adversarial search
 
Applied Data Science for monetization: pitfalls, common misconceptions, and n...
Applied Data Science for monetization: pitfalls, common misconceptions, and n...Applied Data Science for monetization: pitfalls, common misconceptions, and n...
Applied Data Science for monetization: pitfalls, common misconceptions, and n...
 
OR PPT 280322 maximin final - nikhil tiwari.pptx
OR PPT 280322 maximin final - nikhil tiwari.pptxOR PPT 280322 maximin final - nikhil tiwari.pptx
OR PPT 280322 maximin final - nikhil tiwari.pptx
 
Game theory.ppt for Micro Economics content
Game theory.ppt for Micro Economics contentGame theory.ppt for Micro Economics content
Game theory.ppt for Micro Economics content
 
AI_unit3.pptx
AI_unit3.pptxAI_unit3.pptx
AI_unit3.pptx
 

Kürzlich hochgeladen

Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Kürzlich hochgeladen (20)

Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 

Choosing between several options in uncertain environments

  • 1. METAGAMING: Bandits with simple regret and small budget Chen-Wei Chou, Ping-Chiang Chou, Chang-Shing Lee, David Lupien St-Pierre, Olivier Teytaud, Mei-Hui Wang, Li-Wen Wu and Shi-Jim Yen
  • 2. Outline: - what is a bandit problem ? - what is a strategic bandit problem ? - is a strategic bandit different from a bandit ? - algorithms - results
  • 3. What is a bandit problem ? A finite number of time steps A (finite) number of options, each of them equipped with a (unknown) proba distribution At each time step: - you choose one option - you get a reward, distributed according to its proba distribution At the end: - you choose one option (you can not change anymore...) - your reward is the expected reward associated to this option
  • 4. What is a bandit problem ? A finite number of time steps A (finite) number of options, each of them equipped with a (unknown) proba distribution At each time step: - you choose one option - you get a reward, distributed according to its proba distribution At the end: - you choose one option (you can not change anymore...) - your reward is the expected reward associated to this option Here we collect information
  • 5. What is a bandit problem ? A finite number of time steps A (finite) number of options, each of them equipped with a (unknown) proba distribution At each time step: - you choose one option - you get a reward, distributed according to its proba distribution At the end: - you choose one option (you can not change anymore...) - your reward is the expected reward associated to this option Here we use information for the final choice
  • 6. What is a bandit problem ? A finite number of time steps A (finite) number of options, each of them equipped with a (unknown) proba distribution At each time step: - you choose one option - you get a reward, distributed according to its proba distribution At the end: - you choose one option (you can not change anymore...) - your reward is the expected reward associated to this option Here, we explore
  • 7. What is a bandit problem ? A finite number of time steps A (finite) number of options, each of them equipped with a (unknown) proba distribution At each time step: - you choose one option - you get a reward, distributed according to its proba distribution At the end: - you choose one option (you can not change anymore...) - your reward is the expected reward associated to this option Here, we take no risk
  • 8. What is a bandit problem ? A finite number of time steps A (finite) number of options, each of them equipped with a (unknown) proba distribution At each time step (exploration): - you choose one option - you get a reward, distributed according to its proba distribution At the end (recommendation): - you choose one option (you can not change anymore...) - your reward is the expected reward associated to this option
  • 9. Which kind of bandit ? - in the bandit literature, options are also termed “arms” - here the criterion is the expected reward of the option chosen at the end (sometimes it is the sum of the rewards during exploration) - we presented here stochastic bandits (a probability distribution per option) ==> next slide is different
  • 10. And adversarial bandit ? A finite number of time steps A (finite) number of options for player 1, and a finite number of options for player 2. An unknown probability distribution for each pair of options At each time step: - you choose one option for P1 and one option for P2 - you get a reward, distributed according to the corresponding proba distribution At the end: - you choose one **probabilistic** option for P1 (you can not change anymore...) - your reward is the expected reward associated to this option, for the worst choice by P2
  • 11. What is meta-gaming ? What is “strategic choice” ? Strategic choices: - decisions once and for all, at a high level - ≠ from tactical level Meta-gaming: choice at a strategic level, in games: - choosings cards, in card games - choosing handicap positioning, in Go ==> once and for all, at the beginning of the game
  • 12. Example of stochastic bandit (i.e. 1P strategic choice) Game of Go handicap bandit problem, at each time step: - you choose one handicap positioning - then you simulate one game from this position ==> only one player has a strategic choice ==> stochastic bandit
  • 13. Example of adversarial bandit (i.e. 2P strategic choice) Urban Rivals bandit problem, at each time step: - you choose - one set of cards for you (P1) - one set of cards for P2 - then you simulate one Urban Rivals game from this position PLAYER 1: PLAYER 2: ==> two players have a strategic choice ==> adversarial bandit
  • 14. Is a strategic bandit problem different from a classical bandit problem ? No difference in nature Just a much smaller budget
  • 15. Algorithms Reminder: - two algorithms needed: - one for choosing during N exploration steps - one for choosing during 1 recommendation step - two settings - one-player case - two-player case
  • 16. Algorithms for exploration Uniform: test all options uniformly Bernstein races: - uniformly among non discarded options, - discard options with statistical tests Successive reject: - uniformly among non discarded options, - discard periodically the worst option UCB: choose option with best average result + bonus for options weakly sampled, Adaptive-UCB-E: a variant of UCB aimed at removing hyper-parameters EXP3: empirically best option + random perturbation
  • 17. Algorithms for recommendation Empirically Best Arm: choose empirically best option Most Played Arm: choose most simulated option Successive reject:: the only non discarded option UCB: choose option with best average result + bonus for options weakly sampled. LCB: choose option with best average result + malus for options weakly sampled. Empirical distribution of play: an option has its frequency (during exploration) as probability (for recommendation) TEXP3: idem, but discard low probability options
  • 18. Experimental results Big boring tables of results are in the paper. Only a sample of most clear results here.
  • 19. One player case Killall Go stones positionning
  • 20. One player case Killall Go stones positionning Uncertainty should have malus in recommend.
  • 21. One player case Killall Go stones positionning EXP3 for 2player case
  • 22. Experimental results: TEXP3 outperforms EXP3 by far 2-player case, game = Urban Rivals (free online card game)
  • 23. Do you know killall-Go ? Black has stones in advance (e.g. 8 in 13x13). If white makes life, white wins. If black kills everything, black wins. Black choose stones positioning (strategic decisions).
  • 24. Left: human is Black and chooses E3 C4. Right: computer is Black and chooses D3 D5. White won both. Human said that the computer choice D3 D5 is good.
  • 25. Killall Go, H8 (left) H9 (right) Left: Human Pro Player (5P) as black has 8 handicap stones. White (computer) makes life and wins. Right: Human Pro Player (5P) as black has 9 handicap stones and kills everything and wins.
  • 26. CONCLUSIONS 1 player case: UCB for exploration, LCB or MPA for recommendation 2 player case: TEXP3 performs best. Killall-Go Win against pro with H2 in 7x7 Killall-Go as white. Loss against pro with H2 in 7x7 Killall-Go as black. 13x13: Computer won as white with H8, lost with H9. 13x13: Computer lost as black with H8 and with H9. Further work: Structured bandit: some options are close to each other. Batoo: Go with strategic choice for both players; nice test case. Industry: choosing investments for power grid simulations – in progress.