BLAZING THE TRAILS BEFORE BEATING THE PATH:
SAMPLE-EFFICIENT MONTE-CARLO PLANNING
KATSUKI OHTO
@NIPS2016-YOMI
2017/1/19
INTRODUCED PAPER
• Blazing the trails before beating the path:
Sample-efficient Monte-Carlo planning
(J.-B. Grill, M. Valko and R. Munos)
• NIPS 2016 accepted paper (poster session)
• Abstract starts with “You are a robot…”
• http://papers.nips.cc/paper/6253-blazing-the-trails-before-beating-the-path-sample-efficient-monte-carlo-planning
TRAILBLAZER
• Monte-Carlo planning algorithm with a nested (recursive) structure
• Problem setting:
MDP (modeled with MAX nodes and AVG nodes)
Actions per state: finite
Possible next states per transition: finite or infinite
• Strong theoretical guarantees
[Figure: tree with MAX and AVG nodes]
AIM
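• Input: an MDP (Markov decision process)
(discount factor $\gamma$, maximum number of valid actions $K$),
$\varepsilon$ ($> 0$), $\delta$ ($0 < \delta < 1$)
• Output: estimated value $\mu_{\varepsilon,\delta}$ of the current state $s_0$
• Aim: obtain a good estimate of the true value $\mathcal{V}[s_0]$ of the current state,
such that
$\mathbb{P}\left(\left|\mu_{\varepsilon,\delta} - \mathcal{V}[s_0]\right| > \varepsilon\right) \le \delta$
($\mathbb{P}(\cdot)$ denotes the probability of $\cdot$)
with the minimum number of calls to the generative model (state transition function)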
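To make the (ε, δ) requirement concrete, here is a minimal sketch for the simplest possible case: estimating the mean of a single Bernoulli reward by plain averaging. This is not TrailBlazer; the sample size comes from Hoeffding's inequality for rewards in [0, 1], and the function names are hypothetical.

```python
import math
import random


def hoeffding_samples(eps, delta):
    """Samples needed so that P(|mean - p| > eps) <= delta for rewards in [0, 1]."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_mean(p, n, rng):
    """Average of n Bernoulli(p) rewards; each draw stands for one generative-model call."""
    return sum(1.0 if rng.random() < p else 0.0 for _ in range(n)) / n


def failure_rate(p, eps, delta, trials, seed=0):
    """Empirical frequency of |estimate - p| > eps over repeated runs."""
    rng = random.Random(seed)
    n = hoeffding_samples(eps, delta)
    failures = sum(abs(estimate_mean(p, n, rng) - p) > eps for _ in range(trials))
    return failures / trials, n


if __name__ == "__main__":
    rate, n = failure_rate(p=0.5, eps=0.1, delta=0.1, trials=200)
    print(f"samples per run: {n}, empirical failure rate: {rate:.3f}")
```

The planning problem is harder because the value of a state depends on a whole tree of future states, so the number of calls needed grows with the search depth and branching.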
1 PLAYER TREE MODEL
IN STOCHASTIC ENVIRONMENT
• Each MAX node is a decision point
where an action is chosen
• Each AVG node is a
stochastic state transition
[Figure: tree of alternating MAX and AVG nodes]
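The MAX/AVG tree can be sketched as a small data structure with an exact value computation. This is only an illustration, assuming a tiny tree whose transition probabilities are known; the class and function names are hypothetical, and in the planning setting these probabilities are not available and must be sampled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MaxNode:
    """Decision point: one AVG child per available action."""
    children: List["AvgNode"] = field(default_factory=list)


@dataclass
class AvgNode:
    """Stochastic transition: list of (probability, reward, next MAX node)."""
    outcomes: List[Tuple[float, float, MaxNode]] = field(default_factory=list)


def value_max(node: MaxNode, gamma: float) -> float:
    """Value of a MAX node: the best child value (0 at a leaf)."""
    if not node.children:
        return 0.0
    return max(value_avg(child, gamma) for child in node.children)


def value_avg(node: AvgNode, gamma: float) -> float:
    """Value of an AVG node: expected reward plus discounted next value."""
    return sum(p * (r + gamma * value_max(nxt, gamma)) for p, r, nxt in node.outcomes)


if __name__ == "__main__":
    leaf = MaxNode()
    later = MaxNode([AvgNode([(1.0, 1.0, leaf)])])        # worth 1.0
    patient = AvgNode([(1.0, 0.0, later)])                # 0 now, 1.0 next step
    greedy = AvgNode([(0.8, 1.0, leaf), (0.2, 0.0, leaf)])  # 0.8 expected now
    root = MaxNode([patient, greedy])
    print(value_max(root, gamma=0.9))
```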
ALGORITHM OVERVIEW
• Global initialization:
set 𝜂 and 𝜆 as global parameters
set 𝑚 as the argument passed to
the root node
• Recursive algorithm
[Figure: initialization formulas; only the term log(𝜂/𝛾) is recoverable]
ALGORITHM OVERVIEW 2
• Both MAX nodes and AVG nodes
take two arguments:
𝑚 (desired branching factor)
and
𝜀 (admissible estimation error)
• A large 𝑚 lets us search many children, but takes more time
(trade-off)
• A small 𝜀 lets us search deeper, but takes more time (trade-off)
ALGORITHM
FOR AVG NODES
• Input: 𝑚 and 𝜀
• Output: estimated value
• If the admissible error 𝜀 is large, ignore
the future reward
• Draw transition samples until 𝑚 are stored
(storing each immediate reward)
• Search all 𝑚 sampled next states
• Return the averaged immediate reward +
the estimated future reward
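The AVG-node steps above can be sketched as follows. This is a reading of the slide, not the paper's pseudocode. Several details are assumptions: rewards lie in [0, 1], so "large" is taken to mean 𝜀 ≥ 1/(1 − 𝛾); each distinct next state is evaluated once with its occurrence count k as the child's branching argument; and the child's admissible error is 𝜀/𝛾. The `AvgNode` class and the `child_value` callback are hypothetical names.

```python
from collections import Counter


class AvgNode:
    """Sketch of an AVG node that keeps its transition samples between calls."""

    def __init__(self, sample_transition, gamma):
        self.sample_transition = sample_transition  # () -> (next_state, reward)
        self.gamma = gamma                          # discount factor, 0 < gamma < 1
        self.samples = []                           # stored (next_state, reward) pairs

    def value(self, m, eps, child_value):
        """Estimate the node value from m samples (m is a positive integer)."""
        # Top up the stored samples so that at least m are available.
        while len(self.samples) < m:
            self.samples.append(self.sample_transition())
        active = self.samples[:m]
        mean_reward = sum(reward for _, reward in active) / m

        # Assumption: with rewards in [0, 1] the future value is at most
        # 1 / (1 - gamma), so for eps this large the recursion is skipped.
        if eps >= 1.0 / (1.0 - self.gamma):
            return mean_reward

        # Evaluate each distinct next state once, weighted by its frequency.
        future = 0.0
        for state, k in Counter(state for state, _ in active).items():
            future += child_value(state, k, eps / self.gamma) * k / m
        return mean_reward + self.gamma * future


if __name__ == "__main__":
    node = AvgNode(lambda: (0, 1.0), gamma=0.5)
    # Stub child: every next state is worth 2.0, so the value is 1.0 + 0.5 * 2.0.
    print(node.value(m=4, eps=0.1, child_value=lambda state, k, e: 2.0))
```

Keeping the samples on the node means a later call with a larger 𝑚 only pays for the missing transitions.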
ALGORITHM
FOR MAX NODES
• Input: 𝑚 and 𝜀
• Output: estimated value
• Fill the candidate action pool ℒ with all valid actions
• U is a quantity similar to the standard error of the estimate
• Evaluate the candidate actions repeatedly until
"only 1 action is left" or "the error is probably small enough"
• If "the error is probably small enough"
then return the estimated value of the best action
else
evaluate the best action once more, with higher precision
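The MAX-node loop above is a form of action elimination, sketched below. The confidence width used here is a simple placeholder of the form sqrt(log(K·ℓ/𝛿)/ℓ), not the formula for U from the paper, and the error passed to children inside the loop is likewise a placeholder. The names `max_node_value`, `children`, and `max_rounds` are hypothetical; 𝜂 and 𝛿 are passed as plain arguments.

```python
import math


def max_node_value(children, m, eps, eta=0.9, delta=0.1, max_rounds=10_000):
    """children: list of callables child(n, eps) -> estimated value of that action."""
    pool = list(range(len(children)))  # candidate actions still in play
    rounds = 1
    while True:
        # Placeholder width that shrinks as more rounds are spent (not the paper's U).
        width = math.sqrt(math.log(len(children) * rounds / delta) / rounds)
        estimates = {b: children[b](rounds, width) for b in pool}
        best = max(estimates.values())
        # Keep only actions whose upper bound still reaches the best lower bound.
        pool = [b for b in pool if estimates[b] + 2 * width >= best - 2 * width]
        if len(pool) == 1 or width < (1 - eta) * eps or rounds >= max_rounds:
            break
        rounds += 1

    if len(pool) > 1:
        # The remaining actions are indistinguishable at this precision.
        return max(estimates[b] for b in pool)
    # Only one action left: evaluate it once more with the full budget m.
    return children[pool[0]](m, eta * eps)


if __name__ == "__main__":
    actions = [lambda n, e: 0.2, lambda n, e: 0.9]  # stub children with fixed values
    print(max_node_value(actions, m=50, eps=0.5))
```

The point of the design is that clearly inferior actions are dropped after a few cheap evaluations, so the expensive evaluation with budget 𝑚 is spent on a single action.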
SAMPLE COMPLEXITY OF TRAILBLAZER
• Sample complexity measures the algorithm's performance: the number of calls to the generative model it needs
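• If $N$ (the number of possible next states) is finite, the sample complexity is of order
$\left(\frac{1}{\varepsilon}\right)^{\max\left(2,\ \frac{\log(N\kappa)}{\log(1/\gamma)} + o(1)\right)}$
where $\kappa \in [1, K]$ (details in the paper);
otherwise it is of order
$\left(\frac{1}{\varepsilon}\right)^{2+d}$
where $d$ is a measure of the difficulty of identifying near-optimal nodes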
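A short calculation shows how the exponent in the finite case behaves. This sketch drops the o(1) term and all constants, so the numbers indicate orders of growth only; the function names are hypothetical.

```python
import math


def finite_exponent(n_next, kappa, gamma):
    """Exponent max(2, log(N * kappa) / log(1 / gamma)), ignoring the o(1) term."""
    return max(2.0, math.log(n_next * kappa) / math.log(1.0 / gamma))


def calls_order(eps, exponent):
    """Order of the number of generative-model calls, (1 / eps) ** exponent."""
    return (1.0 / eps) ** exponent


if __name__ == "__main__":
    for n_next, kappa, gamma in [(2, 1, 0.5), (4, 2, 0.5), (4, 2, 0.9)]:
        e = finite_exponent(n_next, kappa, gamma)
        print(f"N={n_next}, kappa={kappa}, gamma={gamma}: "
              f"exponent {e:.2f}, order {calls_order(0.1, e):.3g} calls at eps=0.1")
```

With N = 2, κ = 1, and γ = 0.5 the ratio of logarithms is 1, so the exponent stays at its floor of 2; with N = 4 and κ = 2 it rises to 3, and a discount factor closer to 1 raises it further.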
