2. Outline of MctsAi
MctsAi is a sample fighting game AI for the FightingICE platform. It implements UCB applied to trees (UCT) [1], a typical Monte-Carlo Tree Search (MCTS) algorithm [2].
[1] L. Kocsis and C. Szepesvári, “Bandit Based Monte-Carlo Planning”
[2] R. Coulom, “Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search”
3. UCT
Repeat Selection → Expansion → Playout → Backpropagation until the predefined maximum time length or the maximum number of playouts is reached.
Use the UCB1 value in Selection.
Finally, select the action associated with the child node of the root that has the maximum number of visits.
(Figure: the four UCT phases: Selection, Expansion, Playout, Backpropagation)
4. Upper Confidence Bound (UCB1) [3]
UCB1(i) = X_i + C * sqrt(2 * ln(N_i^p) / N_i)

UCB1(i): UCB1 value of node i
X_i: average evaluation value of node i
C: balancing parameter (empirically set to 3 in the sample AI)
N_i^p: number of visits to the parent node of node i
N_i: number of visits to node i
This favors selecting less-visited nodes with high evaluation values.
[3] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time Analysis of the Multiarmed Bandit Problem”
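The UCB1 computation above can be sketched in Python (a sketch only; the actual MctsAi is written in Java for FightingICE):

```python
import math

def ucb1(avg_eval: float, visits: int, parent_visits: int, c: float = 3.0) -> float:
    """UCB1(i) = X_i + C * sqrt(2 * ln(N_i^p) / N_i), with C = 3 as in the sample AI."""
    if visits == 0:
        # Unvisited nodes are handled separately in MctsAi: they receive
        # a very large random initial UCB1 value instead of this formula.
        return float("inf")
    return avg_eval + c * math.sqrt(2.0 * math.log(parent_visits) / visits)
```

For example, a node with average evaluation value 0.5 and 4 visits under a parent with 17 visits gets 0.5 + 3·√(2 ln 17 / 4) ≈ 4.07.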
5. MctsAi Procedure
1. Expand all adjacent child nodes of the root node at once
2. Repeat iterations of Selection, Expansion, Playout, and Backpropagation for 16.5 ms (a value also set empirically)
3. Select an action to perform
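The three-step procedure above can be sketched as follows. The Node structure, action names, and the random playout are toy stand-ins (the real MctsAi simulates the fighting game and uses UCB1 in selection):

```python
import random
import time

class Node:
    def __init__(self, action=None, parent=None):
        self.action, self.parent = action, parent
        self.children, self.visits, self.total = [], 0, 0.0

ACTIONS = ["A", "B", "C"]  # illustrative action set

def playout(node):
    return random.random()  # stand-in for simulating the game from this node

def backpropagate(node, reward):
    while node is not None:
        node.visits += 1
        node.total += reward
        node = node.parent

def select(node):
    # Simplified selection: try unvisited children first, otherwise follow
    # the child with the best average value (the real AI uses UCB1 here).
    while node.children:
        unvisited = [c for c in node.children if c.visits == 0]
        node = random.choice(unvisited) if unvisited else max(
            node.children, key=lambda c: c.total / c.visits)
    return node

def decide(budget_ms=16.5):
    root = Node()
    root.children = [Node(a, root) for a in ACTIONS]  # 1. expand all root children at once
    deadline = time.monotonic() + budget_ms / 1000.0
    while time.monotonic() < deadline:                # 2. iterate within the time budget
        leaf = select(root)
        backpropagate(leaf, playout(leaf))
    return max(root.children, key=lambda c: c.visits).action  # 3. most-visited child
```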
6. 1 Expansion of all adjacent child nodes from the root node
Assign a very large random value to non-visited nodes as their initial UCB1 value.
(Figure: the root node with its newly expanded children; each node shows its UCB1 value, average evaluation value, and number of visits. The unvisited children have 0 visits, an undefined (NaN) average evaluation value, and a very large random UCB1 value.)
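One way to realize this initial value; the magnitude used here is an assumption (it only needs to dominate any UCB1 value a visited node can reach), not the exact constant in MctsAi:

```python
import random

def initial_ucb1() -> float:
    """Very large random UCB1 value for an unvisited node, so that all
    unvisited siblings are tried (in random order) before any node is
    revisited. The 1e9 scale is an assumption, not MctsAi's constant."""
    return 1e9 * (1.0 + random.random())
```

Because every unvisited child gets a different huge value, Selection effectively visits the unvisited children one by one in a random order.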
7. 2.1 Selection
Select the node with the highest UCB1 value at each level, all the way down to a leaf node.
(Figure: two selection examples. With visited children: from a root with 17 visits, children with average evaluation values 0.3, 2.5, and 0.5 and visit counts 3, 10, and 4 have UCB1 values of about 4.42, 4.76, and 4.07, so the child with UCB1 4.76 is selected. With unvisited children: their very large random initial UCB1 values dominate, so unvisited nodes are selected first.)
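Selection can be sketched with a minimal node structure (the field names are assumptions, not MctsAi's actual ones):

```python
import math
import random
from dataclasses import dataclass, field

@dataclass
class Node:
    avg: float = 0.0                      # average evaluation value X_i
    visits: int = 0                       # number of visits N_i
    children: list = field(default_factory=list)

def ucb1(child: Node, parent_visits: int, c: float = 3.0) -> float:
    if child.visits == 0:
        return 1e9 + random.random()      # large random initial value for unvisited nodes
    return child.avg + c * math.sqrt(2.0 * math.log(parent_visits) / child.visits)

def select_leaf(root: Node) -> Node:
    """Follow the child with the highest UCB1 value at each level down to a leaf."""
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
    return node
```

With the values from the example (root with 17 visits, children (0.3, 3), (2.5, 10), (0.5, 4)), the child with UCB1 ≈ 4.76 is chosen.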
8. 2.2 Expansion
If a leaf node at depth level 1 that has 10 visits is reached, expand all of its child nodes at once.
(Figure: the selected depth-1 leaf, with 10 visits and average evaluation value 2.5, is expanded; all of its child nodes are created at once as unvisited nodes.)
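The expansion rule can be sketched like this; the threshold of 10 visits at depth 1 follows the slide, while the node structure is an illustrative assumption:

```python
class Node:
    def __init__(self, depth=0):
        self.depth = depth
        self.visits = 0
        self.children = []

EXPAND_THRESHOLD = 10  # visits required before a depth-1 leaf is expanded

def maybe_expand(leaf: Node, n_actions: int) -> bool:
    """Expand all children of a depth-1 leaf at once when it has enough visits."""
    if leaf.depth == 1 and not leaf.children and leaf.visits >= EXPAND_THRESHOLD:
        leaf.children = [Node(depth=leaf.depth + 1) for _ in range(n_actions)]
        return True
    return False
```

The newly created children are unvisited, so on later iterations their large random initial UCB1 values make Selection try each of them before revisiting.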