Tools for artificial intelligence: EXP3, Zermelo algorithm, Alpha-Beta, and some applications

SOME TOOLS
FOR ARTIFICIAL INTELLIGENCE
Olivier Teytaud --- olivier.teytaud@gmail.com

NUTN, Tainan, 2011

Tao (Inria, Cnrs, Lri, Paris-Sud)
People:
Permanent staff: 11
~15 Ph.D. students
In Université Paris-Sud
Largest campus in France
Faculty of sciences: mathematics, computer science,
physics, chemistry, biology, earth and space
sciences ==> 12000 students
Inria affiliation:
Around 50 years old
Devoted to research in comp. science

Tao (Inria, Cnrs, Lri, Paris-Sud)

Reservoir computing
Optimal decision making under uncertainty
Optimization
Autonomic computing
Machine learning

Communication not always so easy:

Many of you speak Chinese + Taiwanese.
So English = third language.
I am French.
English = second language.

I work mainly on the mathematical aspects
of computer science, more than on computer science itself.

Difficulties might also be an enrichment.

Feel free to interrupt me as much as useful.

Vita in a nutshell:

1) First research: mathematical logic

2) I had fun, but I wanted to be “directly” useful. I switched
to Statistics.

3) I had fun, but I wanted to be “more directly” useful. Switched
to Operational Research, in industry.
- Many applications.
- My favorite: electricity generation.

4) Now (40 dangerously approaching), Artificial Intelligence:
- Mathematics.
- Challenges (in particular games).
- Applications.

About Operational Research:
It goes back to military applications around World War II,
when the UK resisted Hitler thanks to optimized radars.
Now essentially civil applications.

Outline of what I'll discuss:

1) Some concepts:
- simplified problems
- toolboxes for these problems

2) Principle:
- reducing real problems to groups of artificial problems
- small problems might be considered as artificial
and useless when considered alone.
- but when you solve a clearly stated small problem, usually
you can find an application for this solution.
- we will see applications as well.

==> For the moment let's see “big” applications

3) I'll also show some ongoing work for which contributors are welcome.

ELECTRICITY GENERATION
The case of France

Data:
- climate model (stochastic)
- model of electricity demand (stochastic)
- model of power plants

Each day we receive:
- electricity consumption
- weather information
- info on faults

Each day, we decide how to distribute the production
among the power plants (also: schedule long-term
investments).

The model of power plants (PP) covers: nuclear PP (NPP), thermal PP (TPP),
hydroelectric PP (HPP)...
[Diagram. Labels: Daily information; DATA (climate, plants, economy); PROGRAM; STRATEGY; Electric system; Decisions]

One of the most important industrial problems you can imagine:
how to produce energy ?

France has specific elements:
- heavily nuclearized (most nuclearized country in the world)
- plants are often cooled by rivers (they cannot run during droughts
==> hard to predict)
- we must schedule maintenance
- we must take long-term decisions (building new NPP ? Removing some ?)
- also hydroelectricity:
- should we use water now ?
- should we keep it for winter ? (in France, high consumption is in
winter)

Problem 1: Taiwan is very different from France :-)
Almost no nuclear power plants ? Cooled by sea ?
Electrically connected to other countries ? (France might
be connected to Africa)
Sun sufficient for massive photovoltaic units ?
Wind much stronger than in France - can it be used ?
Other questions ?
Electricity consumption dominated by air conditioning ?
Maybe electric cars in the future ?
Climate maybe more regular ? Problem easier than
in France ?

==> I don't know
==> I'd like to work on it (energy is an important
concern, in Taiwan as well: lack of independence ?)
==> Need people who can read Chinese
==> Other (Taiwan-independent) concern: tackling partial
observation in energy generation problems

GAME OF GO (with Nutn)

GOOD NEWS: we made a lot of progress with
**generic** algorithms (algorithms which can be
used for many things).

The revolution in Go which
occurred in 2007-2009 is a
major breakthrough in
Artificial Intelligence.

We'll see that in detail.

I am a little bit tired of the
game of Go, because I
have no recent progress,
and recent progress in the
community comes from Go
expertise, which is only
useful for Go...

Problem 2: Solving unsolved situations in Go
Now computers are much stronger than in the past.
However, they still misunderstand some trivial situations
(in particular, liberty races).
You have an idea ? Tell me :-)
We have a solver in France (not for playing Go; aimed at provably solving),
that we would like to test on various situations.
We do not play Go. If you are 5 kyu or better, you can contribute.

URBAN RIVALS

17 million registered users. Important company.

- Choose 4 cards, your opponent chooses 4 cards
- Each player gets 12 “Pilz” (i.e. strength points)
- Each player gets health points.
- Each turn:
- each player chooses a card
- each player uses pilz (each used pilz is lost forever, but it gives strength)
- read cards, apply rules
==> no more health points ? ==> you're dead.

Urban Rivals
==> Partial information
because you don't observe your opponent's decisions

==> There are “off-the-shelf” algorithms and programs
for full information games,
but not for partial information games.

==> We used a (provable) combination of MCTS and EXP3

==> Immediately human-level performance

==> suggests that maths can help
==> possible further work:
- automatic choice of cards ?
- reducing computational cost ?

POKEMONS: Problem 3
皮卡丘 (Pikachu)

Second most lucrative video game.

Meta-gaming: choosing your deck.
In-gaming: playing with your set of cards.

Problem 4: Solving MineSweeper.

Find an optimal move ?
Looks like a trivial, boring problem.
It is certainly not.
Many papers use the same approach
(the so-called CSP, i.e. constraint satisfaction, technique).
We could outperform these algorithms thanks to
a probabilistic approach.
But my approach only works on small boards (or at a huge
computational cost) ==> we want to extend it.
Quite similar to electricity generation (yes, I believe in this).

Game applications can be considered childish.
Shouldn't we focus on more important things ?
However:
- If you have a breakthrough in an important game,
people will trust you. Doors will open when you
propose new algorithms for real-world applications.
- Testing ideas on a nuclear power plant is more dangerous
than testing ideas on a game of Go.
- It's easier to compare approaches in games than in
electricity generation.

INTRODUCTION IS OVER.

NOW TECHNICAL STUFF.

REMARKS, QUESTIONS ?

TODAY, GAMES.

1) HOW TO SOLVE THEM

2) C IMPLEMENTATION

ONE FUNDAMENTAL TOOL: ZERMELO

Consider the following game:

- there are 5 sticks;
- in turn, each player removes 1 or 2 sticks;
- the player who removes the last stick loses.

Example:
Player I faces IIIII and removes 2 sticks.
Player II faces III and removes 2 sticks.
Player I faces I and must remove it ==> loses!

How should I play ?


Zermelo proposed a solution (for full-information games).

Born in 1871.

1900-1905: major contributions in logic.

1913: major contribution to games.

1931: Optimized navigation (from games to applications).

Resigned in 1935 (he did not like Hitler).

Died in 1953.


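Game tree (number = sticks left; Player I moves at the root, then players alternate;
labels are from Player I's point of view):

5
  4: WIN!
    3: WIN!
      1: WIN!
      2: LOSS!
    2: WIN!
  3: LOSS!
    2: WIN!
    1: LOSS!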

ZERMELO: I HAVE
THE OPTIMAL STRATEGY!

(Same tree as on the previous slide: position 4 is a WIN for Player I,
so with 5 sticks Player I should start by removing 1 stick.)

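ZERMELO: not limited to win/loss games.
Can work on games with continuous rewards.
New rule: the reward is 1 for a win and 0 for a loss; if the game goes
through a position with 4 sticks, the reward is multiplied by 2.

YELLOW NODES (Player II to move): LABEL = MINIMUM OF CHILDREN's LABELS
BLUE NODES (Player I to move): LABEL = MAXIMUM OF CHILDREN's LABELS

Tree (sticks left: label for Player I):

5: 2
  4: 2
    3: 2
      1: 2
      2: 0
    2: 2
  3: 0
    2: 1
    1: 0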
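
The labels of this tree can be checked with a few lines of C. This is only a sketch under assumptions: the name `rewardValue` and its parameters are hypothetical (not from the course code), the reward is 1 for a win of Player I and 0 for a loss, and it is doubled when a position with exactly 4 sticks has been reached.

```c
#include <assert.h>

/* Reward for player 1 under perfect play on the sticks game.
   sticks: sticks left; player1ToMove: 1 if player 1 moves, 0 otherwise;
   seen4: 1 if a position with exactly 4 sticks was already reached. */
double rewardValue(int sticks, int player1ToMove, int seen4)
{
    int move;
    double value, best;
    assert(sticks >= 0);
    if (sticks == 4) seen4 = 1;
    if (sticks == 0)
    {
        /* the previous player removed the last stick and lost */
        double r = player1ToMove ? 1.0 : 0.0;
        return seen4 ? 2.0 * r : r;
    }
    /* start outside the reward range [0,2] */
    best = player1ToMove ? -1.0 : 3.0;
    for (move = 1; move <= 2 && move <= sticks; move++)
    {
        value = rewardValue(sticks - move, !player1ToMove, seen4);
        /* player 1 maximises (blue nodes), player 2 minimises (yellow nodes) */
        if (player1ToMove ? (value > best) : (value < best)) best = value;
    }
    return best;
}
```

Calling `rewardValue(5, 1, 0)` evaluates the root of the tree, and `rewardValue(3, 0, 0)` the right-hand child where Player II moves with 3 sticks.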

ZERMELO: C CODE
#include <float.h> // for DBL_MAX
struct gameState
{
    int *descriptionOfState;
    int numberOfLegalMoves;
    int *legalMoves;
    int turn; // 1 if player 1 plays, -1 otherwise
    int result; // final reward, if numberOfLegalMoves=0
};
// RULES of the game: state reached from s after playing move
struct gameState next(struct gameState s,int move);
double zermeloValue(struct gameState s)
{
    int i;double value;
    double maxValue=-DBL_MAX;
    // end of game (s.result is assumed to be expressed for the player to move)
    if (s.numberOfLegalMoves==0) return(s.turn * s.result);
    for (i=0;i<s.numberOfLegalMoves;i++)
    {
        // value of the child, seen from the player to move
        value=s.turn*zermeloValue(next(s,s.legalMoves[i]));
        if (value>maxValue) maxValue=value;
    }
    return s.turn*maxValue; //we return value for player 1
}
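
As a rough illustration, the same recursion can be specialised to the sticks game. This is a minimal sketch, not the course's reference code: the function name `sticksValue` and the compact encoding (a state is just the number of sticks left plus whose turn it is) are assumptions made for brevity; +1 means that player 1 wins and -1 that player 1 loses.

```c
#include <assert.h>
#include <float.h>

/* Value of the sticks game for player 1 under perfect play.
   sticks: number of sticks left; turn: 1 if player 1 moves, -1 otherwise.
   The player who removes the last stick loses. */
double sticksValue(int sticks, int turn)
{
    int move;
    double value, maxValue = -DBL_MAX;
    assert(sticks >= 0 && (turn == 1 || turn == -1));
    /* No stick left: the previous player took the last one and lost,
       so the player to move has won. */
    if (sticks == 0) return turn;
    for (move = 1; move <= 2 && move <= sticks; move++)
    {
        /* value of the child, seen from the player to move */
        value = turn * sticksValue(sticks - move, -turn);
        if (value > maxValue) maxValue = value;
    }
    return turn * maxValue;   /* back to player 1's point of view */
}
```

With this encoding, `sticksValue(5, 1)` returns 1 (player 1 wins with 5 sticks) and `sticksValue(4, 1)` returns -1, in agreement with the tree shown earlier.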

Last week: Zermelo algorithm.
What is Zermelo ?
= Simplest algorithm for solving 1Player
or 2Player games.
= Recursive algorithm
= Conveniently (but slowly) implemented with “struct”

This week
= a bit more on Zermelo algorithm
= C development: “static” random variables

Future weeks
Still some C implementation (or other languages ? as you wish)
Still some (not always easy) algorithms
Models of applications
I hope I can convince you that
operational research / artificial intelligence
are useful and fun.

Zermelo again.
What does the “zermeloValue()” function return ?

===> The reward in case of perfect play.
===> A perfect strategy.

===> Gods can run Zermelo algorithms: perfect play.
==> humans have no time for this.
==> Can we design a new version in case
it is too slow ?

Let's see a pseudo-code, instead of a code.

zermeloValue(s)
{
if (s is end of game) then return score.
else
{
if (player 1 plays) then
return max(zermeloValue(children))
else
return min(zermeloValue(children))
}
}

ZERMELO: A NATURAL CONCEPT,
THE DEPTH.

(number = sticks left, depth in parentheses)

5 (0)
  4 (1): WIN!
    3 (2): WIN!
      1 (3): WIN!
      2 (3): LOSS!
    2 (2): WIN!
  3 (1): LOSS!
    2 (2): WIN!
    1 (2): LOSS!

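ZERMELO: C CODE FOR THE DEPTH

double zermeloValue(struct gameState s)
{
    static int depth=0; // initialized once, shared by all recursive calls
    int i;double value;
    /* ... other declarations and end-of-game test as before ... */
    depth++; // we go one level down
    for (i=0;i<s.numberOfLegalMoves;i++)
    {
        /* ... same loop body as before ... */
    }
    depth--; // we come back to the caller's level
    return s.turn*maxValue;
}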
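
To see the “static” trick in isolation, here is a small sketch that only explores the tree of the sticks game and records how deep it goes. The names `exploreSticks`, `visitedNodes` and `maxDepthSeen` are hypothetical and chosen for this illustration only.

```c
#include <assert.h>

static int maxDepthSeen = 0;   /* deepest level reached so far */
static int visitedNodes = 0;   /* number of calls, i.e. of tree nodes */

/* Explore the whole tree of the sticks game,
   tracking the current depth with a static local variable. */
void exploreSticks(int sticks)
{
    static int depth = 0;      /* initialised once, kept across calls */
    int move;
    assert(sticks >= 0);
    visitedNodes++;
    if (depth > maxDepthSeen) maxDepthSeen = depth;
    depth++;                   /* going one level down */
    for (move = 1; move <= 2 && move <= sticks; move++)
        exploreSticks(sticks - move);
    depth--;                   /* back to the caller's level */
}
```

After `exploreSticks(5)`, the counters hold the size and the depth of the full 5-sticks tree (20 nodes, maximum depth 5). Note that a static counter makes the function non-reentrant, which is fine for a single-threaded exercise but would need care elsewhere.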

Sometimes it is too slow.
Then, what can I do ?

We will not go below this depth.
But, what should zermeloValue return ?

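Should we return a random number ?

double zermeloValue(struct gameState s)
{
    static int depth=0;
    int i;double value;
    /* ... other declarations and end-of-game test as before ... */
    if (depth>5) return drand48(); // drand48() needs <stdlib.h>
    depth++;
    for (i=0;i<s.numberOfLegalMoves;i++)
    {
        /* ... same loop body as before ... */
    }
    depth--;
    return s.turn*maxValue;
}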

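Better: heuristicValue, a function written by some expert of the game.

double zermeloValue(struct gameState s)
{
    static int depth=0;
    int i;double value;
    /* ... other declarations and end-of-game test as before ... */
    if (depth>5) return heuristicValue(s);
    depth++;
    for (i=0;i<s.numberOfLegalMoves;i++)
    {
        /* ... value computed as before ... */
        if (value>maxValue) maxValue=value;
    }
    depth--;
    return s.turn*maxValue;
}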
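
A runnable version of this depth-limited search, again on the sticks game, might look as follows. It is a sketch under assumptions: `limitedValue`, `MAX_DEPTH` and the two-argument `heuristicValue` are hypothetical names, and the heuristic simply answers 0 ("I don't know"), where a real one would encode expert knowledge.

```c
#include <assert.h>
#include <float.h>

#define MAX_DEPTH 5

/* Hypothetical heuristic: no knowledge at all, so a neutral value of 0.
   A real heuristic would be written by an expert of the game. */
static double heuristicValue(int sticks, int turn)
{
    (void)sticks; (void)turn;
    return 0.0;
}

/* Depth-limited Zermelo (minimax) on the sticks game, value for player 1. */
double limitedValue(int sticks, int turn)
{
    static int depth = 0;
    int move;
    double value, maxValue = -DBL_MAX;
    assert(sticks >= 0 && (turn == 1 || turn == -1));
    if (sticks == 0) return turn;                 /* real end of game */
    if (depth > MAX_DEPTH) return heuristicValue(sticks, turn);
    depth++;
    for (move = 1; move <= 2 && move <= sticks; move++)
    {
        value = turn * limitedValue(sticks - move, -turn);
        if (value > maxValue) maxValue = value;
    }
    depth--;
    return turn * maxValue;
}
```

With 5 sticks the limit is never hit, so the result is the exact value. With 30 sticks every branch is cut before the end of the game, so the result is only as good as the heuristic.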

SHANNON and games
This idea is a main contribution
by Shannon (for European chess).

Shannon 1916-2001
Alfred Noble Prize (not Nobel!)

Works in:
- Logic
- Games (also: artificial
mouse for mazes)
- Financial analysis

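#include <assert.h>
#include <string.h>
double heuristicValue(struct gameState s)
{
    // gameName: global string, assumed to be defined elsewhere
    if (!strcmp(gameName,"chineseChess"))
    {
        /* weighted material balance (Black minus Red) */
        return
        0.1*(nbOfBlackElephants(s) - nbOfRedElephants(s) )
        +0.1*(nbOfBlackGuards(s) - nbOfRedGuards(s) )
        +0.03*(nbOfBlackPieces(s) - nbOfRedPieces(s) )
        +0.01*(nbOfBlackPawns(s) - nbOfRedPawns(s) );
    }
    else
    { assert(0); } // no heuristic written for other games
    return 0.0; // not reached
}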

Zermelo's algorithm is too slow.
MINIMAX: an approximation of Zermelo's algo.
[Figure: example of a minimax tree, thanks to Wikipedia]

ALPHA-BETA

PRINCIPLE OF ALPHA-BETA:
In zermeloValue, considering an opponent node, if I know:

- THAT AT PREVIOUS DEPTH,
I CAN REACH SCORE ALPHA=6,

- THAT IN CURRENT STATE
MY OPPONENT CAN ENSURE SCORE BETA<6,

THEN I CAN STOP STUDYING THIS BRANCH.

==> THIS IS AN “ALPHA-CUTOFF”
==> OTHER PLAYER:
“BETA-CUTOFF” (just exchange players)
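
The cutoff rule can be sketched in C on the sticks game, next to a plain minimax that counts the nodes it visits. Everything here is illustrative: `plainValue`, `alphaBetaValue` and the two counters are hypothetical names, values are +1 (player 1 wins) or -1, and the initial window (-2, 2) is just any interval containing all possible values.

```c
#include <assert.h>

static long nodesPlain = 0, nodesAlphaBeta = 0;

/* Plain minimax: value for player 1 (+1 win, -1 loss). */
int plainValue(int sticks, int player1ToMove)
{
    int move, v, best;
    assert(sticks >= 0);
    nodesPlain++;
    if (sticks == 0) return player1ToMove ? 1 : -1;
    best = player1ToMove ? -2 : 2;
    for (move = 1; move <= 2 && move <= sticks; move++)
    {
        v = plainValue(sticks - move, !player1ToMove);
        if (player1ToMove ? (v > best) : (v < best)) best = v;
    }
    return best;
}

/* Alpha-beta: alpha = score player 1 can already guarantee,
   beta = score player 2 can already guarantee. */
int alphaBetaValue(int sticks, int player1ToMove, int alpha, int beta)
{
    int move, v;
    assert(sticks >= 0);
    nodesAlphaBeta++;
    if (sticks == 0) return player1ToMove ? 1 : -1;
    for (move = 1; move <= 2 && move <= sticks; move++)
    {
        v = alphaBetaValue(sticks - move, !player1ToMove, alpha, beta);
        if (player1ToMove) { if (v > alpha) alpha = v; }
        else               { if (v < beta)  beta  = v; }
        if (alpha >= beta) break;   /* cutoff: remaining moves cannot matter */
    }
    return player1ToMove ? alpha : beta;
}
```

Starting both searches from the same position, for instance `plainValue(10, 1)` and `alphaBetaValue(10, 1, -2, 2)`, gives the same value, and comparing `nodesPlain` with `nodesAlphaBeta` shows how many nodes the cutoffs saved.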

EXAMPLE OF GAME (we can
discuss why it is a good game)

- Randomly generate a 4x4 matrix with 0 and 1 (K=4).
0011
1001
0111
1000
- Player one removes top part or bottom part
0111
1000
- Player two removes left part or right part
01
10
- Player one removes top part or bottom part
01
- Player two removes left part or right part
0 ==> Player one wins if 1, player two wins if 0!
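
This game is small enough to be solved exactly by Zermelo's recursion. The sketch below is one possible encoding, with hypothetical names (`halvingValue`, `exampleMatrix`): a position is a sub-matrix given by row and column ranges, player one maximises over the two row halves and player two minimises over the two column halves. `exampleMatrix` is the 4x4 matrix of the example above.

```c
#include <assert.h>

#define K 4

/* The matrix used in the example above. */
int exampleMatrix[K][K] = {
    {0, 0, 1, 1},
    {1, 0, 0, 1},
    {0, 1, 1, 1},
    {1, 0, 0, 0}
};

/* Value (1: player one wins, 0: player two wins) of the sub-matrix
   rows [r0,r1) x columns [c0,c1), with playerOne = 1 if player one moves. */
int halvingValue(int m[K][K], int r0, int r1, int c0, int c1, int playerOne)
{
    int a, b, mid;
    if (r1 - r0 == 1 && c1 - c0 == 1) return m[r0][c0];
    if (playerOne)
    {
        assert(r1 - r0 > 1);
        mid = (r0 + r1) / 2;
        a = halvingValue(m, r0, mid, c0, c1, 0);   /* keep the top part */
        b = halvingValue(m, mid, r1, c0, c1, 0);   /* keep the bottom part */
        return a > b ? a : b;                      /* player one wants a 1 */
    }
    assert(c1 - c0 > 1);
    mid = (c0 + c1) / 2;
    a = halvingValue(m, r0, r1, c0, mid, 1);       /* keep the left part */
    b = halvingValue(m, r0, r1, mid, c1, 1);       /* keep the right part */
    return a < b ? a : b;                          /* player two wants a 0 */
}
```

`halvingValue(exampleMatrix, 0, K, 0, K, 1)` evaluates the full game with player one to move; on this particular matrix it returns 0, i.e. player two wins under perfect play.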

POSSIBLE HOMEWORK
1) ZERMELO: can you implement it on a simple game ?

2) MINIMAX: can you add a heuristic function ?
Which heuristic function ?
Experiments: plot a graph:
X(depth) = computation time of minimax
(divided by Zermelo's computation time)
Y(depth) = win rate against Zermelo

3) ALPHA-BETA
Can you modify it ==> alpha-beta pruning ?
Plot a graph for various sizes:
X = number of visited nodes
Y = average winning rate of alpha-beta vs minimax
Or
X = depth
Y = average winning rate of a-b vs a-b with depth -1

APPLICATION OF ZERMELO

WE HAVE SEEN THE 5-STICKS GAME.
CAN WE FIND A REALLY USEFUL APPLICATION ?



I have:
- water
- plants (which need water during summer's
heat wave)

Actions = giving water to plants, or not.
Each day, I choose an action.
State = { date + water level in stock
+ water level in plants }

Reward = quality / quantity of production.

Zermelo ==> optimal sequence of
actions ==> optimal stock level.

IMPORTANT REMARK:

- Maybe this does not look serious.
- But heat waves are a serious problem.
- Here the problem is simplified, but the concepts
for the real application are the same.
- Applying this just requires a computer and
data/models about plants/water resources.
==> if you can apply Zermelo variants
correctly, you can help build a better world.

However, the “nextState” function is
randomized ==> we need a Zermelo for this
case.

This is Zermelo, adapted to stochastic games.
s.turn == 0: action is randomly chosen.
References:
- Massé
- Bellman

double zermeloValue(struct gameState s)
{
    int i;double value; static int depth=0;
    /* ... other declarations and end-of-game test as before ... */
    if (s.turn==0)
    { // random node: average over the (equally likely) outcomes
        value=0;
        for (i=0;i<s.numberOfLegalMoves;i++)
            value+=zermeloValue(next(s,s.legalMoves[i]));
        return value/s.numberOfLegalMoves;
    }
    if (depth>5) return heuristicValue(s);
    depth++;
    for (i=0;i<s.numberOfLegalMoves;i++)
    { value=s.turn*zermeloValue(next(s,s.legalMoves[i]));
    if (value>maxValue) maxValue=value; }
    depth--;
    return s.turn*maxValue;
}
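
To make the stochastic case concrete, here is a toy version of the watering problem. The model is invented for illustration only (it is not real agronomic data, and the names `chanceValue` and `decisionValue` are hypothetical): each day is hot or mild with probability 1/2; a mild day yields 1 unit of production; a hot day yields 1 only if one unit of water is spent, and 0 otherwise. Random nodes average over outcomes, decision nodes take the maximum.

```c
#include <assert.h>

double decisionValue(int daysLeft, int water, int hot);

/* Random node (the s.turn==0 case): the weather of the day is drawn,
   so we average over the two equally likely outcomes. */
double chanceValue(int daysLeft, int water)
{
    assert(daysLeft >= 0 && water >= 0);
    if (daysLeft == 0) return 0.0;              /* end of the season */
    return 0.5 * decisionValue(daysLeft, water, 0)
         + 0.5 * decisionValue(daysLeft, water, 1);
}

/* Decision node: best of "do not water" and "water" (if any water is left). */
double decisionValue(int daysLeft, int water, int hot)
{
    double keep, give;
    /* not watering: a mild day still yields 1, a hot day yields 0 */
    keep = (hot ? 0.0 : 1.0) + chanceValue(daysLeft - 1, water);
    if (water == 0) return keep;
    /* watering costs one unit and guarantees a yield of 1 today */
    give = 1.0 + chanceValue(daysLeft - 1, water - 1);
    return give > keep ? give : keep;
}
```

`chanceValue(3, 1)` is then the best expected production over 3 days with 1 unit of water in stock (2.375 in this toy model, against 1.5 with no water at all). The optimal policy keeps the water on mild days and spends it on the first hot day.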

ONE MORE TOOL: MATRIX GAMES

The problem:

Solving Matrix Games.

A solution:

EXP3.

What is a (0-sum) Matrix Game ?

Example:

    1 0 0
M = 0 1 1
    1 0 1

- You choose (privately) a row (i is 1, 2 or 3).
- At the same time, I choose (privately) a column (j is 1, 2 or 3).
- My reward: M(i,j)
- Your reward: -M(i,j)

I want a 1, you want a 0.
Given M, how should I play ?

What is a (0-sum) Matrix Game ?

Example: rock-paper-scissors

              Rock  Paper  Scissors
    Rock        0    -1       1
M = Paper       1     0      -1
    Scissors   -1     1       0

- You choose (privately) a row (i is 1, 2 or 3).
- At the same time, I choose (privately) a column (j is 1, 2 or 3).
- My reward: M(i,j)
- Your reward: -M(i,j)

I want a 1, you want a -1.


Nash (diagnosed with paranoid schizophrenia)
got a Nobel prize for his work around that.

Principle of a Nash equilibrium:
- pure strategy = “fixed” strategy
(e.g. “play scissors”)
- mixed strategy = randomized strategy
(e.g. “play scissors with probability ½
and play rock with probability ½”)
- choose the mixed strategy such that

“The worst possible score against
any opponent strategy is maximum”

==> “Nash” strategy
==> EXP3: algorithm for finding Nash strategies.
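
The “worst possible score” of a mixed strategy is easy to compute, because it is enough to check the pure replies of the opponent (the expected score is linear in the opponent's probabilities). The sketch below does this for rock-paper-scissors. The names `worstCase`, `rps`, `uniformStrategy` and `alwaysRock` are hypothetical, and the matrix entries are read as the payoff of the row player whose strategy is evaluated (an assumption of this sketch).

```c
#include <assert.h>

#define K 3

/* Rock-paper-scissors, rows and columns in the order rock, paper, scissors;
   entry = payoff of the row player (assumption of this sketch). */
const double rps[K][K] = {
    { 0, -1,  1},
    { 1,  0, -1},
    {-1,  1,  0}
};

const double uniformStrategy[K] = {1.0 / 3, 1.0 / 3, 1.0 / 3};
const double alwaysRock[K]      = {1.0, 0.0, 0.0};

/* Worst expected payoff of the mixed strategy p over all pure replies j. */
double worstCase(const double m[K][K], const double p[K])
{
    int i, j;
    double worst = 0.0, expected;
    for (j = 0; j < K; j++)
    {
        expected = 0.0;
        for (i = 0; i < K; i++) expected += p[i] * m[i][j];
        if (j == 0 || expected < worst) worst = expected;
    }
    return worst;
}
```

Here `worstCase(rps, alwaysRock)` is -1 (the opponent plays paper), while `worstCase(rps, uniformStrategy)` is 0: playing each move with probability 1/3 maximises the worst case, which is the Nash strategy of this game.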

IMPORTANT FACTS ON GAMES:

- Turn-based, full-information games,
solvers exist:
- Too slow for chess, Go.
- Ok for 8x8 checkers.
==> Zermelo
==> variants: Minimax, Alpha-beta, play
many games reasonably well

- Matrix games:
- Nash strategies = worst-case optimal
- Nash strategies = randomized strategies

A BETTER EXAMPLE ? POKEMON.

Each player chooses 2 pokemons among
the 3 possible ones (real life: 3 or 4
among hundreds).


Three possibilities (the same as choosing
a row in a 3x3 matrix game).
For each pair of choices, check who wins (by some
full-observation game-solver).

Winner (rows: Player 1's choice, columns: Player 2's choice):

P1 P2 P2
P2 P1 P1
P1 P2 P1

The same table as a matrix (1 if Player 1 wins, 0 otherwise):

1 0 0
0 1 1
1 0 1

EXP3 principle for Nash equilibrium of KxK matrix M:
- choose a number N of iterations
- S1=null vector
- S2=null vector
- at each iteration t=1, ..., t=N:
{
- compute p1 as a function of S1 // we will see how
- compute p2 as a function of S2 // we will see how
- randomly draw i according to probability distribution p1
- randomly draw j according to probability distribution p2
- define r=M(i,j) in the matrix

- S1(i)+= r / p1(i)
- S2(j)+=(1-r) / p2(j)

- Player1Nash(i)+= (1/N);
- Player2Nash(j)+= (1/N);
}

Q&A: (my questions, and also yours)

Q: Who cares about matrix games ?
A: Useful for many things. Unfortunately, it's usually
a building block inside more complex algorithms.
We will see examples, but later.

Q: Is a Nash strategy optimal ?
A: It depends on the goal... It is optimal in a worst-case sense
(i.e. against a very strong opponent).
Not necessarily very good against a weak opponent.
