Hard optimization
(multimodal, adversarial, noisy)

http://www.lri.fr/~teytaud/hardopt.odp
http://www.lri.fr/~teytaud/hardopt.pdf
(or Quentin's web page)


     Acknowledgments: koalas, dinosaurs, mathematicians.

Olivier Teytaud      olivier.teytaud@inria.fr
Knowledge also from A. Auger, M. Schoenauer.
The next slide is the most important of all.




Olivier Teytaud
Inria Tao, visiting the beautiful city of Liège

In case of trouble, interrupt me.

Further discussion needed:
   - R82A, Montefiore institute
   - olivier.teytaud@inria.fr
   - or after the lessons (the 25th, not the 18th)
Rather than complex algorithms,
we'll see some concepts and some simple
building blocks for defining algorithms:

Warm starts, coevolution, niching, clearing,
multimodal optimization, restarts (sequential / parallel),
koalas and eucalyptus, dinosaurs (and other mass extinctions),
Fictitious Play, EXP3, robust optimization, noisy optimization,
Nash equilibria, surrogate models, maximum uncertainty,
Monte-Carlo estimates, quasi-Monte-Carlo, Van Der Corput,
Halton, scrambling, EGO.
I. Multimodal optimization
II. Adversarial / robust cases
III. Noisy cases
e.g. Schwefel's function
(figure: thanks to Irafm)
What is multimodal optimization ?


- finding one global optimum ?
   (not getting stuck in a local minimum)

- finding all local minima ?
- finding all global minima ?
Classical methodologies:

1. Restarts

2. "Real" diversity mechanisms
Restarts:


While (time left > 0)
{
    run your favorite algorithm
}


(requires a halting criterion + randomness – why ?)
Parallel restarts


Concurrently, p times:
{
    run your favorite algorithm
}
    ==> still needs randomization
        (at least in the initialization)
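As a sketch (not from the slides): a randomized-restart wrapper around a hypothetical local optimizer, here a tiny (1+1)-style search. All names, step-size rules and budgets are illustrative choices.

```python
import random

random.seed(0)

def local_search(x0, sigma0, f, budget=200):
    """A hypothetical local optimizer: a tiny (1+1)-style search with
    adaptive step-size; stops when the step-size collapses (halting criterion)."""
    x, sigma, fx = x0, sigma0, f(x0)
    for _ in range(budget):
        y = [xi + sigma * random.gauss(0, 1) for xi in x]
        fy = f(y)
        if fy < fx:
            x, fx = y, fy
            sigma *= 1.5   # success: widen the search
        else:
            sigma *= 0.85  # failure: shrink
        if sigma < 1e-12:
            break          # halting criterion
    return x, fx

def restarts(f, dim, n_restarts=20):
    """Sequential restarts: each run starts from a fresh random point,
    which is the randomization the slides insist on."""
    best = None
    for _ in range(n_restarts):
        x0 = [random.uniform(-5, 5) for _ in range(dim)]
        x, fx = local_search(x0, 1.0, f)
        if best is None or fx < best[1]:
            best = (x, fx)
    return best

x, fx = restarts(lambda x: sum(xi * xi for xi in x), dim=2)
```

The parallel variant runs the same loop body p times concurrently instead of sequentially; the randomized initialization is what keeps the p runs from being identical.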
Random restarts   Quasi-random restarts
Let's open a parenthesis...




              (
The Van Der Corput sequence:
quasi-random numbers in dimension 1.

How should I write the nth point ?
Write n in base p, then mirror its digits across the
decimal point (p = integer, parameter of the algorithm).
A (simple) Scrambled
   Van Der Corput sequence:
Halton sequence:

= generalization to dimension d


Choose p1,...,pd




All pi's must be different (why ?).


Typically, the first prime numbers (why ?).
Scrambled Halton sequence:
= generalization to dimension d with scrambling


Choose p1,...,pd




All pi's must be different (why ?).


Typically, the first prime numbers (why ?).
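The construction above can be sketched in a few lines (a minimal, unscrambled version; the default primes are illustrative):

```python
def van_der_corput(n, p=2):
    """n-th Van Der Corput point in base p: write n in base p and
    mirror its digits across the decimal point (radical inverse)."""
    q, denom = 0.0, 1.0
    while n:
        n, digit = divmod(n, p)
        denom *= p
        q += digit / denom
    return q

def halton(n, primes=(2, 3, 5)):
    """d-dimensional Halton point: one Van Der Corput sequence per
    coordinate, with pairwise distinct bases (here the first primes)."""
    return tuple(van_der_corput(n, p) for p in primes)
```

If two coordinates shared the same base p, they would be perfectly correlated; that is why all the p_i's must be different, and small (prime) bases fill [0,1] fastest.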
Let's close the parenthesis...




(thank you Asterix)
                      )
Consider an ES (an evolution strategy = an
    evolutionary algorithm in continuous domains).

The restarts are parametrized by
   an initial point x and an initial step-size σ.

Which hypothesis on x and σ do we need, so that the
   restarts of the ES will find all local optima ?
X*(f) = set of optima




If I start close enough to an optimum, with a
     small enough step-size, I find it.
Restart (RS) algorithm:
What do you suggest ?
Accumulation
of a sequence
THEOREM:




     ( (1) ==> (2); do you see why ?)
Ok, step-sizes should accumulate around 0,
and the x_i's should be dense.

x(i) = uniform random
and
sigma(i) = exp( Gaussian )
are ok.

Or something decreasing; but not a constant. (Why ?)


Should I do something more sophisticated ?
I want a small number of restarts.

A criterion: dispersion (some maths
       should be developed around why...).

For random points vs quasi-random points:
==> small difference
==> the key point is more the decreasing (or small) step-size
Classical methodologies:

1. Restarts

2. "Real" diversity mechanisms
~ 7x10^7 years ago




                       Highly optimized.
                    Hundreds of millions of
                        years on earth.

~ 7x10^7 years ago

       Died!
   Not adapted to
the new environment.

~ 7x10^7 years ago



                    Survived thanks
                      to niching.
Eating eucalyptus is dangerous for
health (for many animals).
But not for koalas.

Also, they sleep most of the day.
They're unusual animals.



==> This is an example of niching.
==> Niching is good for diversity.
USUAL SCHEME: WINNERS KILL LOSERS


      POPULATION




     GOOD    BAD


            GOOD's
     GOOD
            BABIES
NEW SCHEME: WINNERS KILL LOSERS,
                              ONLY IF THEY
      POPULATION              LOOK LIKE THEM


     GOOD    BAD

                              Don't kill koalas;
            GOOD's            their food is
     GOOD                     useless to you.
            BABIES            They can be useful in a
                              future world.
CLEARING ALGORITHM


Input: population              Output: smaller population


1) Sort the individuals (the best first)
2) Loop on individuals; for each of them (if alive),
    kill the individuals which are
            (i) worse
            (ii) at distance < some constant (the clearing radius)

             (...but the first murders are sometimes canceled:
              a niche may keep more than one winner)

3) Keep at most mu individuals (the best)
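A minimal 1-D sketch of the clearing step above (minimization; `radius` is the clearing radius and `kappa` the per-niche capacity, i.e. how many "murders are canceled" per niche; both names are illustrative):

```python
def clearing(population, fitness, radius, kappa=1, mu=None):
    """Clearing: sort best-first, then each accepted winner 'clears'
    (kills) worse individuals closer than `radius`; a niche keeps at
    most `kappa` winners; finally keep at most `mu` individuals."""
    pop = sorted(population, key=fitness)          # best (lowest) first
    alive = []
    for x in pop:
        # count already-accepted winners within the clearing radius
        neighbors = sum(1 for y in alive if abs(x - y) < radius)
        if neighbors < kappa:
            alive.append(x)                        # niche not full: survives
    if mu is not None:
        alive = alive[:mu]                         # already sorted best-first
    return alive
```

On a bimodal-looking population this keeps one representative per niche instead of letting the global winner kill everything, which is exactly the koala argument.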
I. Multimodal optimization
II. Adversarial / robust cases
  a) What does it mean ?
  b) Nash's stuff: computation
  c) Robustness stuff: computation
III. Noisy cases
What is adversarial / robust optimization ?


I want to find argmin f.
But I don't know exactly f:
I just know that f is in a family of functions
parameterized by θ.
I can just compute f(x,θ), for a given θ.


Criteria:
      Argmin E f(x,θ).               (average case; requires a
         x   θ                        probability distribution)
                                      <== we'll see this case later.
Or
      Argmin sup f(x,θ).             <== this is adversarial / robust.
         x    θ                          Let's see this.
==> looks like a game; two opponents choosing their strategies.




                              VS


           x                                           y (= θ)
                    Can he try multiple times ?
                 Does he choose his strategy first ?
Let's have a high-level look at all this stuff.

We have seen two cases:

inf sup f(x,y)           (best power plant for worst earthquake ?)
 x   y
                         A bit pessimistic: this is the performance
                         if y is chosen by an adversary who
                         - knows us (or can do many trials)
                         - and has infinite computational resources.
                         Good for nuclear power plants.
                         Not good for economy, games...

inf E f(x,y)             (best power plant for average earthquake)
 x  y

Does it really cover all possible models ?
Decision = positioning my distribution points.
Reward = size of Voronoi cells.


Simple model:
 each company makes its positioning
 in the morning, for the whole day.


But choosing among strategies
       (i.e. decision rules, programs)
         leads to the same maths.




           Newspaper distribution,
           or Quick vs MacDonalds
          (if customers ==> nearest)
                             Ok for power plants and earthquakes.
Inf sup f(x,y)               But for economic choices ?
 x   y

Replaced by sup inf f(x,y)      ?
             y   x

Means that we play second.


==> let's see “real” uncertainty techniques.
inf sup f(x,y) is not equal to    sup inf f(x,y)
 x   y                             y   x
...but with finite domains:
inf  sup  E f(x,y) is equal to    sup  inf  E f(x,y)
L(x) L(y)                         L(y) L(x)


                                         Von
                                     Neumann
inf sup f(x,y) is not equal to    sup inf f(x,y)
 x   y                             y   x


E.g.: rock-paper-scissors


  Inf sup = I win
  Sup inf = I lose
And sup inf sup inf sup inf... ==> loop


But equality in
  chess, draughts, … (turn-based, full-information)
inf sup f(x,y) is not equal to    sup inf f(x,y)
 x   y                             y   x
...but with finite domains:
inf  sup  E f(x,y) is equal to    sup  inf  E f(x,y)
L(x) L(y)                         L(y) L(x)
      ...and is equal to inf sup  E f(x,y)
                          x  L(y)
==> The opponent chooses a randomized strategy
      without knowing what we choose.
==> Good model for average performance against
      non-informed opponents.
Let's summarize.
Nash: best distribution against worst
   distribution = good model if opponent can't
   know us, if criterion = average perf.
Best against worst: good in many games, and
  good for robust industrial design.
Be creative: inf sup E is often reasonable
      (e.g. sup on opponent, E on climate...)
Nash: typical applications:
  Adversarial bidding (economy).
  Choices on markets.
  Military strategy.
  Games.
In repeated games, you can (try to)
predict your opponent's decisions.
E.g.: Rock Paper Scissors.
Remarks on this “worst-case” analysis:
 - if you're stronger (e.g. computer against humans
   in chess), take into account the human's weaknesses;
 - in Poker, you will earn more money by playing
   against weak opponents.
inf sup f(x,y) is not equal to    sup inf f(x,y)
 x   y                             y   x
...but with finite domains:
inf  sup  E f(x,y) is equal to    sup  inf  E f(x,y)
L(x) L(y)                         L(y) L(x)
      ...and is equal to inf sup  E f(x,y)
                          x  L(y)


                              L(x),L(y) is a
                           Nash equilibrium;
                  L(x), L(y) not necessarily unique.
I. Multimodal optimization
II. Adversarial / robust cases
  a) What does it mean ?
  b) Nash's stuff: computation
   - fictitious play
   - EXP3
   - coevolution

  c) Robustness stuff: computation
III. Noisy cases
FIRST CASE: everything finite (x and y are in finite domains).

==> Fictitious Play (Brown, Robinson)


==> sequence of simulated games
  (x(n),y(n)) defined as follows:
                                                    Simple,
                                                    consistent,
x(n+1) = argmin sum   f(x, y(i))                    but slow.
            x   i<n+1                               (see however
y(n+1) = argmax sum   f(x(i), y)                    Kummer's
            y   i<n+1                               paper)

==> … until n=N; play randomly one of the x(i)'s.
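The two update rules above can be sketched as follows; this is a minimal version for a finite zero-sum matrix game (the payoff matrix and iteration count are illustrative):

```python
def fictitious_play(F, n_iters=2000):
    """Fictitious Play on a finite zero-sum game with payoff matrix F[i][j]
    (row player minimizes, column player maximizes): each player best-responds
    to the opponent's empirical mixture of past moves."""
    n_rows, n_cols = len(F), len(F[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1      # arbitrary first moves
    col_counts[0] += 1
    for _ in range(n_iters):
        # x(n+1) = argmin_x sum_{i<n+1} f(x, y(i))
        x = min(range(n_rows),
                key=lambda i: sum(F[i][j] * col_counts[j] for j in range(n_cols)))
        # y(n+1) = argmax_y sum_{i<n+1} f(x(i), y)
        y = max(range(n_cols),
                key=lambda j: sum(F[i][j] * row_counts[i] for i in range(n_rows)))
        row_counts[x] += 1
        col_counts[y] += 1
    total = n_iters + 1
    return ([c / total for c in row_counts],
            [c / total for c in col_counts])

# Matching pennies: value 0, Nash equilibrium = (1/2, 1/2) for both players.
F = [[1, -1], [-1, 1]]
px, py = fictitious_play(F)
```

The empirical frequencies (not the last moves) are what converge to a Nash equilibrium, which is why the final step plays randomly one of the x(i)'s.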
Much faster: EXP3, or INF (more complicated, see
Audibert).

Complexity of EXP3 within logarithmic terms:
   K / epsilon^2
for reaching precision epsilon.


==> we don't even read all the matrix of
         the f(x,y) ! (of size K^2)
==> provably better than all deterministic
       algorithms
==> requires randomness + RAM
Complexity of EXP3 within logarithmic terms:
    K / epsilon^2
for reaching precision epsilon.


Ok, it's great, but complexity K is still far too big.


For Chess, log10(K) ≈ 43.
For the game of Go, log10(K) ≈ 170.
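A minimal EXP3 sketch for the adversarial bandit setting (one player, K arms): only the chosen arm's reward is observed, which is why the whole matrix is never read. The reward function, horizon and gamma below are illustrative.

```python
import math, random

def exp3(rewards, K, T, gamma=0.1, rng=random):
    """EXP3: exponential weights with importance-weighted reward estimates.
    `rewards(t, arm)` returns a reward in [0, 1]; only the pulled arm
    is observed at each round."""
    w = [1.0] * K
    for t in range(T):
        total = sum(w)
        # mix the weights with uniform exploration
        probs = [(1 - gamma) * wi / total + gamma / K for wi in w]
        arm = rng.choices(range(K), weights=probs)[0]
        r = rewards(t, arm)
        # dividing by the pull probability keeps the estimate unbiased
        w[arm] *= math.exp(gamma * (r / probs[arm]) / K)
    total = sum(w)
    return [wi / total for wi in w]

random.seed(0)
# arm 1 always pays 1, arm 0 pays 0: the weights should concentrate on arm 1
p = exp3(lambda t, arm: float(arm == 1), K=2, T=500)
```

Randomness is essential (the arm is sampled, not chosen deterministically), matching the "requires randomness + RAM" remark.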
Coevolution for K huge:


       Population 1   Population 2

          (play games)


                               Fitness = average
+ babies    Pop. 1    Pop. 2   fitness against
                                 opponents
RED QUEEN EFFECT in finite population:
(FP keeps all the memory, not coevolution)
 1.   Rock     vs    Scissors
 2.   Rock     vs    Paper
 3.   Scissors vs    Paper
 4.   Scissors vs    Rock
 5.   Paper    vs    Rock
 6.   Paper    vs    Scissors
 7.   Rock     vs    Scissors   = same as game 1
RED QUEEN EFFECT:


 1.   Rock     vs    Scissors
 2.   Rock     vs    Paper
              ....
 6.   Paper    vs    Scissors
 7.   Rock     vs    Scissors   = same as game 1 ==> loop


==> Population with archive (e.g.
        the best of each past iteration – looks like FP)
Reasonably good in very hard cases,
but plenty of more specialized algorithms for
various cases:
     - alpha-beta
     - (fitted) Q-learning
     - TD(lambda)
     - MCTS
     - ...
I. Multimodal optimization
II. Adversarial / robust cases
  a) What does it mean ?
  b) Nash's stuff: computation
  c) Robustness stuff: computation
III. Noisy cases
c) Robustness stuff: computation


==> we look for argmin sup f(x,y)
                    x    y
==> classical optimization for x
==> classical optimization for y with
      restart
Naive solution for robust optimization



Procedure Fitness(x)

  y = maximize f(x,.)
  Return f(x,y)


Minimize fitness
Warm start for robust optimization
  ==> much faster

Procedure Fitness(x)

  Static y = y0;
  y = maximize f(x,.), warm-started at y
  Return f(x,y)


Minimize fitness
Population-based warm start for
robust optimization

Procedure Fitness(x)

  Static P = P0;
  P = multimodal-maximize f(x,P)
  Return max f(x,y)
          y in P


Minimize fitness
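The single-point `Static y` warm start can be sketched in code; the inner maximizer here is a crude Gaussian hill-climber, and all names, step sizes and the test function are illustrative:

```python
import random

def make_robust_fitness(f, y0, inner_steps=100, sigma=0.1, rng=random):
    """Robust fitness with a warm start: the inner maximization over y
    restarts from the best y found during the previous call, instead of
    restarting from scratch for each new x."""
    state = {"y": y0}                      # plays the role of 'Static y'
    def fitness(x):
        y = state["y"]
        fy = f(x, y)
        for _ in range(inner_steps):       # crude stochastic hill-climbing in y
            y2 = y + sigma * rng.gauss(0, 1)
            fy2 = f(x, y2)
            if fy2 > fy:
                y, fy = y2, fy2
        state["y"] = y                     # warm start for the next x
        return fy                          # approximates sup_y f(x, y)
    return fitness

random.seed(1)
# for f(x,y) = -(x - y)^2 the inner sup is 0, attained at y = x
fit = make_robust_fitness(lambda x, y: -(x - y) ** 2, y0=0.0)
```

When the outer optimizer moves x only slightly between calls, the stored y is already near the new inner maximum, which is exactly why the warm start is much faster than the naive version.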
Population-based warm start for
robust optimization

Example: f(x,y) = sin( y ) + cos( (x+y) / pi ),   x,y in [-10,10]

==> global maxima (in y) move quickly with x (==> looks hard)
==> but local maxima (in y) move slowly as a function of x
==> warm start very efficient

==> interestingly, we just rediscovered coevolution :-)
Plots of f(x,.) for various values of x.
Coevolution for Nash:


         Population 1      Population 2


Fitness = average          Fitness = average
fitness against            fitness against
Pop. 2 opponents           Pop. 1 opponents
Coevolution for inf sup:


         Population 1      Population 2


Fitness = worst            Fitness = average
fitness against            fitness against
Pop. 2 opponents           Pop. 1 opponents
I. Multimodal optimization
II. Adversarial / robust cases
III. Noisy cases
Arg min E f(x,y)
   x    y


1) Monte-Carlo estimates
2) Races: choosing the precision
3) Using Machine Learning
Monte-Carlo estimates

E f(x,y) = 1/n sum f(x, y_i) + error
y               i

Error scaling as 1/sqrt( n )


==> how to be faster ?
==> how to choose n ? (later)
Faster Monte-Carlo evaluation:


- more points in critical areas (prop. to
   standard deviation), (+rescaling!)
                          ==> importance sampling
- evaluating f-g, where g has known
      expectation and is “close” to f
                          ==> control variables
- evaluating (f + f o h)/2,
     where h is some symmetry
                          ==> antithetic variables
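For instance, the antithetic-variable trick with h(y) = -y (which leaves a centered Gaussian unchanged) can be sketched as follows; the integrand is an illustrative toy:

```python
import random, statistics

def plain_mc(f, n, rng):
    """Plain Monte-Carlo estimate of E f(Y), Y ~ N(0, 1)."""
    return statistics.fmean(f(rng.gauss(0, 1)) for _ in range(n))

def antithetic_mc(f, n, rng):
    """Antithetic variables: average f over the symmetric pair (y, h(y))
    with h(y) = -y; the odd part of f is canceled exactly."""
    return statistics.fmean(
        0.5 * (f(y) + f(-y))
        for y in (rng.gauss(0, 1) for _ in range(n // 2))
    )

# f(y) = 2y + 1 is entirely odd apart from its mean, so the antithetic
# estimator has (essentially) zero variance, while plain MC does not.
est = antithetic_mc(lambda y: 2 * y + 1, 1000, random.Random(0))
```

The same budget of n evaluations of f is spent in both cases; the gain comes purely from pairing the samples.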
And quasi-Monte-Carlo evaluation ?

log(n)^d / n          instead of      std / sqrt(n)


==> much better for n really large (if some smoothness)
==> much worse for d large
Arg min E f(x,y)


1) Monte-Carlo estimates
2) Races: choosing the precision
3) Using Machine Learning
Bernstein's inequality (see Audibert et al), in its
empirical form: with probability at least 1 - delta,

|empirical mean - E X| <= sqrt( 2 V log(3/delta) / n ) + 3 R log(3/delta) / n

Where:
             R = range of the Xi's
             V = empirical variance of the Xi's
Comparison-based optimization with noise:
       how to find the best individuals by a race ?


While (I want to go on)
{
  I evaluate once each arm.
  I compute a lower and upper bound for all
       non-discarded individuals
            (by Bernstein's bound).
  I discard individuals which are excluded
             by the bounds.
}
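The race above can be sketched as follows (minimization; the confidence radius follows the empirical-Bernstein shape of the previous slide, with illustrative constants; the two noisy arms are toys):

```python
import math, random

def bernstein_radius(var, n, R, delta=1e-3):
    """Empirical-Bernstein-style confidence radius (R = range of the
    values); the exact constants here are illustrative."""
    log_term = math.log(3 / delta)
    return math.sqrt(2 * var * log_term / n) + 3 * R * log_term / n

def race(arms, R=1.0, max_rounds=10000, rng=random):
    """Evaluate every surviving arm once per round; discard an arm as soon
    as its lower bound exceeds the smallest upper bound (minimization)."""
    stats = [[0, 0.0, 0.0] for _ in arms]      # n, sum, sum of squares
    alive = set(range(len(arms)))
    for _ in range(max_rounds):
        for a in alive:                        # one evaluation per arm
            x = arms[a](rng)
            s = stats[a]
            s[0] += 1; s[1] += x; s[2] += x * x
        bounds = {}
        for a in alive:
            n, sx, sxx = stats[a]
            mean = sx / n
            var = max(sxx / n - mean * mean, 0.0)
            r = bernstein_radius(var, n, R)
            bounds[a] = (mean - r, mean + r)
        best_upper = min(ub for (_, ub) in bounds.values())
        alive = {a for a in alive if bounds[a][0] <= best_upper}
        if len(alive) == 1:
            break
    return alive

rng = random.Random(0)
arms = [lambda r: 0.2 + r.uniform(-0.2, 0.2),   # the better (lower-mean) arm
        lambda r: 0.5 + r.uniform(-0.2, 0.2)]
winner = race(arms, R=1.0, rng=rng)
```

Note that the arm achieving the smallest upper bound can never be discarded, so the race ends with at least one survivor; with equal fitnesses it never ends, which is exactly the trouble addressed on the next slide.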
Trouble:
  What if two individuals have the same fitness value ?


   ==> infinite run !


Tricks for avoiding that at iteration N:
  - limiting number of evals by some function of N
           (e.g. exp(N) if a log-linear convergence is
           expected w.r.t number of iterations)
  - limiting precision by some function of sigma
           (e.g. precision = sqrt(sigma) / log(1/sigma) )
Arg min E f(x,y)


1) Monte-Carlo estimates
2) Races: choosing the precision
3) Using Machine Learning
USING MACHINE LEARNING




Statistical tools: f'(x) = approximation
                             ( x; x1,f(x1), x2,f(x2), … , xn,f(xn))
                y(n+1) = f'( x(n+1) )


        e.g. f' = quadratic function closest to f on the x(i)'s.
and x(n+1) = random         or     x(n+1) = maxUncertainty
  or x(n+1) = argmin f'  or x(n+1) = argmax E max(0, bestSoFar - f(x))
USING MACHINE LEARNING

f' can be: SVM, Kriging, RBF, Gaussian Processes,
or f'(x) = maximum likelihood
on a parametric noisy model.


Statistical tools: f'(x) = parametric approximation
                              ( x; x1,f(x1), x2,f(x2), … , xn,f(xn))
                 y(n+1) = f'( x(n+1) )


 e.g. f' = logistic quadratic function closest to f on the x(i)'s.
and x(n+1) = random          or    x(n+1) = maxUncertaintyRed
  or x(n+1) = argmin f'  or x(n+1) = argmax E max(0, bestSoFar - f(y))
USING MACHINE LEARNING

Very fast if good model
(e.g. quadratic model + noise).

Expensive but efficient:
Efficient Global Optimization (EGO)
in multimodal context.

The criterion argmax E max(0, bestSoFar - f(x))
is also termed “expected improvement”.
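In the simplest spirit of "f' = quadratic function closest to f" with x(n+1) = argmin f': a 1-D, noise-free sketch that interpolates the three best points (rather than doing a least-squares fit or full EGO; all names are illustrative):

```python
def quadratic_vertex(p1, p2, p3):
    """Stationary point of the parabola through three (x, f(x)) points,
    via Newton's divided differences (assumes positive curvature)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d1 = (y2 - y1) / (x2 - x1)
    d2 = (y3 - y2) / (x3 - x2)
    a = (d2 - d1) / (x3 - x1)              # second divided difference
    # p'(x) = d1 + a * ((x - x1) + (x - x2)) = 0
    return (x1 + x2) / 2 - d1 / (2 * a)

def surrogate_search(f, xs, n_steps=10):
    """Surrogate loop: fit a parabola f' on the three best points seen so
    far, evaluate f at its predicted minimizer (x(n+1) = argmin f'), repeat."""
    pts = sorted(((x, f(x)) for x in xs), key=lambda p: p[1])
    for _ in range(n_steps):
        best3 = sorted(pts[:3])            # three best points, ordered by x
        x_new = quadratic_vertex(*best3)
        if any(abs(x_new - x) < 1e-9 for x, _ in pts):
            break                          # proposal already sampled: stop
        pts.append((x_new, f(x_new)))
        pts.sort(key=lambda p: p[1])
    return pts[0]

x_best, f_best = surrogate_search(lambda x: (x - 1.3) ** 2 + 2.0,
                                  [-2.0, 0.0, 3.0])
```

On an exactly quadratic f the surrogate is exact, so one step lands on the optimum; this is the "very fast if good model" case.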
So some tools for noisy optimization:
 1. introducing races + tricks
 2. using models equipped
      with uncertainty
 (2. is more difficult, but
  sometimes much more efficient)
Deng et al: application to DIRECT.
A rectangle j of size alpha(j) is split if, for some K:
Warm starts, coevolution, niching, clearing,
multimodal optimization, restarts (sequential / parallel),
koalas and eucalyptus, dinosaurs (and other mass extinctions),
Fictitious Play, EXP3, EGO, robust optimization,
noisy optimization, Nash equilibria, surrogate models,
maximum uncertainty, Monte-Carlo estimates, quasi-Monte-Carlo,
Van Der Corput, Halton, scrambling, antithetic variables,
control variables, importance sampling.
EXERCISES (to be done at home, deadline given soon):

1. Implement a (1+1)-ES and test it on
  the sphere function f(x)=||x||^2 in
  dimension 3. Plot the convergence.

 Nb: choose a scale so that you can check
  the convergence rate conveniently.

2. Implement a pattern search method (PSM)
 and test it on the sphere function, also in
 dimension 3 (choose the same scale as above).
3. We will now consider a varying dimension;
  - x(n,d) will be the nth visited point in dimension d for the (1+1)-ES, and
  - x'(n,d) will be the nth visited point in dimension d for the PSM.
  For both implementations, choose e.g. n=5000, and plot x(n,d) and
 x'(n,d) as a function of d. Choose a convenient scaling as a function of d
  so that you can see the effect of the dimension.


4. What happens if you consider a quadratic positive definite
    ill-conditioned function instead of the sphere function ?
  E.g.: f(x) = sum of 10^i x(i)^2
     = 10 x(1)^2 + 100 x(2)^2 + 1000 x(3)^2 in dimension 3; test in
       dimension 50.


5. (without implementing) Suggest a solution for the problem in Ex. 4.
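For exercise 2, a pattern search method of the kind intended might look like this (one possible skeleton; the polling pattern, halving schedule and budget are illustrative choices):

```python
def pattern_search(f, x, sigma=1.0, budget=2000):
    """Pattern search sketch: poll x ± sigma along each coordinate axis;
    move to the best improving neighbor, otherwise halve sigma."""
    fx = f(x)
    evals = 1
    while evals < budget and sigma > 1e-12:
        best, fbest = None, fx
        for i in range(len(x)):
            for s in (+sigma, -sigma):
                y = list(x)
                y[i] += s
                fy = f(y)
                evals += 1
                if fy < fbest:
                    best, fbest = y, fy
        if best is None:
            sigma /= 2.0          # unsuccessful poll: refine the mesh
        else:
            x, fx = best, fbest   # successful poll: move
    return x, fx

x, fx = pattern_search(lambda x: sum(xi * xi for xi in x), [3.0, -2.0, 1.0])
```

Tracking fx against the evaluation count (on a log scale) gives the convergence plot asked for, and the same skeleton runs unchanged in any dimension for exercise 3.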

Surviving and Thriving in Technical InterviewsJeremy Lindblom
 
The axiomatic power of Kolmogorov complexity
The axiomatic power of Kolmogorov complexity The axiomatic power of Kolmogorov complexity
The axiomatic power of Kolmogorov complexity lbienven
 
Undecidability in partially observable deterministic games
Undecidability in partially observable deterministic gamesUndecidability in partially observable deterministic games
Undecidability in partially observable deterministic gamesOlivier Teytaud
 
Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms Xin-She Yang
 
Bias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesBias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesOlivier Teytaud
 
Genetic Algorithms and Particle Swarm Optimisation @ Flash Brighton, 28th Jul...
Genetic Algorithms and Particle Swarm Optimisation @ Flash Brighton, 28th Jul...Genetic Algorithms and Particle Swarm Optimisation @ Flash Brighton, 28th Jul...
Genetic Algorithms and Particle Swarm Optimisation @ Flash Brighton, 28th Jul...inductible
 
Presentation
PresentationPresentation
Presentationvk3454
 
從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論岳華 杜
 
Derivative Free Optimization
Derivative Free OptimizationDerivative Free Optimization
Derivative Free OptimizationOlivier Teytaud
 
Section 11: Normal Subgroups
Section 11: Normal SubgroupsSection 11: Normal Subgroups
Section 11: Normal SubgroupsKevin Johnson
 
STAT: Random experiments(2)
STAT: Random experiments(2)STAT: Random experiments(2)
STAT: Random experiments(2)Tuenti SiIx
 
Cuckoo Search & Firefly Algorithms
Cuckoo Search & Firefly AlgorithmsCuckoo Search & Firefly Algorithms
Cuckoo Search & Firefly AlgorithmsMustafa Salam
 

Similar to Here are a few other cases that may be relevant for adversarial/robust optimization:- inf Esup f(x,y): Average over worst-case scenarios. Less pessimistic than pure min-max.- inf max f(x,y): Find the strategy that performs best against the single best response by the adversary. - Game-theoretic solutions like Nash equilibria: Strategies where neither player can benefit by changing their strategy alone, given the other's strategy. - Bayesian games: Players have probabilistic beliefs over each other's types/strategies and best respond based on those beliefs. - Limited lookahead: Adversary can only consider a bounded number of (20)

Cuckoo Search Algorithm: An Introduction
Cuckoo Search Algorithm: An IntroductionCuckoo Search Algorithm: An Introduction
Cuckoo Search Algorithm: An Introduction
 
Big model, big data
Big model, big dataBig model, big data
Big model, big data
 
Metaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical AnalysisMetaheuristic Algorithms: A Critical Analysis
Metaheuristic Algorithms: A Critical Analysis
 
Surviving and Thriving in Technical Interviews
Surviving and Thriving in Technical InterviewsSurviving and Thriving in Technical Interviews
Surviving and Thriving in Technical Interviews
 
The axiomatic power of Kolmogorov complexity
The axiomatic power of Kolmogorov complexity The axiomatic power of Kolmogorov complexity
The axiomatic power of Kolmogorov complexity
 
Undecidability in partially observable deterministic games
Undecidability in partially observable deterministic gamesUndecidability in partially observable deterministic games
Undecidability in partially observable deterministic games
 
Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms Nature-Inspired Optimization Algorithms
Nature-Inspired Optimization Algorithms
 
artficial intelligence
artficial intelligenceartficial intelligence
artficial intelligence
 
Standrewstalk
StandrewstalkStandrewstalk
Standrewstalk
 
Bias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesBias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniques
 
Genetic Algorithms and Particle Swarm Optimisation @ Flash Brighton, 28th Jul...
Genetic Algorithms and Particle Swarm Optimisation @ Flash Brighton, 28th Jul...Genetic Algorithms and Particle Swarm Optimisation @ Flash Brighton, 28th Jul...
Genetic Algorithms and Particle Swarm Optimisation @ Flash Brighton, 28th Jul...
 
Presentation
PresentationPresentation
Presentation
 
從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論
 
Derivative Free Optimization
Derivative Free OptimizationDerivative Free Optimization
Derivative Free Optimization
 
Section 11: Normal Subgroups
Section 11: Normal SubgroupsSection 11: Normal Subgroups
Section 11: Normal Subgroups
 
cs-171-05-LocalSearch.pptx
cs-171-05-LocalSearch.pptxcs-171-05-LocalSearch.pptx
cs-171-05-LocalSearch.pptx
 
STAT: Random experiments(2)
STAT: Random experiments(2)STAT: Random experiments(2)
STAT: Random experiments(2)
 
Module1, probablity
Module1, probablityModule1, probablity
Module1, probablity
 
Cuckoo Search & Firefly Algorithms
Cuckoo Search & Firefly AlgorithmsCuckoo Search & Firefly Algorithms
Cuckoo Search & Firefly Algorithms
 
Genetic algorithms
Genetic algorithms Genetic algorithms
Genetic algorithms
 

Recently uploaded

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Here are a few other cases that may be relevant for adversarial/robust optimization:- inf Esup f(x,y): Average over worst-case scenarios. Less pessimistic than pure min-max.- inf max f(x,y): Find the strategy that performs best against the single best response by the adversary. - Game-theoretic solutions like Nash equilibria: Strategies where neither player can benefit by changing their strategy alone, given the other's strategy. - Bayesian games: Players have probabilistic beliefs over each other's types/strategies and best respond based on those beliefs. - Limited lookahead: Adversary can only consider a bounded number of

  • 1. Hard optimization (multimodal, adversarial, noisy) http://www.lri.fr/~teytaud/hardopt.odp http://www.lri.fr/~teytaud/hardopt.pdf (or Quentin's web page) Acknowledgments: koalas, dinosaurs, mathematicians. Olivier Teytaud olivier.teytaud@inria.fr Knowledge also from A. Auger, M. Schoenauer.
  • 2. The next slide is the most important of all. Olivier Teytaud, Inria Tao, visiting the beautiful city of Liège
  • 3. In case of trouble, interrupt me. Olivier Teytaud, Inria Tao, visiting the beautiful city of Liège
  • 4. In case of trouble, interrupt me. Further discussion needed: - R82A, Montefiore institute - olivier.teytaud@inria.fr - or after the lessons (the 25th, not the 18th). Olivier Teytaud, Inria Tao, visiting the beautiful city of Liège
  • 5. Rather than complex algorithms, we'll see some concepts and some simple building blocks for defining algorithms: warm starts, coevolution, niching, clearing, multimodal optimization, restarts (sequential / parallel), koalas and Eucalyptus, dinosaurs (and other mass extinctions), fictitious play, EXP3, robust optimization, noisy optimization, Nash equilibria, surrogate models, maximum uncertainty, Monte-Carlo estimates, quasi-Monte-Carlo, Van Der Corput, Halton, scrambling, EGO.
  • 6. I. Multimodal optimization II. Adversarial / robust cases III. Noisy cases
  • 8. What is multimodal optimization ? - finding one global optimum ? (not getting stuck in a local minimum) - finding all local minima ? - finding all global minima ?
  • 9. Classical methodologies: 1. Restarts 2. ``Real'' diversity mechanisms
  • 10. Restarts: While (time left >0) { run your favorite algorithm } (requires halting criterion + randomness – why ?)
  • 11. Parallel restarts Concurrently, p times: { run your favorite algorithm } ==> still needs randomization
  • 12. Parallel restarts Concurrently, p times: { run your favorite algorithm } ==> needs randomization (at least in the initialization)
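The restart scheme of these slides can be sketched in a few lines; `local_search` is a hypothetical stand-in for "your favorite algorithm" (here a naive stochastic hill-climber), and the randomized initialization is exactly the ingredient the slides insist on.

```python
import math
import random

def local_search(f, x0, step=0.1, iters=200):
    """A simple stochastic hill-climber, standing in for 'your favorite algorithm'."""
    x, fx = x0, f(x0)
    for _ in range(iters):
        y = x + random.gauss(0.0, step)
        fy = f(y)
        if fy < fx:          # keep the move only if it improves (minimization)
            x, fx = y, fy
    return x, fx

def restarts(f, lower, upper, n_restarts=50):
    """While time is left: rerun the algorithm from a fresh random start, keep the best.
    Without randomized initialization all restarts would repeat the same run."""
    best_x, best_f = None, float("inf")
    for _ in range(n_restarts):
        x0 = random.uniform(lower, upper)
        x, fx = local_search(f, x0)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# A multimodal test function (Rastrigin-like in 1-D): local minima near the
# integers, global minimum at x = 0.
f = lambda x: x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))
```

The p parallel restarts of the next slide are the same loop with the iterations run concurrently instead of sequentially.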
  • 13. Random restarts Quasi-random restarts
  • 14. Let's open a parenthesis... (
  • 15. The Van Der Corput sequence: quasi-random numbers in dimension 1. How should I write the nth point ? (p=integer, parameter of the algorithm)
  • 16. A (simple) Scrambled Van Der Corput sequence:
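The two constructions above fit in a few lines: the n-th Van der Corput point is the "radical inverse" of n in base p (write n in base p, mirror the digits across the radix point), and a simple scrambling passes each digit through a fixed permutation of {0, ..., p-1}. (This simple variant ignores the infinitely many leading zero digits, a deliberate simplification.)

```python
import random

def van_der_corput(n, p=2):
    """n-th point of the Van der Corput sequence in base p (radical inverse)."""
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, p)   # peel off the least significant base-p digit
        denom *= p
        q += digit / denom        # ...and mirror it across the radix point
    return q

def scrambled_van_der_corput(n, p=2, perm=None):
    """Same construction, but each digit goes through a fixed random
    permutation of {0, ..., p-1} (a simple digit scrambling)."""
    if perm is None:
        perm = list(range(p))
        random.shuffle(perm)
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, p)
        denom *= p
        q += perm[digit] / denom
    return q

points = [van_der_corput(i, 2) for i in range(1, 9)]
# → [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875, 0.0625]
```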
  • 17. Halton sequence: = generalization to dimension d Choose p1,...,pd All pi's must be different (why ?). Typically, the first prime numbers (why ?).
  • 18. Scrambled Halton sequence: = generalization to dimension d with scrambling Choose p1,...,pd All pi's must be different (why ?). Typically, the first prime numbers (why ?).
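A minimal Halton sketch: one Van der Corput sequence per coordinate, with pairwise distinct (typically prime) bases; a shared factor between two bases would correlate those coordinates.

```python
def van_der_corput(n, p):
    # radical inverse of n in base p
    q, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, p)
        denom *= p
        q += digit / denom
    return q

def halton(n, primes=(2, 3, 5)):
    """n-th point of a d-dimensional Halton sequence: one Van der Corput
    sequence per coordinate, with pairwise distinct bases (typically the
    first d prime numbers)."""
    return tuple(van_der_corput(n, p) for p in primes)
```

For the scrambled version, replace `van_der_corput` by its scrambled variant with one digit permutation per coordinate.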
  • 19. Let's close the parenthesis... (thank you Asterix) )
  • 20. Consider an ES (an evolution strategy = an evolutionary algorithm in continuous domains). The restarts are parametrized by an initial point x and an initial step-size σ. Which hypothesis on x and σ do we need, so that the restarts of the ES will find all local optima ?
  • 21. X*(f) = set of optima. If I start close enough to an optimum, with a small enough step-size, I find it.
  • 23.
  • 24. What do you suggest ?
  • 27. THEOREM: ( (1) ==> (2); do you see why ?)
  • 28. Ok, step-sizes should accumulate around 0, and the x(i)'s should be dense. x(i) = uniform random is ok, and sigma(i) = exp( Gaussian ), or something decreasing, but not a constant, are ok. (Why ?) Should I do something more sophisticated ?
  • 29. I want a small number of restarts. A criterion: dispersion (some maths should be developed around why...).
  • 30. for random points ==> small difference ==> the key point is more the decreasing (or small) step-size
  • 31. Classical methodologies: 1. Restarts 2. ``Real'' diversity mechanisms
  • 32. ~ 7×10^7 years ago: highly optimized; hundreds of millions of years on earth.
  • 33. ~ 7×10^7 years ago: died! Not adapted to the new environment.
  • 34. ~ 7×10^7 years ago: survived thanks to niching.
  • 35. Eating Eucalyptus is dangerous for health (for many animals). But not for koalas. Also they sleep most of the day. They're unusual animals. ==> This is an example of niching. ==> Niching is good for diversity.
  • 36. USUAL SCHEME: winners kill losers. (Diagram: the population is split into GOOD and BAD; the GOOD individuals' babies replace the losers.)
  • 37. NEW SCHEME: winners kill losers only if the losers look like them. (Same diagram, with distance-based replacement.)
  • 38. NEW SCHEME: winners kill losers only if the losers look like them. Don't kill koalas; their food is useless to you. They can be useful in a future world.
  • 39. CLEARING ALGORITHM. Input: population. Output: smaller population. 1) Sort the individuals (the best first). 2) Loop on individuals; for each of them (if alive), kill individuals which are (i) worse and (ii) at distance < some constant σ (...but the first κ murders are sometimes canceled). 3) Keep at most μ individuals (the best).
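The three steps above can be sketched directly; here individuals are real vectors, fitness is minimized, and `capacity` is the per-niche allowance (the "canceled murders"):

```python
import math

def clearing(population, fitness, radius, capacity, mu):
    """Clearing (a niching method): the best individual of each niche removes
    worse individuals closer than `radius`, except that each niche may keep
    up to `capacity` individuals; at most `mu` survivors overall."""
    pop = sorted(population, key=fitness)          # 1) best first (minimization)
    alive = [True] * len(pop)
    for i in range(len(pop)):                      # 2) loop on living individuals
        if not alive[i]:
            continue
        kept_in_niche = 1                          # the niche leader itself
        for j in range(i + 1, len(pop)):           # pop[j] is worse (sorted)
            if alive[j] and math.dist(pop[i], pop[j]) < radius:
                if kept_in_niche < capacity:
                    kept_in_niche += 1             # murder canceled
                else:
                    alive[j] = False               # cleared
    survivors = [x for x, a in zip(pop, alive) if a]
    return survivors[:mu]                          # 3) keep at most mu (the best)
```

With `capacity=1` each niche keeps only its leader; larger capacities keep a few runners-up per niche.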
  • 40. I. Multimodal optimization II. Adversarial / robust cases a) What does it mean ? b) Nash's stuff: computation c) Robustness stuff: computation III. Noisy cases
  • 41. What is adversarial / robust optimization ? I want to find argmin f. But I don't know exactly f: I just know that f is in a family of functions parameterized by θ. I can just compute f(x,θ) for a given θ.
  • 42. Criteria: argmin_x E_θ f(x,θ) (average case; requires a probability distribution), or argmin_x sup_θ f(x,θ) <== this is adversarial / robust.
  • 43. The average case E_θ we'll see later; let's first see the adversarial case sup_θ.
  • 44. ==> looks like a game; two opponents choosing their strategies: x vs y (= θ). Can he try multiple times ? Does he choose his strategy first ?
  • 45. Let's have a high-level look at all this stuff. We have seen two cases: inf_x sup_y f(x,y) (best power plant for the worst earthquake ?) and inf_x E_y f(x,y) (best power plant for the average earthquake). Does it really cover all possible models ?
  • 46. inf_x sup_y f(x,y) is a bit pessimistic: this is the performance if y is chosen by an adversary who knows us (or can do many trials) and has infinite computational resources. Does inf_x E_y f(x,y) really cover all cases ?
  • 47. inf_x sup_y f(x,y) is good for nuclear power plants, but not good for economy, games...
  • 49. Decision = positioning my distribution points. Reward = size of Voronoi cells. Newspaper distribution, or Quick vs McDonald's (if customers go to the nearest).
  • 50. Simple model: each company makes its positioning in the morning for the whole day.
  • 51. But choosing among strategies (i.e. decision rules, programs) leads to the same maths.
  • 52. inf_x sup_y f(x,y) is ok for power plants and earthquakes. But for economic choices ? Replaced by sup_y inf_x f(x,y) ?
  • 53. sup_y inf_x f(x,y) means that we play second. ==> let's see "real" uncertainty techniques.
  • 54. inf_x sup_y f(x,y) is not equal to sup_y inf_x f(x,y) ...but with finite domains: inf_{L(x)} sup_{L(y)} E f(x,y) is equal to sup_{L(y)} inf_{L(x)} E f(x,y) (Von Neumann).
  • 55. inf_x sup_y f(x,y) is not equal to sup_y inf_x f(x,y). E.g.: rock-paper-scissors: inf sup = I win, sup inf = I lose, and sup inf sup inf sup inf... ==> loop. But equality in chess, draughts, … (turn-based, full information).
  • 56. inf_x sup_y f(x,y) is not equal to sup_y inf_x f(x,y) ...but with finite domains: inf_{L(x)} sup_{L(y)} E f(x,y) is equal to sup_{L(y)} inf_{L(x)} E f(x,y), and is equal to inf_x sup_{L(y)} E f(x,y). ==> The opponent chooses a randomized strategy without knowing what we choose.
  • 57. ==> Good model for average performance against non-informed opponents.
  • 58. Let's summarize. Nash: best distribution against worst distribution = a good model if the opponent can't know us, and if the criterion = average performance. Best against worst: good in many games, and good for robust industrial design. Be creative: inf sup E is often reasonable (e.g. sup on opponent, E on climate...).
  • 59. Typical uses of Nash: adversarial bidding (economy), choices on markets, military strategy, games.
  • 60. In repeated games, you can (try to) predict your opponent's decisions. E.g.: rock-paper-scissors.
  • 61. Remarks on this "worst-case" analysis: - if you're stronger (e.g. computer against humans in chess), take the human's weaknesses into account - in Poker, you will earn more money by playing against weak opponents.
  • 62. inf_x sup_y f(x,y) is not equal to sup_y inf_x f(x,y) ...but with finite domains: inf_{L(x)} sup_{L(y)} E f(x,y) is equal to sup_{L(y)} inf_{L(x)} E f(x,y), and is equal to inf_x sup_{L(y)} E f(x,y). (L(x),L(y)) is a Nash equilibrium; L(x), L(y) are not necessarily unique.
  • 63. I. Multimodal optimization II. Adversarial / robust cases a) What does it mean ? b) Nash's stuff: computation - fictitious play - EXP3 - coevolution c) Robustness stuff: computation III. Noisy cases
  • 64. FIRST CASE: everything finite (x and θ are in finite domains). ==> Fictitious Play (Brown, Robinson). Simple, consistent, but slow (see however Kummer's paper).
  • 65. ==> sequence of simulated games (x(n),y(n)) defined as follows: x(n+1) = argmin_x sum_{i<n+1} f(x, y(i)); y(n+1) = argmax_y sum_{i<n+1} f(x(i), y).
  • 66. ==> … until n=N; then play randomly one of the x(i)'s.
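The update rule above is short enough to run as-is on a payoff matrix; a sketch for a finite zero-sum game (x minimizes, y maximizes), where each player best-responds to the empirical average of the opponent's past moves:

```python
def fictitious_play(M, n_iters=5000):
    """Fictitious Play (Brown, Robinson) on a finite zero-sum game.
    M[i][j] = f(x=i, y=j); x minimizes, y maximizes. Returns the empirical
    frequencies of both players' moves, which converge (slowly) to a Nash
    equilibrium in the zero-sum case."""
    nx, ny = len(M), len(M[0])
    count_x, count_y = [0] * nx, [0] * ny
    x, y = 0, 0
    for _ in range(n_iters):
        count_x[x] += 1
        count_y[y] += 1
        # x(n+1) = argmin_x sum_{i<n+1} f(x, y(i))
        x = min(range(nx), key=lambda i: sum(M[i][j] * count_y[j] for j in range(ny)))
        # y(n+1) = argmax_y sum_{i<n+1} f(x(i), y)
        y = max(range(ny), key=lambda j: sum(M[i][j] * count_x[i] for i in range(nx)))
    total = sum(count_x)
    return [c / total for c in count_x], [c / total for c in count_y]

# Rock-paper-scissors (M = row player's loss): the unique Nash equilibrium
# plays each move with probability 1/3.
RPS = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
px, py = fictitious_play(RPS)
```

"Play randomly one of the x(i)'s" then amounts to sampling from the returned frequency vector.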
  • 67. Much faster: EXP3, or INF (more complicated, see Audibert)
  • 68. Complexity of EXP3, within logarithmic terms: K / epsilon^2 for reaching precision epsilon.
  • 69. ==> we don't even read all the matrix of the f(x,y) ! (of size K^2)
  • 70. ==> provably better than all deterministic algorithms
  • 71. ==> requires randomness + RAM
  • 72. Ok, it's great, but the complexity K is still far too big. For Chess, log10(K) ≈ 43. For the game of Go, log10(K) ≈ 170.
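A minimal EXP3 sketch makes the "we don't read the whole matrix" point concrete: at each step we sample and observe a single arm, and update an importance-weighted estimate (the toy adversary below, with arm 2 paying 0.9, is an assumption for illustration).

```python
import math
import random

def exp3(reward, K, T, gamma=0.1):
    """EXP3 for adversarial bandits over K arms: sample ONE arm per step
    (never the full reward vector), update exponential weights with an
    importance-weighted reward estimate. reward(t, k) must lie in [0, 1]."""
    weights = [1.0] * K

    def distribution():
        total = sum(weights)
        # mix the weight-proportional play with gamma-uniform exploration
        return [(1.0 - gamma) * w / total + gamma / K for w in weights]

    for t in range(T):
        probs = distribution()
        # sample one arm from probs (inverse-CDF sampling)
        u, k, acc = random.random(), 0, probs[0]
        while u > acc and k < K - 1:
            k += 1
            acc += probs[k]
        r = reward(t, k)                            # observed for arm k only
        weights[k] *= math.exp(gamma * r / (probs[k] * K))
    return distribution()

# A toy adversary: arm 2 pays 0.9, the others 0.1.
random.seed(1)
probs = exp3(lambda t, k: 0.9 if k == 2 else 0.1, K=5, T=3000)
```

After T steps the mixture concentrates on the best arm while the gamma/K floor keeps every arm explored, which is where the K / epsilon^2 rate comes from.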
  • 73. Coevolution for K huge: Population 1 Population 2
  • 74. Coevolution for K huge: Population 1 Population 2 (play games)
  • 75. Coevolution for K huge: Population 1 Population 2 Fitness = average fitness against opponents
  • 76. Coevolution for K huge: fitness of each individual in Pop. 1 = average fitness against Pop. 2 (and conversely).
  • 77. Coevolution for K huge: + babies; each population breeds its best individuals for the next round.
  • 78. RED QUEEN EFFECT in finite population (FP keeps all the memory, coevolution does not): 1. Rock vs Scissors 2. Rock vs Paper 3. Scissors vs Paper 4. Scissors vs Rock 5. Paper vs Rock 6. Paper vs Scissors 7. Rock vs Scissors (= step 1: a cycle).
  • 79. RED QUEEN EFFECT: 1. Rock vs Scissors 2. Rock vs Paper .... 6. Paper vs Scissors 7. Rock vs Scissors (= step 1) ==> population with archive (e.g. the best of each past iteration; this looks like FP).
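A minimal, hypothetical coevolution-with-archive sketch (all parameter values are illustrative): individuals are mixed strategies for a zero-sum matrix game, fitness is the average payoff against the other population plus an archive of past champions, and the archive is the anti-red-queen ingredient.

```python
import random

def coevolve(M, pop_size=10, n_gens=30, sigma=0.3):
    """Coevolution for a zero-sum matrix game M (x minimizes, y maximizes),
    with an archive of past champions to mitigate the red-queen effect."""
    K = len(M)

    def normalize(v):
        v = [max(x, 1e-9) for x in v]              # clamp, then renormalize
        s = sum(v)
        return [x / s for x in v]

    def payoff(p, q):                              # E f(x,y) for mixed p, q
        return sum(p[i] * M[i][j] * q[j] for i in range(K) for j in range(K))

    def rand_strategy():
        return normalize([random.random() for _ in range(K)])

    pop_x = [rand_strategy() for _ in range(pop_size)]
    pop_y = [rand_strategy() for _ in range(pop_size)]
    archive_x, archive_y = [], []
    for _ in range(n_gens):
        opp_y = pop_y + archive_y                  # opponents = population + archive
        opp_x = pop_x + archive_x
        pop_x.sort(key=lambda p: sum(payoff(p, q) for q in opp_y))   # x minimizes
        pop_y.sort(key=lambda q: -sum(payoff(p, q) for p in opp_x))  # y maximizes
        archive_x.append(pop_x[0])                 # keep each generation's champion
        archive_y.append(pop_y[0])
        half = pop_size // 2                       # mutate the best half over the worst
        for i in range(half, pop_size):
            pop_x[i] = normalize([p + random.gauss(0, sigma) for p in pop_x[i - half]])
            pop_y[i] = normalize([q + random.gauss(0, sigma) for q in pop_y[i - half]])
    return pop_x[0], pop_y[0]
```

Dropping the two archive lists recovers plain coevolution, and on rock-paper-scissors it then tends to cycle exactly as on the slide.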
• 80. Reasonably good in very hard cases, but plenty of more specialized algorithms exist for various cases: - alpha-beta - (fitted) Q-learning - TD(lambda) - MCTS - ...
  • 81. I. Multimodal optimization II. Adversarial / robust cases a) What does it mean ? b) Nash's stuff: computation c) Robustness stuff: computation III. Noisy cases
• 82. c) Robustness stuff: computation ==> we look for argmin_x sup_y f(x,y) ==> classical optimization for x ==> classical optimization for y, with restarts
• 83. Naive solution for robust optimization: Procedure Fitness(x): y = maximize f(x,.); Return f(x,y). Minimize fitness.
• 84. Warm start for robust optimization ==> much faster: Procedure Fitness(x): Static y = y0; y = maximize f(x,.) starting from y; Return f(x,y). Minimize fitness.
• 85. Population-based warm start for robust optimization: Procedure Fitness(x): Static P = P0; P = multimodal-maximize f(x,P); Return max_{y in P} f(x,y). Minimize fitness.
• 86. Population-based warm start for robust optimization: Procedure Fitness(x): Static P; P = multimodal-maximize f(x,P); Return max_{y in P} f(x,y). Minimize fitness. Example: f(x,y) = sin(y) + cos((x+y) / pi), x,y in [-10,10] ==> the global maxima (in y) move quickly (==> looks hard) ==> but the local maxima (in y) move slowly as a function of x ==> warm start very efficient
• 87. Population-based warm start for robust optimization: f(x,y) = sin(y) + cos((x+y) / pi), x,y in [-10,10] ==> global maxima (in y) move quickly (==> looks hard) ==> but local maxima (in y) move slowly as a function of x ==> warm start very efficient ==> interestingly, we just rediscovered coevolution :-)
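A sketch of the population-based warm start on the slide's f(x,y) = sin(y) + cos((x+y)/pi). The hill-climbing inner maximizer, the population size and the random outer search are simplifications of my own; the point is only that the inner population P persists between fitness calls:

```python
import math
import random

rng = random.Random(1)

def f(x, y):
    # the toy robust objective from the slide, x and y in [-10, 10]
    return math.sin(y) + math.cos((x + y) / math.pi)

def make_robust_fitness(y_pop, steps=20, sigma=0.5):
    """Population-based warm start: the inner-max population P is kept as
    state between calls (the 'Static P' of the slide) and only improved a
    little for each new x, instead of re-optimized from scratch."""
    def fitness(x):
        for _ in range(steps):
            for i, y in enumerate(y_pop):
                cand = max(-10.0, min(10.0, y + sigma * rng.gauss(0, 1)))
                if f(x, cand) > f(x, y):  # hill-climb the inner max
                    y_pop[i] = cand
        return max(f(x, y) for y in y_pop)
    return fitness

pop = [rng.uniform(-10, 10) for _ in range(8)]
robust = make_robust_fitness(pop)
# outer minimization over x by plain random search
best_x = min((rng.uniform(-10, 10) for _ in range(200)), key=robust)
```

Because the local maxima in y move slowly with x, the persistent population stays close to the peaks and each fitness call is cheap; this is exactly the coevolution-like behaviour the slide points out.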
  • 89. Plots of f(x,.) for various values of x.
• 90. Coevolution for Nash: Population 1, Population 2 (Pop. 1 plays against Pop. 2). Fitness (for both populations) = average fitness against opponents.
• 91. Coevolution for inf sup: Population 1, Population 2 (Pop. 1 plays against Pop. 2). Fitness of Pop. 1 = worst fitness against opponents; fitness of Pop. 2 = average fitness against opponents.
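A toy coevolution for inf sup along these lines. The saddle-shaped payoff f(x,y) = x^2 - y^2 + xy (equilibrium at x = y = 0), the truncation selection and the mutation schedule are illustrative choices of mine, not from the slides:

```python
import random

rng = random.Random(3)

def f(x, y):
    # toy saddle payoff: x (minimizer) against y (maximizer)
    return x * x - y * y + x * y

def coevolve(gens=200, mu=10):
    pop_x = [rng.uniform(-1, 1) for _ in range(mu)]
    pop_y = [rng.uniform(-1, 1) for _ in range(mu)]
    sigma = 0.3
    for _ in range(gens):
        # Pop. 1: WORST fitness against opponents (inf sup); lower is better
        score_x = lambda x: max(f(x, y) for y in pop_y)
        # Pop. 2: AVERAGE fitness against opponents; higher is better
        score_y = lambda y: sum(f(x, y) for x in pop_x) / len(pop_x)
        pop_x = sorted(pop_x, key=score_x)[: mu // 2]
        pop_y = sorted(pop_y, key=score_y, reverse=True)[: mu // 2]
        # "+ babies": mutated copies of the survivors, clipped to [-1, 1]
        pop_x += [max(-1, min(1, x + sigma * rng.gauss(0, 1))) for x in pop_x]
        pop_y += [max(-1, min(1, y + sigma * rng.gauss(0, 1))) for y in pop_y]
        sigma *= 0.98
    return min(pop_x, key=lambda x: max(f(x, y) for y in pop_y))

best_x = coevolve()
```

On this saddle the best responses track each other smoothly, so both populations contract toward the equilibrium; on a cyclic game (Rock-Paper-Scissors style) the same loop would exhibit the Red Queen effect instead.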
  • 92. I. Multimodal optimization II. Adversarial / robust cases III. Noisy cases
• 93. argmin_x E_y f(x,y) 1) Monte-Carlo estimates 2) Races: choosing the precision 3) Using Machine Learning
• 94. Monte-Carlo estimates: E_y f(x,y) = 1/n sum_i f(x, y_i) + error, with error scaling as std / sqrt(n) ==> how to be faster ? ==> how to choose n ? (later)
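A quick numerical check of the std/sqrt(n) scaling (the integrand and sample sizes are toy choices of mine):

```python
import random
import statistics

def mc_estimate(g, n, rng):
    """Plain Monte-Carlo estimate of E g(y) for y ~ N(0,1)."""
    return sum(g(rng.gauss(0, 1)) for _ in range(n)) / n

rng = random.Random(0)
g = lambda y: (y + 1) ** 2          # exact expectation: E g = Var(y) + 1 = 2
# spread of the estimator over 200 repetitions, for two sample sizes
spread = {n: statistics.pstdev([mc_estimate(g, n, rng) for _ in range(200)])
          for n in (100, 400)}
```

Quadrupling n roughly halves the spread of the estimator, as std/sqrt(n) predicts; that slow square-root rate is what the next slides try to beat.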
• 95. Faster Monte-Carlo evaluation: - more points in critical areas (prop. to standard deviation) (+ rescaling!) - evaluating f-g, where g has known expectation and is “close” to f - evaluating (f + f o h)/2 where h is some symmetry
• 96. Faster Monte-Carlo evaluation: - more points in critical areas (prop. to standard deviation) (+ rescaling!) <== importance sampling - evaluating f-g, where g has known expectation and is “close” to f - evaluating (f + f o h)/2 where h is some symmetry
• 97. Faster Monte-Carlo evaluation: - more points in critical areas (prop. to standard deviation) (+ rescaling!) - evaluating f-g, where g has known expectation and is “close” to f <== control variates - evaluating (f + f o h)/2 where h is some symmetry
• 98. Faster Monte-Carlo evaluation: - more points in critical areas (prop. to standard deviation) (+ rescaling!) - evaluating f-g, where g has known expectation and is “close” to f - evaluating (f + f o h)/2 where h is some symmetry <== antithetic variables
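A sketch of the third trick, antithetic variables, with h(y) = -y as the symmetry of N(0,1). The integrand is a toy choice of mine whose odd part dominates, so the pairing cancels most of the variance:

```python
import random
import statistics

def plain_mc(g, n, rng):
    return sum(g(rng.gauss(0, 1)) for _ in range(n)) / n

def antithetic_mc(g, n, rng):
    # evaluate (g + g o h)/2 with h(y) = -y, a symmetry of N(0,1)
    pairs = (rng.gauss(0, 1) for _ in range(n // 2))
    return sum((g(y) + g(-y)) / 2 for y in pairs) / (n // 2)

rng = random.Random(0)
g = lambda y: y + y ** 3 + 0.1 * y ** 2   # mostly odd: antithetic pairs cancel
# compare the two estimators' spreads at the same sample budget
spread = lambda est: statistics.pstdev([est(g, 100, rng) for _ in range(300)])
gain = spread(plain_mc) / spread(antithetic_mc)
```

The odd part of g vanishes exactly in each pair (g(y) + g(-y))/2, so only the (small) even part contributes noise.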
• 99. And quasi-Monte-Carlo evaluation ? Error roughly (log n)^d / n instead of std / sqrt(n) ==> much better for n really large (if some smoothness) ==> much worse for d large
  • 100. Arg min E f(x,y) 1) Monte-Carlo estimates 2) Races: choosing the precision 3) Using Machine Learning
• 101. Bernstein's inequality (see Audibert et al): P( |1/n sum_i X_i - E X| >= epsilon ) <= 2 exp( - n epsilon^2 / (2 sigma^2 + (2/3) R epsilon) ) where R = range of the X_i's and sigma^2 = their variance.
  • 102. Comparison-based optimization with noise: how to find the best individuals by a race ? While (I want to go on) { I evaluate once each arm. I compute a lower and upper bound for all non-discarded individuals (by Bernstein's bound). I discard individuals which are excluded by the bounds. }
• 103. Trouble: what if two individuals have the same fitness value ? ==> infinite run ! Tricks for avoiding that at iteration N: - limiting the number of evals by some function of N (e.g. exp(N) if a log-linear convergence is expected w.r.t. the number of iterations) - limiting precision by some function of sigma (e.g. precision = sqrt(sigma) / log(1/sigma) )
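The race of slide 102 can be sketched as follows; the constants in the deviation term are one plausible empirical-Bernstein-style choice (not the exact bound of the slide), and the max_evals cap is the "limiting the number of evals" trick against ties:

```python
import math
import random

def bernstein_race(arms, delta=0.05, max_evals=20000):
    """Race among noisy arms with values in [0, R]: evaluate every surviving
    arm once per round, then discard arms whose upper bound falls below the
    best lower bound."""
    R = 1.0                                  # range of the values
    values = [[] for _ in arms]
    alive = set(range(len(arms)))
    evals = 0
    while len(alive) > 1 and evals < max_evals:
        for k in alive:                      # one evaluation per surviving arm
            values[k].append(arms[k]())
            evals += 1
        bounds = {}
        for k in alive:
            xs = values[k]
            n = len(xs)
            mean = sum(xs) / n
            var = sum((x - mean) ** 2 for x in xs) / n
            # empirical-Bernstein-style deviation (R = range of the X_i's)
            eps = (math.sqrt(2 * var * math.log(3 / delta) / n)
                   + 3 * R * math.log(3 / delta) / n)
            bounds[k] = (mean - eps, mean + eps)
        best_lower = max(lo for lo, _ in bounds.values())
        alive = {k for k in alive if bounds[k][1] >= best_lower}
    return alive

rng = random.Random(42)
arms = [lambda p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
winner = bernstein_race(arms)
```

With three Bernoulli arms of means 0.2, 0.5 and 0.8, the weaker arms are discarded after a moderate number of evaluations and only the best arm survives; with two equal arms the loop would run forever, which is exactly the trouble slide 103 addresses.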
  • 104. Arg min E f(x,y) 1) Monte-Carlo estimates 2) Races: choosing the precision 3) Using Machine Learning
• 105. USING MACHINE LEARNING Statistical tools: f'(x) = approximation( x, x1,f(x1), x2,f(x2), … , xn,f(xn) ); y(n+1) = f'(x(n+1)) e.g. f' = quadratic function closest to f on the x(i)'s, and x(n+1) = random, or x(n+1) = maxUncertainty, or x(n+1) = argmin f', or x(n+1) = argmax E max(0, bestSoFar - f(x))
• 106. USING MACHINE LEARNING Statistical tools: SVM, Kriging, RBF, Gaussian Processes, or f'(x) = maximum likelihood on a parametric noisy model: f'(x) = parametric approximation( x, x1,f(x1), … , xn,f(xn) ); y(n+1) = f'(x(n+1)) e.g. f' = logistic quadratic function closest to f on the x(i)'s, and x(n+1) = random, or x(n+1) = maxUncertaintyReduction, or x(n+1) = argmin f', or x(n+1) = argmax E max(0, bestSoFar - f(x))
• 107. USING MACHINE LEARNING Very fast if a good model is available (e.g. quadratic model + noise). Expensive but efficient: Efficient Global Optimization (EGO), in multimodal context.
• 108. USING MACHINE LEARNING The criterion x(n+1) = argmax E max(0, bestSoFar - f(x)) is also termed “expected improvement”.
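The expected-improvement criterion has a closed form for a Gaussian prediction, and the EGO loop can be caricatured with a deliberately crude surrogate of my own (nearest-neighbour mean, distance to it as uncertainty); a real implementation would use Kriging / Gaussian processes:

```python
import math

def expected_improvement(mu, sigma, best):
    """E max(0, best - F) for F ~ N(mu, sigma^2) (minimization)."""
    if sigma <= 0:
        return max(0.0, best - mu)
    z = (best - mu) / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # N(0,1) density
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))          # N(0,1) cdf
    return (best - mu) * cdf + sigma * pdf

def ego_sketch(f, iters=15):
    """EGO-style loop on [0,1] with the crude stand-in surrogate."""
    xs, ys = [0.0, 1.0], [f(0.0), f(1.0)]
    grid = [i / 200 for i in range(201)]
    for _ in range(iters):
        best = min(ys)
        def surrogate(x):
            # nearest evaluated point: its value as mu, its distance as sigma
            d, y = min((abs(x - xi), yi) for xi, yi in zip(xs, ys))
            return y, d
        x_next = max((x for x in grid if x not in xs),
                     key=lambda x: expected_improvement(*surrogate(x), best))
        xs.append(x_next)
        ys.append(f(x_next))
    return min(ys)

best = ego_sketch(lambda x: (x - 0.3) ** 2)
```

Even with this toy surrogate, expected improvement balances exploitation (low predicted mean) against exploration (large uncertainty far from evaluated points), which is what makes EGO usable in multimodal contexts.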
  • 109. So some tools for noisy optimization: 1. introducing races + tricks 2. using models equipped with uncertainty (2. more difficult, but sometimes much more efficient)
  • 110. Deng et al: application to DIRECT
• 111. A rectangle j of size alpha(j) is split if for some K:
  • 113. Warm starts Robust optimization Coevolution Noisy optimization Niching Nash equilibria Surrogate models Clearing Maximum uncertainty Multimodal optimization Monte-Carlo estimate Restarts (sequential / parallel) Quasi-Monte-Carlo Koalas and Eucalyptus Van Der Corput Dinosaurs (and other mass extinction) Halton Fictitious Play Scrambling EXP3 Antithetic variables, control EGO variables, importance sampling.
  • 114. EXERCISES (to be done at home, deadline given soon): 1. Implement a (1+1)-ES and test it on the sphere function f(x)=||x||^2 in dimension 3. Plot the convergence. Nb: choose a scale so that you can check the convergence rate conveniently. 2. Implement a pattern search method (PSM) and test it on the sphere function, also in dimension 3 (choose the same scale as above).
• 115. 3. We will now consider a varying dimension: - x(n,d) will be the nth visited point in dimension d for the (1+1)-ES, and - x'(n,d) will be the nth visited point in dimension d for the PSM. For both implementations, choose e.g. n=5000, and plot x(n,d) and x'(n,d) as a function of d. Choose a convenient scaling as a function of d so that you can see the effect of the dimension. 4. What happens if you consider a quadratic positive definite ill-conditioned function instead of the sphere function ? E.g.: f(x) = sum of 10^i x(i)^2 = 10 x(1)^2 + 100 x(2)^2 + 1000 x(3)^2 in dimension 3; test in dimension 50. 5. (without implementing) Suggest a solution for the problem in Ex. 4.
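As a starting point for Exercise 1, a minimal (1+1)-ES sketch with a 1/5th-success-rule step-size adaptation; the constants exp(1/3) and exp(-1/12) are one common choice among many:

```python
import math
import random

def sphere(x):
    return sum(xi * xi for xi in x)

def one_plus_one_es(f, x, sigma=1.0, iters=5000, seed=0):
    """(1+1)-ES: one parent, one Gaussian-mutated child per iteration,
    step size adapted by a 1/5th success rule."""
    rng = random.Random(seed)
    fx = f(x)
    for _ in range(iters):
        child = [xi + sigma * rng.gauss(0, 1) for xi in x]
        fc = f(child)
        if fc <= fx:                   # success: keep the child, enlarge step
            x, fx = child, fc
            sigma *= math.exp(1.0 / 3)
        else:                          # failure: shrink the step
            sigma *= math.exp(-1.0 / 12)
    return x, fx

x_final, f_final = one_plus_one_es(sphere, [1.0, 1.0, 1.0])
```

On the sphere in dimension 3 this converges log-linearly, so plotting log f(x(n)) against n (the "choose a scale" hint of Exercise 1) makes the convergence rate visible as a straight line.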