SlideShare ist ein Scribd-Unternehmen logo
1 von 17
Downloaden Sie, um offline zu lesen
Motivation
                                    Background
             Adaptive Coordinate Descent (ACiD)




            Adaptive Coordinate Descent

Ilya Loshchilov1,2 , Marc Schoenauer1,2 , Michèle Sebag2,1

                   1
                       TAO Project-team, INRIA Saclay - Île-de-France
      2
          and Laboratoire de Recherche en Informatique (UMR CNRS 8623)
                 Université Paris-Sud, 91128 Orsay Cedex, France




 Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent   1/ 15
Motivation
                                             Background
                      Adaptive Coordinate Descent (ACiD)


Content


  1   Motivation

  2   Background
        Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
        Coordinate Descent

  3   Adaptive Coordinate Descent (ACiD)
        Algorithm
        Experimental Validation
        Computation Complexity




          Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent   2/ 15
Motivation
                                            Background
                     Adaptive Coordinate Descent (ACiD)


Motivation



  Coordinate Descent
      Fast and Simple
      Not suitable for non-separable optimization

  A hypothesis to check
      Some Coordinate Descent with adaptation of coordinate system
      can be as fast as the state-of-the art Evolutionary Algorithms.




         Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent   3/ 15
Motivation
                                                                Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
                                                 Background
                                                                Coordinate Descent
                          Adaptive Coordinate Descent (ACiD)


Covariance Matrix Adaptation Evolution Strategy
Decompose to understand



          While CMA-ES by definition is CMA and ES, only recently the
          algorithmic decomposition has been presented. 1


   Algorithm 1 CMA-ES = Adaptive Encoding + ES

     1:   xi ← m + σNi (0, I), for i = 1 . . . λ
     2:   fi ← f (Bxi ), for i = 1 . . . λ
     3:   if Evolution Strategy (ES) with 1/5th success rule then
                              success rate
     4:      σ ← σexp∝( expected success rate −1)
     5:   if Cumulative Step-Size Adaptation ES (CSA-ES) then
                                         evolution path
     6:     σ ← σexp∝( expected evolution path −1)
     7:   B ←AdaptiveEncoding(Bx1 , . . . , Bxµ )


      1 N. Hansen (2008). "Adaptive Encoding: How to Render Search Coordinate System Invariant"

              Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent              4/ 15
Motivation
                                                               Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
                                                Background
                                                               Coordinate Descent
                         Adaptive Coordinate Descent (ACiD)


Adaptive Encoding
Inspired by Principal Component Analysis (PCA)

                                                Principal Component Analysis




                                                  Adaptive Encoding Update




             Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent              5/ 15
Motivation
                                                                Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
                                                 Background
                                                                Coordinate Descent
                          Adaptive Coordinate Descent (ACiD)


Coordinate Descent (CD)
Simple Idea


         Goal: minimize f (x) = x1 2 + x2 2 up to value ftarget = 10−10
         Initial state: x0 = (−3.1, −4.1) , step-size σ = 10
         Simple Idea: iteratively optimize f (x) with respect to one
         coordinate, while other are fixed




        (a) Optimize the first coordinate                              (b) Cyclically optimize the first
        by Dichotomy, then the second.                                and the second coordinates.
              Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent              6/ 15
Motivation
                                                                   Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
                                                    Background
                                                                   Coordinate Descent
                             Adaptive Coordinate Descent (ACiD)


Coordinate Descent (CD)
Algorithm




  Algorithm 1 Coordinate Descent
   1:   m ← xmin + I i:d (xmax − xmin )
                i:d     U     i:d   i:d
   2:   fbest ← evaluate(m)
   3:               i:d
                           min
        σi:d ← (xmax − xi:d )/4
   4:   ix ← 0
   5:   while NOT Stopping Criterion do
   6:      ix ← ix + 1 mod. d // Cycling over [1, d]
   7:      x1:d ← 0
   8:      xix ← −σix ; x1 ← m + x ; f1 ← evaluate(x1 )
   9:      xix ← +σix ; x2 ← m + x ; f2 ← evaluate(x2 )
  10:      succ ← 0                                                        (b) ksucc = 1.0, 117 evals.
  11:      if f1 < fbest then
  12:         fbest ← f1 ; m ← x1 ; succ ← 1
  13:      if f2 < fbest then
  14:         fbest ← f2 ; m ← x2 ; succ ← 1
  15:      if succ = 1 then
  16:         σix ← ksucc · σix
  17:      else
  18:         σix ← kunsucc · σix



                                                                           (b) ksucc = 2.0, 149 evals.
                 Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent              7/ 15
Motivation
                                                              Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
                                               Background
                                                              Coordinate Descent
                        Adaptive Coordinate Descent (ACiD)


Coordinate Descent (CD)
Convergence Rates




   Figure: Left: Evolution of distance to the optimum versus number of function
   evaluations for the (1+1)-ES, (1+1)-ES opt, CD ksucc = 0.5, CD ksucc = 1.0,
   CD ksucc = 2.0 and CD ksucc = 2.0 ms on f (x) = x 2 in dimension 10.
   Right: Convergence rate c (the lower the better) multiplied by the dimension
   d for different algorithms depending on the dimension d. The convergence
   rates have been estimated for the median of 101 runs.

            Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent              8/ 15
Motivation    Algorithm
                                                Background     Experimental Validation
                         Adaptive Coordinate Descent (ACiD)    Computation Complexity


Adaptive Coordinate Descent (ACiD)
Coordinate Descent for optimization of non-separable problems


              a) Principal Component Analysis                           c) Coordinate Descent Method
                                                                                       0
                                                                                     b2   b1
                                                                                           2
                                                                                              1
                                                                                      1
                                                                                     a1      b1
                                                                           0
                                                                          a1         (0,0)
                                                                                                      0
                                                                                                  a2 b1
                                                                                                   1




                                                                                              0
                                                                                             a2
              b) Adaptive Encoding Update                               d) Adaptive Coordinate Descent Method




    Figure: AECM A -like Adaptive Encoding Update (b) mostly based on Principal
    Component Analysis (a) is used to extend some Coordinate Descent method
    (c) to the optimization of non-separable problems (d).
             Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent    9/ 15
Motivation    Algorithm
                                                     Background     Experimental Validation
                              Adaptive Coordinate Descent (ACiD)    Computation Complexity


Adaptive Coordinate Descent (ACiD)
ACiD on Rosenbrock 2-D function



  Algorithm 1 Adaptive Coordinate Descent (ACiD)
   1:   m ← xi:d + I i:d (xmax − xmin )
                 min
                       U       i:d    i:d
   2:   fbest ← evaluate(m)
                   max     min
   3:   σi:d ← (xi:d − xi:d )/4
   4:   B←I
   5:   ix ← 0
   6:   while NOT Stopping Criterion do
   7:      ix ← ix + 1 mod. d // Cycling over [1, d]
   8:      x1:d ← 0
   9:      xix ← −σix ; x1 ← m + Bx ; f1 ← evaluate(x1 )
  10:      xix ← +σix ; x2 ← m + Bx ; f2 ← evaluate(x2 )
  11:      succ ← 0                                                       (b) ksucc = 2.0, 22231 evals.
  12:      if f1 < fbest then
  13:         fbest ← f1 ; m ← x1 ; succ ← 1
  14:      if f2 < fbest then
  15:         fbest ← f2 ; m ← x2 ; succ ← 1
  16:      if succ = 1 then
  17:         σix ← ksucc · σix
  18:      else
  19:         σix ← kunsucc · σix
  20:        (2i
                               a
           xa x −1) ← x1 ; f(2ix −1) ← f1
             a           a
  21:      x2ix ← x2 ; f2ix ← f2
  22:      if ix = d then
  23:         xa ← xa f a :i |1 ≤ i ≤ 2d
                       <
                       i
  24:       B ← AdaptiveEncoding(xa , . . . , xa )
                                  1            µ
                                                                           (b) ksucc = 1.2, 325 evals.

                  Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent   10/ 15
Motivation    Algorithm
                                                Background     Experimental Validation
                         Adaptive Coordinate Descent (ACiD)    Computation Complexity


Experimental Validation
ACiD on the noiseless Black-Box Optimization Benchmarking (BBOB) testbed

    BBOB benchmarking
         24 uni-modal, multi-modal, ill-conditioned, separable and
         non-separable problems.
         Results available of BIPOP-CMA-ES, IPOP-CMA-ES,
         (1+1)-CMA-ES and many other state-of-the-art algorithms.


      How ACiD is sensitive
      to the step-size
      multiplier ksucc ?
      (an example in 10-D)




             Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent   11/ 15
Motivation    Algorithm
                                               Background     Experimental Validation
                        Adaptive Coordinate Descent (ACiD)    Computation Complexity


Experimental Validation
Comparison of ACiD and (1+1)-CMA-ES




   Figure: BBOB-like results for noiseless functions in 20-D: Empirical Cumulative
   Distribution Function (ECDF), for ACiD (continuous lines) and (1 + 1)-CMA-ES
   (dashed lines), of the running time (number of function evaluations), normalized by
   dimension d, needed to reach target precision fopt + 10k (for k = +1, −1, −4, −8).
   Light yellow lines in the background show similar ECDFs for target value 10−8 of all
   algorithms benchmarked during BBOB 2009.
            Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent   12/ 15
Motivation    Algorithm
                                                Background     Experimental Validation
                         Adaptive Coordinate Descent (ACiD)    Computation Complexity


Experimental Validation
Comparison with IPOP-CMA-ES, BIPOP-CMA-ES and (1+1)-CMA-ES




   Figure: Empirical cumulative distribution of the bootstrapped distribution of ERT vs
   dimension for 50 targets in 10[−8..2] for all functions and subgroups in 10-D (Left) and
   40-D (Right). The best ever line corresponds to the best result obtained by at least one
   algorithm from BBOB 2009 for each of the targets.


             Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent   13/ 15
Motivation        Algorithm
                                                                                                 Background         Experimental Validation
                                                                          Adaptive Coordinate Descent (ACiD)        Computation Complexity


Computation Complexity
Some work in progress


                                                     Eigendecomposition as a part of CMA-like algorithms: O(n3 ).
                                                     Cholesky factor and its inverse can be learned incrementally: O(n2 ).
                                                     (1+1)-CMA-ES has O(n3 ) ; with Cholesky update - O(n2 ).
                                                     ACiD has O(n2 ); might have O(n).

                                                      (1+1)−CMA cpp (2009,Suttorp et al), 3.173
                                            1
                                                      (1+1)−CMA Cholesky cpp (2009,Suttorp et al), 2.0767
 CPU per 1 function evaluation (in sec.)




                                           10
                                                      (1+1)−CMA Cholesky cpp (our implementation), 1.7406
                                            0
                                           10         One Matrix−Vector Multiplication, 2.1746                                          100.000 evaluations for
                                                      ACiD−original Matlab, 1.6021
                                            −1
                                           10         ACiD UpdateEveryN cpp, 1.0955
                                                                                                                                        1000-dimensional Ellipsoid:
                                            −2
                                           10
                                                                                                                                              CMA-ES - 15 minutes.
                                            −3                                                                                                (GECCO 2011 CMA-ES
                                           10
                                            −4
                                                                                                                                              Tutorial)
                                           10
                                            −5                                                                                                ACiD - 1-2 seconds.
                                           10
                                            −6
                                                                                                                                              (f = 10−8 after 5-10
                                           10
                                                                                                                                              minutes and 2.4 · 107
                                                                                                                                              evaluations)
                                                 0                    1                      2                  3                    4
                                                10                 10                     10                   10                  10
                                                                                      dimension


                                                           Ilya Loshchilov, Marc Schoenauer, Michèle Sebag          ACiD = Adaptive Encoding + Coordinate Descent     14/ 15
Motivation    Algorithm
                                          Background     Experimental Validation
                   Adaptive Coordinate Descent (ACiD)    Computation Complexity


Summary

                                                    ACiD

    At least as fast as (1+1)-CMA-ES.
    The computation complexity is O(n2 ), might be O(n) if call
    eigendecomposition every n iterations.
    The source code is available online:
    http://www.lri.fr/~ilya/publications/ACiDgecco2011.zip


                                          Open Questions

    O(n) complexity, that is important for large-scale optimization in
    Machine Learning.
    Extention to multi-modal optimization.
    Fast meta-models in dimension d < n (even for d=1).

       Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent   15/ 15
Motivation    Algorithm
                                        Background     Experimental Validation
                 Adaptive Coordinate Descent (ACiD)    Computation Complexity


Summary




                            Thank you for your attention!

                                            Questions?




     Ilya Loshchilov, Marc Schoenauer, Michèle Sebag   ACiD = Adaptive Encoding + Coordinate Descent   16/ 15
Motivation   Algorithm
                                       Background    Experimental Validation
                Adaptive Coordinate Descent (ACiD)   Computation Complexity


Adaptive Encoding


       Algorithm 1 Adaptive Encoding
        1: Input: x1 , . . . , xµ
        2: if Initialize then
        3:    wi ← µ ; cp ← √d ; c1 ← 0.5 ; cµ ←
                     1            1
                                       d
                                                      0.5
                                                       d
        4:    p←0
        5:    C←I ;B←I
        6:    m ← µ xi w i
                       i=1
        7:    return.
        8: m− ← m
        9: m ← µ xi wi
                    i=1 √
                            d
       10: z0 ← B−1 (m−m− ) (m − m− )
                        √
                          d
       11: zi ←    B−1 (xi −m− )
                                   (xi − m− )
       12:   p ← (1 − cp )p + cp (2 − cp )z0
       13:   Cµ ← µ wi zi zi
                     i=1
                               T

       14:   C ← (1 − c1 − cµ )C + c1 ppT + cµ Cµ
       15:   B◦ DDB◦ ← eigendecomposition(C)
       16:   B ← B◦ D
       17:   Output: B

Weitere ähnliche Inhalte

Mehr von Ilya Loshchilov

Dominance-Based Pareto-Surrogate for Multi-Objective Optimization
Dominance-Based Pareto-Surrogate for Multi-Objective OptimizationDominance-Based Pareto-Surrogate for Multi-Objective Optimization
Dominance-Based Pareto-Surrogate for Multi-Objective OptimizationIlya Loshchilov
 
New Surrogate-Assisted Search Control and Restart Strategies for CMA-ES
New Surrogate-Assisted Search Control and Restart Strategies for CMA-ESNew Surrogate-Assisted Search Control and Restart Strategies for CMA-ES
New Surrogate-Assisted Search Control and Restart Strategies for CMA-ESIlya Loshchilov
 
Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...
Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...
Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...Ilya Loshchilov
 
BI-population CMA-ES Algorithms with Surrogate Models and Line Searches
BI-population CMA-ES Algorithms with Surrogate Models and Line SearchesBI-population CMA-ES Algorithms with Surrogate Models and Line Searches
BI-population CMA-ES Algorithms with Surrogate Models and Line SearchesIlya Loshchilov
 
A Pareto-Compliant Surrogate Approach for Multiobjective Optimization
A Pareto-Compliant Surrogate Approach  for Multiobjective OptimizationA Pareto-Compliant Surrogate Approach  for Multiobjective Optimization
A Pareto-Compliant Surrogate Approach for Multiobjective OptimizationIlya Loshchilov
 
Not all parents are equal for MO-CMA-ES
Not all parents are equal for MO-CMA-ESNot all parents are equal for MO-CMA-ES
Not all parents are equal for MO-CMA-ESIlya Loshchilov
 
Fast Evolutionary Optimization: Comparison-Based Optimizers Need Comparison-B...
Fast Evolutionary Optimization: Comparison-Based Optimizers Need Comparison-B...Fast Evolutionary Optimization: Comparison-Based Optimizers Need Comparison-B...
Fast Evolutionary Optimization: Comparison-Based Optimizers Need Comparison-B...Ilya Loshchilov
 

Mehr von Ilya Loshchilov (7)

Dominance-Based Pareto-Surrogate for Multi-Objective Optimization
Dominance-Based Pareto-Surrogate for Multi-Objective OptimizationDominance-Based Pareto-Surrogate for Multi-Objective Optimization
Dominance-Based Pareto-Surrogate for Multi-Objective Optimization
 
New Surrogate-Assisted Search Control and Restart Strategies for CMA-ES
New Surrogate-Assisted Search Control and Restart Strategies for CMA-ESNew Surrogate-Assisted Search Control and Restart Strategies for CMA-ES
New Surrogate-Assisted Search Control and Restart Strategies for CMA-ES
 
Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...
Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...
Intensive Surrogate Model Exploitation in Self-adaptive Surrogate-assisted CM...
 
BI-population CMA-ES Algorithms with Surrogate Models and Line Searches
BI-population CMA-ES Algorithms with Surrogate Models and Line SearchesBI-population CMA-ES Algorithms with Surrogate Models and Line Searches
BI-population CMA-ES Algorithms with Surrogate Models and Line Searches
 
A Pareto-Compliant Surrogate Approach for Multiobjective Optimization
A Pareto-Compliant Surrogate Approach  for Multiobjective OptimizationA Pareto-Compliant Surrogate Approach  for Multiobjective Optimization
A Pareto-Compliant Surrogate Approach for Multiobjective Optimization
 
Not all parents are equal for MO-CMA-ES
Not all parents are equal for MO-CMA-ESNot all parents are equal for MO-CMA-ES
Not all parents are equal for MO-CMA-ES
 
Fast Evolutionary Optimization: Comparison-Based Optimizers Need Comparison-B...
Fast Evolutionary Optimization: Comparison-Based Optimizers Need Comparison-B...Fast Evolutionary Optimization: Comparison-Based Optimizers Need Comparison-B...
Fast Evolutionary Optimization: Comparison-Based Optimizers Need Comparison-B...
 

Kürzlich hochgeladen

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Kürzlich hochgeladen (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Adaptive Coordinate Descent

  • 1. Motivation Background Adaptive Coordinate Descent (ACiD) Adaptive Coordinate Descent Ilya Loshchilov1,2 , Marc Schoenauer1,2 , Michèle Sebag2,1 1 TAO Project-team, INRIA Saclay - Île-de-France 2 and Laboratoire de Recherche en Informatique (UMR CNRS 8623) Université Paris-Sud, 91128 Orsay Cedex, France Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 1/ 15
  • 2. Motivation Background Adaptive Coordinate Descent (ACiD) Content 1 Motivation 2 Background Covariance Matrix Adaptation Evolution Strategy (CMA-ES) Coordinate Descent 3 Adaptive Coordinate Descent (ACiD) Algorithm Experimental Validation Computation Complexity Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 2/ 15
  • 3. Motivation Background Adaptive Coordinate Descent (ACiD) Motivation Coordinate Descent Fast and Simple Not suitable for non-separable optimization A hypothesis to check Some Coordinate Descent with adaptation of coordinate system can be as fast as the state-of-the art Evolutionary Algorithms. Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 3/ 15
  • 4. Motivation Covariance Matrix Adaptation Evolution Strategy (CMA-ES) Background Coordinate Descent Adaptive Coordinate Descent (ACiD) Covariance Matrix Adaptation Evolution Strategy Decompose to understand While CMA-ES by definition is CMA and ES, only recently the algorithmic decomposition has been presented. 1 Algorithm 1 CMA-ES = Adaptive Encoding + ES 1: xi ← m + σNi (0, I), for i = 1 . . . λ 2: fi ← f (Bxi ), for i = 1 . . . λ 3: if Evolution Strategy (ES) with 1/5th success rule then success rate 4: σ ← σexp∝( expected success rate −1) 5: if Cumulative Step-Size Adaptation ES (CSA-ES) then evolution path 6: σ ← σexp∝( expected evolution path −1) 7: B ←AdaptiveEncoding(Bx1 , . . . , Bxµ ) 1 N. Hansen (2008). "Adaptive Encoding: How to Render Search Coordinate System Invariant" Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 4/ 15
  • 5. Motivation Covariance Matrix Adaptation Evolution Strategy (CMA-ES) Background Coordinate Descent Adaptive Coordinate Descent (ACiD) Adaptive Encoding Inspired by Principal Component Analysis (PCA) Principal Component Analysis Adaptive Encoding Update Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 5/ 15
  • 6. Motivation Covariance Matrix Adaptation Evolution Strategy (CMA-ES) Background Coordinate Descent Adaptive Coordinate Descent (ACiD) Coordinate Descent (CD) Simple Idea Goal: minimize f (x) = x1 2 + x2 2 up to value ftarget = 10−10 Initial state: x0 = (−3.1, −4.1) , step-size σ = 10 Simple Idea: iteratively optimize f (x) with respect to one coordinate, while other are fixed (a) Optimize the first coordinate (b) Cyclically optimize the first by Dichotomy, then the second. and the second coordinates. Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 6/ 15
  • 7. Motivation Covariance Matrix Adaptation Evolution Strategy (CMA-ES) Background Coordinate Descent Adaptive Coordinate Descent (ACiD) Coordinate Descent (CD) Algorithm Algorithm 1 Coordinate Descent 1: m ← xmin + I i:d (xmax − xmin ) i:d U i:d i:d 2: fbest ← evaluate(m) 3: i:d min σi:d ← (xmax − xi:d )/4 4: ix ← 0 5: while NOT Stopping Criterion do 6: ix ← ix + 1 mod. d // Cycling over [1, d] 7: x1:d ← 0 8: xix ← −σix ; x1 ← m + x ; f1 ← evaluate(x1 ) 9: xix ← +σix ; x2 ← m + x ; f2 ← evaluate(x2 ) 10: succ ← 0 (b) ksucc = 1.0, 117 evals. 11: if f1 < fbest then 12: fbest ← f1 ; m ← x1 ; succ ← 1 13: if f2 < fbest then 14: fbest ← f2 ; m ← x2 ; succ ← 1 15: if succ = 1 then 16: σix ← ksucc · σix 17: else 18: σix ← kunsucc · σix (b) ksucc = 2.0, 149 evals. Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 7/ 15
  • 8. Motivation Covariance Matrix Adaptation Evolution Strategy (CMA-ES) Background Coordinate Descent Adaptive Coordinate Descent (ACiD) Coordinate Descent (CD) Convergence Rates Figure: Left: Evolution of distance to the optimum versus number of function evaluations for the (1+1)-ES, (1+1)-ES opt, CD ksucc = 0.5, CD ksucc = 1.0, CD ksucc = 2.0 and CD ksucc = 2.0 ms on f (x) = x 2 in dimension 10. Right: Convergence rate c (the lower the better) multiplied by the dimension d for different algorithms depending on the dimension d. The convergence rates have been estimated for the median of 101 runs. Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 8/ 15
  • 9. Motivation Algorithm Background Experimental Validation Adaptive Coordinate Descent (ACiD) Computation Complexity Adaptive Coordinate Descent (ACiD) Coordinate Descent for optimization of non-separable problems a) Principal Component Analysis c) Coordinate Descent Method 0 b2 b1 2 1 1 a1 b1 0 a1 (0,0) 0 a2 b1 1 0 a2 b) Adaptive Encoding Update d) Adaptive Coordinate Descent Method Figure: AECM A -like Adaptive Encoding Update (b) mostly based on Principal Component Analysis (a) is used to extend some Coordinate Descent method (c) to the optimization of non-separable problems (d). Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 9/ 15
  • 10. Motivation Algorithm Background Experimental Validation Adaptive Coordinate Descent (ACiD) Computation Complexity Adaptive Coordinate Descent (ACiD) ACiD on Rosenbrock 2-D function Algorithm 1 Adaptive Coordinate Descent (ACiD) 1: m ← xi:d + I i:d (xmax − xmin ) min U i:d i:d 2: fbest ← evaluate(m) max min 3: σi:d ← (xi:d − xi:d )/4 4: B←I 5: ix ← 0 6: while NOT Stopping Criterion do 7: ix ← ix + 1 mod. d // Cycling over [1, d] 8: x1:d ← 0 9: xix ← −σix ; x1 ← m + Bx ; f1 ← evaluate(x1 ) 10: xix ← +σix ; x2 ← m + Bx ; f2 ← evaluate(x2 ) 11: succ ← 0 (b) ksucc = 2.0, 22231 evals. 12: if f1 < fbest then 13: fbest ← f1 ; m ← x1 ; succ ← 1 14: if f2 < fbest then 15: fbest ← f2 ; m ← x2 ; succ ← 1 16: if succ = 1 then 17: σix ← ksucc · σix 18: else 19: σix ← kunsucc · σix 20: (2i a xa x −1) ← x1 ; f(2ix −1) ← f1 a a 21: x2ix ← x2 ; f2ix ← f2 22: if ix = d then 23: xa ← xa f a :i |1 ≤ i ≤ 2d < i 24: B ← AdaptiveEncoding(xa , . . . , xa ) 1 µ (b) ksucc = 1.2, 325 evals. Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 10/ 15
  • 11. Motivation Algorithm Background Experimental Validation Adaptive Coordinate Descent (ACiD) Computation Complexity Experimental Validation ACiD on the noiseless Black-Box Optimization Benchmarking (BBOB) testbed BBOB benchmarking 24 uni-modal, multi-modal, ill-conditioned, separable and non-separable problems. Results available of BIPOP-CMA-ES, IPOP-CMA-ES, (1+1)-CMA-ES and many other state-of-the-art algorithms. How ACiD is sensitive to the step-size multiplier ksucc ? (an example in 10-D) Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 11/ 15
  • 12. Motivation Algorithm Background Experimental Validation Adaptive Coordinate Descent (ACiD) Computation Complexity Experimental Validation Comparison of ACiD and (1+1)-CMA-ES Figure: BBOB-like results for noiseless functions in 20-D: Empirical Cumulative Distribution Function (ECDF), for ACiD (continuous lines) and (1 + 1)-CMA-ES (dashed lines), of the running time (number of function evaluations), normalized by dimension d, needed to reach target precision fopt + 10k (for k = +1, −1, −4, −8). Light yellow lines in the background show similar ECDFs for target value 10−8 of all algorithms benchmarked during BBOB 2009. Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 12/ 15
  • 13. Motivation Algorithm Background Experimental Validation Adaptive Coordinate Descent (ACiD) Computation Complexity Experimental Validation Comparison with IPOP-CMA-ES, BIPOP-CMA-ES and (1+1)-CMA-ES Figure: Empirical cumulative distribution of the bootstrapped distribution of ERT vs dimension for 50 targets in 10[−8..2] for all functions and subgroups in 10-D (Left) and 40-D (Right). The best ever line corresponds to the best result obtained by at least one algorithm from BBOB 2009 for each of the targets. Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 13/ 15
  • 14. Motivation Algorithm Background Experimental Validation Adaptive Coordinate Descent (ACiD) Computation Complexity Computation Complexity Some work in progress Eigendecomposition as a part of CMA-like algorithms: O(n3 ). Cholesky factor and its inverse can be learned incrementally: O(n2 ). (1+1)-CMA-ES has O(n3 ) ; with Cholesky update - O(n2 ). ACiD has O(n2 ); might have O(n). (1+1)−CMA cpp (2009,Suttorp et al), 3.173 1 (1+1)−CMA Cholesky cpp (2009,Suttorp et al), 2.0767 CPU per 1 function evaluation (in sec.) 10 (1+1)−CMA Cholesky cpp (our implementation), 1.7406 0 10 One Matrix−Vector Multiplication, 2.1746 100.000 evaluations for ACiD−original Matlab, 1.6021 −1 10 ACiD UpdateEveryN cpp, 1.0955 1000-dimensional Ellipsoid: −2 10 CMA-ES - 15 minutes. −3 (GECCO 2011 CMA-ES 10 −4 Tutorial) 10 −5 ACiD - 1-2 seconds. 10 −6 (f = 10−8 after 5-10 10 minutes and 2.4 · 107 evaluations) 0 1 2 3 4 10 10 10 10 10 dimension Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 14/ 15
  • 15. Motivation Algorithm Background Experimental Validation Adaptive Coordinate Descent (ACiD) Computation Complexity Summary ACiD At least as fast as (1+1)-CMA-ES. The computation complexity is O(n2 ), might be O(n) if call eigendecomposition every n iterations. The source code is available online: http://www.lri.fr/~ilya/publications/ACiDgecco2011.zip Open Questions O(n) complexity, that is important for large-scale optimization in Machine Learning. Extention to multi-modal optimization. Fast meta-models in dimension d < n (even for d=1). Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 15/ 15
  • 16. Motivation Algorithm Background Experimental Validation Adaptive Coordinate Descent (ACiD) Computation Complexity Summary Thank you for your attention! Questions? Ilya Loshchilov, Marc Schoenauer, Michèle Sebag ACiD = Adaptive Encoding + Coordinate Descent 16/ 15
  • 17. Motivation Algorithm Background Experimental Validation Adaptive Coordinate Descent (ACiD) Computation Complexity Adaptive Encoding Algorithm 1 Adaptive Encoding 1: Input: x1 , . . . , xµ 2: if Initialize then 3: wi ← µ ; cp ← √d ; c1 ← 0.5 ; cµ ← 1 1 d 0.5 d 4: p←0 5: C←I ;B←I 6: m ← µ xi w i i=1 7: return. 8: m− ← m 9: m ← µ xi wi i=1 √ d 10: z0 ← B−1 (m−m− ) (m − m− ) √ d 11: zi ← B−1 (xi −m− ) (xi − m− ) 12: p ← (1 − cp )p + cp (2 − cp )z0 13: Cµ ← µ wi zi zi i=1 T 14: C ← (1 − c1 − cµ )C + c1 ppT + cµ Cµ 15: B◦ DDB◦ ← eigendecomposition(C) 16: B ← B◦ D 17: Output: B