@inproceedings{teytaud:inria-00451416,
hal_id = {inria-00451416},
url = {http://hal.inria.fr/inria-00451416},
title = {{Bias and variance in continuous EDA}},
author = {Teytaud, Fabien and Teytaud, Olivier},
abstract = {{Estimation of Distribution Algorithms are based on statistical estimates. We show that when combining classical tools from statistics, namely bias/variance decomposition, reweighting and quasi-randomization, we can strongly improve the convergence rate. All modifications are easy, compliant with most algorithms, and experimentally very efficient, in particular in the parallel case (large offspring).}},
language = {English},
affiliation = {TAO - INRIA Futurs, Laboratoire de Recherche en Informatique - LRI, TAO - INRIA Saclay - Ile de France},
booktitle = {{EA 09}},
address = {Strasbourg, France},
audience = {international},
year = {2009},
month = may,
pdf = {http://hal.inria.fr/inria-00451416/PDF/decsigma.pdf},
}
2. Outline
Introduction
Complexity bounds
Branching Factor
Automatic Parallelization
Real-world algorithms
Log(λ) corrections
Teytaud and Teytaud TRSH 09 is great 2
3. Introduction: I like large λ
Grid5000 = 5,000 cores (and increasing)
Submitting jobs ==> grouping runs
==> population sizes much bigger than the number of cores.
Next generations of computers: tens,
hundreds, thousands of cores.
Evolutionary algorithms are population-based,
but they have a bad speed-up.
8. Complexity bounds
= number of fitness evaluations required to reach precision ε
with probability at least ½
Exp(− convergence ratio) = convergence rate
Convergence ratio ~ 1 / computational cost
==> more convenient for speed-ups
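As a toy illustration (the numbers below are made up, not from the talk): since convergence rate = exp(−convergence ratio), ratios are additive over iterations and a speed-up is just a quotient of convergence ratios.

```python
import math

# Toy illustration with made-up numbers: rate = exp(-ratio), so the
# convergence ratio behaves additively and speed-ups are quotients of ratios.
ratio_seq = 0.05   # hypothetical convergence ratio for lambda = 1
ratio_par = 0.20   # hypothetical convergence ratio for a large lambda

rate_seq = math.exp(-ratio_seq)   # per-evaluation shrink factor, sequential
rate_par = math.exp(-ratio_par)   # per-evaluation shrink factor, parallel

speedup = ratio_par / ratio_seq
print(speedup)  # → 4.0
```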
9. Complexity bounds on the convergence ratio
FR: full ranking (selected points are ranked)
SB: selection-based (selected points are not ranked)
11. Branching factor K (more in Gelly06; Fournier08)
Rewrite your evolutionary algorithm as follows:
g has values in a finite set of cardinal K:
- e.g. subsets of {1, 2, ..., λ} of size μ (K = λ! / (μ! (λ−μ)!))
- or ordered subsets (K = λ! / (λ−μ)!)
- ...
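To make the two counts concrete, a minimal sketch in standard (μ, λ) notation (the sample numbers are illustrative):

```python
from math import comb, perm

# Branching factor K of one comparison-based step with lambda offspring
# and mu selected points (standard (mu, lambda) notation).

def k_selection_based(lam: int, mu: int) -> int:
    """SB: the selected points form an unordered subset of {1, ..., lambda}."""
    return comb(lam, mu)          # K = lambda! / (mu! (lambda - mu)!)

def k_full_ranking(lam: int, mu: int) -> int:
    """FR: the selected points are also ranked (ordered subset)."""
    return perm(lam, mu)          # K = lambda! / (lambda - mu)!

print(k_selection_based(10, 3))   # → 120
print(k_full_ranking(10, 3))      # → 720
```

Full ranking always yields the larger branching factor: the two counts differ exactly by the μ! orderings of the selected points.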
14. Automatic parallelization with branching factor 3
Consider the sequential algorithm
(iterations 1, 2, 3).
17. Automatic parallelization with branching factor 3
Parallel version for D = 2:
population = union of all possible populations over the 2 iterations.
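A hedged sketch of the construction (toy update function, hypothetical names): enumerate all K^D outcome sequences, take the union of the populations generated along every branch, and evaluate that union in one parallel batch; the real run then follows the single branch matching the actual comparison outcomes.

```python
from itertools import product

K, D = 3, 2   # branching factor and number of merged iterations

def step(state: int, outcome: int) -> int:
    """Toy stand-in for one iteration of a comparison-based algorithm:
    the next state is a deterministic function of (state, outcome)."""
    return state * K + outcome + 1   # gives every branch a distinct state

# Union of all states reachable within D iterations, over every possible
# sequence of comparison outcomes in {0, ..., K-1}^D.
union = set()
for outcomes in product(range(K), repeat=D):
    s = 0
    for o in outcomes:
        s = step(s, o)
        union.add(s)

print(len(union))  # → 12  (= K + K**2 tree nodes for K=3, D=2)
```

The batch size grows like K^D, which is why keeping K small (and, later, pruning unreasonable branches) matters.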
19. Real-world algorithms
Define σ* = the normalized step-size.
Necessary condition for a log(λ) speed-up:
- E log(σ*) ~ log(λ)
But for many algorithms,
- E log(σ*) = O(1) ==> constant speed-up
20. One-fifth rule: E log(σ*) = O(1)
Consider, e.g., either of the usual forms
of the one-fifth success rule.
In both cases σ* is lower-bounded
independently of λ
==> parameters should
strongly depend on λ!
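For concreteness, the classic one-fifth success rule in its textbook form (the constant c below is a common choice, not taken from the slides). The multiplier is a constant independent of λ, which is exactly why E log(σ*) stays O(1):

```python
def one_fifth_update(sigma: float, success_rate: float, c: float = 0.82) -> float:
    """Classic one-fifth success rule (textbook constant c, not from the slides).
    The multiplier does not depend on lambda, so log(sigma) changes by O(1)
    per iteration -- the source of the constant speed-up."""
    if success_rate > 1/5:
        return sigma / c     # too many successes: increase the step-size
    if success_rate < 1/5:
        return sigma * c     # too few successes: decrease the step-size
    return sigma

print(one_fifth_update(1.0, 0.5))   # → about 1.2195 (= 1 / 0.82)
```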
21. Self-adaptation, cumulative step-size adaptation
In both cases, the same result: with parameters
depending on the dimension only (and not on λ),
the speed-up is limited by a constant!
23. The starting point of this work
Many algorithms have parameters defined
by handcrafted rules.
Fournier08 shows rates which are
- reachable by comparison-based algorithms,
- but not reached by usual algorithms.
24. Log(λ) corrections
We can change that:
In the discrete case (experiments): the automatic
parallelization is surprisingly efficient.
Simple trick in the continuous case:
- E log(σ*) should be linear in log(λ)
(see the papers for details, sorry!)
(this provides corrections which
work for SA and CSA)
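One way to sketch the flavour of such a correction (an illustration under our own assumptions, not the paper's exact rule): scale the per-iteration change of log(σ) by log(λ), so that E log(σ*) can grow like log(λ) instead of staying O(1).

```python
import math

def corrected_update(sigma: float, success_rate: float, lam: int,
                     c: float = 0.82) -> float:
    """Illustrative log(lambda) correction (NOT the paper's exact rule):
    the exponent of the constant multiplier is scaled by log(lambda), so the
    per-iteration change of log(sigma) grows with log(lambda)."""
    scale = max(1.0, math.log(lam))
    if success_rate > 1/5:
        return sigma * c ** (-scale)   # larger lambda => more aggressive increase
    if success_rate < 1/5:
        return sigma * c ** scale      # larger lambda => more aggressive decrease
    return sigma
```

For instance, with λ ≈ e², the step-size moves twice as fast in log scale as with the uncorrected rule.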
25. Conclusion
The case of large population size λ is not well
handled by usual algorithms.
We proposed:
(I) theoretical guarantees;
(II) an automatic parallelization
matching the bound, which works well
in the discrete case;
(III) a necessary condition for the
continuous case, which provides
useful hints.
26. Main limitation
All this is about a logarithmic speed-up:
(figures: computational power grows linearly in λ,
while the resulting speed-up grows only as log(λ))
==> much better speed-ups for noisy
optimization.
27. Further work
Apply VC-bounds to consider only
"reasonable" branches in the automatic
parallelization.
Theoretically easy, but it yields extremely
complicated algorithms.