1. AutoML-Zero: Evolving Machine Learning
Algorithms From Scratch
Esteban Real et al. (Google Brain / Research). In ICML 2020
Presenter: Jisang Yoon
Graduate School of Information, Yonsei Univ.
Machine Learning & Computational Finance Lab.
2. INDEX
1. Introduction
2. AutoML-Zero
3. Methods
4. Results
4.1. Finding Simple Neural Nets in a Difficult Space
4.2. Searching with Minimal Human Input
4.3. Discovering Algorithm Adaptations
5. Conclusion
5. 1. Introduction
What is the best architecture to tackle this problem?
What is the best strategy to improve this model?
<Word cloud of techniques: Dropout, Batch Normalization, Convolutional Neural Network, Transformer, GPT, YOLO, ResNet, How Deep, Adam, Learning Rate Decay, LSTM, PEGASUS, CycleGAN, BERT, NAS, CutMix, Auto-Encoder, Seq2Seq, Anomaly Detection, Recommender System, Capsule-Net, Time Series Forecasting, Weight Initialization>
6. 1. Introduction
Eventually, to build 'Strong AI',
machines should be able to find the best model
for each problem by themselves.
7. 1. Introduction
'AutoML' aims to automate progress in ML modeling
by spending machine compute time instead of human research time
<NAS>
<NAS Net>
9. 1. Introduction
Many architecture search methods constrain their search space with human design, e.g. by employing hand-designed layers as building blocks and respecting the rules of backpropagation.
Drawbacks:
1. Reduces the innovation potential of AutoML
2. Biases the search results toward the chosen search space
3. Creates a new burden on researchers: designing the search space
10. 1. Introduction
AutoML with little restriction on form, using only simple mathematical operations as building blocks, is essential → AutoML-Zero
15. 2. AutoML-Zero
Each algorithm in the population is a program with three component functions:
def Setup():
…
def Predict():
…
def Learn():
…
A population of T sample algorithms is maintained; at each step, the oldest algorithm is removed.
16. 2. AutoML-Zero
From the population, sample algorithms and select the best algorithm.
17. 2. AutoML-Zero
Mutate the selected best algorithm (the parent) to create a new child, drawing from the operation set.
18. 2. AutoML-Zero
Put the mutated child, which contains a new operation, back into the population.
19. 2. AutoML-Zero
Repeat remove, sample, select, mutate; after N iterations, the last candidates are evaluated.
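The remove-oldest / sample / select-best / mutate loop on these slides can be sketched in a few lines of Python. `Algorithm`, `mutate`, and `evaluate` here are hypothetical stand-ins for the paper's real components (the Setup/Predict/Learn programs, the mutation operators, and task-based evaluation), so this is an illustration of the loop's shape, not the actual implementation.

```python
import random

def regularized_evolution(init_population, mutate, evaluate,
                          tournament_size=2, iterations=1000):
    """Oldest-out evolutionary loop: remove, sample, select, mutate, append."""
    population = list(init_population)  # ordered oldest -> newest
    for _ in range(iterations):
        population.pop(0)                          # remove the oldest algorithm
        sample = random.sample(population, tournament_size)
        parent = max(sample, key=evaluate)         # select the best of the sample
        child = mutate(parent)                     # mutate parent -> child
        population.append(child)                   # put the child into the population
    return population                              # the last candidates
```

Because only the oldest member is ever removed (not the worst), good algorithms still age out eventually, which is what gives this "regularized" variant its exploration pressure.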
21. 2. AutoML-Zero
Contribution
1. AutoML-Zero: the proposal to automatically search for ML algorithms from scratch with minimal human design
2. A novel framework, with open-sourced code, whose search space combines only basic mathematical operations
3. Detailed results showing potential through the discovery of nuanced ML algorithms via evolutionary search
23. 3. Methods
Components of the search space:
1. 64 high-school-level mathematical operations
2. Integer addresses at which scalar / vector / matrix variables are located (e.g. m1, v0); the range of these integers is restricted
3. Real-valued constants used as inputs to some operations (e.g. gaussian(-0.033, 0.01))
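The three components above suggest a simple data layout for a program in this search space. The sketch below is a hypothetical representation (the class and field names are mine, not the paper's code): each instruction holds an operation, integer memory addresses, and optional real-valued constants, and a program is three lists of instructions.

```python
from dataclasses import dataclass, field

@dataclass
class Instruction:
    op: str                # one of the ~64 mathematical operations
    in_addrs: tuple        # input addresses into scalar/vector/matrix memory
    out_addr: int          # output address; the address range is restricted
    constants: tuple = ()  # real-valued constants for some ops, e.g. gaussian params

@dataclass
class Program:
    setup: list = field(default_factory=list)    # Setup(): runs once
    predict: list = field(default_factory=list)  # Predict(): runs per example
    learn: list = field(default_factory=list)    # Learn(): runs per labeled example

# e.g. a Predict() that computes an inner product of vectors v1 and v2 into s1
prog = Program(predict=[Instruction("vector_inner_product", (1, 2), 1)])
```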
24. 3. Methods
Given a set of ML tasks 𝒯 (e.g. binary classification of animals),
• the subset 𝒯_search ⊂ 𝒯 (e.g. binary classification of {(dogs, cats), (lions, tigers)}) is used to find the candidate algorithms
• the subset 𝒯_select ⊂ 𝒯 (e.g. binary classification of {(wolves, foxes), (cows, pigs)}) is used to select the best algorithms
𝒯_search is only used during the N search iterations; the last candidates are then evaluated and the best model selected using 𝒯_select.
25. 3. Methods
Each task in 𝒯_search has a train set and a validation set.
Using the train set, the weights in each algorithm are updated.
Using the validation set, each algorithm is evaluated.
26. 3. Methods
Each task in 𝒯_search has a train set and a validation set; training and evaluating as on the previous slide yields
mean_loss_1, mean_loss_2, …, mean_loss_len(𝒯_search)
Sample D of the mean losses and select the median:
median_1, median_2, …, median_len(sample algorithms)
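The evaluation step above (one mean loss per task, then the median of D sampled losses as the algorithm's score) can be written as a tiny helper. This is an illustrative sketch under the slide's description; the function name and signature are mine.

```python
import random
from statistics import median

def fitness(mean_losses_per_task, D, rng=random):
    """Score an algorithm: median of D task mean-losses sampled without replacement."""
    sampled = rng.sample(mean_losses_per_task, D)  # pick D of the per-task losses
    return median(sampled)                         # robust to outlier tasks
```

Using a median rather than a mean keeps one pathological task from dominating an algorithm's score.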
27. 3. Methods
1. Randomly choose a component function from {Setup(), Predict(), Learn()}
2. Randomly choose an action type to mutate from {Type (i), Type (ii), Type (iii)}
28. 3. Methods
Mutation action types:
Type (i): add or remove an instruction
Type (ii): randomize all instructions
Type (iii): modify an argument
1. Randomly choose a component function from {Setup(), Predict(), Learn()}
2. Randomly choose an action type to mutate from {Type (i), Type (ii), Type (iii)}
<Mutation example>
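The three mutation types can be sketched on a component function modeled as a list of instruction strings. Everything here is illustrative: `random_instruction`, the toy instruction pool, and the argument edit are hypothetical placeholders, not the paper's operators.

```python
import random

def random_instruction(rng):
    # Toy pool standing in for the real 64-operation set.
    ops = ["s2 = s0 + s1", "s2 = s0 * s1", "v1 = v0 * s2"]
    return rng.choice(ops)

def mutate(component, rng):
    """Apply one of the three mutation types to a list of instructions."""
    kind = rng.choice(["add_or_remove", "randomize_all", "modify_argument"])
    component = list(component)  # never mutate the parent in place
    if kind == "add_or_remove":  # Type (i)
        if component and rng.random() < 0.5:
            component.pop(rng.randrange(len(component)))
        else:
            component.insert(rng.randrange(len(component) + 1),
                             random_instruction(rng))
    elif kind == "randomize_all":  # Type (ii)
        component = [random_instruction(rng) for _ in component]
    else:  # Type (iii): modify an argument (here, swap one address token)
        if component:
            i = rng.randrange(len(component))
            component[i] = component[i].replace("s0", "s1", 1)
    return component
```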
29. 3. Methods
To achieve speedup during the model search:
1. Use functional equivalence checking (FEC)
(in short, reuse the cached result of an equivalent, already-evaluated algorithm to reduce computation time)
2. Add hurdles, which serve as thresholds for early stopping
3. Exploit parallel search with W worker processes
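FEC can be sketched as a fingerprint-then-cache step: hash an algorithm's predictions on a small probe set, and if that fingerprint has been seen, reuse the stored score instead of re-training. `run_predictions` and `full_evaluation` are hypothetical stand-ins for the real training/evaluation pipeline.

```python
# Cache from behavior fingerprint -> previously computed evaluation score.
fec_cache = {}

def evaluate_with_fec(algorithm, probe_inputs, run_predictions, full_evaluation):
    """Skip costly evaluation for functionally equivalent algorithms."""
    preds = run_predictions(algorithm, probe_inputs)
    # Round before hashing so numerically equal programs collide.
    fingerprint = hash(tuple(round(p, 6) for p in preds))
    if fingerprint not in fec_cache:           # first time: pay the full cost
        fec_cache[fingerprint] = full_evaluation(algorithm)
    return fec_cache[fingerprint]              # otherwise: reuse the cached score
```

Two syntactically different programs that compute the same function produce the same fingerprint, so only one of them is ever trained in full.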
31. 4. Results
4.1. Finding Simple Neural Nets in a Difficult Space
4.2. Searching with Minimal Human Input
4.3. Discovering Algorithm Adaptations
32. 4. Results
4.1. Finding Simple Neural Nets in a Difficult Space
Experiments are conducted with AutoML-Zero using
1) Random Search
2) Evolutionary Algorithm
33. 4. Results
4.1. Finding Simple Neural Nets in a Difficult Space
Experiments are conducted with AutoML-Zero using
1) Random Search
2) Evolutionary Algorithm
Experiment: generate simple regression tasks, restrict the search space with respect to the tasks, then search for the best model on each task
1. Linear regression
type: Linear, formula: u ∙ x_i | type: Affine, formula: u ∙ x_i + a
2. Nonlinear regression
type: Nonlinear, formula: u ∙ ReLU(M x_i)
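The three task families in the table above can be written directly as label functions, under assumed shapes: x_i a feature vector, u a weight vector, a a scalar, M a matrix (the dimension and random draws below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
u = rng.normal(size=dim)          # ground-truth weight vector
a = rng.normal()                  # ground-truth bias scalar
M = rng.normal(size=(dim, dim))   # ground-truth hidden-layer matrix

def linear_label(x):    return u @ x                        # u . x_i
def affine_label(x):    return u @ x + a                    # u . x_i + a
def nonlinear_label(x): return u @ np.maximum(M @ x, 0.0)   # u . ReLU(M x_i)

x = rng.normal(size=dim)  # one sample input
```

A search method succeeds on a family when the program it finds matches the corresponding label function on held-out inputs.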
34. 4. Results
4.1. Finding Simple Neural Nets in a Difficult Space
1. Linear regression
<Figure: number of iterations for Random Search to first reach an acceptable algorithm>
• With the Evolutionary Algorithm, acceptable algorithms (e.g. those with lower mean RMS error than a hand-designed reference) are found far more often than with Random Search
35. 4. Results
4.1. Finding Simple Neural Nets in a Difficult Space
2. Nonlinear regression
• Solutions can no longer be found with RS, since the search space is too large
• But with the Evolutionary Algorithm, the exact solution u ∙ ReLU(M x_i) is found when D = 1
• As D and the size of 𝒯_search increase, evolution 'invents' the forward pass and backpropagation out of the need to generalize
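What evolution rediscovers here, a two-layer net u ∙ ReLU(M x) trained by gradient descent, can be phrased in the Setup / Predict / Learn structure. This is a sketch of the rediscovered idea, not the paper's exact evolved code; the dictionary layout, dimension, and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, lr = 4, 0.01

def setup():
    """Initialize weights (what an evolved Setup() ends up doing)."""
    return {"M": rng.normal(size=(dim, dim)), "u": rng.normal(size=dim)}

def predict(w, x):
    """Forward pass: hidden ReLU layer, then a linear readout."""
    h = np.maximum(w["M"] @ x, 0.0)
    return w["u"] @ h

def learn(w, x, y):
    """Backpropagation: gradient step on the squared error 0.5 * err**2."""
    h = np.maximum(w["M"] @ x, 0.0)
    err = predict(w, x) - y
    grad_u = err * h                       # dL/du
    grad_h = err * w["u"] * (h > 0)        # error routed through active ReLUs
    w["u"] -= lr * grad_u
    w["M"] -= lr * np.outer(grad_h, x)     # dL/dM
```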
36. 4. Results
4.2. Searching with Minimal Human Input
Experiments are conducted with binary classification tasks from
1) MNIST (45 tasks in total)
2) CIFAR-10 (45 tasks in total)
- Binary classification tasks are made by pairing different labels, such as (1, 9) or (4, 6), which gives (10 × 9) / 2 = 45 tasks
Experiment: binary classification tasks on CIFAR-10
To search for a more generalized architecture, the population is trained on MNIST data as well as CIFAR-10,
but the performance of an architecture is evaluated using CIFAR-10 only
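The task count above is just the number of unordered label pairs, which the standard library computes directly:

```python
from itertools import combinations

labels = list(range(10))                 # the 10 class labels of MNIST / CIFAR-10
pairs = list(combinations(labels, 2))    # unordered pairs, e.g. (1, 9), (4, 6)
print(len(pairs))                        # → 45, i.e. (10 * 9) / 2
```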
38. 4. Results
4.2. Searching with Minimal Human Input
• The last candidate algorithms in 13 out of 20 experiments perform better than a hand-designed 2-layer fully connected NN
• Results on various tasks:

task                    | AutoML-Zero  | Logistic Regression | 2-layer FCNN
CIFAR-10                | 84.06±0.10%  | 77.65±0.22%         | 82.22±0.17%
SVHN                    | 88.12%       | 59.58%              | 85.14%
ImageNet (down-sampled) | 80.78%       | 76.44%              | 78.44%
Fashion MNIST           | 98.6%        | 97.9%               | 98.21%
39. 4. Results
4.3. Discovering Algorithm Adaptations
With the architecture we already obtained, called the SGD model, three experiments are conducted:
1. Few training examples
Experiment: binary classification on CIFAR-10 with only 80 training examples, for 100 epochs
2. Fast training
Experiment: binary classification on CIFAR-10 with 800 training examples, for only 10 epochs
3. Multiple classes
Experiment: 10-class classification on CIFAR-10 with 45k training examples
40. 4. Results
4.3. Discovering Algorithm Adaptations
1. Few training examples
2. Fast training
41. 4. Results
4.3. Discovering Algorithm Adaptations
3. Multiple classes
43. 5. Conclusion
The search method can be improved by more sophisticated approaches such as Bayesian optimization or reinforcement learning.
The search space can be enhanced by allowing the AutoML model to choose more complicated operations.
Interpreting the evolved algorithms requires effort, due to the complexity of their raw code.
AutoML can go further