On First-Order Meta-Learning Algorithms
Yoonho Lee
Department of Computer Science and Engineering
Pohang University of Science and Technology
May 17, 2018
MAML [2]
MAML
weakness
Say we want to use MAML with 3 gradient steps. We compute

θ0 = θmeta
θ1 = θ0 − α ∇θL(θ)|θ0
θ2 = θ1 − α ∇θL(θ)|θ1
θ3 = θ2 − α ∇θL(θ)|θ2

and then we backpropagate

θmeta = θmeta − β ∇θmeta L(θ3)
      = θmeta − β ∇θmeta L(θ2 − α ∇θL(θ)|θ2)
      = θmeta − · · ·
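A minimal PyTorch sketch of this unrolled computation (the quadratic task_loss, learning rates, and parameter size are illustrative assumptions, not from the talk): each inner step is taken with create_graph=True, so backpropagating L(θ3) to θmeta differentiates through all three steps.

import torch

def task_loss(theta):
    # Toy stand-in for the task loss L (an assumption for illustration).
    return 0.5 * (theta - 1.0).pow(2).sum()

theta_meta = torch.randn(5, requires_grad=True)
inner_lr, meta_lr = 0.1, 0.01

theta = theta_meta  # theta_0 = theta_meta
for _ in range(3):  # compute theta_1, theta_2, theta_3
    grad = torch.autograd.grad(task_loss(theta), theta, create_graph=True)[0]
    theta = theta - inner_lr * grad  # each step stays in the autograd graph

# Backpropagating from theta_3 to theta_meta traverses every inner step,
# which is where the second-order terms (and the memory cost) come from.
meta_grad = torch.autograd.grad(task_loss(theta), theta_meta)[0]
with torch.no_grad():
    theta_meta -= meta_lr * meta_grad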
A weakness of MAML is that its memory and computation costs scale linearly with the number of inner gradient steps.
The original paper [2] in fact already suggested first-order MAML (FOMAML), a way to reduce computation without sacrificing much performance.
MAML
First-Order Approximation
Consider MAML with one inner gradient step. We start with the relationship between θ and θ′, and remove the term in the MAML update that requires second-order derivatives:

θ′ = θ − α ∇θL(θ)

gMAML = ∇θL(θ′) = (∇θ′L(θ′)) · (∇θθ′)
      = (∇θ′L(θ′)) · (∇θ(θ − α ∇θL(θ)))
      ≈ (∇θ′L(θ′)) · (∇θθ)
      = ∇θ′L(θ′)
gMAML = ∇θL(θ′) ≈ ∇θ′L(θ′)
This paper [1] was the first to observe that FOMAML is equivalent to simply remembering the last gradient and applying it to the initial parameters. The implementation of the original paper [2] built the computation graph for MAML and then simply skipped the computations for the second-order terms.
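For contrast, a minimal PyTorch sketch of FOMAML under the same toy assumptions as the earlier MAML sketch: the inner loop detaches every step, so no second-order graph is built, and the gradient at the last inner iterate is applied directly to the initial parameters.

import torch

def task_loss(theta):
    # Same toy stand-in for the task loss L as in the MAML sketch.
    return 0.5 * (theta - 1.0).pow(2).sum()

theta_meta = torch.randn(5, requires_grad=True)
inner_lr, meta_lr = 0.1, 0.01

# Inner loop: plain SGD; detaching after each step discards the graph.
theta = theta_meta.detach().clone().requires_grad_(True)
for _ in range(3):
    grad = torch.autograd.grad(task_loss(theta), theta)[0]
    theta = (theta - inner_lr * grad).detach().requires_grad_(True)

# FOMAML: remember the last gradient, apply it at the initial parameters.
last_grad = torch.autograd.grad(task_loss(theta), theta)[0]
with torch.no_grad():
    theta_meta -= meta_lr * last_grad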
On First-Order Meta-Learning Algorithms
Alex Nichol, Joshua Achiam, John Schulman
Reptile
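Reptile repeatedly samples a task, runs k steps of SGD from the current initialization, and moves the initialization toward the adapted weights. A minimal numpy sketch, with a hypothetical quadratic task family standing in for a real few-shot task distribution:

import numpy as np

rng = np.random.default_rng(0)
dim, inner_lr, outer_lr, inner_steps = 5, 0.1, 0.5, 3

def sample_task_grad():
    # Hypothetical task: loss 0.5*||phi - a||^2 with a random optimum a.
    a = rng.normal(size=dim)
    return lambda phi: phi - a  # gradient of the task loss

phi = rng.normal(size=dim)  # meta-parameters: the shared initialization
for _ in range(1000):
    task_grad = sample_task_grad()
    phi_adapted = phi.copy()
    for _ in range(inner_steps):  # k steps of SGD on the sampled task
        phi_adapted -= inner_lr * task_grad(phi_adapted)
    phi += outer_lr * (phi_adapted - phi)  # move toward the adapted weights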
Comparison
Analysis
Definitions
We assume that we get a sequence of loss functions (L1, L2, · · · , Ln). We introduce the following symbols for convenience:

gi = L′i(φi)   (gradient of the i-th loss at φi)
¯gi = L′i(φ1)   (gradient of the i-th loss at the initial point φ1)
¯Hi = L″i(φ1)   (Hessian of the i-th loss at φ1)
Ui(φ) = φ − α L′i(φ)   (one SGD step on the i-th loss)
φi+1 = φi − α gi = Ui(φi)

We want to express everything in terms of ¯gi and ¯Hi to analyze what each update means from the point of view of the initial parameters.
Analysis
We begin by expressing gi using ¯gi and ¯Hi, Taylor-expanding around φ1 and using φi − φ1 = −α Σ_{j=1}^{i−1} gj:

gi = L′i(φi) = L′i(φ1) + L″i(φ1)(φi − φ1) + O(α²)
   = ¯gi + ¯Hi (φi − φ1) + O(α²)
   = ¯gi − α ¯Hi Σ_{j=1}^{i−1} gj + O(α²)
   = ¯gi − α ¯Hi Σ_{j=1}^{i−1} ¯gj + O(α²)   (replacing gj by ¯gj changes only O(α²) terms)
Analysis
The MAML update is:

gMAML = ∂/∂φ1 Lk(φk)
      = ∂/∂φ1 Lk(Uk−1(Uk−2(· · · (U1(φ1)))))
      = U′1(φ1) · · · U′k−1(φk−1) L′k(φk)
      = (I − α L″1(φ1)) · · · (I − α L″k−1(φk−1)) L′k(φk)
      = ( Π_{j=1}^{k−1} (I − α L″j(φj)) ) gk
Analysis
gMAML = ( Π_{j=1}^{k−1} (I − α L″j(φj)) ) gk
      = ( Π_{j=1}^{k−1} (I − α ¯Hj) ) ( ¯gk − α ¯Hk Σ_{j=1}^{k−1} ¯gj ) + O(α²)
      = ( I − α Σ_{j=1}^{k−1} ¯Hj ) ¯gk − α ¯Hk Σ_{j=1}^{k−1} ¯gj + O(α²)
      = ¯gk − α Σ_{j=1}^{k−1} ¯Hj ¯gk − α ¯Hk Σ_{j=1}^{k−1} ¯gj + O(α²)
Analysis
Assuming k = 2,

gMAML = ¯g2 − α ¯H2 ¯g1 − α ¯H1 ¯g2 + O(α²)
gFOMAML = g2 = ¯g2 − α ¯H2 ¯g1 + O(α²)
gReptile = g1 + g2 = ¯g1 + ¯g2 − α ¯H2 ¯g1 + O(α²)
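These expansions can be checked numerically. A small numpy sketch with an assumed pair of quadratic task losses (their Hessians are constant, so the FOMAML and Reptile lines hold exactly here and only MAML deviates, at O(α²)):

import numpy as np

rng = np.random.default_rng(0)
dim, alpha = 5, 1e-2

def make_task():
    # Quadratic loss L_i(phi) = 0.5 (phi - a_i)^T A_i (phi - a_i):
    # gradient A_i (phi - a_i), constant Hessian A_i.
    M = rng.normal(size=(dim, dim))
    return M @ M.T / dim, rng.normal(size=dim)

(A1, a1), (A2, a2) = make_task(), make_task()
phi1 = rng.normal(size=dim)

gbar1, gbar2 = A1 @ (phi1 - a1), A2 @ (phi1 - a2)  # gradients at phi1
H1, H2 = A1, A2                                    # Hessians at phi1

phi2 = phi1 - alpha * gbar1  # one inner step on task 1
g2 = A2 @ (phi2 - a2)        # gradient of L2 at phi2

g_maml = (np.eye(dim) - alpha * H1) @ g2  # exact backprop through the step
g_fomaml = g2
g_reptile = gbar1 + g2

e_maml = gbar2 - alpha * H2 @ gbar1 - alpha * H1 @ gbar2
e_fomaml = gbar2 - alpha * H2 @ gbar1
e_reptile = gbar1 + gbar2 - alpha * H2 @ gbar1

for name, g, e in [("MAML", g_maml, e_maml), ("FOMAML", g_fomaml, e_fomaml),
                   ("Reptile", g_reptile, e_reptile)]:
    print(f"{name:8s} |exact - expansion| = {np.linalg.norm(g - e):.2e}")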
Analysis
Since loss functions are exchangeable (losses are typically computed over minibatches drawn at random from a larger set),

E[¯g1] = E[¯g2] = · · ·

Similarly, using ∂/∂φ1 (¯gi · ¯gj) = ¯Hi ¯gj + ¯Hj ¯gi and E[¯Hi ¯gj] = E[¯Hj ¯gi],

E[¯Hi ¯gj] = ½ E[¯Hi ¯gj + ¯Hj ¯gi]
           = ½ E[∂/∂φ1 (¯gi · ¯gj)]
Analysis
Therefore, in expectation, there are only two kinds of terms:

AvgGrad = E[¯g]
AvgGradInner = E[¯Hi ¯gj] = ½ E[∂/∂φ1 (¯gi · ¯gj)]   (i ≠ j)

We now return to gradient-based meta-learning for k steps:

E[gMAML] = (1) AvgGrad − (2k − 2) α AvgGradInner
E[gFOMAML] = (1) AvgGrad − (k − 1) α AvgGradInner
E[gReptile] = k AvgGrad − ½ k(k − 1) α AvgGradInner

For k = 2 these match the previous slide, e.g. E[gReptile] = 2 AvgGrad − α AvgGradInner.
Experiments
Gradient Combinations
Experiments
Few-shot Classification
Experiments
Reptile vs FOMAML
Summary

Gradient-based meta-learning works because of AvgGradInner, a term that maximizes the inner product between gradients computed on different minibatches
Reptile's performance is similar to that of FOMAML and MAML
The analysis assumes α → 0
The authors argue that since Reptile is similar to SGD, SGD may generalize well because it approximates MAML; they suggest this may also be why finetuning from ImageNet works so well
References I
[1] A. Nichol, J. Achiam, and J. Schulman. “On First-Order Meta-Learning Algorithms”. Preprint arXiv:1803.02999 (2018).
[2] C. Finn, P. Abbeel, and S. Levine. “Model-Agnostic Meta-Learning for Fast
Adaptation of Deep Networks”. In: Proceedings of the International
Conference on Machine Learning (ICML) (2017).
Thank You