The document discusses few-shot learning approaches. It begins with an introduction explaining that current deep learning models require large datasets, whereas humans can learn from just a few examples. It then states the few-shot learning problem: models must perform classification, detection, or regression on novel categories represented by only a few samples each. The approaches covered include meta-learning methods such as Matching Networks and MAML, metric-learning methods such as Relation Networks and Prototypical Networks, and augmentation-based methods such as the Delta-encoder. The document provides an overview of the goals and techniques of few-shot learning.
2. Contents
⢠Introduction
⢠Problem statement, Why?
⢠Approaches
â Meta learning
⢠Matching network
⢠MAML
â Metric Learning
⢠Relation Networks
⢠Prototypical Networks
â AUGMENTAION BASED
⢠Delta encoder
⢠Few shot learning through informative retrieval lens
3. Introduction
⢠The ability of deep neural networks to extract complex statistics and learn high level features from vast
datasets is proven. Yet current deep learning approaches suffer from poor sample efficiency in stark
contrast to human perceptionâââeven a child could recognise a giraffe after seeing a single picture.
⢠Fine-tuning a pre-trained model is a popular strategy to achieve high sample efficiency but it is a post-hoc
hack
Can machine learning do better?
Few-shot learning aims to solve these issues
4. Few-shot learning
⢠Whereas most machine learning based object categorization algorithms require
training on hundreds or thousands of samples/images and very large datasets,
one/FEW-shot learning aims to learn information about object categories from
one, or only a few, training samples/images.
⢠It is estimated that a child has learned almost all of the 10 ~ 30 thousand object
categories in the world by the age of six. This is due not only to the human mind's
computational power, but also to its ability to synthesize and learn new object
classes from existing information about different, previously learned classes.
5. Problem statement
Using a large annotated offline dataset, perform a given task for novel categories, represented by just a few samples each.
[Diagram: offline training on base classes (dog, elephant, monkey, …) → knowledge transfer → online training of a model for the novel categories (lemur, rabbit, mongoose, …).]
6. Problem statement
Using a large annotated offline dataset, perform classification for novel categories, represented by just a few samples each.
[Diagram: the same pipeline specialized to classification: offline training on base classes (dog, elephant, monkey, …) → knowledge transfer → online training of a classifier for the novel categories (lemur, rabbit, mongoose, …).]
7. Problem statement
Using a large annotated offline dataset, perform detection for novel categories, represented by just a few samples each.
[Diagram: the same pipeline specialized to detection: offline training on base classes (dog, elephant, monkey, …) → knowledge transfer → online training of a detector for the novel categories (lemur, rabbit, mongoose, …).]
8. Problem statement
Using a large annotated offline dataset, perform regression for novel categories, represented by just a few samples each.
[Diagram: the same pipeline specialized to regression: offline training on base classes (dog, elephant, monkey, …) → knowledge transfer → online training of a regressor for the novel categories (lemur, rabbit, mongoose, …).]
9. Why work on few-shot learning?
1. It brings deep learning closer to real-world business use cases.
• Companies hesitate to spend much time and money on annotated data for a solution whose profitability is uncertain.
• Relevant objects are continuously replaced with new ones, so deep learning has to be agile.
2. It involves a bunch of exciting cutting-edge technologies:
• meta-learning methods
• networks generating networks
• data synthesizers
• semantic metric spaces
• graph neural networks
• Neural Turing Machines
• GANs
10. Approaches to few-shot learning
Few-shot learning: each category is represented by just a few examples, and the goal is to learn to perform classification, detection, or regression.
• Meta-learning: learn a learning strategy that adjusts well to a new few-shot learning task.
• Metric learning: learn a `semantic` embedding space using a distance loss function.
• Data augmentation: synthesize more data from the novel classes to facilitate the regular learning.
11. The n-shot, k-way task
• The ability of an algorithm to perform few-shot learning is typically measured by its performance on n-shot, k-way tasks. These are run as follows (see the sketch after this list):
1. A model is given a query sample belonging to a new, previously unseen class.
2. It is also given a support set, S, consisting of n examples each from k different unseen classes.
3. The algorithm then has to determine which of the support-set classes the query sample belongs to.
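A minimal sketch of episode construction under the definitions above; the dataset layout and all names are illustrative, not from the slides:

import random

def sample_episode(dataset, n_shot, k_way, n_query=1):
    """Sample one n-shot, k-way episode.

    dataset: dict mapping class name -> list of samples (illustrative layout).
    Returns a support set of n_shot examples for each of k_way classes,
    plus query samples drawn from those same classes.
    """
    classes = random.sample(list(dataset), k_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        samples = random.sample(dataset[cls], n_shot + n_query)
        support += [(x, label) for x in samples[:n_shot]]
        query += [(x, label) for x in samples[n_shot:]]
    return support, query

# Example: a 1-shot, 5-way episode over a toy dataset of feature vectors.
toy = {c: [[i, j] for j in range(10)] for i, c in enumerate("abcde")}
S, Q = sample_episode(toy, n_shot=1, k_way=5)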
12. Meta-learning
Training a meta-learner to learn on each task.
• Standard learning: many data instances → train a learner on the data → a model for specific classes.
• Meta-learning: many tasks, each with its own training and target data → train a meta-learner across tasks → a task-agnostic learning strategy (data knowledge). Given a new task, the meta-learner produces a task-specific learner from the task's data.
13. Recurrent meta-learners
Matching Networks, Vinyals et al., NIPS 2016
Distance-based classification: based on the similarity between the query and support samples in the embedding space (an adaptive metric):

$\hat{y} = \sum_{i} a(\hat{x}, x_i)\, y_i$, with $a(\hat{x}, x_i) = \mathrm{similarity}\big(f(\hat{x}, S),\, g(x_i, S)\big)$

where $f$ and $g$ are LSTM embeddings of the query and support samples, dependent on the support set $S$.
• The embedding space is class-agnostic.
• The LSTM attention mechanism adjusts the embedding to the task (to be elaborated later).
A code sketch of the classification rule follows the results table.
Concept of episodes: reproduce the test conditions during training.
• N new categories
• M training examples per category
• one query example from the {1..N} categories
• Typically, N = 5; M = 1 or 5.
Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31
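A minimal sketch of the distance-based rule above, with a plain cosine similarity standing in for the paper's full LSTM embeddings f and g (which condition on S); all names here are illustrative:

import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def matching_predict(query, support_x, support_y, k_way):
    """y_hat = sum_i a(x, x_i) * y_i with softmax-normalized cosine attention.

    query: embedded query vector; support_x: embedded support vectors;
    support_y: integer labels. Embeddings are assumed precomputed here.
    """
    sims = np.array([cosine(query, xi) for xi in support_x])
    attn = np.exp(sims) / np.exp(sims).sum()       # a(x, x_i)
    onehot = np.eye(k_way)[support_y]              # y_i as one-hot rows
    return (attn[:, None] * onehot).sum(axis=0)    # class distribution

# Example: 5-way, 1-shot with random 16-d embeddings.
rng = np.random.default_rng(0)
sx = rng.normal(size=(5, 16)); sy = np.arange(5)
print(matching_predict(rng.normal(size=16), sx, sy, k_way=5))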
14. Optimization as a model for few-shot learning
⢠META-LEARN LSTM learn a general initialization
of the learner (classifier) network that allows for
quick convergence of training.
Problem: Gradient-based
optimization in high
capacity classifiers requires
many iterative steps over
many examples to perform
well.
Solution: an LSTM-based
meta-learner model to
learn the exact optimization
algorithm to train another
learner neural network
classifier in the few-shot
learning.
15. Optimizers
Optimize the learner to perform well after fine-tuning on the task data, done by a single (or a few) step(s) of gradient descent.
MAML (Model-Agnostic Meta-Learning), Finn et al., ICML 2017
Standard objective (task-specific, for task $T$): $\min_{\theta} \mathcal{L}_T(\theta)$, learned via the update $\theta' = \theta - \alpha \nabla_{\theta} \mathcal{L}_T(\theta)$.
Meta-objective (across tasks): $\min_{\theta} \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta')$, learned via the update $\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{T \sim p(\mathcal{T})} \mathcal{L}_T(\theta')$.
Meta-SGD, Li et al., 2017: renders $\alpha$ as a learned vector of the same size as $\theta$. (Figure reprinted from Li et al., 2017.)
"Interestingly, the learning process can continue forever, thus enabling life-long learning, and at any moment, the meta-learner can be applied to learn a learner for any new task." (Li et al., 2017)
A code sketch of the two updates follows the results table.
Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Meta-SGD | 54.24 / 70.86
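A minimal PyTorch sketch of the two MAML updates above on a toy regression task distribution; the task, model, and hyperparameters are illustrative, not from the paper:

import torch

def sample_task(n=10):
    """Toy task: regress y = a*x with a task-specific slope a.
    Returns support and query splits drawn from the same task."""
    a = torch.randn(1)
    x = torch.randn(2 * n, 1)
    y = a * x
    return (x[:n], y[:n]), (x[n:], y[n:])

theta = torch.zeros(1, requires_grad=True)   # learner parameters
alpha, beta = 0.1, 0.01                      # inner / outer step sizes
opt = torch.optim.SGD([theta], lr=beta)

for step in range(500):
    meta_loss = 0.0
    for _ in range(4):                               # tasks T ~ p(T)
        (xs, ys), (xq, yq) = sample_task()
        inner = ((theta * xs - ys) ** 2).mean()      # L_T(theta)
        (g,) = torch.autograd.grad(inner, theta, create_graph=True)
        theta_prime = theta - alpha * g              # theta' = theta - alpha*grad
        meta_loss = meta_loss + ((theta_prime * xq - yq) ** 2).mean()  # L_T(theta')
    opt.zero_grad()
    meta_loss.backward()   # differentiates through the inner update
    opt.step()             # theta <- theta - beta * grad of the meta-objective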
17. Metric Learning
Relation Networks, Sung et al., CVPR 2018
Use the Siamese-networks principle (a code sketch follows the table below):
• Concatenate embeddings of the query and support samples.
• A relation module is trained to produce a score of 1 for the correct class and 0 for the others.
• Extends to zero-shot learning by replacing support embeddings with semantic features.
(Figure replicated from Sung et al., Learning to Compare: Relation Network for Few-Shot Learning, CVPR 2018.)
Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59
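A minimal PyTorch sketch of the relation-module idea: concatenate query and support embeddings and score the pair with a small network trained toward 1/0 targets. Layer sizes and names are illustrative, not the paper's exact architecture:

import torch
import torch.nn as nn

class RelationModule(nn.Module):
    """Scores a (query, support) embedding pair in [0, 1]."""
    def __init__(self, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, query, support):
        # Concatenate embeddings of the query and support samples.
        return self.net(torch.cat([query, support], dim=-1))

relation = RelationModule()
q = torch.randn(5, 64)       # 5 query embeddings
s = torch.randn(5, 64)       # class (support) embeddings for the same pairs
scores = relation(q, s)      # trained with MSE toward 1 (same class) / 0 (other)
loss = nn.functional.mse_loss(scores, torch.ones_like(scores))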
18. Metric Learning
Matching Networks, Vinyals et al., NIPS 2016
Objective: maximize the log-likelihood $\sum_{(x,y)} \log P_{\theta}(y \mid x, S)$ of the non-parametric softmax classifier, with
$P_{\theta}(y \mid x, S) = \mathrm{softmax}\big(\cos(f(x, S),\, g(x_i, S))\big)$
Prototypical Networks, Snell et al., 2016
Each category is represented by its mean sample (the prototype). Objective: maximize the log-likelihood under the prototype-based softmax classifier, $P_{\theta}(y = k \mid x) = \mathrm{softmax}\big(-d(f(x), c_k)\big)$, where $c_k$ is the prototype of class $k$.
Method | miniImageNet classification accuracy, 1-shot / 5-shot
Matching networks | 43.56 / 55.31
MAML | 48.70 / 63.11
Relation networks | 50.44 / 65.32
Prototypical Networks | 49.42 / 68.20
Meta-SGD | 54.24 / 70.86
LEO | 61.76 / 77.59
19. Prototypical Networks
⢠In Prototypical Networks Snell et al. apply a compelling inductive bias in the form of class
prototypes to achieve impressive few-shot performanceâââexceeding Matching Networks
without the complication of FCE. The key assumption is made is that there exists an
embedding in which samples from each class cluster around a single prototypical
representation which is simply the mean of the individual samples
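A minimal sketch of the prototype rule, assuming embeddings are already computed; prototypes are class means and the query is classified by a softmax over negative squared distances (all names illustrative):

import numpy as np

def prototypes(support_emb, support_y, k_way):
    """c_k = mean of the embedded support samples of class k."""
    return np.stack([support_emb[support_y == k].mean(axis=0)
                     for k in range(k_way)])

def proto_predict(query_emb, protos):
    """P(y = k | x) = softmax(-||f(x) - c_k||^2)."""
    logits = -((query_emb[None, :] - protos) ** 2).sum(axis=1)
    e = np.exp(logits - logits.max())   # numerically stable softmax
    return e / e.sum()

# Example: 5-way, 2-shot with random 16-d embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 16))
labels = np.repeat(np.arange(5), 2)
protos = prototypes(emb, labels, k_way=5)
print(proto_predict(rng.normal(size=16), protos))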
20. Sample synthesis
• Offline stage: many data instances from the offline data → train a synthesizer model that can sample from a class distribution (data knowledge).
• On new task data: the few data instances of the novel classes are fed to the synthesizer model, which generates many synthetic data instances; these are used to train the task model.
21. More augmentation approaches
Δ-encoder, Schwartz et al., NeurIPS 2018 (a minimal sketch follows the figure description)
• Uses a variant of an autoencoder to capture, in the latent space, the intra-class difference between two samples of the same class.
• Transfers class distributions from the training classes to novel classes.
[Figure: the encoder maps a sampled target and a sampled reference to a latent delta Z; the decoder reconstructs the target from Z and the reference. At synthesis time, a sampled delta is applied to a new-class reference to produce a synthesized new-class example.]
Eliyahu Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Rogerio Feris, Abhishek Kumar, Raja Giryes and Alex M. Bronstein, 'Delta-encoder: an
effective sample synthesis method for few-shot object recognition', NeurIPS 2018.
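A minimal PyTorch sketch of the Δ-encoder idea as described above: encode the difference between two same-class samples into a small latent delta, and at synthesis time apply a sampled delta to a new-class reference. Layer sizes and names are illustrative, not the paper's exact architecture:

import torch
import torch.nn as nn

class DeltaEncoder(nn.Module):
    """Autoencoder variant: Z captures the intra-class delta between
    a target and a reference sample of the same class."""
    def __init__(self, dim=64, z_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(),
                                     nn.Linear(64, z_dim))
        self.decoder = nn.Sequential(nn.Linear(z_dim + dim, 64), nn.ReLU(),
                                     nn.Linear(64, dim))

    def forward(self, target, reference):
        z = self.encoder(torch.cat([target, reference], dim=-1))  # delta Z
        return self.decoder(torch.cat([z, reference], dim=-1))    # reconstruct target

    def synthesize(self, z, new_reference):
        # Apply a delta harvested from seen classes to a novel-class reference.
        return self.decoder(torch.cat([z, new_reference], dim=-1))

model = DeltaEncoder()
tgt, ref = torch.randn(32, 64), torch.randn(32, 64)   # same-class pairs
recon = model(tgt, ref)
loss = nn.functional.l1_loss(recon, tgt)              # train to reconstruct the target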
22. Few-shot learning through an information retrieval lens
Goal: ranking for classification.
We want to classify a query point by finding which class is most similar, so we rank all the other points with respect to some similarity measure.
Eleni Triantafillou, Richard Zemel, and Raquel Urtasun. Few-Shot Learning Through an Information Retrieval Lens. In Advances in Neural Information Processing Systems, 2252-2262, 2017. https://arxiv.org/abs/1707.02610
23. Mean Average Precision
25. Problems ahead
Mean Average Precision is a terrible loss function for gradient-descent purposes: it depends on the scores only through their ranking, so it is piecewise constant and its gradient is zero almost everywhere (illustrated in the sketch below).
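A minimal sketch that makes this concrete: AP depends on the scores only through their sort order, so an infinitesimal change that preserves the ranking leaves the loss unchanged (all names illustrative):

import numpy as np

def average_precision(scores, relevant):
    """AP for one query: mean of precision@k at each relevant hit.

    scores: similarity of each candidate to the query;
    relevant: boolean array, True where the candidate shares the query's class.
    """
    order = np.argsort(-scores)             # only the ranking matters
    rel = relevant[order]
    hits = np.cumsum(rel)
    prec_at_k = hits / (np.arange(len(rel)) + 1)
    return prec_at_k[rel].mean()

rng = np.random.default_rng(0)
scores = rng.normal(size=8)
relevant = np.array([1, 0, 1, 0, 0, 1, 0, 0], dtype=bool)
print(average_precision(scores, relevant))
# Nudging a score without changing the sort order leaves AP unchanged:
scores2 = scores.copy(); scores2[0] += 1e-6
print(average_precision(scores2, relevant))   # identical value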
30. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models
Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky (arXiv, 20 May 2019).
Meta-learning: "learn on other problems how to improve learning for our target problem."
References
• Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. Matching Networks for One Shot Learning. NIPS 2016.
• Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-Learning with Memory-Augmented Neural Networks. ICML 2016.
• Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017.
• Z. Li, F. Zhou, F. Chen, and H. Li. Meta-SGD: Learning to Learn Quickly for Few-Shot Learning. arXiv:1707.09835, 2017.
• Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip H. S. Torr, and Timothy Hospedales. Learning to Compare: Relation Network for Few-Shot Learning. CVPR 2018.
• Eliyahu Schwartz, Leonid Karlinsky, Joseph Shtok, Sivan Harary, Mattias Marder, Rogerio Feris, Abhishek Kumar, Raja Giryes, and Alex M. Bronstein. Delta-encoder: An Effective Sample Synthesis Method for Few-Shot Object Recognition. NeurIPS 2018.
• Z. Chen, Y. Fu, Y. Zhang, Y.-G. Jiang, X. Xue, and L. Sigal. Semantic Feature Augmentation in Few-Shot Learning. arXiv:1804.05298, 2018.