This document discusses various transfer learning techniques for machine learning, including domain adaptation and small sample learning. It proposes three methods for unsupervised domain adaptation that use graph or hypergraph matching to minimize domain discrepancy: 1) Graph Matching, 2) Hypergraph Matching, and 3) Graph Matching with representation learning. For small sample learning, it discusses approaches for few-shot learning and zero-shot learning, and proposes a two-stage solution for few-shot learning that learns a discriminative low-dimensional space and estimates class variance, and a method for zero-shot learning that matches features to semantics. Evaluation on standard datasets shows the proposed methods achieve competitive performance.
Transfer Learning Techniques
1. 1
On Transfer Learning Techniques for Machine Learning
Assistive Robotics Technology Laboratory
School of Electrical and Computer Engineering
Purdue University, West Lafayette, IN, USA
Debasmit Das
Advisory Committee
C.S. George Lee (Chair)
Stanley Chan
Guang Lin
Guang Cheng
2. 2
Outline
• INTRODUCTION
• DOMAIN ADAPTATION
- Motivation
- Introduction
- Method 1 (Graph Matching)
- Method 2 (Hyper-graph Matching)
- Method 3 (Graph Matching & Representation Learning)
- Summary
• SMALL SAMPLE LEARNING
- Introduction
- Few shot Learning
- Zero shot Learning
- Hypothesis Transfer Learning
• CONCLUSION
Completed Work
Ongoing & Future Work
4. 4
Introduction: Supervised Learning (SL)
• Collect lots of data and then annotate it.
• Choose your model depending on the task, e.g. SVM or ANN.
• Train your model against a loss, e.g. optimize cross-entropy or least squares.
• Evaluate the model on unseen data (test samples) to produce the output.
5. 5
Introduction: SL vs. Human Learning
• Humans learn much faster than machines.
• Human learning allows recognizing new objects and new domains with very little data.
• Current state-of-the-art models are closed set.
• Transfer learning can benefit supervised learning.
• SL performance increases with more training data and more complex models, e.g. deeper neural networks.
[Figure: Evolution of Deep Architectures, Canziani et al. ISCAS’17]
Close to human ability? Really?
6. 6
Introduction: Transfer Learning (TL)
• Allows pre-trained machine learning models to be adapted and applied to new tasks and new domains.
• New tasks can be novel categories.
• A new domain can be a novel variety of the same categories.
Added Benefits
• Automatic Annotation: Reduces the human effort of labelling new domains/tasks.
• Faster Learning: Learning novel tasks from less data avoids long training times.
• Data Efficiency: In some domains, obtaining data is cumbersome, e.g. medical tests, robotics.
7. 7
Introduction
Real World Applications
Recognizing rare
novel objects.
Model transfer from
simulation to real world.
[Google AI Blog]
Dealing with changing appearances
and environment.
[Wulfmeier et al. IROS 2017]
9. 9
Introduction: Training and Testing Conditions
UDA
• Distribution discrepancy between training and testing conditions.
• Testing data are unlabeled but share the same categories as training.
HTL
• Base (novel) categories contain models/prototypes (few labelled data).
• Base categories are used for training and novel categories for testing.
FSL
• Base categories are used for training and novel categories for testing.
• Base categories contain abundant labelled data. Novel categories contain few labelled data.
ZSL
• Base categories are used for training and novel categories for testing.
• Base (novel) categories contain abundant labelled (unlabeled) data. Class-level semantic information is available.
10. 10
Introduction: Unsupervised Domain Adaptation (UDA)
• Minimize distribution discrepancy by matching graphs/hyper-graphs.
• Domain adaptation with/without representation learning.
[Figure: source and target domains (Class 1-3, Sample 1-3); adaptation without and with representation learning, followed by creating a maximum margin classifier]
Results
• Produces better recognition performance than global methods.
• Representation learning is slower but produces better recognition performance.
• Third-order matching produces better results than second-order matching.
11. 11
Introduction: Few Shot Learning (FSL)
• Infer the class statistics of the data-starved novel classes.
• Use a discriminative low-dimensional space.
• Pair-wise distances between points are used as features.
• Produces more accurate classification.
• Produces a denser feature space.
Preliminary Results
• Produces competitive performance with respect to previous methods.
• The most important component of the framework is not clear. Ablation studies are required.
12. 12
Introduction: Zero Shot Learning (ZSL)
• Structurally match features and semantics.
• Adapt the semantic space and classification scores (domain adaptation and score calibration).
Preliminary Results
• Produces recognition performance much better than previous methods.
• The DA step is the most important because of better generalization to novel data.
• Calibration is not as effective as DA because it does not use novel test data.
13. 13
Introduction: Hypothesis Transfer Learning (HTL)
Match source models with target samples
Properly constrain the correspondence matrix
• Correspondences should be positive to prevent negative transfer.
• Correspondences should be bounded to select relevant source models.
• Possibly, explore first- and second-order matching between models and samples.
Expected Results
• Should produce performance better than the no-adaptation baseline and close to the oracle baseline.
• Selected models should have semantic similarity with the samples.
14. 14
Introduction: Unified approach to TL
Common theme of using relations/matching and structures for transfer learning tasks. Still, each task has a unique methodology.
Relations
• Sample-to-sample relation for UDA.
• Sample-to-prototype relation for FSL.
• Sample-to-semantics relation for ZSL.
• Sample-to-model relation for HTL.
Increasing level of information: Sample (Level 1), Prototype (Level 2), Semantics (Level 3), Category (Level 4).
Structures
• Structural alignment between domains for UDA.
• Structural alignment between samples and semantics for ZSL.
• Structure between samples yields a new representation for FSL.
15. 15
Introduction: Impact beyond TL
UDA
• Core Idea: Match distributions using graphs/hyper-graphs.
• Impact: Generative models, anomaly detection.
FSL
• Core Idea: Discriminative low-dimensional space and generating statistics.
• Impact: Discriminative/generative learning.
ZSL
• Core Idea: Adaptive matching between features and semantics.
• Impact: Media retrieval, description generation.
HTL
• Core Idea: Match models and samples.
• Impact: Recommendation systems, learn-ware.
16. 16
Introduction: Publications
Unsupervised Domain Adaptation
• Debasmit Das and C. S. George Lee. "Sample-to-sample correspondence for unsupervised domain adaptation." Engineering Applications of Artificial Intelligence (EAAI) 73 (2018): 80-91.
• Debasmit Das and C. S. George Lee. "Graph Matching and Pseudo-Label Guided Deep Unsupervised Domain Adaptation." Proceedings of the International Conference on Artificial Neural Networks (ICANN), 2018, pp. 342-352.
• Debasmit Das and C. S. George Lee. "Unsupervised Domain Adaptation Using Regularized Hyper-Graph Matching." Proceedings of the IEEE International Conference on Image Processing (ICIP), 2018, pp. 3758-3762.
Zero Shot Learning
• Debasmit Das and C. S. George Lee. "Zero-shot Image Recognition Using Relational Matching, Adaptation and Calibration." Accepted at the International Joint Conference on Neural Networks (IJCNN), 2019.
Few Shot Learning
• Debasmit Das and C. S. George Lee. "A Two-Stage Approach to Few-Shot Learning for Image Recognition." Under review at the IEEE Transactions on Image Processing (TIP).
19. 19
Motivation: Classifying a Dog and a Cat
[Figure: training samples vs. testing samples]
• Training and testing distributions are different!
• Domain adaptation is required!
20. 20
Motivation: Domain Adaptation Methods
Non-Deep Methods
• Instance Re-weighting [Dai et al. ICML’07].
• Parameter Adaptation [Bruzzone et al. TPAMI’10].
• Feature Transformation [Fernando et al. ICCV’13], [Sun et al. AAAI’16].
Deep Methods
• Discrepancy Based [Long et al. ICML’15], [Sun et al. ECCV’16].
• Adversarial Based [Ganin et al. JMLR’16], [Tzeng et al. CVPR’17].
21. 21
Motivation: Discrepancy Based Methods
• Global methods: Mostly global metrics. They minimize the difference in data statistics such as covariance [Sun et al. ECCV’16] or maximum mean discrepancy [Long et al. ICML’15].
• Local method: Optimal Transport [Courty et al. TPAMI’17]. Basically point-to-point matching. Using only first-order information might be misleading.
• Higher-order method: Uses structural information. Relations between data points are used to match domains.
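The two global statistics named on this slide can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions: the function names `coral_distance` and `linear_mmd` are hypothetical, the covariance distance is left unscaled, and the MMD uses a linear kernel (so it reduces to a squared distance between domain means) rather than the kernels used in the cited papers.

```python
import numpy as np

def coral_distance(source, target):
    """Squared Frobenius distance between the feature covariances of two domains."""
    cov_s = np.cov(source, rowvar=False)  # rows are samples, columns are features
    cov_t = np.cov(target, rowvar=False)
    return float(np.sum((cov_s - cov_t) ** 2))

def linear_mmd(source, target):
    """Squared distance between domain means (MMD with a linear kernel)."""
    diff = source.mean(axis=0) - target.mean(axis=0)
    return float(diff @ diff)

# Toy domains (assumed data): one target differs in mean, the other in variance.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 3))
shifted_target = rng.normal(2.0, 1.0, size=(200, 3))  # mean shift only
scaled_target = rng.normal(0.0, 3.0, size=(200, 3))   # variance change only

print("MMD   shifted:", linear_mmd(source, shifted_target),
      "scaled:", linear_mmd(source, scaled_target))
print("CORAL shifted:", coral_distance(source, shifted_target),
      "scaled:", coral_distance(source, scaled_target))
```

Each statistic reacts to one kind of shift and is largely blind to the other, and neither looks at how individual samples relate to each other, which is the gap the local and higher-order methods aim to fill.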
23. 23
Introduction: Qualitative Comparison
• Method 1 [Das & Lee, EAAI’18]: 1st-order matching: Yes; 2nd-order matching: Yes; 3rd-order matching: No; representation learning: No.
• Method 2 [Das & Lee, ICIP’18]: 1st-order matching: Yes; 2nd-order matching: Yes; 3rd-order matching: Yes; representation learning: No.
• Method 3 [Das & Lee, ICANN’18]: 1st-order matching: Yes; 2nd-order matching: Yes; 3rd-order matching: No; representation learning: Yes.
• Each method has a unique optimization.
• Method 3 has an additional training stage.
24. 24
Method 1 (Graph Matching): Proposed Approach
• Construct graphs from source and target samples.
• Find a matching between the sample graphs.
• Map the source domain to the target domain.
Matching terms: first-order matching, second-order matching, class regularization.
Optimization: Conditional Gradient + Network Simplex
For details refer to: Debasmit Das and C.S. George Lee, “Sample-to-Sample Correspondence for Unsupervised Domain Adaptation,” Engineering Applications of Artificial Intelligence, Vol. 73, pp. 80-91, May 2018.
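The "find a matching, then map source to target" steps can be sketched in a simplified form. The sketch below is not the method from the paper: it keeps only the first-order (point-to-point) term, drops the second-order and class-regularization terms, and swaps the Conditional Gradient + Network Simplex solver for entropic Sinkhorn iterations because they fit in a few lines. The names `sinkhorn_correspondence` and `map_source_to_target` and the parameters `reg` and `n_iter` are assumptions made for illustration.

```python
import numpy as np

def sinkhorn_correspondence(source, target, reg=0.1, n_iter=200):
    """Soft sample-to-sample correspondence matrix from first-order costs only."""
    # Pairwise squared Euclidean cost between every source and target sample.
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    cost = cost / cost.max()                      # normalise for numerical stability
    kernel = np.exp(-cost / reg)
    a = np.full(len(source), 1.0 / len(source))   # uniform source weights
    b = np.full(len(target), 1.0 / len(target))   # uniform target weights
    u = np.ones_like(a)
    for _ in range(n_iter):                       # alternate marginal scalings
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

def map_source_to_target(corr, target):
    """Move each source sample to the weighted mean of its matched target samples."""
    return (corr @ target) / corr.sum(axis=1, keepdims=True)

# Toy domains (assumed data): the target is a shifted copy of the source distribution.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(30, 2))
target = rng.normal(3.0, 1.0, size=(40, 2))
corr = sinkhorn_correspondence(source, target)
mapped = map_source_to_target(corr, target)
print("source mean:", source.mean(axis=0))
print("mapped mean:", mapped.mean(axis=0))
print("target mean:", target.mean(axis=0))
```

After mapping, a classifier trained on the mapped source samples (with their original labels) can be applied to the target domain.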
25. 27
Method 1: Real Data
Datasets: CalTech (C), Amazon (A), DSLR (D), Webcam (W), MNIST (M), USPS (U).
[Tables: comparison with previous work using SURF features and deep features]
26. 28
Method 2 (Hyper-graph Matching): Proposed Approach
• Find exemplars from both domains (Affinity Propagation).
• Find a matching between the exemplar hyper-graphs.
• Map the source domain to the target domain.
Optimization: Conditional Gradient + ADMM
For details refer to: Debasmit Das and C.S. George Lee, “Unsupervised Domain Adaptation Using Regularized Hyper-Graph Matching,” Proceedings of 2018 IEEE International Conference on Image Processing (ICIP), Athens, Greece, pp. 3758-3762, October 7-10, 2018.
28. 32
Method 3 (Graph Matching & Representation Learning): Proposed Approach
Stage 1
• Construct graphs from the source and target representations.
• Find a matching between the source and target representations.
• Optimize the shared domain representation and classifier.
Result of Stage 1: domain discrepancy reduced.
Stage 2
• Select unlabeled target samples with confident outputs.
• Apply a novel large margin loss on these samples.
• Optimize the shared domain representation and classifier.
Result of Stage 2: large margin classifier.
Optimization: Stochastic Gradient Descent
For details refer to: Debasmit Das and C.S. George Lee, “Graph Matching and Pseudo-Label Guided Deep Unsupervised Domain Adaptation,” Proceedings of 2018 International Conference on Artificial Neural Networks (ICANN), Rhodes, Greece, pp. 342-352, October 4-7, 2018.
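The Stage 2 idea of keeping only confidently classified target samples and penalising them with a margin loss can be sketched as follows. This is an assumption-laden illustration: the confidence threshold of 0.9, the function names, and the standard multi-class hinge loss are placeholders, and the hinge loss stands in for the paper's own large margin loss, which is not specified on the slide.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def select_confident(logits, threshold=0.9):
    """Return indices and pseudo-labels of samples whose top probability passes the threshold."""
    probs = softmax(logits)
    pseudo_labels = probs.argmax(axis=1)
    keep = probs.max(axis=1) >= threshold
    return np.flatnonzero(keep), pseudo_labels[keep]

def multiclass_hinge(logits, labels, margin=1.0):
    """Generic multi-class hinge loss (a stand-in, not the paper's loss)."""
    rows = np.arange(len(labels))
    true_scores = logits[rows, labels][:, None]
    violations = np.maximum(0.0, margin + logits - true_scores)
    violations[rows, labels] = 0.0          # the true class never violates its own margin
    return float(violations.sum(axis=1).mean())

# Classifier outputs on three unlabeled target samples (assumed values).
target_logits = np.array([[5.0, 0.0, 0.0],
                          [0.4, 0.5, 0.3],   # ambiguous, should be skipped
                          [0.0, 0.0, 6.0]])
idx, pseudo = select_confident(target_logits)
loss = multiclass_hinge(target_logits[idx], pseudo)
print(idx, pseudo, loss)
```

In a training loop this loss would be added to the objective and minimised with stochastic gradient descent over the shared representation and classifier.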
32. 37
Conclusions
[Figure: Methods 1, 2 and 3 compared by recognition performance and by computational efficiency]
Summary
• Proposed three methods for Unsupervised Domain Adaptation.
• Use graph/hyper-graph matching to minimize domain discrepancy.
• Competitive results on standard DA datasets for image recognition.
Impact
• Localized and structure-based matching for data distributions. Extensible to time series as well.
• Extends beyond DA to any method requiring distribution matching.
• Generative Modelling: Use a GM loss instead of KL divergence for GANs.
• Anomaly Detection: Samples with higher GM losses are anomalies.
34. 39
Introduction: Small Sample Learning (SSL)
DA
• Same task but different domain.
• Source task and target task have the same set of categories.
• Source domain has abundant data but target domain has few/zero labelled data.
• Less distribution discrepancy between source and target task.
SSL
• Same domain but different task.
• Source task and target task have different sets of categories.
• Source domain has abundant data but target domain has few/zero labelled data.
• More distribution discrepancy between source and target task.
35. 40
Few Shot Learning (FSL)
[Figure: feature space]
• Base categories (source domain) contain abundant labelled data.
• Novel categories (target domain) contain few labelled data.
• Need to extract useful knowledge from the source domain.
• Apply that knowledge to recognize novel categories.
36. 41
FSL: Related Work of FSL
• Metric Learning [Learn a metric]: Matching Net [Vinyals et al. NIPS’16], Proto Net [Snell et al. NIPS’17].
• Meta Learning [Learn to learn]: LSTM Optimization [Ravi et al. ICLR’17], Model Agnostic [Finn et al. ICML’17].
• Generative approaches [Generate data]: GAN Hallucination [Wang et al. CVPR’18], Autoencoder [Schwartz et al. NIPS’18].
• Alternative approaches: Model Regression [Wang et al. ECCV’16], Memory Augmented [Santoro et al. ICML’16].
37. 42
FSL: Challenges of FSL
• Curse of dimensionality: sparser feature space with increasing dimensionality.
• Uncertain class variance: given the class mean location, the class variance σ is unknown.
• Ill sampling of data: given a novel data sample, the prototype location is unknown.
38. 43
FSL: Proposed Solution
• Find a discriminative low-dimensional space: use relative distances.
• Estimate class variance from the class mean location: given the class mean location, estimate the unknown class variance σ.
• Learn a category-agnostic transformation: given a novel data sample, the transformation estimates the unknown prototype location.
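The notions of class prototypes and relative distances used in this part of the talk can be made concrete with a toy sketch. It is a generic nearest-prototype baseline, not the two-stage method proposed here: prototypes are plain class means of the few labelled samples, and each query is re-represented by its distances to those prototypes. All names and the toy data are assumptions.

```python
import numpy as np

def class_prototypes(features, labels):
    """Class means of the labelled (support) samples, one prototype per class."""
    classes = np.unique(labels)
    prototypes = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes

def relative_distance_features(features, anchors):
    """Represent each sample by its Euclidean distances to a set of anchor points."""
    diff = features[:, None, :] - anchors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Two novel classes with two labelled samples each (assumed toy data).
support = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.0, 4.2]])
support_labels = np.array([0, 0, 1, 1])
query = np.array([[0.1, 0.3], [3.8, 4.1]])

classes, prototypes = class_prototypes(support, support_labels)
dist = relative_distance_features(query, prototypes)  # shape: (n_query, n_classes)
pred = classes[dist.argmin(axis=1)]                   # nearest-prototype decision
print(dist)
print(pred)
```

With only a few samples per class the class mean is a noisy prototype and the class variance cannot be estimated reliably, which is what the proposed transformation and variance estimation are meant to address.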
40. 45
FSL: Preliminary Results
Datasets
• MiniImageNet [Vinyals et al. NIPS’16]
• Omniglot [Lake et al. CogSci’11]
[Tables: comparison with previous work on Omniglot and on MiniImageNet]
41. 46
Zero Shot Learning (ZSL)
[Figure: feature space and semantic space]
• Base categories (source domain) contain abundant labelled data.
• Novel categories (target domain) contain unlabeled data.
• However, class-level semantic information is available for all categories.
• Need to relate the feature space and the semantic space.
42. 47
ZSL: Related Work of ZSL
• Embedding Methods [Relate feature & semantics]: Linear Embedding [Bernardino et al. ICML’15], Deep Embedding [Zhang et al. CVPR’17].
• Transductive approaches [Use unlabeled test data]: Multiview [Fu et al. TPAMI’15], Dictionary Learning [Kodirov et al. ICCV’15].
• Generative approaches [Generate data]: Constrained VAE [Verma et al. CVPR’18], Feature GAN [Xian et al. CVPR’18].
• Hybrid approaches [Novel class from old class]: Semantic Similarity [Zhang et al. CVPR’15], Convex Combo [Norouzi et al. ICLR’13].
43. 48
ZSL: Challenges of ZSL
Hubness
• Phenomenon where only a few candidates become nearest-neighbor predictions.
• Due to the curse of dimensionality.
• Initially studied by Radovanovic et al. JMLR’10.
Domain Shift
• Domain shift between unseen test data and unseen semantic embeddings.
• Arises because unseen test data are not used in training.
Seen Class Biasedness
• In the GZSL setting, test data can be from both seen and unseen categories.
• Most unseen test data are predicted as seen categories.
• Initially studied by Chao et al. ECCV’16.
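Hubness can be measured by counting how often each candidate (for example, a class embedding) appears among the k nearest neighbours of the queries. The sketch below is a minimal illustration with assumed names and toy data; a strongly skewed count vector indicates that a few "hub" candidates absorb most predictions.

```python
import numpy as np

def k_occurrence(queries, candidates, k=1):
    """Count how many times each candidate is among the k nearest neighbours of a query."""
    # Pairwise squared Euclidean distances, shape (n_queries, n_candidates).
    dist = ((queries[:, None, :] - candidates[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return np.bincount(nearest.ravel(), minlength=len(candidates))

# Toy data (assumed): the first candidate sits close to most of the queries.
queries = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [9.0, 9.0]])
candidates = np.array([[1.0, 1.0], [10.0, 10.0], [50.0, 50.0]])
counts = k_occurrence(queries, candidates, k=1)
print(counts)
```

Here the first candidate is the nearest neighbour of three of the four queries while the last candidate is never selected.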
44. 49
ZSL: Proposed Solution
One-to-one and pair-wise regression
• Structural matching between semantics and features.
• Implicit reduction of dimensionality.
Domain Adaptation
• Need to adapt semantic embeddings to unseen test data.
• Use the previous DA approach [Das & Lee EAAI’18].
• Find correspondences between semantic embeddings and unseen test samples.
Calibration
• Scaled calibration to reduce scores of seen classes.
• Implicit reduction of variance of seen classes.
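The calibration step can be pictured as shrinking the scores of seen classes before taking the arg-max, so that unseen classes are not systematically outvoted in the generalized setting. The sketch below is an assumption: it uses a single multiplicative factor `scale` on non-negative scores, which may differ from the exact scaling used in the proposed method.

```python
import numpy as np

def calibrate_scores(scores, seen_mask, scale=0.7):
    """Scale down the (non-negative) scores of seen classes; unseen scores are untouched."""
    calibrated = scores.copy()
    calibrated[:, seen_mask] *= scale
    return calibrated

# Class 0 is a seen class, classes 1 and 2 are unseen (assumed toy scores).
scores = np.array([[0.50, 0.45, 0.05],
                   [0.90, 0.05, 0.05]])
seen_mask = np.array([True, False, False])

before = scores.argmax(axis=1)
after = calibrate_scores(scores, seen_mask).argmax(axis=1)
print(before, after)
```

The borderline first sample flips to the unseen class after calibration, while the clearly seen second sample keeps its label. Because the factor is fixed without looking at novel test data, this step cannot correct a domain shift on its own.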
46. 51
ZSL: Preliminary Results
Datasets
• Animals with Attributes (AwA2) [Lampert et al. TPAMI’14]
• Pascal & Yahoo (aPY) [Farhadi et al. CVPR’09]
• Caltech-UCSD Birds (CUB) [Welinder et al. ’10]
• Scene Understanding (SUN) [Patterson et al. CVPR’12]
[Table: comparison with previous work on four datasets]
47. 52
Hypothesis Transfer Learning (HTL)
[Figure: feature space]
• No access to base categories (source domain) data.
• Only high-level information about source categories is available, e.g. model parameters, class prototypes, etc.
• Novel categories (target domain) contain unlabeled data.
• Need to relate the source domain models and target domain samples.
48. 53
HTL: Related Work of HTL
Relatively unexplored topic. Previous works constrain target models to be some combination of source models.
• Linear Combination [Linear Models]: [Orabona et al. ICRA’09, Tommasi et al. TPAMI’14].
• Kernel Method [Non-linear Models]: [Jie et al. ICCV’11].
• Feature Selection [Greedy Method]: [Kuzborskij et al. CVIU’17].
49. 54
HTL: Proposed Direction
• Previous works only consider a constant contribution of source models across the target domain.
• Need to consider a variable contribution of source models across the target domain.
• Need to find model-to-sample correspondences similar to sample-to-sample correspondences.
• Need to constrain correspondences to obtain variable solutions, e.g. sparser correspondences to avoid redundant contributions of source models on the same target sample or to prevent negative transfer.
50. 55
Conclusion
• Justified the importance of transfer learning for machine learning and real-world applications.
• Discussed three methods for unsupervised domain adaptation, which produced competitive results with respect to previous methods.
• Described ongoing work on few/zero shot learning with some preliminary results. More analyses of the methods are required.
• Proposed future work on small sample learning in a more realistic scenario.
• Insight: Common theme of using structure, relations and matching in all the methods.