Members : 김상현, 고형권, 허다운, 전선영, 김준철, 조경진
Presenter : 조경진
2021.10.10
Joint Contrastive Learning with Infinite Possibilities
(Qi Cai, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei, NeurIPS 2020)
https://arxiv.org/abs/2009.14776
Contents
1. Introduction
2. Related Work
3. Methods
4. Experiments
5. Conclusion
❖ What is Contrastive learning?
Contrastive learning is a machine learning technique used to learn the general features of a dataset
without labels by teaching the model which data points are similar or different.
https://amitness.com/2020/03/illustrated-simclr/
Supervised learning (a CNN trained on labeled images):
1. Label cost
2. Task-specific solution (poor generalizability)
Contrastive learning:
1. No label cost
2. Task-agnostic solution (good generalizability)
How can we train a model using contrastive learning?
❖ How do we train with contrastive learning?
Encoder
A mechanism to get representations that allow the machine to understand an image: Image → CNN → Representation.
Similarity measure
A mechanism to compute the similarity of two images: Similarity( , ).
Similar and dissimilar images
Example pairs of similar and dissimilar images: another view of the same image counts as similar (same), while views of other images count as dissimilar (different).
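To make the two mechanisms concrete, a minimal sketch assuming a torchvision ResNet-18 as the encoder and cosine similarity as the measure (both are common choices; the slide itself does not fix either):

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Encoder: a CNN backbone mapping an image to a representation vector
encoder = models.resnet18(weights=None)
encoder.fc = torch.nn.Identity()  # drop the classifier head, keep 512-d features

def similarity(x1, x2):
    """Cosine similarity between the representations of two image batches."""
    z1, z2 = encoder(x1), encoder(x2)
    return F.cosine_similarity(z1, z2, dim=-1)

# Two augmented views of the same image should score high, views of
# different images should score low.
views = torch.randn(2, 4, 3, 224, 224)  # dummy stand-in for augmented pairs
print(similarity(views[0], views[1]).shape)  # torch.Size([4])
```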
❖ How do we train with contrastive learning?
A positive sample should be recognized as similar to the query, and a negative sample as different.
Maximize Score(q, k⁺) for the positive pair and minimize Score(q, k⁻) for each of the K−1 negative pairs. Passing the scores through a softmax and taking the negative log-likelihood of the positive pair turns this into the InfoNCE loss:

$$\mathcal{L}_{\text{InfoNCE}} = -\,\mathbb{E}\left[\log \frac{\exp\big(\mathrm{Score}(q, k^{+})\big)}{\exp\big(\mathrm{Score}(q, k^{+})\big) + \sum_{i=1}^{K-1} \exp\big(\mathrm{Score}(q, k_{i}^{-})\big)}\right]$$

https://ankeshanand.com/blog/2020/01/26/contrative-self-supervised-learning.html
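A minimal sketch of this loss, assuming the usual dot-product score with a temperature τ (the slide leaves Score(·,·) abstract, so the scoring function here is an assumption):

```python
import torch
import torch.nn.functional as F

def info_nce(q, k_pos, k_neg, tau=0.2):
    """q: (N, D) queries, k_pos: (N, D) positive keys, k_neg: (K-1, D) negatives.
    All embeddings are assumed L2-normalized."""
    pos = (q * k_pos).sum(dim=-1, keepdim=True) / tau  # (N, 1) positive score
    neg = q @ k_neg.T / tau                            # (N, K-1) negative scores
    logits = torch.cat([pos, neg], dim=1)              # positive sits at index 0
    labels = torch.zeros(q.size(0), dtype=torch.long)  # so the target "class" is 0
    return F.cross_entropy(logits, labels)             # softmax + NLL in one call
```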
❖ InfoNCE loss
Minimizing the InfoNCE loss maximizes the mutual information between the query and its positive key.
https://arxiv.org/pdf/1807.03748.pdf
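For reference, the relationship shown in the slide's figure, from the CPC paper linked above: the InfoNCE loss over K samples lower-bounds the mutual information, so minimizing the loss pushes the bound up:

$$I(x; c) = \sum_{x,c} p(x, c)\,\log \frac{p(x \mid c)}{p(x)}, \qquad I(x; c) \;\ge\; \log K - \mathcal{L}_{\text{InfoNCE}}$$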
❖ Self-supervised learning
Pretext task: a pre-designed task for networks to solve; visual features are learned by optimizing the objective functions of pretext tasks (no labels required).
Downstream task: a computer vision application used to evaluate the quality of the features learned by self-supervised learning (labels required).
❖ Self-supervised learning in pretext tasks
https://arxiv.org/pdf/1803.07728.pdf, https://arxiv.org/pdf/1603.09246.pdf
https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Doersch_Unsupervised_Visual_Representation_ICCV_2015_paper.pdf
Rotation
Apply a random rotation of 0, 90, 180, or 270 degrees and predict it: a 4-class classification problem.
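A minimal sketch of constructing the rotation pretext batch (the network then simply minimizes cross-entropy on these 4-way labels):

```python
import torch

def rotation_batch(images):
    """images: (N, C, H, W). Returns 4N rotated copies and rotation labels
    (0 -> 0 deg, 1 -> 90, 2 -> 180, 3 -> 270) for the 4-class pretext task."""
    rotated = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return torch.cat(rotated), labels
```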
Jigsaw Puzzle
Randomly shuffle image patches; predict the permutation.
Context Prediction
Predict the relative location of patches.
❖ MoCo
https://arxiv.org/abs/1911.05722
● Dictionary as a queue (adopted by JCL)
● Momentum update
● Memory bank
● Contrastive loss function
○ one positive pair
○ many negative pairs

❖ SimCLR
https://arxiv.org/abs/2002.05709
● Augmentations: random cropping with resize to the original size, color distortion, Gaussian blur
● Encoder f(·)
● Projection head g(·) mapping
● Contrastive loss function
○ one positive pair
○ many negative pairs

❖ MoCo v2
https://arxiv.org/abs/2003.04297
● SimCLR augmentation strategy
● MLP projection head
● Cosine LR scheduler
● Same loss function as MoCo v1
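The momentum update listed under MoCo keeps the key encoder as an exponential moving average of the query encoder; a minimal sketch (m ≈ 0.999 is the value used in the MoCo paper):

```python
import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    """theta_k <- m * theta_k + (1 - m) * theta_q; no gradient flows to the keys."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```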
How does JCL actually work?
❖ Positive and negative key pairs
● Most existing contrastive learning methods only consider independently penalizing the incompatibility of each single positive query-key pair at a time.
● This does not fully leverage the assumption that all augmentations corresponding to a specific image are statistically dependent on each other, and are simultaneously similar to the query.
❖ Positive and negative key pairs
● Instance xi is first augmented M+1 times (M for positive keys and 1 extra for the query itself).
● Unfortunately, this is not computationally applicable: carrying all (M+1)×N pairs in a mini-batch would quickly drain GPU memory even when M is moderately small. For example, N = 256 and M = 16 already means encoding (16+1)×256 = 4,352 augmented views per batch.
● Capitalizing on the infinity limit M → ∞, the statistics of the data become sufficient to reach the same goal of multiple pairing.
❖ Joint Contrastive Learning
● Jensen's inequality
https://en.wikipedia.org/wiki/Jensen%27s_inequality
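The slide's equations were figures; for reference, Jensen's inequality in the form used here:

$$\varphi\big(\mathbb{E}[X]\big) \;\le\; \mathbb{E}\big[\varphi(X)\big] \quad \text{for convex } \varphi$$

In JCL it serves to bound the intractable expected loss over the distribution of positive keys by a tractable surrogate; the exact bounding steps are in the paper's derivation.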
❖ Joint Contrastive Learning
Implicit: the positive keys are represented implicitly through their distribution.
Explicit: the negative keys are represented explicitly as sampled instances.
❖ Joint Contrastive Learning
● Under this Gaussian assumption, Eq. (7) eventually reduces to Eq. (8).
● For any random variable x that follows a Gaussian distribution x ∼ N(μ, Σ), where μ is the expectation of x and Σ is the covariance of x, the moment generating function satisfies the identity below.
● We scale the influence of Σk+ by multiplying it with a scalar λ; tuning λ hopefully stabilizes the training.
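The moment generating function identity referenced in the second bullet:

$$\mathbb{E}\left[\exp\big(t^{\top} x\big)\right] = \exp\Big(t^{\top}\mu + \tfrac{1}{2}\, t^{\top} \Sigma\, t\Big)$$

Applied to the expected positive-key logit (with t proportional to qi/τ), this is what collapses the expectation over infinitely many positive keys into a closed form in μk+ and Σk+, with λ rescaling the Σk+ term as described above.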
Question?
❖ Implicit Semantic Data Augmentation
https://arxiv.org/abs/1909.12220
https://neurohive.io/en/news/semantic-data-augmentation-improves-neural-network-s-generalization/
● The hope is that directions corresponding to meaningful transformations for each class are well represented by the principal components of the covariance matrix of that class.
● Consider training a deep network G with weights Θ on a training set D = {(xi, yi)}, i = 1, . . . , N, where yi ∈ {1, . . . , C} is the label of the i-th sample xi over C classes.
● Let the A-dimensional vector ai = [ai1, . . . , aiA]ᵀ = G(xi, Θ) denote the deep features of xi learned by G, and let aij denote the j-th element of ai.
● To obtain semantic directions for augmenting ai, we randomly sample vectors from a zero-mean multivariate normal distribution N(0, Σyi), where Σyi is the class-conditional covariance matrix estimated from the features of all samples in class yi.
● During training, C covariance matrices are computed, one for each class. The augmented feature ãi is obtained by translating ai along a random direction sampled from N(0, λΣyi); a sketch follows.
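A minimal sketch of the procedure above, in the explicit-sampling view the slide describes (the ISDA paper ultimately optimizes a closed-form upper bound instead of sampling; function names and the small diagonal jitter are assumptions of this sketch):

```python
import torch

def class_covariances(features, labels, num_classes):
    """Estimate one class-conditional covariance matrix per class.
    features: (N, A) deep features, labels: (N,) integer class ids."""
    covs = []
    for c in range(num_classes):
        fc = features[labels == c]
        fc = fc - fc.mean(dim=0, keepdim=True)           # center class features
        covs.append(fc.T @ fc / max(fc.size(0) - 1, 1))  # (A, A)
    return torch.stack(covs)                             # (C, A, A)

def isda_augment(a, y, covs, lam=0.5):
    """Translate features a (B, A) of classes y (B,) along random semantic
    directions drawn from N(0, lam * Sigma_y)."""
    sigma = lam * covs[y] + 1e-5 * torch.eye(a.size(1))  # jitter keeps Cholesky stable
    L = torch.linalg.cholesky(sigma)                     # (B, A, A)
    eps = torch.randn(a.size(0), a.size(1), 1)           # standard normal draws
    return a + (L @ eps).squeeze(-1)                     # a_tilde = a + direction
```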
❖ Joint Contrastive Learning
https://blog.daum.net/shksjy/228
https://neurohive.io/en/news/semantic-data-augmentation-improves-neural-network-s-generalization/
● The variable k+i is assumed to follow a Gaussian distribution k+i ∼ N(μk+, Σk+), where μk+ and Σk+ are respectively the mean and the covariance matrix of the positive keys for the query qi.
● This assumption is legitimate: positive keys more or less share similarities in the embedding space around some mean value, as they all mirror the nature of the query to some extent.
● Implicit representation in positive keys; explicit representation in negative keys.
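A minimal sketch of estimating μk+ and Σk+ empirically from the M positive keys per image (M = 5 in the experiments table below; shapes and names are assumptions, not the authors' implementation):

```python
import torch

def positive_key_statistics(keys):
    """keys: (N, M, D) tensor of M positive-key embeddings per image.
    Returns the per-image mean (N, D) and covariance (N, D, D)."""
    mu = keys.mean(dim=1)                                 # mu_{k+}
    centered = keys - mu.unsqueeze(1)                     # (N, M, D)
    sigma = centered.transpose(1, 2) @ centered / (keys.size(1) - 1)
    return mu, sigma                                      # Sigma_{k+}
```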
❖ Joint Contrastive Learning
Using a JCL pre-trained and a MoCo v2 pre-trained ResNet-18 network, features are extracted directly.
Fig. 3(a): the cosine similarities of each pair of features (every pair of 2 out of 32 augmentations) belonging to the same identity image.
Fig. 3(b): accordingly, the variance of the obtained features belonging to the same image is much smaller for JCL. (Figure legend: non-contrastive, contrastive, supervised, unsupervised.)
Hyper-parameters: positive key number M = 5, τ = 0.2, λ = 4.0, batch size N = 512.
❖ Conclusions on Joint Contrastive Learning
● JCL implicitly involves the joint learning of an infinite number of query-key pairs for each instance.
● By applying rigorous bounding techniques to the proposed formulation, the originally intractable loss function is transferred into practical implementations.
● Most notably, although JCL is an unsupervised algorithm, JCL pre-trained networks even outperform their supervised counterparts in many scenarios.
Question?
Editor's notes
1. Joint Contrastive Learning with Infinite Possibilities: https://arxiv.org/abs/2009.14776 (contrastive learning)
2. Implicit semantic augmentation does not teach the model explicitly with many concrete augmentations; instead, to teach meaningful directions for the latent feature map, it assumes a Gaussian distribution and samples from it. The mean and variance of the corresponding Gaussian distribution are then learned, and the principal components are extracted so that meaningful directions are obtained.