Review: A Simple Framework for Contrastive Learning of Visual Representations
- by Seunghyun Hwang (Yonsei University, Severance Hospital, Center for Clinical Data Science)
1. A Simple Framework for Contrastive Learning of Visual Representations
Hwang Seung Hyun
Yonsei University Severance Hospital CCIDS
Google Research Team, Geoffrey Hinton | ICML 2020
2020.07.19
2. Contents
01 Introduction
02 Related Work
03 Methods and Experiments
04 Conclusion
3. SimCLR
Introduction – Proposal
• Most mainstream approaches for unsupervised visual representation learning fall into one of two classes: generative or discriminative.
[Figure: example pretext tasks – rotation prediction, autoencoder, jigsaw puzzle]
4. SimCLR
Introduction – Proposal
• Discriminative approaches based on contrastive learning in the latent space have recently shown state-of-the-art results.
[Figure: AMDIM]
5. SimCLR
Introduction – Proposal
• SimCLR outperforms previous work while being simpler.
• SimCLR achieves 76.5% top-1 accuracy, a 7% relative improvement over the previous SOTA method.
• When fine-tuned with only 1% of the ImageNet labels, SimCLR achieves 85.8% top-5 accuracy.
6. SimCLR
Introduction – Contributions
• Composition of multiple data augmentation operations is crucial in unsupervised contrastive learning.
• A learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations.
• Contrastive learning benefits from larger batch sizes and longer training.
• Like supervised learning, contrastive learning benefits from deeper and wider networks.
• Representation learning with a contrastive cross-entropy loss benefits from normalized embeddings and an appropriately tuned temperature parameter.
7. Related Work
Handcrafted pretext tasks
• Relative patch prediction
• Jigsaw puzzles
• Rotation Prediction
• Colorization Prediction
→ Limits the GENERALITY of learned representations!
8. Related Work
Contrastive Visual Representation learning
• CPC V2
• AMDIM
• Rotation Prediction
• MoCo (by Facebook)
→ "SimCLR" is essentially a composition of these approaches!
9. Methods and Experiments
Overall Architecture
[Figure: overall SimCLR architecture; source: https://www.youtube.com/watch?v=5lsmGWtxnKA]
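The overall pipeline can be summarized in a short training-step sketch (a hedged illustration, not the paper's reference code: augment, encoder, projection_head, and the nt_xent_loss sketched after the loss-function slides below are placeholder names):

    def simclr_train_step(x, augment, encoder, projection_head, optimizer):
        # Two independent random augmentations of the same batch of images.
        v1, v2 = augment(x), augment(x)
        # Representations h = f(x) from the base encoder (ResNet-50 in the paper).
        h1, h2 = encoder(v1), encoder(v2)
        # Projections z = g(h) from the 2-layer MLP head; the loss is applied to z, not h.
        z1, z2 = projection_head(h1), projection_head(h2)
        loss = nt_xent_loss(z1, z2, temperature=0.5)  # see the NT-Xent sketch below
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()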
10. Methods and Experiments
Architecture – Data Augmentation
11. Methods and Experiments
Architecture – loss function
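For reference, the per-pair contrastive loss from the paper: for a positive pair (i, j) among the 2N augmented examples of a batch of N images, with cosine similarity \mathrm{sim}(u,v) = u^\top v / (\|u\|\,\|v\|),

    \ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}

where \tau is the temperature parameter.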
12. Methods and Experiments
Architecture – loss function
Final loss: NT-Xent [normalized temperature-scaled cross-entropy loss]
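The final loss averages \ell over all positive pairs in the batch, \mathcal{L} = \frac{1}{2N}\sum_{k=1}^{N}\left[\ell(2k-1,\,2k) + \ell(2k,\,2k-1)\right]. A minimal PyTorch sketch of this loss (the function name and masking details are my own; the paper's reference implementation is in TensorFlow):

    import torch
    import torch.nn.functional as F

    def nt_xent_loss(z1, z2, temperature=0.5):
        # z1, z2: [N, d] projections of the two augmented views of the same N images.
        N = z1.size(0)
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # 2N unit-norm embeddings
        sim = z @ z.t() / temperature                        # pairwise cosine similarity / tau
        sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity from the softmax
        # The positive for row i is the other view of the same image: i+N or i-N.
        targets = torch.cat([torch.arange(N, 2 * N), torch.arange(N)]).to(z.device)
        # Cross-entropy over the remaining 2N-1 candidates per anchor, averaged over all 2N anchors.
        return F.cross_entropy(sim, targets)

With the temperature treated as a tunable hyperparameter (the paper ablates values around 0.1–0.5), this captures the two ingredients named above: normalized embeddings and temperature scaling.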
14. Methods and Experiments
Other Methods
• Large batch size
- Training batch size of 4096.
- Use the LARS optimizer, since training with a standard SGD/Momentum optimizer can be unstable at such large batch sizes.
• Global BN
- When training with data parallelism, BN mean and variance are typically aggregated locally per device, which lets the model exploit local information leakage between the two views of an image.
- SimCLR instead aggregates BN mean and variance over all devices during training (a PyTorch analogue is sketched below).
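A hedged PyTorch analogue of the Global BN idea (the paper's implementation is TensorFlow on TPUs; in PyTorch a similar effect is commonly obtained with SyncBatchNorm, while LARS is not in core PyTorch and would come from a third-party package):

    import torch
    import torchvision

    # Assumes torch.distributed has already been initialized (e.g. via torchrun).
    encoder = torchvision.models.resnet50()
    # Convert every BatchNorm layer so mean/variance are aggregated over ALL devices,
    # preventing the model from exploiting per-device statistics of positive pairs.
    encoder = torch.nn.SyncBatchNorm.convert_sync_batchnorm(encoder)
    encoder = torch.nn.parallel.DistributedDataParallel(encoder.cuda())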
15. Methods and Experiments
Evaluation Protocol
• Dataset and metrics
- ImageNet
- Transfer learning on a wide range of datasets (CIFAR-10, CIFAR-100, etc.)
• Default setting (a sketch of this pipeline follows below)
- Random crop and resize, color distortion, Gaussian blur
- ResNet-50 as the base encoder network
- 2-layer MLP projection head projecting the representation to a 128-dimensional latent space
- Trained at batch size 4096 for 100 epochs
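A sketch of this default setting in PyTorch/torchvision (the exact augmentation magnitudes, blur kernel size, and blur probability are assumptions based on the paper's appendix, with color-distortion strength s = 1.0):

    import torch.nn as nn
    import torchvision
    from torchvision import transforms

    s = 1.0  # color-distortion strength
    simclr_augment = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.RandomApply([transforms.ColorJitter(0.8 * s, 0.8 * s, 0.8 * s, 0.2 * s)], p=0.8),
        transforms.RandomGrayscale(p=0.2),
        transforms.RandomApply([transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0))], p=0.5),
        transforms.ToTensor(),
    ])

    # ResNet-50 encoder f(.) with the classifier removed, plus the 2-layer MLP head g(.)
    # that projects the 2048-d representation h to a 128-d latent z.
    encoder = torchvision.models.resnet50()
    encoder.fc = nn.Identity()
    projection_head = nn.Sequential(
        nn.Linear(2048, 2048),
        nn.ReLU(inplace=True),
        nn.Linear(2048, 128),
    )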
16. Methods and Experiments
Ablation Studies – Data Augmentation
→ Color distortion and random crop turn out to be the crucial augmentations.
17. Methods and Experiments
Ablation Studies – Data Augmentation
18. Methods and Experiments
Ablation Studies – Nonlinear Projection head
• The hidden layer before the projection head, h = f(x), is a better representation than the layer after it, z = g(h) (see the sketch below).
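In code terms (a hedged sketch reusing the hypothetical encoder/projection_head names from above): after pretraining, g(.) is discarded and the downstream linear classifier is trained on h, not on z.

    import torch.nn as nn

    def linear_eval_model(encoder, num_classes, feat_dim=2048):
        # Freeze the pretrained encoder f(.) and train only a linear classifier on h = f(x);
        # the projection head g(.) is thrown away after pretraining.
        for p in encoder.parameters():
            p.requires_grad = False
        return nn.Sequential(encoder, nn.Linear(feat_dim, num_classes))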
23. Conclusion
• SimCLR improves considerably over previous methods for self-supervised, semi-supervised, and transfer learning.
• SimCLR differs from standard supervised learning on ImageNet only in the choice of data augmentation, the use of a nonlinear projection head, and the loss function.
• Despite a recent surge in interest, self-supervised learning remains undervalued.