Intriguing Properties of Contrastive Losses
NeurIPS 2021
Google Research, Ting Chen et al.
Presenter
이재윤
Fundamental Team
김동현,김채현,박종익,송헌,양현모,오대환,이근배,조남경,최재규
Contents
1. Motivation
2. Generalized Contrastive Loss
3. Contrastive Learning with multiple objects
4. Feature Suppression
5. Conclusion
1. Motivation
Paper Selection Background
• Presented at NeurIPS 2021 and already cited about 20 times
• Ting Chen, the author of this paper, is also the first author of SimCLR
• Self-supervision / contrastive learning has been receiving a great deal of attention recently [1]
• The paper studies three properties that emerge when training with a contrastive loss:
1) Generalizing the contrastive loss and checking its performance
2) Learning representations of images with multiple objects
3) Feature suppression in contrastive learning
[1] Statistics and visualization of the acceptance rate and main keywords of CVPR 2021 accepted papers (CVPR 2021)
Contrastive Learning
[Figure: two CNN branches map each image of a pair to features f_i and f_j; one diagram shows a "negative" pair, the other a "positive" pair]
• Contrast training samples against one another
✓ Using Euclidean distance, cosine similarity, etc.
✓ Negative pairs are pushed apart
✓ Positive pairs are pulled together
• Many kinds of contrastive learning objectives exist
✓ N-pair, InfoNCE, Triplet, Lifted Structured
• Reaches performance comparable to supervised learning even without label information
SimCLR framework
• Learn representations via contrastive learning
• Training procedure
I. Learn representations with unlabeled data
II. For the classification task, fine-tune the network with a small amount of labeled data
• Augmentation is used to create multiple views from one image (a minimal sketch follows this list)
✓ Random Cropping
✓ Color Distortion
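As a rough illustration of the two-view augmentation described above, the sketch below builds a SimCLR-style pipeline with torchvision; the crop size and jitter strengths are assumptions for illustration, not the exact settings used in the paper.

```python
import torchvision.transforms as T

# Assumed crop size and jitter strengths; not the paper's exact settings.
simclr_augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.08, 1.0)),                  # random cropping + resize
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),    # color distortion
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

def two_views(pil_image):
    """Return two independently augmented views of the same image."""
    return simclr_augment(pil_image), simclr_augment(pil_image)
```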
2. Generalized Contrastive Loss
Generalized Contrastive Loss
$z_i, z_j$: representations of the two augmented views
$\mathrm{sim}(u, v) = u^\top v / (\|u\|\,\|v\|)$
$\tau$: temperature (scalar)
$\mathcal{MB}$: randomly sampled mini-batch
• Cross-entropy-based contrastive losses are the most widely used:

$$\mathcal{L}_{\text{NT-Xent}} = -\frac{1}{n} \sum_{(i,j)\in\mathcal{MB}} \log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2n} \mathbb{1}[k \neq i]\, \exp(\mathrm{sim}(z_i, z_k)/\tau)}$$

• The equation above is generalized into the form below
✓ $\mathcal{L}_{\text{alignment}}$ makes the augmented views agree with each other
✓ $\mathcal{L}_{\text{distribution}}$ makes the representations match a prior distribution

$$\mathcal{L}_{\text{generalized contrastive}} = \mathcal{L}_{\text{alignment}} + \lambda\, \mathcal{L}_{\text{distribution}}$$

$$\mathcal{L}_{\text{NT-Xent}} = -\frac{1}{n} \sum_{(i,j)} \mathrm{sim}(z_i, z_j) + \frac{\tau}{n}\, \lambda \sum_{i} \log \sum_{k=1}^{2n} \mathbb{1}[k \neq i]\, \exp(\mathrm{sim}(z_i, z_k)/\tau)$$
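To make the decomposition concrete, here is a minimal PyTorch sketch of the generalized loss with the alignment and LogSumExp (distribution) terms kept separate; the batching convention (the two views stacked as [2n, d]) and the handling of lambda and tau are assumptions for illustration, not the paper's reference code.

```python
import torch
import torch.nn.functional as F

def generalized_nt_xent(z1, z2, tau=0.5, lam=1.0):
    """Sketch of L = L_alignment + lam * L_distribution for NT-Xent.

    z1, z2: [n, d] projections of the two augmented views.
    lam = 1 roughly recovers the usual NT-Xent loss (up to constants).
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # [2n, d], unit norm
    n = z1.shape[0]
    sim = z @ z.t()                                            # cosine similarities

    # Alignment term: pull the two views of the same image together.
    pos = (z[:n] * z[n:]).sum(dim=1)                           # sim(z_i, z_j) for positive pairs
    l_align = -pos.mean()

    # Distribution term: LogSumExp over all other samples pushes the
    # representations toward a uniform distribution on the hypersphere.
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    logits = (sim / tau).masked_fill(mask, float('-inf'))      # drop k == i terms
    l_dist = tau * torch.logsumexp(logits, dim=1).mean()

    return l_align + lam * l_dist
```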
Generalized Contrastive Loss
• Relation to mutual information: $I(U; V) = -H(U \mid V) + H(U)$

$$\mathcal{L}_{\text{NT-Xent}} = \underbrace{-\frac{1}{n} \sum_{(i,j)} \mathrm{sim}(z_i, z_j)}_{\text{①}} + \underbrace{\frac{\tau}{n}\, \lambda \sum_{i} \log \sum_{k=1}^{2n} \mathbb{1}[k \neq i]\, \exp(\mathrm{sim}(z_i, z_k)/\tau)}_{\text{②}}$$

• Maximizing the similarity term in ① = minimizing the uncertainty $H(U \mid V)$
• Minimizing the LogSumExp term ② = maximizing the entropy $H(U)$
Generalized Contrastive Loss
• The prior distribution assumed here is the uniform distribution on the hypersphere
✓ The LogSumExp term drives the representations to spread uniformly over the hypersphere

$$\mathcal{L}_{\text{NT-Xent}} = -\frac{1}{n} \sum_{(i,j)} \mathrm{sim}(z_i, z_j) + \frac{\tau}{n}\, \lambda \sum_{i} \log \sum_{k=1}^{2n} \mathbb{1}[k \neq i]\, \exp(\mathrm{sim}(z_i, z_k)/\tau)$$
• What happens when prior distributions other than the uniform hypersphere are used?
✓ Priors other than the uniform hypersphere cannot be handled by the LogSumExp term
✓ The distribution loss is instead computed with the Sliced Wasserstein Distance (SWD); a rough sketch follows
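A rough sketch of how the distribution term could be replaced by a sliced Wasserstein distance to samples drawn from an arbitrary prior; the number of random projections and the sorting-based 1-D Wasserstein estimate are standard choices, but the exact formulation here is an assumption, not the paper's code.

```python
import torch

def sliced_wasserstein(z, prior_samples, n_projections=64):
    """Sketch of a sliced Wasserstein-2 distance between the representations
    z [n, d] and an equally sized sample drawn from the chosen prior [n, d]."""
    d = z.shape[1]
    dirs = torch.randn(d, n_projections, device=z.device)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)          # random unit directions
    proj_z = (z @ dirs).sort(dim=0).values                # sort the 1-D projections
    proj_p = (prior_samples @ dirs).sort(dim=0).values
    return ((proj_z - proj_p) ** 2).mean()                # matched-quantile distance

# Example with a Gaussian prior instead of the uniform hypersphere:
# z = projection_head(encoder(x))                         # hypothetical encoder/head
# l_dist = sliced_wasserstein(z, torch.randn_like(z))
```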
Generalized Contrastive Loss
• Performance under various prior distributions, using the SimCLR experimental setup
✓ On CIFAR-10, there is almost no difference once training runs for 200+ epochs
✓ On ImageNet, differences do appear with a 2-layer projection head
✓ However, the differences disappear with a 3+ layer non-linear projection head
Generalized Contrastive Loss
• A deeper projection head plays a role similar to a larger batch size
[Figures: "Contrastive Loss" and "Different Losses with 3-Layer projection head"]
Question
3. Contrastive Learning with multiple objects
Instance-based objectives
• Most contrastive learning methods (SimCLR, BYOL) define their objective at the instance level
✓ Each image is mapped to a single representation vector
✓ The objective operates on the global representation rather than on local regions
Q1. Does an instance-based objective still work well when an image contains multiple objects?
Q2. Can it learn local features of the objects, not just the global representation?
SimCLR learns on multiple objects
• Datasets commonly used for self-supervised learning place the object at the center of the image
✓ MNIST, CIFAR-10, ImageNet
✓ The MultiDigits dataset is introduced to create a controllable setting
• MultiDigits dataset
✓ Two placement strategies: Random and In-Grid (a construction sketch follows)
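A minimal sketch of how a MultiDigits-style image could be composed by pasting MNIST digits onto a larger canvas; the canvas size and the exact placement logic are illustrative assumptions rather than the paper's generation code.

```python
import numpy as np

def make_multidigits(digits, canvas_size=112, placement="random", rng=None):
    """digits: array of shape [k, 28, 28] MNIST digits to paste.
    placement: "random" (uniform positions, overlaps allowed) or
               "in_grid" (digits dropped into distinct cells of a grid)."""
    rng = rng if rng is not None else np.random.default_rng()
    canvas = np.zeros((canvas_size, canvas_size), dtype=np.float32)
    grid = canvas_size // 28
    cells = rng.permutation(grid * grid)           # shuffled grid-cell indices
    for i, digit in enumerate(digits):
        if placement == "in_grid":
            r, c = divmod(int(cells[i]), grid)
            y, x = r * 28, c * 28
        else:
            y = int(rng.integers(0, canvas_size - 28 + 1))
            x = int(rng.integers(0, canvas_size - 28 + 1))
        region = canvas[y:y + 28, x:x + 28]
        np.maximum(region, digit, out=region)       # overlay, keeping brighter pixels
    return canvas
```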
SimCLR learns on multiple objects
• SimCLR vs. supervised learning
① Pretrain ResNet-18 with the same augmentation policy (random cropping / resize) under both schemes
② Train and evaluate a classifier on 28x28 MNIST
• SimCLR can also learn from images that contain several objects at once
✓ As with supervised learning, accuracy stays high for up to 8 digits per image
SimCLR learns local features
• Apply K-means to intermediate features
✓ If the representation is well learned, the local features should group into meaningful segments
✓ Run ImageNet/COCO images through a ResNet-50 2x pretrained on ImageNet
✓ Cluster the features taken from the 2nd, 3rd, and 4th ResNet blocks
• Compare against the clustering of supervised-learning features and of raw RGB pixels (a probing sketch follows)
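A rough sketch of this local-feature probing: take a feature map from an intermediate block and run K-means over its spatial positions so each location gets a cluster label. The hook-based extraction and K=8 are assumptions for illustration; the usage comment names a plain torchvision ResNet-50, whereas the paper uses a SimCLR-pretrained ResNet-50 2x.

```python
from sklearn.cluster import KMeans

def cluster_feature_map(feature_map, k=8):
    """feature_map: torch tensor [C, H, W] from an intermediate ResNet block.
    Runs K-means over the H*W spatial positions and returns [H, W] labels."""
    c, h, w = feature_map.shape
    feats = feature_map.permute(1, 2, 0).reshape(-1, c).cpu().numpy()   # [H*W, C]
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    return labels.reshape(h, w)

# Hypothetical usage with a forward hook on a torchvision ResNet
# (the paper's SimCLR ResNet-50 2x is not bundled with torchvision):
# import torch, torchvision
# model = torchvision.models.resnet50(weights="IMAGENET1K_V2").eval()
# feats = {}
# model.layer3.register_forward_hook(lambda m, inp, out: feats.update(block3=out))
# with torch.no_grad():
#     model(images)                      # images: [B, 3, 224, 224]
# segmentation = cluster_feature_map(feats["block3"][0])
```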
[Figures (two slides): K-means clustering visualizations of the local features learned by SimCLR]
Question
4. Feature Suppression
Feature Suppression
• In SimCLR [1], performance drops when color distortion is removed
✓ The color feature suppresses other features
✓ Color information alone is enough to raise the consistency between positive pairs
✓ So contrastive training still proceeds, but a good representation is not obtained
[1] A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)
[Figure: the color feature suppresses the object class feature]
Feature Suppression
• Beyond augmentation, there are other, less obvious factors that cause feature suppression
• Three datasets are constructed for a controllable experimental setting:
1) DigitOnImageNet dataset (MNIST feature vs. ImageNet feature)
DigitOnImageNet Experiments
• Performance is measured while varying how many of the 60k MNIST digits are used as unique digits (a construction sketch follows)
• Accuracy on the overlaid MNIST digits and accuracy on ImageNet trade off against each other
✓ The simple feature (MNIST digits) suppresses the complex feature (ImageNet objects)
✓ Contrastive losses such as SimCLR's cannot learn all of the competing features
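A minimal sketch of how a DigitOnImageNet-style image might be built: each ImageNet image gets an MNIST digit pasted on top, with the pool of unique digits restricted so the shortcut feature is easier or harder to memorize. The fixed paste position and the pool-restriction scheme are assumptions for illustration.

```python
import numpy as np

def overlay_digit(image, digit, top_left=(0, 0)):
    """image: [H, W, 3] uint8 ImageNet image; digit: [28, 28] MNIST digit in [0, 255].
    Writes the digit's bright pixels into all three channels at a fixed position."""
    y, x = top_left
    patch = image[y:y + 28, x:x + 28]
    mask = (digit > 0)[..., None]                              # [28, 28, 1]
    image[y:y + 28, x:x + 28] = np.where(mask, digit[..., None], patch)
    return image

def make_digit_on_imagenet(images, mnist_digits, num_unique, rng=None):
    """Restrict the MNIST pool to `num_unique` digits and reuse them across images;
    fewer unique digits means an easier-to-memorize competing feature."""
    rng = rng if rng is not None else np.random.default_rng()
    pool = mnist_digits[:num_unique]
    return [overlay_digit(img.copy(), pool[int(rng.integers(len(pool)))]) for img in images]
```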
MultiDigit Experiments
2) MultiDigits dataset (bigger-digit feature vs. smaller-digit feature)
✓ 1st digit: fixed at 20x20
✓ 2nd digit: varied from 20x20 up to 80x80
✓ Pretrain ResNet-18 on MultiDigits with both SimCLR and supervised learning
✓ Train and evaluate a classifier on single MNIST digits
MultiDigit Experiments
• With supervised learning, the ability to distinguish the 1st digit does not change regardless of the 2nd digit's size
• With SimCLR, the ability to distinguish the 20x20 1st digit drops as the 2nd digit grows
✓ The dominant (larger) object suppresses learning of the smaller object
RandomBit Experiments
3) RandomBit dataset (RGB-channel features vs. random-bit-channel feature)
✓ RGB channels + an additional random-bit channel (a sketch follows)
✓ The random bits are sampled at random from [1, log2 n], where n is the controlled variable
✓ Augmentation is applied only to the RGB channels
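A rough sketch of attaching a shared random-bit channel to both augmented views; encoding the sampled integer as a constant extra channel and its normalization are assumptions made for illustration, with n as the knob controlling how much easy mutual information the channel carries.

```python
import torch

def add_random_bit_channel(view1, view2, n, rng=None):
    """view1, view2: [3, H, W] augmented RGB views of the SAME source image.
    A single integer in [0, n) is drawn per image and written into an extra,
    constant 4th channel shared by both views, so matching it alone already
    gives the contrastive objective ~log2(n) bits of easy mutual information."""
    gen = rng if rng is not None else torch.Generator()
    value = torch.randint(0, n, (1,), generator=gen).float() / n    # normalize to [0, 1)
    extra = value.expand(1, *view1.shape[1:])                       # [1, H, W] constant channel
    return torch.cat([view1, extra], dim=0), torch.cat([view2, extra], dim=0)
```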
RandomBit Experiments
• On every dataset tried (MNIST, CIFAR-10, ImageNet), the random-bit channel degrades performance
✓ Classification performance deteriorates sharply once n reaches around 10
✓ Learning the easy-to-learn random bits suppresses the other, useful features
✓ This is not prevented by other contrastive objectives (BYOL), different batch sizes, or momentum contrast
Question
5. Conclusion
Summary
I. Contrastive learning performance under various priors
✓ With a deeper projection head and a large batch size, the choice of prior makes little difference
II. Contrastive learning can learn from images that contain multiple objects
III. Feature suppression
✓ Simple features suppress complex features
✓ Bigger features suppress smaller features
✓ Easy-to-learn mutual information (shared random bits) suppresses all of the useful RGB features
❖ Contrastive learning has properties and performance as good as supervised learning, but the way contrastive learning raises mutual information does not necessarily guarantee a good representation.
Thank you