Supervised Contrastive Learning
김 성 철 (Sungchul Kim)
Contents
1. Introduction
2. Related Work
3. Method
4. Experiments
5. Discussion
2
https://github.com/rlatjcj/Paper-code-review/tree/master/%5BArxiv%202020%5D%20Supervised%20Contrastive%20Learning
Introduction
• Cross-entropy
• Widely used in supervised learning
• Can be viewed as the KL-divergence between the label distribution and the empirical distribution (see the note after this slide)
• Improvements: label smoothing, self-distillation, mixup, …
• A supervised training loss that pulls samples of the same class together and pushes different classes apart
• Contrastive objective functions related to metric learning, which performs well in self-supervised learning
3
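A one-line justification of the KL-divergence view above (a standard identity, not taken from the slides): for a target distribution q and a model prediction p,

```latex
H(q, p) \;=\; -\sum_{c} q(c)\,\log p(c) \;=\; H(q) + D_{\mathrm{KL}}\!\left(q \,\Vert\, p\right)
```

so for a one-hot label distribution q, where H(q) = 0, minimizing the cross-entropy is exactly minimizing the KL-divergence between the label distribution and the model's predicted distribution.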
Introduction
• Supervised contrastive learning
• Contrastive learning in the fully supervised setting, using a contrastive loss extended to multiple positives
• SOTA top-1 accuracy and robustness compared to cross-entropy
• Less sensitive to the range of hyperparameters than cross-entropy
• Gradients that encourage learning from hard positives and negatives
• Related to the triplet loss when a single positive and a single negative are used
4
Related Work
• Self-supervised representation learning + metric learning + supervised learning
• The cross-entropy loss is a good loss function for training deep networks
• However, it is not clear why the target labels should be the optimal targets
• Better target label vectors have been shown to exist (S. Yang et al., 2015)
• Other drawbacks of the cross-entropy loss
• Sensitivity to noisy labels (Z. Zhang et al., 2018; S. Sukhbaatar et al., 2014)
• Adversarial examples (G. Elsayed et al., 2018; K. Nar et al., 2019)
• Poor margins (K. Cao et al., 2019)
• Rather than proposing alternative losses, changing the reference label distribution has proven more effective
• Label smoothing, Mixup, CutMix, Knowledge Distillation
5
S. Yang et al., 2015.
K. Cao et al., 2019.
Related Work
• Self-supervised representation learning has recently attracted a great deal of attention
• Language domain
• Pre-trained embeddings (BERT, XLNet, …)
• Image domain
• Used to learn embeddings (e.g., predicting the occluded part of a signal from the unoccluded parts)
• Self-supervised representation learning is shifting toward contrastive learning
• Noise contrastive estimation, N-pair loss
• During training, the loss is applied at the last layer of a deep network
• At test time, earlier layers are used for downstream transfer tasks, fine-tuning, or direct retrieval tasks
6
Related Work
• Contrastive learning is related to metric learning and the triplet loss
• What they share is learning powerful representations
• Differences between the triplet loss and contrastive losses
• The number of positive and negative pairs per data point
• Triplet loss: one positive, one negative
• Supervised metric learning: positives from the same class, negatives from different classes (hard negative mining)
• Self-supervised contrastive loss: one positive pair, selected using either co-occurrence or data augmentation
• The loss most similar to supervised contrastive learning is the soft-nearest-neighbor loss
• In common: normalizing the embeddings, replacing Euclidean distance with inner products (see the identity after this slide)
• Improvements: data augmentation, a disposable contrastive head, two-stage training
7
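Why normalized embeddings let inner products stand in for Euclidean distance (a standard identity, not from the slides): for unit-norm u and v,

```latex
\lVert u - v \rVert^2 \;=\; \lVert u \rVert^2 + \lVert v \rVert^2 - 2\,u \cdot v \;=\; 2 - 2\,u \cdot v
```

so ranking pairs by squared distance and ranking them by (negative) inner product are equivalent.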
Method
• Representation Learning Framework
• Structurally similar to Contrastive Multiview Coding and SimCLR, which use self-supervised contrastive learning
• Two randomly augmented images are generated for each input image
• First stage: random crop + resize to the image's native resolution
• Second stage: AutoAugment, RandAugment, SimAugment
• Encoder network E(·): maps an augmented image x̃ to a representation vector r = E(x̃) ∈ R^{D_E}
• E.g., the final pooling layer of a ResNet-50 (of size D_E) is used as the representation vector, always normalized to the unit hypersphere
• Projection network P(·): maps the normalized representation vector r to a vector z = P(r) ∈ R^{D_P} suited to computing the contrastive loss
• E.g., a multi-layer perceptron with output vector size D_P is used, always normalized to the unit hypersphere
• Once training is complete, the projection network is discarded and replaced with a single linear layer (a sketch of this setup follows this slide)
8
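A minimal sketch of the encoder/projection pair described above, assuming a PyTorch/torchvision ResNet-50 backbone and a projection size of D_P = 128 (the hidden width and D_P follow the paper's description; this is illustrative, not the authors' code):

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class SupConModel(nn.Module):
    """Encoder E(.) plus disposable projection head P(.), both outputs L2-normalized."""
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features          # D_E = 2048 for ResNet-50
        backbone.fc = nn.Identity()                 # keep only the final pooled features
        self.encoder = backbone                     # E(.)
        self.projection = nn.Sequential(            # P(.): MLP mapping R^{D_E} -> R^{D_P}
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        r = F.normalize(self.encoder(x), dim=1)     # r = E(x~), on the unit hypersphere
        z = F.normalize(self.projection(r), dim=1)  # z = P(r), on the unit hypersphere
        return r, z
```

After pre-training, `projection` would be thrown away and a single linear classifier trained on the frozen `r`, as the last bullet notes.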
Method
• Contrastive Losses: Self-supervised and Supervised
• A contrastive loss that makes effective use of labeled data while keeping the benefits of contrastive losses
• N image/label pairs are randomly sampled: {(x_k, y_k)}, k = 1…N
• The minibatch actually used for training consists of 2N pairs: {(x̃_k, ỹ_k)}, k = 1…2N (a batch-construction sketch follows this slide)
• x̃_{2k}, x̃_{2k−1}: two random augmentations of x_k (k = 1…N)
• ỹ_{2k−1} = ỹ_{2k} = y_k
9
https://towardsdatascience.com/exploring-simclr-a-simple-framework-for-contrastive-learning-of-visual-representations-158c30601e7e
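A sketch of how such a 2N-pair minibatch could be assembled (illustrative; `augment` stands for any of the augmentation policies on the previous slide, and the two views are concatenated rather than interleaved, which only changes the indexing convention):

```python
import torch

def make_two_view_batch(images, labels, augment):
    """Build the 2N-pair minibatch described above from N (image, label) pairs.

    images: tensor of shape (N, C, H, W); labels: tensor of shape (N,)
    augment: a stochastic transform (e.g. random crop + AutoAugment)
    """
    view1 = torch.stack([augment(img) for img in images])  # first augmentation of each x_k
    view2 = torch.stack([augment(img) for img in images])  # second augmentation of each x_k
    x = torch.cat([view1, view2], dim=0)                   # 2N augmented images
    y = torch.cat([labels, labels], dim=0)                 # each label appears once per view
    return x, y
```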
Method
• Contrastive Losses: Self-supervised and Supervised
• Self-Supervised Contrastive Loss (written out after this slide)
• i ∈ {1 … 2N}: the index of an arbitrary augmented image
• j(i): the index of the other augmented image originating from the same source image
• z_l = P(E(x̃_l))
• i: anchor / j(i): positive / k (k = 1 … 2N, k ∉ {i, j}): negatives
• z_i · z_{j(i)}: an inner product between the normalized vectors
• Similar views are trained toward neighboring representations; dissimilar views toward non-neighboring representations
10
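The loss itself appears on the slide only as an image; written out from the definitions above, it should match Eq. 1-2 of the paper (τ is the temperature):

```latex
\mathcal{L}^{\mathrm{self}} = \sum_{i=1}^{2N} \mathcal{L}_i^{\mathrm{self}},
\qquad
\mathcal{L}_i^{\mathrm{self}}
  = -\log \frac{\exp\!\left(z_i \cdot z_{j(i)} / \tau\right)}
               {\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]}\, \exp\!\left(z_i \cdot z_k / \tau\right)}
```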
Method
• Supervised Contrastive Loss
• Because supervised learning has more than one sample per class, the contrastive loss of Eq. 2 cannot be used as-is
→ a loss that handles an arbitrary number of positives is proposed (written out after this slide)
• N_{ỹ_i}: the total number of images in the minibatch that have the same label ỹ_i as the anchor i
• Generalization to an arbitrary number of positives
• The loss encourages the encoder to map all samples of the same class close together, giving more robust clustering than Eq. 2
• Contrastive power increases with more negatives
• Adding many negatives (as well as the positives) to the denominator pushes them far away from the positives
11
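Again the equation is an image on the slide; a reconstruction consistent with the definitions above and with the paper's original arXiv formulation:

```latex
\mathcal{L}^{\mathrm{sup}} = \sum_{i=1}^{2N} \mathcal{L}_i^{\mathrm{sup}},
\qquad
\mathcal{L}_i^{\mathrm{sup}}
  = \frac{-1}{2N_{\tilde{y}_i} - 1}
    \sum_{j=1}^{2N} \mathbb{1}_{[i \neq j]}\, \mathbb{1}_{[\tilde{y}_i = \tilde{y}_j]}\,
    \log \frac{\exp\!\left(z_i \cdot z_j / \tau\right)}
              {\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]}\, \exp\!\left(z_i \cdot z_k / \tau\right)}
```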
Method
• Shows that the supervised contrastive loss focuses training on hard positives and negatives rather than on weak (easy) ones
• A normalization layer is added at the end of the projection network → it shapes the gradient through inner-product terms
• w: the projection network output immediately prior to normalization (i.e., z = w/‖w‖) — a loss implementation using this convention follows this slide
12
[The gradient equation is shown as a figure on the slide, with its terms grouped into a positives part and a negatives part]
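A compact sketch of the supervised contrastive loss of the previous slide, written against the pre-normalization output w defined here (a hypothetical PyTorch version, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(w, labels, temperature=0.07):
    """Supervised contrastive loss over a 2N-view batch.

    w: (2N, D_P) projection outputs before normalization; labels: (2N,) integer class ids.
    """
    z = F.normalize(w, dim=1)                          # z = w / ||w||
    sim = torch.matmul(z, z.T) / temperature           # pairwise z_i . z_k / tau
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))    # drop k = i from every softmax

    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)   # log P_ik

    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    n_pos = pos_mask.sum(dim=1).clamp(min=1)           # 2 * N_{y_i} - 1 positives per anchor
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)
    loss_per_anchor = -pos_log_prob.sum(dim=1) / n_pos
    return loss_per_anchor.mean()
```

Passing an already-normalized z instead of w is harmless, since re-normalizing a unit vector is a no-op.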
Method
• Supervised Contrastive Loss Gradient Properties (Cont.)
• Easy positives and negatives: small gradients / hard positives and negatives: large gradients
• Easy positive: z_i · z_j ≈ 1, P_ij is large
• Hard positive: z_i · z_j ≈ 0, P_ij is moderate
→ ‖∇_{z_i} L^{sup}_{i,pos}‖ is small for weak positives and large for hard positives
• Weak negatives have z_i · z_k ≈ −1 and hard negatives have z_i · z_k ≈ 0, which enters through the (z_k − (z_i · z_k) z_i) P_ik term
• The ((z_i · z_l) z_i − z_l) terms can only play this role when a normalization layer is attached to the end of the projection network
→ normalization must be used in the network! (see the note after this slide)
13
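A supporting step the slide leaves implicit (standard calculus, not from the slides): the unit-normalization z = w/‖w‖ has Jacobian

```latex
\frac{\partial z}{\partial w} \;=\; \frac{1}{\lVert w \rVert}\left(I - z z^{\top}\right)
```

so every gradient contribution z_l that reaches z_i through the loss is projected onto the tangent plane of the unit sphere at z_i, producing (up to sign) the ((z_i · z_l) z_i − z_l)-type terms above; removing the final normalization layer removes this structure, and with it the hard-example weighting.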
Method
• Connections to Triplet Loss
• Contrastive learning is closely related to the triplet loss
• The triplet loss uses exactly one positive and one negative
→ it can be viewed as a contrastive loss with a single positive and a single negative
• Assume the anchor-positive representations are much better aligned than the anchor-negative ones (z_a · z_p ≫ z_a · z_n)
• The loss then takes the same form as a triplet loss with margin α = 2τ (a derivation sketch follows this slide)
• The contrastive loss also performs better and needs less computation
14
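A sketch of that reduction, reconstructed under the slide's assumption z_a · z_p ≫ z_a · z_n and unit-norm embeddings (the assumption justifies log(1 + x) ≈ x, a first-order expansion of the exponential gives the next step, and z_a · z_p = 1 − ‖z_a − z_p‖²/2 for unit vectors):

```latex
\mathcal{L}
  = -\log \frac{\exp(z_a \cdot z_p / \tau)}{\exp(z_a \cdot z_p / \tau) + \exp(z_a \cdot z_n / \tau)}
  = \log\!\left(1 + e^{(z_a \cdot z_n - z_a \cdot z_p)/\tau}\right)
  \approx e^{(z_a \cdot z_n - z_a \cdot z_p)/\tau}
  \approx 1 + \frac{z_a \cdot z_n - z_a \cdot z_p}{\tau}
  = 1 + \frac{\lVert z_a - z_p \rVert^2 - \lVert z_a - z_n \rVert^2}{2\tau}
  \;\propto\; \lVert z_a - z_p \rVert^2 - \lVert z_a - z_n \rVert^2 + 2\tau
```

i.e. the same form as a triplet loss with margin α = 2τ, as stated above.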
Experiments
• ImageNet Classification Accuracy
• The supervised contrastive loss achieves SOTA on ImageNet
15
Experiments
• Robustness to Image Corruptions and Calibration
• Also SOTA for robustness, measured on ImageNet-C
16
Experiments
• Hyperparameter Stability
• Performance remains stable under changes of hyperparameters
• Different optimizers, data augmentations, and learning rates
• Shows low variance across changes of optimizer and augmentation
• Conjectured to be due to the smoother geometry of the hypersphere
17
Experiments
• Effect of Number of Positives
• Ablation study on the effect of the number of positives
• Performance improves as the number of positives increases
• Trade-off: the computational cost also grows with the number of positives
• The positives include a differently augmented view of the same data point; the rest are different samples from the same class
• Self-supervised learning corresponds to a single positive
18
Experiments
• Training Details
• Epochs: 700 (pre-training stage)
• Each step was about 50% slower than cross-entropy, because the cross-product between all elements of the minibatch must be computed
• Batch size: up to ~8192 (2048 is also sufficient)
• 8192 for ResNet-50, 2048 for ResNet-200 (larger networks need smaller batch sizes)
• At a fixed batch size, performance comparable to cross-entropy is reached with a larger learning rate and fewer epochs
• Temperature: τ = 0.07
• Smaller temperatures perform better, but training can become difficult due to numerical stability issues
• AutoAugment performs best for both Supervised Contrastive and Cross-Entropy
• LARS, RMSProp, and SGD with momentum were used in different permutations for the initial pre-training step and for training the dense layer
• When training ResNet with cross-entropy: the momentum optimizer; with the supervised contrastive loss: LARS for pre-training and RMSProp for training the dense layer (a two-stage training sketch follows this slide)
19
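A schematic of the two-stage recipe above. It is illustrative only: it reuses the hypothetical `SupConModel`, `make_two_view_batch`, and `supervised_contrastive_loss` sketches from earlier, substitutes plain SGD/RMSprop from `torch.optim` for LARS (which is not in core PyTorch), and assumes a `train_loader` yielding raw images and labels plus an `augment` transform:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = SupConModel(proj_dim=128)

# Stage 1: contrastive pre-training of encoder + projection head.
pretrain_opt = optim.SGD(model.parameters(), lr=0.5, momentum=0.9)  # stand-in for LARS
for images, labels in train_loader:                                 # assumed DataLoader
    x, y = make_two_view_batch(images, labels, augment)             # 2N-view batch
    _, z = model(x)
    loss = supervised_contrastive_loss(z, y, temperature=0.07)
    pretrain_opt.zero_grad()
    loss.backward()
    pretrain_opt.step()

# Stage 2: freeze the encoder, drop the projection head,
# and train a single linear classifier on the frozen representation r.
for p in model.encoder.parameters():
    p.requires_grad = False
classifier = nn.Linear(2048, 1000)                 # D_E -> 1000 classes (ImageNet, assumed)
clf_opt = optim.RMSprop(classifier.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()
for images, labels in train_loader:
    with torch.no_grad():
        r, _ = model(images)
    loss = ce(classifier(r), labels)
    clf_opt.zero_grad()
    loss.backward()
    clf_opt.step()
```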
Thank You
20