Intriguing properties of contrastive losses

NeurIPS 2021
Google Research, Ting Chen et al.
Presenter
이재윤
Fundamental Team
김동현,김채현,박종익,송헌,양현모,오대환,이근배,조남경,최재규

Contents
1. Motivation
2. Generalized Contrastive Loss
3. Contrastive Learning with multiple objects
4. Feature Suppression
5. Conclusion

Paper Selection Background
• Presented at NeurIPS 2021 and already has 20 citations
• Ting Chen, who wrote this paper, is the lead author of SimCLR
• Self-supervision / contrastive learning has recently drawn a lot of attention [1]
• Three properties that appear when training with a contrastive loss
1) Generalizing the contrastive loss and checking its performance
2) Learning representations of images with multiple objects
3) Feature suppression in contrastive learning
[1] Statistics and Visualization of acceptance rate, main keyword of CVPR 2021 accepted papers for the main Computer Vision conference (CVPR 2021)

Contrastive Learning
[Figure: two CNN encoders map a pair of inputs to features f_i and f_j, shown once for a "negative" pair and once for a "positive" pair]
• Training samples are contrasted against one another
✓ Euclidean distance, cosine similarity, etc.
✓ Negative pairs are pushed apart
✓ Positive pairs are pulled together
• Many kinds of contrastive learning objectives exist
✓ N-pair, InfoNCE, Triplet, Lifted Structured
• Performance is close to supervised learning even without label information
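The push/pull behaviour described above can be sketched with a single pair of features. The slide does not name a specific formula, so the margin-based form below (squared Euclidean distance for positives, squared hinge on a margin for negatives) and the names `pairwise_contrastive_loss` and `margin` are assumptions chosen for illustration.

```python
import numpy as np

def pairwise_contrastive_loss(f_i, f_j, is_positive, margin=1.0):
    """Margin-based pair loss (illustrative form, not taken from the slides)."""
    d = np.linalg.norm(f_i - f_j)       # Euclidean distance between the two features
    if is_positive:
        return d ** 2                   # positives: any distance is penalized
    return max(0.0, margin - d) ** 2    # negatives: penalized only inside the margin

f_a = np.array([1.0, 0.0])
f_b = np.array([0.8, 0.0])
pos = pairwise_contrastive_loss(f_a, f_b, is_positive=True)
neg = pairwise_contrastive_loss(f_a, f_b, is_positive=False)
```

The same two features give a small loss when treated as a positive pair and a larger one when treated as a negative pair, which is the gradient signal that pulls positives together and pushes negatives apart.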

SimCLR framework
• Representations are learned through contrastive learning
• Training Procedure
I. Learn representations from unlabeled data
II. For the classification task, fine-tune the network with a small amount of labeled data
• Augmentation is used to create multiple views from one image
✓ Random Cropping
✓ Color Distortion
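A minimal sketch of how two views of one image might be produced with the two augmentations listed above. The helper names (`random_crop`, `color_distort`, `two_views`), the distortion strength, and the per-channel scale/shift used for color distortion are assumptions for illustration; resizing the crop back to the input size is left out for brevity.

```python
import numpy as np

def random_crop(img, size, rng):
    """Cut a random size x size window out of an (H, W, 3) image."""
    h, w, _ = img.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]

def color_distort(img, rng, strength=0.5):
    """Random per-channel scaling plus a brightness shift, clipped to [0, 1]."""
    scale = 1.0 + strength * rng.uniform(-1.0, 1.0, size=(1, 1, 3))
    shift = strength * rng.uniform(-0.2, 0.2)
    return np.clip(img * scale + shift, 0.0, 1.0)

def two_views(img, size, rng):
    """Apply the augmentation pipeline twice, independently, to the same image."""
    return tuple(color_distort(random_crop(img, size, rng), rng) for _ in range(2))

rng = np.random.default_rng(0)
image = rng.uniform(size=(32, 32, 3))   # stand-in for a real image
view_i, view_j = two_views(image, 24, rng)
```

The two returned views form one positive pair; views of different images in the same batch serve as negatives.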

2. Generalized Contrastive Loss

Generalized Contrastive Loss
$z_i, z_j$: representations of two augmented views
$\mathrm{sim}(u, v) = u^\top v / (\lVert u \rVert \lVert v \rVert)$
$\tau$: scalar (temperature)
$\mathcal{MB}$: randomly sampled mini-batch
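• Cross-entropy-based contrastive losses are widely used
\[ \mathcal{L}_{\text{NT-Xent}} = -\frac{1}{n} \sum_{i,j \in \mathcal{MB}} \log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2n} \mathbb{1}[k \neq i] \exp(\mathrm{sim}(z_i, z_k)/\tau)} \]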
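The cross-entropy loss above can be sketched in NumPy as follows. The batch layout (rows 2m and 2m+1 are the two views of image m), the function name `nt_xent`, and averaging over all 2n anchors are assumptions of this sketch, so the normalization constant may differ from the slide's 1/n.

```python
import numpy as np

def nt_xent(z, tau=0.5):
    """NT-Xent over a batch z of shape (2n, d); rows 2m and 2m+1 are a positive pair."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm, so z @ z.T is cosine similarity
    logits = z @ z.T / tau
    np.fill_diagonal(logits, -np.inf)                  # indicator 1[k != i]: drop self-similarity
    row_max = logits.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    anchors = np.arange(len(z))
    log_numerator = logits[anchors, anchors ^ 1]       # partner index: 0<->1, 2<->3, ...
    return float(np.mean(log_denominator - log_numerator))

basis = np.eye(4)
aligned = np.repeat(basis, 2, axis=0)    # each positive pair is identical
misaligned = np.tile(basis, (2, 1))      # each positive pair is orthogonal
loss_aligned = nt_xent(aligned)
loss_misaligned = nt_xent(misaligned)
```

With identical positives the loss is lower than with orthogonal positives, since only the numerator changes between the two batches.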
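• The loss above is generalized into the following form
✓ $\mathcal{L}_{\text{alignment}}$ makes the augmented views agree with each other
✓ $\mathcal{L}_{\text{distribution}}$ makes the representations match a prior distribution
\[ \mathcal{L}_{\text{generalized contrastive}} = \mathcal{L}_{\text{alignment}} + \lambda \mathcal{L}_{\text{distribution}} \]
\[ \mathcal{L}_{\text{NT-Xent}} = -\frac{1}{n} \sum_{i,j} \mathrm{sim}(z_i, z_j) + \lambda \frac{\tau}{n} \sum_{i} \log \sum_{k=1}^{2n} \mathbb{1}[k \neq i] \exp(\mathrm{sim}(z_i, z_k)/\tau) \]
(This is the cross-entropy loss above multiplied by $\tau$; $\lambda = 1$ recovers it.)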
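As a rough numerical check of the decomposition into an alignment term and a weighted distribution term, the sketch below computes both terms separately and compares their sum (with λ = 1) against τ times the cross-entropy form. Function names, the pairing convention (rows 2m and 2m+1 are positives), and averaging over all 2n anchors are assumptions made for this example.

```python
import numpy as np

def _pair_terms(z, tau):
    """Per-anchor positive similarity and LogSumExp over all other samples."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T
    anchors = np.arange(len(z))
    positive_sim = sim[anchors, anchors ^ 1]
    logits = sim / tau
    np.fill_diagonal(logits, -np.inf)                  # exclude k == i
    row_max = logits.max(axis=1)
    lse = row_max + np.log(np.exp(logits - row_max[:, None]).sum(axis=1))
    return positive_sim, lse

def generalized_contrastive(z, tau=0.5, lam=1.0):
    """Return (total, alignment, distribution) with total = alignment + lam * distribution."""
    positive_sim, lse = _pair_terms(z, tau)
    alignment = -np.mean(positive_sim)                 # pulls the two views together
    distribution = tau * np.mean(lse)                  # LogSumExp term, scaled by tau
    return alignment + lam * distribution, alignment, distribution

def cross_entropy_form(z, tau=0.5):
    """The unscaled cross-entropy (NT-Xent) form, for comparison."""
    positive_sim, lse = _pair_terms(z, tau)
    return float(np.mean(lse - positive_sim / tau))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
total, alignment, distribution = generalized_contrastive(z, tau=0.5, lam=1.0)
```

Changing `lam` re-weights only the distribution term, which is the knob the generalized loss exposes.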

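• Relation to mutual information
\[ I(U; V) = -H(U \mid V) + H(U) \]
✓ $H(U \mid V)$: uncertainty, $H(U)$: entropy
\[ \mathcal{L}_{\text{NT-Xent}} = -\frac{1}{n} \sum_{i,j} \mathrm{sim}(z_i, z_j) + \lambda \frac{\tau}{n} \sum_{i} \log \sum_{k=1}^{2n} \mathbb{1}[k \neq i] \exp(\mathrm{sim}(z_i, z_k)/\tau) \]
✓ ①: the similarity sum $\sum_{i,j} \mathrm{sim}(z_i, z_j)$, ②: the LogSumExp sum $\sum_{i} \log \sum_{k} \mathbb{1}[k \neq i] \exp(\mathrm{sim}(z_i, z_k)/\tau)$
• Maximize ① = Minimize uncertainty
• Minimize ② = Maximize entropy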

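• The prior distribution assumed here is uniform on the hypersphere
✓ The LogSumExp term pushes the representations to be uniformly distributed on the hypersphere.
\[ \mathcal{L}_{\text{NT-Xent}} = -\frac{1}{n} \sum_{i,j} \mathrm{sim}(z_i, z_j) + \lambda \frac{\tau}{n} \sum_{i} \log \sum_{k=1}^{2n} \mathbb{1}[k \neq i] \exp(\mathrm{sim}(z_i, z_k)/\tau) \]
• How does training behave with prior distributions other than the uniform hypersphere?
✓ Priors other than the uniform hypersphere cannot be computed with LogSumExp
✓ The loss is computed with SWD (Sliced Wasserstein Distance) instead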
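A minimal sketch of a sliced Wasserstein distance between a batch of representations and samples from a prior. It uses the common recipe of random one-dimensional projections followed by sorting; the function names, the number of projections, and the squared-difference cost are assumptions of this sketch, and it is not claimed to match the paper's exact algorithm.

```python
import numpy as np

def sample_uniform_hypersphere(n, d, rng):
    """Draw n points uniformly on the unit sphere in d dimensions."""
    x = rng.normal(size=(n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def sliced_wasserstein(z, prior_samples, n_projections=64, rng=None):
    """Squared sliced Wasserstein distance between two equally sized sample sets."""
    rng = np.random.default_rng() if rng is None else rng
    d = z.shape[1]
    directions = rng.normal(size=(d, n_projections))
    directions /= np.linalg.norm(directions, axis=0, keepdims=True)  # unit-length directions
    z_proj = np.sort(z @ directions, axis=0)               # sort each 1-D projection
    p_proj = np.sort(prior_samples @ directions, axis=0)
    return float(np.mean((z_proj - p_proj) ** 2))          # match sorted samples one to one

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
prior = rng.normal(size=(256, 8))                          # here: a Gaussian prior
near = sliced_wasserstein(z, prior, rng=np.random.default_rng(1))
far = sliced_wasserstein(z, prior + 3.0, rng=np.random.default_rng(1))
sphere = sample_uniform_hypersphere(256, 8, rng)
```

Swapping `prior` for samples from another distribution (for example `sphere`) changes the target without changing the loss code, which is the reason a sample-based distance is convenient here.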

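• Performance under various prior distributions is examined in the SimCLR experimental setting
✓ On CIFAR-10, there is almost no difference when training for 200 epochs or more
✓ On ImageNet, differences exist with a 2-layer projection head
✓ However, the differences disappear with a 3+ layer / non-linear projection head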

• A deep projection head plays a role similar to batch size
[Figure panels: "Contrastive Loss"; "Different Losses with 3-Layer projection head"]

3. Contrastive Learning with multiple objects

Instance-based objectives
• Most contrastive learning methods (SimCLR, BYOL) define their objective at the instance level
✓ An image is mapped to a single representation vector
✓ The objective operates on the global representation rather than on local regions
Q1. Do instance-based objectives work well when an image contains multiple objects?
Q2. Can they learn local features of objects in addition to the global representation?

SimCLR learns on multiple objects
• Datasets used for self-supervised learning have the object placed at the center
✓ MNIST, CIFAR-10, ImageNet
✓ The MultiDigits dataset is built to create a controllable environment
• MultiDigits dataset
✓ Two placement strategies
Random
In-Grid
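The two placement strategies can be sketched as below. The function name `place_digits`, the canvas size, the grid size, centering digits inside grid cells, and merging overlapping digits with a pixel-wise maximum are all assumptions for illustration; constant patches stand in for real MNIST digits.

```python
import numpy as np

def place_digits(digits, canvas_size, strategy, rng, grid=2):
    """Paste 2-D digit patches on a blank canvas, either at random or in grid cells."""
    if strategy == "in_grid" and len(digits) > grid * grid:
        raise ValueError("more digits than grid cells")
    canvas = np.zeros((canvas_size, canvas_size))
    cell = canvas_size // grid
    cells = rng.permutation(grid * grid)               # random assignment of digits to cells
    for k, digit in enumerate(digits):
        h, w = digit.shape
        if strategy == "in_grid":
            row, col = divmod(int(cells[k]), grid)
            top = row * cell + (cell - h) // 2         # center the digit in its cell
            left = col * cell + (cell - w) // 2
        else:                                          # "random": anywhere that fits
            top = int(rng.integers(0, canvas_size - h + 1))
            left = int(rng.integers(0, canvas_size - w + 1))
        region = canvas[top:top + h, left:left + w]    # view into the canvas
        np.maximum(region, digit, out=region)          # overlapping digits are merged
    return canvas

rng = np.random.default_rng(0)
digits = [np.ones((28, 28)) for _ in range(4)]         # stand-ins for MNIST digits
grid_canvas = place_digits(digits, 64, "in_grid", rng)
random_canvas = place_digits(digits, 64, "random", rng)
```

In-grid placement keeps digits from overlapping, while random placement may let them overlap, so the random variant is the harder of the two settings in this sketch.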

SimCLR learns on multiple objects
• SimCLR vs Supervised Learning
① Pre-train ResNet-18 with the same augmentation policy (random cropping/resize)
② Train and evaluate a classifier on 28x28 MNIST
• SimCLR can also learn from images that contain several objects at once
✓ As with supervised learning, accuracy stays high up to 8 digits

SimCLR learns local features
• Apply K-means to intermediate features
✓ If the representation is learned well, the grouping should come out well
✓ Run inference on ImageNet/COCO data with a ResNet-50 (2x) pre-trained on ImageNet
✓ Group the features taken from the 2nd, 3rd, and 4th ResNet blocks
• Compare with the clustering results of supervised learning and of raw RGB pixels
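The grouping step can be sketched as plain K-means over the spatial positions of one feature map. The function name `kmeans_segment`, the small hand-written K-means loop, and the synthetic feature map are assumptions for illustration; in the experiment described above the features would come from an intermediate ResNet block.

```python
import numpy as np

def kmeans_segment(feature_map, k, n_iters=20, seed=0):
    """Cluster the (H, W, C) feature vectors into k groups and return an (H, W) label map."""
    h, w, c = feature_map.shape
    x = feature_map.reshape(-1, c)                     # one row per spatial position
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)                   # assign to the nearest center
        for j in range(k):
            if np.any(labels == j):                    # leave empty clusters where they are
                centers[j] = x[labels == j].mean(axis=0)
    return labels.reshape(h, w)

feature_map = np.zeros((8, 8, 4))
feature_map[:, 4:, :] = 1.0                            # two regions with distinct features
segments = kmeans_segment(feature_map, k=2)
```

If positions belonging to the same object carry similar features, they end up with the same label, so the label map can be read as a rough segmentation.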

Feature Suppression
• In SimCLR [1], performance drops without color distortion
✓ Color features suppress other features
✓ Color information alone is enough to raise the consistency between positive pairs
✓ Contrastive learning still proceeds, but a good representation is not obtained
[1] A Simple framework for contrastive learning of visual representations,
[Figure: color feature suppresses object class]

Feature Suppression
• Beyond augmentation, there are unknown factors that cause feature suppression
• Three kinds of datasets for a controllable experimental environment
1) DigitOnImageNet dataset (MNIST features vs. ImageNet features)

DigitOnImageNet Experiments
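• Performance is measured as a function of how many unique digits, out of the 60k MNIST digits, are used
• There is a trade-off between performance on the unique MNIST digits used and performance on the ImageNet data
✓ Simple features (MNIST) suppress complex features (ImageNet)
✓ Contrastive losses such as SimCLR's cannot learn all of the competing features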
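One way to picture the DigitOnImageNet setup is sketched below: a digit is drawn from a pool restricted to `n_unique` entries and overlaid on each background image. The overlay rule (pixel-wise maximum at the image center), the function names, and the random arrays standing in for ImageNet images and MNIST digits are all assumptions of this sketch.

```python
import numpy as np

def overlay_digit(image, digit):
    """Overlay a grayscale digit on the center of an (H, W, 3) image."""
    out = image.copy()
    h, w = digit.shape
    top = (image.shape[0] - h) // 2
    left = (image.shape[1] - w) // 2
    region = out[top:top + h, left:left + w, :]
    out[top:top + h, left:left + w, :] = np.maximum(region, digit[:, :, None])
    return out

def digit_on_image_batch(images, digit_pool, n_unique, rng):
    """Overlay one digit per image, drawing only from the first n_unique digits."""
    picks = rng.integers(0, n_unique, size=len(images))
    batch = np.stack([overlay_digit(img, digit_pool[p]) for img, p in zip(images, picks)])
    return batch, picks

rng = np.random.default_rng(0)
images = rng.uniform(0.0, 0.5, size=(6, 64, 64, 3))      # stand-ins for ImageNet images
digit_pool = rng.uniform(0.5, 1.0, size=(100, 28, 28))   # stand-ins for MNIST digits
batch, picks = digit_on_image_batch(images, digit_pool, n_unique=3, rng=rng)
```

With a small `n_unique` the same few digits repeat across many images; raising it makes the digit a more informative feature, which is the controlled variable described above.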

MultiDigit Experiments
2) MultiDigits dataset (bigger-digit features vs. smaller-digit features)
✓ 1st digit: 20x20
✓ 2nd digit: [20~80]x[20~80]
✓ Pre-train ResNet-18 on MultiDigits with SimCLR and with supervised learning
✓ Train and evaluate a classifier on single MNIST digits

MultiDigit Experiments
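• With supervised learning, the ability to distinguish the 1st digit does not change regardless of the size of the 2nd digit
• With SimCLR, the ability to distinguish the 20x20 1st digit drops as the 2nd digit grows
✓ The dominant object suppresses learning of the smaller object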

RandomBit Experiments
3) RandomBit dataset (RGB channel features vs. RandomBit channel features)
✓ RGB channels + RandomBit channels
✓ The RandomBit value is sampled at random from [1, log2 n]; n is the controlled variable
✓ Augmentation is applied to the RGB channels only
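A sketch of how the extra channels might be attached to a positive pair. It assumes an integer drawn uniformly from [0, 2^n_bits), encoded as `n_bits` constant binary planes that are shared by both views; this sampling range, the function name `add_random_bits`, and the parameter `n_bits` are illustrative assumptions and may not match the range stated on the slide.

```python
import numpy as np

def add_random_bits(view_i, view_j, n_bits, rng):
    """Append the same n_bits binary planes to both (already augmented) RGB views."""
    value = int(rng.integers(0, 2 ** n_bits))          # one random integer per image
    bits = np.array([(value >> b) & 1 for b in range(n_bits)], dtype=view_i.dtype)
    h, w, _ = view_i.shape
    planes = np.broadcast_to(bits, (h, w, n_bits))     # constant across all pixels
    out_i = np.concatenate([view_i, planes], axis=2)
    out_j = np.concatenate([view_j, planes], axis=2)
    return out_i, out_j, value

rng = np.random.default_rng(0)
view_i = rng.uniform(size=(32, 32, 3))                 # stand-ins for two augmented views
view_j = rng.uniform(size=(32, 32, 3))
out_i, out_j, value = add_random_bits(view_i, view_j, n_bits=5, rng=rng)
```

Because the bit planes are identical in both views and untouched by augmentation, they offer a shortcut for matching positives that carries no information about the image content.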

RandomBit Experiments
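• On every dataset (MNIST, CIFAR-10, ImageNet), performance degrades because of the RandomBit channels
✓ Classification performance degrades sharply starting from n = 10
✓ Learning the easy-to-learn random bits suppresses other useful features
✓ Using a different contrastive loss (BYOL), a different batch size, or momentum contrast does not prevent this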

Summary
I. Contrastive learning performance under various priors
✓ With a deeper projection head and a large batch size, there is no big difference.
II. Contrastive learning can learn from images that contain multiple objects.
III. Feature Suppression
✓ Simple feature > complicated feature
✓ Bigger feature > Smaller feature
✓ Easy-to-learn mutual information > all features in RGB
❖ Contrastive learning has properties and performance as good as supervised learning, but its approach of increasing mutual information does not necessarily guarantee a good representation.
