High-Performance Large-Scale Image
Recognition Without Normalization
New SOTA validation accuracies on ImageNet by DeepMind
Image team: 김병현, 박동훈, 안종식, 이찬혁, 홍은기
Presenter: 박동훈
https://arxiv.org/abs/2102.06171
Contents
1. Performance
2. Batch Normalization: pros and cons
3. Previous Normalizer-Free Networks
4. Proposed Method: Adaptive Gradient Clipping
5. Experimental Results
6. Conclusion
Performance
• Matches the accuracy of EfficientNet-B7 on ImageNet while being up to 8.7x faster to train
• New state-of-the-art top-1 accuracy of 86.5% on ImageNet
- After pre-training on 300 million labeled images and then fine-tuning on ImageNet, it achieves 89.2%
• Image Classification on ImageNet [1]
[1] https://paperswithcode.com/sota/image-classification-on-imagenet
Batch Normalization (previous knowledge)
• The change in the distribution of each layer's inputs during training is called "internal covariate shift" [2]
Effects of batch normalization
· Downscales the residual branch
· Regularizing effect
· Eliminates mean shift
· Enables efficient large-batch training
[2] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture07.pdf
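Train: normalize with the mini-batch mean $\mu_B$ and variance $\sigma_B^2$, and also keep exponential moving averages:
$\mu \leftarrow (1 - \alpha)\,\mu + \alpha\,\mu_B$
$\sigma^2 \leftarrow (1 - \alpha)\,\sigma^2 + \alpha\,\sigma_B^2$
Test: normalize with the stored averages:
$\mathrm{BN}_{\text{test}}(x) = \dfrac{x - \mu}{\sqrt{\sigma^2 + \epsilon}}$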
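To make the train/test difference concrete, here is a minimal NumPy sketch of batch normalization with the moving-average update above. It is an illustration under assumptions, not code from the slides: the class name `BatchNorm1D` and its `momentum` argument (playing the role of α) are hypothetical, the learnable scale and shift are omitted as in the slide's test-time formula, and the stored variance is the biased batch estimate (some frameworks store an unbiased one).

```python
import numpy as np


class BatchNorm1D:
    """Hypothetical minimal batch norm over the batch axis (axis 0), without gamma/beta."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.momentum = momentum  # plays the role of alpha in the EMA update
        self.eps = eps
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)

    def train_step(self, x):
        # Normalize with the statistics of the current mini-batch (mu_B, sigma_B^2).
        batch_mean = x.mean(axis=0)
        batch_var = x.var(axis=0)  # biased estimate
        # Update the exponential moving averages that will be used at test time.
        a = self.momentum
        self.running_mean = (1 - a) * self.running_mean + a * batch_mean
        self.running_var = (1 - a) * self.running_var + a * batch_var
        return (x - batch_mean) / np.sqrt(batch_var + self.eps)

    def test_step(self, x):
        # Inference uses only the stored averages, so each example is
        # normalized independently of the rest of the batch.
        return (x - self.running_mean) / np.sqrt(self.running_var + self.eps)


# Usage: feed batches drawn from N(3, 2^2); the running stats drift toward roughly (3, 4).
rng = np.random.default_rng(0)
bn = BatchNorm1D(num_features=4)
for _ in range(200):
    out = bn.train_step(rng.normal(3.0, 2.0, size=(64, 4)))
single = bn.test_step(np.array([[3.0, 3.0, 3.0, 3.0]]))  # works with batch size 1
```

The split between `train_step` and `test_step` is exactly the training/inference discrepancy listed among the drawbacks of batch normalization: the same input can map to different outputs depending on which mode is active.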
Batch Normalization - Bad
• First, it is a surprisingly expensive computational primitive that incurs memory overhead
→ Computational overhead
• There is a discrepancy between the model's behavior during training and at inference time
→ Training and inference behave differently
• It breaks the independence between training examples in the mini-batch
→ Samples in a mini-batch are no longer independent
Batch Normalization - Bad
• It lets residual networks train with large learning rates, but this only pays off when the batch size is also large
→ Batch normalization makes large learning rates usable, but it only helps if the batch size is large too
• Batch normalization is often the cause of subtle implementation errors, especially during distributed training (Pham et al., 2019)
→ (According to practitioners) with batch normalization, results varied from run to run depending on the hardware, and subtle implementation errors appeared, especially in distributed training
https://www.youtube.com/watch?v=rNkHjZtH0RQ
Q & A
Previous Normalizer-Free Networks
De, S. and Smith, S. Batch normalization biases residual blocks towards the identity function in deep networks. In NeurIPS 2020
"If our theory is correct, it should be possible to train deep residual networks without normalization, simply by downscaling the residual branch."
[Figure from "Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks" (NeurIPS 2020): panels for a fully connected linear unnormalized residual network, a fully connected linear normalized residual network, and a normalized convolutional residual network, annotated "residual branch: variance reduction"]
Previous Normalizer-Free Networks
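• Residual block: $h_{i+1} = h_i + \alpha f_i(h_i / \beta_i)$
• $h_i$: input to the $i$-th residual block
• $f_i$ preserves variance (at initialization): $\mathrm{Var}(f_i(z)) = \mathrm{Var}(z)$
• $\alpha$: scalar multiplied onto the output of the residual branch to reduce the variance each block adds, e.g. $\alpha = 0.2$
• $\beta_i = \sqrt{\mathrm{Var}(h_i)}$, where $\mathrm{Var}(h_{i+1}) = \mathrm{Var}(h_i) + \alpha^2$
[Figure: the original residual unit proposed by K. He et al. next to the "Normalizer-Free ResNets" (NF-ResNets) block (Brock et al., 2021), which scales the residual-branch input by $1/\beta$ and its output by $\alpha$]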
Brock, A., De, S., and Smith, S. L. Characterizing signal propagation to close the performance gap in unnormalized resnets. In ICLR, 2021
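A toy NumPy simulation can check the variance bookkeeping above. Assumptions (not from the slides): each residual branch f_i is a hypothetical random linear map with variance-preserving initialization (weights drawn from N(0, 1/width)), the input has unit variance, and β_i is computed analytically from Var(h_{i+1}) = Var(h_i) + α² rather than measured. Real NF-ResNet branches contain convolutions and nonlinearities, so this only illustrates the scaling rule.

```python
import numpy as np


def nf_resnet_forward(x, num_blocks, alpha, rng):
    """Toy NF-ResNet: h_{i+1} = h_i + alpha * f_i(h_i / beta_i), assuming Var(h_0) = 1."""
    width = x.shape[1]
    h = x
    expected_var = 1.0
    expected = [expected_var]
    for _ in range(num_blocks):
        beta = np.sqrt(expected_var)  # analytic std of the block input
        # Hypothetical branch f_i: random linear map with variance-preserving init.
        w = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        h = h + alpha * ((h / beta) @ w)
        expected_var += alpha ** 2  # Var(h_{i+1}) = Var(h_i) + alpha^2
        expected.append(expected_var)
    return h, expected


rng = np.random.default_rng(0)
x = rng.normal(size=(4096, 256))  # unit-variance input
h, expected = nf_resnet_forward(x, num_blocks=12, alpha=0.2, rng=rng)
```

With α = 0.2 and 12 blocks the analytic variance is 1 + 12 · 0.04 = 1.48, and the empirical variance of `h` should land close to it: the variance grows linearly with depth instead of exploding, without any normalization layer.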
Previous Normalizer-Free Networks
Others
• Weight Standardization (Huang et al., 2017; Qiao et al., 2019):
$\hat{W}_{ij} = \dfrac{W_{ij} - \mu_i}{\sqrt{N}\,\sigma_i}$, with $\mu_i = \frac{1}{N}\sum_j W_{ij}$, $\sigma_i^2 = \frac{1}{N}\sum_j (W_{ij} - \mu_i)^2$ ($N$: fan-in)
• Dropout (Srivastava et al., 2014)
• Stochastic Depth (Huang et al., 2016)
Huang, L., Liu, X., Liu, Y., Lang, B., and Tao, D. Centered weight normalization in accelerating training of deep neural networks. In ICCV 2017.
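The weight standardization formula can be sketched in NumPy as follows, assuming a weight matrix of shape (out_features, fan_in) so that row i holds the weights of output unit i and N is the fan-in (a convolution kernel would first be reshaped to (out_channels, -1)). The function name and the small `eps` are our own additions for illustration. A useful property follows directly from the formula: since the sum over j of (W_ij − μ_i)² equals N σ_i², every standardized row has zero mean and unit L2 norm.

```python
import numpy as np


def standardize_weights(w, eps=1e-6):
    """Scaled weight standardization: (W_ij - mu_i) / (sqrt(N) * sigma_i), per row i.

    w: array of shape (out_features, fan_in); N = fan_in.
    eps is an added assumption to avoid dividing by zero for constant rows.
    """
    fan_in = w.shape[1]
    mu = w.mean(axis=1, keepdims=True)                 # mu_i
    var = ((w - mu) ** 2).mean(axis=1, keepdims=True)  # sigma_i^2
    return (w - mu) / (np.sqrt(fan_in) * np.sqrt(var + eps))


rng = np.random.default_rng(0)
w = rng.normal(1.0, 3.0, size=(8, 64))
w_hat = standardize_weights(w)
row_means = w_hat.mean(axis=1)             # ~0 for every row
row_norms = np.linalg.norm(w_hat, axis=1)  # ~1 for every row
```

Because the standardization is applied to the weights rather than the activations, it involves no batch statistics and behaves identically during training and inference.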
Q & A
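Gradient Clipping (previous knowledge)
• Helps gradient descent behave reasonably when it hits steep cliffs in the loss landscape.
• Widely used to train RNN-family models.
• The clipping threshold (λ) is hard to tune.
• But it enables training with larger batch sizes.
If the gradient norm ‖g‖ exceeds the threshold,
→ normalize the gradient and rescale it so that its norm equals the threshold:
$\hat{g} \leftarrow \dfrac{\partial \epsilon}{\partial \theta}$; if $\|\hat{g}\| > \lambda$: $\hat{g} \leftarrow \dfrac{\lambda}{\|\hat{g}\|}\,\hat{g}$
$\hat{g}$: gradient vector, $\epsilon$: loss, $\theta$: parameter vector
Pascanu, R., Mikolov, T., and Bengio, Y. On the difficulty of training recurrent neural networks. In ICML, 2013.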
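A minimal NumPy sketch of the norm-clipping rule above, assuming all gradients are treated as one flattened vector ĝ, as in Pascanu et al. (2013). The function name `clip_by_global_norm` is hypothetical here, and `threshold` corresponds to λ.

```python
import numpy as np


def clip_by_global_norm(grads, threshold):
    """Rescale a list of gradient arrays so their joint L2 norm is at most `threshold`.

    All arrays are treated as one flattened vector g_hat. If ||g_hat|| > threshold,
    g_hat <- threshold * g_hat / ||g_hat||; otherwise it is returned unchanged.
    Returns the (possibly clipped) gradients and the norm before clipping.
    """
    norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grads], norm
    return grads, norm


grads = [np.array([3.0, 0.0]), np.array([[0.0], [4.0]])]  # joint norm: 5
clipped, norm = clip_by_global_norm(grads, threshold=1.0)  # every entry scaled by 1/5
unchanged, _ = clip_by_global_norm(grads, threshold=10.0)  # below threshold: untouched
```

Clipping preserves the gradient's direction and only shortens it. The threshold is one absolute number shared by every layer; AGC replaces it with a bound relative to each weight's norm.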
Adaptive Gradient Clipping
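• Inspired by the observation that the ratio $\|G^\ell\|_F / \|W^\ell\|_F$ measures how much a single gradient step changes the weights of layer $\ell$.
• $W^\ell \in \mathbb{R}^{N \times M}$: weight matrix of the $\ell$-th layer
• $G^\ell \in \mathbb{R}^{N \times M}$: gradient with respect to $W^\ell$
• $\|W^\ell\|_F = \sqrt{\sum_{i}^{N} \sum_{j}^{M} (W^\ell_{i,j})^2}$, $\|W^\ell\|^\star_F = \max(\|W^\ell\|_F, \epsilon)$, $\epsilon = 10^{-3}$
• Clipping thresholds: $\lambda \in \{0.01, 0.02, 0.04, 0.08, 0.16\}$
With learning rate $h$, a gradient step is $\Delta W^\ell = -h G^\ell$, so
$\dfrac{\|\Delta W^\ell\|}{\|W^\ell\|} = h\,\dfrac{\|G^\ell\|}{\|W^\ell\|}$
[Figure (source [1]): training with adaptive clipping, annotated with the scaling factor $1/\|W^\ell\|$]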
[1] https://www.youtube.com/watch?v=o_peo6U7IRM
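The slide states the ratio but not the clipping rule itself. The paper applies it unit-wise: a unit's gradient $G_i^\ell$ is rescaled to $\lambda \frac{\|W_i^\ell\|^\star_F}{\|G_i^\ell\|_F} G_i^\ell$ whenever $\|G_i^\ell\|_F / \|W_i^\ell\|^\star_F > \lambda$, and left unchanged otherwise. Below is a minimal NumPy sketch of that rule, not the official DeepMind implementation. Assumptions: axis 0 of each parameter indexes output units (the official JAX code may use a different layout), 1-D parameters such as biases are treated elementwise, and the `1e-6` floor on the gradient norm only avoids division by zero.

```python
import numpy as np


def unitwise_norm(x):
    """Frobenius norm per output unit (axis 0); 1-D arrays are treated elementwise."""
    if x.ndim <= 1:
        return np.abs(x)
    axes = tuple(range(1, x.ndim))
    return np.sqrt(np.sum(x ** 2, axis=axes, keepdims=True))


def adaptive_grad_clip(grad, weight, clip_lambda=0.01, eps=1e-3):
    """Clip each unit whose ratio ||G_i|| / max(||W_i||, eps) exceeds clip_lambda."""
    w_norm = np.maximum(unitwise_norm(weight), eps)  # ||W_i||*
    g_norm = unitwise_norm(grad)
    max_norm = clip_lambda * w_norm
    # Offending units are rescaled to norm lambda * ||W_i||*; the rest are untouched.
    scaled = grad * (max_norm / np.maximum(g_norm, 1e-6))
    return np.where(g_norm > max_norm, scaled, grad)


w = np.array([[3.0, 4.0], [0.0, 0.0]])   # row norms: 5 and 0 (floored to eps)
g = np.array([[0.3, 0.4], [0.0, 1e-4]])  # row norms: 0.5 and 1e-4
clipped = adaptive_grad_clip(g, w, clip_lambda=0.01)
# Row 0: ratio 0.5 / 5 = 0.1 > 0.01, so it is rescaled to norm 0.01 * 5 = 0.05.
# Row 1: ratio 1e-4 / 1e-3 = 0.1 > 0.01, so it is rescaled to norm 1e-5.
```

The `eps` floor matters for zero-initialized weights: without it, any nonzero gradient on such a unit would be clipped all the way to zero.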
Experimental Results
Model Detail
https://github.com/deepmind/deepmind-research/tree/master/nfnets
[Training Detail]
• Softmax cross-entropy loss with label smoothing of 0.1
• Stochastic gradient descent with Nesterov momentum of 0.9
• Weight decay coefficient of $2 \times 10^{-5}$
• Dropout, Stochastic Depth (0.25)
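To make the training recipe concrete, here is a NumPy sketch of two of its pieces: label-smoothed softmax cross-entropy and an SGD step with Nesterov momentum and weight decay, using the hyperparameters listed above as defaults. This is an assumption-laden illustration, not the NFNets training code: the function names are hypothetical, the Nesterov update follows one common formulation, and weight decay is folded into the gradient as an L2 term; the official repository may implement both differently.

```python
import numpy as np


def label_smoothing_cross_entropy(logits, labels, smoothing=0.1):
    """Mean softmax cross-entropy against label-smoothed targets.

    Targets put (1 - smoothing) on the true class plus smoothing / K on every class.
    """
    num_classes = logits.shape[-1]
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    targets = np.full(logits.shape, smoothing / num_classes)
    targets[np.arange(len(labels)), labels] += 1.0 - smoothing
    return -(targets * log_probs).sum(axis=-1).mean()


def nesterov_sgd_step(param, grad, velocity, lr, momentum=0.9, weight_decay=2e-5):
    """One SGD step with Nesterov momentum; weight decay is added to the gradient (L2)."""
    grad = grad + weight_decay * param
    velocity = momentum * velocity + grad
    param = param - lr * (grad + momentum * velocity)
    return param, velocity


logits = np.zeros((3, 10))  # uniform predictions over 10 classes
labels = np.array([0, 1, 2])
loss = label_smoothing_cross_entropy(logits, labels)  # ~log(10) for uniform logits

p, v = np.array(1.0), np.array(0.0)
for _ in range(200):  # minimize f(p) = p**2 / 2, whose gradient is p
    p, v = nesterov_sgd_step(p, p, v, lr=0.1)
```

Label smoothing keeps the targets from being exactly one-hot, so the loss never pushes logits toward infinity; with uniform logits every target distribution gives the same loss, log K.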
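Conclusion
• The first model that, without batch normalization, outperforms batch-normalized models when trained with large batch sizes
• Matches the performance of batch-normalized models while training faster
• Built a family of models (NFNets) that use the AGC technique
• Showed that normalizer-free models actually perform better than batch-normalized ones when fine-tuned after pre-training (e.g., on an ImageNet-like dataset)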
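Personal Thoughts
• This team has already done a lot of research on normalizer-free training.
• Goal: speed up training so that models can be validated faster.
• They tried to train without normalization, but performance actually got worse as the batch size grew.
• This paper is the result of experimenting with many ideas.
Thank you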
