SlideShare ist ein Scribd-Unternehmen logo
1 von 38
InternImage: Exploring Large-Scale Vision
Foundation Models with
Deformable Convolutions
발표자: 김병현
이미지처리팀:
강인하, 김현진, 안종식, 이주영, 이희재, 현청천
InterImage
 State-of-the-Art (COCO test-dev) Backbone Network
Released by OpenGVLab
2
Introduction
 Success of Transformers in Computer Vision Tasks
 CNN-based foundation models can also achieve
comparable or even better performance than ViTs when
equipped with similar operator-/architecture-level designs,
scaling-up parameters, and massive data.
3
Introduction
 Gap between CNNs and ViTs
Operator Level
• Long-range dependency
• Adaptive spatial aggregation
Architecture View
• Advanced components
– Layer Normalization
– Feed Forward Networks
– GELU
Recent Long-Range CNNs
• Very large kernels (31x31)
• Gap with SOTA ViTs
4
Introduction
 Comparison of different core operators
5
Introduction
 Global Attention: Vision Transformer
6
Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers
for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020).
Architecture of ViT
Introduction
 Local Attention: Swin Transformer
7
Architecture of Swin Transformer
Liu, Ze, et al. "Swin transformer: Hierarchical vision transformer using
shifted windows." Proceedings of the IEEE/CVF international conference
on computer vision. 2021.
Introduction
 Large Kernel: SLaK
8
Liu, Shiwei, et al. "More convnets in the 2020s: Scaling up kernels beyond
51x51 using sparsity." arXiv preprint arXiv:2207.03620 (2022).
Introduction
 Concentration on CNN-based Model
InterImage
• Brand-New CNN-based Backbone Network
• Characteristics
– Dynamic sparse convolutional layer
» Only with 3x3 kernels
» Adaptive spatial aggregation
» Reduce inductive bias
» Low computational cost compared to large convolutional layers
– Overall Architecture of ViT
9
Introduction
 Contributions
1st CNN-based backbone with more than 1 billion params.
Add long-rage dependencies and adaptive spatial aggregation
with 3x3 DCN
SOTA accuracy in COCO dataset
10
Q & A
Q & A
11
Overall Architecture
12
1. Deformable
Convolutional Layer V3
2. Architecture Design
Deformable Convolutional Layer v3
 Revisiting DCNv2
13
Deformable Convolutional Layer v3
14
https://www.taokong.org/report/DCN_v2_paper_reading.pdf
Deformable Convolutional Layer v3
15
https://www.taokong.org/report/DCN_v2_paper_reading.pdf
Deformable Convolutional Layer v3
1. Sharing weights among convolutional neurons.
• Heavy computational cost of DCNv2
• independent linear projection weights
• memory complexity is linear with the total number of sampling points
• To remedy this problem, we borrow the idea from the separable
convolution and detach the original convolution weights into
depth-wise and point-wise parts
2. Introducing multi-group mechanism
• Split the spatial aggregation process into G groups
3. Normalizing modulation scalars along sampling points
• Change element-wise sigmoid normalization to softmax
normalization along sample points.
16
Deformable Convolutional Layer v3
17
Deformable Convolutional Layer v3
 Normal Convolutional Layer
18
ℎ
𝑤
𝑑
𝑘
𝐷
𝑘
𝐻
𝑊
𝐷
Input Tensor
Convolutional Kernel
Output Tensor
Deformable Convolutional Layer v3
 Deformable Convolutional Layer v1
19
ℎ
𝑤
𝑑
Input Tensor
𝑘
𝐷
𝑘
Convolutional Kernel
𝐻
𝑊
𝐷
Output Tensor
𝑘
2𝑁(𝑁 = 𝑘2
)
𝑘
Offset Layer
(Convolutional Kernel)
𝐻
𝑊
Offset map
2𝑁(𝑁 = 𝑘2
)
Deformable Convolutional Layer v3
 Deformable Convolutional Layer v2
20
𝑘
2𝑁(𝑁 = 𝑘2
)
𝑘
Offset Layer
(Convolutional Kernel)
𝐻
𝑊
Offset map
2𝑁(𝑁 = 𝑘2
)
ℎ
𝑤
𝑑
Input Tensor
𝑘
𝐷
𝑘
Convolutional Kernel
𝐻
𝑊
𝐷
Output
Tensor
𝑘
𝑁(𝑁 = 𝑘2
)
𝑘
Modulation Layer
(Convolutional Kernel)
𝐻
𝑊
Modulation map
𝑁(𝑁 = 𝑘2
)
Sigmoid
Deformable Convolutional Layer v3
 Deformable Convolutional Layer v3
21
𝑘
2𝑁(𝑁 = 𝑘2
)
𝑘
Offset Layer
(Convolutional Kernel*)
𝐻
𝑊
Offset map
2𝑁(𝑁 = 𝑘2
)
ℎ
𝑤
𝑑 = 𝐶′ × 𝐺
Input Tensor
𝑘
𝐶′
𝑘 Convolutional Kernel
𝐻
𝑊
𝐷
Output
Tensor
𝑘
𝑁(𝑁 = 𝑘2
)
𝑘
Modulation Layer
(Convolutional Kernel*)
𝐻
𝑊
Modulation map
𝑁(𝑁 = 𝑘2
)
Softmax
(Axis -> Depth)
× 𝐺
*Details not found in the paper
Shared Params
in Kernels
InterImage Model
22
InterImage Model
23
1. Basic Block
InterImage Model
24
2. Stem layer & Downsampling
InterImage Model
25
3. Stacking Rules
InterImage Model
26
1. Basic Block
InterImage Model
27
2. Stem & Downsampling
InterImage Model
28
3. Stacking Rules
InterImage Model
29
4. Scaling Rules
InterImage Model
30
5. Hyper-parameters for models of different scales
4 Stages Models are prevalent since DETR
• Swin Transformer
• MetaFormer
Q & A
Q & A
31
Experiments
 Image Classification (Tiny Model)
32
Experiments
 Image Classification (Large Model)
33
Experiments
 Object Detection & Instance Segmentation
34
Experiments
 Object Detection & Instance Segmentation
35
Experiments
 Semantic Segmentation
36
이미지처리팀 리뷰 의견
 Deformable Conv V3에 대한 분석이 부재
Ablation Study 부재
• ResNet 등 기존 Conv 기반 모델에서 성능향상을 가지는지 확인해줬으면
정말 좋았을 듯
Deformable Conv V3의 장점에 대한 정성적인 분석 부재
• Convolution을 Group으로 나누면서 생기는 단일 레이어의 다양한 Offset
Map의 장점에 대한 시각화 자료가 있었으면 좋았을 듯
Inductive Bias를 줄일 수 있었다는 주장에 대한 근거자료 부재
 코드가 아직 공개되지 않아 정확한 검증은 어려움
37
Q & A
Q & A
38

Weitere ähnliche Inhalte

Was ist angesagt?

[DL輪読会]End-to-end Recovery of Human Shape and Pose
[DL輪読会]End-to-end Recovery of Human Shape and Pose[DL輪読会]End-to-end Recovery of Human Shape and Pose
[DL輪読会]End-to-end Recovery of Human Shape and PoseDeep Learning JP
 
SPADE :Semantic Image Synthesis with Spatially-Adaptive Normalization
SPADE :Semantic Image Synthesis with Spatially-Adaptive NormalizationSPADE :Semantic Image Synthesis with Spatially-Adaptive Normalization
SPADE :Semantic Image Synthesis with Spatially-Adaptive NormalizationTenki Lee
 
クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法Hiroshi Nakagawa
 
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜SSII
 
0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)MasanoriSuganuma
 
畳み込みニューラルネットワークの高精度化と高速化
畳み込みニューラルネットワークの高精度化と高速化畳み込みニューラルネットワークの高精度化と高速化
畳み込みニューラルネットワークの高精度化と高速化Yusuke Uchida
 
ConvNetの歴史とResNet亜種、ベストプラクティス
ConvNetの歴史とResNet亜種、ベストプラクティスConvNetの歴史とResNet亜種、ベストプラクティス
ConvNetの歴史とResNet亜種、ベストプラクティスYusuke Uchida
 
文献紹介:SegFormer: Simple and Efficient Design for Semantic Segmentation with Tr...
文献紹介:SegFormer: Simple and Efficient Design for Semantic Segmentation with Tr...文献紹介:SegFormer: Simple and Efficient Design for Semantic Segmentation with Tr...
文献紹介:SegFormer: Simple and Efficient Design for Semantic Segmentation with Tr...Toru Tamaki
 
【メタサーベイ】基盤モデル / Foundation Models
【メタサーベイ】基盤モデル / Foundation Models【メタサーベイ】基盤モデル / Foundation Models
【メタサーベイ】基盤モデル / Foundation Modelscvpaper. challenge
 
Long-Tailed Classificationの最新動向について
Long-Tailed Classificationの最新動向についてLong-Tailed Classificationの最新動向について
Long-Tailed Classificationの最新動向についてPlot Hong
 
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法SSII
 
【論文紹介】U-GAT-IT
【論文紹介】U-GAT-IT【論文紹介】U-GAT-IT
【論文紹介】U-GAT-ITmeownoisy
 
TensorFlow Lite Delegateとは?
TensorFlow Lite Delegateとは?TensorFlow Lite Delegateとは?
TensorFlow Lite Delegateとは?Mr. Vengineer
 
tf,tf2完全理解
tf,tf2完全理解tf,tf2完全理解
tf,tf2完全理解Koji Terada
 
Skip Connection まとめ(Neural Network)
Skip Connection まとめ(Neural Network)Skip Connection まとめ(Neural Network)
Skip Connection まとめ(Neural Network)Yamato OKAMOTO
 
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...Deep Learning JP
 
02 第3.1節-第3.5節 ROS2の基本機能(1/2) ROS2勉強合宿 @別府温泉
02 第3.1節-第3.5節 ROS2の基本機能(1/2) ROS2勉強合宿 @別府温泉02 第3.1節-第3.5節 ROS2の基本機能(1/2) ROS2勉強合宿 @別府温泉
02 第3.1節-第3.5節 ROS2の基本機能(1/2) ROS2勉強合宿 @別府温泉Mori Ken
 

Was ist angesagt? (20)

[DL輪読会]End-to-end Recovery of Human Shape and Pose
[DL輪読会]End-to-end Recovery of Human Shape and Pose[DL輪読会]End-to-end Recovery of Human Shape and Pose
[DL輪読会]End-to-end Recovery of Human Shape and Pose
 
SPADE :Semantic Image Synthesis with Spatially-Adaptive Normalization
SPADE :Semantic Image Synthesis with Spatially-Adaptive NormalizationSPADE :Semantic Image Synthesis with Spatially-Adaptive Normalization
SPADE :Semantic Image Synthesis with Spatially-Adaptive Normalization
 
クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法
 
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜
SSII2022 [TS1] Transformerの最前線〜 畳込みニューラルネットワークの先へ 〜
 
0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)0から理解するニューラルネットアーキテクチャサーチ(NAS)
0から理解するニューラルネットアーキテクチャサーチ(NAS)
 
Swin transformer
Swin transformerSwin transformer
Swin transformer
 
畳み込みニューラルネットワークの高精度化と高速化
畳み込みニューラルネットワークの高精度化と高速化畳み込みニューラルネットワークの高精度化と高速化
畳み込みニューラルネットワークの高精度化と高速化
 
ConvNetの歴史とResNet亜種、ベストプラクティス
ConvNetの歴史とResNet亜種、ベストプラクティスConvNetの歴史とResNet亜種、ベストプラクティス
ConvNetの歴史とResNet亜種、ベストプラクティス
 
文献紹介:SegFormer: Simple and Efficient Design for Semantic Segmentation with Tr...
文献紹介:SegFormer: Simple and Efficient Design for Semantic Segmentation with Tr...文献紹介:SegFormer: Simple and Efficient Design for Semantic Segmentation with Tr...
文献紹介:SegFormer: Simple and Efficient Design for Semantic Segmentation with Tr...
 
【メタサーベイ】基盤モデル / Foundation Models
【メタサーベイ】基盤モデル / Foundation Models【メタサーベイ】基盤モデル / Foundation Models
【メタサーベイ】基盤モデル / Foundation Models
 
Long-Tailed Classificationの最新動向について
Long-Tailed Classificationの最新動向についてLong-Tailed Classificationの最新動向について
Long-Tailed Classificationの最新動向について
 
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法
SSII2021 [OS2-01] 転移学習の基礎:異なるタスクの知識を利用するための機械学習の方法
 
【論文紹介】U-GAT-IT
【論文紹介】U-GAT-IT【論文紹介】U-GAT-IT
【論文紹介】U-GAT-IT
 
TensorFlow Lite Delegateとは?
TensorFlow Lite Delegateとは?TensorFlow Lite Delegateとは?
TensorFlow Lite Delegateとは?
 
tf,tf2完全理解
tf,tf2完全理解tf,tf2完全理解
tf,tf2完全理解
 
ResNetの仕組み
ResNetの仕組みResNetの仕組み
ResNetの仕組み
 
Skip Connection まとめ(Neural Network)
Skip Connection まとめ(Neural Network)Skip Connection まとめ(Neural Network)
Skip Connection まとめ(Neural Network)
 
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...
[DL輪読会]Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Ima...
 
Depth Estimation論文紹介
Depth Estimation論文紹介Depth Estimation論文紹介
Depth Estimation論文紹介
 
02 第3.1節-第3.5節 ROS2の基本機能(1/2) ROS2勉強合宿 @別府温泉
02 第3.1節-第3.5節 ROS2の基本機能(1/2) ROS2勉強合宿 @別府温泉02 第3.1節-第3.5節 ROS2の基本機能(1/2) ROS2勉強合宿 @別府温泉
02 第3.1節-第3.5節 ROS2の基本機能(1/2) ROS2勉強合宿 @別府温泉
 

Ähnlich wie InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkRichard Kuo
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxssuser3aa461
 
ConvNeXt: A ConvNet for the 2020s explained
ConvNeXt: A ConvNet for the 2020s explainedConvNeXt: A ConvNet for the 2020s explained
ConvNeXt: A ConvNet for the 2020s explainedSushant Gautam
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Garbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesGarbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesIRJET Journal
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deploymenttaeseon ryu
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Universitat Politècnica de Catalunya
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksJinwon Lee
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationGioele Ciaparrone
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignJinwon Lee
 
PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sJinwon Lee
 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetKrishnakoumarC
 
DSP IEEE paper
DSP IEEE paperDSP IEEE paper
DSP IEEE paperprreiya
 
IRJET- Spatial Context Preservation and Propagation - Layer States in Convolu...
IRJET- Spatial Context Preservation and Propagation - Layer States in Convolu...IRJET- Spatial Context Preservation and Propagation - Layer States in Convolu...
IRJET- Spatial Context Preservation and Propagation - Layer States in Convolu...IRJET Journal
 
Standardising the compressed representation of neural networks
Standardising the compressed representation of neural networksStandardising the compressed representation of neural networks
Standardising the compressed representation of neural networksFörderverein Technische Fakultät
 
Highly Parallel Pipelined VLSI Implementation of Lifting Based 2D Discrete Wa...
Highly Parallel Pipelined VLSI Implementation of Lifting Based 2D Discrete Wa...Highly Parallel Pipelined VLSI Implementation of Lifting Based 2D Discrete Wa...
Highly Parallel Pipelined VLSI Implementation of Lifting Based 2D Discrete Wa...idescitation
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsShunta Saito
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 

Ähnlich wie InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions (20)

MobileNet V3
MobileNet V3MobileNet V3
MobileNet V3
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
ConvNeXt: A ConvNet for the 2020s explained
ConvNeXt: A ConvNet for the 2020s explainedConvNeXt: A ConvNet for the 2020s explained
ConvNeXt: A ConvNet for the 2020s explained
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Garbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesGarbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning Techniques
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 
Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...Deep Learning for Computer Vision: Memory usage and computational considerati...
Deep Learning for Computer Vision: Memory usage and computational considerati...
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
 
PR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020sPR-366: A ConvNet for 2020s
PR-366: A ConvNet for 2020s
 
Introduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNetIntroduction to CNN Models: DenseNet & MobileNet
Introduction to CNN Models: DenseNet & MobileNet
 
DSP IEEE paper
DSP IEEE paperDSP IEEE paper
DSP IEEE paper
 
IRJET- Spatial Context Preservation and Propagation - Layer States in Convolu...
IRJET- Spatial Context Preservation and Propagation - Layer States in Convolu...IRJET- Spatial Context Preservation and Propagation - Layer States in Convolu...
IRJET- Spatial Context Preservation and Propagation - Layer States in Convolu...
 
Standardising the compressed representation of neural networks
Standardising the compressed representation of neural networksStandardising the compressed representation of neural networks
Standardising the compressed representation of neural networks
 
Highly Parallel Pipelined VLSI Implementation of Lifting Based 2D Discrete Wa...
Highly Parallel Pipelined VLSI Implementation of Lifting Based 2D Discrete Wa...Highly Parallel Pipelined VLSI Implementation of Lifting Based 2D Discrete Wa...
Highly Parallel Pipelined VLSI Implementation of Lifting Based 2D Discrete Wa...
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 

Mehr von taeseon ryu

OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...taeseon ryu
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splattingtaeseon ryu
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptxtaeseon ryu
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정taeseon ryu
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdftaeseon ryu
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories taeseon ryu
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extractiontaeseon ryu
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learningtaeseon ryu
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Modelstaeseon ryu
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuningtaeseon ryu
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdftaeseon ryu
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdftaeseon ryu
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithmtaeseon ryu
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networkstaeseon ryu
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarizationtaeseon ryu
 

Mehr von taeseon ryu (20)

VoxelNet
VoxelNetVoxelNet
VoxelNet
 
OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...OpineSum Entailment-based self-training for abstractive opinion summarization...
OpineSum Entailment-based self-training for abstractive opinion summarization...
 
3D Gaussian Splatting
3D Gaussian Splatting3D Gaussian Splatting
3D Gaussian Splatting
 
JetsonTX2 Python
 JetsonTX2 Python  JetsonTX2 Python
JetsonTX2 Python
 
Hyperbolic Image Embedding.pptx
Hyperbolic  Image Embedding.pptxHyperbolic  Image Embedding.pptx
Hyperbolic Image Embedding.pptx
 
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
MCSE_Multimodal Contrastive Learning of Sentence Embeddings_변현정
 
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdfLLaMA Open and Efficient Foundation Language Models - 230528.pdf
LLaMA Open and Efficient Foundation Language Models - 230528.pdf
 
YOLO V6
YOLO V6YOLO V6
YOLO V6
 
Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories Dataset Distillation by Matching Training Trajectories
Dataset Distillation by Matching Training Trajectories
 
RL_UpsideDown
RL_UpsideDownRL_UpsideDown
RL_UpsideDown
 
Packed Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation ExtractionPacked Levitated Marker for Entity and Relation Extraction
Packed Levitated Marker for Entity and Relation Extraction
 
MOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement LearningMOReL: Model-Based Offline Reinforcement Learning
MOReL: Model-Based Offline Reinforcement Learning
 
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language ModelsScaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
 
Visual prompt tuning
Visual prompt tuningVisual prompt tuning
Visual prompt tuning
 
mPLUG
mPLUGmPLUG
mPLUG
 
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdfvariBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
variBAD, A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.pdf
 
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdfReinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
Reinforced Genetic Algorithm Learning For Optimizing Computation Graphs.pdf
 
The Forward-Forward Algorithm
The Forward-Forward AlgorithmThe Forward-Forward Algorithm
The Forward-Forward Algorithm
 
Towards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural NetworksTowards Robust and Reproducible Active Learning using Neural Networks
Towards Robust and Reproducible Active Learning using Neural Networks
 
BRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive SummarizationBRIO: Bringing Order to Abstractive Summarization
BRIO: Bringing Order to Abstractive Summarization
 

Kürzlich hochgeladen

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 

Kürzlich hochgeladen (20)

Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

  • 1. InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions 발표자: 김병현 이미지처리팀: 강인하, 김현진, 안종식, 이주영, 이희재, 현청천
  • 2. InterImage  State-of-the-Art (COCO test-dev) Backbone Network Released by OpenGVLab 2
  • 3. Introduction  Success of Transformers in Computer Vision Tasks  CNN-based foundation models can also achieve comparable or even better performance than ViTs when equipped with similar operator-/architecture-level designs, scaling-up parameters, and massive data. 3
  • 4. Introduction  Gap between CNNs and ViTs Operator Level • Long-range dependency • Adaptive spatial aggregation Architecture View • Advanced components – Layer Normalization – Feed Forward Networks – GELU Recent Long-Range CNNs • Very large kernels (31x31) • Gap with SOTA ViTs 4
  • 5. Introduction  Comparison of different core operators 5
  • 6. Introduction  Global Attention: Vision Transformer 6 Dosovitskiy, Alexey, et al. "An image is worth 16x16 words: Transformers for image recognition at scale." arXiv preprint arXiv:2010.11929 (2020). Architecture of ViT
  • 7. Introduction  Local Attention: Swin Transformer 7 Architecture of Swin Transformer Liu, Ze, et al. "Swin transformer: Hierarchical vision transformer using shifted windows." Proceedings of the IEEE/CVF international conference on computer vision. 2021.
  • 8. Introduction  Large Kernel: SLaK 8 Liu, Shiwei, et al. "More convnets in the 2020s: Scaling up kernels beyond 51x51 using sparsity." arXiv preprint arXiv:2207.03620 (2022).
  • 9. Introduction  Concentration on CNN-based Model InterImage • Brand-New CNN-based Backbone Network • Characteristics – Dynamic sparse convolutional layer » Only with 3x3 kernels » Adaptive spatial aggregation » Reduce inductive bias » Low computational cost compared to large convolutional layers – Overall Architecture of ViT 9
  • 10. Introduction  Contributions 1st CNN-based backbone with more than 1 billion params. Add long-rage dependencies and adaptive spatial aggregation with 3x3 DCN SOTA accuracy in COCO dataset 10
  • 11. Q & A Q & A 11
  • 12. Overall Architecture 12 1. Deformable Convolutional Layer V3 2. Architecture Design
  • 13. Deformable Convolutional Layer v3  Revisiting DCNv2 13
  • 14. Deformable Convolutional Layer v3 14 https://www.taokong.org/report/DCN_v2_paper_reading.pdf
  • 15. Deformable Convolutional Layer v3 15 https://www.taokong.org/report/DCN_v2_paper_reading.pdf
  • 16. Deformable Convolutional Layer v3 1. Sharing weights among convolutional neurons. • Heavy computational cost of DCNv2 • independent linear projection weights • memory complexity is linear with the total number of sampling points • To remedy this problem, we borrow the idea from the separable convolution and detach the original convolution weights into depth-wise and point-wise parts 2. Introducing multi-group mechanism • Split the spatial aggregation process into G groups 3. Normalizing modulation scalars along sampling points • Change element-wise sigmoid normalization to softmax normalization along sample points. 16
  • 18. Deformable Convolutional Layer v3  Normal Convolutional Layer 18 ℎ 𝑤 𝑑 𝑘 𝐷 𝑘 𝐻 𝑊 𝐷 Input Tensor Convolutional Kernel Output Tensor
  • 19. Deformable Convolutional Layer v3  Deformable Convolutional Layer v1 19 ℎ 𝑤 𝑑 Input Tensor 𝑘 𝐷 𝑘 Convolutional Kernel 𝐻 𝑊 𝐷 Output Tensor 𝑘 2𝑁(𝑁 = 𝑘2 ) 𝑘 Offset Layer (Convolutional Kernel) 𝐻 𝑊 Offset map 2𝑁(𝑁 = 𝑘2 )
  • 20. Deformable Convolutional Layer v3  Deformable Convolutional Layer v2 20 𝑘 2𝑁(𝑁 = 𝑘2 ) 𝑘 Offset Layer (Convolutional Kernel) 𝐻 𝑊 Offset map 2𝑁(𝑁 = 𝑘2 ) ℎ 𝑤 𝑑 Input Tensor 𝑘 𝐷 𝑘 Convolutional Kernel 𝐻 𝑊 𝐷 Output Tensor 𝑘 𝑁(𝑁 = 𝑘2 ) 𝑘 Modulation Layer (Convolutional Kernel) 𝐻 𝑊 Modulation map 𝑁(𝑁 = 𝑘2 ) Sigmoid
  • 21. Deformable Convolutional Layer v3  Deformable Convolutional Layer v3 21 𝑘 2𝑁(𝑁 = 𝑘2 ) 𝑘 Offset Layer (Convolutional Kernel*) 𝐻 𝑊 Offset map 2𝑁(𝑁 = 𝑘2 ) ℎ 𝑤 𝑑 = 𝐶′ × 𝐺 Input Tensor 𝑘 𝐶′ 𝑘 Convolutional Kernel 𝐻 𝑊 𝐷 Output Tensor 𝑘 𝑁(𝑁 = 𝑘2 ) 𝑘 Modulation Layer (Convolutional Kernel*) 𝐻 𝑊 Modulation map 𝑁(𝑁 = 𝑘2 ) Softmax (Axis -> Depth) × 𝐺 *Details not found in the paper Shared Params in Kernels
  • 24. InterImage Model 24 2. Stem layer & Downsampling
  • 27. InterImage Model 27 2. Stem & Downsampling
  • 30. InterImage Model 30 5. Hyper-parameters for models of different scales 4 Stages Models are prevalent since DETR • Swin Transformer • MetaFormer
  • 31. Q & A Q & A 31
  • 34. Experiments  Object Detection & Instance Segmentation 34
  • 35. Experiments  Object Detection & Instance Segmentation 35
  • 37. 이미지처리팀 리뷰 의견  Deformable Conv V3에 대한 분석이 부재 Ablation Study 부재 • ResNet 등 기존 Conv 기반 모델에서 성능향상을 가지는지 확인해줬으면 정말 좋았을 듯 Deformable Conv V3의 장점에 대한 정성적인 분석 부재 • Convolution을 Group으로 나누면서 생기는 단일 레이어의 다양한 Offset Map의 장점에 대한 시각화 자료가 있었으면 좋았을 듯 Inductive Bias를 줄일 수 있었다는 주장에 대한 근거자료 부재  코드가 아직 공개되지 않아 정확한 검증은 어려움 37
  • 38. Q & A Q & A 38