PR 083 (29th April, 2018)
Taeoh Kim
• CVPR18 Poster
• Inspired by Non-local Means
Slides from BIL717 Image Processing 2012
• Average Similar Pixels
• Do not Average non-Similar Pixels
Problem: Not Enough Similar Pixels in LOCAL REGIONS
→ Get More Samples in Non-LOCAL REGIONS
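Non-local means filter:
$$\mathrm{NLMF}[I]_p = \frac{1}{W} \sum_q G_\sigma\left(\lVert V_p - V_q \rVert_2\right) I_q$$
Box average:
$$\mathrm{BA}[I]_p = \frac{1}{W} \sum_q I_q$$
Gaussian filter:
$$G[I]_p = \frac{1}{W} \sum_q G_\sigma\left(\lVert p - q \rVert_2\right) I_q$$
Bilateral filter:
$$\mathrm{BF}[I]_p = \frac{1}{W} \sum_q G_\sigma\left(\lVert p - q \rVert_2\right) G_{\sigma_r}\left(\lVert I_p - I_q \rVert_1\right) I_q$$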
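The non-local means weighting can be sketched on a 1D signal. This is a minimal illustration, assuming a Gaussian kernel of the form exp(-d²/2σ²) on the patch distance; the function name `nlm_1d`, the patch radius, σ, and the toy signal are illustrative choices and do not come from the slides.

```python
import numpy as np

def nlm_1d(signal, patch_radius=1, sigma=0.2):
    """Non-local means on a 1D signal: weights compare patches V_p and V_q."""
    n = len(signal)
    padded = np.pad(signal, patch_radius, mode="edge")
    # patches[p] is the window of width 2*patch_radius+1 centred on sample p.
    patches = np.stack([padded[p:p + 2 * patch_radius + 1] for p in range(n)])
    # Patch distance ||V_p - V_q||_2 for every pair (p, q), not only neighbours.
    dist = np.linalg.norm(patches[:, None, :] - patches[None, :, :], axis=-1)
    weights = np.exp(-dist**2 / (2 * sigma**2))       # G_sigma (assumed Gaussian form)
    weights /= weights.sum(axis=1, keepdims=True)     # the 1/W normalisation
    return weights @ signal                           # weighted sum of all I_q

rng = np.random.default_rng(0)
clean = np.tile([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], 8)    # repeating step pattern
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
denoised = nlm_1d(noisy)
```

Because the pattern repeats across the whole signal, each sample finds similar patches far away, which is the "more samples in non-local regions" idea.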
$$\mathrm{NLMF}[I]_p = \frac{1}{W} \sum_q G_\sigma\left(\lVert V_p - V_q \rVert_2\right) I_q$$
• Output Value
• Representation (Probability Distribution)
• Target Value (Pixel) vs All Values (Pixel)
• Inputs
PR-049, Attention Is All You Need
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V$$
• Output Value
• Representation (Probability Distribution)
• Target Value (Query) vs All Values (Keys)
• Inputs
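Scaled dot-product attention can be written in a few lines of NumPy. This is a sketch with random toy matrices; the sizes and the helper names `softmax` and `attention` are assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, returning the output and the weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # each query (target) vs all keys
    weights = softmax(scores, axis=-1)     # one probability distribution per query
    return weights @ V, weights            # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))            # 4 queries of dimension d_k = 8
K = rng.standard_normal((6, 8))            # 6 keys
V = rng.standard_normal((6, 16))           # 6 values of dimension 16
out, weights = attention(Q, K, V)
```

Each row of `weights` sums to 1, which is the "probability distribution" role that the normalised kernel plays in the non-local means filter.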
$$\mathrm{MultiHead}(Q, K, V) = W \cdot \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
$$y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$$
• Output Value
• Representation (Probability Distribution)
• Target Value (Pixel) vs All Values (Pixel)
• Inputs
Another Representation of Input Local Pixels
= Weighted Sum of Local Pixels with Learned Filter
$$y_i = \frac{1}{C(x)} \sum_{j \in 3 \times 3} w_j\, g(x_j)$$
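The local case can be illustrated at a single pixel. This is a small sketch, assuming g is the identity and using a box filter as a stand-in for the learned weights w_j; `conv3x3_at` is a hypothetical helper name.

```python
import numpy as np

def conv3x3_at(x, w, i, j):
    """Output at pixel (i, j): weighted sum of its 3x3 neighbourhood with filter w."""
    patch = x[i - 1:i + 2, j - 1:j + 2]     # only the local 3x3 window contributes
    return float((patch * w).sum())

x = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 image
w = np.full((3, 3), 1.0 / 9.0)                 # stand-in for a learned filter
y_center = conv3x3_at(x, w, 2, 2)
```

The weights depend only on the offset j inside the window, not on the pixel values, which is the contrast with the similarity-based weights of the non-local form.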
Another Representation of Non-Local Pixels
= Weighted Sum of All Pixels with Similarity
+ Learning…
$$y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$$
$$y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$$
• Gaussian: $f(x_i, x_j) = \exp(x_i^{T} x_j)$
• Embedded Gaussian: $f(x_i, x_j) = \exp(\theta(x_i)^{T} \phi(x_j))$
• Dot Product: $f(x_i, x_j) = \theta(x_i)^{T} \phi(x_j)$
• Concatenation: $f(x_i, x_j) = \mathrm{ReLU}(w_f^{T} [\theta(x_i), \phi(x_j)])$
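The four choices of f can be computed for all pairs (i, j) at once. The sketch below uses random matrices as stand-ins for the learned embeddings θ and φ and for the vector w_f; all sizes and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, C_emb = 5, 8, 4                              # toy sizes: positions, channels, embedding channels
x = rng.standard_normal((N, C))                    # one row per position x_i
W_theta = 0.1 * rng.standard_normal((C, C_emb))    # stand-in for the learned embedding theta
W_phi = 0.1 * rng.standard_normal((C, C_emb))      # stand-in for the learned embedding phi
w_f = rng.standard_normal(2 * C_emb)               # projection vector of the concatenation form

theta, phi = x @ W_theta, x @ W_phi

f_gaussian = np.exp(x @ x.T)                       # exp(x_i^T x_j)
f_embedded = np.exp(theta @ phi.T)                 # exp(theta(x_i)^T phi(x_j))
f_dot = theta @ phi.T                              # theta(x_i)^T phi(x_j)

# Concatenation: build [theta(x_i), phi(x_j)] for every pair, project with w_f, apply ReLU.
pairs = np.concatenate(
    [np.broadcast_to(theta[:, None, :], (N, N, C_emb)),
     np.broadcast_to(phi[None, :, :], (N, N, C_emb))],
    axis=-1,
)
f_concat = np.maximum(pairs @ w_f, 0.0)
```

Every variant yields an N x N matrix of pairwise weights; only the way the two positions are compared changes.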
$$y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$$
$$g(x_j) = W_g x_j$$
For Feature Extraction
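$$y_i = \frac{1}{\sum_j \exp(x_i^{T} x_j)} \sum_j \exp(x_i^{T} x_j)\, W_g x_j$$
[Diagram: Gaussian non-local operation. Tensor shapes: input HxWx1024; reshaped to HWx1024 and 1024xHW; similarity HWxHW followed by Softmax; 1x1 Conv to HxWx512, reshaped to HWx512; output HWx512, reshaped to HxWx512. Legend: Reshape, 1x1 Conv, Operation.]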
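The softmax in the diagram follows from writing the Gaussian form in matrix notation. The derivation below is a sketch under one assumption not stated on the slide: the positions x_i are stacked as the rows of a matrix X with N = HW rows, and W_g maps C channels to C' channels.

```latex
% Assumption: X \in \mathbb{R}^{N \times C} stacks the positions x_i as rows, N = HW,
% and W_g \in \mathbb{R}^{C' \times C}.
\begin{aligned}
A_{ij} &= \frac{\exp(x_i^{T} x_j)}{\sum_{k} \exp(x_i^{T} x_k)}
  \quad\Longleftrightarrow\quad
  A = \operatorname{softmax}_{\text{row}}\!\left(X X^{T}\right) \in \mathbb{R}^{N \times N},\\
Y &= A \, X W_g^{T} \in \mathbb{R}^{N \times C'},
  \qquad \text{row } i \text{ of } Y \text{ is } y_i^{T} = \sum_j A_{ij}\, (W_g x_j)^{T}.
\end{aligned}
```

With C = 1024 and C' = 512 this reproduces the shapes in the diagram: X Xᵀ is HWxHW and Y is HWx512.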
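$$y_i = \frac{1}{\sum_j \exp(\theta(x_i)^{T} \phi(x_j))} \sum_j \exp(\theta(x_i)^{T} \phi(x_j))\, W_g x_j$$
[Diagram: embedded Gaussian non-local operation. Tensor shapes: input HxWx1024; 1x1 Convs to HxWx512, reshaped to HWx512 and 512xHW; similarity HWxHW followed by Softmax; output HWx512, reshaped to HxWx512. Legend: Reshape, 1x1 Conv, Operation.]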
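The embedded Gaussian operation can be traced shape by shape. This is a minimal sketch with toy sizes (16 channels instead of 1024) and random weights standing in for the learned 1x1 convolutions; a 1x1 convolution is modelled as a per-position matrix product, and the function name is hypothetical.

```python
import numpy as np

def nonlocal_embedded_gaussian(x, W_theta, W_phi, W_g):
    """Embedded Gaussian non-local operation on a feature map x of shape (H, W, C)."""
    H, W, C = x.shape
    flat = x.reshape(H * W, C)                     # reshape: HW x C
    theta = flat @ W_theta                         # 1x1 conv: HW x C/2
    phi = flat @ W_phi                             # 1x1 conv: HW x C/2
    g = flat @ W_g                                 # 1x1 conv: HW x C/2
    logits = theta @ phi.T                         # HW x HW similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)  # softmax over j
    y = attn @ g                                   # HW x C/2
    return y.reshape(H, W, -1), attn               # reshape back: H x W x C/2

rng = np.random.default_rng(0)
H, W, C = 4, 4, 16                                 # toy sizes (the slide uses C = 1024)
x = rng.standard_normal((H, W, C))
W_theta, W_phi, W_g = (0.1 * rng.standard_normal((C, C // 2)) for _ in range(3))
y, attn = nonlocal_embedded_gaussian(x, W_theta, W_phi, W_g)
```

The HW x HW matrix `attn` is the only place where every position interacts with every other position.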
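$$y_i = \frac{1}{N} \sum_j \theta(x_i)^{T} \phi(x_j)\, W_g x_j$$
[Diagram: dot-product non-local operation. Tensor shapes: input HxWx1024; 1x1 Convs to HxWx512, reshaped to HWx512 and 512xHW; similarity HWxHW scaled by 1/N; output HWx512, reshaped to HxWx512. Legend: Reshape, 1x1 Conv, Operation.]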
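$$y_i = \frac{1}{N} \sum_j \mathrm{ReLU}(w_f^{T} [\theta(x_i), \phi(x_j)])\, W_g x_j$$
[Diagram: concatenation non-local operation. Tensor shapes: input HxWx1024; 1x1 Convs to HxWx512, reshaped to HWx512; intermediate shapes HWx512, HWx512, HWx1024 and 1024xHW; + ReLU giving HWxHW, scaled by 1/N; output HWx512, reshaped to HxWx512. Legend: Reshape, 1x1 Conv, Operation.]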
[Diagram: non-local block. Input HxWx1024 → NL Operation → HxWx512 → 1x1 Conv → HxWx1024, added to the input (Residual).]
$$z_i = W_z y_i + x_i$$
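The residual form has a useful consequence that can be checked directly: if W_z is zero, the block returns its input unchanged, so it can be inserted into an existing network without altering its behaviour at that point. The sketch below uses toy sizes and a random y as a stand-in for the NL operation output; the names are illustrative.

```python
import numpy as np

def nonlocal_block_output(x, y, W_z):
    """z_i = W_z y_i + x_i for all positions at once (rows are positions)."""
    return y @ W_z.T + x

rng = np.random.default_rng(0)
N, C, C_half = 6, 8, 4                       # toy sizes: positions, channels, reduced channels
x = rng.standard_normal((N, C))              # block input
y = rng.standard_normal((N, C_half))         # stand-in for the NL operation output
W_z = rng.standard_normal((C, C_half))       # 1x1 conv restoring C channels

z = nonlocal_block_output(x, y, W_z)
z_identity = nonlocal_block_output(x, y, np.zeros((C, C_half)))   # W_z = 0
```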
Another Representation of Non-Local Pixels
= Weighted Sum of All Pixels with Similarity
+ Learning?
$$y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$$
Recalibrate Features?
• Global Representation
• Global Context
• Long-range Dependencies
• Shorter Paths
Attention Is All You Need, 2017
Squeeze-and-Excitation Networks, CVPR 2018
Channel-wise Feature Recalibration
- SENet (ILSVRC 2017 Winner)
• 2 FC (Fully Connected) Layers Between Channels
• Excitation Layer Output (Representation) x Input = Output
• Squeeze-and-Excitation Networks (Channel-wise)
𝑥𝑖 / Learned Weights / 𝑥𝑗
• Self-Attention (Spatial)
Embedded 𝑥𝑖 / Similarity Weights / Embedded 𝑥𝑗 / Positional Encoding
• Non-local Neural Networks (Spatial/Temporal)
Embedded or Not 𝑥𝑖 / Similarity Weights / Embedded 𝑥𝑗
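For comparison, channel-wise recalibration in the SENet style can be sketched as follows. The slides only state "2 FC layers between channels" and "excitation output x input = output"; the global average pooling, the ReLU, the sigmoid gate, and the reduction ratio `r` used here are assumptions based on the usual SE block, and all weights are random stand-ins.

```python
import numpy as np

def se_block(x, W1, W2):
    """Channel-wise recalibration of a feature map x with shape (H, W, C)."""
    squeezed = x.mean(axis=(0, 1))                   # squeeze: one value per channel (assumed average pool)
    hidden = np.maximum(squeezed @ W1, 0.0)          # first FC layer + ReLU (assumed)
    scale = 1.0 / (1.0 + np.exp(-(hidden @ W2)))     # second FC layer + sigmoid gate (assumed)
    return x * scale, scale                          # excitation output x input = output

rng = np.random.default_rng(0)
H, W, C, r = 4, 4, 8, 2                              # toy sizes; r is a hypothetical reduction ratio
x = rng.standard_normal((H, W, C))
W1 = rng.standard_normal((C, C // r))
W2 = rng.standard_normal((C // r, C))
out, scale = se_block(x, W1, W2)
```

Here the weights are one learned scalar per channel, whereas the non-local block computes one similarity weight per pair of positions.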
Layer   Operation                                                 Repeat  Output Size
Conv1   7x7, 64, s=2                                              -       64x112x112
Pool1   3x3, s=2                                                  -       64x56x56
Res2    [1x1, 64 / 3x3, 64 / 1x1, 256] + [1x1, 256]               x3      256x56x56
Res3    [1x1, 128, s=2 / 3x3, 128 / 1x1, 512] + [1x1, 512]        x4      512x28x28
Res4    [1x1, 256, s=2 / 3x3, 256 / 1x1, 1024] + [1x1, 1024]      x6      1024x14x14
Res5    [1x1, 512, s=2 / 3x3, 512 / 1x1, 2048] + [1x1, 2048]      x3      2048x7x7
Pool2   7x7                                                       -       2048
FC      -                                                         -       # of Category
[Figure: ResNet-50 drawn stage by stage. Legend: Stride=2 Conv, Stride=1 Conv, Stride=2 Pool.
3x224x224 → 7x7 conv → 64x112x112 → pool → 64x56x56
→ Res2: 3 bottleneck blocks (1x1, 3x3, 1x1, plus 1x1 shortcut; channels 64, 64, 256) → 256x56x56
→ Res3: 4 blocks (channels 128, 128, 512) → 512x28x28
→ Res4: 6 blocks (channels 256, 256, 1024) → 1024x14x14
→ Res5: 3 blocks (channels 512, 512, 2048) → 2048x7x7
→ 7x7 pool → 2048x1x1 → FC → 1000]
Layer   Operation                                                 Repeat  Output Size
Conv1   7x7, 64, s=2                                              -       64x16x112x112
Pool1   3x3x3, s=2,2,2                                            -       64x8x56x56
Res2    [1x1, 64 / 3x3, 64 / 1x1, 256] + [1x1, 256]               x3      256x8x56x56
Pool_T  3x1x1, s=2,1,1                                            -       256x4x56x56
Res3    [1x1, 128, s=2 / 3x3, 128 / 1x1, 512] + [1x1, 512]        x4      512x4x28x28
Res4    [1x1, 256, s=2 / 3x3, 256 / 1x1, 1024] + [1x1, 1024]      x6      1024x4x14x14
Res5    [1x1, 512, s=2 / 3x3, 512 / 1x1, 2048] + [1x1, 2048]      x3      2048x4x7x7
Pool2   4x7x7                                                     -       2048x1
FC      -                                                         -       # of Category
Layer   Operation                                                 Repeat  Output Size
Conv1   5x7x7, 64, s=2                                            -       64x16x112x112
Pool1   3x3x3, s=2,2,2                                            -       64x8x56x56
Res2    [1x1, 64 / 3x3x3, 64 / 1x1, 256] + [1x1, 256]             x3      256x8x56x56
Pool_T  3x1x1, s=2,1,1                                            -       256x4x56x56
Res3    [1x1, 128, s=2 / 3x3x3, 128 / 1x1, 512] + [1x1, 512]      x4      512x4x28x28
Res4    [1x1, 256, s=2 / 3x3x3, 256 / 1x1, 1024] + [1x1, 1024]    x6      1024x4x14x14
Res5    [1x1, 512, s=2 / 3x3x3, 512 / 1x1, 2048] + [1x1, 2048]    x3      2048x4x7x7
Pool2   4x7x7                                                     -       2048x1
FC      -                                                         -       # of Category
Layer   Operation                                                 Repeat  Output Size
Conv1   5x7x7, 64, s=2                                            -       64x16x112x112
Pool1   3x3x3, s=2,2,2                                            -       64x8x56x56
Res2    [3x1x1, 64 / 3x3, 64 / 1x1, 256] + [1x1, 256]             x3      256x8x56x56
Pool_T  3x1x1, s=2,1,1                                            -       256x4x56x56
Res3    [3x1x1, 128, s=2 / 3x3, 128 / 1x1, 512] + [1x1, 512]      x4      512x4x28x28
Res4    [3x1x1, 256, s=2 / 3x3, 256 / 1x1, 1024] + [1x1, 1024]    x6      1024x4x14x14
Res5    [3x1x1, 512, s=2 / 3x3, 512 / 1x1, 2048] + [1x1, 2048]    x3      2048x4x7x7
Pool2   4x7x7                                                     -       2048x1
FC      -                                                         -       # of Category
Pretrained 3x3 → Copy x3 → Scale Weights by 1/3 → 3x3x3
Pretrained 1x1 → Copy x3 → Scale Weights by 1/3 → 3x1x1
• 2D Conv Training → 3D Conv Test
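The copy-and-scale initialisation can be sketched in NumPy. The kernel layout (out, in, kh, kw), the function name `inflate_kernel`, and the random stand-in for pretrained weights are assumptions for illustration.

```python
import numpy as np

def inflate_kernel(w2d, t):
    """Inflate a 2D conv kernel (out, in, kh, kw) to 3D (out, in, t, kh, kw).

    The 2D weights are copied t times along a new time axis and scaled by 1/t.
    """
    w3d = np.repeat(w2d[:, :, None, :, :], t, axis=2)
    return w3d / t

rng = np.random.default_rng(0)
w2d = rng.standard_normal((4, 3, 3, 3))      # stand-in for a pretrained 3x3 kernel
w3d = inflate_kernel(w2d, t=3)               # 3x3 -> 3x3x3

# Summing the inflated kernel over time recovers the 2D kernel, so a clip of
# identical frames gives the same response as the 2D kernel on a single frame.
recovered = w3d.sum(axis=2)
```

The same function with a 1x1 kernel of shape (out, in, 1, 1) gives the 3x1x1 case.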
• Kinetics Dataset
• ~246k Videos (Train)
• 20k Videos (Validation)
• 400 Human Action Categories
• Add 1 Non-local Block
• Right before the last residual block of res4
• The Attentional Behavior is Not the Key to the Improvement
• Similarity + Learning >> Similarity (Gaussian)
[Figure: Res4 stage of ResNet-50. 512x28x28 → 6 bottleneck blocks (1x1, 3x3, 1x1, plus 1x1 shortcut; channels 256, 256, 1024) → 1024x14x14. Legend: Stride=2 Conv, Stride=1 Conv, Stride=2 Pool.]
• Similar Improvement When the Block Is Added to res2, res3, or res4
• Except res5 (Feature Map Size Is Too Small)
• Long-range Multi-hop Communication
• Shallow 5-block ResNet50 > Deep baseline ResNet101
• Add NL Blocks > Add Residual Blocks
• Add 5 NL Blocks
• Spacetime >> Space = Time >> Baseline
• NL C2D > I3D (3D ConvNet Baseline)
• Smaller Number of FLOPS
• Add 5 NL Blocks
• 128 Frame (vs 32 Frame)
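The remark that the res5 feature map is too small can be made concrete by counting positions. The sketch below reads the (T, H, W) output sizes of each stage from the video tables above and computes how many positions, and how many position pairs, a non-local block would see there; treating the block as acting on the stage output is an assumption made for this estimate.

```python
# (T, H, W) of each stage's output, read from the video network tables.
stage_shapes = {
    "res2": (8, 56, 56),
    "res3": (4, 28, 28),
    "res4": (4, 14, 14),
    "res5": (4, 7, 7),
}

# N = T*H*W positions; the similarity matrix f(x_i, x_j) has N*N entries.
positions = {name: t * h * w for name, (t, h, w) in stage_shapes.items()}
pairs = {name: n * n for name, n in positions.items()}

for name in stage_shapes:
    print(f"{name}: N = {positions[name]:>6}, N*N = {pairs[name]:>12}")
```

At res5 only 196 spacetime positions remain (49 per frame), so there is little spatial detail left to relate, while at res2 the similarity matrix becomes very large.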
Pr083 Non-local Neural Networks
Pr083 Non-local Neural Networks
Pr083 Non-local Neural Networks

More Related Content

What's hot

Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Simplilearn
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
Learning spatiotemporal features with 3 d convolutional networks
Learning spatiotemporal features with 3 d convolutional networksLearning spatiotemporal features with 3 d convolutional networks
Learning spatiotemporal features with 3 d convolutional networksSungminYou
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationGioele Ciaparrone
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkFerdous ahmed
 
U-Netpresentation.pptx
U-Netpresentation.pptxU-Netpresentation.pptx
U-Netpresentation.pptxNoorUlHaq47
 
Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingDerek Kane
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNNAshray Bhandare
 
Unsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANUnsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANShyam Krishna Khadka
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Muhammad Haroon
 
[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)Susang Kim
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNNShuai Zhang
 
Attn-gan : fine-grained text to image generation
Attn-gan :  fine-grained text to image generationAttn-gan :  fine-grained text to image generation
Attn-gan : fine-grained text to image generationKyuYeolJung
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn
 
Image to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GANImage to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GANS.Shayan Daneshvar
 
CNNの可視化手法Grad-CAMの紹介~CNNさん、あなたはどこを見ているの?~ | OHS勉強会#6
CNNの可視化手法Grad-CAMの紹介~CNNさん、あなたはどこを見ているの?~ | OHS勉強会#6CNNの可視化手法Grad-CAMの紹介~CNNさん、あなたはどこを見ているの?~ | OHS勉強会#6
CNNの可視化手法Grad-CAMの紹介~CNNさん、あなたはどこを見ているの?~ | OHS勉強会#6Toshinori Hanya
 
Introduction to Grad-CAM (short version)
Introduction to Grad-CAM (short version)Introduction to Grad-CAM (short version)
Introduction to Grad-CAM (short version)Hsing-chuan Hsieh
 
Object Pose Estimation
Object Pose EstimationObject Pose Estimation
Object Pose EstimationArithmer Inc.
 

What's hot (20)

Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
Convolutional Neural Network - CNN | How CNN Works | Deep Learning Course | S...
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Learning spatiotemporal features with 3 d convolutional networks
Learning spatiotemporal features with 3 d convolutional networksLearning spatiotemporal features with 3 d convolutional networks
Learning spatiotemporal features with 3 d convolutional networks
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
U-Netpresentation.pptx
U-Netpresentation.pptxU-Netpresentation.pptx
U-Netpresentation.pptx
 
Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image Processing
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNN
 
Unsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGANUnsupervised learning represenation with DCGAN
Unsupervised learning represenation with DCGAN
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
 
[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)[Paper] Multiscale Vision Transformers(MVit)
[Paper] Multiscale Vision Transformers(MVit)
 
Introduction to CNN
Introduction to CNNIntroduction to CNN
Introduction to CNN
 
Attn-gan : fine-grained text to image generation
Attn-gan :  fine-grained text to image generationAttn-gan :  fine-grained text to image generation
Attn-gan : fine-grained text to image generation
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)
 
Image to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GANImage to image translation with Pix2Pix GAN
Image to image translation with Pix2Pix GAN
 
CNNの可視化手法Grad-CAMの紹介~CNNさん、あなたはどこを見ているの?~ | OHS勉強会#6
CNNの可視化手法Grad-CAMの紹介~CNNさん、あなたはどこを見ているの?~ | OHS勉強会#6CNNの可視化手法Grad-CAMの紹介~CNNさん、あなたはどこを見ているの?~ | OHS勉強会#6
CNNの可視化手法Grad-CAMの紹介~CNNさん、あなたはどこを見ているの?~ | OHS勉強会#6
 
Introduction to Grad-CAM (short version)
Introduction to Grad-CAM (short version)Introduction to Grad-CAM (short version)
Introduction to Grad-CAM (short version)
 
Yolo
YoloYolo
Yolo
 
Object Pose Estimation
Object Pose EstimationObject Pose Estimation
Object Pose Estimation
 

Similar to Pr083 Non-local Neural Networks

Elementary Linear Algebra 5th Edition Larson Solutions Manual
Elementary Linear Algebra 5th Edition Larson Solutions ManualElementary Linear Algebra 5th Edition Larson Solutions Manual
Elementary Linear Algebra 5th Edition Larson Solutions Manualzuxigytix
 
Factoring common monomial
Factoring common monomialFactoring common monomial
Factoring common monomialAjayQuines
 
Introduction to machine learning algorithms
Introduction to machine learning algorithmsIntroduction to machine learning algorithms
Introduction to machine learning algorithmsbigdata trunk
 
graphical-copy-130308123000-phpapp02.pptx
graphical-copy-130308123000-phpapp02.pptxgraphical-copy-130308123000-phpapp02.pptx
graphical-copy-130308123000-phpapp02.pptxABHIJEETKUMAR992494
 
Pre-calculus 1, 2 and Calculus I (exam notes)
Pre-calculus 1, 2 and Calculus I (exam notes)Pre-calculus 1, 2 and Calculus I (exam notes)
Pre-calculus 1, 2 and Calculus I (exam notes)William Faber
 
College algebra in context 5th edition harshbarger solutions manual
College algebra in context 5th edition harshbarger solutions manualCollege algebra in context 5th edition harshbarger solutions manual
College algebra in context 5th edition harshbarger solutions manualAnnuzzi19
 
Student manual
Student manualStudent manual
Student manualec931657
 
Practical and Worst-Case Efficient Apportionment
Practical and Worst-Case Efficient ApportionmentPractical and Worst-Case Efficient Apportionment
Practical and Worst-Case Efficient ApportionmentRaphael Reitzig
 
GCSEYr9-SolvingQuadratics.pptx
GCSEYr9-SolvingQuadratics.pptxGCSEYr9-SolvingQuadratics.pptx
GCSEYr9-SolvingQuadratics.pptxAngelle Pantig
 
五次方程式は解けない - 第12回 #日曜数学会
五次方程式は解けない - 第12回 #日曜数学会五次方程式は解けない - 第12回 #日曜数学会
五次方程式は解けない - 第12回 #日曜数学会Junpei Tsuji
 
Direct solution of sparse network equations by optimally ordered triangular f...
Direct solution of sparse network equations by optimally ordered triangular f...Direct solution of sparse network equations by optimally ordered triangular f...
Direct solution of sparse network equations by optimally ordered triangular f...Dimas Ruliandi
 
4x4 multiplication in Vedic Mathematics
4x4 multiplication in Vedic Mathematics4x4 multiplication in Vedic Mathematics
4x4 multiplication in Vedic Mathematicsculturalcomputingindia
 
Umbra Ignite 2015: Rulon Raymond – The State of Skinning – a dive into modern...
Umbra Ignite 2015: Rulon Raymond – The State of Skinning – a dive into modern...Umbra Ignite 2015: Rulon Raymond – The State of Skinning – a dive into modern...
Umbra Ignite 2015: Rulon Raymond – The State of Skinning – a dive into modern...Umbra Software
 
Algebra Trigonometry Problems
Algebra Trigonometry ProblemsAlgebra Trigonometry Problems
Algebra Trigonometry ProblemsDon Dooley
 
College algebra real mathematics real people 7th edition larson solutions manual
College algebra real mathematics real people 7th edition larson solutions manualCollege algebra real mathematics real people 7th edition larson solutions manual
College algebra real mathematics real people 7th edition larson solutions manualJohnstonTBL
 
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdf
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdfdiffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdf
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdfChung Hyung Jin
 

Similar to Pr083 Non-local Neural Networks (20)

Elementary Linear Algebra 5th Edition Larson Solutions Manual
Elementary Linear Algebra 5th Edition Larson Solutions ManualElementary Linear Algebra 5th Edition Larson Solutions Manual
Elementary Linear Algebra 5th Edition Larson Solutions Manual
 
Factoring common monomial
Factoring common monomialFactoring common monomial
Factoring common monomial
 
Introduction to machine learning algorithms
Introduction to machine learning algorithmsIntroduction to machine learning algorithms
Introduction to machine learning algorithms
 
graphical-copy-130308123000-phpapp02.pptx
graphical-copy-130308123000-phpapp02.pptxgraphical-copy-130308123000-phpapp02.pptx
graphical-copy-130308123000-phpapp02.pptx
 
Pre-calculus 1, 2 and Calculus I (exam notes)
Pre-calculus 1, 2 and Calculus I (exam notes)Pre-calculus 1, 2 and Calculus I (exam notes)
Pre-calculus 1, 2 and Calculus I (exam notes)
 
College algebra in context 5th edition harshbarger solutions manual
College algebra in context 5th edition harshbarger solutions manualCollege algebra in context 5th edition harshbarger solutions manual
College algebra in context 5th edition harshbarger solutions manual
 
Student manual
Student manualStudent manual
Student manual
 
FUNCTIONS L.1.pdf
FUNCTIONS L.1.pdfFUNCTIONS L.1.pdf
FUNCTIONS L.1.pdf
 
Practical and Worst-Case Efficient Apportionment
Practical and Worst-Case Efficient ApportionmentPractical and Worst-Case Efficient Apportionment
Practical and Worst-Case Efficient Apportionment
 
EPCA_MODULE-2.pptx
EPCA_MODULE-2.pptxEPCA_MODULE-2.pptx
EPCA_MODULE-2.pptx
 
GCSEYr9-SolvingQuadratics.pptx
GCSEYr9-SolvingQuadratics.pptxGCSEYr9-SolvingQuadratics.pptx
GCSEYr9-SolvingQuadratics.pptx
 
Deep learning
Deep learningDeep learning
Deep learning
 
五次方程式は解けない - 第12回 #日曜数学会
五次方程式は解けない - 第12回 #日曜数学会五次方程式は解けない - 第12回 #日曜数学会
五次方程式は解けない - 第12回 #日曜数学会
 
Direct solution of sparse network equations by optimally ordered triangular f...
Direct solution of sparse network equations by optimally ordered triangular f...Direct solution of sparse network equations by optimally ordered triangular f...
Direct solution of sparse network equations by optimally ordered triangular f...
 
4x4 multiplication in Vedic Mathematics
4x4 multiplication in Vedic Mathematics4x4 multiplication in Vedic Mathematics
4x4 multiplication in Vedic Mathematics
 
Umbra Ignite 2015: Rulon Raymond – The State of Skinning – a dive into modern...
Umbra Ignite 2015: Rulon Raymond – The State of Skinning – a dive into modern...Umbra Ignite 2015: Rulon Raymond – The State of Skinning – a dive into modern...
Umbra Ignite 2015: Rulon Raymond – The State of Skinning – a dive into modern...
 
Algebra Trigonometry Problems
Algebra Trigonometry ProblemsAlgebra Trigonometry Problems
Algebra Trigonometry Problems
 
College algebra real mathematics real people 7th edition larson solutions manual
College algebra real mathematics real people 7th edition larson solutions manualCollege algebra real mathematics real people 7th edition larson solutions manual
College algebra real mathematics real people 7th edition larson solutions manual
 
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdf
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdfdiffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdf
diffusion_posterior_sampling_for_general_noisy_inverse_problems_slideshare.pdf
 
Game theory lecture
Game theory   lectureGame theory   lecture
Game theory lecture
 

More from Taeoh Kim

CNN Attention Networks
CNN Attention NetworksCNN Attention Networks
CNN Attention NetworksTaeoh Kim
 
PR 127: FaceNet
PR 127: FaceNetPR 127: FaceNet
PR 127: FaceNetTaeoh Kim
 
PR 113: The Perception Distortion Tradeoff
PR 113: The Perception Distortion TradeoffPR 113: The Perception Distortion Tradeoff
PR 113: The Perception Distortion TradeoffTaeoh Kim
 
PR 103: t-SNE
PR 103: t-SNEPR 103: t-SNE
PR 103: t-SNETaeoh Kim
 
Pr072 deep compression
Pr072 deep compressionPr072 deep compression
Pr072 deep compressionTaeoh Kim
 
Pr057 mask rcnn
Pr057 mask rcnnPr057 mask rcnn
Pr057 mask rcnnTaeoh Kim
 
Pr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationPr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationTaeoh Kim
 

More from Taeoh Kim (7)

CNN Attention Networks
CNN Attention NetworksCNN Attention Networks
CNN Attention Networks
 
PR 127: FaceNet
PR 127: FaceNetPR 127: FaceNet
PR 127: FaceNet
 
PR 113: The Perception Distortion Tradeoff
PR 113: The Perception Distortion TradeoffPR 113: The Perception Distortion Tradeoff
PR 113: The Perception Distortion Tradeoff
 
PR 103: t-SNE
PR 103: t-SNEPR 103: t-SNE
PR 103: t-SNE
 
Pr072 deep compression
Pr072 deep compressionPr072 deep compression
Pr072 deep compression
 
Pr057 mask rcnn
Pr057 mask rcnnPr057 mask rcnn
Pr057 mask rcnn
 
Pr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationPr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentation
 

Recently uploaded

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfJiananWang21
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...tanu pandey
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...SUHANI PANDEY
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 

Recently uploaded (20)

Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7

Pr083 Non-local Neural Networks

  • 1. PR 083 (29th April, 2018) Taeoh Kim
  • 2. • CVPR18 Poster • Inspired by Non-local Means
  • 3.
  • 4.
  • 9. • Average Similar Pixels • Do not Average non-Similar Pixels Problem) Not Enough Similar Pixels in LOCAL REGIONS
  • 10. • Average Similar Pixels • Do not Average non-Similar Pixels Problem) Not Enough Similar Pixels in LOCAL REGIONS → Get More Samples in Non-LOCAL REGIONS
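  • 14. Slides from BIL717 Image Processing 2012. Non-local means filter: $\mathrm{NLMF}[I]_p = \frac{1}{W} \sum_q G_\sigma(\lVert V_p - V_q \rVert_2)\, I_q$. Box average: $\mathrm{BA}[I]_p = \frac{1}{W} \sum_q I_q$. Gaussian filter: $\mathrm{G}[I]_p = \frac{1}{W} \sum_q G_\sigma(\lVert p - q \rVert_2)\, I_q$. Bilateral filter: $\mathrm{BF}[I]_p = \frac{1}{W} \sum_q G_\sigma(\lVert p - q \rVert_2)\, G_{\sigma_r}(\lVert I_p - I_q \rVert_1)\, I_q$.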
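The difference between local averaging and non-local means can be sketched on a 1D signal. This is a minimal illustration, not code from the slides: the helper names (`patch`, `box_average`, `non_local_means`), the border clamping, and the use of the squared L2 patch distance inside the Gaussian are all assumptions made for the example.

```python
import math

def gaussian(d2, sigma):
    """Gaussian weight for a squared distance d2 (assumed form of G_sigma)."""
    return math.exp(-d2 / (2.0 * sigma ** 2))

def patch(signal, p, radius):
    """Neighborhood V_p around index p, clamped at the borders (assumption)."""
    n = len(signal)
    return [signal[min(max(p + k, 0), n - 1)] for k in range(-radius, radius + 1)]

def box_average(signal, p, radius):
    """BA[I]_p: plain mean over the local window around p."""
    window = patch(signal, p, radius)
    return sum(window) / len(window)

def non_local_means(signal, p, radius, sigma):
    """NLMF[I]_p: average over ALL positions q, weighted by patch similarity."""
    vp = patch(signal, p, radius)
    weights = []
    for q in range(len(signal)):
        vq = patch(signal, q, radius)
        d2 = sum((a - b) ** 2 for a, b in zip(vp, vq))
        weights.append(gaussian(d2, sigma))
    w_total = sum(weights)  # the normalizer W
    return sum(w * v for w, v in zip(weights, signal)) / w_total

# A repeating step pattern: the patch around index 3 reappears around index 9.
signal = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(box_average(signal, 3, 1))                 # blurs the edge
print(non_local_means(signal, 3, 1, sigma=0.1))  # averages only matching patches
```

The local box average mixes values from both sides of the edge, while the non-local version finds the matching patch far away and keeps the edge sharp.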
  • 15. $\mathrm{NLMF}[I]_p = \frac{1}{W} \sum_q G_\sigma(\lVert V_p - V_q \rVert_2)\, I_q$. Annotations: Output Value; Representation (Probability Distribution); Target Value (Pixel) vs. All Values (Pixel); Inputs.
  • 16.
  • 17. PR-049, Attention Is All You Need
  • 18. $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^T}{\sqrt{d_k}}\right) V$. Annotations: Output Value; Representation (Probability Distribution); Target Value (Query) vs. All Values (Keys); Inputs. (PR-049, Attention Is All You Need)
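The scaled dot-product attention formula on this slide can be sketched in plain Python. This is an illustrative sketch only: the list-of-rows representation and the helper names `softmax` and `attention` are assumptions for the example, and no masking or batching is included.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Target value (query) compared against all values (keys).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        probs = softmax(scores)  # the probability distribution over keys
        out.append([sum(p * v[c] for p, v in zip(probs, V))
                    for c in range(len(V[0]))])
    return out

K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
# A zero query scores every key equally, so the output is the mean of V.
print(attention([[0.0, 0.0]], K, V))
```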
  • 19. $\mathrm{MultiHead}(Q, K, V) = W \cdot \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)$, where $\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$. (PR-049, Attention Is All You Need)
  • 20.
  • 21. $\mathrm{NLMF}[I]_p = \frac{1}{W} \sum_q G_\sigma(\lVert V_p - V_q \rVert_2)\, I_q$. Annotations: Output Value; Representation (Probability Distribution); Target Value (Pixel) vs. All Values (Pixel); Inputs.
  • 22. $y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$. Annotations: Output Value; Representation (Probability Distribution); Target Value (Pixel) vs. All Values (Pixel); Inputs.
  • 23. Another Representation of Input Local Pixels = Weighted Sum of Local Pixels with Learned Filter: $y_i = \frac{1}{C(x)} \sum_{j \in 3 \times 3} w_j\, g(x_j)$
  • 24. Another Representation of Non-Local Pixels = Weighted Sum of All Pixels with Similarity: $y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$
  • 25. Another Representation of Non-Local Pixels = Weighted Sum of All Pixels with Similarity + Learning…: $y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$
  • 26. $y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$ • Gaussian: $f(x_i, x_j) = \exp(x_i^T x_j)$ • Embedded Gaussian: $f(x_i, x_j) = \exp(\theta(x_i)^T \phi(x_j))$ • Dot Product: $f(x_i, x_j) = \theta(x_i)^T \phi(x_j)$ • Concatenation: $f(x_i, x_j) = \mathrm{ReLU}(w_f^T [\theta(x_i), \phi(x_j)])$
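The generic non-local operation and the four pairwise functions listed above can be sketched as follows. This is a minimal sketch under stated assumptions: embeddings are plain weight matrices (`W_theta`, `W_phi`) applied per position, `w_f` is a weight vector over the concatenated embeddings, and the names `make_pairwise` and `non_local` are hypothetical. The normalizer is passed in explicitly: `sum` for the Gaussian variants and `len` (that is, N) for dot product and concatenation.

```python
import math

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def matvec(W, x):
    """Apply a linear embedding W (list of rows) to a feature vector x."""
    return [dot(row, x) for row in W]

def make_pairwise(kind, W_theta=None, W_phi=None, w_f=None):
    """Return f(x_i, x_j) for one of the four choices on the slide."""
    if kind == "gaussian":
        return lambda xi, xj: math.exp(dot(xi, xj))
    if kind == "embedded_gaussian":
        return lambda xi, xj: math.exp(dot(matvec(W_theta, xi), matvec(W_phi, xj)))
    if kind == "dot_product":
        return lambda xi, xj: dot(matvec(W_theta, xi), matvec(W_phi, xj))
    if kind == "concatenation":
        # List addition concatenates the two embedded vectors.
        return lambda xi, xj: max(
            0.0, dot(w_f, matvec(W_theta, xi) + matvec(W_phi, xj)))
    raise ValueError(kind)

def non_local(xs, f, g, normalizer):
    """y_i = 1/C(x) * sum_j f(x_i, x_j) g(x_j) over all positions j."""
    gs = [g(xj) for xj in xs]
    ys = []
    for xi in xs:
        weights = [f(xi, xj) for xj in xs]
        c = normalizer(weights)  # C(x): sum of weights, or N
        ys.append([sum(w * gj[k] for w, gj in zip(weights, gs)) / c
                   for k in range(len(gs[0]))])
    return ys

xs = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(non_local(xs, make_pairwise("gaussian"), lambda x: x, sum))
print(non_local(xs, make_pairwise("dot_product", identity, identity), lambda x: x, len))
```

With the Gaussian choice and the sum normalizer, each output is a softmax-weighted average of all positions, which is the link to self-attention drawn in the earlier slides.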
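  • 27. $y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$, with $g(x_j) = W_g x_j$ (for feature extraction).
  • 28. Gaussian: $y_i = \frac{1}{\sum_j \exp(x_i^T x_j)} \sum_j \exp(x_i^T x_j)\, W_g x_j$. Block diagram (legend: Reshape, 1x1 Conv, Operation; normalization: Softmax), tensor shapes as listed: HxWx1024, HxWx512, HWx512, HWx1024, 1024xHW, HWxHW, HWx512, HxWx512.
  • 29. Embedded Gaussian: $y_i = \frac{1}{\sum_j \exp(\theta(x_i)^T \phi(x_j))} \sum_j \exp(\theta(x_i)^T \phi(x_j))\, W_g x_j$. Block diagram (legend: Reshape, 1x1 Conv, Operation; normalization: Softmax), tensor shapes as listed: HxWx1024, HxWx512, HWx512, HWx512, 512xHW, HWxHW, HWx512, HxWx512.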
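The tensor shapes in the embedded Gaussian diagram can be traced step by step. The sketch below is an assumption-laden reading of the diagram: which tensor each listed shape belongs to is inferred from the formula, not stated in the transcript, and the function name `embedded_gaussian_shapes` is hypothetical.

```python
def embedded_gaussian_shapes(H, W, C_in=1024, C_mid=512):
    """Trace shapes through an embedded Gaussian non-local block (inferred layout)."""
    HW = H * W
    return [
        ("input x", (H, W, C_in)),
        ("theta(x), phi(x), g(x) after 1x1 conv", (H, W, C_mid)),
        ("theta(x) and g(x) reshaped", (HW, C_mid)),
        ("phi(x) reshaped and transposed", (C_mid, HW)),
        ("softmax(theta(x) phi(x)^T)", (HW, HW)),
        ("similarity matrix times g(x)", (HW, C_mid)),
        ("output y reshaped", (H, W, C_mid)),
    ]

for name, shape in embedded_gaussian_shapes(14, 14):
    print(f"{name:40s} {shape}")
```

The HWxHW similarity matrix grows quadratically with the number of positions, so the block is cheaper on smaller feature maps.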
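  • 30. Dot Product: $y_i = \frac{1}{N} \sum_j \theta(x_i)^T \phi(x_j)\, W_g x_j$. Block diagram (legend: Reshape, 1x1 Conv, Operation; normalization: 1/N), tensor shapes as listed: HxWx1024, HxWx512, HWx512, HWx512, 512xHW, HWxHW, HWx512, HxWx512.
  • 31. Concatenation: $y_i = \frac{1}{N} \sum_j \mathrm{ReLU}(w_f^T [\theta(x_i), \phi(x_j)])\, W_g x_j$. Block diagram (legend: Reshape, 1x1 Conv, Operation; normalization: 1/N), tensor shapes as listed: HxWx1024, HxWx512, HWx512, HWx512, HWx512, HWx1024, HWx512, HxWx512, 1024xHW, +ReLU, HWxHW.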
  • 33.
  • 34. Another Representation of Non-Local Pixels = Weighted Sum of All Pixels with Similarity + Learning? $y_i = \frac{1}{C(x)} \sum_j f(x_i, x_j)\, g(x_j)$
  • 35. Recalibrate Features? • Global Representation • Global Context • Long-range Dependencies • Shorter Paths
  • 37. Squeeze-and-Excitation Networks, CVPR 2018. Channel-wise Feature Recalibration - SENet (ILSVRC 2017 Winner) • 2 FC (Fully Connected) Layers Between Channels • Excitation Layer Output (Representation) × Input = Output
  • 38. • Squeeze-and-Excitation Networks (Channel-wise): $x_i$ / Learned Weights / $x_j$ • Self-Attention (Spatial): Embedded $x_i$ / Similarity Weights / Embedded $x_j$ / Positional Encoding • Non-local Neural Networks (Spatial/Temporal): Embedded or Not $x_i$ / Similarity Weights / Embedded $x_j$
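The channel-wise recalibration described on this slide can be sketched in a few lines. This is a simplified illustration, not the SENet reference code: biases are omitted, the ReLU and sigmoid activations follow the Squeeze-and-Excitation paper, and the name `se_block` and the flat per-channel list layout are assumptions.

```python
import math

def se_block(feature_map, W1, W2):
    """feature_map: list of C channels, each a flat list of spatial values."""
    # Squeeze: global average pooling gives one number per channel.
    z = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid (two FC layers between channels).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in W1]
    scale = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in W2]
    # Recalibration: excitation output x input = output.
    return [[s * v for v in ch] for s, ch in zip(scale, feature_map)]

features = [[1.0, 2.0], [3.0, 4.0]]  # C = 2 channels, 2 spatial positions each
W1 = [[1.0, 1.0]]                    # 2 channels -> 1 hidden unit
W2 = [[0.0], [0.0]]                  # zero weights give a sigmoid output of 0.5
print(se_block(features, W1, W2))
```

Unlike the non-local block, the weights here are learned per channel and do not depend on the pairwise similarity between positions.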
  • 39.
  • 40. Layer / Operation / Repeat / Output Size:
      Conv1 | 7x7, 64, s=2 | | 64x112x112
      Pool1 | 3x3, s=2 | | 64x56x56
      Res2  | [1x1, 64 / 3x3, 64 / 1x1, 256] + [1x1, 256] | x3 | 256x56x56
      Res3  | [1x1, 128, s=2 / 3x3, 128 / 1x1, 512] + [1x1, 512] | x4 | 512x28x28
      Res4  | [1x1, 256, s=2 / 3x3, 256 / 1x1, 1024] + [1x1, 1024] | x6 | 1024x14x14
      Res5  | [1x1, 512, s=2 / 3x3, 512 / 1x1, 2048] + [1x1, 2048] | x3 | 2048x7x7
      Pool2 | 7x7 | | 2048
      FC    | | | # of Category
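The Output Size column follows from the strides in the table. The sketch below reproduces the arithmetic under two assumptions: the input is 224x224, and padding is chosen so that every stride-2 stage exactly halves the spatial size. The function name `resnet50_output_sizes` is hypothetical.

```python
def resnet50_output_sizes(input_size=224):
    """Return (layer, channels, height, width) per stage, halving at stride 2."""
    # (layer name, output channels, stride), read off the table above.
    stages = [
        ("Conv1", 64, 2),
        ("Pool1", 64, 2),
        ("Res2", 256, 1),
        ("Res3", 512, 2),
        ("Res4", 1024, 2),
        ("Res5", 2048, 2),
    ]
    size = input_size
    rows = []
    for name, channels, stride in stages:
        size //= stride
        rows.append((name, channels, size, size))
    return rows

for name, c, h, w in resnet50_output_sizes():
    print(f"{name}: {c}x{h}x{w}")
```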
  • 41. Diagram (legend: Stride=2 Conv, Stride=1 Conv, Stride=2 Pool): 3x224x224 input → 7x7 conv → 64x112x112 → pool → 64x56x56 → three bottleneck blocks (1x1, 3x3, 1x1 convs with 64, 64, 256 channels, plus a 1x1 shortcut with 256 channels) → 256x56x56.
  • 42. Diagram (legend: Stride=2 Conv, Stride=1 Conv, Stride=2 Pool): 256x56x56 → four bottleneck blocks (1x1, 3x3, 1x1 convs with 128, 128, 512 channels, plus a 1x1 shortcut with 512 channels) → 512x28x28.
  • 43. Diagram (legend: Stride=2 Conv, Stride=1 Conv, Stride=2 Pool): 512x28x28 → six bottleneck blocks (1x1, 3x3, 1x1 convs with 256, 256, 1024 channels, plus a 1x1 shortcut with 1024 channels) → 1024x14x14.
  • 44. Diagram (legend: Stride=2 Conv, Stride=1 Conv, Stride=2 Pool): 1024x14x14 → three bottleneck blocks (1x1, 3x3, 1x1 convs with 512, 512, 2048 channels, plus a 1x1 shortcut with 2048 channels) → 2048x7x7 → 7x7 pool → 2048x1x1 → FC → 1000.
  • 45. Layer / Operation / Repeat / Output Size: same table as slide 40.
  • 46. Layer / Operation / Repeat / Output Size:
      Conv1  | 7x7, 64, s=2 | | 64x16x112x112
      Pool1  | 3x3x3, s=2,2,2 | | 64x8x56x56
      Res2   | [1x1, 64 / 3x3, 64 / 1x1, 256] + [1x1, 256] | x3 | 256x8x56x56
      Pool_T | 3x1x1, s=2,1,1 | | 256x4x56x56
      Res3   | [1x1, 128, s=2 / 3x3, 128 / 1x1, 512] + [1x1, 512] | x4 | 512x4x28x28
      Res4   | [1x1, 256, s=2 / 3x3, 256 / 1x1, 1024] + [1x1, 1024] | x6 | 1024x4x14x14
      Res5   | [1x1, 512, s=2 / 3x3, 512 / 1x1, 2048] + [1x1, 2048] | x3 | 2048x4x7x7
      Pool2  | 4x7x7 | | 2048x1
      FC     | | | # of Category
  • 47. Layer / Operation / Repeat / Output Size:
      Conv1  | 5x7x7, 64, s=2 | | 64x16x112x112
      Pool1  | 3x3x3, s=2,2,2 | | 64x8x56x56
      Res2   | [1x1, 64 / 3x3x3, 64 / 1x1, 256] + [1x1, 256] | x3 | 256x8x56x56
      Pool_T | 3x1x1, s=2,1,1 | | 256x4x56x56
      Res3   | [1x1, 128, s=2 / 3x3x3, 128 / 1x1, 512] + [1x1, 512] | x4 | 512x4x28x28
      Res4   | [1x1, 256, s=2 / 3x3x3, 256 / 1x1, 1024] + [1x1, 1024] | x6 | 1024x4x14x14
      Res5   | [1x1, 512, s=2 / 3x3x3, 512 / 1x1, 2048] + [1x1, 2048] | x3 | 2048x4x7x7
      Pool2  | 4x7x7 | | 2048x1
      FC     | | | # of Category
  • 48. Layer / Operation / Repeat / Output Size:
      Conv1  | 5x7x7, 64, s=2 | | 64x16x112x112
      Pool1  | 3x3x3, s=2,2,2 | | 64x8x56x56
      Res2   | [3x1x1, 64 / 3x3, 64 / 1x1, 256] + [1x1, 256] | x3 | 256x8x56x56
      Pool_T | 3x1x1, s=2,1,1 | | 256x4x56x56
      Res3   | [3x1x1, 128, s=2 / 3x3, 128 / 1x1, 512] + [1x1, 512] | x4 | 512x4x28x28
      Res4   | [3x1x1, 256, s=2 / 3x3, 256 / 1x1, 1024] + [1x1, 1024] | x6 | 1024x4x14x14
      Res5   | [3x1x1, 512, s=2 / 3x3, 512 / 1x1, 2048] + [1x1, 2048] | x3 | 2048x4x7x7
      Pool2  | 4x7x7 | | 2048x1
      FC     | | | # of Category
  • 49. Pretrained 3x3 → Copy x3, Scale Weights by 1/3 → 3x3x3. Pretrained 1x1 → Copy x3, Scale Weights by 1/3 → 3x1x1. • 2D Conv Training → 3D Conv Test
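The inflation step (copy the pretrained 2D kernel along time, then scale by 1/t) can be checked numerically at a single output location. This is a minimal sketch: the helper names `inflate`, `response_2d`, and `response_3d` are hypothetical, and the example evaluates one kernel-sized patch instead of a full convolution.

```python
def inflate(kernel_2d, t):
    """Copy a 2D kernel t times along the time axis and scale weights by 1/t."""
    return [[[w / t for w in row] for row in kernel_2d] for _ in range(t)]

def response_2d(patch, kernel):
    """Filter response for one kernel-sized 2D patch."""
    return sum(p * w for prow, krow in zip(patch, kernel)
               for p, w in zip(prow, krow))

def response_3d(clip, kernel_3d):
    """Filter response for t stacked patches against a t-deep 3D kernel."""
    return sum(response_2d(frame, k2d) for frame, k2d in zip(clip, kernel_3d))

kernel = [[1.0, 2.0], [3.0, 4.0]]
frame = [[1.0, 0.0], [0.0, 1.0]]
static_clip = [frame] * 3  # the same frame repeated in time

print(response_2d(frame, kernel))
print(response_3d(static_clip, inflate(kernel, 3)))
```

On a clip made of one frame repeated in time, the inflated 3D kernel gives the same response as the original 2D kernel on that frame, which is why the 2D pretrained weights are a reasonable starting point.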
  • 50.
  • 51. • Kinetics Dataset • ~246k Videos (Train) • 20k Videos (Validation) • 400 Human Action Categories
  • 52. • Add 1 Non-local Block • Right before the last residual block of res4 • The Attentional Behavior is Not the Key to the Improvement • Similarity + Learning >> Similarity (Gaussian)
  • 53. Diagram (legend: Stride=2 Conv, Stride=1 Conv, Stride=2 Pool): 512x28x28 → six bottleneck blocks (1x1, 3x3, 1x1 convs with 256, 256, 1024 channels, plus a 1x1 shortcut with 1024 channels) → 1024x14x14.
  • 54. • Similar • Except res5 (Feature Map size is too Small)
  • 55. • Long-range Multi-hop Communication • Shallow 5-block ResNet50 > Deep baseline ResNet101 • Add NL Blocks > Add Residual Blocks
  • 56. • Add 5 NL Blocks • Spacetime >> Space = Time >> Baseline
  • 57. • NL C2D > I3D (3D ConvNet Baseline) • Smaller Number of FLOPS
  • 58. • Add 5 NL Blocks
  • 59. • 128 Frames (vs. 32 Frames)