Paper Reviews in
Visual Attention
1
2018.3.29
SNU DATAMINING CENTER
MINKI CHUNG
WHO AM I 2
▸ Chung Minki
▸ BS, KAIST, IE, 2016
▸ MS, SNU, IE, 2018..?!
▸ Vision Projects
▸ Working on Semantic Image Inpainting
WHAT IS VISUAL ATTENTION 3
▸ Attention is HOT nowadays
▸ http://openaccess.thecvf.com/CVPR2017_search.py
▸ http://search.iclr2018.smerity.com/search/?query=attention
WHAT IS VISUAL ATTENTION 4
▸ You may have heard of
▸ "Neural Machine Translation by Jointly Learning to Align and Translate"
▸ "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015, ICLR. "Neural Machine Translation by Jointly Learning to Align and Translate"
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015, ICML. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
WHAT IS VISUAL ATTENTION 5
▸ And more:
Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition With Visual Attention"
Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks"
Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition"
Siavash Gorji, James J. Clark, 2017, CVPR. "Attentional Push: A Deep Convolutional Network for Augmenting Image Salience with Shared Attention Modeling in Social Scenes"
WHAT IS VISUAL ATTENTION 6
▸ Visual Attention:
▸ Attend to certain parts of an image to solve a task more efficiently
▸ Deep learning is a black-box model → attention adds interpretability
TABLE OF CONTENTS 7
▸ Early Works
▸ Recurrent Attention Model (RAM)
▸ Spatial Transformer Network (STN)
▸ Recent works on visual attention
▸ in ICLR
▸ in CVPR
PREREQUISITE 8
▸ CNN, transposed convolution (or deconvolution), dilated convolution
▸ RNN
▸ MLP
▸ GAN
https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d
EARLY WORKS: RAM, STN 9
RECURRENT ATTENTION MODEL 10
▸ Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, 2014, NIPS.
"Recurrent Models of Visual Attention"
▸ Google DeepMind, 563 citations
▸ Motivation: confronted with a large image, humans process it sequentially, selecting where and what to look at
▸ Tackles a ConvNet limitation: poor scalability with increasing input image size
RECURRENT ATTENTION MODEL 11
▸ Multiple Object Recognition with Visual Attention (DRAM), 2015, ICLR
▸ Architecturally refined version of RAM
▸ RNN structure with multi-resolution crops, called glimpses
▸ Architecture:
Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition With Visual Attention"
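A glimpse, that is, a multi-resolution crop around one location, can be sketched in NumPy as follows. This is only an illustrative sketch: the function name `extract_glimpse`, the patch size, the number of scales, the zero padding at image borders, and the average-pooling downsampling are assumptions, not details taken from the paper.

```python
import numpy as np

def extract_glimpse(image, center, size=8, num_scales=3):
    """Return num_scales patches of shape (size, size) centred near `center`.

    Scale k covers a (size * 2**k) square of the image and is average-pooled
    back to (size, size), so wider context is seen at lower resolution.
    All parameter choices here are hypothetical.
    """
    cy, cx = center
    patches = []
    for k in range(num_scales):
        span = size * 2 ** k
        half = span // 2
        # Zero-pad so that crops near the border stay in bounds.
        padded = np.pad(image, half, mode="constant")
        # After padding, original pixel (cy, cx) sits at (cy + half, cx + half).
        crop = padded[cy:cy + span, cx:cx + span]
        factor = 2 ** k
        # Average-pool factor x factor blocks down to (size, size).
        pooled = crop.reshape(size, factor, size, factor).mean(axis=(1, 3))
        patches.append(pooled)
    return np.stack(patches)

rng = np.random.default_rng(0)
image = rng.random((28, 28))
glimpse = extract_glimpse(image, center=(14, 14))
print(glimpse.shape)  # (3, 8, 8)
```

The stacked patches would then be fed to the glimpse network; how they are encoded is not covered by this sketch.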
RECURRENT ATTENTION MODEL 12
▸ Architecture:
[Figure annotations: where to see; what to see; provide initial state; locate glimpse; outputs the inputs for rnn(1); for multiple objects]
RECURRENT ATTENTION MODEL 13
▸ Demo
▸ Single object classification
https://github.com/kevinzakka/recurrent-visual-attention
RECURRENT ATTENTION MODEL 14
▸ Training:
▸ maximize a lower bound F on the log-likelihood of the labels (the slide also covers the multiple-object case)
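For reference, the bound annotated as F can be written out as below. This is a reconstruction following the formulation in Ba et al. (2015), with I the image, y the label, l the sequence of glimpse locations, and W the network parameters; the notation should be checked against the paper.

```latex
\log p(y \mid I, W)
  = \log \sum_{l} p(l \mid I, W)\, p(y \mid l, I, W)
  \;\ge\; \sum_{l} p(l \mid I, W)\, \log p(y \mid l, I, W)
  = \mathcal{F}
```

The inequality is Jensen's inequality applied to the logarithm of an expectation over locations.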
RECURRENT ATTENTION MODEL 15
▸ Cont'd:
REINFORCE
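Because glimpse locations are sampled, the gradient with respect to the location policy is estimated with the REINFORCE (score-function) estimator. The toy sketch below illustrates only that estimator on a one-dimensional Gaussian location policy. The reward function, the mean-reward baseline, and all names are assumptions made for illustration; this is not the DRAM training procedure.

```python
import numpy as np

def toy_reward(location, target=1.0):
    """Hypothetical reward: higher when the sampled location is near target."""
    return -(location - target) ** 2

def reinforce_gradient(mu, sigma, reward_fn, num_samples, rng):
    """Score-function estimate of d E[reward] / d mu for l ~ N(mu, sigma^2)."""
    locations = rng.normal(mu, sigma, size=num_samples)
    rewards = reward_fn(locations)
    # Subtracting a baseline reduces variance of the estimate.
    baseline = rewards.mean()
    # d log N(l; mu, sigma^2) / d mu = (l - mu) / sigma^2
    score = (locations - mu) / sigma ** 2
    return np.mean((rewards - baseline) * score)

rng = np.random.default_rng(0)
grad = reinforce_gradient(mu=0.0, sigma=0.5, reward_fn=toy_reward,
                          num_samples=200_000, rng=rng)
# For this toy reward the analytic gradient is -2 * (mu - target) = 2.
print(grad)
```

In a full model the same weighting (reward minus baseline, times the gradient of the log-probability of the sampled location) would be backpropagated through the network that emits the location.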
RECURRENT ATTENTION MODEL 16
▸ Experiments & Results
▸ MNIST, SVHN
SPATIAL TRANSFORMER NETWORK 17
▸ Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks"
▸ Google DeepMind, 624 citations
▸ Motivation: humans process distorted objects by un-distorting them
▸ ConvNets are not actually invariant to large transformations (invariance is only realised over a deep hierarchy of max-pooling)
https://kevinzakka.github.io/2017/01/18/stn-part2/
SPATIAL TRANSFORMER NETWORK 18
▸ Architecture:
▸ three parts: localisation net, sampling grid, sampler
▸ Assume 𝛵𝜃 is 2D affine transformation A𝜃,
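Written out, the pointwise affine mapping takes each target grid coordinate (x_i^t, y_i^t) of the output to a source coordinate (x_i^s, y_i^s) in the input. The form below follows the standard formulation in the cited paper; the symbols are restated here because the slide's equation image is not reproduced.

```latex
\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix}
  = \mathcal{T}_\theta(G_i)
  = A_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
  = \begin{bmatrix}
      \theta_{11} & \theta_{12} & \theta_{13} \\
      \theta_{21} & \theta_{22} & \theta_{23}
    \end{bmatrix}
    \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix}
```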
[Figure annotations: regression (localisation net); input feature map H,W,C; output feature map H',W',C]
SPATIAL TRANSFORMER NETWORK 19
▸ 𝛵𝜃, for attention becomes:
▸ Allowing cropping, translation, isotropic scaling
▸ In the case of a bilinear sampling kernel,
▸ Differentiable, Modular,
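The grid generator and sampler can be sketched in NumPy as below, under some assumptions. The attention transform is taken to be the constrained affine matrix [[s, 0, t_x], [0, s, t_y]] (isotropic scale s plus translation), coordinates are normalised to [-1, 1], and samples outside the image read as zero. The function names are hypothetical and the sketch handles a single-channel image only.

```python
import numpy as np

def attention_theta(s, tx, ty):
    """Constrained affine matrix: isotropic scale s and translation (tx, ty)."""
    return np.array([[s, 0.0, tx],
                     [0.0, s, ty]])

def affine_grid(theta, out_h, out_w):
    """Map a regular target grid in [-1, 1]^2 through the 2x3 matrix theta."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, out_h),
                         np.linspace(-1.0, 1.0, out_w), indexing="ij")
    target = np.stack([xs.ravel(), ys.ravel(), np.ones(out_h * out_w)])
    source = theta @ target  # shape (2, out_h * out_w)
    return source[0].reshape(out_h, out_w), source[1].reshape(out_h, out_w)

def bilinear_sample(image, x_s, y_s):
    """Sample a (H, W) image at normalised source coordinates (x_s, y_s)."""
    h, w = image.shape
    # Convert normalised coordinates to pixel coordinates.
    x = (x_s + 1.0) * (w - 1) / 2.0
    y = (y_s + 1.0) * (h - 1) / 2.0
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx1, wy1 = x - x0, y - y0
    wx0, wy0 = 1.0 - wx1, 1.0 - wy1

    def pixel(yy, xx):
        # Out-of-bounds neighbours contribute zero.
        inside = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        values = image[np.clip(yy, 0, h - 1), np.clip(xx, 0, w - 1)]
        return np.where(inside, values, 0.0)

    return (wy0 * wx0 * pixel(y0, x0) + wy0 * wx1 * pixel(y0, x1)
            + wy1 * wx0 * pixel(y1, x0) + wy1 * wx1 * pixel(y1, x1))

image = np.arange(25, dtype=float).reshape(5, 5)
x_s, y_s = affine_grid(attention_theta(0.5, 0.0, 0.0), 3, 3)
crop = bilinear_sample(image, x_s, y_s)  # attends to the central 3x3 region
print(crop)
```

Each output value is a weighted sum of at most four input pixels, with weights that are piecewise linear in the source coordinates. This is why gradients can flow back both to the input feature map and to the transformation parameters.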
SPATIAL TRANSFORMER NETWORK 20
▸ Experiments and Results
▸ MNIST
▸ SVHN
SPATIAL TRANSFORMER NETWORK 21
▸ Experiments and Results
▸ Fine-grained classification (CUB-200-2011 bird classification dataset)
SPATIAL TRANSFORMER NETWORK 22
▸ Already implemented in TensorLayer
RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION 23
▸ Jason Kuen, Zhenhua Wang, Gang Wang, 2016, CVPR. "Recurrent Attentional
Networks for Saliency Detection"
▸ RAM (glimpse system) + STN (differentiability) for saliency detection
RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION 24
▸ Recurrent Attentional Convolutional-Deconvolutional Network (RACDNN)
▸ Architecture
RECURRENT ATTENTIONAL NETWORKS FOR SALIENCY DETECTION 25
▸ Experiments & Results
RECENT WORKS: ICLR, CVPR 26
GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION 27
▸ Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang, 2018, CVPR.
"Generative Image Inpainting with Contextual Attention"
▸ Adobe Research
GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION 28
▸ Architecture
▸ Two-stage(coarse to fine)
▸ Global and local WGANs
▸ Spatially discounted reconstruction loss (ℓ1): mask weights γ^l, where l is the distance of a pixel to the nearest known pixel
[Figure annotations: use WGAN; attention]
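The spatially discounted weighting can be sketched as follows: pixels deep inside the hole, far from any known pixel, get a smaller weight γ^l in the ℓ1 reconstruction loss. This is a minimal sketch under assumptions: Euclidean distance, a brute-force nearest-known-pixel search that is only suitable for small masks, and a plain mean as the reduction. The function names are hypothetical.

```python
import numpy as np

def spatial_discount_weights(hole_mask, gamma=0.99):
    """Weight gamma**l per hole pixel, l = distance to the nearest known pixel.

    hole_mask: boolean (H, W) array, True where pixels are missing.
    Known pixels receive weight 0 in this sketch.
    """
    hole_mask = np.asarray(hole_mask, dtype=bool)
    weights = np.zeros(hole_mask.shape)
    hole = np.argwhere(hole_mask)    # (n_hole, 2) coordinates
    known = np.argwhere(~hole_mask)  # (n_known, 2) coordinates
    if len(hole) == 0 or len(known) == 0:
        return weights
    # Brute-force pairwise distances (assumption: Euclidean metric).
    diff = hole[:, None, :] - known[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    weights[hole_mask] = gamma ** dist
    return weights

def discounted_l1(prediction, target, weights):
    """Mean of the weighted absolute error (the reduction is an assumption)."""
    return np.mean(weights * np.abs(prediction - target))

mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
w = spatial_discount_weights(mask, gamma=0.5)
print(w)
```

With γ close to 1 the discount is gentle; the exaggerated γ = 0.5 in the usage lines is only there to make the fall-off visible.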
GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION 29
▸ Attention
Calculate cosine similarity between foreground patches f_{x,y} and background patches b_{x',y'}:
s_{x,y,x',y'} = ⟨ f_{x,y} / ‖f_{x,y}‖ , b_{x',y'} / ‖b_{x',y'}‖ ⟩
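The cosine-similarity step can be sketched on flattened patch vectors as below. This is a minimal sketch: patches are assumed to be already extracted and flattened into rows, the softmax scale of 10 is an illustrative value, and the function names are hypothetical. A real implementation would typically express the matching as a convolution.

```python
import numpy as np

def contextual_attention(foreground, background, scale=10.0, eps=1e-8):
    """Attend from each foreground patch to all background patches.

    foreground: (n_f, d) flattened patches from the hole region
    background: (n_b, d) flattened patches from the known region
    Returns (similarity, attention), both of shape (n_f, n_b).
    """
    f_unit = foreground / (np.linalg.norm(foreground, axis=1, keepdims=True) + eps)
    b_unit = background / (np.linalg.norm(background, axis=1, keepdims=True) + eps)
    similarity = f_unit @ b_unit.T  # cosine similarity
    logits = scale * similarity
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    attention = np.exp(logits)
    attention /= attention.sum(axis=1, keepdims=True)    # softmax over background
    return similarity, attention

def reconstruct(attention, background):
    """Fill each foreground patch as an attention-weighted mix of background."""
    return attention @ background

rng = np.random.default_rng(0)
background = rng.normal(size=(6, 9))
foreground = 3.0 * background[[2]]  # a rescaled copy of background patch 2
similarity, attention = contextual_attention(foreground, background)
print(attention.argmax(axis=1))  # [2]
```

Because cosine similarity ignores magnitude, the rescaled foreground patch still matches its source background patch best.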
GENERATIVE IMAGE INPAINTING WITH CONTEXTUAL ATTENTION 30
▸ Experiments & Results
LEARN TO PAY ATTENTION 31
▸ Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr, 2018, ICLR. "Learn
to Pay Attention"
▸ Very simple
LEARN TO PAY ATTENTION 32
▸ Architecture
[Figure annotations: attention; compatibility function (dot product)]
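The attention step with a dot-product compatibility function can be sketched as follows: each local feature vector is scored against the global feature vector, the scores are normalised with a softmax, and the local features are averaged with those weights. Shapes and names are assumptions for illustration, and local and global features are assumed to share the same dimension.

```python
import numpy as np

def dot_product_attention(local_features, global_feature):
    """local_features: (n, d) vectors at n spatial positions; global_feature: (d,)."""
    compatibility = local_features @ global_feature  # one score per position
    scores = compatibility - compatibility.max()     # numerical stability
    attention = np.exp(scores) / np.exp(scores).sum()
    attended = attention @ local_features            # weighted sum, shape (d,)
    return attention, attended

local_features = np.eye(4)                    # four toy positions
global_feature = np.array([0.0, 0.0, 3.0, 0.0])
attention, attended = dot_product_attention(local_features, global_feature)
print(attention.argmax())  # 2
```

The attention map itself can be reshaped to the spatial grid and visualised, which is where the interpretability benefit comes from.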
LEARN TO PAY ATTENTION 33
▸ Experiments & Results
▸ Image classification and fine-grained recognition
LEARN TO PAY ATTENTION 34
▸ Experiments & Results
▸ Weakly supervised semantic segmentation
LOOK CLOSER TO SEE BETTER 35
▸ Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better:
Recurrent Attention Convolutional Neural Network for Fine-grained Image
Recognition"
▸ Fine-grained image recognition:
▸ Discriminative region localization + fine-grained feature learning
LOOK CLOSER TO SEE BETTER 36
▸ Recurrent Attention Convolutional Neural Network (RA-CNN)
▸ Multi-scale networks: classification sub-network, attention proposal sub-network (APN)
▸ Finer-scale network (coarse to fine)
▸ Intra-scale softmax loss for classification, inter-scale pairwise ranking loss for APN
LOOK CLOSER TO SEE BETTER 37
▸ RA-CNN architecture:
[Figure annotation: bilinear interpolation to amplify the attended region]
LOOK CLOSER TO SEE BETTER 38
▸ Training:
▸ Multi-task loss:
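The two loss terms can be sketched as below: a softmax classification loss at every scale, plus a pairwise ranking loss that is zero only when the finer scale assigns the true class a probability higher than the coarser scale by at least a margin. This is a simplified sketch: the margin value is an assumption, the names are hypothetical, and the two terms are simply summed here, whereas the actual optimisation schedule may treat them separately.

```python
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def pairwise_ranking_loss(p_coarse, p_fine, margin=0.05):
    """Hinge loss: penalise unless p_fine >= p_coarse + margin."""
    return max(0.0, p_coarse - p_fine + margin)

def multi_scale_loss(logits_per_scale, label, margin=0.05):
    """Sum of intra-scale classification and inter-scale ranking losses."""
    true_class_probs = [softmax(np.asarray(l, dtype=float))[label]
                        for l in logits_per_scale]
    classification = sum(-np.log(p) for p in true_class_probs)
    ranking = sum(
        pairwise_ranking_loss(true_class_probs[s], true_class_probs[s + 1], margin)
        for s in range(len(true_class_probs) - 1)
    )
    return classification + ranking

# Two scales that are equally (un)confident: the ranking term equals the margin.
total = multi_scale_loss([[0.0, 0.0], [0.0, 0.0]], label=0)
print(total)
```

The ranking term is what pushes the attention proposal network to pick regions that make the finer scale more confident than the coarser one.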
LOOK CLOSER TO SEE BETTER 39
▸ Experiments & Results
▸ CUB-200-2011 Bird Dataset
LOOK CLOSER TO SEE BETTER 40
▸ Experiments & Results
▸ Stanford Dogs, Stanford Cars
SUMMARY 41
▸ Attention for efficiency, better performance, interpretability
▸ Many types of Attention:
▸ RAM
▸ STN
▸ RAM+STN
▸ Others
ANY QUESTIONS? 42
REFERENCES 43
▸ Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, 2015, ICLR. "Neural Machine Translation by Jointly Learning to Align and Translate"
▸ Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, 2015, ICML. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
▸ Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu, 2014, NIPS. "Recurrent Models of Visual Attention"
▸ Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, 2015, ICLR. "Multiple Object Recognition With Visual Attention"
▸ Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, 2015, NIPS. "Spatial Transformer Networks"
▸ Jason Kuen, Zhenhua Wang, Gang Wang, 2016, CVPR. "Recurrent Attentional Networks for Saliency Detection"
▸ Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang, 2018, CVPR. "Generative Image Inpainting with Contextual Attention"
▸ Saumya Jetley, Nicholas A. Lord, Namhoon Lee, Philip H. S. Torr, 2018, ICLR. "Learn to Pay Attention"
▸ Jianlong Fu, Heliang Zheng, Tao Mei, 2017, CVPR. "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-grained Image Recognition"
END OF DOCUMENT 44
