https://telecombcn-dl.github.io/2017-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.
8. Why don’t we see the changes?
We don’t really see the whole image
We only focus on small specific regions: the salient parts
Human beings reliably attend to the same regions when shown the same images
12. Can we predict where humans will look?
Yes! Computational models of visual saliency
Why might this be useful?
13. SalNet: deep visual saliency model
Predict map of visual attention from image pixels
(find the parts of the image that stand out)
● Feedforward 8-layer “fully convolutional” architecture
● Transfer learning in the bottom 3 layers from a VGG-M model pretrained on ImageNet
● Trained on the SALICON dataset (crowdsourced attention simulated with mouse tracking and artificial foveation)
● Evaluated on the MIT 300 saliency benchmark: http://saliency.mit.edu/results_mit300.html
[Figure: predicted saliency map vs. ground truth]
Pan, McGuinness, et al. Shallow and Deep Convolutional Networks for Saliency Prediction, CVPR 2016 http://arxiv.org/abs/1603.00845
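Benchmarks like MIT 300 compare predicted and ground-truth saliency maps with several metrics; a minimal numpy sketch of one standard metric, the Pearson correlation coefficient (CC), where the map shapes and the small epsilon are assumptions for illustration:

```python
import numpy as np

def correlation_coefficient(pred, gt):
    """Pearson correlation (CC) between a predicted and a ground-truth
    saliency map; higher is better, 1.0 means a perfect match."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

# Identical maps score ~1.0; an inverted map scores ~-1.0.
rng = np.random.default_rng(0)
gt = rng.random((48, 64))
print(round(correlation_coefficient(gt, gt), 3))
print(round(correlation_coefficient(gt, 1 - gt), 3))
```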
16. SalGAN
[Figure: SalGAN is trained with a combination of an adversarial loss and a data loss]
Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O’Connor, Jordi Torres, Elisa Sayrol and Xavier Giro-i-Nieto. “SalGAN: Visual Saliency Prediction with Generative Adversarial Networks.” arXiv, 2017.
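The two losses on this slide can be combined into a single generator objective; a hedged numpy sketch, where the per-pixel binary cross-entropy as data loss, the `alpha` weighting, and the discriminator output `d_out_fake` are assumptions standing in for the full training setup:

```python
import numpy as np

def bce(pred, target, eps=1e-8):
    """Per-pixel binary cross-entropy between a predicted saliency
    map and the ground truth (the "data loss")."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-(target * np.log(pred)
                   + (1.0 - target) * np.log(1.0 - pred)).mean())

def generator_loss(pred, target, d_out_fake, alpha=0.005):
    """Data loss weighted by a hypothetical `alpha`, plus the
    adversarial loss: -log of the discriminator's estimated
    probability that the generated map is real."""
    adversarial = -float(np.log(np.clip(d_out_fake, 1e-8, 1.0)))
    return alpha * bce(pred, target) + adversarial
```

When the discriminator is fully fooled (`d_out_fake` near 1), the adversarial term vanishes and only the weighted data loss remains.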
24. Image retrieval: query by example
Given:
● An example query image that illustrates the user's information need
● A very large dataset of images
Task:
● Rank all images in the dataset according to how likely they are to fulfil the user's information need
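With images represented as descriptor vectors, this ranking task reduces to sorting the dataset by similarity to the query; a minimal numpy sketch using cosine similarity, with the toy 2-D descriptors as assumptions:

```python
import numpy as np

def rank_by_similarity(query, dataset):
    """Rank dataset images by cosine similarity of their descriptors
    to the query descriptor; returns indices, most similar first."""
    q = query / np.linalg.norm(query)
    d = dataset / np.linalg.norm(dataset, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(-sims)

# Toy example: 2-D descriptors for three database images.
query = np.array([1.0, 0.0])
db = np.array([[0.9, 0.1],   # very similar to the query
               [0.0, 1.0],   # orthogonal to the query
               [0.5, 0.5]])  # in between
print(rank_by_similarity(query, db))  # [0 2 1]
```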
26. Bags of convolutional features instance search
Objective: rank images according to relevance to the query image
Local CNN features and BoW:
● Pretrained VGG-16 network
● Features from conv-5
● L2-norm, PCA, L2-norm
● K-means clustering -> BoW
● Cosine similarity
● Query augmentation, spatial reranking
Scalable, fast, high performance on Oxford 5K, Paris 6K and TRECVid INS
[Figure: BoW descriptor pipeline]
Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search, ICMR 2016 http://arxiv.org/abs/1604.04653
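The bullet-point pipeline above can be sketched end to end in numpy; a hedged sketch in which the PCA basis and k-means centroids (learned offline in the real system) are replaced by random stand-ins, and the dimensions are illustrative assumptions:

```python
import numpy as np

def l2n(x, axis=-1, eps=1e-8):
    """L2-normalise along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def bow_descriptor(local_feats, pca_basis, centroids):
    """BoW descriptor from local CNN features (N x D), following the
    steps above: L2-norm, PCA, L2-norm, nearest-centroid assignment,
    histogram, final L2-norm. `pca_basis` (D x d) and `centroids`
    (K x d) are assumed learned offline."""
    f = l2n(local_feats)                  # L2-norm each local feature
    f = l2n(f @ pca_basis)                # PCA projection, re-normalise
    # Assign each local feature to its nearest visual word.
    d2 = ((f[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return l2n(hist)                      # normalised BoW histogram

# Toy example with random data standing in for conv-5 features.
rng = np.random.default_rng(0)
feats = rng.random((100, 512))            # 100 local features, 512-D
basis = rng.random((512, 32))             # stand-in PCA projection
cents = rng.random((25, 32))              # 25 visual words
desc = bow_descriptor(feats, basis, cents)
print(desc.shape)                         # one histogram bin per word
```

Because the descriptors come out L2-normalised, the cosine similarity between two images is just their dot product.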
28. Using saliency to improve retrieval
[Figure: one CNN extracts semantic features while a second CNN predicts a saliency map; the saliency map provides importance weighting of the features, and the weighted features are pooled (e.g. into a BoW) to form image descriptors]
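The importance-weighting step in the diagram can be sketched as saliency-weighted pooling of local features; a minimal numpy sketch, where sum-pooling and the toy feature/saliency maps are assumptions for illustration:

```python
import numpy as np

def saliency_weighted_pool(features, saliency):
    """Sum-pool local features (H x W x D) with per-location weights
    from a saliency map (H x W), so salient regions contribute more
    to the final image descriptor."""
    w = saliency / (saliency.sum() + 1e-8)        # normalise weights
    pooled = (features * w[..., None]).sum(axis=(0, 1))
    return pooled / (np.linalg.norm(pooled) + 1e-8)

# Toy example: the salient half of the image has a distinct feature.
feats = np.zeros((4, 4, 2))
feats[:, :2, 0] = 1.0   # left half fires on channel 0
feats[:, 2:, 1] = 1.0   # right half fires on channel 1
sal = np.zeros((4, 4))
sal[:, :2] = 1.0        # only the left half is salient
desc = saliency_weighted_pool(feats, sal)
print(desc)             # dominated by channel 0
```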
29. Saliency weighted retrieval
                Oxford          Paris           INSTRE
                Global  Local   Global  Local   Global  Local
No weighting    0.614   0.680   0.621   0.720   0.304   0.472
Center prior    0.656   0.702   0.691   0.758   0.407   0.546
Saliency        0.680   0.717   0.716   0.770   0.514   0.617
QE saliency     -       0.784   -       0.834   -       0.719
Mean Average Precision
30. Using saliency to improve image classification (+12.4%)
[Figure: two-stream architecture. RGB stream: Conv 1 through Conv 5 with batch normalisation and max-pooling; saliency stream: a single Conv 1 with batch normalisation and max-pooling. The streams merge into fully connected layers with dropout and an FC 3 output layer.]
Figure credit: Eric Arazo
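One plausible reading of the figure is that the outputs of the two Conv 1 blocks are fused by channel concatenation before the deeper layers; a numpy sketch of that fusion step, where the feature-map shapes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
rgb_feats = rng.random((96, 55, 55))   # e.g. Conv 1 output, RGB stream
sal_feats = rng.random((96, 55, 55))   # Conv 1 output, saliency stream
# Fuse the two streams along the channel axis for the later layers.
fused = np.concatenate([rgb_feats, sal_feats], axis=0)
print(fused.shape)  # (192, 55, 55)
```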
31. Why does it improve classification accuracy?
Acoustic guitar: +25%
Volleyball: +23%