DeepFix: a fully convolutional neural network for predicting human fixations (UPC Reading Group)


Universitat Politècnica de Catalunya

Slides by Xavier Giró-i-Nieto, from the Computer Vision Reading Group. UPC, ETSETB. (27/10/2015)


DeepFix: A Fully Convolutional
Neural Network for Predicting
Human Fixations
Srinivas S S Kruthiventi, Kumar Ayush, and R. Venkatesh Babu
(arXiv October 2015) [URL]
Slides by Xavier Giró-i-Nieto, from the Computer Vision Reading Group. (27/10/2015)
https://imatge.upc.edu/web/teaching/computer-vision-reading-group

Introduction
Bottom-up attention
● Automatic
● Reflexive
● Stimulus-driven

Introduction
Top-down attention
● Subject's prior knowledge
● Expectations
● Task oriented
● Memory
● Behavioral goals

Introduction
Visual Attentional Mechanisms
● Bottom-up: automatic, reflexive, stimulus-driven
● Top-down: subject's prior knowledge, expectations, task oriented, memory, behavioral goals

Very deep network
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014)
● Inspired by Oxford’s VGG net (19 layers).
● 20 layers
● Small kernel sizes.

Fully convolutional network (FCN)
● Fully connected layers at the end are replaced by convolutional layers with very large receptive fields.
● They capture the global context of the scene.
● End-to-end training
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440)
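The replacement of fully connected layers by convolutions can be illustrated with a minimal sketch. It is not the DeepFix code: the helper names, the 2x2 patch, and the weight values are all made up for illustration. The point is only that a fully connected layer over a flattened patch gives the same number as a convolution whose kernel covers the whole patch, and that the same kernel slides over a larger input to give a spatial map.

```python
def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation on nested lists (single channel)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def fully_connected(flat_input, weights):
    """One output unit of a fully connected layer (no bias)."""
    return sum(v * w for v, w in zip(flat_input, weights))


# Hypothetical weights of one fully connected unit over a 2x2 patch.
fc_weights = [1.0, -2.0, 0.5, 3.0]
# The same weights reshaped into a 2x2 convolution kernel.
kernel = [fc_weights[0:2], fc_weights[2:4]]

patch = [[1.0, 2.0], [3.0, 4.0]]
fc_out = fully_connected([v for row in patch for v in row], fc_weights)
conv_out = conv2d_valid(patch, kernel)  # a 1x1 map holding the same value

# On a larger input the convolution yields one score per location.
larger = [[1.0, 2.0, 0.0], [3.0, 4.0, 1.0], [0.0, 1.0, 2.0]]
score_map = conv2d_valid(larger, kernel)  # 2x2 map
```

Because the layer is now a convolution, the network accepts inputs of any size and outputs a map rather than a single vector, which is what a saliency predictor needs.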

Inception layers
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going Deeper With Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9)
● GoogLeNet
● Different kernel sizes operating in parallel.
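As a rough illustration of "different kernel sizes operating in parallel", the sketch below runs three branches over the same input and stacks their outputs as channels. It is heavily simplified: the function names are hypothetical, fixed box (averaging) kernels stand in for learned weights, and the 1x1 reductions and pooling branch of a real inception module are omitted.

```python
def conv2d_same(image, kernel):
    """Zero-padded 2-D cross-correlation that keeps the input size (odd square kernel)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - pad, x + j - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def box_kernel(k):
    """Averaging kernel used here as a stand-in for learned weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def inception_like(image, kernel_sizes=(1, 3, 5)):
    """Apply each kernel size to the same input and stack the results as channels."""
    return [conv2d_same(image, box_kernel(k)) for k in kernel_sizes]


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
branches = inception_like(image)  # 3 channels, each 4x4
```

Each branch sees the same location at a different scale, and the next layer is free to weight them however training dictates.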

Location Biased Convolutional (LBC) layer
● Centre-bias

Architecture
Small 3x3 convolutional filters with a stride of 1 allow a large depth without increasing the memory requirement.
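The usual argument for small filters (from the VGG paper) can be checked with a little arithmetic. The sketch below compares three stacked 3x3 stride-1 layers with a single 7x7 layer covering the same extent. The channel count of 64 is only an example, and the count covers weights, not activations.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels):
    """Weights of one kernel x kernel layer with `channels` inputs and outputs (no bias)."""
    return kernel * kernel * channels * channels


C = 64  # example channel count
three_3x3 = 3 * conv_weights(3, C)  # 27 * C^2 weights
one_7x7 = conv_weights(7, C)        # 49 * C^2 weights
same_extent = stacked_receptive_field(3) == 7
```

The stacked version covers the same 7x7 extent with roughly half the weights and two extra non-linearities in between.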

Architecture
Max pooling layers (in red) reduce computation.
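A minimal sketch of why pooling reduces computation, assuming 2x2 windows with stride 2 (the slide does not give the pooling parameters): each pooling step leaves a quarter of the positions for later layers to process.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a nested-list map (even height and width assumed)."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[y][x], feature_map[y][x + 1],
                 feature_map[y + 1][x], feature_map[y + 1][x + 1])
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 2, 7, 8]]
pooled = max_pool_2x2(fmap)  # 16 positions become 4
```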

Architecture
Gradual increase in the number of channels to progressively learn richer semantic representations: 64, 128, 256, 512...

Architecture
Weights initialized from VGG-16 net for stable and effective learning

Architecture
A 3x3 convolution kernel with hole size 2 has a receptive field of 5x5.
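The 5x5 figure follows from the spacing between kernel taps. A small helper (the name is hypothetical) makes the rule explicit: with taps placed `hole` pixels apart, a kernel with k taps per axis spans k + (k - 1)(hole - 1) pixels.

```python
def effective_kernel_size(kernel, hole):
    """Extent covered along one axis by a kernel whose taps are `hole` pixels apart."""
    return kernel + (kernel - 1) * (hole - 1)


size_3x3_hole_2 = effective_kernel_size(3, 2)  # 5
size_3x3_hole_1 = effective_kernel_size(3, 1)  # 3, an ordinary dense kernel
```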

Architecture
Capture multi-scale semantic structure using two inception-style convolutional modules

Architecture
Very large receptive fields of 25x25 are obtained by introducing holes of size 6 in the kernels.
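The sketch below lists where such a kernel samples its input. The slide does not state the kernel size; 5 taps per axis is an assumption, chosen because it is what gives a 25-pixel extent with holes of size 6. It shows that the number of weights stays small while the covered area grows.

```python
def tap_offsets(kernel, hole):
    """Offsets from the centre sampled along one axis by a kernel with holes."""
    half = kernel // 2
    return [hole * (i - half) for i in range(kernel)]


offsets = tap_offsets(5, 6)           # assumed 5 taps per axis
span = offsets[-1] - offsets[0] + 1   # extent along one axis
weights = len(offsets) ** 2           # weights per 2-D filter, per input channel
covered = span ** 2                   # pixels inside the receptive field
```

Under this assumption a filter with 25 weights covers a 25x25 window of 625 positions, reading only every sixth pixel in each direction.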

Architecture
Location Biased Convolutional (LBC) layers

Architecture
Labels from the annotated LBC layer figure:
● constant during training
● learnt during training
● weights from the c'th filter in a convolutional layer
● input blob
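One plausible reading of these labels, sketched under stated assumptions: the layer sees the usual input blob plus a set of fixed, centred Gaussian maps, and learns weights for both. Everything below is illustrative. The function names, the Gaussian shape of the location maps, the 1x1 kernel, and the ReLU are assumptions, not details taken from the slides.

```python
import math


def gaussian_blob(h, w, sigma_y, sigma_x):
    """Centred Gaussian map; held constant during training in this sketch."""
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return [[math.exp(-((y - cy) ** 2 / (2 * sigma_y ** 2)
                        + (x - cx) ** 2 / (2 * sigma_x ** 2)))
             for x in range(w)]
            for y in range(h)]


def lbc_1x1(features, blobs, w_feat, w_loc, bias=0.0):
    """1x1 convolution over input channels plus location channels, followed by ReLU.

    w_feat and w_loc play the role of the learnt weights; blobs are the constant part.
    """
    h, w = len(features[0]), len(features[0][0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = bias
            for c, fmap in enumerate(features):
                acc += w_feat[c] * fmap[y][x]
            for k, blob in enumerate(blobs):
                acc += w_loc[k] * blob[y][x]
            out[y][x] = max(0.0, acc)
    return out


# A flat input: any spatial variation in the output comes from the location term.
features = [[[1.0] * 5 for _ in range(5)]]
blobs = [gaussian_blob(5, 5, 2.0, 2.0)]
out = lbc_1x1(features, blobs, w_feat=[0.5], w_loc=[1.0])
```

With a positive location weight the response is strongest at the image centre even for a featureless input, which is the centre-bias behaviour the layer is meant to capture.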

Architecture
The final output, of size W/8 x H/8, is upsampled.
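A minimal sketch of bringing a W/8 x H/8 map back to input resolution. The slides do not say which interpolation is used; nearest-neighbour repetition is assumed here only because it is the simplest to write down.

```python
def upsample_nearest(saliency, factor=8):
    """Repeat every value `factor` times along both axes."""
    return [[v for v in row for _ in range(factor)]
            for row in saliency
            for _ in range(factor)]


coarse = [[0.1, 0.9, 0.3],
          [0.2, 0.4, 0.0]]          # an (H/8) x (W/8) prediction
full = upsample_nearest(coarse)     # H x W, here 16 x 24
```

A smoother interpolation such as bilinear would avoid the blocky look of this version at the same output size.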

Training
2nd stage
MIT 1003
CAT2000
Mouse clicks from Microsoft COCO
Not mentioned how to go from eye fixations to heat maps!
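For reference, a common convention in saliency work is to place a Gaussian at every fixation point and normalise the sum. The sketch below shows that convention only. Nothing here is taken from the DeepFix paper or the slides, and the function name and sigma value are arbitrary.

```python
import math


def fixations_to_heatmap(fixations, height, width, sigma=2.0):
    """Sum one Gaussian per (row, col) fixation and scale the peak to 1."""
    heat = [[0.0] * width for _ in range(height)]
    for fy, fx in fixations:
        for y in range(height):
            for x in range(width):
                heat[y][x] += math.exp(-((y - fy) ** 2 + (x - fx) ** 2)
                                       / (2 * sigma ** 2))
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat


# Two fixations on the same spot and one elsewhere (made-up coordinates).
heat = fixations_to_heatmap([(2, 3), (2, 3), (7, 8)], height=10, width=12)
```

Locations fixated more often end up with higher values, so the map can be used directly as a regression target.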

Training
● End to end (as JuntingNet)
● Caffe framework
● 1 day on a K40 GPU!


6. Introduction

7. Introduction: DeepFix / Classic method

8. Introduction: MIT300 benchmark [URL]

9. Introduction: CAT2000 benchmark [URL]

10. The ingredients

15. The network

27. Experiments

30-34. Results
