SlideShare ist ein Scribd-Unternehmen logo
1 von 37
Downloaden Sie, um offline zu lesen
[course site]
#DLUPC
Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
Image retrieval
Day 4 Lecture 5
Overview
● What is content based image retrieval?
● The classic SIFT retrieval pipeline
● Using off the shelf CNN features for retrieval
● Learning representations for retrieval
2
Overview
● What is content based image retrieval?
● The classic SIFT retrieval pipeline
● Using off the shelf CNN features for retrieval
● Learning representations for retrieval
3
The problem: query by example
Given:
● An example query image that
illustrates the user's information
need
● A very large dataset of images
Task:
● Rank all images in the dataset
according to how likely they
are to fulfil the user's
information need
4
Challenges
Similarity: How to compare two images for similarity? What does it mean for two
images to be similar?
Speed: How to guarantee sub-second query response times?
Scalability: How to handle millions of images and multiple simultaneous users?
How to ensure image representations are sufficiently compact?
5
Challenges: comparing images
Distance: 0.0 Distance: 0.654 Distance: 0.661
Image representation: 150528 dimensional vector consisting in
the concatenation of the flatten color channels. Final vector is l2
normalized.
6
Challenges: comparing images
Distance: 0.0 Distance: 0.718 Distance: 0.080
Image representation: 256 dimensional vector based on the
histogram of the grayscale pixels. Vector is l2 normalized.
7
Classification
8
Query: This chair
Results from dataset classified as “chair”
Retrieval
9
Query: This chair
Results from dataset ranked by similarity to the query
Overview
● What is content based image retrieval?
● The classic SIFT retrieval pipeline
● Using off the shelf CNN features for retrieval
● Learning representations for retrieval
10
The retrieval pipeline
11
Image RepresentationsQuery
Image
Dataset
Image Matching Ranked List
Similarity score Image
.
.
.
0.98
0.97
0.10
0.01
v = (v1
, …, vn
)
v1
= (v11
, …, v1n
)
vk
= (vk1
, …, vkn
)
...
Euclidean distance
Cosine Similarity
Similarity
Metric .
.
.
The classic SIFT retrieval pipeline
12
v1
= (v11
, …, v1n
)
vk
= (vk1
, …, vkn
)
...
variable number of
feature vectors per image
Bag of Visual
Words
N-Dimensional
feature space
M visual words
(M clusters)
INVERTED FILE
word Image ID
1 1, 12,
2 1, 30, 102
3 10, 12
4 2,3
6 10
...
Large vocabularies (50k-1M)
Very fast!
Typically used with SIFT features
Convolutional neural networks and retrieval
13
Classification Object Detection Segmentation
Overview
● What is content based image retrieval?
● The classic SIFT retrieval pipeline
● Using off-the-shelf CNN features for retrieval
● Learning representations for retrieval
14
Off-the-shelf CNN representations
15Razavian et al. CNN features off-the-shelf: an astounding baseline for recognition. In CVPR
Babenko et al. Neural codes for image retrieval. ECCV 2014
FC layers as global feature
representation
Off-the-shelf CNN representations
Neural codes for retrieval [1]
● FC7 layer (4096D)
● L2
norm + PCA whitening + L2
norm
● Euclidean distance
● Only better than traditional SIFT approach after fine tuning on similar domain image dataset.
CNN features off-the-shelf: an astounding baseline for recognition [2]
● Extending Babenko’s approach with spatial search
● Several features extracted by image (sliding window approach)
● Really good results but too computationally expensive for practical situations
[1] Babenko et al, Neural codes for image retrieval CVPR 2014
[2] Razavian et al, CNN features off-the-shelf: an astounding baseline for recognition, CVPR 2014
16
Off-the-shelf CNN representations
17
sum/max pool conv features across filters
Babenko and Lempitsky, Aggregating local deep features for image retrieval. ICCV 2015
Tolias et al. Particular object retrieval with integral max-pooling of CNN activations. arXiv:1511.05879.
Kalantidis et al. Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv:1512.04065.
Off-the-shelf CNN representations
18
Descriptors from convolutional layers
Off-the-shelf CNN representations
19
Pooling features on conv layers allow to describe specific parts of an image
Off-the-shelf CNN representations
[1] Babenko and Lempitsky, Aggregating local deep features for image retrieval. ICCV 2015
[2] Kalantidis et al. Cross-dimensional Weighting for Aggregated Deep Convolutional Features. ECCV 2016
Apply spatial weighting on the features
before pooling them
Sum/max pooling operation of a conv
layer
[1] weighting based on ‘strength’ of
the local features
[2] weighting based on the
distance to the center of
the image
20
Off-the-shelf CNN representations
21
BoW, VLAD encoding of conv features
Ng et al. Exploiting local features from deep networks for image retrieval. CVPR Workshops 2015
Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search. ICMR 2016
Off-the-shelf CNN representations
22
(336x256)
Resolution
conv5_1 from
VGG16
(42x32)
25K centroids 25K-D vector
Descriptors from convolutional layers
Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search. ICMR 2016
Off-the-shelf CNN representations
23
(336x256)
Resolution
conv5_1 from
VGG16
(42x32)
25K centroids 25K-D vector
Off-the-shelf CNN representations
24
Paris Buildings 6k Oxford Buildings 5k
TRECVID Instance Search 2013
(subset of 23k frames)
[7] Kalantidis et al. Cross-dimensional Weighting for Aggregated
Deep Convolutional Features. arXiv:1512.04065.
Mohedano et al. Bags of Local Convolutional Features for Scalable
Instance Search. ICMR 2016
Off-the-shelf CNN representations
CNN representations
● L2
Normalization + PCA whitening + L2
Normalization
● Cosine similarity
● Convolutional features better than fully connected features
● Convolutional features keep spatial information → retrieval + object location
● Convolutional layers allows custom input size.
● If data labels available, fine tuning the network to the image domain improves
CNN representations.
25
CNN as a feature extractor
Diagram from SIFT Meets CNN: A Decade Survey of Instance Retrieval
26
Overview
● What is content based image retrieval?
● The classic SIFT retrieval pipeline
● Using off the shelf CNN features for retrieval
● Learning representations for retrieval
27
Learning representations for retrieval
Siamese network: network to learn a function that maps input
patterns into a target space such that L2
norm in the target
space approximates the semantic distance in the input space.
Applied in:
● Dimensionality reduction [1]
● Face verification [2]
● Learning local image representations [3]
28
[1] Song et al.: Deep metric learning via lifted structured feature embedding. CVPR 2015
[2] Chopra et al. Learning a similarity metric discriminatively, with application to face verification CVPR' 2005
[3] Simo-Serra et al. Fracking deep convolutional image descriptors. CoRR, abs/1412.6537, 2014
Learning representations for retrieval
Siamese network: network to learn a function that maps input
patterns into a target space such that L2
norm in the target
space approximates the semantic distance in the input space.
29
Image credit: Simo-Serra et al. Fracking deep convolutional image descriptors. CoRR 2014
Margin: m
Hinge loss
Negative pairs: if
nearer than the
margin, pay a linear
penalty
The margin is a
hyperparameter
30
Learning representations for retrieval
Siamese network with triplet loss: loss function minimizes distance between query and positive
and maximizes distance between query and negative
31Schroff et al. FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR 2015
CNN
w w
a p n
L2 embedding space
Triplet Loss
CNN CNN
Learning representations for retrieval
32
Deep Image Retrieval: Learning global representations for image
search, Gordo A. et al. Xerox Research Centre, 2016
● R-MAC representation
● Learning descriptors for retrieval using three channels
siamese loss: Ranking objective:
● Learning where to pool within an image: predicting object
locations
● Local features (from predicted ROI) pooled into a more
discriminative space (learned fc)
● Building and cleaning a dataset to generate triplets
Learning representations for retrieval
33
Learning representations for retrieval
34
“Manually” generated dataset to create the training triplets
Dataset: Landmarks dataset:
● 214K images of 672 famous landmark site.
● Dataset processing based on a matching
baseline: SIFT + Hessian-Affine keypoint
detector.
● Important to select the “useful” triplets.
Gordo et al, Deep Image Retrieval: Learning global representations for image search. ECCV 2016
Learning representations for retrieval
35
Comparison between R-MAC from off-the-shelf network and R-MAC retrained for retrieval
Gordo et al, Deep Image Retrieval: Learning global representations for image search. ECCV 2016
Summary
36
● Pre-trained CNN are useful to generate image descriptors for retrieval
● Convolutional layers allow us to encode local information
● Knowing how to rank similarity is the primary task in retrieval
● Designing CNN architectures to learn how to rank
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Filmic Tonemapping for Real-time Rendering - Siggraph 2010 Color Course
Filmic Tonemapping for Real-time Rendering - Siggraph 2010 Color CourseFilmic Tonemapping for Real-time Rendering - Siggraph 2010 Color Course
Filmic Tonemapping for Real-time Rendering - Siggraph 2010 Color Coursehpduiker
 
Object tracking a survey
Object tracking a surveyObject tracking a survey
Object tracking a surveyHaseeb Hassan
 
[SIGGRAPH 2016] Automatic Image Colorization
[SIGGRAPH 2016] Automatic Image Colorization[SIGGRAPH 2016] Automatic Image Colorization
[SIGGRAPH 2016] Automatic Image ColorizationSatoshi Iizuka
 
Depth estimation do we need to throw old things away
Depth estimation do we need to throw old things awayDepth estimation do we need to throw old things away
Depth estimation do we need to throw old things awayNAVER Engineering
 
Image segmentation with deep learning
Image segmentation with deep learningImage segmentation with deep learning
Image segmentation with deep learningAntonio Rueda-Toicen
 
Depth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIDepth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIYu Huang
 
Object tracking presentation
Object tracking  presentationObject tracking  presentation
Object tracking presentationMrsShwetaBanait1
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSungjoon Choi
 
Gradient-Based Low-Light Image Enhancement
Gradient-Based Low-Light Image EnhancementGradient-Based Low-Light Image Enhancement
Gradient-Based Low-Light Image EnhancementMasayuki Tanaka
 
Image Translation with GAN
Image Translation with GANImage Translation with GAN
Image Translation with GANJunho Cho
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Yu Huang
 
Bit plane slicing
Bit plane slicingBit plane slicing
Bit plane slicingAsad Ali
 
Digital Image Processing: Image Restoration
Digital Image Processing: Image RestorationDigital Image Processing: Image Restoration
Digital Image Processing: Image RestorationMostafa G. M. Mostafa
 
08 frequency domain filtering DIP
08 frequency domain filtering DIP08 frequency domain filtering DIP
08 frequency domain filtering DIPbabak danyal
 
Deep learning for person re-identification
Deep learning for person re-identificationDeep learning for person re-identification
Deep learning for person re-identification哲东 郑
 
Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fie...
Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fie...Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fie...
Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fie...Vincent Sitzmann
 

Was ist angesagt? (20)

Filmic Tonemapping for Real-time Rendering - Siggraph 2010 Color Course
Filmic Tonemapping for Real-time Rendering - Siggraph 2010 Color CourseFilmic Tonemapping for Real-time Rendering - Siggraph 2010 Color Course
Filmic Tonemapping for Real-time Rendering - Siggraph 2010 Color Course
 
Object tracking a survey
Object tracking a surveyObject tracking a survey
Object tracking a survey
 
[SIGGRAPH 2016] Automatic Image Colorization
[SIGGRAPH 2016] Automatic Image Colorization[SIGGRAPH 2016] Automatic Image Colorization
[SIGGRAPH 2016] Automatic Image Colorization
 
Depth estimation do we need to throw old things away
Depth estimation do we need to throw old things awayDepth estimation do we need to throw old things away
Depth estimation do we need to throw old things away
 
Image segmentation with deep learning
Image segmentation with deep learningImage segmentation with deep learning
Image segmentation with deep learning
 
Depth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIDepth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors II
 
Object tracking presentation
Object tracking  presentationObject tracking  presentation
Object tracking presentation
 
Semantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep LearningSemantic Segmentation Methods using Deep Learning
Semantic Segmentation Methods using Deep Learning
 
Gradient-Based Low-Light Image Enhancement
Gradient-Based Low-Light Image EnhancementGradient-Based Low-Light Image Enhancement
Gradient-Based Low-Light Image Enhancement
 
Image Translation with GAN
Image Translation with GANImage Translation with GAN
Image Translation with GAN
 
Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling Deep learning for 3-D Scene Reconstruction and Modeling
Deep learning for 3-D Scene Reconstruction and Modeling
 
Bit plane slicing
Bit plane slicingBit plane slicing
Bit plane slicing
 
Object tracking final
Object tracking finalObject tracking final
Object tracking final
 
Image Segmentation
 Image Segmentation Image Segmentation
Image Segmentation
 
Digital Image Processing: Image Restoration
Digital Image Processing: Image RestorationDigital Image Processing: Image Restoration
Digital Image Processing: Image Restoration
 
08 frequency domain filtering DIP
08 frequency domain filtering DIP08 frequency domain filtering DIP
08 frequency domain filtering DIP
 
Chap6 image restoration
Chap6 image restorationChap6 image restoration
Chap6 image restoration
 
Global illumination
Global illuminationGlobal illumination
Global illumination
 
Deep learning for person re-identification
Deep learning for person re-identificationDeep learning for person re-identification
Deep learning for person re-identification
 
Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fie...
Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fie...Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fie...
Tutorial on Generalization in Neural Fields, CVPR 2022 Tutorial on Neural Fie...
 

Ähnlich wie Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)

Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Universitat Politècnica de Catalunya
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015Jia-Bin Huang
 
Convolutional Patch Representations for Image Retrieval An unsupervised approach
Convolutional Patch Representations for Image Retrieval An unsupervised approachConvolutional Patch Representations for Image Retrieval An unsupervised approach
Convolutional Patch Representations for Image Retrieval An unsupervised approachUniversitat de Barcelona
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - Hiroshi Fukui
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術CHENHuiMei
 
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...IRJET Journal
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...Seiya Ito
 
NetVLAD: CNN architecture for weakly supervised place recognition
NetVLAD:  CNN architecture for weakly supervised place recognitionNetVLAD:  CNN architecture for weakly supervised place recognition
NetVLAD: CNN architecture for weakly supervised place recognitionGeunhee Cho
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyNUPUR YADAV
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural NetworksPyData
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Jihong Kang
 
AaSeminar_Template.pptx
AaSeminar_Template.pptxAaSeminar_Template.pptx
AaSeminar_Template.pptxManojGowdaKb
 
Visual geometry with deep learning
Visual geometry with deep learningVisual geometry with deep learning
Visual geometry with deep learningNAVER Engineering
 

Ähnlich wie Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision) (20)

Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
Content-Based Image Retrieval (D2L6 Insight@DCU Machine Learning Workshop 2017)
 
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
Deep Learning for Computer Vision: Image Retrieval (UPC 2016)
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
Convolutional Patch Representations for Image Retrieval An unsupervised approach
Convolutional Patch Representations for Image Retrieval An unsupervised approachConvolutional Patch Representations for Image Retrieval An unsupervised approach
Convolutional Patch Representations for Image Retrieval An unsupervised approach
 
Class Weighted Convolutional Features for Image Retrieval
Class Weighted Convolutional Features for Image Retrieval Class Weighted Convolutional Features for Image Retrieval
Class Weighted Convolutional Features for Image Retrieval
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に -
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術
 
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...
IRJET- Remote Sensing Image Retrieval using Convolutional Neural Network with...
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
 
Convolutional Features for Instance Search
Convolutional Features for Instance SearchConvolutional Features for Instance Search
Convolutional Features for Instance Search
 
NetVLAD: CNN architecture for weakly supervised place recognition
NetVLAD:  CNN architecture for weakly supervised place recognitionNetVLAD:  CNN architecture for weakly supervised place recognition
NetVLAD: CNN architecture for weakly supervised place recognition
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
med_poster_spie
med_poster_spiemed_poster_spie
med_poster_spie
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
Region-oriented Convolutional Networks for Object Retrieval
Region-oriented Convolutional Networks for Object RetrievalRegion-oriented Convolutional Networks for Object Retrieval
Region-oriented Convolutional Networks for Object Retrieval
 
AaSeminar_Template.pptx
AaSeminar_Template.pptxAaSeminar_Template.pptx
AaSeminar_Template.pptx
 
Visual geometry with deep learning
Visual geometry with deep learningVisual geometry with deep learning
Visual geometry with deep learning
 

Mehr von Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 

Mehr von Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 

Kürzlich hochgeladen

April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 

Kürzlich hochgeladen (20)

April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

Image Retrieval (D4L5 2017 UPC Deep Learning for Computer Vision)

  • 1. [course site] #DLUPC Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University Image retrieval Day 4 Lecture 5
  • 2. Overview ● What is content based image retrieval? ● The classic SIFT retrieval pipeline ● Using off the shelf CNN features for retrieval ● Learning representations for retrieval 2
  • 3. Overview ● What is content based image retrieval? ● The classic SIFT retrieval pipeline ● Using off the shelf CNN features for retrieval ● Learning representations for retrieval 3
  • 4. The problem: query by example Given: ● An example query image that illustrates the user's information need ● A very large dataset of images Task: ● Rank all images in the dataset according to how likely they are to fulfil the user's information need 4
  • 5. Challenges Similarity: How to compare two images for similarity? What does it mean for two images to be similar? Speed: How to guarantee sub-second query response times? Scalability: How to handle millions of images and multiple simultaneous users? How to ensure image representations are sufficiently compact? 5
  • 6. Challenges: comparing images Distance: 0.0 Distance: 0.654 Distance: 0.661 Image representation: 150528 dimensional vector consisting in the concatenation of the flatten color channels. Final vector is l2 normalized. 6
  • 7. Challenges: comparing images Distance: 0.0 Distance: 0.718 Distance: 0.080 Image representation: 256 dimensional vector based on the histogram of the grayscale pixels. Vector is l2 normalized. 7
  • 8. Classification 8 Query: This chair Results from dataset classified as “chair”
  • 9. Retrieval 9 Query: This chair Results from dataset ranked by similarity to the query
  • 10. Overview ● What is content based image retrieval? ● The classic SIFT retrieval pipeline ● Using off the shelf CNN features for retrieval ● Learning representations for retrieval 10
  • 11. The retrieval pipeline 11 Image RepresentationsQuery Image Dataset Image Matching Ranked List Similarity score Image . . . 0.98 0.97 0.10 0.01 v = (v1 , …, vn ) v1 = (v11 , …, v1n ) vk = (vk1 , …, vkn ) ... Euclidean distance Cosine Similarity Similarity Metric . . .
  • 12. The classic SIFT retrieval pipeline 12 v1 = (v11 , …, v1n ) vk = (vk1 , …, vkn ) ... variable number of feature vectors per image Bag of Visual Words N-Dimensional feature space M visual words (M clusters) INVERTED FILE word Image ID 1 1, 12, 2 1, 30, 102 3 10, 12 4 2,3 6 10 ... Large vocabularies (50k-1M) Very fast! Typically used with SIFT features
  • 13. Convolutional neural networks and retrieval 13 Classification Object Detection Segmentation
  • 14. Overview ● What is content based image retrieval? ● The classic SIFT retrieval pipeline ● Using off-the-shelf CNN features for retrieval ● Learning representations for retrieval 14
  • 15. Off-the-shelf CNN representations 15Razavian et al. CNN features off-the-shelf: an astounding baseline for recognition. In CVPR Babenko et al. Neural codes for image retrieval. ECCV 2014 FC layers as global feature representation
  • 16. Off-the-shelf CNN representations Neural codes for retrieval [1] ● FC7 layer (4096D) ● L2 norm + PCA whitening + L2 norm ● Euclidean distance ● Only better than traditional SIFT approach after fine tuning on similar domain image dataset. CNN features off-the-shelf: an astounding baseline for recognition [2] ● Extending Babenko’s approach with spatial search ● Several features extracted by image (sliding window approach) ● Really good results but too computationally expensive for practical situations [1] Babenko et al, Neural codes for image retrieval CVPR 2014 [2] Razavian et al, CNN features off-the-shelf: an astounding baseline for recognition, CVPR 2014 16
  • 17. Off-the-shelf CNN representations 17 sum/max pool conv features across filters Babenko and Lempitsky, Aggregating local deep features for image retrieval. ICCV 2015 Tolias et al. Particular object retrieval with integral max-pooling of CNN activations. arXiv:1511.05879. Kalantidis et al. Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv:1512.04065.
  • 19. Off-the-shelf CNN representations 19 Pooling features on conv layers allow to describe specific parts of an image
  • 20. Off-the-shelf CNN representations [1] Babenko and Lempitsky, Aggregating local deep features for image retrieval. ICCV 2015 [2] Kalantidis et al. Cross-dimensional Weighting for Aggregated Deep Convolutional Features. ECCV 2016 Apply spatial weighting on the features before pooling them Sum/max pooling operation of a conv layer [1] weighting based on ‘strength’ of the local features [2] weighting based on the distance to the center of the image 20
  • 21. Off-the-shelf CNN representations 21 BoW, VLAD encoding of conv features Ng et al. Exploiting local features from deep networks for image retrieval. CVPR Workshops 2015 Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search. ICMR 2016
  • 22. Off-the-shelf CNN representations 22 (336x256) Resolution conv5_1 from VGG16 (42x32) 25K centroids 25K-D vector Descriptors from convolutional layers Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search. ICMR 2016
  • 23. Off-the-shelf CNN representations 23 (336x256) Resolution conv5_1 from VGG16 (42x32) 25K centroids 25K-D vector
  • 24. Off-the-shelf CNN representations 24 Paris Buildings 6k Oxford Buildings 5k TRECVID Instance Search 2013 (subset of 23k frames) [7] Kalantidis et al. Cross-dimensional Weighting for Aggregated Deep Convolutional Features. arXiv:1512.04065. Mohedano et al. Bags of Local Convolutional Features for Scalable Instance Search. ICMR 2016
  • 25. Off-the-shelf CNN representations CNN representations ● L2 Normalization + PCA whitening + L2 Normalization ● Cosine similarity ● Convolutional features better than fully connected features ● Convolutional features keep spatial information → retrieval + object location ● Convolutional layers allows custom input size. ● If data labels available, fine tuning the network to the image domain improves CNN representations. 25
  • 26. CNN as a feature extractor Diagram from SIFT Meets CNN: A Decade Survey of Instance Retrieval 26
  • 27. Overview ● What is content based image retrieval? ● The classic SIFT retrieval pipeline ● Using off the shelf CNN features for retrieval ● Learning representations for retrieval 27
  • 28. Learning representations for retrieval Siamese network: network to learn a function that maps input patterns into a target space such that L2 norm in the target space approximates the semantic distance in the input space. Applied in: ● Dimensionality reduction [1] ● Face verification [2] ● Learning local image representations [3] 28 [1] Song et al.: Deep metric learning via lifted structured feature embedding. CVPR 2015 [2] Chopra et al. Learning a similarity metric discriminatively, with application to face verification CVPR' 2005 [3] Simo-Serra et al. Fracking deep convolutional image descriptors. CoRR, abs/1412.6537, 2014
  • 29. Learning representations for retrieval Siamese network: network to learn a function that maps input patterns into a target space such that L2 norm in the target space approximates the semantic distance in the input space. 29 Image credit: Simo-Serra et al. Fracking deep convolutional image descriptors. CoRR 2014
  • 30. Margin: m Hinge loss Negative pairs: if nearer than the margin, pay a linear penalty The margin is a hyperparameter 30
  • 31. Learning representations for retrieval Siamese network with triplet loss: loss function minimizes distance between query and positive and maximizes distance between query and negative 31Schroff et al. FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR 2015 CNN w w a p n L2 embedding space Triplet Loss CNN CNN
  • 32. Learning representations for retrieval 32 Deep Image Retrieval: Learning global representations for image search, Gordo A. et al. Xerox Research Centre, 2016 ● R-MAC representation ● Learning descriptors for retrieval using three channels siamese loss: Ranking objective: ● Learning where to pool within an image: predicting object locations ● Local features (from predicted ROI) pooled into a more discriminative space (learned fc) ● Building and cleaning a dataset to generate triplets
  • 34. Learning representations for retrieval 34 “Manually” generated dataset to create the training triplets Dataset: Landmarks dataset: ● 214K images of 672 famous landmark site. ● Dataset processing based on a matching baseline: SIFT + Hessian-Affine keypoint detector. ● Important to select the “useful” triplets. Gordo et al, Deep Image Retrieval: Learning global representations for image search. ECCV 2016
  • 35. Learning representations for retrieval 35 Comparison between R-MAC from off-the-shelf network and R-MAC retrained for retrieval Gordo et al, Deep Image Retrieval: Learning global representations for image search. ECCV 2016
  • 36. Summary 36 ● Pre-trained CNN are useful to generate image descriptors for retrieval ● Convolutional layers allow us to encode local information ● Knowing how to rank similarity is the primary task in retrieval ● Designing CNN architectures to learn how to rank