1. Lei Wang
School of Computing and Information Technology
University of Wollongong, Australia
15-Oct-2016
CBIR in the Era of Deep Learning
-- A Perspective from Feature Representation
2. • Introduction of CBIR
• Evolution of CBIR
– Early days (before 2000)
– Days of BoF model (2000 ~ 2012)
– Era of Deep learning (after 2012)
• Conclusion
Outline
Images courtesy of related papers and authors
4. Introduction
• Text-based image retrieval (TBIR, since the late 1970s)
– Manually associate images with text annotations
– Interpret images with high-level semantics
– Retrieval by matching the associated text annotations
Retrieval result of Google Images for “Airplane”
5. Introduction
• Issues with text-based image retrieval
– Annotation is time consuming and labour intensive
– Annotations only partially describe the visual content
– Human perception is subjective
– No support for query by example
Drouin Post Office, front desks Iron Ore Fashion
6. Introduction
• Content-based image retrieval
– Human annotators are replaced by computers
– Text annotations are replaced by visual features
– Retrieval by comparing the associated visual features
Drouin Post Office, front desks Iron Ore Fashion
7. Introduction
• National Science Foundation (NSF) organised a special
workshop on the topic of visual information
management (Feb 1992, San Jose, CA)
• "It would be impossible to cope with this explosion of image
information, unless the images were organized for retrieval.
The fundamental problem is that images, video, and other
similar data differ from numeric data and text data format,
and hence they require a totally different technique of
organization, indexing and query processing."
8. Introduction
• CBIR categorisation
– No query: Randomly browse similar images
– Query by text (by typing “airplane” or description)
– Query by example
• by using an image, sketch, or graphic of airplane
13. Introduction
• Applications of CBIR
– Archival photo collection management
– Personal album management
– Crime investigation
– Fashion and design
– Education and entertainment
– Localisation and navigation
– Medical Image analysis
– ….
14. Introduction
• CBIR systems
– QBIC, Virage, Photobook, VisualSEEk, MARS, etc.
Source: http://vismod.media.mit.edu/vismod/demos/photobook/
Source: http://www.cse.unsw.edu.au/~jas/talks/curveix/notes.html
16. • Introduction of CBIR
• Evolution of CBIR
– Early days (before 2000)
– Days of BoF model (2000 ~ 2012)
– Era of Deep learning (after 2012)
• Conclusion
Outline
Images courtesy of related papers and authors
17. Early days
A new research problem received great interest
CBIR: application, semantic gap, domain knowledge, user model, query mode, visual features, similarity measure, interaction, learning from data, system, evaluation
18. • Hand-crafted features
– Color, texture, shape, structure, etc.
– Goal: “Invariant and discriminative”
• Similarity or distance measure
– Euclidean distance, Manhattan distance, etc.
– Specific measures designed for specific features
Early days
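The two distance measures named on this slide can be sketched in a few lines. This is only an illustration: the 3-bin colour histograms, the image names, and the helper `rank` are made up for the example and are not taken from any system mentioned in the talk.

```python
import math

def euclidean(a, b):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def rank(query, database, distance=euclidean):
    """Return database keys sorted from most to least similar to the query."""
    return sorted(database, key=lambda name: distance(query, database[name]))

# Hypothetical 3-bin colour histograms, made up for illustration.
database = {
    "sunset": [0.7, 0.2, 0.1],
    "forest": [0.1, 0.8, 0.1],
    "ocean": [0.1, 0.2, 0.7],
}
query = [0.6, 0.3, 0.1]
ranking = rank(query, database)
ranking_l1 = rank(query, database, distance=manhattan)
```

Retrieval then amounts to returning the top entries of `ranking`; swapping the `distance` argument changes the similarity measure without touching the features.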
19. • Relevance feedback
– Bring user into the loop of CBIR to handle “Semantic Gap”
– A key point of “machine learning” research in CBIR
Early days
20. • Relevance feedback
– Learning from small sample
– Semi-supervised learning
– Transductive learning
– Feature selection, dimensionality reduction
– Kernel based learning
– Manifold learning
– Relation learning
– …
Early days
21. • Achievements
– Researched CBIR from various perspectives
– Identified the key issues and obstacles
– Many initial but insightful observations and attempts
– Machine learning started playing an important role
• To be improved
– Basic, hand-crafted features, limited invariance
– Depends considerably on domain theory
– Small-sized databases for evaluation
22. • Introduction of CBIR
• Evolution of CBIR
– Early days (before 2000)
– Days of the BoF model (2000 ~ 2012)
– Era of Deep learning (after 2012)
• Conclusion
Outline
Images courtesy of related papers and authors
23. • SIFT, HOG, SURF, CENTRIST, filter-based, …
– Invariant to view angle, rotation, scale, illumination, ...
Days of the BoF model
Local Invariant Features
http://www.robots.ox.ac.uk/~vgg/software/
Image courtesy of David Lowe, IJCV04
SIFT (Scale Invariant Feature Transform)
24. Days of the BoF model
Local Invariant Features
http://www.robots.ox.ac.uk/~vgg/research/affine/#software/
Image A Image B
25. Days of the BoF model
Local Invariant Features
Source: http://ivt.sourceforge.net/examples.html
Image A Image B
26. Days of the BoF model
Local Invariant Features
Source: http://www.robots.ox.ac.uk/~vgg/share/SearchPractical2012.html
Image A Image B
27. Days of the BoF model
Local Invariant Features
29. Days of the BoF model
The bag-of-features (BoF) model is borrowed from text analysis
30. Days of the BoF model
Interest point detection
or
Dense sampling
The cropped detected regions
The bag-of-features model is borrowed from text analysis
33. Days of the BoF model
Extract features from all training/test images
$x \in \mathbb{R}^d$
34. Days of the BoF model
Cluster all features to generate “Visual Words”
$\mathbb{R}^d$
35. Days of the BoF model
Generated “Visual Words”
Word 1, Word 2, Word 3, Word 4, …, Word k
36. Days of the BoF model
From an image to a histogram
$[n_1, n_2, \ldots, n_k] \in \mathbb{R}^k$, where $n_1$ is the number of occurrences of the 1st “word” in this image
$[0, 1, 0, \ldots, 0] \in \mathbb{R}^k$
$[1, 0, 0, \ldots, 0] \in \mathbb{R}^k$
$[0, 0, 1, \ldots, 0] \in \mathbb{R}^k$
…
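The path from local features to a histogram can be sketched as follows. This is a minimal illustration under stated assumptions: plain Lloyd's k-means initialised with the first k features stands in for codebook generation, each feature is hard-assigned to its nearest word, and the 2-D toy "descriptors" replace real local descriptors such as SIFT. The function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(words, x):
    """Index of the visual word closest to descriptor x."""
    return min(range(len(words)), key=lambda j: sq_dist(words[j], x))

def kmeans(features, k, iters=10):
    """Plain Lloyd's k-means, initialised with the first k features (an assumption)."""
    words = [list(f) for f in features[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in features:
            groups[nearest(words, f)].append(f)
        for j, g in enumerate(groups):
            if g:  # keep the old centre if a cluster becomes empty
                words[j] = [sum(col) / len(g) for col in zip(*g)]
    return words

def bof_histogram(descriptors, words):
    """Count how many descriptors fall on each visual word: [n1, ..., nk]."""
    hist = [0] * len(words)
    for x in descriptors:
        hist[nearest(words, x)] += 1
    return hist

# Toy 2-D descriptors pooled from all training images.
training = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
            [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
codebook = kmeans(training, k=2)

# Descriptors extracted from one image, turned into a k-bin histogram.
image_descriptors = [[0.2, 0.1], [9.8, 10.3], [0.1, 0.4]]
histogram = bof_histogram(image_descriptors, codebook)
```

Each hard assignment corresponds to one of the one-hot vectors above, and the histogram is their sum over all features of the image.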
37. Days of the BoF model
Classifying, clustering or retrieving images
$x \in \mathbb{R}^k$, $y = w^\top x + b$
38. Days of the BoF model
A Bag-of-Features Image Analysis System
Image database → Feature extraction → Codebook generation → Feature coding → Feature pooling → Classification, clustering or retrieval
39. Days of the BoF model
99: Local Invariant Features, such as SIFT (Lowe, ICCV99)
03: Video Google (Sivic, ICCV03); Bag-of-keypoints (Csurka, SLCV@ECCV04)
05: Pyramid Match Kernel (Grauman, ICCV05); Dense sampling (Jurie, ICCV05); Compact Codebook (Winn, ICCV05)
06: Vocabulary tree (Nister, CVPR06); Randomized Clustering Forests (Moosmann, NIPS06); Spatial Pyramid Matching (Lazebnik, CVPR06)
07: Comparative Study (Zhang, IJCV07); Coding with Fisher Kernels (Perronnin, CVPR07)
08: Kernel Codebook (van Gemert, ECCV08); In Defense of Nearest Neighbor Classifier (Boiman, CVPR08)
09: Sparse coding for BoF (Yang, CVPR09); Local Coordinate Coding (Yu, NIPS09)
10: Locality-constrained Linear Coding for BoF (Wang, CVPR10); Coding & pooling scheme comparison (Boureau, CVPR10)
11: Local Soft-assignment Coding & Mix-order pooling (Liu, ICCV11); Comparative Study on BoF model (Chatfield, BMVC11)
40. Days of the BoF model
Key issues of CBIR with the BoF model
Source: Nister and Stewenius, CVPR06
• How to quickly create a large visual codebook
– hierarchical k-means clustering
– Approximate k-means clustering
41. Days of the BoF model
Key issues of CBIR with the BoF model
• How to incorporate spatial information
– The BoF model ignores the spatial information of
SIFT features
Spatial Pyramid Matching; Re-ranking with spatial verification
42. Days of the BoF model
Key issues of CBIR with the BoF model
Retrieval result before spatial verification
Query:
43. Days of the BoF model
25 points matched under a consistent spatial relationship
Only 4 points matched under a consistent spatial
relationship
• Re-ranking with spatial verification
Key issues of CBIR with the BoF model
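Re-ranking by spatial verification can be illustrated with a deliberately simplified sketch. It assumes the geometric model is a pure translation and tries every match as a hypothesis; practical systems typically fit an affine transform or homography with RANSAC instead. The function names, the tolerance `tol`, and the toy matches are hypothetical.

```python
def count_inliers(matches, tol=3.0):
    """Largest number of matches that agree on one translation.

    matches: list of ((xa, ya), (xb, yb)) keypoint pairs between images A and B.
    """
    best = 0
    for (ax, ay), (bx, by) in matches:
        dx, dy = bx - ax, by - ay  # translation hypothesised from this match
        inliers = 0
        for (px, py), (qx, qy) in matches:
            if abs(qx - px - dx) <= tol and abs(qy - py - dy) <= tol:
                inliers += 1
        best = max(best, inliers)
    return best

def rerank(candidates, tol=3.0):
    """Order candidate images by spatially consistent matches, best first."""
    return sorted(candidates,
                  key=lambda name: count_inliers(candidates[name], tol),
                  reverse=True)

# Toy matches between the query and two retrieved images.
candidates = {
    "image_a": [((0, 0), (50, 7)), ((10, 0), (3, 90)), ((0, 10), (70, 40))],
    "image_b": [((0, 0), (20, 5)), ((10, 0), (30, 5)),
                ((0, 10), (21, 14)), ((5, 5), (80, 80))],
}
reranked = rerank(candidates)
```

Here `image_b` moves ahead of `image_a` because three of its matches share roughly the same shift, while the matches of `image_a` disagree with each other.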
44. Days of the BoF model
Retrieval result after spatial verification
Query:
Key issues of CBIR with the BoF model
45. Days of the BoF model
• Large-scale image retrieval
– Memory, time, precision
– Approximate nearest-neighbor search
$[x_1, x_2, \ldots, x_d]$ → 0100101100… How?
Key issues of CBIR with the BoF model
46. Days of the BoF model
• Locality-sensitive hashing (LSH)
– Random projection, data independent, unsupervised
• Learning compact binary codes
– Preserving sample similarities, data dependent
[Figure: LSH, bits 1 / 0]
Key issues of CBIR with the BoF model
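A random-projection hash of the kind listed above can be sketched as below. The slide does not say which LSH family is meant; this example assumes the common random-hyperplane (sign of projection) scheme, and the names `make_hyperplanes`, `lsh_code`, and `hamming` are hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions; no training data is used."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_code(x, hyperplanes):
    """One bit per hyperplane: 1 if x lies on its non-negative side, else 0."""
    return [1 if sum(w * v for w, v in zip(h, x)) >= 0 else 0
            for h in hyperplanes]

def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(p != q for p, q in zip(a, b))

planes = make_hyperplanes(dim=4, n_bits=16)
x = [1.0, 2.0, 3.0, 4.0]
code_x = lsh_code(x, planes)
code_scaled = lsh_code([2 * v for v in x], planes)  # same direction as x
code_opposite = lsh_code([-v for v in x], planes)   # opposite direction
```

Vectors pointing in the same direction receive identical codes and opposite vectors differ in every bit, so Hamming distance between codes acts as a cheap proxy for the angle between the original features. Learned binary codes replace the random directions with ones fitted to the data.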
47. Days of the BoF model
Retrieval examples from the “Oxford5K” data set
Source: Philbin et al., Object retrieval with large vocabularies and fast spatial matching, CVPR07
48. Days of the BoF model (Summary)
• Achievements
– Local invariant features play a fundamental role
– Visual codebook creation, feature coding, and feature
pooling are extensively studied
– Multiple benchmark data sets are established
– Large-scale image retrieval is also researched
• To be improved
– Feature representation and recognition are handled separately
– Focused more on object-level retrieval and less on semantic-level retrieval
49. • Introduction of CBIR
• Evolution of CBIR
– Early days (before 2000)
– Days of the BoF model (2000 ~ 2012)
– Era of Deep learning (after 2012)
• Conclusion
Outline
Images courtesy of related papers and authors
50. Era of Deep Learning
Visual
• Images
• Videos
Audio
• Speech
• Music
Text
• Natural Language
Planning
…
51. Era of Deep Learning
• Image Recognition
– Faces, objects, poses, scenes, …
• Video content analysis
– Action, activities, events, summarization, …
• Visual information management
– Search, retrieval, indexing, browsing, …
• Potential Outcome: AI
– Computers can see and understand visual
information
– Robotics, self-driving cars, surveillance
– ….
52. Era of Deep Learning
Object detection (Source: Rich feature hierarchies for accurate object detection and
semantic segmentation, CVPR 2014)
Face Recognition (Source: DeepFace: Closing the Gap to Human-Level Performance in Face
Verification, CVPR 2014)
53. Era of Deep Learning
Pose estimation (DeepPose: Human Pose Estimation via Deep Neural Networks, CVPR2014)
Image Segmentation (Source: SegNet: A Deep Convolutional Encoder-Decoder
Architecture for Image Segmentation, IEEE TPAMI 2016)
54. Era of Deep Learning
• Fine-grained image recognition
• Human attribute classification
[Ning Zhang et al.
CVPR 2014]
[Branson et al. arXiv 2014 ]
55. Era of Deep Learning
• Action Recognition
• Large-scale Video Classification
[Karpathy et al. CVPR 2014]
[Simonyan et al. arXiv 2014]
56. Era of Deep Learning
• Invariant and discriminative features
Feature representation: Feature Extraction → Classification → “Panda”?
Prior knowledge, experience
Pose, occlusion, multiple objects, inter-class similarity
Image courtesy of M. Ranzato
57. Era of Deep Learning
• From hand-crafted features to automatically learned ones
$\mathbb{R}^d$; $\mathbb{R}^k$; $y = w^\top x + b$
58. Era of Deep Learning
• Directly learn feature representations from data.
• Jointly learn the feature representation and the classifier.
Low-level Features → Mid-level Features → High-level Features → Classifier
Deep Learning: train layers of features so that classifier works well.
More abstract representation
“Panda”?
Image courtesy of M. Ranzato
59. Era of Deep Learning
• Deep Learning
– Inspired by the way the human brain processes information
– Many layers of non-linear information processing stages
60. Era of Deep Learning
Have we been here before?
Yes.
• Basic ideas common to past neural networks research
• Standard machine learning strategies still relevant.
No.
• Deep Learning: computational power, large-scale data, new algorithms
61. Era of Deep Learning
Convolutional Neural Networks (CNNs)
• A special multi-stage architecture inspired by visual system
62. Era of Deep Learning
Source: Slide: Girshick
Fukushima 1980
Neocognitron
LeCun et al. 1989-1998
Hand-written digit reading
Rumelhart, Hinton, Williams 1986
“T” versus “C” problem
...
Krizhevsky, Sutskever, Hinton 2012
ImageNet classification breakthrough
“SuperVision” CNN
Convolutional Neural Networks (CNNs)
63. Era of Deep Learning
CNNs: ImageNet Breakthrough
● Krizhevsky et al. win 2012 ImageNet classification with a much bigger ConvNet
○ deeper: 7 stages vs 3 before
○ larger: 60 million parameters vs 1 million before
○ 16.4% error (top-5) vs next best 26.2% error
● This was made possible by:
○ fast hardware: GPU-optimized code
○ big dataset: 1.2 million images vs thousands before
○ better regularization: dropout et al.
[Krizhevsky et al. NIPS 2012]
Image courtesy of Deng et al.
64. Era of Deep Learning
Learned Features of CNNs
[Matthew D. Zeiler et al. ECCV 2014]
65. Era of Deep Learning
CBIR: From SIFT to CNNs
• Three main approaches
– Directly use pre-trained CNN models
• to extract feature representations
– Fine-tune pre-trained CNN models
• with information (pairwise or triplet similarity)
– Bag-of-features model on CNN features
• “Deep SIFT”
66. Era of Deep Learning
1. Directly use pre-trained CNNs
• How to use the feature representations?
– Which layer?
– How to pool the features in a convolutional layer?
– How to select the features in a convolutional layer?
67. Era of Deep Learning
1. Directly use pre-trained CNNs
• How to use the feature representations?
– Which layer?
Fully connected layer
Convolutional layer
68. Era of Deep Learning
1. Directly use pre-trained CNNs
• How to use the feature representations?
– How to pool the features in a convolutional layer?
Depth × Height × Width → $[x_1, x_2, \ldots, x_n]$ (How?)
69. Era of Deep Learning
1. Directly use pre-trained CNNs
• How to use the feature representations?
– How to pool the features in a convolutional layer?
Depth × Height × Width → $[x_1, x_2, \ldots, x_n]$ (How?)
• Sum-pooling
• Max-pooling
• Grid-based max-pooling
• Region-based pooling
• Mixed sum & max pooling
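The first two pooling options can be sketched on a toy activation tensor. This is an illustration only: nested lists indexed as `[depth][height][width]` stand in for the output of a convolutional layer, the resulting vector has one entry per channel (so n equals the depth, an assumption of this sketch), and the final L2 normalisation is a common but optional step that the slide does not mention.

```python
def sum_pool(fmap):
    """Sum each channel over all spatial positions."""
    return [sum(sum(row) for row in channel) for channel in fmap]

def max_pool(fmap):
    """Take the largest activation of each channel."""
    return [max(max(row) for row in channel) for channel in fmap]

def l2_normalise(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)

# Toy activation tensor: depth 2, height 2, width 3.
fmap = [
    [[0.0, 1.0, 2.0], [3.0, 0.0, 1.0]],
    [[5.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
]
sum_desc = sum_pool(fmap)
max_desc = max_pool(fmap)
descriptor = l2_normalise(max_desc)
```

Grid-based and region-based variants apply the same reductions to sub-windows of the height and width axes and then combine the results.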
70. Era of Deep Learning
1. Directly use pre-trained CNNs
• How to use the feature representations?
– How to select the features in a convolutional layer?
• Weighting
• Activation magnitude
• Region detection
Source: Cao et al., Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps
71. Era of Deep Learning
2. Fine-tune pre-trained CNNs
• To incorporate extra information from a new
image data set
– Side information (pairwise or triplet similarity)
– Distance metric learning
√
X
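Pairwise and triplet side information is usually turned into a training loss. The sketch below shows one common form of each, a contrastive loss and a margin-based triplet loss on squared Euclidean distances. The exact formulations used by the papers cited in this talk may differ, and the function names and the default `margin` are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def contrastive_loss(a, b, similar, margin=1.0):
    """Pull similar pairs together; push dissimilar pairs at least `margin` apart."""
    d = sq_dist(a, b) ** 0.5
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is farther from the anchor than the positive by `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

# Toy 2-D embeddings standing in for CNN feature vectors.
satisfied = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 0.0])  # negative is far away
violated = triplet_loss([0.0, 0.0], [2.0, 0.0], [1.0, 0.0])   # negative is too close
```

During fine-tuning, the gradient of such a loss with respect to the embeddings is back-propagated through the pre-trained network, so the learned distance reflects the similarity labels of the new data set.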
72. Era of Deep Learning
2. Fine-tune pre-trained CNNs
Source: MatchNet, CVPR2015
Source: Learning Fine-Grained Image Similarity with Deep
Ranking. CVPR 2014
73. Era of Deep Learning
3. Bag-of-features model on “Deep SIFT”
SIFT (Scale Invariant Feature Transform)
Source: Multi-scale Orderless Pooling of Deep Convolutional Activation Features, ECCV2014
74. Era of Deep Learning
3. Bag-of-features model on “Deep SIFT”
SIFT (Scale Invariant Feature Transform)
“Deep SIFT”
Source: Cao et al., Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps
75. Era of Deep Learning
3. Bag-of-features model on “Deep SIFT”
Codebook generation → Feature coding → Feature pooling → Classification, clustering or retrieval
Or
76. Era of Deep Learning
12: Image Classification with DCNN (Krizhevsky, NIPS12)
14: CNN Features off-the-shelf (Razavian, CVPRW14); Neural codes (Babenko, ECCV14); Deep ranking (Wang, CVPR14); Multi-scale orderless pooling (Gong, ECCV14); Encoding High Dimensional Local Features (Liu, NIPS14); Survey: Deep learning for CBIR (Wan, ACMMM14)
15: Deep filter banks (Cimpoi, CVPR15); Exploiting Local Features from DNN (Ng, CVPRW15); SPoC (Babenko, ICCV15); MatchNet (Han, CVPR15)
16: R-MAC (Tolias, ICLR16); CNN IR Learns from BoW (Radenovic, ECCV16); CroW (Kalantidis, ECCVW16); Where to focus (Cao, 2016)
Some papers appeared on arXiv
77. Summary
• A very limited (and biased) account of CBIR
• CBIR has made significant progress over the past two decades
• The development of feature representation plays
a key role
• Issues to be resolved
– How to transfer the benefit of Deep Learning?
– How to deal with the unsupervised learning case?
– How to better handle the semantic gap?
– …