This document provides an overview of visual information retrieval and content-based image retrieval. It discusses the motivation for these topics due to the large number of digital images and videos now available. Various low-level image features that can be extracted for content-based image retrieval are described, including color histograms, dominant colors, color distribution, and color correlograms. The benefits and disadvantages of different features are also outlined. The goal is to reduce images to feature vectors for efficient similarity comparison and retrieval.
3. Motivation
http://www.uni-klu.ac.at
● Things get easier in digital age …
Taking pictures & recording videos
Storing thousands of MBs
Publishing content to the web
Entertainment at your fingertips
● Just some figures …
4. Digital Imaging Devices (global)
● How many devices exist?
Device            # in 2006
digital cameras   400 * 10^6
camera phones     600 * 10^6
Source: IDC Study “Expanding Digital Universe” http://www.emc.com/about/destination/digital_universe/
ITEC, Klagenfurt University, Austria 4
5. Number of Digital Photos (global)
● Estimate 2006
> 150 billion photos from cameras
> 100 billion photos from camera phones
● Forecast 2010
> 500 billion photos
+ increased resolution
Source: IDC Study “Expanding Digital Universe” http://www.emc.com/about/destination/digital_universe/
6. Digital Imaging Devices (Germany)
[Chart: still image cameras sold in Germany (thousands), analogue vs. digital, 1997 to 2006]
Source: Cewe Factbook, http://www.cewecolor.de
7. Photo prints market (Western Europe)
● Photo prints forecast (in billions)
[Chart legend: analogue prints from labs; digital prints from labs and printers]
Source: Cewe Factbook, http://www.cewecolor.de
8. Motivation
So how do we actually find images when we
need them?
● Using a clever directory structure?
● Using “sophisticated” applications?
15. Motivation
Satisfied with the results?
● Actually there are some minor problems.
16. Sensory Gap
● Regarding the sensor
● Inability to record the scene
● Example:
Too few colors, pixels
Too low light, too small memory
Too few fps
17. Semantic Gap
What is so special about Mona Lisa’s smile?
● Inability of computers to interpret the
scene
18. Semantic Gap
● Limited understanding of computers
● Inability to interpret image content
20. What is VIR?
It’s about finding automated solutions to
the problem of finding and retrieving
visual information (images, videos) from
(large, distributed, unstructured)
repositories in a way that satisfies the
search criteria specified by their users,
relying (primarily) on the visual contents
of the media.
21. What is the problem with VIR?
The fundamental difficulty in doing what we
want to do is related to the need to
encode, perceive, convey, and measure
similarity (e.g. between two images)
22. Similarity
● Are these two images similar?
taken from [Eidenberger 2004]
23. Similarity
● Which of the small images is most similar
to the big one?
24. Dimensions of the Problem: User
From [Datta et al. 2008]
ITEC, Klagenfurt University, Austria – Multimedia Information Systems 24
25. Dimensions of the Problem: System
From [Datta et al. 2008]
26. Research issues …
From [Datta et al. 2008]
28. Content Based Image Retrieval (CBIR)
● Text & structured text have already been
discussed
So we leave metadata for today
● Focus on image content
Given by pixels
Within a raster
Each pixel has a color value
29. Images
Real world Digitized
30. Sampling & Quantization
● Size of a captured image:
# of samples (width*height) * # of colors
31. Image Features
● Images are too “big” for retrieval
Too many pixels & colors
● We need to extract
The necessary minimum of information
For meaningful similarity assessment
● Reduce the problem to a “lower
dimensional space”
36. Other Constraints: It’s a metric …
● For a dissimilarity measure d(i,j)
d(i,i)=0 … no dissimilarity for same image
d(i,j)=d(j,i) … symmetric
d(i,j)+d(j,k)>=d(i,k) … triangle inequality
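The three constraints can be checked numerically on a small sample of feature vectors. The following is a minimal sketch, assuming the L1 distance as d; the helper name `is_metric_on` and the sample vectors are illustrative and not taken from the slides.

```python
from itertools import product

def l1(a, b):
    """L1 (city block) distance between two equally long vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def is_metric_on(points, d, eps=1e-9):
    """Check the three constraints for d on a finite sample of points.

    Passing the check does not prove that d is a metric in general;
    failing it shows that d is not one.
    """
    for i in points:
        if abs(d(i, i)) > eps:                    # d(i,i) = 0
            return False
    for i, j in product(points, repeat=2):
        if abs(d(i, j) - d(j, i)) > eps:          # d(i,j) = d(j,i)
            return False
    for i, j, k in product(points, repeat=3):
        if d(i, j) + d(j, k) < d(i, k) - eps:     # d(i,j) + d(j,k) >= d(i,k)
            return False
    return True

histograms = [(4, 12, 20, 64), (10, 10, 40, 40), (0, 0, 50, 50)]
print(is_metric_on(histograms, l1))
```

A squared Euclidean distance, for comparison, fails the third constraint on the points 0, 1, 2, which is why it cannot be used where an index relies on the triangle inequality.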
37. Common features
● Color histograms
● Dominant colors
● Color distribution
● Color correlogram
● Tamura features
● Edge histogram
● Local features
● Region based features
(CC) by Pixel Addict, flickr.com/photos/pixel_addict/1083928126/
38. Color Histogram
● Count how often which color is used
● Algorithm:
1. Allocate int array h with dim = # of colors
2. Visit next pixel -> it has color with index i
3. Increment h[i]
4. IF pixels left THEN goto step 2
● Example: 4 colors, 10*10 pixels
histogram: [4, 12, 20, 64]
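The algorithm above can be sketched in a few lines. This assumes the image is already quantized and given as a flat sequence of color indices; the function name `color_histogram` is illustrative.

```python
def color_histogram(pixels, num_colors):
    """Count how often each color index occurs.

    pixels: iterable of color indices in range(num_colors), in any order.
    """
    h = [0] * num_colors      # allocate int array h, one bin per color
    for i in pixels:          # visit each pixel; i is its color index
        h[i] += 1             # increment h[i]
    return h

# A 10*10 image with 4 colors, flattened to 100 color indices.
image = [0] * 4 + [1] * 12 + [2] * 20 + [3] * 64
print(color_histogram(image, 4))  # [4, 12, 20, 64]
```

Because only counts are kept, visiting the pixels in any other order yields the same histogram.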
39. Color Histogram
● Strategies:
Quantize if too many colors
Normalize histogram (different image sizes)
Weight colors according to use case
Use (part of) color space according to domain
● Distance / Similarity
Assumption: All images have the same colors
L1 or L2 is quite common, JD works even better
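A small sketch of why normalization matters before comparing histograms, assuming both images use the same color bins. The function names are illustrative; only L1 and L2 are shown.

```python
import math

def normalize(h):
    """Scale a histogram so that its bins sum to 1 (image size drops out)."""
    total = sum(h)
    return [v / total for v in h] if total else [0.0] * len(h)

def l1(a, b):
    """L1 distance: sum of absolute bin differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    """L2 (Euclidean) distance between two histograms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

small = [4, 12, 20, 64]       # 10*10 image
large = [16, 48, 80, 256]     # same color proportions at 20*20

print(l1(small, large))                        # raw counts differ a lot
print(l1(normalize(small), normalize(large)))  # proportions are the same
print(l2(normalize(small), normalize(large)))
```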
40. Color Histogram
● Benefits
Easy to compute, not depending on pixel order
Matches human perception quite well
Quantization allows scaling the size of the histogram
Invariant to rotation, translation & reflection
● Disadvantages
Distribution of colors not taken into account
Colors might not represent semantics
Find quantization fitting to domain / perception
Image scaling might be a problem
43. Dominant Color
● Reduce histogram to dominant colors
e.g. for 64 colors c0-c63:
• image 1: c12 -> 23%, c33 -> 6%, c2 -> 2%
• image 2: c11 -> 43%, c2 -> 12%, c54 -> 10%
● Dissimilarity function in 2 aspects:
Difference in amount (percentage)
Difference between colors (c11 vs. c12)
● Further aspects:
Diversity and distribution
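Reducing a histogram to its dominant colors could look like the sketch below. It only covers the extraction step; the choice of k = 3 and the function name are assumptions, and the two-aspect dissimilarity function is left open, as on the slide.

```python
def dominant_colors(histogram, k=3):
    """Return the k most frequent colors as (color index, share) pairs."""
    total = sum(histogram)
    if total == 0:
        return []
    ranked = sorted(range(len(histogram)),
                    key=lambda c: histogram[c], reverse=True)
    return [(c, histogram[c] / total) for c in ranked[:k] if histogram[c] > 0]

# Illustrative 8-color histogram of a 100-pixel image.
h = [5, 0, 50, 0, 30, 5, 10, 0]
print(dominant_colors(h))  # [(2, 0.5), (4, 0.3), (6, 0.1)]
```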
44. Dominant Color
● Benefits:
Small feature vectors
Easily understandable & intuitive
Invariant to rotation, translation & reflection
● Disadvantages
Similarity of color pairs is not a trivial problem
Colors might not represent semantics
Find quantization fitting to domain / perception
45. Color Distribution
● Index dominant color in image segment
e.g. 8*8 = 64 image segments
feature vector has 64 dimensions
• One for each segment
color index is the entry on segment dimension
• e.g. 16 colors [2, 0, 3, 3, 8, 4, ...]
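A minimal sketch of this feature, assuming the image is a list of rows of quantized color indices and that "dominant" means the most frequent color in a segment. The function name and the tiny 4*4 example with a 2*2 grid are illustrative; the slide uses 8*8 segments.

```python
from collections import Counter

def color_distribution(image, grid=8):
    """Dominant color index per grid segment, listed row by row.

    image: list of rows, each a list of color indices.
    Assumes height and width are both at least `grid`.
    """
    height, width = len(image), len(image[0])
    feature = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            counts = Counter(image[y][x]
                             for y in range(y0, y1)
                             for x in range(x0, x1))
            feature.append(counts.most_common(1)[0][0])
    return feature

image = [
    [2, 2, 0, 0],
    [2, 1, 0, 0],
    [3, 3, 3, 8],
    [3, 3, 8, 8],
]
print(color_distribution(image, grid=2))  # [2, 0, 3, 8]
```

With grid=8 the result has 64 entries, one per segment, matching the 64-dimensional feature vector described above.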
46. Color Distribution
● Similarity
L1 or L2 are commonly used
● Benefits
Works fine for many scenarios
• clouds in the sky, portrait photos, etc.
Mostly invariant to scaling
● Disadvantages
Colors might not represent semantics
Find quantization fitting to domain / perception
Rotation, translation & reflection are a problem
47. Color Correlogram
● Histogram on
how often specific colors occur
in the neighbourhood of each other
● Histogram size is (# of colors)^2
For each color an array of neighboring colors
48. Color Correlogram
● Extraction algorithm
1. Allocate array h[#colors][#colors] all zero
2. Visit next pixel p
3. For each pixel q in neighborhood of p:
• increment h[color(p)][color(q)]
4. IF pixels left THEN goto step 2
● Algorithm is rather slow
Depends on size of neighborhood
Typically determined by city block distance
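The extraction steps can be sketched as follows, assuming a quantized image given as rows of color indices and a neighborhood of all pixels within a city block distance `radius` of p. The function name and the default radius are assumptions; this follows the simplified counting on the slide rather than any specific published variant.

```python
def color_correlogram(image, num_colors, radius=1):
    """Count how often color j occurs in the neighborhood of color i."""
    height, width = len(image), len(image[0])
    h = [[0] * num_colors for _ in range(num_colors)]
    for y in range(height):
        for x in range(width):
            p = image[y][x]
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    dist = abs(dx) + abs(dy)      # city block distance
                    if dist == 0 or dist > radius:
                        continue                  # skip p itself and far pixels
                    qy, qx = y + dy, x + dx
                    if 0 <= qy < height and 0 <= qx < width:
                        h[p][image[qy][qx]] += 1
    return h

image = [
    [0, 0],
    [0, 1],
]
print(color_correlogram(image, num_colors=2))  # [[4, 2], [2, 0]]
```

The nested loops make the cost grow with the number of pixels times the neighborhood size, which is the reason the extraction is rather slow.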
49. Color Correlogram
● Similarity
L1 or L2 are commonly used
● Benefits
Integrates color as well as distribution
Works fine for many scenarios
Mostly invariant to rotation & reflection
● Disadvantages
Find appropriate neighborhood size
Find quantization fitting to domain / perception
Rather slow indexing / extraction
50. Color Correlogram
● Auto Color Correlogram
Just indexing how often color(p) occurs in
neighborhood of pixel p
Simplifies the histogram to size # of colors
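A sketch of the auto color correlogram under the same assumptions as for the full correlogram (quantized image as rows of color indices, city block neighborhood with a hypothetical default radius of 1). Only neighbors with the same color as p are counted, so one bin per color suffices.

```python
def auto_color_correlogram(image, num_colors, radius=1):
    """Count how often color(p) occurs again in the neighborhood of p."""
    height, width = len(image), len(image[0])
    h = [0] * num_colors
    for y in range(height):
        for x in range(width):
            p = image[y][x]
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    dist = abs(dx) + abs(dy)      # city block distance
                    if dist == 0 or dist > radius:
                        continue
                    qy, qx = y + dy, x + dx
                    inside = 0 <= qy < height and 0 <= qx < width
                    if inside and image[qy][qx] == p:
                        h[p] += 1
    return h

image = [
    [0, 0],
    [0, 1],
]
print(auto_color_correlogram(image, num_colors=2))  # [4, 0]
```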
51. Color Correlogram
● Integrating different pixel features to
correlate
Gradient Magnitude (intensity of change in
the direction of maximum change)
Rank (intensity variation within a
neighborhood of a pixel)
Texturedness (number of pixels exceeding a
certain level in a neighborhood)
53. Tamura Features
Features describing texture
● Coarseness
● Contrast
● Directionality
● Line-likeness
● Regularity
● Roughness
54. Tamura Features
● Coarseness
Size of the texture elements
● Contrast
More or less picture quality
● Directionality
Focusing on the texture not the image
Same angle but different orientation is
considered as same directionality
55. Edge Histogram
● Basic texture feature used in MPEG-7
Divides image into 4*4 = 16 sub images
Classifies directionality of sub images
Stores directionality values in histogram
● Dissimilarity
L1-like
56. Edge & Texture Features
● Benefits
Compact representation
Captures “overall” texture
Mostly invariant to scaling
● Disadvantages
Not very intuitive in all domains
Not invariant to rotation & translation
57. Local Features
● Index small sub images instead of global image
e.g. 14x14 or 17x17 pixels
typically 100-1000
selection based on local variance of gray values
idea of salience
from Gu et al. 1989: Comparison of Techniques for Measuring Cloud Texture in Remotely Sensed Satellite Meteorological Image Data.
58. Local Features
● Features are too big
to reduce size PCA is applied
for instance reduced to 40 dimensions
still 1000*40*#bins
● Local features histograms
Clustering a reasonable number of features
Assigning numbers to clusters
Create a histogram of clusters
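The histogram step can be sketched as below. It assumes the cluster centers have already been computed by some clustering method (not shown) and that local features are plain numeric vectors; all names and the sample values are illustrative.

```python
def nearest_cluster(feature, centers):
    """Index of the cluster center closest to feature (squared L2)."""
    return min(range(len(centers)),
               key=lambda c: sum((f - m) ** 2
                                 for f, m in zip(feature, centers[c])))

def local_feature_histogram(features, centers):
    """Assign each local feature to a cluster and count per cluster."""
    hist = [0] * len(centers)
    for feature in features:
        hist[nearest_cluster(feature, centers)] += 1
    return hist

centers = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]   # assumed clustering result
features = [(0.1, 0.2), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9), (0.0, 0.1)]
print(local_feature_histogram(features, centers))  # [2, 1, 2]
```

The image is then described by one vector with as many entries as clusters, however many local features were extracted from it.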
60. Local Features
● Benefits
Work in general better than global features
Especially good for image classification
Invariant to translation
● Drawbacks
Too big features (without clustering)
Problems with scaling, rotation
61. Region Based Features
● Segmentation of the image
roughly correlated to the objects in the image
e.g. based on pixel clustering
● Extraction of features per region
Note constraints of several features
• minimum size
• rectangular area
● Indexing of regions
62. Region Based Features
● Benefits
Work better than global features
Invariant to translation
Mostly invariant to rotation & scaling
● Drawbacks
Heavily depends on segmentation
Segmentation is not a trivial problem
63. Regions of Interest
● Identify interesting patches in images
● Automatic extraction of ROIs
Top-down, based on a model
Bottom-up, e.g. stimulus-driven
● Applications
Image re-targeting
Image cropping
Src.: Borba, Gamba, Marques and Mayron , “Extraction of salient regions of interest using visual
attention models”, SPIE Conference on Multimedia Content Access: Algorithms and Systems III, 2009
64. Bottom-Up Visual Attention
● Attention Models
Find most interesting point in visual scene
Direct gaze towards this point
Selective or focal attention or attention for
perception
● Metaphor of a spotlight
Sweeping the scene
Highlighting most important parts
65. Model of Itti, Koch & Niebur
● Biologically inspired
● Three low level dimensions of an image
Color, orientation and intensity
● Features are extracted in different scales
This results in feature maps
● Normalization -> conspicuity maps
● Normalization & summing -> saliency map
Peaks are salient points
Itti, Koch & Niebur, “A Model of Saliency-based Visual
Attention for Rapid Scene Analysis”, PAMI 1998
66. Model of Itti, Koch & Niebur
● In iterations
Preserve prominent peaks
Inhibit small peaks
● Number of iterations decides on the
outcome
Src.: Borba, Gamba, Marques and Mayron , “Extraction of salient regions of interest using visual
attention models”, SPIE Conference on Multimedia Content Access: Algorithms and Systems III, 2009
67. Model of Stentiford
● Suppress areas of repetitive color patterns
● For each pixel:
Compare a number of randomly selected pixels
Based on color in neighbourhood
High value: low number of similar areas
Low value: lots of similar areas
● Result added up to saliency map
F. W. M. Stentiford, “An estimator for visual attention through competitive novelty with application to
image compression,” Proc. Picture Coding Symposium, pp 101-104, Seoul, 24-27 April, 2001.
68. Model of Stentiford
70. Intuitive Approach
● Query by Example (QBE)
Extract indexed feature from query image
Compare with each indexed image
Using selected dissimilarity function
Linear search
● Compare to text search
Inverted list
Search time depends on terms
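Query by example as a linear search can be sketched like this. The index is assumed to be a dictionary from image name to an already extracted feature vector; the names, vectors and the choice of L1 are illustrative.

```python
def l1(a, b):
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def query_by_example(query_feature, index, dissimilarity, k=3):
    """Compare the query with every indexed image; return the k closest.

    Runs in time linear in the number of indexed images.
    """
    scored = [(dissimilarity(query_feature, feature), name)
              for name, feature in index.items()]
    scored.sort()
    return scored[:k]

index = {
    "a": [0.5, 0.5, 0.0, 0.0],
    "b": [0.25, 0.25, 0.25, 0.25],
    "c": [0.0, 0.0, 0.5, 0.5],
}
query = [0.5, 0.4, 0.1, 0.0]   # feature extracted from the query image
for distance, name in query_by_example(query, index, l1):
    print(name, round(distance, 3))
```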
71. Indexing Visual Information
● Visual information expressed by “vectors”
Combined with a metric capturing the
semantics of similarity
Inverted list does not work here
An “index of vectors” is needed
72. Indexing Visual Information
● Vectors describe “points in a space”
Space is n-dimensional
n might be rather big
● Metric describes distance between points
E.g. L1 or L2 …
● Query is also a vector (point)
Searching for points (vectors) near to query
● Idea for index:
Index neighbourhood …
73. Spatial Indexes
Using equally sized rectangles (Optimal for L1 …)
74. Spatial Indexes
Using overlapping rectangles …
75. Spatial Indexes
● Common data structures
R Tree
• R*, R+, ….
• Overlapping rectangles
• Search is a rectangle
Quadtree (Octree)
• Equally sized regions, subdivided
• 4 quadrants or 8 octants
• Search selects quadrants
76. Spatial Indexes: Drawbacks
● Data structures must minimize
false negatives (-> maximizes recall)
false positives (-> search time)
● Descriptors, metrics & parameters need to
be selected at index time
Searches combining multiple descriptors are a
complicated issue
● Work best for small n
MDS has to be applied …
77. Multidimensional Scaling (MDS)
● Reducing the dimensions of a feature space
E.g. From 64 dimensions to 8
Without losing too much information about
neighbourhoods
● Interpolation: FastMap
Linear in terms of objects
Used e.g. in IBM QBIC
● Iterative: Force Directed Placement
Iterative optimization of initial placement
Cubic runtime
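The core projection step of FastMap can be sketched as follows: each object is placed on the line through two pivot objects using only pairwise distances (cosine law). This shows a single output coordinate with hand-picked pivots; how pivots are chosen and how further coordinates are derived from residual distances is not shown, and the function names are illustrative.

```python
import math

def euclid(a, b):
    """Euclidean distance, used here as the example distance function."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fastmap_coordinate(obj, pivot_a, pivot_b, d):
    """Coordinate of obj on the line through pivot_a and pivot_b.

    Needs only three distances per object, so one coordinate for all
    objects costs linear time. Assumes the two pivots are distinct.
    """
    d_ab = d(pivot_a, pivot_b)
    return (d(pivot_a, obj) ** 2 + d_ab ** 2 - d(pivot_b, obj) ** 2) / (2 * d_ab)

a, b = (0.0, 0.0), (4.0, 0.0)
for point in [a, b, (1.0, 3.0)]:
    print(point, fastmap_coordinate(point, a, b, euclid))
```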
78. Metric Index
● Hierarchical clustering is applied for
indexing
Representative image for cluster
Search in m<n clusters instead of n images
● Problems
The same as for clustering
• How to get a balanced tree?
• Do clusters represent dissimilarity?
79. Hashing
● Finding a hash function, which
Can be applied easily to features
Reflects dissimilarity
• Similar images have roughly the same hash
• Dissimilar images have “distant” hashes
● Example
Locality Sensitive Hashing (LSH)
Works in Euclidean spaces
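One common LSH family for Euclidean spaces hashes a vector v to floor((a·v + b) / w), with a drawn from a normal distribution and b uniformly from [0, w). The sketch below assumes this family; the class name, the number of hash functions and the bucket width are illustrative choices, not values from the slides.

```python
import math
import random

class EuclideanLSH:
    """Random-projection LSH: nearby vectors tend to share a bucket key."""

    def __init__(self, dim, num_hashes=4, bucket_width=1.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # One random direction a and one offset b per hash function.
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]

    def key(self, v):
        """Tuple of bucket numbers, one per hash function."""
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

lsh = EuclideanLSH(dim=4)
buckets = {}
for name, feature in {"img1": [0.1, 0.2, 0.3, 0.4],
                      "img2": [0.9, 0.0, 0.1, 0.0]}.items():
    buckets.setdefault(lsh.key(feature), []).append(name)
print(buckets)
```

At query time only the images in the bucket of the query's key are compared, which trades some recall for a much shorter candidate list.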
80. Metric Spaces
src. G. Amato & P. Savino, “Approximate Similarity Search in Metric Spaces Using Inverted Files”, Infoscale 2008
● $M = (D, d)$
Data domain $D$
Total (distance) function $d: D \times D \to \mathbb{R}$ (metric function or metric)
● The metric space postulates:
Non-negativity: $\forall x, y \in D,\ d(x, y) \geq 0$
Symmetry: $\forall x, y \in D,\ d(x, y) = d(y, x)$
Identity: $\forall x, y \in D,\ x = y \Leftrightarrow d(x, y) = 0$
Triangle inequality: $\forall x, y, z \in D,\ d(x, z) \leq d(x, y) + d(y, z)$
81. Similarity Search in Metric Spaces
● Objects close to one another see the space in a
“similar” way
● Choose a set of reference objects RO
● Orderings of RO according to the distances from
two similar data objects are similar as well
Represent every data object o as an ordering of RO
from o
Measure similarity between two data objects by
measuring the similarity between the
corresponding orderings
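Representing a data object as an ordering of the reference objects can be sketched as below. Reference objects are identified by their 0-based position in the list (the slides use 1-based ids); the points, the Euclidean distance and the function names are illustrative assumptions.

```python
import math

def euclid(a, b):
    """Euclidean distance, standing in for the metric d."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ordering(obj, reference_objects, d):
    """Indices of the reference objects, sorted by distance from obj."""
    return sorted(range(len(reference_objects)),
                  key=lambda r: d(obj, reference_objects[r]))

RO = [(0, 0), (10, 0), (0, 10)]
print(ordering((1, 2), RO, euclid))  # [0, 2, 1]
print(ordering((2, 3), RO, euclid))  # [0, 2, 1], a nearby object
print(ordering((9, 1), RO, euclid))  # [1, 0, 2], a distant object
```

The two nearby objects produce the same ordering, while the distant one sees the reference objects in a different order.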
82. Similarity Search in Metric Spaces
O1 := <5, 3, 4, 1, 2>
O2 := <1, 5, 3, 5, 2>
O3 := …
83. Similarity Search in Metric Spaces
● Spearman Footrule Distance
$SFD(S_1, S_2) = \sum_{ro \in RO} |S_2(ro) - S_1(ro)|$
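A sketch of the Spearman Footrule Distance, reading S(ro) as the position of reference object ro in an ordering. Both orderings must contain the same reference objects exactly once; the second example ordering is a hypothetical permutation chosen for illustration.

```python
def spearman_footrule(order1, order2):
    """Sum over all reference objects of their position differences."""
    pos1 = {ro: i for i, ro in enumerate(order1)}
    pos2 = {ro: i for i, ro in enumerate(order2)}
    return sum(abs(pos2[ro] - pos1[ro]) for ro in pos1)

s1 = [5, 3, 4, 1, 2]
s2 = [1, 5, 3, 4, 2]   # hypothetical second ordering
print(spearman_footrule(s1, s2))  # 6
print(spearman_footrule(s1, s1))  # 0
```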
89. Informedia
● Database for search and browsing
Carnegie Mellon University, H.D. Wactlar
● Content based search in TV and radio
news
● ~ 1500 h video and audio
● Transcription, indexing and segmentation
Speech Recognition,
Image Analysis,
Natural Language Processing
92. Retrievr
● Flickr images indexed
Based on some color feature
● Query by sketch interface
Ajax based implementation
93. Photosynth
● SIFT to identify salient points
● Reconstruction of 3D model
● Selection through social annotation
94. References
[Eidenberger 2004] Eidenberger, H., Introduction: Visual
Information Retrieval, Habilitation, 2004
[Datta et al. 2008] Datta, R., Joshi, D., Li, J., and Wang, J. Z.
2008. Image retrieval: Ideas, influences, and trends of the
new age. ACM Comput. Surv. 40, 2, Article 5 (April 2008)
95. Acknowledgements
● Thanks to Oge Marques for kindly offering
his slides!
96. Thanks …
… for your attention!
mlux@itec.uni-klu.ac.at
(CC) by prakhar, flickr.com/photos/prakhar/827192423/