2. “Image search and retrieval” is not a problem,
but rather a collection of related problems that
look like one.
10 years after “the end of the early years”,
research in image search and retrieval still has
many open problems, challenges, and
opportunities.
3. This is a highly interdisciplinary field, but …
[Figure: diagram of fields related to Visual Information Retrieval: Image and Video Processing; (Multimedia) Database Systems; Information Retrieval; Machine Learning; Computer Vision; Data Mining; Visual data modeling and representation; Human Visual Perception]
4. There are many things that I believe…
… but cannot prove
6. It’s been 10 years since the “end of the early
years” [Smeulders et al., 2000]
◦ Are the challenges from 2000 still relevant?
◦ Are the directions and guidelines from 2000 still
appropriate?
7. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Driving forces
“[…] content-based image retrieval (CBIR) will continue
to grow in every direction: new audiences, new
purposes, new styles of use, new modes of interaction,
larger data sets, and new methods to solve the
problems.”
8. Yes, we have seen many new audiences, new
purposes, new styles of use, and new modes
of interaction emerge.
Each of these usually requires new methods
to solve the problems that they bring.
However, not too many researchers see them
as a driving force (as they should).
9. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Heritage of computer vision
“An important obstacle to overcome […] is to realize
that image retrieval does not entail solving the general
image understanding problem.”
10. I’m afraid I have bad news…
◦ Computer vision hasn’t made that much progress
during the past 10 years.
◦ Some classical problems
(including image
understanding)
remain unresolved.
◦ Similarly, CBIR from a
pure computer vision
perspective didn’t work
too well either.
11. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Influence on computer vision
“[…] CBIR offers a different look at traditional computer
vision problems: large data sets, no reliance on strong
segmentation, and revitalized interest in color image
processing and invariance.”
12. The adoption of large data sets became standard
practice in computer vision (see Torralba’s work).
No reliance on strong segmentation (still
unresolved) led to new areas of research, e.g.,
automatic region-of-interest (ROI) extraction and
region-based image retrieval (RBIR).
Color image processing and color descriptors
became incredibly popular, useful, and (to some
degree) effective.
Invariance is still a huge problem
◦ But it’s cheaper than ever to have multiple views.
14. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Similarity and learning
“We make a pledge for the importance of human-
based similarity rather than general similarity. Also,
the connection between image semantics, image data,
and query context will have to be made clearer in the
future.”
“[…] in order to bring semantics to the user, learning is
inevitable.”
15. Similarity is a tough problem to crack and
model.
See it for yourself…
18. Is the second or the third image more similar
to the first?
19. Which image fits better to the first two: the
third or the fourth?
20. Is learning really inevitable?
Maybe, maybe not, but it sure comes in handy
in some specific cases…
◦ SVM anyone?
21. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Interaction
Better visualization options, more control to the user,
ability to provide feedback […]
22. Significant progress on visualization
interfaces and devices.
Relevance Feedback: still a very tricky
tradeoff (effort vs. perceived benefit), but
more popular than ever (rating, thumbs up/
down, etc.)
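As an illustration of how thumbs-up/down feedback can be folded back into a query, here is a minimal sketch of the classic Rocchio update from text retrieval, applied to image feature vectors. It is not taken from the talk; the function names and the default weights `alpha`, `beta`, and `gamma` are assumptions chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean of the vectors; a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_update(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    non_relevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight of the original query (assumed default)
    beta: float = 0.75,   # pull toward images marked relevant (assumed default)
    gamma: float = 0.15,  # push away from images marked non-relevant (assumed default)
) -> Vector:
    """Return a new query vector moved toward relevant and away from non-relevant examples."""
    dim = len(query)
    pos = mean_vector(relevant, dim)
    neg = mean_vector(non_relevant, dim)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]


if __name__ == "__main__":
    # Toy 2-D feature vectors: the user liked two images and disliked one.
    new_query = rocchio_update(
        query=[1.0, 0.0],
        relevant=[[0.0, 1.0], [0.0, 3.0]],
        non_relevant=[[2.0, 0.0]],
        alpha=1.0, beta=0.5, gamma=0.5,
    )
    print(new_query)
```

The effort vs. benefit tradeoff shows up directly here: each extra rating improves the estimate of the two means, but only if the user is willing to provide it.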
23. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Need for databases
“The connection between CBIR and database research
is likely to increase in the future. […] problems like the
definition of suitable query languages, efficient search
in high dimensional feature space, search in the
presence of changing similarity measures are largely
unsolved […]”
24. Very little progress
◦ Image search and retrieval has benefited much
more from document information retrieval than
from database research.
25. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ The problem of evaluation
CBIR could use a reference standard against which new
algorithms could be evaluated (similar to TREC in the
field of text retrieval).
“A comprehensive and publicly available collection of
images, sorted by class and retrieval purposes,
together with a protocol to standardize experimental
practices, will be instrumental in the next phase of
CBIR.”
26. Significant progress on benchmarks,
standardized datasets, etc.
◦ ImageCLEF
◦ Pascal VOC Challenge
◦ MSRA dataset
◦ SIMPLIcity dataset
◦ UCID dataset and ground truth (GT)
◦ Accio / SIVAL dataset and GT
◦ Caltech 101, Caltech 256
◦ LabelMe
27. Revisiting the ‘Concluding Remarks’ from
[Smeulders et al., 2000]:
◦ Semantic gap and other sources
“A critical point in the advancement of CBIR is the
semantic gap, where the meaning of an image is rarely
self-evident. […] One way to resolve the semantic gap
comes from sources outside the image by integrating
other sources of information about the image in the
query.”
28. The semantic gap problem has not been
solved (and may never be…)
What are the alternatives?
1. Treat visual similarity and semantic relatedness
differently
Examples: Alipr, Google similarity search, etc.
2. Improve both (text-based and visual) search
methods independently
3. Trust the user
CFIR, collaborative filtering, crowdsourcing, games.
29. I postulate that image search and retrieval is
not a problem (but, instead, a collection of
related problems that look like one)
There are many potential opportunities for
good solutions to specific problems
One promising avenue: think about image
retrieval as added value (e.g., like.com, SPE,
etc.)
30. Google Similarity Search (VisualRank) [Jing &
Baluja, 2008]
Google Goggles (mobile visual search)
31. Google Goggles understands narrow-domain
search and retrieval
Several other apps for iPhone, iPad, and
Android (e.g., kooaba and Fetch!)
32. Web 2.0 has brought about:
◦ New data sources
◦ New usage patterns
◦ New understanding about the users, their needs,
habits, preferences
◦ New opportunities
◦ Lots of metadata!
◦ A chance to experience a true paradigm shift
Before: image annotation is tedious, labor-intensive,
expensive
After: image annotation is fun!
33. Games!
◦ Google Image Labeler
◦ Games with a purpose (GWAP):
The ESP Game
Squigl
Matchin
34. New devices and services…
◦ Flickr (b. 2004)
◦ YouTube (b. 2005)
◦ Flip video cameras (b. 2006)
◦ iPhone (b. 2007)
◦ iPad (b. 2010)
35. New opportunities for narrowing the semantic
gap
◦ From bottom up: (semi-)automatic image
annotation
◦ From top down: using (content / context)
ontologies
◦ Combining top-down and bottom-up
New fields of research, including:
◦ Tag recommendation systems
◦ User intentions in image search
37. I believe (but cannot prove…) that successful
Image Search & Retrieval solutions will:
• combine content-based image retrieval (CBIR) with
metadata (high-level semantic-based image
retrieval)
• only be truly successful in narrow domains
• include the user in the loop
– Relevance Feedback (RF)
– Collaborative efforts (tagging, rating, annotating)
• provide friendly, intuitive interfaces
• incorporate results and insights from cognitive
science, particularly human visual attention,
perception, and memory
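One simple way to picture "CBIR combined with metadata" is late fusion: score each image once by visual similarity and once by tag overlap, then blend the two. The sketch below is only an illustration under assumptions; the function names, the choice of cosine and Jaccard similarity, and the `visual_weight` parameter are hypothetical and not from the talk.

```python
import math
from typing import Sequence, Set


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Tag overlap: size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def fused_score(
    query_features: Sequence[float],
    query_tags: Set[str],
    image_features: Sequence[float],
    image_tags: Set[str],
    visual_weight: float = 0.5,  # hypothetical blend factor between 0 and 1
) -> float:
    """Weighted blend of visual (content-based) and tag-based (metadata) similarity."""
    visual = cosine_similarity(query_features, image_features)
    textual = jaccard_similarity(query_tags, image_tags)
    return visual_weight * visual + (1.0 - visual_weight) * textual


if __name__ == "__main__":
    # Visually dissimilar images that share one of two tags.
    print(fused_score([1.0, 0.0], {"cat", "pet"}, [0.0, 1.0], {"cat"}))
```

In a narrow domain, `visual_weight` could be tuned per application, and user feedback (ratings, tags) would feed the metadata side of the blend.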