Oge Marques (FAU) - invited talk at WISMA 2010 (Barcelona, May 2010)

Oge Marques
Florida Atlantic University
Boca Raton, FL - USA

▪  “Image search and retrieval” is not a problem, but rather a collection of related problems that look like one.

▪  10 years after “the end of the early years”, research in image search and retrieval still has many open problems, challenges, and opportunities.

▪  This is a highly interdisciplinary field, but …

[Figure: diagram placing Visual Information Retrieval among related fields: Image and Video Processing; (Multimedia) Database Systems; Information Retrieval; Machine Learning; Computer Vision; Data Mining; Human Visual Perception; Visual data modeling and representation]

▪  There are many things that I believe…

▪  … but cannot prove

▪  It’s been 10 years since the “end of the early years” [Smeulders et al., 2000]

◦  Are the challenges from 2000 still relevant?
◦  Are the directions and guidelines from 2000 still appropriate?

▪  Revisiting the ‘Concluding Remarks’ from [Smeulders et al., 2000]:

◦  Driving forces
▪  “[…] content-based image retrieval (CBIR) will continue to grow in every direction: new audiences, new purposes, new styles of use, new modes of interaction, larger data sets, and new methods to solve the problems.”

▪  Yes, we have seen many new audiences, new purposes, new styles of use, and new modes of interaction emerge.

▪  Each of these usually requires new methods to solve the problems that it brings.

▪  However, few researchers treat them as the driving force they should be.


◦  Heritage of computer vision
▪  “An important obstacle to overcome […] is to realize that image retrieval does not entail solving the general image understanding problem.”

▪  I’m afraid I have bad news…
◦  Computer vision hasn’t made that much progress during the past 10 years.

◦  Some classical problems (including image understanding) remain unresolved.

◦  Similarly, CBIR approached from a pure computer vision perspective hasn’t worked too well either.


◦  Influence on computer vision
▪  “[…] CBIR offers a different look at traditional computer vision problems: large data sets, no reliance on strong segmentation, and revitalized interest in color image processing and invariance.”

▪  The adoption of large data sets became standard practice in computer vision (see Torralba’s work).
▪  No reliance on strong segmentation (still unresolved) → new areas of research, e.g., automatic region-of-interest (ROI) extraction and region-based image retrieval (RBIR).
▪  Color image processing and color descriptors became incredibly popular, useful, and (to some degree) effective.
▪  Invariance is still a huge problem.
◦  But it’s cheaper than ever to have multiple views.
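As one concrete illustration of the color descriptors mentioned above, here is a minimal sketch of a quantized RGB color histogram compared with histogram intersection (the classic Swain and Ballard measure). It assumes images are plain lists of `(r, g, b)` tuples in 0..255; the 4-bins-per-channel quantization, the function names, and the toy pixel data are illustrative choices, not taken from the talk.

```python
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each assumed to be in 0..255


def color_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Quantize each RGB channel into equal-width bins; return a normalized histogram."""
    step = 256 // bins_per_channel
    counts: Counter = Counter()
    for r, g, b in pixels:
        # Map the pixel to a single bin index in [0, bins_per_channel ** 3)
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        counts[idx] += 1
    total = len(pixels)
    return [counts[i] / total for i in range(bins_per_channel ** 3)]


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Sum of bin-wise minima: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical toy "images": two mostly red images and one blue image
red_a = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
red_b = [(240, 20, 5)] * 80 + [(10, 10, 250)] * 20
blue = [(5, 5, 240)] * 100

h_a, h_b, h_c = (color_histogram(img) for img in (red_a, red_b, blue))
print(round(histogram_intersection(h_a, h_b), 2))  # 0.9
print(round(histogram_intersection(h_a, h_c), 2))  # 0.1
```

Such global histograms are cheap and fairly robust to small shifts, which partly explains their popularity, but they discard spatial layout entirely.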


◦  Similarity and learning
▪  “We make a pledge for the importance of human-based similarity rather than general similarity. Also, the connection between image semantics, image data, and query context will have to be made clearer in the future.”
▪  “[…] in order to bring semantics to the user, learning is inevitable.”

▪  Similarity is a tough problem to crack and model.

▪  See it for yourself…

▪  Are these two images similar?

▪  Is the second or the third image more similar to the first?

▪  Which image fits better with the first two: the third or the fourth?
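Part of why such questions are hard is that the answer depends on which distance measure is assumed. A minimal sketch with hypothetical 3-component feature vectors (invented numbers, not the images from the slides): under the L1 (city-block) distance the "second" candidate is closer to the query, while under the Euclidean distance the "third" one is.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def manhattan(u: Vector, v: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest(query: Vector, candidates: Dict[str, Vector],
            dist: Callable[[Vector, Vector], float]) -> str:
    """Return the name of the candidate with the smallest distance to the query."""
    return min(candidates, key=lambda name: dist(query, candidates[name]))


# Hypothetical feature vectors (e.g., three bins of some descriptor)
query = [0.0, 0.0, 0.0]
candidates = {"second": [3.0, 0.0, 0.0], "third": [2.0, 2.0, 0.0]}

print(closest(query, candidates, manhattan))  # second  (3.0 vs 4.0)
print(closest(query, candidates, euclidean))  # third   (3.0 vs ~2.83)
```

Neither answer is wrong; each metric simply encodes a different assumption about what "similar" means, and neither is guaranteed to match human judgment.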

▪  Is learning really inevitable?

▪  Maybe, maybe not, but it sure comes in handy in some specific cases…
◦  SVM, anyone?
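To show where an SVM can fit in, here is a minimal linear SVM trained with Pegasos-style stochastic sub-gradient descent on the regularized hinge loss, separating hypothetical "relevant" (+1) from "non-relevant" (-1) images described by 2-D features. The data, `lam`, `epochs`, and function names are assumptions for illustration; a real system would use an established SVM library and usually a kernel.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(data: List[Tuple[Sequence[float], int]],
                     lam: float = 0.1, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Pegasos-style SGD on the L2-regularized hinge loss; labels must be +1 or -1.

    A constant 1.0 feature is appended so the last weight acts as a bias
    (this simple version regularizes the bias too).
    """
    rng = random.Random(seed)
    samples = [(list(x) + [1.0], y) for x, y in data]
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical feedback: features of images a user marked relevant (+1) or not (-1)
feedback = [
    ([0.90, 0.80], 1), ([0.80, 0.90], 1), ([0.85, 0.95], 1),
    ([0.10, 0.20], -1), ([0.20, 0.10], -1), ([0.15, 0.05], -1),
]
w = train_linear_svm(feedback)
print([predict(w, x) for x in ([0.95, 0.90], [0.05, 0.10])])
```

In an image search setting, the learned decision function can then be used to re-rank unseen images by their signed distance to the boundary.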


◦  Interaction
▪  Better visualization options, more control to the user, ability to provide feedback […]

▪  Significant progress on visualization interfaces and devices.

▪  Relevance Feedback: still a very tricky tradeoff (effort vs. perceived benefit), but more popular than ever (rating, thumbs up/down, etc.)
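Much relevance feedback work in CBIR borrows the query-point-movement idea of Rocchio's algorithm from text retrieval. A minimal sketch, assuming feature vectors are plain lists and using the commonly cited weights alpha = 1.0, beta = 0.75, gamma = 0.15; the thumbs-up/down data are hypothetical, and clipping negative weights to zero is an added design choice for histogram-like features.

```python
from typing import List, Sequence


def centroid(vectors: List[Sequence[float]], dim: int) -> List[float]:
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(query: Sequence[float], relevant: List[Sequence[float]],
            non_relevant: List[Sequence[float]],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> List[float]:
    """Move the query toward relevant examples and away from non-relevant ones."""
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(non_relevant, dim)
    new_q = [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel_c, non_c)]
    # Negative weights rarely make sense for histogram features; clip at zero
    return [max(0.0, x) for x in new_q]


# Hypothetical feedback: "thumbs up" on two results, "thumbs down" on one
q = [0.5, 0.5, 0.0]
liked = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]]
disliked = [[0.0, 0.2, 0.8]]
print([round(x, 3) for x in rocchio(q, liked, disliked)])  # [1.1, 0.62, 0.0]
```

The modified query is then resubmitted, so each round of ratings costs the user little effort, which matters given the effort vs. benefit tradeoff noted above.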


◦  Need for databases
▪  “The connection between CBIR and database research is likely to increase in the future. […] problems like the definition of suitable query languages, efficient search in high dimensional feature space, search in the presence of changing similarity measures are largely unsolved […]”

▪  Very little progress
◦  Image search and retrieval has benefited much more from document information retrieval than from database research.
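One concrete example of that borrowing is the "bag of visual words" approach popularized by Sivic and Zisserman's Video Google work: local descriptors are quantized into "visual words", and images are indexed and ranked with the inverted-file and tf-idf machinery of text retrieval. A minimal sketch, assuming quantization has already turned each image into a list of integer word IDs (the IDs and file names are hypothetical); real systems normalize the vectors, which this simplified dot product skips.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Index = Dict[int, Dict[str, int]]


def build_index(images: Dict[str, List[int]]) -> Tuple[Index, Dict[int, float]]:
    """Inverted file: visual word ID -> {image name: term frequency}, plus idf weights."""
    index: Index = defaultdict(dict)
    for name, words in images.items():
        for word, tf in Counter(words).items():
            index[word][name] = tf
    n = len(images)
    idf = {word: math.log(n / len(postings)) for word, postings in index.items()}
    return index, idf


def search(query_words: List[int], index: Index,
           idf: Dict[int, float]) -> List[Tuple[str, float]]:
    """Rank images by an unnormalized tf-idf dot product, touching only the query's posting lists."""
    scores: Dict[str, float] = defaultdict(float)
    for word, q_tf in Counter(query_words).items():
        for name, tf in index.get(word, {}).items():
            scores[name] += (q_tf * idf[word]) * (tf * idf[word])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical images already quantized into visual-word IDs
images = {
    "beach.jpg": [1, 1, 2, 3],
    "city.jpg": [4, 5, 5, 6],
    "coast.jpg": [1, 2, 2, 7],
}
index, idf = build_index(images)
print([name for name, _ in search([1, 1, 2], index, idf)])  # ['beach.jpg', 'coast.jpg']
```

Because only the posting lists of the query's words are visited, images sharing no visual words with the query ("city.jpg" here) are never scored at all.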


◦  The problem of evaluation
▪  CBIR could use a reference standard against which new algorithms could be evaluated (similar to TREC in the field of text retrieval).
▪  “A comprehensive and publicly available collection of images, sorted by class and retrieval purposes, together with a protocol to standardize experimental practices, will be instrumental in the next phase of CBIR.”

▪  Significant progress on benchmarks, standardized datasets, etc.
◦  ImageCLEF
◦  Pascal VOC Challenge
◦  MSRA dataset
◦  Simplicity dataset
◦  UCID dataset and ground truth (GT)
◦  Accio / SIVAL dataset and GT
◦  Caltech 101, Caltech 256
◦  LabelMe
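Results on benchmarks like these are often reported with ranked-retrieval measures inherited from TREC, such as (mean) average precision. A minimal sketch of non-interpolated average precision, assuming each query yields a ranked list of image IDs plus a ground-truth set of relevant IDs (both hypothetical here); PASCAL VOC used its own interpolated variant, which this sketch does not reproduce.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which relevant items appear."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant items that were never retrieved contribute zero precision
    return precision_sum / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    return sum(average_precision(r, g) for r, g in runs) / len(runs)


ranked = ["img3", "img7", "img1", "img9", "img4"]
relevant = {"img3", "img1", "img5"}
print(round(average_precision(ranked, relevant), 4))  # 0.5556, i.e. (1/1 + 2/3) / 3
```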


◦  Semantic gap and other sources
▪  “A critical point in the advancement of CBIR is the semantic gap, where the meaning of an image is rarely self-evident. […] One way to resolve the semantic gap comes from sources outside the image by integrating other sources of information about the image in the query.”

▪  The semantic gap problem has not been solved (and maybe will never be…)

▪  What are the alternatives?
1.  Treat visual similarity and semantic relatedness differently
▪  Examples: Alipr, Google similarity search, etc.
2.  Improve both (text-based and visual) search methods independently
3.  Trust the user
▪  CFIR, collaborative filtering, crowdsourcing, games.

▪  I postulate that image search and retrieval is not a problem (but, instead, a collection of related problems that look like one)

▪  There are many potential opportunities for good solutions to specific problems

▪  One promising avenue: think about image retrieval as added value (e.g., like.com, SPE, etc.)

▪  Google Similarity Search (VisualRank) [Jing & Baluja, 2008]
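VisualRank applies the PageRank idea to a graph whose edge weights are visual similarities between candidate images, so images that resemble many other well-ranked images rise to the top. A minimal sketch of that idea via power iteration, assuming a small symmetric similarity matrix with made-up values (the original work derives similarities from shared local features) and a damping factor of 0.85, the usual PageRank choice, adopted here as an assumption.

```python
from typing import List


def visual_rank(sim: List[List[float]], damping: float = 0.85,
                iterations: int = 100) -> List[float]:
    """Power iteration for a PageRank-style score over an image-similarity graph."""
    n = len(sim)
    # Column-normalize so each image spreads its "vote" over the images it resembles
    col_sums = [sum(sim[i][j] for i in range(n)) for j in range(n)]
    s = [[sim[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [damping * sum(s[i][j] * rank[j] for j in range(n)) + (1 - damping) / n
                for i in range(n)]
    return rank


# Hypothetical similarities among 4 results for one text query:
# images 0-2 look alike, image 3 is an off-topic outlier.
sim = [
    [0.0, 0.8, 0.7, 0.1],
    [0.8, 0.0, 0.9, 0.0],
    [0.7, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.1, 0.0],
]
scores = visual_rank(sim)
print([round(x, 3) for x in scores])
```

With a column-stochastic matrix and a uniform teleport term, the scores stay a probability distribution, and the visually isolated image ends up with the lowest score.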

▪  Google Goggles (mobile visual search)

▪  Google Goggles understands narrow-domain search and retrieval

▪  Several other apps for iPhone, iPad, and Android (e.g., kooaba and Fetch!)

▪  Web 2.0 has brought about:
◦  New data sources
◦  New usage patterns
◦  New understanding about the users, their needs, habits, and preferences
◦  New opportunities
◦  Lots of metadata!

◦  A chance to experience a true paradigm shift
▪  Before: image annotation is tedious, labor-intensive, and expensive
▪  After: image annotation is fun!

▪  Games!
◦  Google Image Labeler
◦  Games with a purpose (GWAP):
▪  The ESP Game
▪  Squigl
▪  Matchin
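The core mechanism of the ESP Game is easy to state: two players who cannot communicate see the same image, a word becomes a label when both of them type it, and previously collected "taboo" words are not accepted. A minimal sketch of that agreement rule for a single round, assuming each player's guesses arrive as an ordered list of strings; the function name and data are hypothetical, and the real game also handles timing, scoring, and cheating.

```python
from typing import List, Optional, Set


def esp_round(guesses_a: List[str], guesses_b: List[str],
              taboo: Set[str]) -> Optional[str]:
    """Return the first non-taboo word typed by both players, or None if they never agree."""
    seen_a: Set[str] = set()
    seen_b: Set[str] = set()
    # Interleave the two guess streams to approximate real-time play
    for i in range(max(len(guesses_a), len(guesses_b))):
        for guesses, mine, other in ((guesses_a, seen_a, seen_b),
                                     (guesses_b, seen_b, seen_a)):
            if i < len(guesses):
                word = guesses[i].strip().lower()
                if word in taboo:
                    continue  # taboo words are not accepted as guesses
                if word in other:
                    return word  # agreement: this word becomes a label
                mine.add(word)
    return None


taboo = {"dog"}
print(esp_round(["dog", "puppy", "grass"], ["animal", "grass", "puppy"], taboo))  # grass
```

The taboo list is what pushes players beyond the most obvious label, so repeated rounds on the same image gradually collect a richer set of tags.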

▪  New devices and services…

◦  Flickr (b. 2004)
◦  YouTube (b. 2005)
◦  Flip video cameras (b. 2006)
◦  iPhone (b. 2007)
◦  iPad (b. 2010)

▪  New opportunities for narrowing the semantic gap
◦  From bottom up: (semi-)automatic image annotation
◦  From top down: using (content / context) ontologies
◦  Combining top-down and bottom-up
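A simple bottom-up baseline for (semi-)automatic annotation is nearest-neighbor label transfer: find the visually closest already-annotated images and propagate their most frequent tags. A minimal sketch, assuming precomputed feature vectors, Euclidean distance, and hypothetical `k` and `n_tags` parameters; it is one baseline among many, not a method from the talk.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def annotate(query: Sequence[float],
             training: List[Tuple[Sequence[float], List[str]]],
             k: int = 3, n_tags: int = 2) -> List[str]:
    """Suggest tags for `query` by voting over the tags of its k nearest neighbors."""
    neighbors = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(tag for _, tags in neighbors for tag in tags)
    return [tag for tag, _ in votes.most_common(n_tags)]


# Hypothetical annotated collection: (feature vector, tags)
training = [
    ([0.90, 0.10], ["beach", "sea", "sky"]),
    ([0.85, 0.20], ["sea", "boat"]),
    ([0.80, 0.15], ["beach", "sea"]),
    ([0.10, 0.90], ["forest", "tree"]),
    ([0.20, 0.80], ["tree", "sky"]),
]
print(annotate([0.88, 0.12], training))  # ['sea', 'beach']
```

In a semi-automatic setting, such suggestions would be shown to the user for confirmation rather than stored directly.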

▪  New fields of research, including:
◦  Tag recommendation systems
◦  User intentions in image search
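A common starting point for tag recommendation is tag co-occurrence: given the tags a user has already entered, suggest the tags that most often appear alongside them in an existing collection. A minimal sketch, assuming a small hypothetical collection of tagged photos and raw co-occurrence counts; published systems typically normalize these counts and aggregate them more carefully.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Set


def cooccurrence(collection: List[Set[str]]) -> Dict[str, Counter]:
    """For every tag, count how often each other tag appears on the same photo."""
    co: Dict[str, Counter] = {}
    for tags in collection:
        for a, b in permutations(tags, 2):
            co.setdefault(a, Counter())[b] += 1
    return co


def recommend(user_tags: Set[str], co: Dict[str, Counter], n: int = 3) -> List[str]:
    """Sum co-occurrence counts over the user's tags and return the top new tags."""
    scores: Counter = Counter()
    for tag in user_tags:
        scores.update(co.get(tag, Counter()))
    for tag in user_tags:
        scores.pop(tag, None)  # don't recommend tags the user already entered
    return [tag for tag, _ in scores.most_common(n)]


# Hypothetical tagged photos
collection = [
    {"barcelona", "sagradafamilia", "gaudi"},
    {"barcelona", "beach", "sea"},
    {"barcelona", "gaudi", "parkguell"},
    {"gaudi", "architecture"},
]
print(recommend({"barcelona"}, cooccurrence(collection), n=1))  # ['gaudi']
```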

▪  Many opportunities await…

–  I believe (but cannot prove…) that successful Image Search & Retrieval solutions will:
•  combine content-based image retrieval (CBIR) with metadata (high-level semantic-based image retrieval)
•  only be truly successful in narrow domains
•  include the user in the loop
–  Relevance Feedback (RF)
–  Collaborative efforts (tagging, rating, annotating)
•  provide friendly, intuitive interfaces
•  incorporate results and insights from cognitive science, particularly human visual attention, perception, and memory
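The first prediction, combining CBIR with metadata, is often realized as late fusion: each modality scores images separately, and the normalized scores are blended. A minimal sketch, assuming min-max normalization and a single hypothetical weight `w_visual`; the score dictionaries are invented for illustration, and images missing from one modality simply get 0 there.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}


def late_fusion(visual: Dict[str, float], text: Dict[str, float],
                w_visual: float = 0.5) -> List[str]:
    """Rank images by a weighted sum of normalized visual and text/metadata scores."""
    v, t = min_max(visual), min_max(text)
    ids = set(v) | set(t)
    fused = {i: w_visual * v.get(i, 0.0) + (1 - w_visual) * t.get(i, 0.0) for i in ids}
    return sorted(fused, key=lambda i: fused[i], reverse=True)


visual_scores = {"a.jpg": 0.9, "b.jpg": 0.4, "c.jpg": 0.1}  # e.g., color/texture similarity
text_scores = {"b.jpg": 12.0, "c.jpg": 3.0, "d.jpg": 8.0}   # e.g., tag/metadata match
print(late_fusion(visual_scores, text_scores))  # ['b.jpg', 'a.jpg', 'd.jpg', 'c.jpg']
```

The weight is the obvious knob: in a narrow domain with reliable visual features it can lean toward `w_visual`, while for broad semantic queries the metadata side usually deserves more weight.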

Questions?

omarques@fau.edu
