Visual Information Retrieval: Advances, Challenges and Opportunities
1. Visual Information Retrieval: Advances, Challenges and Opportunities
Oge Marques, PhD
Professor
College of Engineering and Computer Science
Florida Atlantic University – Boca Raton, FL (USA)
2. The Distinguished Speakers Program is made possible by
For additional information, please visit http://dsp.acm.org/
3. About ACM
ACM, the Association for Computing Machinery, is the world’s largest
educational and scientific computing society, uniting educators, researchers and
professionals to inspire dialogue, share resources and address the field’s
challenges.
ACM strengthens the computing profession’s collective voice through strong
leadership, promotion of the highest standards, and recognition of technical
excellence.
ACM supports the professional growth of its members by providing
opportunities for life-long learning, career development, and professional
networking.
With over 100,000 members from over 100 countries, ACM works to advance
computing as a science and a profession. www.acm.org
4. Image and video everywhere!
• 1.4–1.7 billion people have a smartphone with a camera
• 350 million photos uploaded to Facebook every day
• Instagram (launched Oct. 2010) has 300 million users
• Snapchat, Instagram, Facebook and WhatsApp users (combined) share 1.8 billion photos each day.
6. Example: your digital shoebox
• Does this screenshot look familiar to you?
7. Example: the Web
• I wanted a picture (e.g., a photo or clipart) to illustrate my presentation, report or handout and can’t find it!
Source: Flickr (https://www.flickr.com/photos/83633410@N07/7658225516)
8. Working definition
Visual Information Retrieval (VIR) techniques aim at solving the problem of
finding relevant (documents containing) images and videos
based on an incomplete input/query, which can be visual, text, or both.
9. Possible solutions
1. Text-based (also known as tag-based) search
• Examples:
– Google image search (https://images.google.com/)
– Bing image search (https://www.bing.com/?scope=images)
– Yahoo image search (https://images.search.yahoo.com/)
– Creative Commons portal (http://search.creativecommons.org/)
– Many others (Flickr, Shutterstock, PhotoBucket, etc.)
10. Example: text-based web search
Google image search results for “thunderbird”
Source: Google Image Search (http://images.google.com/)
11. Example: text-based web search
Google image search results for “blue thunderbird”
Source: Google Image Search (http://images.google.com/)
12. Example: text-based web search
Bing image search results for “thunderbird”
Source: Bing Image Search (http://bing.com/)
13. Example: text-based web search
Bing image search results for “blue thunderbird”
Source: Bing Image Search (http://bing.com/)
14. Example: text-based web search
Yahoo image search results for “thunderbird”
Source: Yahoo Image Search (http://images.search.yahoo.com)
15. Example: text-based web search
Yahoo image search results for “blue thunderbird”
Source: Yahoo Image Search (http://images.search.yahoo.com)
16. Possible solutions
2. Content-based search (also known as
reverse image search or search by visual similarity)
• Examples:
– Google image search (https://images.google.com/)
– TinEye (https://www.tineye.com/)
– ImageRaider (https://www.imageraider.com)
– Others
17. Example: content-based web search
Source: Google Image Search (http://images.google.com/)
Query image
Google image search results
18. Example: content-based web search
TinEye search results
Source: TinEye (http://www.tineye.com/)
19. Example: content-based web search
ImageRaider search results
Source: ImageRaider (http://www.imageraider.com)
20. Possible solutions
3. Mixed search (text + visual aspects)
• Examples:
– Google image search (https://images.google.com/)
– Bing image search (https://www.bing.com/?scope=images)
21. Example: mixed (text + image) search
Source: Google Image Search (http://images.google.com/)
Query image
Google image search results
Query text: Ferrari
26. Basic framework
Source: Lei Zhang and Yong Rui. 2013. Image search—from thousands to billions in 20 years.
ACM Trans. Multimedia Comput. Commun. Appl. 9, 1s, Article 36.
27. Build your own VIR solution
• LIRE: Lucene Image Retrieval
– Java library that provides a simple way to retrieve images and photos based on their color and texture characteristics.
– LIRE creates a Lucene index of image features for content-based image retrieval (CBIR).
– Provides easy-to-use methods for searching the index and browsing results.
– Open source (available under the GNU GPL license).
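LIRE itself is a Java library built on Lucene, but the underlying CBIR idea (extract one global feature per image at indexing time, then rank all indexed images by feature distance at query time) is easy to illustrate. The sketch below is a toy Python illustration, not LIRE’s API: the grayscale-histogram feature, the function names, and the linear-scan Euclidean ranking are all my own simplifications.

```python
from math import sqrt

def histogram(pixels, bins=8):
    """Toy global feature: a normalized grayscale histogram (pixel values 0..255)."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def euclidean(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(images):
    """Indexing step: images is a dict name -> pixel list; store one feature per image."""
    return {name: histogram(px) for name, px in images.items()}

def search(index, query_pixels, k=3):
    """Query step: rank all indexed images by feature distance to the query."""
    q = histogram(query_pixels)
    return sorted(index, key=lambda name: euclidean(q, index[name]))[:k]
```

A real system would swap the toy histogram for descriptors such as CEDD or SIFT-based features, and replace the linear scan with an inverted (Lucene-style) index.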
28. LIRE
• Web demo: http://demo-itec.uni-klu.ac.at/liredemo/
– 1M indexed images (the MIRFLICKR-1M data set)
http://press.liacs.nl/mirflickr/
– 6 descriptors: CL, CEDD, EH, JCD, PH, SC
– Several other features (check back often)
• For additional information:
http://www.lire-project.net/
30. Four selected challenges
• Some challenges faced by VIR designers and
researchers include:
1. the need to capture and measure similarity among
images;
2. the semantic gap (along with other gaps);
3. the need to take users’ intentions into account
when designing VIR systems; and
4. the inherent difficulty in making VIR solutions work
effectively in broad domains.
31. The elusive notion of similarity
• Are these two images similar?
visually similar
Source: Lux, Mathias, and Oge Marques. Visual information retrieval using Java and LIRE. Morgan & Claypool Synthesis
Lectures on Information Concepts, Retrieval, and Services 5.1 (2013): 1–112.
32. The elusive notion of similarity
• Are these two images similar?
semantically related
Source: Lux, Mathias, and Oge Marques. Visual information retrieval using Java and LIRE. Morgan & Claypool Synthesis
Lectures on Information Concepts, Retrieval, and Services 5.1 (2013): 1–112.
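Even before semantics enters the picture, “similarity” already depends on the distance measure chosen over the feature vectors. A minimal sketch (toy 2-D feature vectors of my own choosing, not real image features) where Euclidean distance and cosine similarity disagree about which candidate is closest to the query:

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean (L2) distance: sensitive to vector magnitude."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_sim(a, b):
    """Cosine similarity: compares direction only, ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

query, cand_a, cand_b = [1.0, 0.0], [2.0, 0.0], [0.9, 0.5]

# Euclidean prefers cand_b (distance ~0.51 vs. 1.0),
# while cosine prefers cand_a (similarity 1.0 vs. ~0.87).
```

The point is not which metric is “right”: the ranking itself is a design decision, made before we even ask whether visually similar images are semantically related.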
33. The semantic gap
• The semantic gap is the lack of coincidence
between the information that one can extract
from the visual data and the interpretation that
the same data have for a user in a given situation.
• “The pivotal point in content-based retrieval is that the user
seeks semantic similarity, but the database can only provide
similarity by data processing. This is what we called the
semantic gap.” [Smeulders et al., 2000]
34. Other gaps
Ontology of gaps [1]:
14 types of gaps, in 4 categories:
• Content-related
• Feature-related
• Performance-related
• Usability-related
________
[1] Deserno, T. M., Antani, S., Long, R. (2009). Ontology of Gaps in Content-Based
Image Retrieval. Journal of Digital Imaging: The Official Journal of the Society for
Computer Applications in Radiology, 22(2), 202–215. doi:10.1007/s10278-007-9092-x
35. The utility gap
• Defined as the gap between what people expect
from MIR systems (results that would help them
in their further actions) and what they are actually
offered as the system output. [2]
• Major steps towards bridging the utility gap [2]:
– to revisit the notion of relevance in order to address the full
information need of the user,
– to look beyond the relevance when designing and assessing
MIR solutions,
– to steer the retrieval process towards the result that is
maximally helpful for the user.
________
[2] Hanjalic, A. 2013. Multimedia retrieval that matters. ACM Trans. Multimedia Comput.
Commun. Appl. 9, 1s, Article 44 (October 2013)
36. Users’ needs and intentions
• The intention gap
– Users and developers have quite different views
– Cultural and contextual information should be taken
into account
– User intentions are hard to infer
• Privacy issues
• Users themselves don’t always know what they want
• Who misses the MS Office paper clip?
37. Broad domains
Source: Smeulders et al., “Content-based image retrieval at the end of
the early years”, IEEE Transactions on PAMI, Vol. 22, Issue 12, Dec. 2000
40. Medical image retrieval
• What has been achieved
– Moderately successful results in 2D (e.g., chest x-rays)
and 3D (e.g., brain MR slices) image retrieval
– Integration of visual and text-based search and
retrieval techniques
– Experiments with multiple query images (from
different modalities)
– Well-established thesaurus (MeSH) and ontologies
(SNOMED CT)
– Ongoing yearly challenge (ImageCLEF) addressing
many open topics, e.g., image classification
41. Medical image retrieval
• Examples of successful solutions to specific
problems
• IRMA (Image Retrieval in Medical Applications) -
Aachen University (Germany)
(http://ganymed.imib.rwth-aachen.de/irma/index_en.php)
• MedGIFT (GNU Image Finding Tool) - Geneva
University (Switzerland)
(http://medgift.hevs.ch/silverstripe/)
• WebMIRS - NIH / NLM (USA)
(https://ceb.nlm.nih.gov/proj/webmirs/index.php)
42. Medical image retrieval
• Ongoing challenges
– Development of effective user interfaces and
visualization techniques
– Feature selection algorithms that take multiple
modalities into account
– Direct use of multidimensional images
– More publicly available standardized datasets for
evaluation
• Example: VISCERAL
(http://www.visceral.eu/benchmarks/retrieval2-benchmark/)
– Clinical adoption
43. Medical image retrieval
• Suggested reading:
– A. Kumar et al. (2013). Content-Based Medical Image
Retrieval: A Survey of Applications to Multidimensional
and Multimodality Data. Journal of Digital Imaging, 26(6),
1025–1039.
– Hwang, K. H., Lee, H., Choi, D. (2012). Medical Image
Retrieval: Past and Present. Healthcare Informatics
Research, 18(1), 3–9.
44. Mobile visual search (MVS)
Source: Girod et al., IEEE Signal Processing Magazine, July 2011
Excerpt (page header: IEEE Signal Processing Magazine, p. 62, July 2011):
ROBUST MOBILE IMAGE RECOGNITION
Today, the most successful algorithms for content-based image retrieval use an approach
that is referred to as bag of features (BoF) or bag of words (BoW). The BoW idea is
borrowed from text retrieval. To find a particular text document, such as a Web page, it
is sufficient to use a few well-chosen words. In the database, the document itself can
likewise be represented by a bag of words. Finally, a geometric verification (GV) step
compares the spatial pattern of feature matches between the query image and each
candidate database image.
[FIG1] A snapshot of an outdoor mobile visual search system being used. The system
augments the viewfinder with information about the objects it recognizes in the image
taken with a camera phone.
[FIG2] A pipeline for image retrieval: features are extracted from the query image,
matched against features of database images, and then geometrically verified.
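The bag-of-words pipeline sketched in the excerpt can be illustrated in a few lines: each local descriptor is quantized to its nearest “visual word” in a pre-trained codebook, and the image becomes a histogram of word counts that can be matched like a text document. The code below is a toy Python illustration under that assumption; real systems use SIFT-like descriptors, vocabularies of many thousands of words, an inverted index, and the geometric-verification step the figure captions mention.

```python
def quantize(descriptor, codebook):
    """Assign a local descriptor to its nearest visual word (index into the codebook)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(descriptor, codebook[i]))

def bow_histogram(descriptors, codebook):
    """Represent an image as a normalized histogram of visual-word counts."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1
    return [h / len(descriptors) for h in hist]
```

Once images are histograms, any text-retrieval machinery (e.g., cosine scoring over an inverted file) applies unchanged, which is exactly why the BoW analogy is so productive for mobile visual search.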
45. Mobile visual search (MVS)
• What has been achieved
– Substantial research on compact image descriptors
– MPEG-7 Part 13: Compact descriptors for visual
search (CDVS) (2013)
– Standardized datasets, for example:
• Stanford Mobile Visual Search Data Set
(http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/)
– Many apps and APIs, such as:
• CamFind (http://camfindapp.com/)
• Moodstocks (https://moodstocks.com/)
46. Mobile visual search (MVS)
• Ongoing challenges
– Ensure low latency (and interactive queries) under
constraints such as:
• Network bandwidth
• Computational power
• Battery consumption
– Achieve robust visual recognition in spite of low-resolution
cameras, varying lighting conditions, etc.
– Handle broad and narrow domains
– Explore the full potential of using MVS to bridge the
virtual (digital) world and real world
47. Mobile visual search (MVS)
• Suggested reading:
– Girod, B.; Chandrasekhar, V.; Chen, D. M.; Ngai-Man
Cheung; Grzeszczuk, R.; Reznik, Y.; Takacs, G.; Tsai, S. S.;
Vedantham, R., Mobile Visual Search, IEEE Signal
Processing Magazine, vol. 28, no. 4, pp. 61–76, July 2011
– Ling-Yu Duan; Jie Lin; Jie Chen; Tiejun Huang; Wen Gao,
Compact Descriptors for Visual Search, IEEE MultiMedia,
vol. 21, no. 3, pp. 30–40, July–Sept. 2014
49. Advice for [young] researchers
• In this last part, I’ve compiled bits of advice that I
believe might help researchers who are entering
the field.
• They are based on almost 20 years of work in VIR
and related topics.
• It is my sincere hope that they will be useful and
lead to successful research experiences.
50. Advice for [young] researchers
• Advice # 1
– Pick a specific problem and domain
• Example (from our ongoing work)
– Veterinary radiology image retrieval
51. Advice for [young] researchers
• Advice # 2
– Find (or build) the appropriate dataset
• Examples:
– ImageNet (http://www.image-net.org/)
– MIRFLICKR (http://press.liacs.nl/mirflickr/)
– INRIA Holidays dataset (http://lear.inrialpes.fr/~jegou/data.php )
– UCID: Uncompressed Color Image Database
(http://homepages.lboro.ac.uk/~cogs/datasets/ucid/ucid.html)
– Many others (associated with challenges – see next
slide)
52. Advice for [young] researchers
• Advice # 3
– Participate in challenges related to the problem you are
working on
• Examples:
– ImageCLEF (http://www.imageclef.org/2015)
– LifeCLEF (http://www.imageclef.org/lifeclef/2015)
– ImageNet Large Scale Visual Recognition Challenge
(ILSVRC2015) (http://image-net.org/challenges/LSVRC/2015/index)
– MSR-Bing Image Retrieval Challenge (IRC)
(http://research.microsoft.com/en-us/projects/irc/)
– MediaEval Benchmarking Initiative for Multimedia
Evaluation (http://www.multimediaeval.org/mediaeval2015)
53. Advice for [young] researchers
• Advice # 4
– Be mindful of related areas where you can make a
contribution
• Example (from our own work):
– While working on the broader topic of Medical Case
Retrieval (MCR), Mario Taschwer and I participated in
the ImageCLEF 2015 compound figure separation task
(http://www.imageclef.org/2015/medical).
– Our current results are the best reported in the
literature.
54. Advice for [young] researchers
• Advice # 5
– Beware of the trap of building a solution in search of a
problem
• Example:
– For many years, a tough selling point for CBIR was the
reliance on a query-by-example (QBE) paradigm.
• What if I don't have a good example image to begin with?
– With the popularization of mobile visual search (MVS)
solutions, a use case for this model emerged naturally,
since the example is right in front of the user!
55. Advice for [young] researchers
• Advice # 6
– Think about creative ways to leverage the power of
human computation
• Examples:
– Crowdsourcing campaigns
– Mining social media visual data and associated
metadata (text, URLs, hashtags, etc.)
– Games with a purpose (GWAPs)
56. Advice for [young] researchers
• Advice # 7
– Put yourself in the shoes of the user
• More specifically:
– Take into account:
• the context of the search
• the usefulness of results to the user
– Understand (and model) the user’s intentions, preferences
and needs
– Create better interfaces
– Provide a better user experience
57. Final remarks
• “Visual information retrieval” is an active and vibrant research
area, with many open research challenges and market
opportunities.
• There is a great need for good solutions to specific problems.
• Pick one of the many open problems, challenges, and
opportunities and build a successful solution!
Oge
Marques
Contact information:
omarques@fau.edu