Visual Information Retrieval: Advances, Challenges and Opportunities
1. Visual Information Retrieval: Advances, Challenges and Opportunities
Oge Marques, PhD
Professor
College of Engineering and Computer Science
Florida Atlantic University – Boca Raton, FL (USA)
2. The Distinguished Speakers Program is made possible by
For additional information, please visit http://dsp.acm.org/
3. About ACM
ACM, the Association for Computing Machinery, is the world’s largest
educational and scientific computing society, uniting educators, researchers and
professionals to inspire dialogue, share resources and address the field’s
challenges.
ACM strengthens the computing profession’s collective voice through strong
leadership, promotion of the highest standards, and recognition of technical
excellence.
ACM supports the professional growth of its members by providing
opportunities for life-long learning, career development, and professional
networking.
With over 100,000 members from over 100 countries, ACM works to advance
computing as a science and a profession. www.acm.org
4. Image and video everywhere!
• 1.4–1.7 billion people have a smartphone with a camera
• 350 million photos uploaded to Facebook every day
• Instagram (launched Oct. 2010) has 300 million users
• Snapchat, Instagram, Facebook and WhatsApp users (combined) share 1.8 billion photos each day.
6. Example: your digital shoebox
• Does this screenshot look familiar to you?
7. Example: the Web
• I wanted a picture (e.g., a photo or clipart) to illustrate my presentation, report or handout and can’t find it!
Source: Flickr (https://www.flickr.com/photos/83633410@N07/7658225516)
8. Working definition
Visual Information Retrieval (VIR) techniques aim at solving the problem of
finding relevant (documents containing) images and videos
based on an incomplete input/query, which can be visual, text, or both.
9. Possible solutions
1. Text-based (also known as tag-based) search
• Examples:
– Google image search (https://images.google.com/)
– Bing image search (https://www.bing.com/?scope=images)
– Yahoo image search (https://images.search.yahoo.com/)
– Creative Commons portal (http://search.creativecommons.org/)
– Many others (Flickr, Shutterstock, PhotoBucket, etc.)
10. Example: text-based web search
Google image search results for “thunderbird”
Source: Google Image Search (http://images.google.com/)
11. Example: text-based web search
Google image search results for “blue thunderbird”
Source: Google Image Search (http://images.google.com/)
12. Example: text-based web search
Bing image search results for “thunderbird”
Source: Bing Image Search (http://bing.com/)
13. Example: text-based web search
Bing image search results for “blue thunderbird”
Source: Bing Image Search (http://bing.com/)
14. Example: text-based web search
Yahoo image search results for “thunderbird”
Source: Yahoo Image Search (http://images.search.yahoo.com)
15. Example: text-based web search
Yahoo image search results for “blue thunderbird”
Source: Yahoo Image Search (http://images.search.yahoo.com)
16. Possible solutions
2. Content-based search (also known as
reverse image search or search by visual similarity)
• Examples:
– Google image search (https://images.google.com/)
– TinEye (https://www.tineye.com/)
– ImageRaider (https://www.imageraider.com)
– Others
17. Example: content-based web search
Source: Google Image Search (http://images.google.com/)
Query image
Google image search results
18. Example: content-based web search
TinEye search results
Source: TinEye (http://www.tineye.com/)
19. Example: content-based web search
ImageRaider search results
Source: ImageRaider (http://www.imageraider.com)
20. Possible solutions
3. Mixed search (text + visual aspects)
• Examples:
– Google image search (https://images.google.com/)
– Bing image search (https://www.bing.com/?scope=images)
21. Example: mixed (text + image) search
Source: Google Image Search (http://images.google.com/)
Query image
Google image search results
Query text: Ferrari
26. Basic framework
Source: Lei Zhang and Yong Rui. 2013. Image search—from thousands to billions in 20 years.
ACM Trans. Multimedia Comput. Commun. Appl. 9, 1s, Article 36.
27. Build your own VIR solution
• LIRE: Lucene Image Retrieval
– Java library that provides a simple way to retrieve images and photos based on their color and texture characteristics.
– LIRE creates a Lucene index of image features for content-based image retrieval (CBIR).
– Provides easy-to-use methods for searching the index and browsing results.
– Open source (available under the GNU GPL license).
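LIRE itself is a Java library built on Lucene, but the underlying CBIR idea (extract one global feature per image at indexing time, then rank all indexed images by feature distance at query time) is easy to illustrate. The sketch below is a toy Python illustration, not LIRE’s API: the grayscale-histogram feature, the function names, and the linear-scan Euclidean ranking are all my own simplifications.

```python
from math import sqrt

def histogram(pixels, bins=8):
    """Toy global feature: a normalized grayscale histogram (pixel values 0..255)."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def euclidean(a, b):
    """Euclidean (L2) distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(images):
    """Indexing step: images is a dict name -> pixel list; store one feature per image."""
    return {name: histogram(px) for name, px in images.items()}

def search(index, query_pixels, k=3):
    """Query step: rank all indexed images by feature distance to the query."""
    q = histogram(query_pixels)
    return sorted(index, key=lambda name: euclidean(q, index[name]))[:k]
```

A real system would swap the toy histogram for descriptors such as CEDD or SIFT-based features, and replace the linear scan with an inverted (Lucene-style) index.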
28. LIRE
• Web demo: http://demo-itec.uni-klu.ac.at/liredemo/
– 1M indexed images (the MIRFLICKR-1M data set)
http://press.liacs.nl/mirflickr/
– 6 descriptors: CL, CEDD, EH, JCD, PH, SC
– Several other features (check back often)
• For additional information:
http://www.lire-project.net/
30. Four selected challenges
• Some challenges faced by VIR designers and
researchers include:
1. the need to capture and measure similarity among
images;
2. the semantic gap (along with other gaps);
3. the need to take users’ intentions into account
when designing VIR systems; and
4. the inherent difficulty in making VIR solutions work
effectively in broad domains.
31. The elusive notion of similarity
• Are these two images similar?
visually similar
Source: Lux, Mathias, and Oge Marques. Visual information retrieval using Java and LIRE. Morgan & Claypool Synthesis
Lectures on Information Concepts, Retrieval, and Services 5.1 (2013): 1–112.
32. The elusive notion of similarity
• Are these two images similar?
semantically related
Source: Lux, Mathias, and Oge Marques. Visual information retrieval using Java and LIRE. Morgan & Claypool Synthesis
Lectures on Information Concepts, Retrieval, and Services 5.1 (2013): 1–112.
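Even before semantics enters the picture, “similarity” already depends on the distance measure chosen over the feature vectors. A minimal sketch (toy 2-D feature vectors of my own choosing, not real image features) where Euclidean distance and cosine similarity disagree about which candidate is closest to the query:

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean (L2) distance: sensitive to vector magnitude."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_sim(a, b):
    """Cosine similarity: compares direction only, ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

query, cand_a, cand_b = [1.0, 0.0], [2.0, 0.0], [0.9, 0.5]

# Euclidean prefers cand_b (distance ~0.51 vs. 1.0),
# while cosine prefers cand_a (similarity 1.0 vs. ~0.87).
```

The point is not which metric is “right”: the ranking itself is a design decision, made before we even ask whether visually similar images are semantically related.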
33. The semantic gap
• The semantic gap is the lack of coincidence
between the information that one can extract
from the visual data and the interpretation that
the same data have for a user in a given situation.
• “The pivotal point in content-based retrieval is that the user
seeks semantic similarity, but the database can only provide
similarity by data processing. This is what we called the
semantic gap.” [Smeulders et al., 2000]
34. Other gaps
Ontology of gaps [1]:
14 types of gaps, in 4 categories:
• Content-related
• Feature-related
• Performance-related
• Usability-related
________
[1] Deserno, T. M., Antani, S., Long, R. (2009). Ontology of Gaps in Content-Based
Image Retrieval. Journal of Digital Imaging: The Official Journal of the Society for
Computer Applications in Radiology, 22(2), 202–215. doi:10.1007/s10278-007-9092-x
35. The utility gap
• Defined as the gap between what people expect
from MIR systems (results that would help them
in their further actions) and what they are actually
offered as the system output. [2]
• Major steps towards bridging the utility gap [2]:
– to revisit the notion of relevance in order to address the full
information need of the user,
– to look beyond the relevance when designing and assessing
MIR solutions,
– to steer the retrieval process towards the result that is
maximally helpful for the user.
________
[2] Hanjalic, A. 2013. Multimedia retrieval that matters. ACM Trans. Multimedia Comput.
Commun. Appl. 9, 1s, Article 44 (October 2013)
36. Users’ needs and intentions
• The intention gap
– Users and developers have quite different views
– Cultural and contextual information should be taken
into account
– User intentions are hard to infer
• Privacy issues
• Users themselves don’t always know what they want
• Who misses the MS Office paper clip?
37. Broad domains
Source: Smeulders et al., “Content-based image retrieval at the end of
the early years”, IEEE Transactions on PAMI, Vol. 22, Issue 12, Dec. 2000
40. Medical image retrieval
• What has been achieved
– Moderately successful results in 2D (e.g., chest x-rays)
and 3D (e.g., brain MR slices) image retrieval
– Integration of visual and text-based search and
retrieval techniques
– Experiments with multiple query images (from
different modalities)
– Well-established thesaurus (MeSH) and ontologies
(SNOMED CT)
– Ongoing yearly challenge (ImageCLEF) addressing
many open topics, e.g., image classification
41. Medical image retrieval
• Examples of successful solutions to specific
problems
• IRMA (Image Retrieval in Medical Applications) -
Aachen University (Germany)
(http://ganymed.imib.rwth-aachen.de/irma/index_en.php)
• MedGIFT (GNU Image Finding Tool) - Geneva
University (Switzerland)
(http://medgift.hevs.ch/silverstripe/)
• WebMIRS - NIH / NLM (USA)
(https://ceb.nlm.nih.gov/proj/webmirs/index.php)
42. Medical image retrieval
• Ongoing challenges
– Development of effective user interfaces and
visualization techniques
– Feature selection algorithms that take multiple
modalities into account
– Direct use of multidimensional images
– More publicly available standardized datasets for
evaluation
• Example: VISCERAL
(http://www.visceral.eu/benchmarks/retrieval2-benchmark/)
– Clinical adoption
43. Medical image retrieval
• Suggested reading:
– A. Kumar et al. (2013). Content-Based Medical Image
Retrieval: A Survey of Applications to Multidimensional
and Multimodality Data. Journal of Digital Imaging, 26(6),
1025–1039.
– Hwang, K. H., Lee, H., Choi, D. (2012). Medical Image
Retrieval: Past and Present. Healthcare Informatics
Research, 18(1), 3–9.
44. Mobile visual search (MVS)
Source: Girod et al., IEEE Signal Processing Magazine, July 2011
Excerpt (page header: IEEE Signal Processing Magazine, p. 62, July 2011):
ROBUST MOBILE IMAGE RECOGNITION
Today, the most successful algorithms for content-based image retrieval use an approach
that is referred to as bag of features (BoF) or bag of words (BoW). The BoW idea is
borrowed from text retrieval. To find a particular text document, such as a Web page, it
is sufficient to use a few well-chosen words. In the database, the document itself can
likewise be represented by a bag of words. Finally, a geometric verification (GV) step
compares the spatial pattern of feature matches between the query image and each
candidate database image.
[FIG1] A snapshot of an outdoor mobile visual search system being used. The system
augments the viewfinder with information about the objects it recognizes in the image
taken with a camera phone.
[FIG2] A pipeline for image retrieval: features are extracted from the query image,
matched against features of database images, and then geometrically verified.
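The bag-of-words pipeline sketched in the excerpt can be illustrated in a few lines: each local descriptor is quantized to its nearest “visual word” in a pre-trained codebook, and the image becomes a histogram of word counts that can be matched like a text document. The code below is a toy Python illustration under that assumption; real systems use SIFT-like descriptors, vocabularies of many thousands of words, an inverted index, and the geometric-verification step the figure captions mention.

```python
def quantize(descriptor, codebook):
    """Assign a local descriptor to its nearest visual word (index into the codebook)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(descriptor, codebook[i]))

def bow_histogram(descriptors, codebook):
    """Represent an image as a normalized histogram of visual-word counts."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1
    return [h / len(descriptors) for h in hist]
```

Once images are histograms, any text-retrieval machinery (e.g., cosine scoring over an inverted file) applies unchanged, which is exactly why the BoW analogy is so productive for mobile visual search.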
45. Mobile visual search (MVS)
• What has been achieved
– Substantial research on compact image descriptors
– MPEG-7 Part 13: Compact descriptors for visual
search (CDVS) (2013)
– Standardized datasets, for example:
• Stanford Mobile Visual Search Data Set
(http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/)
– Many apps and APIs, such as:
• CamFind (http://camfindapp.com/)
• Moodstocks (https://moodstocks.com/)
46. Mobile visual search (MVS)
• Ongoing challenges
– Ensure low latency (and interactive queries) under
constraints such as:
• Network bandwidth
• Computational power
• Battery consumption
– Achieve robust visual recognition in spite of low-resolution
cameras, varying lighting conditions, etc.
– Handle broad and narrow domains
– Explore the full potential of using MVS to bridge the
virtual (digital) world and real world
47. Mobile visual search (MVS)
• Suggested reading:
– Girod, B.; Chandrasekhar, V.; Chen, D. M.; Ngai-Man
Cheung; Grzeszczuk, R.; Reznik, Y.; Takacs, G.; Tsai, S. S.;
Vedantham, R., Mobile Visual Search, IEEE Signal
Processing Magazine, vol. 28, no. 4, pp. 61–76, July 2011
– Ling-Yu Duan; Jie Lin; Jie Chen; Tiejun Huang; Wen Gao,
Compact Descriptors for Visual Search, IEEE MultiMedia,
vol. 21, no. 3, pp. 30–40, July–Sept. 2014
49. Advice for [young] researchers
• In this last part, I’ve compiled bits of advice that I
believe might help researchers who are entering
the field.
• They are based on almost 20 years of work in VIR
and related topics.
• It is my sincere hope that they will be useful and
lead to successful research experiences.
50. Advice for [young] researchers
• Advice # 1
– Pick a specific problem and domain
• Example (from our ongoing work)
– Veterinary radiology image retrieval
51. Advice for [young] researchers
• Advice # 2
– Find (or build) the appropriate dataset
• Examples:
– ImageNet (http://www.image-net.org/)
– MIRFLICKR (http://press.liacs.nl/mirflickr/)
– INRIA Holidays dataset (http://lear.inrialpes.fr/~jegou/data.php )
– UCID: Uncompressed Color Image Database
(http://homepages.lboro.ac.uk/~cogs/datasets/ucid/ucid.html)
– Many others (associated with challenges – see next
slide)
52. Advice for [young] researchers
• Advice # 3
– Participate in challenges related to the problem you are
working on
• Examples:
– ImageCLEF (http://www.imageclef.org/2015)
– LifeCLEF (http://www.imageclef.org/lifeclef/2015)
– ImageNet Large Scale Visual Recognition Challenge
(ILSVRC2015) (http://image-net.org/challenges/LSVRC/2015/index)
– MSR-Bing Image Retrieval Challenge (IRC)
(http://research.microsoft.com/en-us/projects/irc/)
– MediaEval Benchmarking Initiative for Multimedia
Evaluation (http://www.multimediaeval.org/mediaeval2015)
53. Advice for [young] researchers
• Advice # 4
– Be mindful of related areas where you can make a
contribution
• Example (from our own work):
– While working on the broader topic of Medical Case
Retrieval (MCR), Mario Taschwer and I participated in
the ImageCLEF 2015 compound figure separation task
(http://www.imageclef.org/2015/medical).
– Our current results are the best reported in the
literature.
54. Advice for [young] researchers
• Advice # 5
– Beware of the trap of building a solution in search of a
problem
• Example:
– For many years, a tough selling point for CBIR was the
reliance on a query-by-example (QBE) paradigm.
• What if I don't have a good example image to begin with?
– With the popularization of mobile visual search (MVS)
solutions, a use case for this model emerged naturally,
since the example is right in front of the user!
55. Advice for [young] researchers
• Advice # 6
– Think about creative ways to leverage the power of
human computation
• Examples:
– Crowdsourcing campaigns
– Mining social media visual data and associated
metadata (text, URLs, hashtags, etc.)
– Games with a purpose (GWAPs)
56. Advice for [young] researchers
• Advice # 7
– Put yourself in the shoes of the user
• More specifically:
– Take into account:
• the context of the search
• the usefulness of results to the user
– Understand (and model) the user’s intentions, preferences
and needs
– Create better interfaces
– Provide a better user experience
57. Final remarks
• “Visual information retrieval” is an active and vibrant research
area, with many open research challenges and market
opportunities.
• There is a great need for good solutions to specific problems.
• Pick one of the many open problems, challenges, and
opportunities and build a successful solution!
Oge
Marques
Contact information:
omarques@fau.edu