1. Machine Learning and Multimedia Information Retrieval*
Integrated Knowledge Solutions
iksinc@yahoo.com
* Based on a talk at ICMLA Conference
2. Outline
• Introduction
• Bridging the Semantic Gap
• Events in Videos
• Use of Tagging in MIR
• Killer Apps of MIR
• Take Home Message
12/12/2010 ICMLA Talk 2
3. Too Much Information
Which is more frustrating?
• Being stuck in traffic on the way to or from work
• Not being able to find information you urgently need
(According to a survey by Xerox)
4. Not a New Problem
Nalanda University was one of the first universities
in the world, founded in the 5th century AD on a site
reported to have been visited by the Buddha during
his lifetime. At its peak, in the 7th century AD,
Nalanda held some 10,000 students when it was
visited by the Chinese scholar Xuanzang.
The Royal Library of Alexandria, in Egypt, seems to have been
the largest and most significant great library of the ancient
world. It functioned as a major center of scholarship from its
construction in the third century B.C. until the Roman
conquest of Egypt in 30 B.C.
7. Some Relevant Numbers
Photobucket has 6.2 billion photos and Flickr
has over 2 billion.
Facebook has over 10 billion photos and over
400 million active users.
8. Phenomenon
• 24 hours of video are
uploaded to YouTube
every minute
• YouTube streams 2
billion videos every
day
9. So how do we get help in finding the desired
multimedia information?
MIR (Multimedia Information Retrieval)
10. So What is MIR?
• Also known as CBIR (Content-based Image Retrieval) and
CBVIR (Content-based Visual Information Retrieval)
• Deals with systems that manage and facilitate content-based
searching of multimedia documents such as images, videos,
audio clips, and slides
11. History of MIR
• Conference on Data Base Techniques for Pictorial Applications,
1979 (Florence, Italy)
• NSF Workshop on Visual Information Management Systems,
1992 (Redwood, CA)
• QBIC (Query By Image Content), 1993 (SPIE’s Conf. on Storage
and Retrieval for Image and Video Databases); also the first ACM
Multimedia Conference
• Shift to semantic similarity from signal similarity, 1999
• Community tagging, photo and video sharing sites, 2002
12. A Typical MIR System
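[Figure: block diagram of a typical MIR system. Online path: Query → Feature Extraction → Indexing & Matching → Retrieved Results, with Relevance Feedback from the results back to the query. Offline path: Media Collection → Feature Extraction → Features → Indexing & Matching.]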
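The two paths of the block diagram can be illustrated with a minimal sketch. It assumes toy "images" given as flat lists of grayscale pixel values, a coarse intensity histogram as the feature, and L1 distance for matching. None of these choices come from the talk, all function names are hypothetical, and the relevance-feedback loop is left out.

```python
from collections import Counter

def extract_features(pixels, bins=4, max_value=256):
    """Normalized intensity histogram of a flat list of grayscale pixels."""
    counts = Counter(min(p * bins // max_value, bins - 1) for p in pixels)
    total = len(pixels)
    return [counts.get(b, 0) / total for b in range(bins)]

def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def build_index(collection):
    """Offline path: media collection -> feature extraction -> stored features."""
    return {name: extract_features(pixels) for name, pixels in collection.items()}

def retrieve(query_pixels, index, top_k=2):
    """Online path: query -> feature extraction -> matching -> ranked results."""
    q = extract_features(query_pixels)
    ranked = sorted(index, key=lambda name: l1_distance(q, index[name]))
    return ranked[:top_k]

# Toy collection (assumed data, for illustration only)
collection = {
    "dark_scene": [10, 20, 30, 40, 15, 25],
    "bright_scene": [220, 230, 240, 250, 235, 245],
    "mixed_scene": [10, 240, 20, 230, 120, 130],
}
index = build_index(collection)
results = retrieve([5, 35, 45, 12, 28, 50], index)
print(results)
```

A histogram feature like this captures only signal-level similarity, so a system built this way would show exactly the limitation the following slides describe.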
13. Semantic Gap
Early systems retrieved
documents that were visually similar
(similar at the signal level) but did not
necessarily show
the same semantic concept.
http://www.searchenginejournal.com/7-similarity-based-image-search-engines/8265/
Smeulders, Worring, Santini, Gupta, and Jain, “Content-Based Image Retrieval at the End of the Early Years,”
IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000
14. Semantic Gap
Users also like to query using descriptive
words rather than query images or other
multimedia objects. This requires MIR
systems to correlate low-level features
with high-level concepts.
Visually dissimilar
images representing
the same concept.
15. How to Bridge the Semantic Gap?
Exploit context
• Text surrounding images
• Associated sound track and
closed captions in videos
• Query history
Use machine learning to:
• Build image category classifiers to
perform semantic filtering of the
results
• Build specific detectors for objects
to associate concepts with images
• Build object models using low-level
features
16. Exploiting Context: An Example
Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,”
Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001
22. Issues Unique to ML for MIR
• Simultaneous presence of
multiple concepts
• How to extract/isolate
concept-specific features?
Segment or do not
segment?
• Imbalance between
positive and negative
examples
• Extremely large number
of concepts for a
general-purpose MIR system
[Image illustrating multiple concepts: Romance, couple, beach, sundown. From: s163.photobucket.com]
23. A Template Relating Concepts with Pictures
[Figure: template linking Concepts, Image Tokens, and Images]
24. Feature Extraction Issues
• Whole-image features: easy to use but not very
effective
• Region-based features: both
regular region structures and
segmented regions are popular
• Salient-object features:
connected regions
corresponding to the dominant
visual properties of objects in an
image
25. Scale Invariant Feature Transform
(SIFT) Descriptors
SIFT descriptors and their variants are
currently the most popular features
in use. Each image generates
thousands of features (keypoint
descriptors), with each feature
typically consisting of 128 values
http://www.vlfeat.org/
D. G. Lowe, “Distinctive image
features from scale-invariant
keypoints,” IJCV, 2004.
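Once two images are each described by a set of keypoint descriptors, a common way to compare them is nearest-neighbour matching with the distance-ratio test from Lowe's paper. The sketch below assumes descriptors are already extracted and uses toy 4-value vectors in place of real 128-value SIFT descriptors. The function names are hypothetical, and the talk itself does not describe a matching step.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(query_desc, target_desc, ratio=0.8):
    """Return (query_index, target_index) pairs that pass the ratio test.

    A match is kept only when the nearest target descriptor is clearly
    closer than the second nearest (distance ratio below `ratio`).
    """
    matches = []
    for qi, q in enumerate(query_desc):
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(target_desc))
        if len(dists) < 2:
            continue  # the ratio test needs two neighbours
        (best, best_i), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_i))
    return matches

# Toy descriptors (assumed data; real SIFT descriptors have 128 values)
query = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
target = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
matches = match_descriptors(query, target)
print(matches)
```

The second query descriptor is roughly equidistant from all targets, so the ratio test rejects it as ambiguous. With thousands of descriptors per image, a brute-force search like this would normally be replaced by an approximate nearest-neighbour index.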
26. Feature Discovery
The basic idea is to discover
the features that are best
suited to a given
collection
Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image Retrieval Based on
Automated Shape Feature Discovery,” ISMSE 2004
27. Image Category Classifiers (ICC)
• Trained using both supervised and
unsupervised learning methods (SVM,
DT, AdaBoost, VQ, etc.)
• Early work was limited to a few tens of
categories; however, some current
systems can work with thousands of
categories/concepts
28. VQ Based Image Category Classifier
[Figure: a test image is encoded against per-category codebooks (Fire, Sky, Water); the label of the best-matching codebook is assigned. Mustafa & Sethi (2004)]
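The codebook diagram can be read as minimum-distortion classification: encode the test image's feature vectors with each category's codebook and pick the category whose codebook reproduces them best. The sketch below assumes that reading, along with hand-picked RGB codewords and mean squared error as the distortion measure. These are illustrative assumptions, not the details of the cited method.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distortion(features, codebook):
    """Average squared distance from each feature to its nearest codeword."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in features)
    return total / len(features)

def classify(features, codebooks):
    """Return the label whose codebook quantizes the features with least distortion."""
    return min(codebooks, key=lambda label: distortion(features, codebooks[label]))

# Hypothetical codebooks over (R, G, B) block colours scaled to [0, 1]
codebooks = {
    "fire": [(0.9, 0.3, 0.1), (1.0, 0.6, 0.0)],
    "sky": [(0.4, 0.6, 0.9), (0.7, 0.8, 1.0)],
    "water": [(0.1, 0.3, 0.5), (0.0, 0.4, 0.4)],
}
# Feature vectors from blocks of a test image (assumed data)
test_image_blocks = [(0.5, 0.65, 0.95), (0.65, 0.8, 0.95), (0.45, 0.55, 0.85)]
label = classify(test_image_blocks, codebooks)
print(label)
```

In practice each codebook would be learned from training images of its category, for example with k-means, rather than written by hand.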
30. LabelMe Project
Web-based annotation tool to segment and label image
regions. Labeled objects in images are used as training images
to build object detectors.
http://labelme.csail.mit.edu/
31. Image Category Classifiers Examples
IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings,
activities and events. It is easy to add new ones. IMARS can work on a PC or laptop (a trial version is available at IBM
alphaWorks). IMARS can also work at large scale for high-volume batch processing of millions of images and videos
per day. Several demos of IMARS are available (see IMARS demos)
32. Image Classification via Probabilistic
Modeling
Semantic labeling. (a) An MPE semantic retrieval system groups images by semantic
concept and learns a probabilistic model for each concept. (b) The system represents
each image by a vector of posterior concept probabilities.
From Pixels to Semantic Spaces: Advances in Content-Based Image
Retrieval (Nuno Vasconcelos, IEEE Computer, July 2007)
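The vector of posterior concept probabilities mentioned in the caption can be illustrated with Bayes' rule. This minimal sketch assumes each concept is modelled by a single 1-D Gaussian over a scalar image feature and that features are independent given the concept. Both are simplifying assumptions for illustration, and the concept names and parameters are hypothetical.

```python
import math

def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def concept_posteriors(features, models, priors):
    """Posterior P(concept | image) for features assumed i.i.d. given the concept."""
    log_joint = {
        c: math.log(priors[c]) + sum(gaussian_log_pdf(x, m, s) for x in features)
        for c, (m, s) in models.items()
    }
    peak = max(log_joint.values())  # subtract the maximum for numerical stability
    unnorm = {c: math.exp(v - peak) for c, v in log_joint.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}

# Hypothetical per-concept models: (mean, std) of a scalar feature
models = {"beach": (0.8, 0.1), "forest": (0.3, 0.1), "city": (0.5, 0.2)}
priors = {"beach": 1 / 3, "forest": 1 / 3, "city": 1 / 3}

posteriors = concept_posteriors([0.75, 0.82, 0.78], models, priors)
mpe_label = max(posteriors, key=posteriors.get)  # minimum probability of error choice
print(posteriors, mpe_label)
```

The dictionary `posteriors` plays the role of the image's representation in the semantic space, and picking its largest entry gives the minimum-probability-of-error label.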
33. Retrieving Events in Videos
• An event in MIR implies an interesting
spatiotemporal instance
• Considerable work in the MIR community on
events because of the popularity of sports videos
• Also tremendous interest in detecting and
recognizing events with potential homeland
security applications
37. Retrieval By Cross-Modal Associations
- Using a query from one modality (e.g. audio) to
retrieve content in a different modality (e.g. video)
- Operates directly on low-level features
Approaches:
Latent semantic indexing (LSI)
Cross-modal factor analysis (CFA)
Canonical correlation analysis (CCA)
Li, Dimitrova, Li and Sethi (ACM MM 03)
38. Talking Face Example
[Figure: the query and a collection of image sequences each pass through feature extraction; cross-modal association between the two feature sets produces the retrieval results.]
M. Li, D. Li, Dimitrova and Sethi, “Audio-Visual Talking Face Detection,” Proceedings, ICME,
2003
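The idea behind cross-modal association can be shown in its simplest form: with one scalar audio feature and one scalar visual feature per frame, the canonical correlation reduces to the absolute Pearson correlation. The sketch below ranks image sequences by that score. The per-frame features (audio energy, mouth-motion magnitude), the data, and the function names are all hypothetical, and the cited work uses multi-dimensional features with LSI, CFA, or CCA rather than this 1-D special case.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def rank_sequences(audio_query, visual_collection):
    """Rank image sequences by |correlation| between the audio and visual features."""
    scores = {name: abs(pearson(audio_query, v)) for name, v in visual_collection.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-frame audio energy of the query
audio_energy = [0.1, 0.8, 0.2, 0.9, 0.1, 0.7]
# Hypothetical per-frame visual feature for each candidate image sequence
visual_collection = {
    "talking_face": [0.0, 0.7, 0.1, 0.8, 0.1, 0.6],
    "still_face": [0.2, 0.2, 0.3, 0.2, 0.2, 0.3],
    "background": [0.5, 0.4, 0.6, 0.5, 0.4, 0.5],
}
ranking = rank_sequences(audio_energy, visual_collection)
print(ranking)
```

The sequence whose visual motion rises and falls with the audio energy ranks first, which is the intuition behind detecting the talking face.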
39. Tagging in MIR
All-time most popular tags on Flickr
40. About Tags
• User centered
• Imprecise and often overly personalized
• Tag distribution follows a power law
• Most users use very few distinct tags, while a small group of users work
with an extremely large set of tags
41. How are Tags Being Used in MIR?
Relating tags in different languages through visual features
Aurnhammer, Hanappe and Steels Proc. WWW2006
42. Tag Suggester
Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)
43. Collaborative Tags
• Also known as folksonomy, social tagging, and social
classification
• Great for content characterization
• In a tag cloud, the size of a tag represents the number of times the tag has
been applied to the same item by different users. It roughly
indicates the level of agreement on, or confidence in, a tag.
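The tag-size idea amounts to counting, for each item, how many distinct users applied each tag. A minimal sketch follows, assuming tag assignments arrive as (user, item, tag) triples and that tags are compared case-insensitively. The data layout, the normalization, and the function name are assumptions for illustration.

```python
from collections import defaultdict

def tag_weights(assignments):
    """Map item -> {tag: number of distinct users who applied that tag}."""
    users = defaultdict(set)
    for user, item, tag in assignments:
        # Normalizing case and whitespace is an assumption, not part of the talk
        users[(item, tag.strip().lower())].add(user)
    weights = defaultdict(dict)
    for (item, tag), who in users.items():
        weights[item][tag] = len(who)
    return dict(weights)

# Toy tag assignments (assumed data)
assignments = [
    ("alice", "photo1", "beach"),
    ("bob", "photo1", "Beach"),
    ("carol", "photo1", "sunset"),
    ("alice", "photo1", "beach"),  # repeat by the same user, counted once
    ("bob", "photo2", "city"),
]
weights = tag_weights(assignments)
print(weights)
```

Counting distinct users rather than raw assignments keeps a single enthusiastic tagger from inflating the apparent agreement on a tag.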
44. Decision Tree Based Tagger
• Uses social tags in binary/weighted mode
• Generates/suggests multiple tags through a single decision
tree classifier
First, the label vectors associated
with the training vectors are
clustered into two initial groups.
Next, an SVM is trained on the training
vectors to yield the split that best
matches the clustering result.
Finally, an impurity-based measure is
used to iteratively adjust the split,
if needed.
Ma, Sethi, and Patel. “Multilabel Classification Method for Multimedia Tagging”. (IJMDEM, 2010)
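Only the first step of the node-splitting procedure, clustering the label vectors into two groups, is sketched here. The sketch assumes binary tag vectors, a plain 2-means procedure seeded with the two most dissimilar vectors, and squared Euclidean distance. These choices and all names are illustrative assumptions, since the slide does not specify the clustering method, and the SVM fitting and impurity-based adjustment steps are not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def two_means(label_vectors, iterations=10):
    """Split label vectors into two groups; returns a 0/1 group id per vector."""
    # Seed the two centres with the pair of label vectors that are farthest apart
    pairs = [
        (sq_dist(a, b), i, j)
        for i, a in enumerate(label_vectors)
        for j, b in enumerate(label_vectors)
        if j > i
    ]
    _, i, j = max(pairs)
    centers = [list(label_vectors[i]), list(label_vectors[j])]
    assign = []
    for _ in range(iterations):
        assign = [
            0 if sq_dist(v, centers[0]) <= sq_dist(v, centers[1]) else 1
            for v in label_vectors
        ]
        new_centers = []
        for g in (0, 1):
            members = [v for v, a in zip(label_vectors, assign) if a == g]
            new_centers.append(mean_vector(members) if members else centers[g])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return assign

# Binary tag vectors over hypothetical tags (beach, sunset, city, night)
label_vectors = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
]
groups = two_means(label_vectors)
print(groups)
```

The resulting two-group assignment is what a binary classifier at the tree node would then be trained to reproduce from the feature vectors.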
47. Current Status of MIR
• Extensive interest as evident from conferences,
journals, and special issues
• Most in the MM community are happy with the progress
• Gap between published results and results from
publicly available systems on the web.
(http://www.theopavlidis.com/technology/CBIR/PaperB/icpr08.htm)
• Lack of application focus
• Plenty of scope for machine learning to help
improve the performance of MIR systems
• Killer applications are beginning to emerge
48. MIR Application Examples
Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain,
Jung-Eun Lee, and Rong Jin)
49. Biological and Medical Data Retrieval
http://www.cs.washington.edu/research/VACE/Multimedia/
55. Take Home Message
• MIR is emerging in the commercial domain.
A lot more activity is expected in the near future
• The MIR community is obsessed with a general-purpose
retrieval engine, a folly pursued by
the computer vision community for a long time
• ML is playing a vital role in MIR
• Approaches combining social search and
visual search techniques are expected to gain
prominence
56. Acknowledgement
• This presentation is based on the work of
numerous researchers from the MIR/ML/CVPR
community. I have tried to give
credit/references wherever possible. Any
omission is unintentional and I apologize for
that.
• I also want to thank my present and past
students and collaborators.