Machine Learning and Multimedia Information Retrieval*
Integrated Knowledge Solutions
iksinc@yahoo.com

* Based on a talk at ICMLA Conference

Outline
• Introduction
• Bridging the Semantic Gap
• Events in Videos
• Use of Tagging in MIR
• Killer Apps of MIR
• Take Home Message

12/12/2010 ICMLA Talk 2

Too Much Information

Which is more frustrating?

• Being stuck in traffic on the way to or from work
• Not being able to find information you urgently need

According to a survey by Xerox

Not a New Problem
Nalanda, one of the first universities in the world, was
founded in the 5th century AD at a site reportedly
visited by the Buddha during his lifetime. At its peak,
in the 7th century AD, Nalanda held some 10,000
students when it was visited by the Chinese scholar
Xuanzang.

The Royal Library of Alexandria, in Egypt, seems to have been
the largest and most significant library of the ancient
world. It functioned as a major center of scholarship from its
construction in the third century B.C. until the Roman
conquest of Egypt in 30 B.C.

However, Earlier
Data Producers

Data Consumers

But Nowadays

Some Relevant Numbers
Photobucket has 6.2 billion photos and Flickr
has over 2 billion.

Facebook has over 10 billion photos and over
400 million active users.

Phenomenon
• 24 hours of video are uploaded to YouTube every minute
• YouTube streams 2 billion videos every day

So how do we get help in finding the desired
multimedia information?

MIR

So What is MIR?

• Also known as CBIR (Content-based Image Retrieval) and
CBVIR (Content-based Visual Information Retrieval)
• Deals with systems that manage multimedia documents (images,
videos, audio clips, slides, etc.) and facilitate searching
them based on content

History of MIR
• Conference on Data Base Techniques for Pictorial Applications,
1979 (Florence, Italy)
• NSF Workshop on Visual Information Management Systems,
1992 (Redwood, CA)
• QBIC (Query By Image Content), 1993 (SPIE’s Conf on Storage
and Retrieval for Image and Video Databases); also the year of
the first ACM Multimedia Conference
• Shift to semantic similarity from signal similarity, 1999
• Community tagging, photo and video sharing sites, 2002

A Typical MIR System
[Block diagram: Query → Feature Extraction → Indexing & Matching → Retrieved Results, with Relevance Feedback looping back from the results; Media Collection → Feature Extraction → Features → Indexing & Matching]
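
The two paths of a typical MIR system (offline feature extraction over the media collection, online feature extraction and matching for the query) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular system: the gray-level histogram feature, the Euclidean distance, the toy "images" (flat lists of gray levels), and all function names are hypothetical choices, and relevance feedback is left out.

```python
import math

def gray_histogram(pixels, bins=4):
    """Whole-image feature: normalized histogram of 0-255 gray levels."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def euclidean(a, b):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(collection):
    """Offline path: extract features for every item in the media collection."""
    return {name: gray_histogram(px) for name, px in collection.items()}

def retrieve(query_pixels, index, k=2):
    """Online path: extract query features, then rank the collection by distance."""
    q = gray_histogram(query_pixels)
    ranked = sorted(index, key=lambda name: euclidean(q, index[name]))
    return ranked[:k]

# Hypothetical toy "images": flat lists of gray levels.
collection = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 20, 220, 240],
}
index = build_index(collection)
results = retrieve([15, 25, 35, 45, 55, 230], index, k=2)
print(results)
```

A real system would replace the histogram with richer features and the linear scan with an index structure; the split into an offline and an online path is the part that carries over.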

Semantic Gap

Early systems retrieved documents
that were visually similar to the
query (similar at the signal level)
but did not necessarily show the
same semantic concept.

http://www.searchenginejournal.com/7-similarity-based-image-search-engines/8265/

Smeulders, Worring, Santini, Gupta, and Jain, “Content-Based Image
Retrieval at the End of the Early Years,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, December 2000

Semantic Gap
Users also like to query using descriptive
words rather than query images or other
multimedia objects. This requires MIR
systems to correlate low-level features
with high-level concepts.

Visually dissimilar
images representing
the same concept.

How to Bridge the Semantic Gap?
Exploit context
• Text surrounding images
• Associated sound track and
closed captions in videos
• Query history

Use machine learning to:
• Build image category classifiers to perform semantic filtering of the results
• Build specific object detectors to associate concepts with images
• Build object models using low-level features

Exploiting Context: An Example

Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,”
Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001

Example of Using Surrounding Text

Context via Surrounding Text

Context Via Surrounding Text: One
More Example

Better Context with More Text

Improving Context via More Words per
Query

Issues Unique to ML for MIR
• Simultaneous presence of multiple concepts
• How to extract/isolate concept-specific features? Segment or do not segment?
• Imbalance between positive and negative examples
• Extremely large number of concepts for a general-purpose MIR system

Example image tags: Romance, couple, beach, sundown (From: s163.photobucket.com)

A Template Relating Concepts with Pictures
[Diagram columns: Concepts, Image Tokens, Images]

Feature Extraction Issues
• Whole-image features: easy to use but not very effective
• Region-based features: both regular region structures and segmented regions are popular
• Salient-object features: connected regions corresponding to the dominant visual properties of objects in an image

Scale Invariant Feature Transform
(SIFT) Descriptors
SIFT descriptors and their variants are
currently the most popular features
in use. Each image generates
thousands of features (keypoint
descriptors), with each feature
typically consisting of 128 values.

http://www.vlfeat.org/

D. G. Lowe, “Distinctive image
features from scale-invariant
keypoints,” IJCV, 2004.

Feature Discovery

The basic idea is to discover
the features that are best
suited to a given
collection

Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image Retrieval Based on
Automated Shape Feature Discovery,” ISMSE 2004

Image Category Classifiers (ICC)
• Trained using both supervised and
unsupervised learning methods (SVM,
DT, AdaBoost, VQ, etc.)
• Early work was limited to a few tens of
categories; however, some of the current
systems can work with thousands of
categories/concepts

VQ Based Image Category Classifier

[Diagram: the test image is encoded with the Fire, Sky, and Water codebooks; the best codebook gives the label]

Mustafa & Sethi (2004)
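
One common way to read a per-category codebook classifier is: encode the test image's feature vectors with each category's codebook and pick the category whose codebook gives the lowest quantization distortion. The sketch below illustrates that reading only; the slide does not give the features or training procedure of the cited work, so the RGB block colors, the hand-made codebooks, and the function names are all hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distortion(features, codebook):
    """Average squared distance from each feature to its nearest code vector."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in features) / len(features)

def classify(features, codebooks):
    """Return the label whose codebook encodes the features with least distortion."""
    return min(codebooks, key=lambda label: distortion(features, codebooks[label]))

# Hypothetical codebooks of mean RGB block colors, one per category.
# In practice each codebook would be learned (e.g. by clustering) from
# training images of that category.
codebooks = {
    "fire": [(230, 80, 20), (250, 160, 40)],
    "sky": [(120, 180, 240), (200, 220, 250)],
    "water": [(20, 70, 130), (40, 110, 150)],
}

# Hypothetical block features extracted from a test image.
test_blocks = [(240, 100, 30), (245, 150, 50), (220, 90, 25)]
label = classify(test_blocks, codebooks)
print(label)
```

Because each codebook is trained only on its own category, adding a new category means training one more codebook rather than retraining the whole classifier.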

Object Detectors

PASCAL Visual Object Classes Challenge

LabelMe Project
Web-based annotation tool to segment and label image
regions. Labeled objects in images are used as training images
to build object detectors.

http://labelme.csail.mit.edu/

Image Category Classifiers Examples

IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings,
activities and events. It is easy to add new ones. IMARS can work on a PC or laptop (a trial version is available at IBM
alphaWorks). IMARS can also work at large scale for high-volume batch processing of millions of images and videos
per day. Several demos of IMARS are available (see IMARS demos).

Image Classification via Probabilistic
Modeling

Semantic labeling. (a) An MPE semantic retrieval system groups images by semantic
concept and learns a probabilistic model for each concept. (b) The system represents
each image by a vector of posterior concept probabilities.

From Pixels to Semantic Spaces: Advances in Content-Based Image
Retrieval (Nuno Vasconcelos, IEEE Computer, July 2007)
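
The idea of representing an image by a vector of posterior concept probabilities can be illustrated with Bayes' rule. The sketch below is a deliberately simplified assumption: each concept is modeled by a single 1-D Gaussian over one scalar feature ("brightness") with a uniform prior, whereas real systems would use richer models over high-dimensional features. The concept names, parameters, and function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def concept_posteriors(x, models, priors=None):
    """Map a feature value to P(concept | x) for every concept via Bayes' rule."""
    if priors is None:
        priors = {c: 1 / len(models) for c in models}  # uniform prior (assumption)
    joint = {c: priors[c] * gaussian_pdf(x, *models[c]) for c in models}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

# Hypothetical per-concept models: (mean, std) of image brightness in [0, 1].
models = {"sky": (0.8, 0.1), "grass": (0.4, 0.1), "night": (0.1, 0.1)}

post = concept_posteriors(0.75, models)
best = max(post, key=post.get)
print(best, post)
```

Two images can then be compared by the distance between their posterior vectors rather than between their raw features, which is what places retrieval in a semantic space.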

Retrieving Events in Videos
• An event in MIR is an interesting
spatiotemporal instance
• Considerable work in the MIR community on
events because of the popularity of sports videos
• Also tremendous interest in detecting and
recognizing events with potential homeland
security applications

Event Retrieval Examples: Supervised
Approach

Mustafa & Sethi AVSS Conference 2005

Unsupervised Learning for Event Retrieval

Mustafa & Sethi, ICTAI 2007

Unsupervised Learning Based Event
Retrieval

Mustafa & Sethi, ICTAI 2007

Retrieval By Cross-Modal Associations

- Using a query from one modality (e.g. audio) to
retrieve content in a different modality (e.g. video)
- Operates directly on low-level features
Approaches:
- Latent semantic indexing (LSI)
- Cross-modal factor analysis (CFA)
- Canonical correlation analysis (CCA)

Li, Dimitrova, Li and Sethi (ACM MM 03)

Talking Face Example
[Block diagram: Query → Feature Extraction → Cross-Modal Association → Retrieval Results; Collection of Image Sequences → Feature Extraction → Cross-Modal Association]

M. Li, D. Li, Dimitrova and Sethi, “Audio-Visual Talking Face Detection,” Proceedings, ICME,
2003

Tagging in MIR

All time most popular tags at Flickr

About Tags
• User-centered
• Imprecise and often overly personalized
• Tag distribution follows a power law
• Most users use very few distinct tags, while a small group of users works
with an extremely large set of tags

How are Tags Being Used in MIR?

Relating tags in different languages through visual features

Aurnhammer, Hanappe and Steels, Proc. WWW2006

Tag Suggester

Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)

Collaborative Tags
• Also known as folksonomy, social tagging, and social
classification
• Great for content characterization
• In a tag cloud, the size of a tag represents the number of times
the tag has been applied to the same item by different users. It
roughly indicates the level of agreement on, or confidence in, a tag.
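
The relation between tag size and user agreement can be made concrete with a small sketch. It counts how many distinct users applied each tag to one item and scales the counts to font sizes. The case-insensitive matching, the linear scaling, the point-size range, and the sample data are all assumptions made for illustration.

```python
from collections import Counter

def tag_counts(assignments):
    """Count how many distinct users applied each tag to one item."""
    # A set drops repeated (user, tag) pairs; tags are compared case-insensitively.
    seen = {(user, tag.lower()) for user, tag in assignments}
    return Counter(tag for _, tag in seen)

def font_sizes(counts, min_pt=10, max_pt=30):
    """Linearly scale tag counts to font sizes (in points) for a tag cloud."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {t: min_pt + (n - lo) * (max_pt - min_pt) // span for t, n in counts.items()}

# Hypothetical (user, tag) assignments for a single photo.
assignments = [
    ("ann", "beach"), ("bob", "Beach"), ("cy", "beach"), ("cy", "beach"),
    ("ann", "sunset"), ("bob", "sunset"),
    ("cy", "vacation"),
]
counts = tag_counts(assignments)
sizes = font_sizes(counts)
print(sizes)
```

Counting distinct users rather than raw tag applications keeps one enthusiastic user from inflating a tag's apparent agreement.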

Decision Tree Based Tagger
• Uses social tags in binary/weighted mode
• Generates/suggests multiple tags through a single decision
tree classifier

First, the label vectors associated
with training vectors are
clustered into two initial groups

Next, the SVM is used on training
vectors to yield the split that best
matches the clustering result

An impurity based measure is
used to iteratively adjust the split,
if needed

Ma, Sethi, and Patel. “Multilabel Classification Method for Multimedia Tagging”. (IJMDEM, 2010)

Current Status of MIR
• Extensive interest, as evident from conferences,
journals, and special issues
• Most in the MM community are happy with the progress
• Gap between published results and results from
publicly available systems on the web
(http://www.theopavlidis.com/technology/CBIR/PaperB/icpr08.htm)
• Lack of application focus
• Plenty of scope for machine learning to help
improve MIR system performance
• Killer applications are beginning to emerge

MIR Application Examples

Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain,
Jung-Eun Lee, and Rong Jin)

Biological and Medical Data Retrieval

http://www.cs.washington.edu/research/VACE/Multimedia/

Killer Apps?

http://www.iqengines.com/applications.php

http://www.iqengines.com/applications.php

http://www.thingd.com

Bloomberg Businessweek, Nov 29, 2010

Take Home Message
• MIR is emerging in the commercial domain.
A lot more activity is expected in the near future
• The MIR community is obsessed with a general-purpose
retrieval engine, a folly pursued by the
computer vision community for a long time
• ML is playing a vital role in MIR
• Approaches combining social search and
visual search techniques are expected to gain
prominence

Acknowledgement
• This presentation is based on the work of
numerous researchers from the MIR/ML/CVPR
community. I have tried to give
credit/references wherever possible. Any
omission is unintentional and I apologize for
that.
• I also want to thank my present and past
students and collaborators.

Questions?
