Multimedia Semantics: Metadata, Analysis and Interaction

Raphael Troncy <raphael.troncy@eurecom.fr>
Multimedia Semantics, EURECOM

Some BIG numbers
User Generated Content (Jul'09)
3.7+ billion photos
10+ billion photos
110+ million videos
20 hours uploaded / min ≈ 75 000 full length movies / week

Archived TV content
1.5 million hours ≈ 120 km of shelves
300000 hours | 1 petabyte / year

News content

Content difficult to search and reuse
Barely visible to search engines
04/08/2009 - Multimedia Semantics: Metadata, Analysis and Interaction - LACNEM 2009 -2

Image/Video indexing

Techniques used by mainstream search engines
search term occurs in the filename or in the caption or in user tags
no semantics
Image indexing: main problem
an image is not alphabetic: there are no countable discrete units that, in
combination, provide the meaning of the image
image descriptors are not given with the image: they have to be extracted or
interpreted
Video indexing: additional problem
a video has additionally a temporal dimension to take into account
a video has a priori no discrete units either (i.e. frames, shots, sequences
cannot be absolutely defined)
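
The text-matching technique attributed to mainstream search engines on this slide can be sketched as follows. This is a minimal illustration, not any engine's actual implementation; the `MediaItem` class, the `keyword_match` function and the two sample items are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    filename: str
    caption: str = ""
    tags: list = field(default_factory=list)


def keyword_match(item, term):
    """Return True if the term occurs in the filename, caption or user tags."""
    term = term.lower()
    haystack = [item.filename, item.caption, *item.tags]
    return any(term in text.lower() for text in haystack)


# Hypothetical collection, for illustration only.
items = [
    MediaItem("yalta_conference.jpg", "The Big Three at Yalta", ["churchill", "wwii"]),
    MediaItem("IMG_0042.jpg"),  # may show the same person, but no text says so
]

hits = [item.filename for item in items if keyword_match(item, "churchill")]
print(hits)
```

The second item is never returned, whatever it depicts: the match relies only on surrounding text, which is the "no semantics" limitation noted above.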


Why is it so difficult to find
appropriate multimedia content, to
reuse and repurpose content
previously published and to present
this content in interfaces that vary
with user needs?

Sounds Familiar?
[Arnold Smeulders,
PAMI, 2000]
The semantic gap is the
lack of coincidence
between the information
that one can extract from
the sensory data and the
interpretation that the
same data has for a user
in a given situation


a little drop of semantics goes a
long way
Jim Hendler [1997]

Agenda
1. Semantics in multimedia analysis
• Detecting concepts for video indexing
• Evaluating interactive search tasks

2. Semantics in metadata
• Multimedia metadata interoperability
• Expose your data following 4 basic principles
• Re-use a growing amount of publicly open datasets

3. Semantics in user interfaces
• Provide meaningful presentation of underlying data
• Explore large knowledge bases powered by linked data


The science of labeling

Automatically detecting the presence of a
concept in a video stream

airplane

Naming visual information


The Computer Vision Approach

Building detectors one at a time

a face detector for
frontal faces

3 years later

a face detector for
non-frontal faces

One (or more) PhD for
every new concept


So how about these?


A Simple Concept Detector


K-nearest neighbor
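
As a rough sketch of how a k-nearest-neighbor concept detector could work, assuming each key frame has already been reduced to a small numeric feature vector. The feature values, the labels and the `knn_predict` helper are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a feature vector by majority vote among its k nearest training examples."""
    by_distance = sorted(train, key=lambda example: math.dist(example[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D features (say, fraction of sky-colored pixels and edge density).
# All values are made up for illustration.
train = [
    ((0.90, 0.10), "airplane"),
    ((0.80, 0.20), "airplane"),
    ((0.85, 0.15), "airplane"),
    ((0.20, 0.70), "other"),
    ((0.10, 0.90), "other"),
    ((0.30, 0.80), "other"),
]

print(knn_predict(train, (0.75, 0.25)))
```

The same training pairs could feed a linear classifier or a support vector machine instead; only the decision rule changes.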


Linear Classification


Support Vector Machine


Supervised Learner


NIST TRECVID Evaluation

Until 2001, everybody defined his own concepts
Using specific and small data sets
Hard to compare methodologies

Since 2001, worldwide evaluation by NIST
Promote progress in video retrieval research
Provide common datasets (shots, ASR, key frames)
Use open, metrics-based evaluation
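
The slide does not say which metric is used. As an assumption, the sketch below computes non-interpolated average precision, a common choice in retrieval benchmarks, for a single ranked list of shots. The shot identifiers and the ground truth are invented.

```python
def average_precision(ranked_shots, relevant):
    """Non-interpolated average precision of one ranked result list."""
    relevant = set(relevant)
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / len(relevant) if relevant else 0.0


# Hypothetical system output and ground truth for one concept.
ranked = ["shot3", "shot7", "shot1", "shot9"]
truth = {"shot3", "shot1", "shot5"}

ap = average_precision(ranked, truth)
print(round(ap, 4))
```

Averaging this value over all concepts or topics gives a single score per system, which is what makes methodologies comparable on a common dataset.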

Large-Scale Concept
Ontology for Multimedia


Success and Criticism

More and more concept detectors available:
TRECVID 2005: 101 concept lexicon
TRECVID 2006: 491 concept lexicon
MediaMill Challenge 2007: 572 concept lexicon

... but focus is on the final result
the relative merit of indexing methods stays hidden: intermediary
steps are ignored while systems become more complex (several
features and learning methods)

... but the concept detectors developed do not match
user information needs


TRECVID Interactive Video Search Task
Query selection:
by keyword,
by concept,
by example

Topics unknown
Test set
English (2004)
Chinese (2005-6)
Dutch (2007-8-9)


VideOlympics
Benchmark performance cannot be sole criterion
Experience of searcher counts
Usability of systems matters

VideOlympics: live interactive search task
Simultaneous exposure
of video retrieval systems
Showcase that goes
beyond a regular demo
session
Fun to do (participants)
& Fun to watch (audience)


VideOlympics Setup

One display
TRECVID like queries
Results pushed by searchers

Agenda




Multimedia: Description methods

MPEG-21

MPEG-7

MPEG-4

MPEG-2

MPEG-1

ISO W3C


MPEG-7: a multimedia description language?

ISO standard since December 2001

Main components:
Descriptors (Ds) and Description Schemes (DSs)
DDL (XML Schema + extensions)
Concern all types of media

Part 5 – MDS (Multimedia Description Schemes)
[Figure: MDS overview]
Content organization: Collections, Models
User interaction: User Preferences, Usage History
Navigation & Access: Summaries, Views, Variations
Content management: Creation & Production, Media, Usage
Content description: Structural aspects, Semantic aspects
Basic elements: Schema Tools, Basic datatypes, Links & media localization, Basic Tools

MPEG-7 and the Semantic Web
MDS Upper Layer represented in RDFS
2001: Hunter
Later on: link to the ABC upper ontology

MDS fully represented in OWL-DL
2004: Tsinaraki et al., DS-MIRF model

MPEG-7 fully represented in OWL-DL
2005: Garcia and Celma, Rhizomik model
Fully automatic translation of the whole standard

MDS and Visual parts represented in OWL-DL
2007: Arndt et al., COMM model
Re-engineering MPEG-7 using DOLCE design patterns


Example 1: Region Annotation

[Figure: annotation graph for the image http://en.wikipedia.org/wiki/Image:Yalta_Conference.jpg]
Nodes: core:image-data, core:semantic-annotation, core:semantic-label-role, loc:region-locator-descriptor, loc:spatial-mask-role, loc:bounding-box, foaf:Person, http://en.wikipedia.org/wiki/Churchill
Properties: dns:realized-by, dns:setting, dns:plays, dns:defines, dns:played-by, rdf:type, data:has-rectangle
Value: "5 25 10 20 15 15 10 10 5 15"^^xsd:string


Example 2: Sequence Annotation

[Figure: annotation graph for the video http://www.reuters.com/news/video/summitVideo?videoId=56114]
Nodes: core:image-data, core:semantic-annotation, core:semantic-label-role, loc:media-time-descriptor, loc:temporal-mask-role, loc:media-time-point, tgn:Sweden, tgn:Gothenburg
Properties: dns:realized-by, dns:setting, dns:plays, dns:defines, dns:played-by, skos:broader, data:has-time
Value: "1:21"^^xsd:time

Image Annotation with Linked Data
Reg1
The "Big Three" at the Yalta
Conference (Wikipedia)

Localize a region (bounding box)
Annotate the content (interpretation)
Tag: Winston Churchill, UK Prime Minister, Allied Forces, WWII
Link to knowledge on the Web
:Reg1 foaf:depicts dbpedia:Winston_Churchill
----------------------------------------------
dbpedia:Winston_Churchill dbpedia:spouse
dbpedia:Clementine_Churchill
dbpedia:Winston_Churchill owl:sameAs
fbase:Winston_Churchill
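
A minimal sketch of what the link buys: once the region is tied to a DBpedia resource, facts that were never part of the image annotation become reachable. The triples are the ones on the slide, held as plain string tuples; the `objects` and `spouses_of_depicted` helpers are hypothetical names, and a real system would use an RDF store instead.

```python
# Triples copied from the slide; prefixed names are kept as plain strings.
triples = {
    (":Reg1", "foaf:depicts", "dbpedia:Winston_Churchill"),
    ("dbpedia:Winston_Churchill", "dbpedia:spouse", "dbpedia:Clementine_Churchill"),
    ("dbpedia:Winston_Churchill", "owl:sameAs", "fbase:Winston_Churchill"),
}


def objects(subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def spouses_of_depicted(region):
    """Follow foaf:depicts, then dbpedia:spouse, starting from an image region."""
    result = set()
    for person in objects(region, "foaf:depicts"):
        result |= objects(person, "dbpedia:spouse")
    return result


print(spouses_of_depicted(":Reg1"))
```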

Video Annotation with Linked Data
Seq4

Seq1
A history of G8 violence (video)
(© Reuters)

Localize a region
Annotate the content
Tag: G8 Summit, Heiligendamm, 2007
Tag: EU Summit, Gothenburg, 2001
Link to knowledge on the Web
:Seq1 foaf:depicts dbpedia:33rd_G8_Summit
----------------------------------------------
dbpedia:33rd_G8_Summit foaf:based_near geo:Heiligendamm
geo:Heiligendamm skos:broader geo:Germany

What is linked data?
URIs, possibly identifying media fragments
+ annotations (tags)
+ links among fragments & annotations

[Figure: example graph linking wp:2006_FIFA_World_Cup#Final, events:id, geonames:2950159, nc:15054000 and dbpedia:Zidane through nar:subject, nar:location and foaf:depicts]

Linked Data Principles

Tim Berners-Lee [2006] (Design Issues)
1. Use URIs to identify things
(anything, not just documents);
2. Use HTTP URIs – globally unique names, distributed
ownership –
so that people can look up those names;
3. Provide useful information in RDF –
when someone looks up a URI;
4. Include RDF links to other URIs –
to enable discovery of related information
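
Principles 2 and 3 are usually met through HTTP content negotiation: a client looks up the URI and asks for an RDF representation. The sketch below only builds such a request with the standard library and does not send it. The example URI assumes that the `dbpedia:` prefix expands to `http://dbpedia.org/resource/`, and `rdf_request` is a hypothetical helper name.

```python
import urllib.request


def rdf_request(uri):
    """Build (but do not send) an HTTP request asking for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request("http://dbpedia.org/resource/Winston_Churchill")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Sending it would be: urllib.request.urlopen(req)  (requires network access)
```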


An Example: DBpedia

DBpedia is a community effort to:
extract structured "infobox" information from Wikipedia
interlink DBpedia with other datasets on the Web


Scraping infobox data

http://dbpedia.org/resource/Bogotá


Automatic Links Among Open Datasets

DBpedia:
<http://dbpedia.org/resource/Bogotá>
owl:sameAs <http://sws.geonames.org/3688689/>
owl:sameAs <http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000167bab>
dbpedia:population "6776009"
...

Geonames:
<http://sws.geonames.org/3688689/>
owl:sameAs <http://dbpedia.org/resource/Bogotá>
wgs84_pos:lat "4.6"
wgs84_pos:long "-74.0833333"
geo:population "7102602"
...
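
One way a consumer might exploit these owl:sameAs links is to group the URIs into equivalence classes and pool what each dataset says about the same city. The sketch below does this with a small union-find over the statements shown on this slide. The `find` and `merge_descriptions` helpers are hypothetical, and treating owl:sameAs as a plain equivalence is a simplifying assumption.

```python
from collections import defaultdict

DBPEDIA = "http://dbpedia.org/resource/Bogotá"
GEONAMES = "http://sws.geonames.org/3688689/"
FREEBASE = "http://rdf.freebase.com/ns/guid.9202a8c04000641f8000000000167bab"

# owl:sameAs links and literal facts copied from the slide.
same_as = [(DBPEDIA, GEONAMES), (DBPEDIA, FREEBASE), (GEONAMES, DBPEDIA)]
facts = [
    (DBPEDIA, "dbpedia:population", "6776009"),
    (GEONAMES, "wgs84_pos:lat", "4.6"),
    (GEONAMES, "wgs84_pos:long", "-74.0833333"),
    (GEONAMES, "geo:population", "7102602"),
]


def find(parent, uri):
    """Return the representative of uri's equivalence class (union-find)."""
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]  # path halving
        uri = parent[uri]
    return uri


def merge_descriptions(same_as, facts):
    """Group facts about URIs that are connected by owl:sameAs links."""
    parent = {}
    for a, b in same_as:
        parent[find(parent, a)] = find(parent, b)
    merged = defaultdict(set)
    for subject, predicate, value in facts:
        merged[find(parent, subject)].add((predicate, value))
    return dict(merged)


merged = merge_descriptions(same_as, facts)
for representative, description in merged.items():
    print(representative, sorted(description))
```

The merged description contains two different population figures, one per source, so a consumer still has to decide how to reconcile conflicting values.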


sameAs.org


Bogotá on Freebase


Bogotá on Geonames


How Much Linked Data is there ?


Linked Data Cloud – August 2007


Linked Data Cloud – March 2008


Linked Data Cloud – September 2008


Linked Data Cloud – March 2009


The Web of Data

Expose open datasets in RDF
Set RDF links among the data items for
different datasets
Over 4.5 billion triples, 5 million links
(March 2009)
... still counting


Who are the users?
Why would they use the cloud?
What tasks can be supported?
How will the semantics help?


Agenda




Provide meaningful presentation of data


... and behind the scenes


... link an artist to more data


... myspace


... last.fm


... IMDb


Going through the Walled Gardens

David Simonds: Everywhere and nowhere. 19 May 2008, The Economist.

How can semantics help?

Query construction
disambiguate input (auto-completion)
selection of available terms (grouping and ranking algorithms)

(Semantic) search algorithm
graph traversal
query expansion
RDFS/OWL reasoning

Presentation of search results
grouping by property
visualization on timeline, map, etc.
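
Query expansion by graph traversal, listed above under the search algorithm, can be sketched as follows: a query for a place also retrieves items annotated with narrower places. The tgn:Gothenburg to tgn:Sweden link comes from the sequence annotation example; the `ex:` concepts, the item names and the `expand` and `search` helpers are hypothetical.

```python
# skos:broader links, stored as narrower -> broader.
broader = {
    "tgn:Gothenburg": "tgn:Sweden",  # from the sequence annotation example
    "tgn:Sweden": "ex:Europe",       # hypothetical, added to show transitivity
}

# Hypothetical annotated items.
annotations = {
    "video1": {"tgn:Gothenburg"},
    "video2": {"tgn:Sweden"},
    "video3": {"ex:Asia"},
}


def expand(concept):
    """Return the concept plus every concept that has it as a transitive broader term."""
    expanded = {concept}
    changed = True
    while changed:
        changed = False
        for narrower, parent in broader.items():
            if parent in expanded and narrower not in expanded:
                expanded.add(narrower)
                changed = True
    return expanded


def search(concept):
    """Return items annotated with the concept or any of its narrower concepts."""
    wanted = expand(concept)
    return sorted(item for item, concepts in annotations.items() if concepts & wanted)


print(search("ex:Europe"))
```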


News Workflow Interoperability

No integration of media (stories, photo, animation, video)
Little (or no) context in the news presentation
Lack of interoperability in the current workflow

[Figure: NAR Schema, Broadcaster Schema, NewsCodes, Controlled Vocabularies, User Vocabulary]


Exploratory Search

(Ultimate) Goal:
Provide an environment for searching and browsing
contextualized multimedia news information

Required integration:
Data: various media, different forms, various sources
Metadata: schema integration, semantic models

Influence and implications of UI:
How to represent semantic multimedia metadata
to facilitate presenting information?
in other words ... What constraints do end-user
interfaces put on the modeling of the metadata?

News and Multimedia Formats

NewsML-G2, EventsML-G2, SportsML-G2

News Architecture (NAR)


Modeling the News + Media Ontology

dc:Subject ≈ nar:Subject
foaf:Person ≈ nar:Person
sioc:Item ≈ nar:Item
+ geo:lat, geo:long


Enriching the News Metadata

Concepts/Entities that
are subject of news
Thematic categories
People
Organizations
Geopolitical Areas
Points of Interest
Events
Products or artefacts



Named Entity
Recognition

Domain Ontologies

NAR Ontology
NewsCodes
Thesaurus



Concept
Detectors

Domain Ontologies

NAR Ontology
NewsCodes
Thesaurus


Presenting News Information

Dimensions used for searching news items (metadata)
Question  Dimension     Example
When      time          10/07/2006
Where     location      Paris
What      is depicted   J. Chirac, Z. Zidane
Why       event         WC 2006
Who       photographer  Bertrand Guay, AFP
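
A small sketch of how these five dimensions could drive faceted filtering. The first item uses the example values from this slide; the second item, the `NewsItem` class and the `facet_filter` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    when: str    # time
    where: str   # location
    what: tuple  # who or what is depicted
    why: str     # event
    who: str     # photographer


items = [
    NewsItem("10/07/2006", "Paris", ("J. Chirac", "Z. Zidane"), "WC 2006", "Bertrand Guay, AFP"),
    # Hypothetical second item, made up for illustration.
    NewsItem("09/07/2006", "Berlin", ("Z. Zidane",), "WC 2006", "hypothetical photographer"),
]


def facet_filter(items, **facets):
    """Keep items whose fields equal (or, for tuple fields, contain) every requested value."""
    def matches(item):
        for name, wanted in facets.items():
            value = getattr(item, name)
            if isinstance(value, tuple):
                if wanted not in value:
                    return False
            elif value != wanted:
                return False
        return True

    return [item for item in items if matches(item)]


print(len(facet_filter(items, what="Z. Zidane")))
print(facet_filter(items, where="Paris", why="WC 2006")[0].who)
```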


Semantic Search of Multimedia News
Description                                    Number of RDF Triples
General Ontologies: NAR, DC, FOAF              7,336
Domain Specific Ontologies: football           104,358
Thesauri: newscodes                            34,903
DBpedia, Geonames                              53,468
AFP News Feed (June/July 2006)                 804,446
AFP Photos (June/July 2006)                    61,311
INA Broadcast Video (June/July 2006)           1,932
Total                                          1,067,754

Powered by ClioPatria 1.0 alpha 3


Provide New Dimensions for Exploring


Take Home Message
Concept detection challenges: machine learning and IR
Features can be extracted and used to describe multimedia content
Show generality of approach, dynamic nature of video (event)
Show that an ontology can help

Semantic metadata representation challenges: KR
Media and metadata can be passed around and among systems
Reuse what is there
Expose what you make

Interaction challenges: CHI
Users can be given much richer
and more flexible access to (semantically annotated) content
... but we are still figuring out how to do this!


Credits

Many people
Cees Snoek, Alex Hauptmann, Alan Smeaton,
Ivan Herman, Krishna Chandramouli, David Simonds,
Laurent Le Meur
Colleagues from the Interactive Information Access
Group, CWI Amsterdam

Datasets

http://www.slideshare.net/troncy

