Exploring Video Hyperlinking in Broadcast Media

Maria Eskevich, Quoc-Minh Bui, Hoang-An Le,
Benoit Huet
Multimedia Department
EURECOM
Sophia-Antipolis, France

Motivation
 CURRENTLY: Constantly growing quantity of
multimedia content produced by both
professionals and individual users.
 NEED: navigation systems that allow access to this
data on different levels of granularity in order to
contribute to further discovery of a topic of interest
for the user or to facilitate individual user browsing
within a collection.
10/30/15 SLAM Workshop @ ACM MM 2015, Brisbane, Australia

Task and Challenges
 We envisage users wanting to create their own path
through the multimedia collection, guided by their tasks and
their level of awareness/knowledge of the collection.
 How do we create a hyperlinked collection that allows
each individual user to follow their own path of interest
within it?
 Why is it challenging?
 Lack of interpretability of the content:
Videos carry only an overall textual description at the video level, so:
The retrieval process relies on text features, while the rich multimodality of
videos is not exploited;
A specific fragment cannot be retrieved on its own; the user has to
watch the entire video to find it.
 Archives are not static:
Need for dynamic creation of links

Insights in Hyperlinking
 Hyperlinking
Creating “links” between media
 Video Hyperlinking
video to video
video fragment to video fragment

Related experiments: Search and
Hyperlinking @ MediaEval/TRECVid
 Different techniques based on:
 Video segmentation method:
 Fixed-length units (same length, sentences, speech segments) with
further adjustment to match full sentences, using speech segment
boundaries and pauses. Some approaches build a probabilistic framework to
model the importance of words and refine segment boundaries
accordingly;
 Meaningful segments, based on topics derived from transcripts:
classification trees to define the starting and ending times of these
segments; lexical cohesion within segment.
 Features used for retrieval:
 Text similarity (vector-based models and TF-IDF weightings);
 Named entities or synonyms;
 Visual information: visual concepts or SURF and SIFT features,
detected at the shot level.
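
As a rough illustration of the text-similarity feature listed above, the sketch below ranks candidate target segments against an anchor segment using TF-IDF weights and cosine similarity. It is a minimal sketch, not the setup of any MediaEval/TRECVid participant: the whitespace tokenisation, the smoothed IDF formula, and the function names (tf_idf_vectors, cosine, rank_targets) are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(segments):
    """Return one sparse TF-IDF vector (dict term -> weight) per segment.

    Tokenisation is a plain lowercase whitespace split (an assumption).
    """
    tokenised = [seg.lower().split() for seg in segments]
    n = len(tokenised)
    # Document frequency: in how many segments each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF, so terms present in every segment keep a small weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_targets(anchor_index, segments):
    """Return indices of the other segments, most similar to the anchor first."""
    vectors = tf_idf_vectors(segments)
    scores = [
        (cosine(vectors[anchor_index], v), i)
        for i, v in enumerate(vectors)
        if i != anchor_index
    ]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]
```

Calling rank_targets(0, segments) on a list of transcript segments returns the indices of candidate link targets for the first segment, ordered by decreasing similarity.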

Search and Hyperlinking @
MediaEval/TRECVid
 Our solution:
 Scene segmentation technique based on visual
and temporal coherence of the video segments
that constitute the scenes of the video.
 Experiment with hybrid segmentation:
visual, topic and temporal coherence
 Include visual analysis in the search process, using
visual features to improve the ranking of
hyperlinking results.

System overview
 Broadcast media: BBC content
 Videos: 2323 items / 1697 hours
 Manually transcribed subtitles
 Metadata: title, cast, description, broadcast time
 Shot segmentation, keyframe extraction
 Visual content analysis:
 Scene segmentation
 Concept detection on shot level
 Lucene/Solr: indexing/retrieval on shot level
 Media database: 76 214 fragments
 Webservice (HTML5/AJAX/PHP)
 User interface
 Result list: media fragments level

151 Visual Concepts (TRECVid 2012)
 3_Or_More_People
 Actor
 Adult
 Adult_Female_Human
 Adult_Male_Human
 Airplane
 Airplane_Flying
 Airport_Or_Airfield
 Anchorperson
 Animal
 Animation_Cartoon
 Armed_Person
 Athlete
 Baby
 Baseball
 Basketball
 Beach
 Bicycles
 Bicycling
 Birds
 Boat_Ship
 Boy
• Building
• Bus
• Car
• Car_Racing
• Cats
• Cattle
• Chair
• Charts
• Child
• Church
• City
• Cityscape
• Classroom
• Clouds
• Construction_Vehicles
• Court
• Crowd
• Dancing
• Daytime_Outdoor
• Demonstration_Or_Protest
• Desert
• Dogs
• Emergency_Vehicles
• Explosion_Fire
• Face
• Factory
• Female-Human-Face-Closeup
• Female_Anchor
• Female_Human_Face
• Female_Person
• Female_Reporter
• Fields
• Flags
• Flowers
• Football
• Forest
• Girl
• Golf
• Graphic
• Greeting
• Ground_Combat
• Gun
• Handshaking
• Harbors
• Helicopter_Hovering
• Helicopters
• Highway
• Hill
• Hockey
• Horse
• Hospital
• Human_Young_Adult
• Indoor
• Insect
• Kitchen
• Laboratory
• Landscape
• Machine_Guns
• Male-Human-Face-Closeup
• Male_Anchor
• Male_Human_Face
• Male_Person
• Male_Reporter
• Man_Wearing_A_Suit
• Maps
• Meeting
• …

Scene segmentation
 Scene: group of shots based on content similarity
and temporal consistency among shots;
 Content similarity: visual similarity between HSV
histograms extracted from the keyframes of
different shots;
 Grouping is performed using two extensions of the
Scene Transition Graph (STG) algorithm:
 the first reduces the computational cost of STG-based shot grouping
by considering shot linking transitivity;
 the second builds on the first to construct a probabilistic framework that
alleviates the need for manual STG parameter selection.
[Apostolidis et al. “Automatic fine-grained hyperlinking of videos within a closed
collection using scene segmentation.” ACM MM 2014]
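
To make the idea of grouping shots by content similarity and temporal consistency concrete, here is a deliberately simplified sketch. It is not the STG algorithm or either of its extensions from Apostolidis et al.: it only links each shot to the furthest similar shot inside a fixed temporal window and closes a scene when no link reaches past the current shot. The histogram intersection measure, the threshold of 0.7, the window of 3 shots, and the function names are assumptions for illustration; keyframe HSV histograms are assumed to be precomputed and L1-normalised.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two L1-normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def group_shots(histograms, sim_threshold=0.7, window=3):
    """Assign a scene index to each shot, in temporal order.

    A shot is linked to the furthest shot within `window` positions ahead
    whose keyframe histogram is similar enough. Every shot lying between
    two linked shots is placed in the same scene.
    """
    n = len(histograms)
    scenes = []
    scene_id = -1
    scene_end = -1  # index of the last shot known to belong to the current scene
    for i in range(n):
        if i > scene_end:
            # No earlier link reaches this shot: start a new scene here.
            scene_id += 1
            scene_end = i
        # Look ahead, furthest candidate first, for a similar shot.
        for j in range(min(n - 1, i + window), i, -1):
            if histogram_intersection(histograms[i], histograms[j]) >= sim_threshold:
                scene_end = max(scene_end, j)
                break
        scenes.append(scene_id)
    return scenes
```

With alternating shots (for example a dialogue cutting between two camera angles), the look-ahead links keep both angles in one scene, which is the behaviour the temporal-consistency criterion is meant to capture.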

Handling Visual Concepts in Solr
 Schema = structure of a document, using fields of
different types
<doc>
<field name="id">20080401_013000_bbcfour_legends_marty_feldman_six_degrees_of#t=399,402</field>
<field name="begin">00:06:39.644</field>
<field name="end">00:06:42.285</field>
<field name="videoId">20080401_013000_bbcfour_legends_marty_feldman_six_degrees_of</field>
<field name="subtitle">'It was very, very successful.'</field>
<field name="Actor">0.143</field>
<field name="Adult">0.239</field>
<field name="Animal">0.0572</field>
</doc>
 Query
http://localhost:8983/solr/collection_mediaEval/select?q=text:(Children
out on poetry trip Exploration of poetry by school children Poem writing)
Animal:[0.2 TO 1] Building:[0.2 TO 1]
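
The query shown on this slide combines a text clause with range clauses on visual concept scores. The sketch below shows one way such a URL could be assembled and percent-encoded in Python; the function name build_hyperlinking_query and its parameters are hypothetical, and only the field names (text, Animal, Building) and the range syntax come from the slide. Whether the range clauses are required or merely boost matching documents depends on the Solr default operator, which the slide does not state.

```python
from urllib.parse import urlencode


def build_hyperlinking_query(base_url, text, concept_ranges):
    """Build a Solr select URL: one text clause plus one range clause per concept.

    concept_ranges maps a concept field name to a (low, high) score interval,
    e.g. {"Animal": (0.2, 1)}. All names here are illustrative assumptions.
    """
    clauses = [f"text:({text})"]
    for concept, (low, high) in concept_ranges.items():
        # Lucene range syntax with inclusive bounds, as in the slide's query.
        clauses.append(f"{concept}:[{low} TO {high}]")
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return f"{base_url}?{urlencode({'q': ' '.join(clauses)})}"
```

For instance, build_hyperlinking_query("http://localhost:8983/solr/collection_mediaEval/select", "Poem writing", {"Animal": (0.2, 1), "Building": (0.2, 1)}) yields an encoded form of a shortened version of the query above.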

Results (S&H 2014)

Results discussion
 Text (Audio) performs best on the 2014
MediaEval Hyperlinking task
 Using visual features isn’t straightforward
 The ‘MoreLikeThis’ approach is outperformed
by the text-only search,
possibly due to query formulation differences
(sentences vs. keywords)
 Visual Scenes outperform other fragmentation
levels (Topic, Sentences)
[Safadi et al. “When textual and visual information join forces
for multimedia retrieval”, ICMR 2014]

DEMO
 Hyper Video Browser
Search/Browse Hyperlinking

Conclusion and Future work
 We proposed and evaluated an approach to include
visual properties in the search of video segments for
hyperlinking.
 Experimental results show that:
 Visual properties provide meaningful cues for segmenting the video
 Mapping text-based queries to visual concepts is not easy
 Automatic selection of relevant concepts is required (human
intervention is impractical and does not necessarily lead to perfect
results)
 Operating at the keyword level rather than full text offers
improvements
 Current/Future work: incorporate query semantics
when identifying key visual semantic concepts
 based on named entity recognition approaches and keyword/visual
concepts co-occurrences
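
The keyword/visual concept co-occurrence idea mentioned above could be prototyped along the following lines. This is only a sketch of one possible reading, not the method the authors implemented: it counts how often each subtitle keyword appears in shots where a concept detector scores at or above a threshold, then picks the concepts that co-occur most with the query keywords. The function names, the whitespace tokenisation, and the 0.2 threshold (borrowed from the lower bound of the Solr range query) are assumptions.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(shots, threshold=0.2):
    """Count keyword/concept co-occurrences over shots.

    shots is an iterable of (subtitle_text, {concept_name: score}) pairs.
    A concept counts as present in a shot when its score >= threshold.
    """
    counts = defaultdict(Counter)
    for subtitle, concept_scores in shots:
        present = [c for c, s in concept_scores.items() if s >= threshold]
        for keyword in set(subtitle.lower().split()):
            counts[keyword].update(present)
    return counts


def select_concepts(query, counts, top_k=2):
    """Return the top_k concepts that co-occur most with the query keywords."""
    totals = Counter()
    for keyword in set(query.lower().split()):
        totals.update(counts.get(keyword, {}))
    # Sort by decreasing count, breaking ties alphabetically for stability.
    ranked = sorted(totals.items(), key=lambda p: (-p[1], p[0]))
    return [concept for concept, _ in ranked[:top_k]]
```

The selected concepts could then be added as range clauses to a text query, replacing manual concept selection.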

References
 E. Apostolidis, V. Mezaris, M. Sahuguet, B. Huet, B. Cervenkova, D. Stein, S. Eickeler, J.
L. Redondo Garcia, R. Troncy, and L. Pikora. “Automatic fine-grained hyperlinking of videos
within a closed collection using scene segmentation”. In ACM MM 2014, 22nd ACM
International Conference on Multimedia, Orlando, Florida, USA, November 2014.
 H. Le, Q. Bui, B. Huet, B. Cervenkova, J. Bouchner, E. Apostolidis, F. Markatopoulou, A.
Pournaras, V. Mezaris, D. Stein, S. Eickeler, and M. Stadtschnitzer. “LinkedTV at
MediaEval 2014 Search and Hyperlinking Task”. In Proceedings of MediaEval 2014
Workshop, 2014.
 R. J. F. Ordelman, M. Eskevich, R. Aly, B. Huet, and G. J. F. Jones. “Defining and
evaluating video hyperlinking for navigating multimedia archives”. In Proceedings of the
24th International Conference on World Wide Web Companion, WWW 2015, Florence,
Italy, May 18-22, 2015 - Companion Volume, pages 727-732, 2015.
 M. Sahuguet, B. Huet, B. Cervenkova, E. Apostolidis, V. Mezaris, D. Stein, S. Eickeler, J.
L. Redondo Garcia, and L. Pikora. “LinkedTV at MediaEval 2013 Search and Hyperlinking
Task”. In MediaEval 2013 Workshop, Barcelona, Spain, October 2013.
 B. Safadi, M. Sahuguet, and B. Huet. “When textual and visual information join forces for
multimedia retrieval”. In Proceedings of International Conference on Multimedia
Retrieval (ICMR '14). ACM, New York, NY, USA, page 265 (8 pages), 2014.

Questions?
Thank you for your
attention!
