A framework for visual search in broadcasting companies' multimedia archives
1. A framework for visual search in broadcast archives
Speakers:
Federico Maria Pandolfi, Davide Desirello
Rai Teche
2. Importance of proper organization and management of contents
Efficient search and retrieval methodologies are a must
Typical MAM (Media Asset Management) systems: text-based queries over textual information and metadata
Pros: reliability, robustness
Cons: metadata extraction is expensive, time-consuming and may not be available for every entry
No semantic or analytical representation of contents
No query-by-example or near-duplicate detection
Introduction
3. Rai's digital archives include (videos and images as of end 2015):
1,540,032 hours of video material
102,300 music sheets and documents
18,720 photos of scenic costumes
1,700 photos of set furniture
1,552 photos of Centro Elettronico Rai
The archive grows at a rate of approx. 130,000 hr/year, from both new and digitized legacy material
Only about 46% is annotated
Case study: numbers
4. Case study: possible IR scenarios
Archives
Correlating non-annotated material with similar pre-annotated contents (video-to-video search)
Retrieving a specific video/image in the multimedia archive from a clip, a single frame or a similar image (image/video-to-image/video search)
News
Linking an edited news report to its raw footage, and vice versa (video-to-video search)
Web
Finding a specific show from an image/clip (image/video-to-video search)
5. Content-Based Image Retrieval (CBIR) solutions are necessary
Representation of images by means of features automatically extracted from the contents themselves; no annotation needed
Large number of CBIR solutions available
Highly customizable to address specific needs (e.g. global/local/DCNN features, many efficient indexing and retrieval options, etc.)
The importance of CBIR
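As a toy illustration of what "features automatically extracted from the contents themselves" means, the sketch below computes a global color histogram, a deliberately simple stand-in for descriptors like CEDD; the synthetic pixel data and bin count are illustrative and not part of the Rai framework:

```python
def color_histogram(pixels, bins_per_channel=4):
    """Global descriptor: quantize each RGB channel and count occurrences."""
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]  # normalize to sum to 1

def l1_distance(h1, h2):
    """Simple L1 distance between two normalized histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Two synthetic "images": one mostly red, one mostly blue
red_img = [(200, 10, 10)] * 90 + [(10, 10, 200)] * 10
blue_img = [(10, 10, 200)] * 90 + [(200, 10, 10)] * 10
print(l1_distance(color_histogram(red_img), color_histogram(blue_img)))
```

No annotation is involved: the descriptor is derived purely from pixel values, which is what makes CBIR applicable to the non-annotated majority of the archive.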
6. Issue: many options for image search, few for video-to-video search
Issue: cutting-edge solutions offer solid absolute performance, but rely on complex systems and/or non-patent-free algorithms
An expensive and difficult-to-maintain platform is not ideal in an enterprise environment
Our approach
Solution: a new framework based on ready-to-use components compatible with Rai's enterprise infrastructure
Solution: a first approach based on a simpler, open-source solution
LIRe (Lucene Image Retrieval):
• CBIR platform with strong community support
• Easy to integrate with Apache Solr (widely used in Rai)
• Easy distributed search, index replication and scalability
7. Modules composing the framework (and their implementation):
Listener (custom files and folders manager)
Scene detector/key-frame extractor (FFmpeg)
Feature extractor (CEDD, LIRESOLR plugin)
Indexer (LIRESOLR plugin)
Retriever (LIRESOLR plugin)
Goal
Implement the workflow as independent logic blocks to:
Develop code in parallel
Easily replace blocks with better/more efficient solutions
Allow faster debugging and maintenance operations
Proposed workflow: modularity
8. Proposed workflow: Listener and indexing
The chain starts by indexing reference videos into the database; various entry points:
Shared folder
RESTful APIs
Listener:
A background process watching a shared folder (container) with files to be indexed
Manages the whole flow by issuing specific commands to the various components
Manages the folder structure
Triggered by a JSON token file containing the file list and parameters for the indexing process
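A minimal sketch of how such a JSON token might look and be validated. The field names (`files`, `scene_threshold`, `core`) are hypothetical, since the framework's actual token schema is not shown here:

```python
import json

# Hypothetical token-file content; the real schema used by the Rai
# listener is not public, so these fields are illustrative only.
token = json.dumps({
    "files": ["show_ep001.mxf", "show_ep002.mxf"],
    "scene_threshold": 0.4,
    "core": "ImageCore",
})

def parse_token(raw):
    """Validate a JSON token dropped in the watched folder."""
    data = json.loads(raw)
    if not data.get("files"):
        raise ValueError("token must list at least one file to index")
    return data

job = parse_token(token)
print(len(job["files"]))  # → 2 videos queued for indexing
```

Validating the token up front lets the listener reject malformed drops before it starts issuing commands to the downstream components.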
9. Framework targeted at image search on video files
Scene detection and key-frame extraction with FFmpeg
Generation of a CEDD feature descriptor (lightweight, low computational cost) for each key-frame
Indexing entries in Solr, using two cores:
ImageCore (ID, URI, descriptor)
MetaCore (other available metadata)
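FFmpeg's `select` filter can perform the scene-detection step described above. A sketch of building such a command line; the 0.4 scene-change threshold and output naming are illustrative defaults, not necessarily the framework's settings:

```python
def keyframe_cmd(video, out_dir, threshold=0.4):
    """Build an FFmpeg command that keeps one frame per detected scene cut.

    gt(scene, T) passes a frame only when FFmpeg's scene-change score
    exceeds T; -vsync vfr drops the timestamps of discarded frames.
    """
    return [
        "ffmpeg", "-i", video,
        "-vf", f"select='gt(scene,{threshold})'",
        "-vsync", "vfr",
        f"{out_dir}/keyframe_%04d.jpg",
    ]

cmd = keyframe_cmd("episode.mxf", "frames")
print(" ".join(cmd))
# subprocess.run(cmd, check=True) would execute it (requires FFmpeg on PATH)
```

Each extracted JPEG is then a candidate key-frame from which a CEDD descriptor can be computed and indexed.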
10. Simple retrieval algorithm:
1. Computation of the query image descriptor
2. A descriptor-specific distance is evaluated for each entry in the database
3. LIRe tweakable parameters:
Accuracy
Number of candidates
4. Results sorted by relevance, using distance as score
Proposed workflow: retrieval
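Step 2 above can be sketched with the Tanimoto distance, the measure LIRe commonly uses for CEDD descriptors. The 4-bin toy descriptors below are illustrative; real CEDD vectors have 144 bins:

```python
def tanimoto_distance(a, b):
    """Tanimoto distance: 0 for identical vectors, up to 1 for disjoint ones."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return 0.0 if denom == 0 else 1.0 - dot / denom

def rank(query, index, candidates=10):
    """Score every indexed descriptor and return the closest matches."""
    scored = sorted(index.items(), key=lambda kv: tanimoto_distance(query, kv[1]))
    return scored[:candidates]

# Toy index of key-frame descriptors (IDs and values are made up)
index = {
    "frame_a": [3, 0, 1, 0],
    "frame_b": [0, 2, 0, 3],
    "frame_c": [3, 0, 1, 1],
}
print(rank([3, 0, 1, 0], index)[0][0])  # → frame_a
```

An exact duplicate of the query scores a distance of 0 and ranks first, which matches the slide's observation that indexed query shots are retrieved with near-perfect precision at rank 1.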
11. How to test the framework?
Lack of copyright-free datasets and evaluation frameworks targeting our specific use-case (to use as a reference)
Performing image search on the whole of Rai's archive is impossible; datasets selected (not annotated):
TG Leonardo (2200 episodes, approx. 360 hrs): thematic, science-focused newscast, suitable for news/reportage and raw-footage retrieval
Medita (2000 episodes, approx. 2000 hrs): educational show, suitable for testing pure image search and tagging-aid capabilities
Query images extracted from indexed videos using different techniques:
FFMpeg shot detection
Rai’s Shotfinder
Preliminary evaluation
12. Preliminary evaluation
The best match is not always found among the very first results
CEDD is a very compact descriptor: images with similar colours and textures may have very similar descriptors
Increasing the accuracy parameter increases retrieval time, with only slightly better results
Difficult to evaluate precision and recall for query images different from the indexed images (datasets not annotated yet)
If the query shot is indexed: P@1 ≈ 1; otherwise the distance increases substantially
Might be good enough for the raw footage/final edit matching use-case
13. Not yet able to find instances of the same objects across different videos and under different conditions (e.g. different video quality, framing, etc.); no semantic search
Likely a limitation of CEDD and, more generally, of global descriptors
Compact global descriptors may be good for specific tasks, but a more semantic approach is required
The quantitative tests presented are not yet mature
Building a proper dataset takes time, and our framework is still at an early stage of development
We plan to build our own annotated dataset using the company's archive material
Conclusions
14. Future work
Creation of a new
annotated dataset
containing raw and
edited material
Evaluation of better key-frame extraction and shot detection
algorithms:
Reduce the number of extracted key-frames
Weight key-frames according to their relevance within the
related sequence
Improve retrieval performance: decrease index size, reduce disk occupation and speed up search times
Evaluation of more sophisticated feature extraction
algorithms (local features, BoVW, DCNN feature vectors, ...)
In some cases a semantic search (based on image
contents) might be more useful
15. Thank you for watching
F. M. Pandolfi, D. Desirello
Rai Teche