Interactive Video Search and Browsing Systems

                    Marco Bertini, Alberto Del Bimbo, Andrea Ferracani, Daniele Pezzatini
                               Università degli Studi di Firenze - MICC, Italy
                             {bertini,delbimbo,ferracani,pezzatini}@dsi.unifi.it


                         Abstract

   In this paper we present two interactive systems for video search and browsing; one is a web application based on the Rich Internet Application paradigm, designed to obtain the levels of responsiveness and interactivity typical of a desktop application, while the other exploits multi-touch devices to implement a multi-user collaborative application. Both systems use the same ontology-based video search engine, which is capable of expanding user queries through ontology reasoning, and let users search for specific video segments that contain a semantic concept, or browse the content of video collections when it is too difficult to express a specific query.

1. Introduction

   Video search engines are the result of advancements in many different research areas: audio-visual feature extraction and description, machine learning techniques, as well as visualization, interaction and user interface design. Automatic video annotation systems are based on large sets of concept classifiers [12], typically built with supervised machine learning techniques such as SVMs; the use of these supervised learning approaches requires tools to easily create ground-truth annotations of videos, indicating the objects and events of interest, in order to train appropriate concept classifiers. Current video search engines are based on lexicons of semantic concepts and perform keyword-based queries [10, 11]. These systems are generally desktop applications or have simple web interfaces that show the results of a query as a ranked list of keyframes [7, 12], similarly to the interfaces made common by web search engines. These systems do not let users perform composite queries that can include temporal relations between concepts, and do not allow searching for concepts that are not in the lexicon. In addition, desktop applications require installation on the end-user computer and cannot be used in a distributed environment, while the web-based tools allow only limited user interaction.

   Two surveys, formulated following the guidelines of the IEEE 830-1984 standard, were conducted within the EU VidiVideo1 and IM3I2 projects to gather user requirements for a video search engine. Overall, more than 50 qualified users from 10 different countries participated; roughly half of the users were part of the scientific and cultural heritage sector (e.g. academic and heritage audio-visual archives) and half came from the broadcasting, entertainment and video archive industry. The survey campaigns showed a strong request for systems with a web-based interface: around 75% of the interviewees considered this “mandatory” and around 20% considered it “desirable”. Users also requested the possibility to formulate complex composite queries and to expand queries through ontology reasoning. Since many archives use controlled lexicons and ontologies in their annotation practices, they also requested an interface able to show concepts and their relations.

   As for multi-touch interfaces, they have been popularized by personal devices like smartphones and tablet computers, and are mostly designed for single-user interaction. We argue, however, that multi-touch and multi-user interfaces may be effective in certain production environments [8] like TV news, where people with different roles may have to collaborate in order to produce a new video, thanks to the possibility of achieving a common view with mutual monitoring of each other's activities, and verbal and gestural utterances [13]; in these cases a web-based interface that forces people to work separately on different screens may not be the most efficient tool.

   Regarding video annotation, manual annotation tools have become popular in several mainstream video sharing sites such as YouTube and Viddler, because they extend to videos the idea of image tagging made popular by Flickr and Facebook. This information is managed in a collaborative web 2.0 style, since annotations can be inserted not only by content owners (video professionals, archivists, etc.), but also by end users, providing an enhanced knowledge representation of multimedia content.

  1 http://www.vidivideo.info
  2 http://www.im3i.eu
Figure 1. Automatically created ontology structure: view of the relations between some concepts related to “people”.
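The kind of relation structure shown in Fig. 1 can be pictured as a small set of triples. The following sketch is illustrative only: the concept names and the `related` helper are assumptions, not the paper's actual ontology contents.

```python
# Hypothetical fragment of an ontology like the one in Fig. 1,
# stored as (subject, relation, object) triples. The relation
# names follow the paper's "is a" / "is part of" / "has part" set;
# the concepts themselves are made up for illustration.
TRIPLES = [
    ("anchorman", "is a", "person"),
    ("reporter", "is a", "person"),
    ("crowd", "has part", "person"),
    ("person", "has part", "face"),
]

def related(concept, relation):
    """Concepts linked to `concept` by `relation` (subjects only)."""
    return sorted(s for s, r, o in TRIPLES if r == relation and o == concept)

print(related("person", "is a"))  # concepts that are a kind of person
```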

    In this paper we present an integrated system composed of: i) a video search engine that allows semantic retrieval by content for different domains (possibly modelled with different ontologies), with query expansion and ontology reasoning; ii) web-based interfaces for interactive query composition, archive browsing, annotation and visualization; iii) a multi-touch tangible interface featuring a collaborative natural interaction application.

2. The System

The search engine  The Orione search engine permits different query modalities (free text, natural language, graphical composition of concepts using Boolean and temporal relations, and query by visual example) and visualizations, resulting in an advanced tool for retrieval and exploration of video archives for both technical and non-technical users. It uses an ontology that has been created semi-automatically from a flat lexicon and a basic light-weight ontology structure, using WordNet to create concept relations (is a, is part of and has part), as shown in a fragment in Fig. 1. The ontology is modelled following the Dynamic Pictorially Enriched Ontology model [3], which includes both concepts and visual concept prototypes. These prototypes represent the different visual modalities in which a concept can manifest; they can be selected by the users to perform query by example, using MPEG-7 descriptors (e.g. Color Layout and Edge Histogram) or other domain-specific visual descriptors. Concepts, concept relations, video annotations and visual concept prototypes are defined using the standard Web Ontology Language (OWL), so that the ontology can be easily reused and shared. The queries created in each interface are translated by the search engine into SPARQL, the W3C standard ontology query language. The system extends the queries by adding synonyms and concept specializations through ontology reasoning and the use of WordNet. As an example, consider the query “Find shots with vehicles”: concept specialization expansion, through inference over the ontology structure, retrieves not only the shots annotated with vehicle, but also those annotated with the concept's specializations (e.g. trucks, cars, etc.). In particular, WordNet query expansion, using synonyms, is required for free-text queries, since it is not desirable to force the user to formulate a query selecting only terms from a predefined lexicon.

    Automatic video annotations are provided by a system based on the bag-of-words approach, which exploits SIFT, SURF and MSER visual features and the Pyramid Match Kernel [5]; these annotations can be complemented by manual annotations added through a web-based interface.

The web-based user interface  The web-based Sirio search system3, based on the Rich Internet Application (RIA) paradigm, does not require any software installation

  3 http://shrek.micc.unifi.it/im3i/
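The “Find shots with vehicles” expansion described above can be sketched in a few lines. This is a minimal illustration, not the Orione implementation: the concept hierarchy, synonym map and function names are assumptions.

```python
# Sketch of ontology-based query expansion: specializations come
# from "is a" relations, synonyms from a WordNet-like map.
# All concept data below is illustrative.

# Fragment of an ontology: child -> parent ("is a") relations.
IS_A = {
    "truck": "vehicle",
    "car": "vehicle",
    "bus": "vehicle",
}

# WordNet-style synonyms, used for free-text queries.
SYNONYMS = {
    "vehicle": {"conveyance"},
    "car": {"auto", "automobile"},
}

def specializations(concept):
    """All concepts that are (transitively) a kind of `concept`."""
    found = set()
    frontier = {concept}
    while frontier:
        frontier = {c for c, p in IS_A.items() if p in frontier} - found
        found |= frontier
    return found

def expand_query(concept):
    """Expand a query concept with its specializations and synonyms."""
    expanded = {concept} | specializations(concept)
    for c in set(expanded):
        expanded |= SYNONYMS.get(c, set())
    return expanded

print(sorted(expand_query("vehicle")))
```

A real engine would express the same closure as a SPARQL query over the OWL subclass relation rather than walking an in-memory dictionary.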
Figure 3. The Sirio web-based user interfaces: left) simple user interface; right) advanced user interface with ontology graph and geo tagging.




Figure 2. Manual video annotation system: adding geographical annotations to a visual concept.

Figure 4. Browsing videos: select concepts in the tag cloud, then browse the video archive or start a search from a concept of the ontology.
and is extremely responsive. It is complemented by a system for the manual annotation of videos (Fig. 2), developed with the aim of creating manual annotations collaboratively, e.g. to provide geographical annotations and ground-truth annotations that can be used for correcting and integrating automatic annotations, or for training and evaluating automatic video annotation systems. Sirio is composed of two different interfaces, shown in Fig. 3: a simple search interface with only a free-text field for Google-like searches, and an advanced search interface with a GUI to build composite queries that may include Boolean and temporal operators, metadata (like programme broadcast information and geo tags) and visual examples. The GUI also allows users to inspect and use a local view of the ontology graph when building queries, to better understand how one concept is related to the others, thus suggesting possible changes to the composition of the query.

    For each query result, the first frame of the video clip is shown. These frames are obtained from the video streaming server and are shown within a small video player. Users can then play the video sequence and, if interested, zoom in on each result, displaying it in a larger player that shows more details of the video metadata and allows better video inspection. The extended video player also allows searching for visually similar video shots. Furthermore, another interface (Andromeda), integrated with Sirio, allows browsing video archives by navigating through the relations between the concepts of the ontology, providing direct access to the instances of these concepts; this functionality is useful when a user does not have a clear idea of the query that he wants to make.

    This interface is based on some graphical elements typical of web 2.0 interfaces, such as the tag cloud. The user starts by selecting concepts from a “tag cloud”, then navigates the ontology that describes the video domain, shown as a graph with different types of relations, and inspects the video clips that contain the instances of the concepts used as annotations (Fig. 4). Users can select a concept from the ontology graph to build a query in the advanced search interface at any moment.
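A composite query with Boolean and temporal operators, as built in the advanced interface, could be evaluated along these lines. The data model, segment format and operator names here are illustrative assumptions, not the actual Orione query language.

```python
# Sketch: evaluating "concept A AND concept B, with A before B"
# over shot annotations. Each video maps to a list of
# (concept, start_sec, end_sec) segments; all data is made up.
ANNOTATIONS = {
    "news_01": [("anchorman", 0, 12), ("map", 15, 30)],
    "news_02": [("map", 0, 10), ("anchorman", 40, 55)],
}

def has(video, concept):
    return any(c == concept for c, _, _ in ANNOTATIONS[video])

def before(video, first, second):
    """True if some `first` segment ends before a `second` one starts."""
    ends = [e for c, s, e in ANNOTATIONS[video] if c == first]
    starts = [s for c, s, e in ANNOTATIONS[video] if c == second]
    return any(e < s for e in ends for s in starts)

def query(videos, concept_a, concept_b):
    """Videos containing both concepts, with concept_a before concept_b."""
    return [v for v in videos
            if has(v, concept_a) and has(v, concept_b)
            and before(v, concept_a, concept_b)]

print(query(ANNOTATIONS, "anchorman", "map"))  # only news_01 matches
```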
The tangible user interface  MediaPick is a system that allows semantic search and organization of multimedia content via multi-touch interaction. It has an advanced user-centered interaction design, developed following specific usability principles for search activities [1], which allows users to collaborate on a tabletop [9] about specific topics that can be explored, thanks to the use of ontologies, from general to specific concepts. Users can browse the ontology structure in order to select concepts and start the video retrieval process. Afterwards they can inspect the results returned by the Orione video search engine and organize them according to their specific purposes. Interviews with potential end-users have been conducted with the archivists of RAI, the Italian public broadcaster, in order to study their workflow and collect suggestions and feedback; at present RAI journalists and archivists can search the corporate digital libraries through a web-based system that provides a simple keyword-based search on textual descriptions of the archived videos. Sometimes these descriptions are not very detailed or not very relevant to the video content, making the documents difficult to find. The cognitive load required for an effective use of the system often makes the journalists delegate their search activities to the archivists, who may not be familiar with the specific topic and therefore can hardly choose the right search keywords. The goal of the MediaPick design is to provide the broadcast editorial staff with an intuitive and collaborative interface to search, visualize and organize video results archived in huge digital libraries, with a natural interaction approach.

    The user interface adopts some common visualization principles derived from the discipline of Information Visualization [4] and is equipped with a set of interaction functionalities designed to improve the usability of the system for the end-users. The GUI consists of a concepts view (Fig. 5, top), to select one or more keywords from an ontology structure and use them to query the digital library, and a results view (Fig. 5, bottom), which shows the videos returned from the database, so that the user can navigate and organize the extracted content.

    The concepts view consists of two different interactive elements: the ontology graph, to explore the concepts and their relations, and the controller module, to save the selected concepts and switch to the results view. The user chooses the concepts to query from the ontology graph. Each node of the graph consists of a concept and a set of relations. The concept can be selected and then saved into the controller, while a relation can be triggered to list the related concepts, which can be expanded and selected again; then the cycle repeats. The related concepts are only shown when a precise relation is triggered, in order to minimize the number of visual elements present at the same time in the interface.

   Figure 5. MediaPick system: top) concepts view: a) the ontology graph; b) controller module; bottom) results view: a) video players and user's result picks; b) controller module and results list.

    After saving all the desired concepts into the controller, the user is able to change the state of the interface and go to the results view. The results view also shows the controller, to let the user select the archived concepts and launch the query against the database. The returned data are then displayed within a horizontal list (Fig. 5, bottom), in a new visual component placed at the top of the screen, which contains the returned videos and their related concepts. Each video element has three different states: idle, playback and information. In the idle state the video is represented by a keyframe and a label visualizing the concept used for the query. During the playback state the video starts playing from the frame in which the selected concept was annotated. A longer touch of the video element activates the information state, which shows a panel with some metadata (related concepts, quality, duration, etc.) over the video.

    At the bottom of the results list are all the concepts related to the video results. By selecting one or more of these concepts, the returned video clips are filtered in order to improve the information retrieval process. The user can select any video element from the results list and drag it outside. This action can be repeated for other videos, returned by the same or other queries. Videos placed out of the list can be moved along the screen, resized or played. A group of videos can be created by collecting two or more video elements in order to define a subset of results. Each group can be manipulated as a single element through a contextual menu: it can be expanded to show a list of its elements or released in order to ungroup the videos.
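The three video-element states described above (idle, playback, information) can be sketched as a small state machine. The long-press threshold and transition rules are assumptions for illustration, not MediaPick's actual values.

```python
# Sketch of a MediaPick-style video element with the three states
# described in the text: idle -> playback on a tap, any state ->
# information on a long touch. The 0.8 s threshold is assumed.
LONG_PRESS_SECONDS = 0.8

class VideoElement:
    def __init__(self, keyframe, concept, annotation_frame):
        self.state = "idle"                       # keyframe + concept label
        self.keyframe = keyframe
        self.concept = concept
        self.annotation_frame = annotation_frame  # playback starts here

    def touch(self, duration):
        """React to a touch lasting `duration` seconds."""
        if duration >= LONG_PRESS_SECONDS:
            self.state = "information"   # metadata panel over the video
        elif self.state == "idle":
            self.state = "playback"      # play from the annotated frame
        else:
            self.state = "idle"          # tap again to stop

v = VideoElement("frame0.jpg", "anchorman", annotation_frame=120)
v.touch(0.1)   # short tap: idle -> playback
v.touch(1.2)   # long touch: playback -> information
print(v.state)
```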
   Figure 6. Multitouch interface system architecture (multi-touch screen, multi-touch and generic input managers with XML configurations, server socket, gesture framework, core logic, streaming media server, semantic video search engine).

3. Architecture

    The system backend and the search engine are currently based on open source tools (i.e. Apache Tomcat and the Red5 video streaming server) or freely available commercial tools (Adobe Media Server has a free developer edition). Videos are streamed using the RTMP video streaming protocol. The search engine is developed in Java and supports multiple ontologies and ontology reasoning services. The ontology structure and concept instance serialization have been designed so that inference can be executed simultaneously on multiple ontologies without slowing retrieval; this design avoids the need to select a specific ontology when creating a query with the Google-like interface. The engine has also been designed to fit into a service-oriented architecture, so that it can be incorporated into customizable search systems, other than Sirio and MediaPick, that are developed within the IM3I and euTV projects. Audio-visual concepts are automatically annotated using either the IM3I or euTV automatic annotation engines. The search results are produced in RSS 2.0 XML format, with paging, so that they can be used as feeds by any RSS reader tool, and it is possible to subscribe to a specific search. Both the web-based interface and the multi-touch interface have been developed in Flex+Flash, according to the Rich Internet Application paradigm.

Multitouch  MediaPick exploits a multi-touch technology chosen among various approaches experimented with in our lab since 2004 [2]. Our solution uses an infrared LED array as an overlay built on top of a standard LCD screen (capable of full-HD resolution). The multi-touch overlay can detect fingers and objects on its surface and sends information about touches using the TUIO [6] protocol at a rate of 50 packets per second. The MediaPick architecture is composed of an input manager layer that communicates through the server socket with the gesture framework and core logic. The latter is responsible for the connection to the web services and media server, as well as the rendering of the GUI elements on the screen (Fig. 6). The input management module is driven by the TUIO dispatcher: this component is in charge of receiving and dispatching the TUIO messages sent by the multi-touch overlay to the gesture framework through the server socket (Fig. 6). This module manages the events sent by the input manager, translating them into commands for the gesture framework and core logic.

    The logic behind the multi-touch interface needs a dictionary of gestures which users are allowed to perform. Each digital object on the surface can be seen as an active touchable area; for each active area a set of gestures is defined for the interaction, so it is useful to link each touch with the active area in which it is enclosed. For this reason each active area has its own set of touches and allows gesture recognition through the interpretation of their associated behavior. All the user interface actions mentioned above are triggered by the natural gestures shown in Tab. 1.

  Gesture               Actions
  Single tap            Select concept; trigger the controller module
                        switch; select video group contextual menu
                        options; play video element
  Long pressure touch   Trigger video information state; show video
                        group contextual menu
  Drag                  Move video element
  Two-fingers drag      Resize video; group videos

   Table 1. Gesture/Actions association for the multi-touch user interface.

4. Usability test

    The video search engine (Sirio+Orione) has been tested in a set of field trials to evaluate the usability of the system. A group of 19 professionals from broadcasting, the media industry, cultural heritage institutions and video archives in Italy, The Netherlands, Hungary and Germany tested the system on-line (running on the MICC servers), performing a set of pre-defined queries and activities, and interacting with the system. The methodology used follows the practices defined in the ISO 9241 standard, and gathered: i) observational notes taken during test sessions by monitors, ii) verbal feedback noted by test monitors, and iii) an online survey completed after the tests by all the users. The observational notes and verbal feedback of the users have been analyzed to understand the most critical parts of the system and, during the development of the IM3I project, have been used to redesign the functionalities of the
Figure 7. Overview of usability tests: overall usability of the system, usability of the combination of
   search modalities.

system. Fig. 7 summarizes two results of the tests. The overall experience is very positive and the system proved to be easy to use, despite the objective difficulty of interacting with a complex system for which the testers received only very limited training. Users appreciated the combination of different interfaces. The type of interaction that proved to be most suitable for the majority of the users is the advanced interface, because of its many functionalities.

5. Conclusions

   In this paper we have presented two semantic video search systems based on web and multi-touch multi-user interfaces, developed within three EU-funded research projects and tested by a large number of professionals in real-world field trials. Our future work will deal with further development of the interfaces, especially considering the new HTML5 technologies, extensive testing of the tangible user interface and a thorough comparison of the two systems.

Acknowledgements  The authors thank Giuseppe Becchi for his work on software development. This work was partially supported by the projects: EU IST VidiVideo (www.vidivideo.info - contract FP6-045547), EU IST IM3I (http://www.im3i.eu/ - contract FP7-222267) and EU IST euTV (http://www.eutvweb.eu/ - contract FP7-226248).

References

 [1] R. S. Amant and C. G. Healey. Usability guidelines for interactive search in direct manipulation systems. In Proc. of the International Joint Conference on Artificial Intelligence, volume 2, pages 1179–1184, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc.
 [2] S. Baraldi, A. Del Bimbo, and L. Landucci. Natural interaction on tabletops. Multimedia Tools and Applications (MTAP), 38:385–405, July 2008.
 [3] M. Bertini, A. Del Bimbo, G. Serra, C. Torniai, R. Cucchiara, C. Grana, and R. Vezzani. Dynamic pictorially enriched ontologies for digital video libraries. IEEE MultiMedia, 16(2):42–51, Apr/Jun 2009.
 [4] S. K. Card, J. D. Mackinlay, and B. Shneiderman, editors. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann Publishers Inc., 1999.
 [5] K. Grauman and T. Darrell. The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research (JMLR), 8:725–760, 2007.
 [6] M. Kaltenbrunner, T. Bovermann, R. Bencina, and E. Costanza. TUIO: a protocol for table-top tangible user interfaces. In Proc. of International Workshop on Gesture in Human-Computer Interaction and Simulation, 2005.
 [7] A. Natsev, J. Smith, J. Tešić, L. Xie, R. Yan, W. Jiang, and M. Merler. IBM Research TRECVID-2008 video retrieval system. In Proc. of TRECVID Workshop, 2008.
 [8] C. Shen. Multi-user interface and interactions on direct-touch horizontal surfaces: collaborative tabletop research at MERL. In Proc. of IEEE International Workshop on Horizontal Interactive Human-Computer Systems, 2006.
 [9] C. Shen. From clicks to touches: Enabling face-to-face shared social interface on multi-touch tabletops. In D. Schuler, editor, Online Communities and Social Computing, volume 4564 of Lecture Notes in Computer Science, pages 169–175. Springer Berlin / Heidelberg, 2007.
[10] A. Smeaton, P. Over, and W. Kraaij. High-level feature detection from video in TRECVid: a 5-year retrospective of achievements. Multimedia Content Analysis, Theory and Applications, pages 151–174, 2009.
[11] C. Snoek, K. van de Sande, O. de Rooij, B. Huurnink, J. van Gemert, J. Uijlings, J. He, X. Li, I. Everts, V. Nedović, M. van Liempt, R. van Balen, F. Yan, M. Tahir, K. Mikolajczyk, J. Kittler, M. de Rijke, J. Geusebroek, T. Gevers, M. Worring, A. Smeulders, and D. Koelma. The MediaMill TRECVID 2008 semantic video search engine. In Proc. of TRECVID Workshop, 2008.
[12] C. G. M. Snoek, K. E. A. van de Sande, O. de Rooij, B. Huurnink, J. R. R. Uijlings, M. van Liempt, M. Bugalho, I. Trancoso, F. Yan, M. A. Tahir, K. Mikolajczyk, J. Kittler, M. de Rijke, J.-M. Geusebroek, T. Gevers, M. Worring, D. C. Koelma, and A. W. M. Smeulders. The MediaMill TRECVID 2009 semantic video search engine. In Proc. of TRECVID Workshop, November 2009.
[13] E. Tse, S. Greenberg, C. Shen, and C. Forlines. Multimodal multiplayer tabletop gaming. Computers in Entertainment (CIE), 5(2), April 2007.

Interactive Video Search and Browsing Systems

  • 1. Interactive Video Search and Browsing Systems Marco Bertini, Alberto Del Bimbo, Andrea Ferracani, Daniele Pezzatini Universit` degli Studi di Firenze - MICC, Italy a {bertini,delbimbo,ferracani,pezzatini}@dsi.unifi.it Abstract Within the EU VidiVideo1 and IM3I2 projects have been conducted two surveys, formulated following the guide- In this paper we present two interactive systems for video lines of the IEEE 830-1984 standard, to gather user re- search and browsing; one is a web application based on the quirements for a video search engine. Overall, more than Rich Internet Application paradigm, designed to obtain the 50 qualified users from 10 different countries participated; levels of responsiveness and interactivity typical of a desk- roughly half of the users were part of the scientific and cul- top application, while the other exploits multi-touch devices tural heritage sector (e.g. academy and heritage audio-visual to implement a multi-user collaborative application. Both archives) and half from the broadcasting, entertainment and systems use the same ontology-based video search engine, video archive industry. The survey campaigns have shown a that is capable of expanding user queries through ontology strong request for systems that have a web-based interface: reasoning, and let users to search for specific video seg- around 75% of the interviewees considered this “manda- ments that contain a semantic concept or to browse the con- tory” and around 20% considered it “desirable”. Users also tent of video collections, when it’s too difficult to express a requested the possibility to formulate complex composite specific query. queries and to expand queries based on ontology reasoning. Since many archives use controlled lexicons and ontologies in their annotation practices, they also requested to have an 1. Introduction interface able to show concepts and their relations. 
As for multi-touch interfaces they have been popularized by personal devices like smartphones and tablet computers, Video search engines are the result of advancements in and are mostly designed for single user interaction. We ar- many different research ares: audio-visual feature extrac- gue however, that multi-touch and multi-user interfaces may tion and description, machine learning techniques, as well be effective in certain production environments [8] like TV as visualization, interaction and user interface design. Au- news, where people with different roles may have to collab- tomatic video annotation systems are based on large sets of orate in order to produce a new video, thanks to the possi- concept classifiers [12], typically based on supervised ma- bility to achieve a common view with mutual monitoring of chine learning techniques such as SVMs; the use of these others activities, and verbal and gestural utterances [13]; in supervised learning approaches requires tools to easily cre- these cases a web based interface that forces people to work ate ground-truth annotations of videos, indicating the ob- separately on different screens may not be the most efficient jects and events of interest, in order to train appropriate con- tool. cept classifiers. The current video search engines are based Regarding video annotation, some manual annotation on lexicons of semantic concepts and perform keyword- tools have become popular also in several mainstream video based queries [10,11]. These systems are generally desktop sharing sites such as YouTube and Viddler, because they ex- applications or have simple web interfaces that show the re- tend the popular idea of image tagging, made popular by sults of the query as a ranked list of keyframes [7,12], simi- Flickr and Facebook, applying it to videos. This informa- larly to the interfaces made common by web search engines. 
tion is managed in a collaborative web 2.0 style, since an- These systems do not let users to perform composite queries notations can be inserted not only by content owners (video that can include temporal relations between concepts and professionals, archivists, etc.), but also by end users, pro- do not allow to look for concepts that are not in the lexicon. viding an enhanced knowledge representation of multime- In addition, desktop applications require installation on the dia content. end-user computer and can not be used in a distributed envi- ronment, while the web-based tools allow only limited user 1 http://www.vidivideo.info interaction 2 http://www.im3i.eu
  • 2. Figure 1. Automatically created ontology structure: view of the relations between some concepts related to “people”. In this paper we present an integrated system composed specific visual descriptors. Concepts, concepts relations, by i) a video search engine that allows semantic retrieval by video annotations and visual concept prototypes are defined content for different domains (possibly modelled with dif- using the standard Web Ontology Language (OWL) so that ferent ontologies) with query expansion and ontology rea- the ontology can be easily reused and shared. The queries soning; ii) web-based interfaces for interactive query com- created in each interface are translated by the search engine position, archive browsing, annotation and visualization; into SPARQL, the W3C standard ontology query language. iii) a multi-touch tangible interface featuring a collaborative The system extends the queries adding synonyms and con- natural interaction application. cept specializations through ontology reasoning and the use of WordNet. As an example consider the query “Find 2. The System shots with vehicles”: the concept specializations expansion The search engine The Orione search engine permits dif- through inference over the ontology structure permits to re- ferent query modalities (free text, natural language, graph- trieve the shots annotated with vehicle, and also those an- ical composition of concepts using Boolean and tempo- notated with the concept’s specializations (e.g. trucks, cars, ral relations and query by visual example) and visualiza- etc.). In particular, WordNet query expansion, using syn- tions, resulting in an advanced tool for retrieval and explo- onyms, is required when using free-text queries, since it is ration of video archives for both technical and non-technical not desirable to force the user to formulate a query selecting users. It uses an ontology that has been created semi- only the terms from a predefined lexicon. 
automatically from a flat lexicon and a basic light-weight Automatic video annotations are provided by a system ontology structure, using WordNet to create concept rela- based on the bag-of-words approach, that exploits SIFT, tions (is a, is part of and has part) as shown, in a frag- SURF and MSER visual features and the Pyramid Match ment, in Fig. 1. The ontology is modelled following the Kernel [5]; these annotations can be complemented by man- Dynamic Pictorially Enriched Ontology model [3], that in- ual annotations added through a web-based interface. cludes both concepts and visual concept prototypes. These The web-based user interface The web-based Sirio prototypes represent the different visual modalities in which search system3 , based on the Rich Internet Application a concept can manifest; they can be selected by the users paradigm (RIA), does not require any software installation, to perform query by example, using MPEG-7 descriptors (e.g. Color Layout and Edge Histogram) or other domain 3 http://shrek.micc.unifi.it/im3i/
  • 3. Figure 3. The Sirio web-based user interfaces: left) simple user interface; right) advanced user inter- face with ontology graph and geo tagging. Figure 2. Manual video annotation system: adding geographical annotations to a visual concept Figure 4. Browsing videos: select concepts in the tag cloud, then browse the video is extremely responsive. It is complemented by a system archive or start a search from a concept of for manual annotation of videos (Fig. 2), developed with the the ontology. aim of creating, collaboratively, manual annotations e.g. to zoom in each result displaying it in a larger player that provide geographical annotations and ground truth annota- shows more details on the video metadata and allows bet- tions that can be used for correcting and integrating auto- ter video inspection. The extended video player allows also matic annotations or for training and evaluating automatic to search for visually similar video shots. Furthermore an- video annotation systems. Sirio is composed by two dif- other interface (Andromeda), integrated with Sirio, allows ferent interfaces shown in Fig. 3: a simple search interface to browse video archives navigating through the relations with only a free-text field for Google-like searches, and an between the concepts of the ontology and providing direct advanced search interface with a GUI to build composite access to the instances of these concepts; this functionality queries that may include Boolean and temporal operators, is useful when a user does not have a clear idea regarding metadata (like programme broadcast information and geo the query that he wants to make. tags) and visual examples. The GUI interface allows also to inspect and use a local view of the ontology graph, when This interface is based on some graphical elements typ- building queries, to better understand how one concept is re- ical of web 2.0 interfaces, such as the tag cloud. 
The user lated to the others and thus suggesting to the users possible starts selecting concepts from a “tag cloud”, than navigates changes of the composition of the query. the ontology that describes the video domain, shown as For each result of the query it is shown the first frame a graph with different types of relations, and inspects the of the video clip. These frames are obtained from the video video clips that contain the instances of the concepts used streaming server, and are shown within a small video player. as annotations (Fig. 4). Users can select a concept from the Users can then play the video sequence and, if interested, ontology graph to build a query in the advanced search in- terface at any moment.
  • 4. The tangible user interface MediaPick is a system that allows semantic search and organization of multimedia con- tents via multi-touch interaction. It has an advanced user centered interaction design, developed following specific usability principles for search activities [1], which allows users collaboration on a tabletop [9] about specific topics that can be explored, thanks to the use of ontologies, from general to specific concepts. Users can browse the ontol- ogy structure in order to select concepts and start the video retrieval process. Afterwards they can inspect the results re- turned by the Orione video search engine and organize them according to their specific purposes. Some interviews to po- tential end-users have been conducted with the archivists of RAI, the Italian public broadcaster, in order to study their workflow and collect suggestions and feedbacks; at present RAI journalists and archivists can search the corporate dig- ital libraries through a web-based system. This provides a simple keyword based search on textual descriptions of the archived videos. Sometimes these descriptions are not very detailed or very relevant to the video content, thus making the document difficult to find. The cognitive load required for an effective use of the system often makes the journalists Figure 5. MediaPick system: top) concepts delegate their search activities to the archivists that could view: a) the ontology graph; b) controller be not familiar with the specific topic and therefore could module; bottom)results view: a) video players hardly choose the right search keyword. The goal of the and user’s result picks; b) controller module MediaPick design is to provide to the broadcast editorial and results list staff an intuitive and collaborative interface to search, vi- able to change the state of the interface and go to the re- sualize and organize video results archived in huge digital sults view. 
Also the results view shows the controller, to libraries with a natural interaction approach. let the user to select the archived concepts and launch the query against the database. The returned data are then dis- The user interface adopts some common visualization played within a horizontal list (Fig. 5 bottom), in a new vi- principles derived from the discipline of Information Visu- sual component placed at the top of the screen, which con- alization [4] and is equipped with a set of interaction func- tains returned videos and their related concepts. Each video tionalities designed to improve the usability of the system element has three different states: idle, playback and in- for the end-users. The GUI consists of a concepts view formation. In the idle state the video is represented with a (Fig. 5, top), to select one or more keywords from an on- keyframe and a label visualizing the concept used for the tology structure and use them to query the digital library, query. During the playback state the video starts playing and a results view (Fig. 5, bottom), which shows the videos from the frame in which the selected concept was annotated. returned from the database, so that the user can navigate and A longer touch of the video element activates the informa- organize the extracted contents. tion state, that shows a panel with some metadata (related The concepts view consists of two different interactive concepts, quality, duration, etc.) over the video. elements: the ontology graph (Fig. 5, top), to explore At the bottom of the results list there are all the concepts the concepts and their relations, and the controller module related to the video results. By selecting one or more of (Fig. 5, bottom), to save the selected concepts and switch these concepts, the video clips returned are filtered in order to the results view. The user chooses the concepts as query to improve the information retrieval process. The user can from the ontology graph. 
Each node of the graph consists select any video element from the results list and drag it out- of a concept and a set of relations. The concept can be se- side. This action can be repeated for other videos, returned lected and then saved into the controller, while a relation by the same or other queries. Videos placed out of the list can be triggered to list the related concepts, which can be can be moved along the screen, resized or played. A group expanded and selected again; then the cycle repeats. The of videos can be created by collecting two or more video related concepts are only shown when a precise relation is elements in order to define a subset of results. Each group triggered, in order to minimize the number of visual ele- can be manipulated as a single element through a contextual ments present at the same time in the interface. After sav- menu: it can be expanded to show a list of its elements or ing all the desired concepts into the controller, the user is released in order to ungroup the videos.
3. Architecture

The system backend and the search engine are currently based on open source tools (i.e. Apache Tomcat and the Red5 video streaming server) or freely available commercial tools (Adobe Media Server has a free developer edition). Videos are streamed using the RTMP streaming protocol. The search engine is developed in Java and supports multiple ontologies and ontology reasoning services. The ontology structure and concept instance serialization have been designed so that inference can be executed simultaneously on multiple ontologies without slowing the retrieval; this design avoids the need to select a specific ontology when creating a query with the Google-like interface. The engine has also been designed to fit into a service-oriented architecture, so that it can be incorporated into customizable search systems, other than Sirio and MediaPick, developed within the IM3I and euTV projects. Audio-visual concepts are automatically annotated using either the IM3I or the euTV automatic annotation engine. The search results are produced in RSS 2.0 XML format, with paging, so that they can be used as feeds by any RSS reader tool, and it is possible to subscribe to a specific search. Both the web-based interface and the multi-touch interface have been developed in Flex+Flash, according to the Rich Internet Application paradigm.

MediaPick exploits a multi-touch technology chosen among various approaches experimented in our lab since 2004 [2]. Our solution uses an infrared LED array as an overlay built on top of a standard LCD screen (capable of full-HD resolution). The multi-touch overlay can detect fingers and objects on its surface and sends information about touches using the TUIO [6] protocol at a rate of 50 packets per second. The MediaPick architecture is composed of an input manager layer that communicates through a server socket with the gesture framework and core logic. The latter is responsible for the connection to the web services and media server, as well as for the rendering of the GUI elements on the screen (Fig. 6). The input management module is driven by the TUIO dispatcher: this component is in charge of receiving and dispatching the TUIO messages sent by the multi-touch overlay to the gesture framework through the server socket (Fig. 6). This module is able to manage the events sent by the input manager, translating them into commands for the gesture framework and core logic.

Figure 6. Multitouch interface system architecture (multi-touch screen, multi-touch and generic input managers with XML configuration, server socket, gesture framework, core logic, semantic video search engine, streaming media server).

The logic behind the multi-touch interface needs a dictionary of gestures which users are allowed to perform. Each digital object on the surface can be seen as an active touchable area; for each active area a set of gestures is defined for the interaction, so it is useful to link each touch with the active area in which it is enclosed. For this reason each active area has its own set of touches and allows gesture recognition through the interpretation of their associated behavior. All the user interface actions mentioned above are triggered by the natural gestures shown in Tab. 1.

Gesture               Actions
Single tap            Select concept; trigger the controller module switch; select video group contextual menu options
Long pressure touch   Play video element; trigger video information state; show video group contextual menu
Drag                  Move video element
Two-fingers drag      Resize video; group videos

Table 1. Gesture/Actions association for the multi-touch user interface

4. Usability test

The video search engine (Sirio + Orione) has been tested to evaluate the usability of the system in a set of field trials. A group of 19 professionals coming from broadcasting, media industry, cultural heritage institutions and video archives in Italy, The Netherlands, Hungary and Germany have tested the system on-line (running on the MICC servers), performing a set of pre-defined queries and activities, and interacting with the system. The methodology used follows the practices defined in the ISO 9241 standard, and gathered: i) observational notes taken during test sessions by monitors, ii) verbal feedback noted by test monitors and iii) an online survey completed after the tests by all the users. The observational notes and verbal feedback of the users have been analyzed to understand the more critical parts of the system and, during the development of the IM3I project, have been used to redesign the functionalities of the
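The touch/active-area association and the gesture dictionary of Tab. 1 can be sketched as follows. This is a minimal illustration under assumed names and thresholds, not the MediaPick implementation (which receives its touch events as TUIO messages through the server socket).

```python
# Illustrative sketch (not the MediaPick code): hit-testing a touch
# against the active areas on the surface, then classifying a finished
# touch as one of the gestures in Tab. 1. Threshold value is an assumption.
from dataclasses import dataclass

LONG_PRESS_SECONDS = 0.8  # assumed long-pressure threshold

@dataclass
class ActiveArea:
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, px, py):
        # A touch belongs to an area when its point falls inside the bounds.
        return self.x <= px <= self.x + self.w and self.y <= py <= self.y + self.h

def hit_test(areas, px, py):
    """Return the active area enclosing the touch point, if any."""
    for area in areas:
        if area.contains(px, py):
            return area
    return None

def classify(fingers, duration, moved):
    """Map a finished touch to a gesture from the dictionary (Tab. 1)."""
    if moved:
        return "two_fingers_drag" if fingers >= 2 else "drag"
    return "long_press" if duration >= LONG_PRESS_SECONDS else "single_tap"

areas = [ActiveArea("video_element", 100, 100, 320, 180)]
area = hit_test(areas, 150, 200)               # lands on the video element
gesture = classify(fingers=1, duration=1.2, moved=False)  # a long pressure
```

A real dispatcher would accumulate the touches per active area over time and emit the corresponding action (e.g. a long pressure on a video element triggering its information state) once a gesture is recognized.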
system.

Figure 7. Overview of usability tests: overall usability of the system, usability of the combination of search modalities.

Fig. 7 summarizes two results of the tests. The overall experience is very positive and the system proved to be easy to use, despite the objective difficulty of interacting with a complex system for which the testers received only very limited training. Users appreciated the combination of different interfaces. The type of interaction that proved to be most suitable for the majority of the users is the advanced interface, because of its many functionalities.

5. Conclusions

In this paper we have presented two semantic video search systems, based on web and multi-touch multi-user interfaces, developed within three EU funded research projects and tested by a large number of professionals in real-world field trials. Our future work will deal with further development of the interfaces, especially considering the new HTML5 technologies, extensive testing of the tangible user interface and a thorough comparison of the two systems.

Acknowledgements. The authors thank Giuseppe Becchi for his work on software development. This work was partially supported by the projects: EU IST VidiVideo (www.vidivideo.info - contract FP6-045547), EU IST IM3I (http://www.im3i.eu/ - contract FP7-222267) and EU IST euTV (http://www.eutvweb.eu/ - contract FP7-226248).

References

[1] R. S. Amant and C. G. Healey. Usability guidelines for interactive search in direct manipulation systems. In Proc. of the International Joint Conference on Artificial Intelligence, volume 2, pages 1179-1184, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc.
[2] S. Baraldi, A. Del Bimbo, and L. Landucci. Natural interaction on tabletops. Multimedia Tools and Applications (MTAP), 38:385-405, July 2008.
[3] M. Bertini, A. Del Bimbo, G. Serra, C. Torniai, R. Cucchiara, C. Grana, and R. Vezzani. Dynamic pictorially enriched ontologies for digital video libraries. IEEE MultiMedia, 16(2):42-51, Apr/Jun 2009.
[4] S. K. Card, J. D. Mackinlay, and B. Shneiderman, editors. Readings in information visualization: using vision to think. Morgan Kaufmann Publishers Inc., 1999.
[5] K. Grauman and T. Darrell. The pyramid match kernel: Efficient learning with sets of features. Journal of Machine Learning Research (JMLR), 8:725-760, 2007.
[6] M. Kaltenbrunner, T. Bovermann, R. Bencina, and E. Costanza. TUIO: a protocol for table-top tangible user interfaces. In Proc. of International Workshop on Gesture in Human-Computer Interaction and Simulation, 2005.
[7] A. Natsev, J. Smith, J. Tešić, L. Xie, R. Yan, W. Jiang, and M. Merler. IBM Research TRECVID-2008 video retrieval system. In Proc. of TRECVID Workshop, 2008.
[8] C. Shen. Multi-user interface and interactions on direct-touch horizontal surfaces: collaborative tabletop research at MERL. In Proc. of IEEE International Workshop on Horizontal Interactive Human-Computer Systems, 2006.
[9] C. Shen. From clicks to touches: Enabling face-to-face shared social interface on multi-touch tabletops. In D. Schuler, editor, Online Communities and Social Computing, volume 4564 of Lecture Notes in Computer Science, pages 169-175. Springer Berlin / Heidelberg, 2007.
[10] A. Smeaton, P. Over, and W. Kraaij. High-level feature detection from video in TRECVid: a 5-year retrospective of achievements. Multimedia Content Analysis, Theory and Applications, pages 151-174, 2009.
[11] C. Snoek, K. van de Sande, O. de Rooij, B. Huurnink, J. van Gemert, J. Uijlings, J. He, X. Li, I. Everts, V. Nedović, M. van Liempt, R. van Balen, F. Yan, M. Tahir, K. Mikolajczyk, J. Kittler, M. de Rijke, J. Geusebroek, T. Gevers, M. Worring, A. Smeulders, and D. Koelma. The MediaMill TRECVID 2008 semantic video search engine. In Proc. of TRECVID Workshop, 2008.
[12] C. G. M. Snoek, K. E. A. van de Sande, O. de Rooij, B. Huurnink, J. R. R. Uijlings, M. van Liempt, M. Bugalho, I. Trancoso, F. Yan, M. A. Tahir, K. Mikolajczyk, J. Kittler, M. de Rijke, J.-M. Geusebroek, T. Gevers, M. Worring, D. C. Koelma, and A. W. M. Smeulders. The MediaMill TRECVID 2009 semantic video search engine. In Proc. of TRECVID Workshop, November 2009.
[13] E. Tse, S. Greenberg, C. Shen, and C. Forlines. Multimodal multiplayer tabletop gaming. Computers in Entertainment (CIE), 5(2), April 2007.