The effect of films with and without subtitles on listening
Reference 5
Semantic annotation and retrieval of documentary media objects
Dimitris Kanellopoulos
Educational Software Development Laboratory, Department of Mathematics,
University of Patras, Rio Patras, Greece
Abstract
Purpose – This paper aims to propose a system for the semantic annotation of audio-visual media
objects, which are provided in the documentary domain. It presents the system’s architecture, a
manual annotation tool, an authoring tool and a search engine for the documentary experts. The paper
discusses the merits of the proposed approach, an evolving semantic network, as the basis for the
audio-visual content description.
Design/methodology/approach – The author demonstrates how documentary media can be
semantically annotated, and how this information can be used for the retrieval of the documentary
media objects. Furthermore, the paper outlines the underlying XML schema-based content description
structures of the proposed system.
Findings – Currently, a flexible organization of documentary media content description and the
related media data is required. Such an organization requires an adaptable construction in the form of
a semantic network. The proposed approach provides semantic structures with the capability to
change and grow, allowing an ongoing task-specific process of inspection and interpretation of source
material. The approach also provides technical memory structures (i.e. information nodes), which
represent the size, duration, and technical format of the physical audio-visual material of any media
type, such as audio, video and 3D animation.
Originality/value – The proposed approach (architecture) is generic and facilitates the dynamic use
of audio-visual material using links, enabling the connection from multi-layered information nodes to
data on a temporal, spatial and spatial-temporal level. It enables the semantic connection between
information nodes using typed relations, thus structuring the information space on a semantic as well
as syntactic level. Since the description of media content holds constant for the associated time
interval, the proposed system can handle multiple content descriptions for the same media unit and
also handle gaps. The results of this research will be valuable not only for documentary experts but also for
anyone who needs to manage audiovisual content dynamically and intelligently.
Keywords Documentary, Semantic annotation, Video, Temporal and spatial levels of audiovisual data,
Content management, Audiovisual media, Multimedia
Paper type Research paper
1. Introduction
In the last few years, the general public’s interest in documentaries has grown
enormously. A documentary is the presentation of factual events, often consisting of
footage recorded at the time and place of their occurrence and generally accompanied by
a narrator (Rosenthal and Corner, 2005). Documentary is a media work category, applied
to photography, film and television. It has been developed internationally across a wide
range of formats, including the use of dramatization, observational sequences and
various combinations of interview material with images that portray the real with
different degrees of referentiality and aesthetic crafting. Documentaries often depict
various important topics (e.g. animal life, historical events, tourist attractions, etc.) by
mixing photos and videos with commentaries and opinions from experts. All these
elements are organized in narrative form.
The current issue and full text archive of this journal is available at
www.emeraldinsight.com/0264-0473.htm
Received October 2011; revised February 2012; accepted March 2012
The Electronic Library, Vol. 30 No. 5, 2012, pp. 721-747
© Emerald Group Publishing Limited, ISSN 0264-0473
DOI 10.1108/02640471211275756
The definition of documentary often follows a discursive path. Two factors recur
consistently in the various definitions:
(1) reality is captured in some forms of documents; and
(2) the documents are subjected to assemblage to serve a larger context.
For the definition of documentary, we adopt the simplest task definition, that of Vertov:
“to capture fragments of reality and combine them meaningfully” (Barnouw, 1993, p. 55).
It can be said that making documentaries is not a science. Documentaries can
relate data from science, but they are not scientific reports. They mix science, narrative
and images, and the filmmakers’ point of view shapes the way these are mixed. For
example, a travel documentary is a documentary film (or television program) that
describes travel or tourist attractions in a non-commercial way. It is not a scientific report
but it is based on knowledge about tourist attractions. A representative travel
documentary is Word Travels (IMDb, n.d.) that follows the lives of two young
professional travel writers (Robin Esrock and Julia Dimon), as they journey around the
world in search of stories to experience, write about, and file for their editors.
According to Nichols (2001), in documentary film and video we can identify six
modes of representation that function something like sub-genres of the documentary
film genre itself: poetic, expository, participatory, observational, reflexive, and
performative. Table I shows the main characteristics and deficiencies of these
documentary modes.
Table I. Documentary modes
Documentary mode | Main characteristics | Deficiencies
Poetic documentary (1920s) | Reassemble fragments of the world poetically | Lack of specificity, too abstract
Expository documentary (1920s) | Directly address issues in the historical world | Overly didactic
Observational documentary (1960s) | Eschew commentary and reenactment; observe things as they happen | Lack of history, context
Participatory documentary (1960s) | Interview or interact with subjects; use archival film to retrieve history | Excessive faith in witnesses, naive history, too intrusive
Reflexive documentary (1980s) | Question documentary form, defamiliarize the other modes | Too abstract, lose sight of actual issues
Performative documentary (1980s) | Stress subjective aspects of a classically objective discourse | Loss of emphasis on objectivity may relegate such films to the avant-garde; “excessive” use of style
Modern lightweight digital video cameras and computer-based editing have greatly
aided documentary makers. The first film to take full advantage of this change was
Martin Kunert and Eric Manes’ Voices of Iraq, for which 150 digital video cameras were
sent to Iraq during the war and handed out to Iraqis to record themselves. Multimedia
technology allows text, graphics, photos, and audio to be transmitted effectively and
rapidly across media platforms. Media organizations must cope with multimedia
changes that move exponentially to the next competing delivery device. Nowadays, there
is a potentially wide range of applications in the media domain, such as search, information
filtering, media understanding (surveillance, intelligent vision, smart cameras, etc.)
and media conversion (speech to text, picture to speech, visual transcoding, etc.).
There is a direct need to understand the semantics and meaning of documentaries (Choi, 2010).
Finding the parts of interest in a documentary is an increasingly difficult, frustrating
and time-consuming task. Internet users need an intelligent search engine that performs
complex media searches and helps them find media chunks based on the semantics of
the media itself (Dorai et al., 2002). However, media is so rich in content variety that it
can never be sufficiently described by text or words alone (Dorai and Venkatesh, 2001).
Moreover, humans must take the time to annotate the media chunks.
Media information systems for documentaries should incorporate mechanisms that
interpret, manipulate and generate visual media as well as audible information. A
media infrastructure for documentaries should manipulate self-sufficient components
of documentaries, which can be used in any given production. To use such an
independent media item, it is necessary to extract the relationship between the signs of
the audio-visual information unit and the semantics they represent (Eco, 1997). As a
result, media information systems for documentaries such as Terminal_Time (Mateas,
2000) should manage independent media objects and their representations for use in
many different productions. Therefore, we need tools that use human actions to
extract the important syntactic, semantic and semiotic aspects of the content
(Brachman and Levesque, 1983), so that descriptions based on a formal language can
be constructed. The growing number of documentaries and their combinatorial use
require the annotation of media during their production.
Media annotation and querying for documentaries is still a major challenge, as the
gap between the documentary features and the existing media tools is wide. In the last
two decades, many authoring tools have been proposed for multimedia data (Tien and
Cecile, 2003; Ryn et al., 1989). These authoring tools are either application dependent or
provide insufficient authoring features. High-level annotation facilities like annotation
of objects, time, location, events, etc. can be provided by existing video annotation tools
such as Vannotator (Costa et al., 2002), IBM VideoAnnEx (IBM, n.d.), ELAN (The
Language Archive, n.d.), CAVIAR (The University of Edinburgh, n.d.), and ViPER-GT
(SourceForge.net, n.d.). Rincon and Martinez-Cantos (2007) describe a video annotation
tool (called AVISA) for video understanding. They analyze the features that must be
present in a video annotation tool for video understanding. However, these features
need to be complemented with the finer-level annotation methods required for the
video documentaries. Automatic video generation systems use descriptions
(annotations) of the media items in order to make decisions about how to create a
video sequence. The structure of annotations is composed of two parts:
(1) The structure of the description (e.g. a documentary film can be described by
fields, such as title, director).
(2) The structure of the values used to fill the description (e.g. “The Civil War” can
be the value of the field title).
According to Bocconi et al. (2008) there are three different types of description
structures:
(1) Keywords-based description structures (or K-annotations), in which each item is
associated with a list of words that represent the item’s content. Representative
video generation systems that use K-annotations are Lev Manovich’s Soft Cinema
(n.d.) and the Korsakow System (Korsakow, n.d.), systems that edit in real time
by selecting media items from a database. ConTour (Murtaugh, 1996) is another
indicative system that supports evolving documentaries, i.e. documentaries that
could incorporate new media items as soon as they were made.
(2) Properties-based description structures (or P-annotations), in which items are
annotated with property-value pairs. A representative system of this category is
SemInfo (Little et al., 2002).
(3) Relation-based description structures (or R-annotations). Here, items are annotated with
property-value pairs as in P-annotations, except that some of these values are
references to other annotations. A representative system is DISC (Geurts et al.,
2003), a multimedia presentation generation system for the cultural heritage
domain. DISC uses the annotated multimedia repository of the
Rijksmuseum (n.d.) to create multimedia presentations.
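To make the three description structures more concrete, the following Python sketch annotates the same hypothetical media item in each style. The item identifiers, field names and values are illustrative assumptions and are not taken from Soft Cinema, SemInfo or DISC; treating a string value as a reference whenever it matches a stored annotation identifier is likewise a simplification made for this sketch.

```python
from dataclasses import dataclass, field

# K-annotation: a plain list of keywords describing the item's content (hypothetical values).
k_annotation = {"clip_42": ["gorilla", "Africa", "documentarist", "forest"]}

# P-annotation: property-value pairs whose values are all literals.
p_annotation = {
    "clip_42": {"title": "Gorillas of the forest", "location": "Africa", "duration_s": 95},
}


@dataclass
class RAnnotation:
    """R-annotation: property-value pairs where some values reference other annotations."""
    item_id: str
    properties: dict = field(default_factory=dict)


def resolve(store, item_id, prop):
    """Follow a reference value to the annotation it names; return literal values unchanged."""
    value = store[item_id].properties[prop]
    if isinstance(value, str) and value in store:
        return store[value]
    return value


r_store = {
    "person_7": RAnnotation("person_7", {"name": "Documentarist A", "role": "presenter"}),
    "clip_42": RAnnotation("clip_42", {"title": "Gorillas of the forest", "presenter": "person_7"}),
}

print(resolve(r_store, "clip_42", "presenter").properties["name"])  # Documentarist A
print(resolve(r_store, "clip_42", "title"))  # Gorillas of the forest
```

The difference between P- and R-annotations shows up only in `resolve`: a P-annotation value is always the final answer, whereas an R-annotation value may lead to another annotation that can itself be inspected.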
Benitez et al. (2000) presented description schemes (DSs) for image, video, multimedia,
home media, and archive content proposed to the MPEG-7 standard. They used XML
to illustrate and exemplify their description schemes by presenting applications
that already use the proposed structures. These applications are the visual apprentice,
the AMOS-search system, a multimedia broadcast news browser, a storytelling
system, and an image meta-search engine, MetaSEEk.
The AUTEUR system (Nack and Parkes, 1997) synchronizes automatic story
generation for visual media with the stylistic requirements of narrative and
medium-related presentation. The AUTEUR system consists of an ontological representation of
narrative elements such as actions, events, and emotional and visual codes, based on a
semantic net of conceptual structures related via six types of semantic links
(synonym, sub-action, opposition, ambiguity, association and conceptual). A coherent
action-reaction dynamic is provided by the introduction of three event phases,
i.e. motivation, realization and resolution. The essential categories for the structures
are action, character, object, relative position, screen position, geographical space,
functional space and time. The textual representation of this ontology describes
semantic, temporal and relational features of video in hierarchically organized
structures, which overcomes the limitations of keyword-based approaches.
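The following Python sketch shows one possible way to hold a semantic net whose edges carry the six AUTEUR link types. The concepts and the adjacency-list layout are illustrative assumptions, not a description of how AUTEUR itself is implemented.

```python
from collections import defaultdict
from enum import Enum


class Link(Enum):
    # The six semantic link types listed for AUTEUR's semantic net.
    SYNONYM = "synonym"
    SUB_ACTION = "sub-action"
    OPPOSITION = "opposition"
    AMBIGUITY = "ambiguity"
    ASSOCIATION = "association"
    CONCEPTUAL = "conceptual"


class SemanticNet:
    def __init__(self):
        # concept -> list of (link type, related concept)
        self.edges = defaultdict(list)

    def relate(self, source, link, target):
        self.edges[source].append((link, target))

    def related(self, source, link):
        """Return the concepts reachable from `source` through one link of the given type."""
        return [target for kind, target in self.edges.get(source, []) if kind is link]


# Hypothetical concepts, used only to exercise the structure.
net = SemanticNet()
net.relate("eat", Link.SUB_ACTION, "chew")
net.relate("eat", Link.SYNONYM, "consume")
net.relate("laugh", Link.OPPOSITION, "cry")
print(net.related("eat", Link.SUB_ACTION))  # ['chew']
```

Typing the edges, rather than storing untyped associations, is what lets a query ask for sub-actions of "eat" without also returning its synonyms.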
We believe that formal semantics can support the annotation, analysis, retrieval or
reasoning about multimedia assets in the documentary industry. The proliferation of
documentaries and their applications require media annotation that bridges the gap
between documentary technology and media semantics. In line with this, Dorai and
Venkatesh (2001, p. 10) state:
A serious need exists to develop algorithms and technologies that can annotate content with
deep semantics and establish semantic connections between media’s form and function, for
the first time letting users access indexed media and navigate content in unforeseeable and
surprising ways.
The aim of this paper is to propose an agent-oriented programming approach using a
framework for describing the inherent semantics of documentary pieces. In
agent-oriented programming, agent-oriented objects typically have just one method,
with a single parameter. This parameter is a kind of message that is interpreted by the
receiving object, or “agent”, in a way specific to that object or class of objects.
Documentary pieces are specific to video documentaries. For this reason, we have
created a domain-specific representation for documentary pieces to improve the
retrieval accuracy of documentary video queries.
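To illustrate the single-method style described above, here is a minimal Python sketch of an agent-style object. The `Message` fields, the performatives `annotate` and `query`, and the class name `DocumentaryPieceAgent` are hypothetical and are not taken from the proposed system.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message format: a performative (what to do) plus a payload.
    performative: str
    content: dict


class DocumentaryPieceAgent:
    """An agent-style object: its only public method is receive(message)."""

    def __init__(self, piece_id):
        self.piece_id = piece_id
        self.actions = []

    def receive(self, message):
        # The agent itself decides how to interpret the message,
        # instead of exposing one method per operation.
        if message.performative == "annotate":
            self.actions.append(message.content["action"])
            return "ok"
        if message.performative == "query":
            return [a for a in self.actions if message.content["keyword"] in a]
        return "not-understood"


agent = DocumentaryPieceAgent("DP_2_3_4")
agent.receive(Message("annotate", {"action": "documentarist taking a swim"}))
print(agent.receive(Message("query", {"keyword": "swim"})))  # ['documentarist taking a swim']
```

Unrecognized messages are answered rather than rejected with an error, which mirrors the idea that interpretation is left to the receiving agent.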
The remainder of the paper is structured as follows. In Section 2, we discuss issues
concerning documentary authoring, while in Section 3 we present the semantics of
documentary media. In Section 4 we describe the system architecture. In Section 5, we
present our approach for implementing the repository for documentaries, namely our
semantic-network-based approach to data storage and management, and we illustrate the
proposed XML schema-based representational structures. In Section 6, we explain the use
of the proposed system through the tools for annotation, semi-automatic authoring and
semantic retrieval that we have implemented for the documentary video environments.
Finally, in Section 7 we conclude the paper and give directions for further work.
2. Documentary authoring
The conventional understanding of documentary production involves a three-phase
workflow:
(1) pre-production;
(2) production; and
(3) post-production.
Figure 1 illustrates a traditional documentary production model.
The production model formalizes a cyclic process as opposed to a linear workflow.
Pre-production is a phase of research and ideation where visions are selectively audited
through sketches, mostly in textual and graphical form. Production and post-production
are the phases of iterative processes for gathering and assessing media resources.
Screening is the main method of assessment during daily production; it plays an
important role in evaluating daily results and edited sequences and in determining
what further material is needed and how to acquire it. In particular, a documentary
screening is a special showing of a documentary as part of its production and release
cycle. The different types of screenings are listed below in their order within a
documentary’s development:
(1) Test screening. For early edits of a documentary, informal test screenings are
shown to small target audiences to judge if a documentary will require editing,
reshooting or rewriting.
(2) Focus group screenings are formal test screenings of a documentary with very
detailed documentation of audience responses.
(3) Critic screenings are held for national and major market critics well in advance of
print and television production-cycle deadlines, and are usually by invitation only.
(4) Public preview screenings may serve as final test screenings used to adjust
marketing strategy (radio and TV promotion, etc.) or the documentary itself.
(5) A sneak preview is an unannounced documentary screening before formal
release, generally with the usual charge for admission.
In practice, media production for documentaries is a complex, resource-demanding
process that creates a multidimensional network of relationships among the
multimedia information.
Documentary authoring is based on the fundamental processes of media or
hypervideo production. Aubert et al. (2008) identified these fundamental (or canonical)
processes that can be supported in semantically aware media production tools.
According to Aubert et al. (2008) these processes are:
• Premeditate (1): inscription of marks/organization/browsing. The premeditate
process takes place in every step of the authoring activity. Input: thoughts of the
author. Output: necessary schemas, annotations, queries or views.
• Create (2): this process exploits existing audiovisual documents.
• Package (3): inscription of marks/organization/browsing. The metadata structure
and accompanying queries and views are present and can be materialized as a
package.
• Annotate (4): inscription of marks. Creation of the annotations, with
spatio-temporal links to the media assets. Input: media sources. Output:
annotation structure.
• Query (5): organization. Queries allow the selection of appropriate annotations.
Input: basic elements. Output: basic elements matching a specified query.
• Construct message (6): organization. Structuring of the data presentation.
Input: the ideas from the premeditate process, the annotation structure, queries.
Output: draft of views.
• Organize (7): organization. Definition of views to render the selected annotations.
Input: basic elements. Output: view definitions.
• Publish (8): browsing, publishing. Content packaging and publishing, i.e. the
generation of documents from the templates, occurs in the browsing phase
and also in the publishing phase. Input: basic elements. Output: a package and/or
rendered views.
• Distribute (9): browsing, publishing. The rendering of views is currently done
through a standard web browser or the instrumented video player integrated
into the prototype.
Figure 1. Traditional documentary production model
Hardman et al. (2008) identified a small set of canonical processes and specified their
inputs and outputs, but deliberately did not specify their inner workings, concentrating
instead on the information flow between them. Indicative examples of invoking
canonical processes are given in Aubert et al. (2008). Currently, many standards
facilitate the exchange between the different media process stages (Pereira et al., 2008),
such as MXF (Material Exchange Format), AAF (Advanced Authoring Format), MOS
(Media Object Server Protocol), and Dublin Core.
The process of documentary authoring can be arranged in three phases: modeling,
annotation and authoring of documentary media.
(1) The modeling phase identifies the various semantics that exist in the
documentary media.
(2) The annotation phase provides the human annotator with various utilities for the
free-text representation of their perception of the documentary.
(3) The authoring phase is meant for the semiautomatic translation of the
annotated media information into XML, validated by XML Schema
validation tools. Using XML technologies, the semantic multimedia content of
the documentary can be represented in an interoperable way, which motivates
substantial XML-based customizations for documentaries. Thus, the produced
item is an XML document that represents the annotation of the real-time video
documentary.
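A minimal sketch of the authoring step follows, assuming hypothetical element and attribute names (`documentaryPiece`, `action`, `agent`, `verb`, `target`, `speed`) since the paper's schema is not reproduced here. It only builds the XML with Python's standard `xml.etree.ElementTree`; validation against an XML Schema, which the standard library does not provide, would be a separate step with an XSD-capable validator.

```python
import xml.etree.ElementTree as ET


def piece_to_xml(doc_id, clip_id, piece_id, actions):
    """Translate one annotated documentary piece into an XML element (hypothetical names)."""
    piece = ET.Element("documentaryPiece", {"documentary": doc_id, "clip": clip_id, "id": piece_id})
    for agent, verb, target, speed in actions:
        # One child element per annotated action tuple.
        ET.SubElement(piece, "action", {"agent": agent, "verb": verb, "target": target, "speed": speed})
    return piece


xml_piece = piece_to_xml("DC_2", "C_2_3", "DP_2_3_4", [
    ("documentarist_1.larm", "move", "nil", "fast"),
    ("documentarist_1.rhand", "touch", "gorilla_1.head", "low"),
])
print(ET.tostring(xml_piece, encoding="unicode"))
```

Because the annotator's input is already structured as tuples at this point, the translation is mechanical, which is what makes the authoring phase semiautomatic rather than manual.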
Documentary information systems must accommodate these three phases, providing a
common framework for the storage of the authored documentary and for its presentation
interface. Documentary analysis tools should perform the interpretation of
documentaries in the context of culture, mode of documentary, mode of speech, action,
gestures and emotions. Existing tools and systems provide annotation features for
documentary videos that are often tied to a particular type of documentary (Mateas, 2000). In
addition, they offer a limited number of annotation facilities, which makes it difficult to
derive generic facilities. These tools do not provide semiautomatic authoring, which is an
important requirement. It is worth mentioning that Bocconi et al. (2008) describe a model
for automatically generating video documentaries, which allows viewers to specify the
subject and the point of view of the documentary to be generated. However, the domain
of Bocconi et al. is matter-of-opinion documentaries based on interviews.
Agius and Angelides (2005) proposed the COSMOS-7 system that models the objects
along with a set of events in which the objects participate, as well as events along with a
set of objects and temporal relationships between the objects. This system/model
represents events only at a higher level, such as speak, play and listen, and not at the level of
actions, gestures and movements. Harry and Angelides (2001) proposed a semantic
content-based model for semantic-level querying that makes full use of the explicit media
structure, objects, spatial relationships between objects, events and actions involving
objects, temporal relationships between events and actions, and integration between
syntactic and semantic information. Ramadoss and Rajkumar (2007) considered a system
for the semiautomatic annotation of audio-visual media in the dance domain, while Nack
and Putz (2004) presented a framework for the creation, manipulation, and
archiving/retrieval of media documents, applied to the news domain. In the digital
games and entertainment industry, Burger (2008) stressed the importance of
formal semantics (ontologies) and offered a potential solution based on semantic
technologies. AKTive Media (Chakravarthy et al., 2006) is an ontology-based cross-media
(images and text) annotation system. It includes an automatic annotation process that
interactively suggests knowledge to the user while the user is annotating. The
system works actively in the background, interacting with web services and querying the
central annotation store for context-specific knowledge. Chakravarthy et al.
(2009) present OntoFilm, a core ontology for film production. OntoFilm provides a
standardized model, which conceptualizes the domain and workflows used at various
stages of the film production process starting from pre-production and planning, shooting
on set, right through to editing and post-production.
In this paper, we propose a documentary video framework in order to incorporate
media semantics for documentaries. This framework provides the XML authored
content of the documentary from the supplied semantic and semiotic annotations by
the human annotators. The proposed requirements are:
(1) A layer-oriented model depicting the documentary pieces as events, which
incorporates the gestures, actions and spatial-temporal relationships of the
subjects (e.g. documentarists) and objects in a documentary. Besides
documentary pieces, other examples of events are a setup, a background scene
change and a role change by a documentarist.
(2) A semantic network representing the documentary and the individual documentary
pieces, as well as the cognitive aspects, setting, cultural features and story.
(3) An annotation tool for the documentary experts to manually perform the
semantic and semiotic annotations of documentary media objects, such as
documentaries and documentarists.
(4) A semantic querying tool for the documentary experts and users/spectators to
browse and query the documentary media features for designing new
documentary sequences. Some examples of documentary media or video
queries are:
• show me all the pieces of natural history documentaries from Africa;
• tell me all documentary pieces where the documentarist is in danger; and
• find all historical documentary pieces representing the invasion of
Normandy.
The query engine should be assisted by proper representations so that the retrieved
result achieves high precision and high recall.
3. The semantics of documentary media
The spatial-temporal delivery of a sequence of documentary pieces is recorded in a
documentary video, in which each documentary piece consists of a set of subject’s
actions. Each subject action denotes the action of a character, such as a commentator,
speaker or interviewee. The action is represented as ⟨subject-verb-object-adverb⟩
using the verb-argument structure (Sarkar and Tripasai, 2002) from linguistics.
This section briefly explains some of the characteristics of documentary media.
Definition 3.1 (Documentary)
The documentary numbered $i$, $DC_{i,n}$, consists of a set of documentary video clips
$C_{i,j}$ performed at a particular setting. That is, $DC_{i,n} = \{C_{i,1}, C_{i,2}, \ldots, C_{i,n}\}$,
where $n$ is the total number of documentary clips. In this sense, the documentary
$DC_{2,3} = \{C_{2,1}, C_{2,2}, C_{2,3}\}$ denotes the second documentary, which consists of the three
documentary clips $C_{2,1}$, $C_{2,2}$ and $C_{2,3}$. For example, if the second documentary $DC_{2,3}$
is a travel documentary presenting Holidays in Greece, then the three video clips could be
$C_{2,1}$ = Arriving at the airport of Athens, $C_{2,2}$ = Touring Athens and $C_{2,3}$ = Cruise to
Rodos island.
Definition 3.2 (Documentary Clip)
A documentary clip $C_{i,j}$ of the documentary $DC_{i,n}$ consists of a set of documentary
pieces (DP) that are performed by the documentarists. That is,
$C_{i,j,m} = \{DP_{i,j,1}, DP_{i,j,2}, \ldots, DP_{i,j,m}\}$, where $m$ is the total number of documentary pieces.
For example, the documentary clip
$C_{2,3,7} = \{DP_{2,3,1}, DP_{2,3,2}, DP_{2,3,3}, DP_{2,3,4}, DP_{2,3,5}, DP_{2,3,6}, DP_{2,3,7}\}$
denotes the third video clip of the second documentary (in our example, the cruise to
Rodos island). This clip includes seven documentary pieces.
Definition 3.3 (Documentary Piece)
A documentary piece is the basic semantic unit of a documentary, which has a set of
subject’s actions that are performed either sequentially or concurrently by the subjects
(documentarists). It encapsulates the mood, genre, culture and characters, apart from
the actions. A documentary piece $DP_{i,j,k}$ of the video clip $C_{i,j}$ represents a meaningful
sequence of subject’s (documentarist) actions (A): $DP_{i,j,k} = \{A_1, A_2, \ldots, A_k\}$, where $k$ is
the total number of subject’s actions in this documentary piece. For example, the
documentary piece $DP_{2,3,4} = \{A_1, A_2, A_3, A_4\}$ denotes the piece of the third video clip
(of the second documentary) that includes the four sequential actions
$A_1$, $A_2$, $A_3$ and $A_4$ performed by the subject (documentarist). In our example, these
actions could be:
A1: “The documentarist is visiting the main attractions of the Rodos island in
Greece”.
A2: “The documentarist is taking a swim”.
A3: “The documentarist is participating in the local festival”.
A4: “The documentarist is taking a taste of Rodos nightlife”.
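The nesting in Definitions 3.1-3.3 (documentary, clips, pieces, actions) can be sketched as plain Python data classes. The class names and the 1-based `piece(j, k)` accessor are assumptions made for illustration; the pieces of the third clip other than $DP_{2,3,4}$ are left as empty placeholders because the text does not describe them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentaryPiece:          # DP_{i,j,k} = {A_1, ..., A_k}
    actions: List[str]


@dataclass
class Clip:                      # C_{i,j} = {DP_{i,j,1}, ..., DP_{i,j,m}}
    title: str
    pieces: List[DocumentaryPiece] = field(default_factory=list)


@dataclass
class Documentary:               # DC_{i,n} = {C_{i,1}, ..., C_{i,n}}
    title: str
    clips: List[Clip] = field(default_factory=list)

    def piece(self, j, k):
        """Return DP_{i,j,k} using the 1-based indices of the definitions."""
        return self.clips[j - 1].pieces[k - 1]


rodos_piece = DocumentaryPiece([
    "The documentarist is visiting the main attractions of the Rodos island in Greece",
    "The documentarist is taking a swim",
    "The documentarist is participating in the local festival",
    "The documentarist is taking a taste of Rodos nightlife",
])
# Placeholder pieces stand in for DP_{2,3,1..3} and DP_{2,3,5..7}.
cruise = Clip("Cruise", [DocumentaryPiece([]) for _ in range(3)]
              + [rodos_piece]
              + [DocumentaryPiece([]) for _ in range(3)])
dc2 = Documentary("Holidays in Greece",
                  [Clip("Arriving at the airport of Athens"), Clip("Touring Athens"), cruise])

print(len(dc2.clips), len(cruise.pieces), len(dc2.piece(3, 4).actions))  # 3 7 4
```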
Definition 3.4 (Subject’s (documentarist) action)
The subject/documentarist’s action (A) is represented by an action of a character and is
defined as a tuple ⟨Agent-Action-Target-Speed⟩, where agent and target are the
body parts of the subject/object, action represents the static poses and gestures in the
universe of actions, and speed denotes the speed of delivery of the action, that is,
speed ∈ {low, medium, fast, gradual ascending, gradual descending}. If only one agent
is involved in an action, it is called a primitive action; that is, the target is empty
(Nil). For example, ⟨documentarist_i.larm-move-nil-fast⟩ shows that documentarist i
moves his left arm fast. If multiple agents are involved in an action or gesture, the action
is known as a composite action. For instance, ⟨documentarist_i.rhand-touch-gorilla_j.head-low⟩
denotes that documentarist i touches the head of gorilla j slowly with his right hand.
The content representational structures for these documentary media semantics are
discussed in the following sections.
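The ⟨Agent-Action-Target-Speed⟩ tuple can be sketched as a small immutable record. In this sketch, which is an assumption rather than the paper's representation, Python's `None` plays the role of Nil, and the speed vocabulary is checked against the five values listed in Definition 3.4.

```python
from dataclasses import dataclass
from typing import Optional

# The five speed values allowed by Definition 3.4.
SPEEDS = {"low", "medium", "fast", "gradual ascending", "gradual descending"}


@dataclass(frozen=True)
class Action:
    """<Agent-Action-Target-Speed>; agent and target name body parts, e.g. 'documentarist_i.larm'."""
    agent: str
    action: str
    target: Optional[str]   # None stands for Nil
    speed: str

    def __post_init__(self):
        if self.speed not in SPEEDS:
            raise ValueError(f"unknown speed: {self.speed}")

    @property
    def is_primitive(self):
        # Only one agent is involved when the target is empty (Nil).
        return self.target is None


move = Action("documentarist_i.larm", "move", None, "fast")
touch = Action("documentarist_i.rhand", "touch", "gorilla_j.head", "low")
print(move.is_primitive, touch.is_primitive)  # True False
```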
4. The architecture for authoring and querying documentaries
The proposed system (shown in Figure 2) provides an environment supporting the
annotation, authoring, archiving and querying of the documentary media objects. The
aim is to apply the framework to all sorts of documentary types such as natural history
documentary, travel documentary etc.
The environment is based on various modules: annotation, archival, querying,
representation structures and the underlying database. The documentary experts
access each of these modules to carry out their specific tasks. It is essential that these
modules are simple and easy to use, thereby minimizing the effort needed to become
familiar with the system. The annotation module takes the raw digital video as input
and allows the human annotator to annotate the different documentary media objects.
The generated annotations are described in representational structures such as linked
lists and hash tables. The authoring module takes the annotations representing the
documentary sequence and translates them into XML instances automatically. The XML
instances generated by the authoring module, which conform to the XML Schema, are
stored in the back-end database. The query-processing module allows the documentary
experts to pose free-text documentary video queries against the XML annotations,
performs the search using XQuery (after stemming, removing the stop words and
converting the tokens into XQuery form) and returns the results of these queries to
the users. Based on our observations, we have identified a set of required data structures
and the associated relations, and have developed tools for accomplishing the documentary
video tasks. Figures 3-5 depict the annotation, query and semantic annotation processes,
respectively.
Figure 2. The architecture of the proposed system
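The query-processing steps (tokenizing, removing stop words, stemming, converting tokens into XQuery form) can be sketched as below. The text does not specify the stemmer, the stop-word list or the structure of the stored XML, so all of these are assumptions: a crude suffix stripper stands in for a real stemmer, and `documentaryPiece` is a hypothetical element name. Tokens are restricted to lowercase letters, so they cannot break out of the generated string literals.

```python
import re

# Hypothetical stop-word list, including query verbs such as "find" and "show".
STOP_WORDS = {"the", "a", "an", "of", "is", "me", "all", "show", "find",
              "where", "while", "from", "and", "tell", "in"}


def naive_stem(token):
    # Crude suffix stripping for illustration only; a real system would use e.g. a Porter stemmer.
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def to_terms(query):
    """Tokenize a free-text query, drop stop words and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", query.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def to_xquery(terms):
    """Build an XQuery FLWOR expression requiring every term to occur in a piece's text."""
    if not terms:
        raise ValueError("query has no content words")
    conditions = " and ".join(f'contains(lower-case(string($p)), "{t}")' for t in terms)
    return f"for $p in collection()//documentaryPiece where {conditions} return $p"


terms = to_terms("Find all documentary pieces where documentarist is speaking")
print(terms)  # ['documentary', 'piec', 'documentarist', 'speak']
print(to_xquery(terms))
```

Matching with `contains` on stemmed prefixes keeps the sketch simple (the stem "piec" still matches "pieces" in the stored text); a production engine would more likely match against indexed stems.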
Figure 3. The annotation process in a UML class diagram
Figure 4. The query process in a UML class diagram
Figure 5. The semantic annotation process in a UML class diagram
5. The model of semantics for documentary media
According to Nack and Putz (2004), annotation is a dynamic and iterative process, and
thus annotations are bound to be incomplete and to change over time. Consequently, it is
imperative to provide semantic representation schemes with the capability to change
and grow. In addition, the relations between the different types of structures should be
flexible and dynamic. To achieve this, media annotation should not result in a
monolithic document; rather, it should be organized as a semantic network of content
description documents (Ramadoss and Rajkumar, 2007).
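The contrast with a monolithic document can be sketched in Python: each content description is a separate node, and typed relations link nodes, so a new interpretation is added as a new node without rewriting existing ones. The class name, node identifiers and relation names (`describes`, `annotates`) are hypothetical.

```python
from collections import defaultdict


class DescriptionNetwork:
    """Content descriptions stored as separate nodes linked by typed relations."""

    def __init__(self):
        self.nodes = {}                  # node id -> description (any dict)
        self.links = defaultdict(set)    # node id -> {(relation, other node id)}

    def add(self, node_id, description):
        self.nodes[node_id] = description

    def link(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both nodes must exist before they are linked")
        self.links[source].add((relation, target))

    def neighbours(self, node_id, relation=None):
        """Targets linked from node_id, optionally restricted to one relation type."""
        return sorted(t for r, t in self.links.get(node_id, ()) if relation in (None, r))


network = DescriptionNetwork()
network.add("video_1", {"format": "mpg", "duration_s": 100})
network.add("piece_1", {"summary": "documentarist touches gorilla"})
network.link("piece_1", "describes", "video_1")
# A later interpretation grows the network without changing the nodes above.
network.add("note_1", {"text": "gorilla appears calm"})
network.link("note_1", "annotates", "piece_1")
print(network.neighbours("piece_1"), network.neighbours("note_1", "annotates"))
```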
5.1 Layer oriented event description
In the design of the proposed system, we adopted the strata-oriented approach
(Aguierre Smith and Davenport, 1992) and setting (Parkes, 1989) for describing
events such as documentary pieces. Strata-oriented content modeling is an
important knowledge representation method and is well suited to modeling the events
of the documentary presentation. In our framework, each video documentary is
technically described using the size, duration and technical format of the material,
such as mpg, avi, etc. Therefore, each documentary can be partially represented by
technical details that belong to the layer of technical details. In addition, each
video documentary is conceptually annotated using high-level semantic descriptors
and can thus be complementarily represented by such semantic descriptors,
which belong to the layer of semantic annotations. The connection between the
different layers is accomplished by a triple ⟨media identifier, start time, end time⟩.
The proposed representation structure includes many layers (one layer for
each description), and the triple connects each layer to the data being described
(e.g. the actual audio, video, or audio-visual stream). For instance, a documentarist may
perform a number of actions in the same time span. Start and end times can be used to
identify the temporal relations between the actions. Documentary pieces can be represented in
this way, thereby enabling semantic retrieval. Figure 6 depicts the layered
representation of a shot of 100 frames, representing three actions. Suppose a query
“find a documentary piece of a natural history documentary from Africa, where the
documentarist is speaking and touching a gorilla, while the gorilla is eating a banana”.
This query can easily be answered by isolating the common parts of the shot, as
depicted in the shaded portion of Figure 6. The temporal relationship between the actions can
be identified using the start and end points with which those actions are associated.
In this way, complex structured behavior concepts can be represented, and the
audio-visual material can be retrieved on this basis.
Figure 6. Layered annotation of actions and the isolated segment of a shot for a query
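The layered representation and the isolation of the common segment can be illustrated with the following Python sketch. Each annotation stores the ⟨media identifier, start time, end time⟩ triple plus a label naming its layer, and the segment that answers a conjunctive query is the intersection of the matching intervals. The media identifier, frame numbers and labels are hypothetical values chosen to mimic the 100-frame shot of Figure 6, and the sketch assumes that each requested label occurs only once in the shot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """One layer entry: the <media identifier, start time, end time> triple plus a label."""
    media_id: str
    start: int  # first frame covered (inclusive)
    end: int    # last frame covered (inclusive)
    label: str


def common_segment(annotations: list[Annotation]) -> Optional[tuple[str, int, int]]:
    """Return the triple for the span shared by all annotations, or None if there is none."""
    media_ids = {a.media_id for a in annotations}
    if len(media_ids) != 1:
        return None  # empty input, or annotations of different media objects
    start = max(a.start for a in annotations)
    end = min(a.end for a in annotations)
    if start > end:
        return None  # the intervals do not all overlap
    return (media_ids.pop(), start, end)


def find(layers: list[Annotation], labels: set[str]) -> Optional[tuple[str, int, int]]:
    """Select the layers named in the query and isolate their common segment.

    Assumes each requested label occurs at most once in the shot.
    """
    selected = [a for a in layers if a.label in labels]
    if {a.label for a in selected} != labels:
        return None  # some requested action is not annotated at all
    return common_segment(selected)


# Hypothetical layers for a 100-frame shot (frames 0-99) containing three actions.
shot = [
    Annotation("doc-africa-01", 0, 79, "documentarist speaking"),
    Annotation("doc-africa-01", 20, 99, "documentarist touching gorilla"),
    Annotation("doc-africa-01", 35, 64, "gorilla eating banana"),
]

query = {"documentarist speaking", "documentarist touching gorilla", "gorilla eating banana"}
print(find(shot, query))  # ('doc-africa-01', 35, 64)
```

The result is itself a triple, so the isolated segment can be handed back to the player exactly like any other layer entry.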
5.2 Nodes of the proposed framework
Nodes are used to build linked data structures concerning documentaries. Each node
contains some data and possibly links to other nodes. A node can be thought of as a
logical placeholder for some data: a memory block that contains a unit of data and
possibly references to other nodes, which in turn contain data and possibly references
to yet more nodes. Links between nodes are implemented by pointers or references. By
forming chains of interlinked nodes, very large and complex data structures concerning
documentaries can be built, so semantic structures of documentary pieces can be
implemented easily. In our framework, we distinguish two types of nodes, namely data
nodes (D-nodes) and conceptual annotation nodes (CA-nodes):
(1) A D-node represents physical audio-visual material of any media type, such as
text, audio, video, 3D animation, 2D image, 3D image and graphic. The size,
duration and technical format of the material are not restricted, nor are there any
limitations with respect to the content, i.e. the number of persons, actions
and objects. A data node might contain a complete documentary film or merely
a scene. Each node is identified by a URI.
(2) A CA-node provides high-level descriptions of a video documentary. A
high-level description is one that describes "top-level" goals and overall features
of a documentary; it is more abstract and is typically concerned with the video
documentary as a whole and its goals. For example, the events that occur in a
documentary (as well as the location, date and time of each event) can be
described by high-level descriptors. The mood of a documentary (i.e. its subjective
content, such as happy, sorrowful or romantic) and many other features can also be
described by high-level descriptors. Such descriptors are usually difficult to
obtain using automatic extraction methods, so nodes of this type are usually
created manually.
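The two node types and their references can be modeled with Python data classes, as in the following sketch. The field names, the `urn:example:` URIs and the helper `described_media` are hypothetical illustrations of the description above rather than the framework's actual structures. Links are stored as URIs, mirroring the URI-based identification of nodes.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class DNode:
    """Data node: physical audio-visual material of any media type, identified by a URI."""
    uri: str
    media_type: str    # e.g. "video", "audio", "3D animation"
    fmt: str           # technical format, e.g. "mpg" or "avi"
    size_bytes: int
    duration_s: float


@dataclass
class CANode:
    """Conceptual annotation node: high-level descriptors, usually entered manually."""
    uri: str
    schema: str                                        # schema this node instantiates
    descriptors: dict[str, str] = field(default_factory=dict)
    links: list[str] = field(default_factory=list)     # URIs of referenced nodes

    def link_to(self, other_uri: str) -> None:
        """Reference another node (D-node or CA-node) by its URI, without duplicates."""
        if other_uri not in self.links:
            self.links.append(other_uri)


Node = Union[DNode, CANode]


def described_media(registry: dict[str, Node], ca: CANode) -> list[DNode]:
    """Follow a CA-node's references and return the registered D-nodes it annotates."""
    found = []
    for uri in ca.links:
        node = registry.get(uri)
        if isinstance(node, DNode):
            found.append(node)
    return found


film = DNode("urn:example:dnode/gorilla-film", "video", "mpg", 734_003_200, 3120.0)
piece = CANode("urn:example:canode/piece-7", "DocumentaryPiece",
               {"mood": "romantic", "setting": "Africa"})
piece.link_to(film.uri)
piece.link_to("urn:example:canode/clip-2")  # another CA-node, not registered here
registry: dict[str, Node] = {film.uri: film, piece.uri: piece}
print([d.uri for d in described_media(registry, piece)])  # ['urn:example:dnode/gorilla-film']
```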
Each node is best understood as an instantiated schema. The number of available node
schemata is restricted, so indexing and classification can be performed in a
controlled way, whereas the descriptional information space may contain anywhere
from one node to n nodes. The obvious choice for representing CA-nodes, each of which
describes audio-visual content, would have been the DDL of MPEG-7 or the schemata
suggested by MPEG-7. The MPEG-7 standard (Martinez et al., 2002; Salembier and
Smith, 2002) concentrates on multimedia content description and constitutes the most
extensive effort in multimedia description. It is based on a set of XML Schemas that
define 1,182 elements, 417 attributes and 377 complex types. It is divided into four main
components:
(1) the Description Definition Language (DDL, the basic building blocks for the
MPEG-7 metadata language);
(2) audio (the descriptive elements for audio);
(3) visual (those for video); and
(4) the Multimedia Description Schemes (MDS, the descriptors for capturing the
semantic aspects of multimedia contents, e.g. places, documentarists, objects,
events, etc).
We chose not to use MPEG-7 because the main weakness of the standard is that formal
semantics are not included in the definition of the descriptors in a way that can easily
be implemented in a system (Nack et al., 2005). Instead, we chose XML Schema as the
representational scheme for the documentary media because of its simplicity and
maturity. The trade-off is that, with XML technologies, a great part of the semantics
remains implicit; therefore, each time an application is developed, the semantics must be
extracted from the standard and re-implemented.
For our documentary media environment, we have developed a set of 14 schemata
that describe the denotative and technical content of documentary video. The
schemata are designed so that they can be instantiated or authored semi-automatically.
They are shown in Table II.
The XML Schema representation of the 14 schemata can be found in Subsection 5.4.
With these schemata one can browse (e.g. by documentary, action, documentarist,
documentary piece, culture or object) and perform semantic searches (e.g. "show me all
natural history documentary pieces").
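To illustrate how the authoring module might turn annotations into instances of these schemata, the sketch below builds an XML instance of a documentary piece with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`DocumentaryPiece`, `Mood`, `Action`, `LifeSpan`, `start`, `end`) and the identifier spellings of the 14 schemata are assumptions loosely based on Table II; the system's own XML Schema definitions appear only as figures and are not reproduced here. Rejecting names outside the fixed set reflects the restricted number of node schemata.

```python
import xml.etree.ElementTree as ET

# The 14 schemata of Table II, written as XML-friendly identifiers (an assumption).
SCHEMATA = {
    "Documentary", "DocumentaryClip", "DocumentaryPiece", "Action", "Event",
    "Person", "Emotion", "Setting", "LifeSpan", "Relation", "STRelation",
    "Link", "Resource", "BasicInfo",
}


def instantiate(schema: str, attrs: dict[str, str], children: dict[str, str],
                start: float, end: float) -> ET.Element:
    """Create an XML instance of one schema, anchored in time by a LifeSpan child.

    Only the fixed set of schemata is accepted, which keeps indexing and
    classification controlled.
    """
    if schema not in SCHEMATA:
        raise ValueError(f"unknown schema: {schema}")
    node = ET.Element(schema, attrs)
    for tag, text in children.items():
        ET.SubElement(node, tag).text = text
    # Hypothetical encoding of the life span as two attributes (seconds).
    ET.SubElement(node, "LifeSpan", {"start": f"{start:g}", "end": f"{end:g}"})
    return node


piece = instantiate("DocumentaryPiece", {"id": "piece-7"},
                    {"Mood": "romantic", "Action": "touching gorilla"}, 12.5, 40.0)
print(ET.tostring(piece, encoding="unicode"))
```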
5.3 Relationships
In our framework, all metadata about the actual audio and video streams of the
documentary are organized in the form of a semantic network. A semantic network is a
network that represents semantic relations among concepts. This is often used as a
form of knowledge representation and it is a directed or undirected graph consisting of
vertices, which represent concepts, and edges. Figure 7 depicts a possible semantic net
of a documentary annotation.
From this figure, we can also see the two ways of annotating documentary data,
depending on the requirements of the documentary expert:
(1) either as part of a documentary; or
(2) as a single documentary clip representing one documentary.
Table II. Schemata for documentaries
Documentary: High-level organizational scheme of a documentary presentation containing all documentary clips
Documentary Clip: High-level scheme of a documentary consisting of all annotations and relations to other clips
Documentary Piece: An event representing a meaningful collection of the actions of documentarists
Subject/Documentarist's Action: The basic pose, gesture or action performed by the documentarist
Event: The event that occurs in a documentary clip
Person: Person participating in a documentary, e.g. documentarists, interviewees, narrators, speakers
Emotion: Subjective content such as mood or feeling
Setting: The location, date and time of an event
LifeSpan: Duration with start and end times
Relation: Relations between documentary media elements
STRelation: Spatial-temporal relationships of the documentarist
Link: Connections between the media source and the document schemes
Resource: Relation to any URI address
Basic Info: Basic information about the documentary, such as language, video type, recording information, archive information and access rights
Annotation networks of a documentary, clip, documentary piece or media source can be
interconnected through links and relations. There are two types of connections among
the nodes:
(1) Link type: connects a media source and description nodes (represented by an
arrow).
(2) Relation type: connects different annotation nodes (represented by a line).
A link connects the media source (audio and video files) to the data node along with its
life spans (i.e. on a temporal level). The XML schema representation of the Link type is
shown below.
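The two connection types can be captured in a small typed graph, as in this Python sketch: a `Link` attaches a media source to a description node for a given life span, and a `Relation` joins two annotation nodes with a named relation type. The class names, node identifiers and file name are illustrative assumptions, not the system's schema definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Connects a media source (audio/video file) to a description node over a life span."""
    media_source: str
    node: str
    start: float
    end: float


@dataclass(frozen=True)
class Relation:
    """Connects two annotation (CA) nodes through a typed relation such as 'precedes'."""
    source: str
    rel_type: str
    target: str


class SemanticNetwork:
    """Hypothetical container holding both kinds of connection."""

    def __init__(self) -> None:
        self.links: list[Link] = []
        self.relations: list[Relation] = []

    def add_link(self, link: Link) -> None:
        self.links.append(link)

    def add_relation(self, relation: Relation) -> None:
        self.relations.append(relation)

    def media_for(self, node: str) -> list[Link]:
        """All media segments attached to a description node."""
        return [link for link in self.links if link.node == node]

    def related(self, node: str, rel_type: str) -> list[str]:
        """Nodes reachable from `node` through one relation of the given type."""
        return [r.target for r in self.relations
                if r.source == node and r.rel_type == rel_type]


net = SemanticNetwork()
net.add_link(Link("gorillas.avi", "clip-1", 0.0, 95.0))
net.add_relation(Relation("piece-1", "precedes", "piece-2"))
net.add_relation(Relation("gorilla", "part of", "piece-1"))
print(net.related("piece-1", "precedes"))  # ['piece-2']
```

Keeping links and relations in separate collections mirrors the distinction between the arrow (media-to-description) and line (description-to-description) connections.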
5.4 Description schemes for documentaries in XML Schema
The XML schema representation of the relation types is presented hereafter (Figures 8-10).
In our environment, DocumentaryDS and DocumentaryClipDS hold link types,
enabling connections to the documentary video and audio sources. Note that these two
description schemes serve as entry points to the semantic network. Our front-end
annotation tool performs the semi-automatic instantiation of links. Relation types
connect the description schemes that are represented as CA-nodes. Up to m
relationships may exist between two nodes, and we define the following relations for
our documentary media environment:
• For events: follows, precedes.
• For character, setting, object: part of, association, before, equal, meets, overlaps,
during, starts, finishes.
• For documentary pieces: two temporal semantic relationships, follows and precedes.
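The temporal relations listed for characters, settings and objects (before, equal, meets, overlaps, during, starts, finishes) match the basic relations of Allen's interval algebra. As a sketch of how such a relation might be derived semi-automatically from two life spans, the Python function below classifies a pair of intervals. It assumes start < end for both intervals, and the `-inverse` suffix for the converse relations is a naming convention introduced here, not one used by the system.

```python
def allen_relation(a: tuple[float, float], b: tuple[float, float]) -> str:
    """Classify interval a = (start, end) against b using Allen's 13 basic relations.

    Returns before, meets, overlaps, starts, during, finishes or equal, or the
    converse of one of the first six written with an '-inverse' suffix.
    """
    (a1, a2), (b1, b2) = a, b
    if a1 >= a2 or b1 >= b2:
        raise ValueError("intervals must satisfy start < end")
    if a1 == b1 and a2 == b2:
        return "equal"
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a1 == b1:
        return "starts" if a2 < b2 else "starts-inverse"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finishes-inverse"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "during-inverse"
    if a1 < b1 < a2 < b2:
        return "overlaps"
    # Remaining cases: b is before, meets or overlaps a, so classify b against a.
    return allen_relation(b, a) + "-inverse"


# Life spans (in frames) of "gorilla eating banana" and "documentarist speaking".
print(allen_relation((35, 64), (0, 79)))  # during
```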
These temporal semantic relationships help to infer the type of documentary during
query processing. In our environment, relationships are instantiated semi-automatically
by the tool.
Figure 7. A semantic net of a documentary annotation
We now introduce our documentary annotation and querying tool, which instantiates the
description schemes designed on the basis of the semantic net concepts. We then
introduce our search engine, which allows users to browse and query documentary
features in order to compose new documentaries and for learning purposes.
Figure 8.
6. Tools for documentaries
6.1 Annotation and authoring tool
Documentary experts can annotate a documentary or clip while watching the running
video in the annotation tool. The video player provides all the standard facilities, such
as play, start, stop, pause and replay. We used the Cinepak codec to convert the running
video (WinAmp media file) to AVI format. The annotation tool allows documentary
experts to annotate documentary pieces using free text and a controlled vocabulary,
independently of the storage organization of the generated annotations. We developed
the annotation tool using J2SE 1.5 and Java Media Framework 2.0. Figure 11 depicts the
GUI of the initial screen for determining the documentary information.
Note that a documentary or a documentary clip constitutes an entry point to the
annotation. The annotation process begins with the documentary expert describing the
metadata about the documentary. The basic metadata (descriptions) that are common
to all documentaries are shown in Table III.
Once the annotation of the documentary has been completed, the documentary
expert can describe individual documentary presentations that are part of that
documentary. We have identified a set of features that correspond to a documentary
clip, as shown in Table IV. The metadata describing a documentary piece that can be
annotated through the annotation tool are listed in Table V. The metadata about
persons, objects and basic media information are shown in Tables VI-VIII,
respectively.
Figure 11. A snapshot of the annotation tool for determining the documentary information
The semi-automated editing suite (Figure 12) provides the documentary expert with an
instant overview of the available material and its essential relations, represented
through the spatial order of its presentation. The documentary expert can mark the
relevant video clips or pieces by pointing at them; the order of pointing indicates the
sequential appearance of the clips or pieces. Based on a simple planner, the editing suite
then composes the documentary clip automatically. At the present stage of development,
our editing suite uses the meta-information obtained from the annotation tool to support
the video editing process.
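The pointing-order composition can be approximated by the following Python sketch. The planner itself is not described beyond the pointing order, so this is an assumption: the selected clips are placed back to back on a timeline in the order in which the expert pointed at them, and each clip's annotated life span determines its length. The `Clip` fields, identifiers and file paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A clip with its annotated life span inside a source media file (seconds)."""
    clip_id: str
    media_uri: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def compose(available: dict[str, Clip], selection: list[str]) -> list[tuple[float, Clip]]:
    """Place the selected clips back to back, in the order they were pointed at.

    Returns (timeline offset, clip) pairs. Unknown identifiers raise KeyError;
    a clip pointed at twice is used only once.
    """
    timeline: list[tuple[float, Clip]] = []
    seen: set[str] = set()
    offset = 0.0
    for clip_id in selection:
        if clip_id in seen:
            continue
        seen.add(clip_id)
        clip = available[clip_id]
        timeline.append((offset, clip))
        offset += clip.duration
    return timeline


library = {
    "c1": Clip("c1", "file:///archive/gorillas.avi", 10.0, 40.0),
    "c2": Clip("c2", "file:///archive/savanna.avi", 0.0, 15.5),
    "c3": Clip("c3", "file:///archive/interview.avi", 120.0, 150.0),
}
plan = compose(library, ["c3", "c1", "c2"])  # order of pointing
print([(offset, clip.clip_id) for offset, clip in plan])  # [(0.0, 'c3'), (30.0, 'c1'), (60.0, 'c2')]
```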
Table III. Metadata of a documentary
Date and time: Date and time of video recording of the documentary
Media locator: Links to video and audio streams
Media format: Format of the video, such as mpg, avi, etc.
Media type: Type of the media, such as video, audio, text, etc.
Title: Name of the documentary
Origin: Originating country of the documentary
Duration: Life span, i.e. length of the documentary in minutes
Table IV. Metadata of a documentary clip
Character name, role, gender, life span: Role played by the documentarist, such as commentarist, presenter, etc. The life span of the character is necessary because the same documentarist may play several roles in a documentary clip
Context: Identifies whether it is a historical documentary, a travel documentary, a documentary without words, etc.
Documentary genre: Such as poetic, expository, observational, participatory, reflexive, performative
Language: Language used by the documentarists in the audio. Several languages may be used in the same documentary
Life span: Duration of the documentary clip
Table V. Metadata of a documentary piece
MoodID: Subjective content such as happy, sorrowful, romantic, etc.
Culture: Indian, western, etc. documentary pieces
Genre: Such as poetic, expository, observational, participatory, reflexive, performative
Mode of documentary speech: Commentary speech, presenter speech, interview speech in shot, overhead interchange, dramatic dialogue
Object: Background and foreground objects used in a documentary piece
Action: Spatial-temporal actions, gestures and poses of the characters
Agent: Body parts involved
Related action: Associated action
Target: Target body part of the opponent, if any
Speed: Slow, medium, fast, gradual ascending, gradual descending
Life span: Duration of the documentary piece
6.2 Search engine
The search engine helps documentary experts design new documentaries and allows
users to view the documentary pieces themselves. In particular, users can search along
many dimensions for specific documentary pieces belonging to a video clip. For
example, users can search for all documentary pieces denoting specific objects such as
the sun or the moon. Users can also search for certain subject actions incorporated into
documentary pieces, or for documentary pieces in which the subject (e.g. the
documentarist) has a certain mood (happy, angry, etc.). In another case, users can search
for documentary pieces in which the speed of delivery of the subject's actions is slow,
medium, fast, gradually ascending or gradually descending. Users can also search for
documentary pieces in which a specific song is played. Finally, users can use this search
engine both as a browsing tool, with several built-in categories of documentary
information, and as a query tool for posing free-text documentary queries.
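Conceptually, this multi-dimensional search filters the annotated documentary pieces on several metadata fields at once. The Python sketch below assumes one flat record per documentary piece, with keys borrowed from Table V (mood, objects, actions, speed) plus a hypothetical song field. It only illustrates the filtering logic, whereas the described engine queries the XML annotations with XQuery.

```python
from typing import Any

# Speed categories for documentary pieces, as listed in Table V.
SPEEDS = {"slow", "medium", "fast", "gradual ascending", "gradual descending"}


def search(pieces: list[dict[str, Any]], **criteria: str) -> list[str]:
    """Return the ids of the pieces that satisfy every criterion.

    A scalar field matches by equality and a list field (e.g. objects, actions)
    by membership; comparisons ignore case. Missing fields never match.
    """
    speed = criteria.get("speed")
    if speed is not None and speed.lower() not in SPEEDS:
        raise ValueError(f"unknown speed category: {speed}")
    hits = []
    for piece in pieces:
        matches = True
        for key, wanted in criteria.items():
            value = piece.get(key)
            if isinstance(value, list):
                matches = wanted.lower() in (v.lower() for v in value)
            else:
                matches = isinstance(value, str) and value.lower() == wanted.lower()
            if not matches:
                break
        if matches:
            hits.append(piece["id"])
    return hits


pieces = [
    {"id": "p1", "mood": "happy", "objects": ["sun", "tree"], "actions": ["walking"], "speed": "slow"},
    {"id": "p2", "mood": "sorrow", "objects": ["moon"], "actions": ["speaking"],
     "speed": "fast", "song": "Night Song"},
    {"id": "p3", "mood": "happy", "objects": ["moon", "sea"], "actions": ["speaking"], "speed": "medium"},
]
print(search(pieces, mood="happy", actions="speaking"))  # ['p3']
```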
The retrieval tool offers the following browsing features:
Documentary: Browse all documentary clips along with the video of their documentary pieces; output is rendered in the output window.
Documentary clip: View all documentary pieces of a clip.
Documentary piece: View all subject/documentarist actions of a particular clip.
Objects: Display all documentary pieces denoting the sun, the moon, etc.
Tempo: Browse the documentary pieces according to speed category.
Mood: Browse according to feelings such as happy, romantic, etc.
Culture: Indian, western, etc.
Documentarist: All documentary pieces featuring a particular documentarist.
Genre: Poetic, expository, observational, participatory, reflexive, performative, etc.
Speech mode: Commentary speech, presenter speech, interview speech in shot, overhead interchange, dramatic dialogue.
Actions: View by specific actions.
Song: View the documentary pieces of a song.
Table VI. Metadata of persons
Name: Name of the person
Function: Commentarist, speaker, interviewee
E-mail: Contact details
Table VII. Metadata of objects
Name: Name of the background or foreground object
Type: Background or foreground object
Number: Number of objects
Shape: Shape of the object (in text)
Color: Color of the object (in text)
Texture: Pattern
Table VIII. Metadata of media
Recording speed: Speed of recording
Camera details: Description of the camera used while recording the documentary
Access rights: Access information
Figure 12. The semi-automated editing suite for documentary clips
Documentary users/spectators can submit their queries in the query window as
free-text keywords. For example, consider the query Q: "Show me all pieces of natural
history documentaries". Our framework uses a semantic information retrieval
mechanism similar to that presented by Chen et al. (2010). The use of semantic
information, especially information derived from spatio-temporal analysis, is of great
value in multimedia annotation, archiving and retrieval. Ren et al. (2009) survey the use
of spatio-temporal semantic knowledge for information-based video retrieval and draw
important conclusions on where future research is headed. Liu and Chen (2009) present a
novel framework for content-based video retrieval. They use an unsupervised learning
method to automatically discover and locate the object of interest in a video clip. This
unsupervised learning algorithm alleviates the need to train a large number of object
recognizers. Regional image characteristics are extracted from the object of interest to
form a set of descriptors for each video, and a novel ensemble-based matching algorithm
compares the similarity between two videos based on the set of descriptors each video
contains. Videos containing large pose, size and lighting variations were used to validate
their approach. Finally, Chen et al. (2010) developed a semantic-enabled information
retrieval mechanism that handles the processing, recognition, extraction, extension and
matching of content semantics, with the following objectives:
• to analyze and determine the semantic features of content, to develop a semantic
pattern that represents the semantic features of the content, and to structuralize and
materialize semantic features;
• to analyze the user's query and extend its implied semantics through semantic
extension, so as to identify more semantic features for matching; and
• to generate content with approximate semantics by matching against the
extended query, so as to provide correct content to the querist.
This mechanism alleviates the traditional problems of keyword search and enables
users to perform semantic-based queries and searches for the required information,
thereby improving the reuse and sharing of information.
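A drastically simplified illustration of semantic query extension is sketched below in Python: each query term keeps weight 1.0, related concepts taken from a hand-written concept map are added with weight 0.5, and documentary pieces are ranked by the summed weights of their matching keywords. This is not Chen et al.'s (2010) mechanism; the concept map, the weights and the keyword sets are hypothetical.

```python
# Hypothetical concept map used to extend query terms with related concepts.
RELATED = {
    "wildlife": {"animal", "gorilla", "lion", "nature"},
    "nature": {"wildlife", "landscape", "forest"},
    "africa": {"savanna", "kenya", "congo"},
}


def extend(terms: list[str]) -> dict[str, float]:
    """Give each query term weight 1.0 and each directly related concept weight 0.5."""
    weights = {term: 1.0 for term in terms}
    for term in terms:
        for concept in RELATED.get(term, set()):
            weights.setdefault(concept, 0.5)  # never downgrade an original term
    return weights


def rank(pieces: dict[str, set[str]], terms: list[str]) -> list[tuple[str, float]]:
    """Score each piece by the summed weights of its matching keywords."""
    weights = extend([t.lower() for t in terms])
    scored = []
    for piece_id, keywords in pieces.items():
        score = sum(weights.get(k.lower(), 0.0) for k in keywords)
        if score > 0:
            scored.append((piece_id, score))
    # Highest score first; ties broken by identifier for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


pieces = {
    "p1": {"gorilla", "forest", "congo"},
    "p2": {"wildlife", "africa"},
    "p3": {"city", "traffic"},
}
print(rank(pieces, ["wildlife", "Africa"]))  # [('p2', 2.0), ('p1', 1.0)]
```

Here a plain keyword search for "wildlife Africa" would miss `p1` entirely, whereas the extended query still retrieves it through the related concepts "gorilla" and "congo", ranked below the exact match `p2`.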
7. Future work: an ontology for video documentaries
Multimedia ontologies (especially MPEG-7-based ontologies) have the potential to
increase the interoperability of applications producing and consuming multimedia
annotations. Hunter (2003) provided the first attempt to model parts of MPEG-7 in
RDFS, later integrated with the ABC model. Tsinaraki et al. (2004) start from the core
of this ontology and extend it to cover the full Multimedia Description Scheme (MDS)
part of MPEG-7, in an OWL DL ontology. Isaac and Troncy (2004) proposed a core
audio-visual ontology inspired by several terminologies, such as MPEG-7, TV Anytime
and ProgramGuideML, while Garcia and Celma (2005) produced the first complete
MPEG-7 ontology, automatically generated using a generic mapping from XSD to
OWL. All these methods perform a one-to-one translation of MPEG-7 types into OWL
concepts and properties. However, this translation does not guarantee that the intended
semantics of MPEG-7 are fully captured and formalized; on the contrary, the problems of
syntactic interoperability and conceptual ambiguity remain.
A video documentary ontology can increase the interoperability of documentary
authoring tools. It can represent documentary concepts and their relationships, which
will help to retrieve the required results. From another perspective, applying
multimedia reasoning techniques on top of semantic multimedia annotations can make a
multimedia authoring application more intelligent (Van Ossenbruggen et al., 2004).
Currently, we are engaged in representing the complete media semantics of a documentary
using the Web Ontology Language (OWL) (Smith et al., 2004), with the aim of fully
describing the video documentary ontology. In the near future, we will examine how we
can raise the quality of documentary annotation and improve the usability of
content-based video search and retrieval systems. Figure 13 depicts a portion of our
ontology for documentaries.
8. Conclusions
Tools for automatically understanding video are required in the documentary domain.
Semantics-based annotations will break the traditional linear manner of accessing and
browsing documentaries and will support vignette-oriented access of audio and video.
In this paper, we have presented a framework for the modeling, annotation, and
retrieval of media documents, applied to the documentary domain. Using a basic set
of 14 semantic description schemes, we demonstrated how a documentary video can be
annotated and how this information can be used for retrieval to support
documentary design. We emphasized tools and technologies for the manual annotation
of documentary media objects. Flexible annotation facilities based on semantic
networks are required to support documentary creativity, because the annotation
process is dynamic and annotations can grow over time. We have proposed a flexible
organization of media content description and the related media data; this
organization requires an adaptable construction in the form of a semantic network.
The proposed concept features three significant functions, which make it suitable as a
platform for supporting the needs of documentary production:
(1) It provides semantic and technical memory structures (i.e. information nodes)
with the capability to change and grow, allowing an ongoing task-specific
process of inspection and interpretation of source material.
(2) Our approach facilitates the dynamic use of audio-visual material using links,
enabling the connection from multi-layered information nodes to data on a
temporal, spatial and spatial-temporal level. Moreover, since the description of
media content holds constant for the associated time interval, we are in a
position to handle multiple content descriptions for the same media unit and
also to handle gaps.
(3) It enables the semantic connection between information nodes using typed
relations, thus structuring the information space on a semantic as well as syntactic
level.
We believe that our approach (audio-visual strategy) can be used to improve training
and education in documentary communication, and to this end we have also outlined
future efforts to create an ontology for video documentaries with enhanced annotation.
Figure 13. A part of the domain ontology for documentaries
References
Agius, H. and Angelides, M. (2005), “COSMOS-7: video-oriented MPEG-7 scheme for modeling
and filtering of semantic content”, The Computer Journal, Vol. 48 No. 5, pp. 545-62.
Aguierre Smith, T.G. and Davenport, G. (1992), “The stratification system: a design environment
for random access video”, Proceedings of the ACM Workshop on Networking and
Operating System Support for Digital Audio and Video, San Diego, CA, Lecture Notes in
Computer Science, Vol. 712, Springer, Berlin, pp. 250-61.
Aubert, O., Champin, P.-A., Prié, Y. and Richard, B. (2008), “Canonical processes in active reading
and hypervideo production”, Multimedia Systems Journal, Vol. 14 No. 6, pp. 427-33.
Barnouw, E. (1993), Documentary: A History of the Non-fiction Film, Oxford University Press,
Oxford.
Benitez, A., Paek, S., Chang, S.-F., Puri, A., Huang, Q., Smith, J., Li, C.-S., Bergman, L. and Judice,
C. (2000), “Object-based multimedia content description schemes and applications for
MPEG-7”, Signal Processing: Image Communication, Vol. 16 Nos 1/2, pp. 235-69.
Bocconi, S., Nack, F. and Hardman, L. (2008), “Automatic generation of matter-of-opinion video
documentaries”, Journal of Web Semantics, Vol. 6, pp. 139-50.
Brachman, R.J. and Levesque, H.J. (1983), Readings in Knowledge Representation, Morgan
Kaufmann, San Mateo, CA.
Burger, T. (2008), “The need for formalizing media semantics in the games and entertainment
industry”, Journal of Universal Computer Science, Vol. 14 No. 10, pp. 1775-91.
Chakravarthy, A., Ciravegna, F. and Lanfranchi, V. (2006), “Cross-media document annotation
and enrichment”, Proceedings of the 1st Semantic Authoring and Annotation Workshop
(SAAW 2006), Athens, GA, November 6.
Chakravarthy, A., Beales, R., Matskanis, N. and Yang, X. (2009), “OntoFilm: a core ontology for
film production”, in Chua, T.-S., Kompatsiaris, Y., Mérialdo, B., Haas, W., Thallinger, G.
and Bailer, W. (Eds), Proceedings of the 4th International Conference on Semantic and
Digital Media Technologies (SAMT 2009), Lecture Notes in Computer Science, Vol. 5887,
Springer, Berlin, pp. 177-81.
Chen, M.-Y., Chu, H.-C. and Chen, Y.M. (2010), “Developing a semantic-enable information
retrieval mechanism”, Expert Systems with Applications, Vol. 37 No. 1, pp. 322-40.
Choi, I. (2010), “From tradition to emerging practice: a hybrid computational production model
for interactive documentary”, Entertainment Computing, Vol. 1 Nos 3/4, pp. 105-17.
Costa, M., Correia, N. and Guimaraes, N. (2002), “Annotations as multiple perspectives of video
content”, Proceedings of the ACM Conference on Multimedia, San Francisco, CA,
2-7 November, pp. 283-6.
Dorai, C. and Venkatesh, S. (2001), “Computational media aesthetics: finding meaning beautiful”,
IEEE Multimedia, Vol. 8 No. 4, pp. 10-12.
Dorai, C., Mauthe, A., Nack, F., Rutledge, L., Sikora, T. and Zettl, H. (2002), “Media semantics: who
needs it and why?”, Proceedings of Multimedia ’02, December 1-6, Juan-les-Pins, pp. 580-3.
Eco, U. (1997), A Theory of Semiotics, Macmillan, London.
Garcia, R. and Celma, O. (2005), “Semantic integration and retrieval of multimedia metadata”,
Proceedings of the Fifth International Workshop on Knowledge Markup and Semantic
Annotation, 7 November, Galway.
Geurts, J., Bocconi, S., van Ossenbruggen, J. and Hardman, L. (2003), “Towards ontology-driven
discourse: from semantic graphs to multimedia presentations”, in Fensel, D., Sycara, K.
and Mylopoulos, J. (Eds), Proceedings of the Second International Semantic Web
Conference (ISWC 2003), Sanibel Island, FL, 20-23 October, Springer, Berlin.
Hardman, L., Obrenovic, Ž., Nack, F., Kerhervé, B. and Piersol, K. (2008), “Canonical processes of
semantically annotated media production”, Multimedia Systems, Vol. 14, pp. 327-40.
Harry, W.A. and Angelides, M.C. (2001), “Modeling content for semantic level querying of
multimedia”, Multimedia Tools and Applications, Vol. 15 No. 1, pp. 5-37.
Hunter, J. (2003), “Enhancing the semantic interoperability of multimedia through a core
ontology”, IEEE Transactions: Circuits and Systems for Video Technology, Vol. 13 No. 1,
pp. 49-58.
IBM (n.d.), “alphaWorks community, VideoAnnEx annotation tool”, available at:
www.alphaworks.ibm.com/tech/videoannex
IMDb (n.d.), “World Travels”, available at: www.imdb.com/title/tt1392723/
Isaac, A. and Troncy, R. (2004), “Designing and using an audio-visual description core ontology”,
paper presented at the Workshop on Core Ontologies in Ontology Engineering, 5-8
October, Whittlebury.
Korsakow (n.d.), “Korsakow system”, available at: www.korsakow.com/ksy/index.html
Little, S., Geurts, J. and Hunter, J. (2002), “Dynamic generation of intelligent multimedia
presentations through semantic inferencing”, Proceedings of the 6th European Conference
on Research and Advanced Technology for Digital Libraries, Pontifical Gregorian
University, Rome, Springer, Berlin.
Liu, D. and Chen, T. (2009), “Video retrieval based on object discovery”, Computer Vision and
Image Understanding, Vol. 113 No. 3, pp. 397-404.
Martinez, J., Koenen, R. and Pereira, F. (2002), “MPEG-7 – The generic multimedia content
description standard Part 1”, IEEE MultiMedia Magazine, Vol. 9 No. 2, pp. 78-87.
Mateas, M. (2000), “Generation of ideologically-biased historical documentaries”, Proceedings of
the 17th National Conference on Artificial Intelligence and Innovative Applications of
Artificial Intelligence Conference (AAAI-00), Austin, TX, pp. 36-42.
Murtaugh, M. (1996), “The automatist storytelling system”, PhD thesis, Massachusetts Institute
of Technology, available at: http://alumni.media.mit.edu/~murtaugh/thesis/
Nack, F. and Parkes, A. (1997), “Towards the automated editing of theme-oriented video
sequences”, Applied Artificial Intelligence, Vol. 11 No. 4, pp. 331-66.
Nack, F. and Putz, W. (2004), “Saying what it means: semi-automated (news) media annotation”,
Multimedia Tools and Applications, Vol. 22 No. 3, pp. 263-302.
Nack, F., Ossenbruggen, J.v. and Hardman, L. (2005), “That obscure object of desire: multimedia
metadata on the web (Part II)”, IEEE Multimedia, Vol. 12 No. 1, pp. 54-63.
Nichols, B. (2001), “What types of documentary are there?”, Introduction to Documentary,
Indiana University Press, Bloomington, IN, pp. 99-138.
Parkes, A.P. (1989), “Settings and the settings structure: the description and automated
propagation of networks for perusing videodisk image states”, in Belkin, N.J. and
Rijsbergen, C.J. (Eds), Proceedings of SIG Information Retrieval ’89, Cambridge, MA, ACM
Press, New York, NY, pp. 229-38.
Pereira, F., Vetro, A. and Sikora, T. (2008), “Multimedia retrieval and delivery: essential metadata
challenges and standards”, Proceedings of the IEEE, Vol. 96 No. 4, pp. 721-44.
Ramadoss, B. and Rajkumar, K. (2007), “Semi-automated annotation and retrieval of dance media
objects”, Cybernetics and Systems, Vol. 38 No. 4, pp. 349-79.
Ren, W., Singh, S., Singh, M. and Zhu, Y.S. (2009), “State-of-the-art on spatio-temporal
information-based video retrieval”, Pattern Recognition, Vol. 42 No. 2, pp. 267-82.
Rijksmuseum (n.d.), available at: www.rijksmuseum.nl
Rincon, M. and Martinez-Cantos, J. (2007), “An annotation tool for video understanding”,
in Moreno-Díaz, R., Pichler, F. and Quesada Arencibia, A. (Eds), Proceedings of the
11th International Conference on Computer Aided Systems Theory and Technology
(EUROCAST 2007), Las Palmas, 12-16 February, Lecture Notes in Computer Science,
Vol. 4739, Springer, Berlin, pp. 701-8.
Rosenthal, A. and Corner, J. (2005), New Challenges for Documentary, 2nd ed., Manchester
University Press, Manchester.
Ryn, J., Sohn, Y. and Kin, M. (1989), “MPEG-7 metadata authoring tool”, Proceedings of the ACM
Conference on Multimedia, pp. 267-70.
Salembier, P. and Smith, J. (2002), “Overview of MPEG-7 multimedia description schemes and
schema tools”, in Manjunath, B.S., Salembier, P. and Sikora, T. (Eds), Introduction to
MPEG-7: Multimedia Content Description Interface, Wiley, Chichester.
Sarkar, A. and Tripasai, W. (2002), “Learning verb argument structure from minimally
annotated corpora”, Proceedings of the 19th International Conference on Computational
Linguistics, August 24-September Vol. 1, Taipei, pp. 1-7.
Smith, M.K., Welty, C. and McGuinness, D.L. (2004), “OWL web ontology language,
W3C recommendation”, available at: www.w3c.org/TR/owl-guide/
Soft Cinema (n.d.), available at: www.softcinema.net
SourceForge.net (n.d.), “VIPER-GT annotation tool”, available at: http://viper-toolkit.sourceforge.net
The Language Archive (n.d.), “ELAN annotation tool”, available at: www.lat-mpi.eu/tools/elan
Tien, T.T. and Cecile, R. (2003), “Multimedia modeling using MPEG-7 for authoring multimedia
integration”, Proceedings of the ACM Conference on Multimedia Information Retrieval,
pp. 171-8.
Tsinaraki, C., Polydoros, P. and Christodoulakis, S. (2004), “Integration of OWL ontologies in
MPEG-7 and TVAnytime compliant semantic indexing”, Proceedings of the 16th
International Conference on Advanced Information Systems Engineering (CAiSE 2004),
Riga, June 7-11, pp. 143-61.
(The) University of Edinburgh (n.d.), “CAVIAR: Context Aware Vision using Image-based
Active Recognition”, available at: http://homepages.inf.ed.ac.uk/rbf/CAVIAR/
Van Ossenbruggen, J., Nack, F. and Hardman, L. (2004), “That obscure object of desire:
multimedia metadata on the Web (Part I)”, IEEE Multimedia, Vol. 11 No. 4, pp. 38-48.
About the author
Dimitris Kanellopoulos holds a PhD in multimedia communications from the Department of
Electrical and Computer Engineering of the University of Patras, Greece. He is a member of the
Educational Software Development Laboratory in the Department of Mathematics at the
University of Patras. His research interests include multimedia communications, knowledge
representation, intelligent systems, and Web engineering. He has authored many papers in
international journals and conferences in these areas. He serves as a member of the editorial
boards of ten academic journals. Dimitris Kanellopoulos can be contacted at:
d_kan2006@yahoo.gr