1. Linking Media and Data using Apache Marmotta
Keynote at LIME 2014 Workshop
Sebastian Schaffert and Thomas Kurz
2. Contents
➔Motivation: The Red Bull Content Pool
➔Background:
➔ Linked Media Principles
➔ Media Fragments and Media Ontology
➔Implementation: Linked Media Framework
➔ Red Bull Use Case
➔ ConnectMe Use Case
➔Standardising: The Linked Data Platform
➔Introducing Apache Marmotta
➔Querying for Multimedia Fragments: SPARQL-MM
5. Motivation: The Red Bull Content Pool
➔ online archive containing video and image material related to
extreme sports events organised by Red Bull
➔ business-to-business portal where journalists can get material for
further broadcasting (mostly for free)
➔ material comes with metadata in the form of tables in Word
documents:
➔ interview transcriptions (with time interval start/end second)
➔ scene descriptions (with time interval start/end second)
➔ music cue sheets (copyright information about background
music tracks)
7. Motivation: The Red Bull Content Pool
➔Problems:
➔ videos consist of a series of scenes featuring many different
people
➔ scanning through a video to find a particular scene is a
huge amount of work
➔ metadata is valuable but barely exploited, either for searching
videos or during playback
8. Can we help Markus?
Name: Markus
Occupation: sports journalist
Company: RegioTV Pinzgau
Objective: create report about cliff diving
Requires: videos, background info, contacts
How can we help Markus?
efficient and precise search in the Red Bull Content Pool
compact and relevant display of background information
contacts (e.g. website, email) of athletes, other journalists, etc.
fast and successful creation of the report
10. Linked Media Principles (2009)
➔ Linked Data is "read-only"
i.e. the focus was on publishing large datasets, not on interacting
with the data
a system for managing media assets needs to be capable of
updating resources and their metadata
➔ Linked Data is "data-only"
i.e. a resource is represented either as RDF metadata for
machines or as HTML tables for humans, but in both cases it is
metadata and not content
a system for managing media assets needs to be capable of
managing both media content and metadata about that content
11. Linked Media Principles (2009)
➔ extend Linked Data for updates using REST principles (HTTP):
➔ GET: returns a resource (as in Linked Data)
➔ POST: creates a new resource and uploads content or metadata
➔ PUT: updates content or metadata of a resource
➔ DELETE: removes a resource and all associated information
➔ extend Linked Data for arbitrary media formats using MIME:
➔ controlled by the Accept header (for GET) and the Content-Type header (for
PUT/POST)
➔ header value: MIME type (e.g. text/turtle or image/jpeg) and type of
relationship (e.g. rel=content or rel=meta)
➔ accessing a resource with GET or PUT redirects to the actual
representation specified by MIME type and relationship
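The server-side redirect decision described above can be sketched as follows. This is a minimal sketch, not the Linked Media Framework's code: it parses a header value such as "text/turtle; rel=meta" into a MIME type and relationship, and maps it to a hypothetical URL layout {base}/{rel}/{mime}/{id}, which is an assumption chosen only for illustration.

```python
from typing import Tuple
from urllib.parse import quote


def parse_media_header(value: str, default_rel: str = "content") -> Tuple[str, str]:
    """Split a header value such as 'text/turtle; rel=meta' into (mime, rel).

    Only the first media range is used; q-values and other parameters are ignored.
    """
    first = value.split(",")[0]
    parts = [p.strip() for p in first.split(";")]
    mime = parts[0].lower()
    rel = default_rel
    for param in parts[1:]:
        name, _, val = param.partition("=")
        if name.strip().lower() == "rel":
            rel = val.strip().strip('"').lower()
    return mime, rel


def redirect_target(base: str, resource_id: str, header_value: str) -> str:
    """URL of the representation selected by MIME type and relationship.

    The layout {base}/{rel}/{mime}/{resource_id} is a HYPOTHETICAL scheme
    used for illustration only.
    """
    mime, rel = parse_media_header(header_value)
    if rel not in ("content", "meta"):
        raise ValueError(f"unsupported relationship: {rel}")
    return f"{base}/{rel}/{quote(mime, safe='/')}/{resource_id}"
```

For example, `redirect_target("http://data.redlink.io", "1234", "text/turtle; rel=meta")` yields `http://data.redlink.io/meta/text/turtle/1234`; a server following these principles would typically send such a URL back in the Location header of a redirect response.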
12. Linked Media Principles (2009)
➔ Example 1: Retrieve HTML table representation of resource metadata
GET http://data.redlink.io/resource/1234
Accept: text/html; rel=meta
➔ Example 2: Retrieve HTML content of resource
GET http://data.redlink.io/resource/1234
Accept: text/html; rel=content
➔ Example 3: Update resource metadata
PUT http://data.redlink.io/resource/1234
Content-Type: text/turtle; rel=meta
@prefix ma: <http://www.w3.org/ns/ma-ont#> .
<http://data.redlink.io/resource/1234>
ma:hasFragment <http://data.redlink.io/resource/1234#t=0,10> .
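From the client side, the three example requests can be prepared with Python's standard library. This sketch only builds the requests and does not send them. It assumes that the metadata property is the Ontology for Media Resources term `ma:hasFragment`, and `linked_media_request` is a hypothetical helper name.

```python
import urllib.request
from typing import Optional

RESOURCE = "http://data.redlink.io/resource/1234"


def linked_media_request(method: str, media: str,
                         body: Optional[bytes] = None) -> urllib.request.Request:
    """Prepare (but do not send) a request: reads use Accept, writes use Content-Type."""
    header = "Accept" if body is None else "Content-Type"
    return urllib.request.Request(RESOURCE, data=body, headers={header: media}, method=method)


# Turtle body for Example 3 (assumes ma:hasFragment from the Ontology for Media Resources).
turtle = (b"@prefix ma: <http://www.w3.org/ns/ma-ont#> .\n"
          b"<http://data.redlink.io/resource/1234>\n"
          b"    ma:hasFragment <http://data.redlink.io/resource/1234#t=0,10> .\n")

examples = [
    linked_media_request("GET", "text/html; rel=meta"),            # Example 1
    linked_media_request("GET", "text/html; rel=content"),         # Example 2
    linked_media_request("PUT", "text/turtle; rel=meta", turtle),  # Example 3
]
# urllib.request.urlopen(examples[0]) would actually send the first request.
```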
14. Media Fragments URI
➔ media content is currently treated as "black box" binary content
➔ interaction is only possible via plugins or special browser support
➔ linking to a subsequence of a video is not possible
➔ Media Fragments URI: use the "fragment" part of a URI to
encode temporal and spatial subsequences
➔ Examples:
Identify the sequence from second 3 to second 10 of the video:
http://data.redlink.io/resource/cliff_diving.ogg#t=3,10
Identify the 320x240 region of the video whose top-left corner is at x=160, y=120:
http://data.redlink.io/resource/cliff_diving.ogg#xywh=160,120,320,240
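To make the two fragment dimensions concrete, here is a simplified parser sketch. It assumes temporal values are plain seconds (optionally prefixed with `npt:`) and ignores the clock-time formats that Media Fragments URI 1.0 also allows. Spatial values default to pixels unless prefixed with `pixel:` or `percent:`.

```python
from urllib.parse import urlsplit


def parse_media_fragment(uri: str) -> dict:
    """Extract the temporal (t) and spatial (xywh) dimensions of a Media Fragment URI.

    Simplified: t only in seconds, xywh only with integer values.
    Unknown dimensions are ignored.
    """
    result = {}
    fragment = urlsplit(uri).fragment
    for pair in filter(None, fragment.split("&")):
        name, _, value = pair.partition("=")
        if name == "t":
            if value.startswith("npt:"):
                value = value[4:]
            start, _, end = value.partition(",")
            # A missing start means 0; a missing end means "until the end of the media".
            result["t"] = (float(start) if start else 0.0,
                           float(end) if end else None)
        elif name == "xywh":
            unit = "pixel"
            if value.startswith(("pixel:", "percent:")):
                unit, _, value = value.partition(":")
            x, y, w, h = (int(v) for v in value.split(","))
            result["xywh"] = (unit, x, y, w, h)
    return result
```

Applied to the two example URIs above, this returns `{"t": (3.0, 10.0)}` and `{"xywh": ("pixel", 160, 120, 320, 240)}`.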
15. Ontology for Media Resources
➔ common data model for representing video metadata:
➔ identification
➔ creation (hasCreator, hasPublisher, ...)
➔ content description (hasLanguage, hasGenre, hasKeyword,...)
➔ rights and distribution (hasPermissions, hasTargetAudience, ...)
➔ technical properties (hasCompression, hasFormat, ...)
➔ fragments (hasFragment, hasChapter, ...)
➔ mapping tables from the most popular media metadata formats to
the Ontology for Media Resources (EXIF, MPEG-7, TV-Anytime,
YouTube, ID3)
16. Combining Media Fragments and Media Ontology
➔ use Media Fragment URIs to uniquely identify fragments of
media content
➔ browser compatibility
➔ Linked Data compatibility
➔ use Ontology for Media Resources to describe these fragments
➔ RDF compatibility
➔ rich description graph with SPARQL querying
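The combination can be illustrated with a small generator of N-Triples lines. It uses a Media Fragment URI as the identifier and describes it with Ontology for Media Resources terms (`ma:hasFragment`, `ma:title`) plus Dublin Core `dct:subject`. This is a minimal sketch. The person URI in the usage example is hypothetical, and a real system would use an RDF library rather than string formatting.

```python
from typing import List

MA = "http://www.w3.org/ns/ma-ont#"
DCT = "http://purl.org/dc/terms/"


def fragment_triples(video: str, fragment: str, subject: str, title: str) -> List[str]:
    """Describe a media fragment as N-Triples lines (a sketch, not a full serializer)."""
    fragment_uri = f"{video}#{fragment}"  # the Media Fragment URI identifies the fragment
    # Escape characters that N-Triples string literals must not contain raw.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return [
        f"<{video}> <{MA}hasFragment> <{fragment_uri}> .",
        f"<{fragment_uri}> <{MA}title> \"{escaped}\" .",
        f"<{fragment_uri}> <{DCT}subject> <{subject}> .",
    ]
```

For instance, `fragment_triples("http://data.redlink.io/resource/cliff_diving.ogg", "t=3,10", "http://data.redlink.io/resource/lewis_jones", "Jump")` links the video to its fragment `#t=3,10`, which can then be queried with SPARQL like any other resource.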
20. Behind the Scenes: Linked Media Framework
Linked Data Server with updates and uniform management of content and
metadata => particularly well-suited for multimedia content and metadata!
Linked Media Principles for resource-centric access to content and
metadata
SPARQL Query and SPARQL Update 1.1 for structural updating and
querying
Modules for Reasoning, Semantic Search, Linked Data Caching, Versioning,
and Social Media
Specialised in Linked Media and Linked Enterprise Content
Code, Installer, Screencasts and more:
http://code.google.com/p/lmf/
22. LMF Semantic Search
Faceted Search over Content and Metadata with Solr-compatible API
RDF Path Language for configurable Metadata Indexing
Multiple Cores with different configurations to adapt to different search
requirements
23. LMF Reasoning
Rule-based reasoning over triples in the LMF triple store to represent implicit
knowledge
Reason maintenance records justifications for inferences, so that inferred
triples can be retracted when the triples they depend on are removed
an adapted version of the sKWRL rule language, with:
a more efficient implementation,
improved reason maintenance
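Reason maintenance can be illustrated with a toy forward chainer. It is not sKWRL. The single hard-coded transitivity rule and the label scheme are assumptions for illustration: each derived triple stores the sets of explicit triples ("environments") that justify it. Removing an explicit triple then retracts exactly those inferences that no longer have any justification.

```python
from itertools import product
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Environment = FrozenSet[Triple]  # explicit triples that jointly justify a triple


class TransitiveReasoner:
    """Toy rule engine for (a p b), (b p c) => (a p c), with reason maintenance."""

    def __init__(self, prop: str):
        self.prop = prop
        self.labels: Dict[Triple, Set[Environment]] = {}

    def triples(self) -> Set[Triple]:
        """All triples that currently have at least one justification."""
        return {t for t, envs in self.labels.items() if envs}

    def add(self, triple: Triple) -> None:
        # An explicit triple justifies itself.
        self.labels.setdefault(triple, set()).add(frozenset({triple}))
        self._saturate()

    def _saturate(self) -> None:
        changed = True
        while changed:
            changed = False
            facts = [t for t in self.triples() if t[1] == self.prop]
            for left, right in product(facts, facts):
                if left[2] != right[0]:
                    continue
                inferred = (left[0], self.prop, right[2])
                envs = self.labels.setdefault(inferred, set())
                # Copy premise labels first: `inferred` may itself be a premise.
                for e1, e2 in product(list(self.labels[left]), list(self.labels[right])):
                    env = e1 | e2
                    if env not in envs:
                        envs.add(env)
                        changed = True

    def remove(self, triple: Triple) -> None:
        """Remove an explicit triple and every justification that relied on it."""
        for envs in self.labels.values():
            envs.difference_update({e for e in envs if triple in e})
        self.labels = {t: envs for t, envs in self.labels.items() if envs}
```

With `partOf` facts `scene1 → video1` and `video1 → event1`, the reasoner infers `scene1 partOf event1`. Removing `video1 partOf event1` retracts that inference again, unless it was also asserted explicitly.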
24. LMF Linked Data Caching
transparently retrieves linked resources from the Linked Data cloud when needed
(e.g. when evaluating an LDPath expression or a SPARQL query)
powerful component for integrating with other information systems that expose their
data as Linked Media or Linked Data
adapters for services offering their data in proprietary formats (e.g. YouTube, Vimeo, …)
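The caching behaviour can be sketched as a read-through cache. On a miss or after expiry, it calls a fetch function that stands in for dereferencing a URI over HTTP or for a proprietary-format adapter. The injected `fetch` callable and the time-to-live expiry policy are assumptions for illustration, not LMF's actual cache strategy.

```python
import time
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]


class LinkedDataCache:
    """Return cached triples for a resource, fetching them on a miss or after expiry."""

    def __init__(self, fetch: Callable[[str], Set[Triple]], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch            # hypothetical retriever, e.g. HTTP dereferencing
        self.ttl = ttl_seconds        # assumed expiry policy
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Set[Triple]]] = {}

    def get(self, uri: str) -> Set[Triple]:
        now = self.clock()
        entry = self._entries.get(uri)
        if entry is None or now - entry[0] > self.ttl:
            triples = self.fetch(uri)  # transparent retrieval when needed
            self._entries[uri] = (now, triples)
            return triples
        return entry[1]
```

Injecting the clock keeps the expiry logic testable without waiting. A query engine would call `get` whenever a path or query step reaches a resource that is not stored locally.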
25. LMF Classification and Sentiment Analysis
supports statistical text classification; different classifiers can be trained with sample
texts for arbitrary categories
suggests the most likely category for a text according to its similarity with the training data
analyses text for positive or negative sentiment (German and English)
26. LMF Social Media Integration
allows linking to social media resources, e.g. Facebook or Google accounts, videos,
interests
allows authentication and data import from selected social media services
(Facebook, YouTube, generic RSS)
27. LMF Versioning
keeps a history of updates in the Linked Media Framework
provides information on the trust and provenance
of data, e.g. of annotations added to the system
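A minimal versioning sketch follows. Each update is stored as a version containing the triples it actually added and removed, plus its author. This allows the state at an earlier version to be replayed and shows who added a triple, which is the kind of provenance information mentioned above. `VersionedStore` and its method names are hypothetical, not the LMF API.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Version:
    number: int
    author: str
    added: FrozenSet[Triple]
    removed: FrozenSet[Triple]


@dataclass
class VersionedStore:
    triples: Set[Triple] = field(default_factory=set)
    history: List[Version] = field(default_factory=list)

    def update(self, author: str, added: Iterable[Triple] = (),
               removed: Iterable[Triple] = ()) -> Version:
        # Record only the effective changes relative to the current state.
        new = frozenset(added) - self.triples
        gone = frozenset(removed) & self.triples
        self.triples = (self.triples - gone) | new
        version = Version(len(self.history) + 1, author, new, gone)
        self.history.append(version)
        return version

    def state_at(self, number: int) -> Set[Triple]:
        """Replay the first `number` versions to reconstruct an earlier state."""
        state: Set[Triple] = set()
        for v in self.history[:number]:
            state = (state - v.removed) | v.added
        return state

    def provenance(self, triple: Triple) -> Optional[str]:
        """Author of the most recent update that added the triple, if any."""
        for v in reversed(self.history):
            if triple in v.added:
                return v.author
        return None
```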
34. Linked Data Platform: Introduction
➔ draft recommendation of the W3C Linked Data Platform (LDP) Working Group
➔ support for "read/write Linked Data"
➔ support for RDF and non-RDF resources
➔ can be used as an alternative to the Linked Media Principles
➔ advantage of standardisation and wide adoption
➔ considerably more complex standard and protocol
➔ URL: http://www.w3.org/TR/ldp/
35. Linked Data Platform: Concepts
➔ access and interaction according to REST webservice principles
➔ GET: returns description of a resource
➔ POST: creates a new resource
➔ PUT: replaces the description of a resource
➔ DELETE: removes the description of a resource
➔ Linked Data Platform Resources (LDP-R)
➔ RDF resources (LDP-RS): RDF description of a resource
➔ non-RDF resources (LDP-NR): arbitrary (media) content
➔ Linked Data Platform Containers (LDP-C)
➔ collection of LDP resources, e.g. "students", "professors", "lectures"
➔ basic container (LDP-BC): simple collection of resources with common URI prefix
➔ direct container (LDP-DC): collection with explicit membership (as triple)
➔ indirect container (LDP-IC): collection with implicit membership (based on content)
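The difference between the three container types shows up in the triples a server adds when a new member is created. This sketch is a simplification. It uses the LDP vocabulary terms `ldp:contains`, `ldp:membershipResource`, `ldp:hasMemberRelation` and `ldp:insertedContentRelation`, while the function name and the plain-dict configuration are hypothetical. All containers add a containment triple. A direct container also adds a membership triple naming the new resource. An indirect container takes the membership object from the member's own content.

```python
from typing import Dict, List, Optional, Tuple

LDP = "http://www.w3.org/ns/ldp#"
Triple = Tuple[str, str, str]


def server_triples_on_create(container: str, member: str, kind: str,
                             config: Optional[Dict[str, str]] = None,
                             member_content: Optional[List[Triple]] = None) -> List[Triple]:
    """Triples an LDP server would add when `member` is created in `container` (simplified)."""
    config = config or {}
    triples = [(container, LDP + "contains", member)]  # containment, for every container kind
    if kind == "basic":
        return triples
    resource = config["membershipResource"]
    relation = config["hasMemberRelation"]
    if kind == "direct":
        # Explicit membership: the new resource itself is the member.
        triples.append((resource, relation, member))
    elif kind == "indirect":
        # Membership object taken from the member's content via insertedContentRelation.
        inserted = config["insertedContentRelation"]
        for s, p, o in member_content or []:
            if s == member and p == inserted:
                triples.append((resource, relation, o))
    else:
        raise ValueError(f"unknown container kind: {kind}")
    return triples
```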
36. LDP Basic Containers (LDP-BC)
➔ collection of LDP resources
➔ identification via common URI prefix, e.g.
http://example.com/container1/a
http://example.com/container1/b
➔ can contain both RDF and non-RDF resources at the same time
➔ container is itself an RDF resource
➔ description as RDF:
@base <http://example.com/container1/> .
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix ldp: <http://www.w3.org/ns/ldp#>.
<>
a ldp:BasicContainer;
dcterms:title "A very simple container";
ldp:contains <a>, <b>, <c>.
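Adding a member to this container follows the LDP interaction pattern: a POST of the new resource's RDF to the container URI. The sketch below builds, but does not send, such a request using Python's standard library. The optional `Slug` header suggests a name for the new member, and the `Link` header declares the `ldp:Resource` interaction model. The slug "d" and the title are hypothetical. If the server honours the slug, the new resource might become `http://example.com/container1/d` and appear in the container's `ldp:contains` list, but the server is free to choose another URI, which it reports in the `Location` header of its response.

```python
import urllib.request

CONTAINER = "http://example.com/container1/"


def create_in_container(slug: str, turtle: str) -> urllib.request.Request:
    """Prepare an LDP POST that asks the container to create a new RDF member."""
    return urllib.request.Request(
        CONTAINER,
        data=turtle.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/turtle",
            "Slug": slug,  # only a hint; the server picks the final URI
            "Link": '<http://www.w3.org/ns/ldp#Resource>; rel="type"',
        },
    )


req = create_in_container("d", '<> <http://purl.org/dc/terms/title> "Resource d" .\n')
# urllib.request.urlopen(req) would send it to a real LDP server.
```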
38. Apache Marmotta
➔ a simplified version of the Linked Media Framework, keeping only the core
components:
➔ Linked Data Server with SPARQL 1.1
➔ Linked Data Cache
➔ Versioning, Reasoning
➔ no search, no content analysis
➔ reference implementation of the Linked Data Platform, with
participation in the W3C working group
➔ highly modular and extensible to build custom Linked Data
applications (both client and server)
http://marmotta.apache.org
41. SPARQL-MM: Introduction
➔ extension of SPARQL with specific multimedia functions and
relations, implemented in Apache Marmotta
            Relation Function       Aggregation Function
Spatial     mm:rightBeside          mm:spatialIntersection
            mm:spatialOverlaps      mm:spatialBoundingBox
            …                       …
Temporal    mm:after                mm:temporalIntersection
            mm:temporalOverlaps     mm:temporalIntermediate
            …                       …
Combined    mm:overlaps             mm:boundingBox
            mm:contains             mm:intersection
A list of all functions can be found at:
https://github.com/tkurz/sparql-mm/blob/master/sparql-mm/functions.md
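To give an intuition for the relation and aggregation functions, here are plain-Python analogues over `xywh` boxes and `t` intervals. These are one plausible reading of the function names, not the SPARQL-MM implementation. In particular, treating "right beside" as "entirely to the right of" is an assumption. The linked function list is authoritative.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height (as in #xywh=)
Interval = Tuple[float, float]   # start, end in seconds (as in #t=)


def right_beside(a: Box, b: Box) -> bool:
    """Assumed reading: a lies entirely to the right of b."""
    return a[0] >= b[0] + b[2]


def spatial_overlaps(a: Box, b: Box) -> bool:
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2] and
            a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def spatial_intersection(a: Box, b: Box) -> Optional[Box]:
    if not spatial_overlaps(a, b):
        return None
    x, y = max(a[0], b[0]), max(a[1], b[1])
    return (x, y, min(a[0] + a[2], b[0] + b[2]) - x, min(a[1] + a[3], b[1] + b[3]) - y)


def spatial_bounding_box(a: Box, b: Box) -> Box:
    x, y = min(a[0], b[0]), min(a[1], b[1])
    return (x, y, max(a[0] + a[2], b[0] + b[2]) - x, max(a[1] + a[3], b[1] + b[3]) - y)


def temporal_overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def temporal_intersection(a: Interval, b: Interval) -> Optional[Interval]:
    return (max(a[0], b[0]), min(a[1], b[1])) if temporal_overlaps(a, b) else None
```

For example, `spatial_bounding_box((0, 0, 10, 10), (20, 5, 10, 10))` gives `(0, 0, 30, 15)`: the smallest box enclosing both regions, analogous to an aggregation function returning a new fragment.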
42. SPARQL-MM: A sample query
Give me the spatio-temporal snippet that shows Lewis Jones
right beside Connor Macfarlane.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mm: <http://linkedmultimedia.org/sparql-mm/functions#>
PREFIX ma: <http://www.w3.org/ns/ma-ont#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT (mm:boundingBox(?l1,?l2) AS ?two_guys) WHERE {
?f1 ma:locator ?l1; dct:subject ?p1.
?p1 foaf:name "Lewis Jones".
?f2 ma:locator ?l2; dct:subject ?p2.
?p2 foaf:name "Connor Macfarlane".
FILTER mm:rightBeside(?l1,?l2)
FILTER mm:temporalOverlaps(?l1,?l2)
}
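Such a query can be sent to a SPARQL endpoint using the standard SPARQL 1.1 Protocol (a GET with a `query` parameter) and the SPARQL 1.1 JSON results format. The sketch below only prepares the request. The endpoint URL is an assumption for a local Marmotta installation and may differ in your deployment.

```python
import urllib.parse
import urllib.request
from typing import List

# Assumed endpoint of a local Marmotta installation; adjust to your deployment.
ENDPOINT = "http://localhost:8080/marmotta/sparql/select"

QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mm: <http://linkedmultimedia.org/sparql-mm/functions#>
PREFIX ma: <http://www.w3.org/ns/ma-ont#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT (mm:boundingBox(?l1,?l2) AS ?two_guys) WHERE {
  ?f1 ma:locator ?l1; dct:subject ?p1.
  ?p1 foaf:name "Lewis Jones".
  ?f2 ma:locator ?l2; dct:subject ?p2.
  ?p2 foaf:name "Connor Macfarlane".
  FILTER mm:rightBeside(?l1,?l2)
  FILTER mm:temporalOverlaps(?l1,?l2)
}"""


def sparql_get_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol query via GET, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def binding_values(results: dict, var: str) -> List[str]:
    """Values bound to `var` in a SPARQL 1.1 JSON results document."""
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]
```

Against a running endpoint, `json.load(urllib.request.urlopen(sparql_get_request(ENDPOINT, QUERY)))` would yield a results document, and `binding_values(..., "two_guys")` would extract the computed bounding boxes.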
46. Conclusions
➔ semantic media asset management requires managing and
interacting with both content and metadata
➔ Linked Media Principles (2009) were a first approach to extend
Linked Data with support for semantic media asset
management
➔ Linked Data Platform (W3C working draft) supersedes Linked
Media Principles, as it covers the same aspects and more
➔ semantic media asset management requires specific mechanisms for media
access and querying:
➔ Media Fragments URI (W3C) to identify media fragments
➔ Ontology for Media Resources (W3C) to describe media
fragments
➔ SPARQL-MM to query media fragment descriptions
47. Thanks for your Attention!
Dr. Sebastian Schaffert
Chief Technology Officer
Redlink GmbH
sebastian.schaffert@redlink.co