2. Who Am I
Dr. Sebastian Schaffert
Senior Researcher at Salzburg Research
Chief Technology Officer at Redlink GmbH
Committer at the Apache Software Foundation
… and, starting 12/2014, Software Engineering Manager (SRE) at Google
sschaffert@apache.org
http://linkedin.com/in/sebastianschaffert
http://www.schaffert.eu
3. Agenda
● Introduction to Apache Marmotta (Sebastian)
– Overview
– Installation
– Development
● Linked Data Platform (Sergio & Jakob)
– Overview
– Practical Usage
● Semantic Media Management (Thomas)
– Media Use Case
– SPARQL-MM
5. What is Apache Marmotta?
● Linked Data Server
(implements Content Negotiation and LDP)
● SPARQL Server
(public SPARQL 1.1 query and update endpoint)
● Linked Data Development Environment
(collection of modules and libraries for building
custom Linked Data applications)
● Community of Open Source Linked Data
Developers
… all under the business-friendly Apache open source licence
6. Linked Data Server
● easily offer your data as Linked Data on
the Web
● human-readable and machine-readable
read-write data access based on HTTP
content negotiation
● reference implementation of the Linked
Data Platform (see next presentation block)
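The read-write access via content negotiation described above can be sketched from the client side with plain HTTP. A minimal sketch using only the JDK; the resource URI assumes a default local Marmotta deployment and is hypothetical, as is the `ContentNegotiation` class name:

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: requesting a Linked Data resource in two representations.
// The server picks the serialization based on the Accept header.
public class ContentNegotiation {

    // Build a GET request for a resource in the given representation.
    static HttpRequest resourceRequest(String resourceUri, String mimeType) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
                .header("Accept", mimeType)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical resource on a local default Marmotta deployment.
        String resource = "http://localhost:8080/marmotta/resource/example";
        HttpRequest machine = resourceRequest(resource, "text/turtle"); // machine-readable RDF
        HttpRequest human   = resourceRequest(resource, "text/html");   // human-readable page
        System.out.println(machine.headers().firstValue("Accept").orElse(""));
        System.out.println(human.headers().firstValue("Accept").orElse(""));
    }
}
```

The same URI thus serves both humans and machines; only the Accept header differs.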
7. SPARQL Server
● full support of SPARQL 1.1 through HTTP
web services
● SPARQL 1.1 query and update endpoints
● implements the SPARQL 1.1 protocol
(supports any standard SPARQL client)
● fast native implementation of SPARQL in
KiWi triple store
● lightweight Squebi SPARQL explorer UI
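Because the endpoint implements the SPARQL 1.1 Protocol, any HTTP client can query it. A minimal sketch of building such a request with the JDK only; the endpoint host and path assume a default local Marmotta deployment, and the `SparqlProtocol` class name is made up for illustration:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: a SPARQL 1.1 Protocol query request is a GET with the query
// passed as the url-encoded 'query' parameter.
public class SparqlProtocol {

    static String queryUrl(String endpoint, String query) {
        return endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical local Marmotta query endpoint.
        String endpoint = "http://localhost:8080/marmotta/sparql/select";
        String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
        System.out.println(queryUrl(endpoint, query));
        // SPARQL 1.1 updates work the same way, but are POSTed to the update endpoint.
    }
}
```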
8. Linked Data Development
● modular server architecture allows
combining exactly those functionalities
needed for a use case
(no need for reasoning? exclude the reasoner …)
● collection of independent libraries for
common Linked Data problems
– access Linked Data resources (and even
some that are not Linked Data)
– simplified Linked Data query language
(LDPath)
– use only the triple store without the server
9. Community of Developers
● discuss with people interested in getting
things done in the Linked Data world
● build applications that are useful without
reimplementing the whole stack
● thorough software engineering process
under the roof of the Apache Software
Foundation
10. Installation / Setup
(we help you)
https://github.com/wikier/apache-marmotta-tutorial-iswc2014
14. Apache Marmotta Platform
● implemented as Java web application
(deployed as marmotta.war file)
● service oriented architecture using CDI
(Java EE 6)
● REST web services using JAX-RS
(RestEasy)
● CDI services found on the classpath are
automatically added to the system
16. Marmotta Core (required)
● core platform functionalities:
– Linked Data access
– RDF import and export
– Admin UI
● platform glue code:
– service and dependency injection
– triple store
– system configuration
– logging
17. Marmotta Backends (one required)
● choice of different triple store backends
● KiWi (Marmotta Default)
– based on relational database (PostgreSQL, MySQL,
H2)
– highly scalable
● Sesame Native
– based on Sesame Native RDF backend
● BigData
– based on BigData clustered triple store
● Titan
– based on Titan graph database (backed by HBase,
Cassandra, or BerkeleyDB)
19. Marmotta LDCache (optional)
● transparently access Linked Data
resources from other servers as if they
were local
● support for wrapping some legacy data
sources (e.g. Facebook Graph)
● local triple cache, honors HTTP expiry and
cache headers
Note:
SPARQL does NOT work well with LDCache;
use LDPath instead!
20. Marmotta LDPath (optional)
● query language specifically designed for
querying the Linked Data Cloud
● regular path based navigation starting at a
resource and then following links
● limited expressivity (compared to SPARQL)
but full Linked Data support
@prefix local: <http://localhost:8080/resource/> ;
@prefix foaf: <http://xmlns.com/foaf/0.1/> ;
@prefix mao: <http://www.w3.org/ns/ma-ont#> ;
likes = local:likes /
        (foaf:primaryTopic / mao:title | foaf:name)
        :: xsd:string ;
22. Marmotta Versioning (optional)
● transaction-based versioning of all changes
to the triple store
● implementation of Memento protocol for
exploring changes over time
● snapshot/wayback functionality (i.e. the
possibility to query the state of the triple
store at any given point in its history)
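The Memento protocol (RFC 7089) mentioned above works over plain HTTP: the client asks a TimeGate for the version of a resource valid at a past instant via the `Accept-Datetime` header. A minimal JDK-only sketch; the TimeGate URI and the `MementoRequest` class name are hypothetical:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Sketch: a Memento TimeGate request. Accept-Datetime selects the
// version of the resource that was valid at the given instant.
public class MementoRequest {

    static HttpRequest atTime(String timegateUri, ZonedDateTime when) {
        return HttpRequest.newBuilder(URI.create(timegateUri))
                .header("Accept-Datetime", DateTimeFormatter.RFC_1123_DATE_TIME
                        .format(when.withZoneSameInstant(ZoneOffset.UTC)))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical TimeGate URI; the versioning module exposes the actual endpoint.
        HttpRequest req = atTime("http://localhost:8080/marmotta/memento/timegate/resource1",
                ZonedDateTime.of(2014, 10, 1, 12, 0, 0, 0, ZoneOffset.UTC));
        System.out.println(req.headers().firstValue("Accept-Datetime").orElse(""));
    }
}
```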
25. Apache Marmotta Libraries
● provide implementations for common
Linked Data problems (e.g. accessing
resources)
● standalone lightweight Java libraries that
can be used outside the Marmotta platform
26. LDClient
● library for accessing and retrieving Linked
Data resources
● bundles the boilerplate code otherwise
written again and again (HTTP retrieval,
content negotiation, ...)
● extensible (Java ServiceLoader) with
custom wrappers for legacy data sources
(included are RDF, RDFa, Facebook, YouTube,
Freebase, Wikipedia, as well as base classes for
mapping other formats like XML and JSON)
27. LDCache
● library providing local caching functionality
for (remote) Linked Data resources
● builds on top of LDClient, so offers the
same extensibility
● Sesame Sail with transparent Linked Data
access (i.e. Sesame API for Linked Data
Cloud)
28. LDPath
● library offering a standalone
implementation of the LDPath query
language
● large function library for various scenarios
(e.g. string, math, ...)
● can be used with LDCache and LDClient
● can be integrated in your own applications
● supports different backends (Sesame,
Jena, Clerezza)
29. Marmotta Loader
● command line infrastructure for bulk-loading
RDF data in various formats to
different triple stores
● supports most RDF serializations, directory
imports, split-file imports, compressed files
(.gz, .bzip2, .xz), and archives (tar, zip)
● provides progress indicator, statistics
31. KiWi Triplestore
● Sesame SAIL: can be plugged into any
Sesame application
● based on relational database (supported:
PostgreSQL, MySQL, H2)
● integrates easily in existing enterprise
infrastructure (database server, backups,
clustering, …)
● reliable transaction management (at the
cost of performance)
● supports very large datasets (e.g.
Freebase with more than 2 billion triples)
32. KiWi Triplestore: SPARQL
● translation of SPARQL queries into native
SQL
● generally very good performance for typical
queries, even on big datasets
● query performance can be optimized by
proper index and memory configuration in
the database
● almost complete support for SPARQL 1.1
(except some constructs exceeding the
expressivity of SQL and some “bugs”)
33. KiWi Triplestore: Reasoner
● rule-based sKWRL reasoner (see the
earlier demo)
● fast forward chaining implementation of
rule evaluation
● truth maintenance for easy deletes/updates
● future: might be implemented as stored
procedures in database
34. KiWi Triplestore: Clustering
● cluster-wide caching and synchronization
based on Hazelcast or Infinispan
● useful for load balancing of several
instances of the same application (e.g.
Marmotta Platform)
35. KiWi Triplestore: Versioning
● transaction-based versioning of triple
updates
● undo transactions (applied in reverse
order)
● obtain a Sesame repository connection that
views the triple store at any point in its history
36. Thank You!
Sebastian Schaffert
sschaffert@apache.org
supported by the European
Commission FP7 project MICO
(grant no. 610480)