Linked Data and Semantic Web Application Development by Peter Haase
1. Linked Data and Semantic Application Development
Peter Haase
Saint Petersburg
4 December 2014
2. Who am I and What am I Talking About?
A Linked Data Perspective
[Figure: RDF graph describing the speaker; edge labels: affiliation, founder, develops, project, worksOn, owl:sameAs; node: www.metaphacts.com]
3. For exercises, quiz and further material visit our website: http://www.euclid-project.eu
Other channels: eBook, Course, @euclid_project, euclidproject
EUCLID - Providing Linked Data
4. Semantic Technologies enabling Smart Data
• Not just data, not just information, but actionable insights that support better decisions
[Figure: Raw Data, Access, Sense Making, Actionable Insights, Decision Support; from Data to Information to Knowledge]
10. Classes and properties for Wikipedia export (infoboxes), regularly updated
See http://wiki.dbpedia.org/
DBpedia
11. Linked (Open) Data
• Set of standards and principles for publishing, sharing and interrelating structured knowledge
• Data from different knowledge domains, self-described, linked and accessible
• From data silos to a Web of Data
• RDF as data model, SPARQL for querying
• Ontologies to describe the semantics
12. Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that users can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that users can discover more things.
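Principles 2 and 3 can be illustrated with a small sketch: a client looks up an HTTP URI and asks for an RDF representation through the Accept header. The DBpedia URI, the helper name build_lookup_request and the chosen media types are assumptions made for this illustration; the request is only built here, not sent.

```python
from urllib.request import Request

# Media types the client would accept (an assumption for this sketch).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> Request:
    """Prepare an HTTP GET that asks for an RDF description of `uri`."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: names should be HTTP URIs so that they can be looked up.
        raise ValueError("not an HTTP URI: " + uri)
    return Request(uri, headers={"Accept": RDF_ACCEPT})


request = build_lookup_request("http://dbpedia.org/resource/Saint_Petersburg")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
# Sending it would be urllib.request.urlopen(request), which needs network access.
```

A server that follows principle 3 would answer such a request with RDF, and the links in that RDF (principle 4) give the client further URIs to look up.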
13. Semantics on the Web
Semantic Web Stack, Berners-Lee (2006)
[Figure: Semantic Web layer stack, annotated with the role of each layer]
• Syntactic basis
• Basic data model
• Simple vocabulary (schema) language
• Expressive vocabulary (ontology) language
• Query language
• Application-specific declarative knowledge
• Proof generation, exchange, validation
• Digital signatures, recommendations
14. Ontologies
• An ontology defines a domain of interest
- ... in terms of the things you talk about in the domain, their attributes, as well as the relationships between them
• Ontologies are used to
- Share a common understanding about a domain among people and machines
- Enable reuse of domain knowledge
06.12.14
15. Categories of Linked Data Applications
Furthermore, Linked Data applications can be classified according to the following dimensions:
Dimension | Level | Description
Semantic technology depth | Extrinsic | Use of semantics on the surface of the application.
Semantic technology depth | Intrinsic | Conventional technologies (e.g., RDBMS) are complemented or replaced with SW equivalents.
Information flow direction | Consuming | LD is retrieved from the source or via a wrapper.
Information flow direction | Producing | Publishes LD (in RDF-based formats).
Semantic richness | Shallow | Simple taxonomies, use of RDF or RDFS.
Semantic richness | Strong | High-level representation formalisms (OWL variants).
Semantic integration | Isolated | Creation of own vocabularies.
Semantic integration | Integrated | Reuse of information at schema or instance level.
Source: M. Martin and S. Auer. "Categorisation of Semantic Web Applications"
EUCLID - Building Linked Data applications
18. Example: ResearchSpace Image Annotation
• The ResearchSpace environment aims at providing a set of RDF data sets and tools to describe concepts and objects related to cultural historical research.
• The tools are highly interactive: they allow users to access the data and contribute to the data set by creating RDF annotations.
[Figure: Geo Mapper]
Source: https://sites.google.com/a/researchspace.org/researchspace/
19. Example: ResearchSpace CRM Search System
[Figure: search by predicates; faceted search]
Source: Snapshot from https://www.youtube.com/watch?v=HCnwgq6ebAs
22. Benefits of Linked Data in the Enterprise
• Enterprise Data Integration: semantically integrate data scattered across different information systems, leading to transparent, streamlined information management with fewer redundancies and inconsistencies
• Simplified publishing, sharing and reuse of data: increase openness and accessibility of enterprise data through open, standards-based APIs
• Enrichment and contextualization through interlinking: increase value add by linking to Linked Open Data
• Improved analytics: enable cross-organization analysis, interactive analytics, and reporting on top of a collaborative platform
23. Optique Case Study: Statoil Exploration
Experts in geology and geophysics develop stratigraphic models of unexplored areas
- Based on production and exploration data from nearby locations
- Analytics on:
  • 1,000 TB of relational data
  • using diverse schemata
  • spread over 3,000 tables
  • spread over multiple individual databases
- 900 experts in Statoil Exploration
- Up to 4 days for new data access queries
- Assistance from IT experts required
24. Ontology Based Data Access
Complex case:
[Figure labels: information need, engineer, translation, IT expert, specialized query, disparate sources]
Up to 80% of an expert's time is spent on data access
25. Example Query
• Find
- fields together with their remaining oil
- that are currently operated by Statoil and
- show the types of wellbores located on these fields
28. General Architecture of Linked Data Applications
[Figure: layered architecture]
• Presentation Tier
• Logic Tier
• Data Tier: Data Access Component, Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing), Integrated Dataset (Triple Store), Republication Component
• Wrappers and transformations: LD Wrapper, SPARQL Wrapper, R2R Transformation, Physical Wrapper
• Sources: Linked Data, SPARQL endpoints, Web data accessed via APIs, RDF/XML, relational data
29. Architectural Patterns
1. The Crawling Pattern: Crawls or loads data in advance. Data is managed in one triple store, thus it can be accessed efficiently. The disadvantage of this pattern is that the data might not be up to date.
2. The On-The-Fly Dereferencing Pattern: URIs are dereferenced at the moment that the app requires the data. This pattern retrieves up-to-date data. Performance is affected when the app must dereference many URIs.
3. The (Federated) Query Pattern: Submits complex queries to a fixed set of data sources. Enables applications to work with current data directly retrieved from the sources. Finding optimal query execution plans over a large number of sources is a complex problem.
[Figure: diagrams of the three patterns (App, Data Access, Cache)]
Source: T. Heath, C. Bizer. Linked Data: Evolving the Web into a Global Data Space
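The trade-off between the first two patterns can be sketched in a few lines. This is a toy model under stated assumptions: the class names, the fake_fetch function and the ex: identifiers are all invented for illustration, and a dictionary stands in for a remote source and for the triple store.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Fetch = Callable[[str], List[Triple]]


class CrawlingAccess:
    """Crawling pattern: load data in advance, answer from the local copy."""

    def __init__(self, fetch: Fetch, seeds: Iterable[str]) -> None:
        self._store: Dict[str, List[Triple]] = {uri: fetch(uri) for uri in seeds}

    def describe(self, uri: str) -> List[Triple]:
        # Efficient local access, but the copy may be out of date.
        return self._store.get(uri, [])


class OnTheFlyAccess:
    """On-the-fly dereferencing: look the URI up whenever the app asks."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch

    def describe(self, uri: str) -> List[Triple]:
        # Always current, at the cost of one lookup per request.
        return self._fetch(uri)


# Stand-in for a remote source whose data changes over time (made-up data).
source = {"ex:artist1": [("ex:artist1", "foaf:name", "Example Artist")]}
calls: List[str] = []


def fake_fetch(uri: str) -> List[Triple]:
    calls.append(uri)
    return list(source.get(uri, []))


crawled = CrawlingAccess(fake_fetch, ["ex:artist1"])
live = OnTheFlyAccess(fake_fetch)

# The source is updated after the crawl.
source["ex:artist1"].append(("ex:artist1", "foaf:made", "ex:release1"))

print(len(crawled.describe("ex:artist1")), "triple(s) from the crawled copy")
print(len(live.describe("ex:artist1")), "triple(s) from the live lookup")
print(len(calls), "lookups against the source")
```

The crawled copy misses the triple added after loading, while the live lookup sees it but contacts the source again on every call.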
30. Data Layer
Data Access Component
• Linked Data applications may implement a Mediator-Wrapper Architecture to access heterogeneous sources:
- Wrappers are built around each data source in order to provide a unified view of the retrieved data.
• The method to access the data depends on the Linked Data architectural pattern.
• The factors that determine the choice of a pattern are:
- Number of data sources to access
- Requirement of consuming up-to-date data
- Tolerance to high response time
- Requirement of discovering new data sources
31. Data Layer (2)
Data Access Component (2)
• The data access component may be implemented by using one or a combination of the following tools:
Mechanisms | Tools (Examples)
Linked Data Crawlers | LDspider (https://code.google.com/p/ldspider/), Slug (https://code.google.com/p/slug-semweb-crawler/)
Linked Data Client Libraries | Semantic Web Client Library (http://wifo5-03.informatik.uni-mannheim.de/bizer/ng4j/semwebclient/), The Tabulator (http://www.w3.org/2005/ajar/tab), Moriarty (https://code.google.com/p/moriarty/)
SPARQL Client Libraries | Jena Semantic Web Framework (http://jena.apache.org/)
Federated SPARQL Engines | ANAPSID (https://github.com/anapsid/anapsid), FedX (http://www.fluidops.com/fedx/), SPLENDID (https://code.google.com/p/rdffederator/)
Search Engine APIs | Sindice (http://sindice.com/developers/api), Uberblic (http://uberblic.com/)
32. Data Layer (3)
Data Integration Component
• Consolidates the data retrieved from heterogeneous sources.
• This component may operate at:
- Schema level: Performs vocabulary mappings in order to translate data into a single unified schema. Links correspond to RDFS properties or OWL property and class axioms.
- Instance level: Performs entity resolution via owl:sameAs links. In case the data sources do not provide the links, further tools like Silk or OpenRefine can be used to integrate the data.
[Figure: Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing) on top of the Data Access Component]
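Instance-level integration via owl:sameAs can be sketched as grouping URIs into equivalence classes. The sketch below uses a union-find structure, which is one common way to do this and not necessarily what any tool named on the slide uses; the function name consolidate and the a:, b:, c: identifiers are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def consolidate(same_as_links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group URIs connected by owl:sameAs links, keyed by a canonical URI."""
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in same_as_links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the lexicographically smallest URI as the canonical one.
            parent[max(root_left, root_right)] = min(root_left, root_right)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return dict(groups)


links = [
    ("a:U2", "b:U2"),
    ("b:U2", "c:U2"),
    ("a:Queen", "b:Queen"),
]
for canonical, members in sorted(consolidate(links).items()):
    print(canonical, sorted(members))
```

Once every URI maps to a canonical representative, triples about the same real-world entity from different sources can be merged under one identifier.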
33. Data Layer (4)
Integrated Dataset
• The dataset resulting from integration and consolidation can be cached in an RDF store.
• There are many solutions to deploy triple/RDF stores, e.g.:
  • bigdata (http://www.bigdata.com/)
  • OWLIM (http://www.ontotext.com/owlim)
  • Jena TDB (http://jena.apache.org/documentation/tdb/)
  • AllegroGraph (http://www.franz.com/agraph/allegrograph/)
  • Virtuoso Universal Server (http://virtuoso.openlinksw.com/)
  • RDF3x (https://code.google.com/p/rdf3x/)
[Figure: Integrated Dataset and Republication Component]
34. Data Layer (5)
Republication Component
• Exposes portions of the integrated dataset as Linked Data
• There are different solutions to make the data accessible:
  • Via SPARQL endpoints (e.g., Sesame OpenRDF SPARQL Endpoint, ...)
  • Via APIs (e.g., Linked Data API)
  • As RDF dumps
  • With the built-in means of your framework/CMS (e.g., Drupal, Information Workbench, ...)
35. Application and Presentation Layers
• The logic layer implements sophisticated processing according to the functionalities of the application. This layer may include data mining components as well as reasoners that are not integrated in the data layer.
• The presentation layer displays the information to the user in various formats, including text, diagrams or other types of visualization techniques.
36. LINKED DATA APPLICATION DEVELOPMENT FRAMEWORKS
Information Workbench
37. Information Workbench
• Platform for development of Linked Data applications
Semantics- & Linked Data-based Integration of Enterprise and Open Data Sources
Intelligent Data Access and Analytics
• Visual exploration
• Semantic search
• Dashboarding and reporting
Collaboration and Knowledge Management Platform
• Wiki-based curation & authoring of data
• Collaborative workflows
[Figure label: Semantic Web Data]
Source: http://www.fluidops.com/information-workbench/
38. Information Workbench (2)
[Figure: Information Workbench layers]
• Customized application solutions
• Reusable UI and data integration components
• Data storage and management platform
• External resources to reuse data and create mashups
39. Data Integration: Data Provider Concept
Data providers support the periodic extraction & integration from external data sources into a central repository
• Lifting from arbitrary data formats to RDF (e.g., relational, XML, CSV)
• Parametrizable (e.g., connection information, refresh interval, ...)
• Built-in UI for instantiating providers
• Intuitive interfaces and APIs for writing own, custom providers
Steps: Connect to data source, Extract data from source, Convert data into RDF, Store RDF in repository
Examples: RDF, R2RML, XML2RDF, SPARQL
40. W3C RDB2RDF
• Task: Integrate data from relational DBMS with Linked Data
• Approach: map from relational schema to semantic vocabulary with R2RML
• Publishing: two alternatives
- Translate SPARQL into SQL on the fly
- Batch transform data into RDF, index and provide SPARQL access in a triplestore
[Figure labels: Relational DBMS, R2RML Engine, Data acquisition, Integrated Data in Triplestore, Vocabulary Mapping, Interlinking, Cleansing, LD Data set, SPARQL Endpoint, Publishing, Access]
41. W3C RDB2RDF
• The W3C made, last year, two recommendations for mapping between relational databases and RDF:
- Direct mapping directly exposes data as RDF
  • No allowance for vocabulary mapping
  • No allowance for interlinking (unless URIs used in relational data)
- R2RML, the RDB to RDF mapping language
  • Allows vocabulary mapping (subject, predicate and object maps with class options)
  • Allows interlinking: URIs can be constructed
http://www.w3.org/2001/sw/rdb2rdf/
42. R2RML Class Mapping
• Declarative mappings with an RDF-based syntax:
lb:Artist a rr:TriplesMap ;
    rr:logicalTable [ rr:tableName "artist" ] ;
    rr:subjectMap [ rr:class mo:MusicArtist ;
                    rr:template "http://musicbrainz.org/artist/{gid}#_" ] ;
    rr:predicateObjectMap [ rr:predicate mo:musicbrainz_guid ;
                            rr:objectMap [ rr:column "gid" ;
                                           rr:datatype xsd:string ] ] .
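To make the effect of the mapping concrete, the sketch below applies the same subject template and predicate-object map to one row of the "artist" table by hand. It is not an R2RML engine: the helper names are invented, the row is sample data, triples are plain tuples of strings, and the percent-encoding that R2RML applies to template values is left out for brevity.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Subject template copied from the mapping on the slide.
SUBJECT_TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"


def expand_template(template: str, row: Dict[str, str]) -> str:
    """Replace each {column} placeholder with the row's value for that column."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def map_artist_row(row: Dict[str, str]) -> List[Triple]:
    """Hand-written equivalent of the lb:Artist triples map for a single row."""
    subject = expand_template(SUBJECT_TEMPLATE, row)
    return [
        # rr:class mo:MusicArtist
        (subject, "rdf:type", "mo:MusicArtist"),
        # rr:predicate mo:musicbrainz_guid with rr:column "gid", typed xsd:string
        (subject, "mo:musicbrainz_guid", '"%s"^^xsd:string' % row["gid"]),
    ]


sample_row = {"gid": "a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432", "name": "U2"}
for triple in map_artist_row(sample_row):
    print(triple)
```

Each row of the table yields one subject URI and two triples; columns that the mapping does not mention, such as name here, are ignored.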
43. Data Warehousing vs. Federation
Warehousing / Crawling
• Data is copied from the source into the warehouse
• Query runs in the warehouse
• Supported in IWB using data providers
Federation
• Data remains in federated DB
• Query is pushed down to federated DB
• Supported in IWB using SPARQL federation
[Figure: warehousing loads the DBs into a warehouse that answers the query; federation sends the query to the DBs]
44. Customizable User Interface
Demo available at http://musicbrainz.fluidops.net
[Screenshot labels: Wiki page management, Main view area, View selection toolbar, Current resource, Navigation shortcuts]
45. User Interface Concept: One Page URI
[Figure: Graph with several resource pages]
46. Data Driven UI: Ontology as "Structural Backbone"
[Figure labels: Ontology (RDFS/OWL), UI templates (Template:..., Template:mo:MusicArtist), RDF Data Graph, Resource pages]
47. Different Views on Every Resource
• Wiki View
• Table View
• Graph View
• Pivot View
48. Widget-Based User Interface
• Visualization and Exploration
• Analytics and Reporting
• Mashups with Social Media
• Authoring and Content Creation
Widgets are not static and can be integrated into the UI using a Wiki-style syntax.
49. Example: Add Widgets to Wiki
{{#widget: BarChart |
 query ='SELECT distinct (COUNT(?Release) AS ?COUNT) ?label WHERE {
   ?? foaf:made ?Release .
   ?Release rdf:type mo:Release .
   ?Release dc:title ?label .
 }
 GROUP BY ?label
 ORDER BY DESC(?COUNT)
 LIMIT 10
 '
 | input = 'label'
 | output = 'COUNT'
}}
Example: Show top 10 released records for an artist
50. Music Example
Page of a class:
• Shows an overview of MusicArtist instances
See http://musicbrainz.fluidops.net/resource/mo:MusicArtist
51. Music Example (2)
Page of a class template:
• Defines a layout for displaying each resource of the class
• Uses semantic wiki syntax
See http://musicbrainz.fluidops.net/resource/Template:mo:MusicArtist
52. Music Example (3)
Page of a class instance:
• Displays the data about the resource according to the class template
See http://musicbrainz.fluidops.net/resource/?uri=http%3A%2F%2Fmusicbrainz.org%2Fartist%2Fb10bbbfc-cf9e-42e0-be17-e2c3e1d2600d%23_
53. Mashups with external sources
• Relevant information and UI elements from external sources can be incorporated in the wiki view
• IWB contains multiple mashup widgets for popular social media sources
- Twitter
- Youtube
- Facebook
- New York Times news
- LinkedIn
- ...
{{#widget: Youtube |
 searchString = $SELECT ?x WHERE { ?? foaf:name ?x . }$ |
 asynch = 'true'
}}
Template instantiation
?? = http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432%23_
?x = "U2"
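The template instantiation step can be read as a substitution: the ?? placeholder in the widget query is replaced by the URI of the resource whose page is being rendered. The sketch below shows that idea only; it is an assumption about the mechanism and not the Information Workbench implementation, and the function name instantiate is invented.

```python
import re


def instantiate(query: str, resource_uri: str) -> str:
    """Replace the ?? placeholder with the current resource as a SPARQL IRI."""
    # Match "??" only when no further name character follows it,
    # so ordinary variables such as ?x are left untouched.
    return re.sub(r"\?\?(?!\w)", lambda _match: "<%s>" % resource_uri, query)


template_query = "SELECT ?x WHERE { ?? foaf:name ?x . }"
current_resource = "http://musicbrainz.org/artist/a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432#_"

print(instantiate(template_query, current_resource))
```

Evaluating the resulting query against the repository would then bind ?x to the artist's name, which the widget passes on as its search string.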
54. Triple Editor
Table View
• Edit structured data associated with a resource
• Make changes, add and remove triples
55. Ontology-Based Data Input
Triple Editor takes into account the ontology definition:
• Autosuggestion tool considers the domains and ranges of the properties
Example: properties available for the class mo:MusicGroup are suggested automatically
56. Validation of User Input
Validation uses property definitions in the ontology:
• The property myIntegerProperty has an associated rdfs:range definition.
• This ensures that all objects must be of XML schema type xsd:integer.
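A range-driven check of this kind can be sketched as a lookup from property to declared range, followed by a test of the entered text against that datatype. Everything named here is an assumption for illustration: the ex: prefix, the two dictionaries and the function is_valid are not part of the Information Workbench, and only xsd:integer is covered.

```python
import re

# Lexical form of xsd:integer: an optional sign followed by decimal digits.
XSD_INTEGER = re.compile(r"[+-]?[0-9]+")

# Ranges as they might be read from rdfs:range statements in the ontology.
PROPERTY_RANGES = {"ex:myIntegerProperty": "xsd:integer"}

# One checker per supported datatype.
VALIDATORS = {
    "xsd:integer": lambda text: XSD_INTEGER.fullmatch(text) is not None,
}


def is_valid(prop: str, value: str) -> bool:
    """Check user input against the range declared for `prop`, if any."""
    check = VALIDATORS.get(PROPERTY_RANGES.get(prop, ""))
    if check is None:
        return True  # no known range: nothing to validate against
    return check(value.strip())


for candidate in ["42", "-7", "4.2", "forty-two"]:
    print(candidate, is_valid("ex:myIntegerProperty", candidate))
```

Supporting further datatypes would mean adding entries to VALIDATORS; properties without a declared range are accepted unchanged in this sketch.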
57. Russian Museum Project - Architecture and Use Cases
[Figure: architecture with users, IWB frontend, IWB backend and original data sources]
Users: Data Engineer, Website visitor, Museum visitor
IWB Frontend: IWB Wiki View
IWB Backend: Systap Bigdata with Russian Museum Data, British Museum Data, DBpedia Subset, User Data
Original data sources: Museums and other sources, Social networks
Use Case 1: Data Provisioning
• Data crawling
• Data transformation
• Data interlinking
• Data enrichment / Information extraction
• Data validation
Use Case 2: Search and Visualization
• Base Templates for visualization
• Templates for external data
• PivotViewer
• Step-by-step visualization
• Extended Search widgets
• SemFacet
Use Case 3: Mobile App
• Templates + CSS for mobile devices
• Simplified Cards
• HTML5
• Google Glass App
• QR Code recognition
• Pattern / image recognition
58. Linked Data Application for the Russian Museum
[Figure labels: Ontology, Data, Data Providers, Templates, Widgets, Web Crawl, RDF Dump]
61. Summary
• Linked Data and Semantic Technologies
- From data to information to knowledge
- Graphs for integration of heterogeneous data in a variety of data models
- Ontologies for knowledge representation and interpretation of data
• Linked Data applications
- Publishing and consuming Linked Data
- Main components and architecture
• Standards-based, declarative models for all aspects of the application
- RDF: common data model
- OWL Ontology: conceptual domain model
- R2RML: integrating data sources
- SPARQL queries: expressing information needs
- Wiki-templates: interfaces for interacting with the data
62. Contact us!
metaphacts GmbH
Kautzelweg 13
69190 Walldorf
Germany
p +49 6227 8308660
m +49 157 50152441
e info@metaphacts.com
@metaphacts