4. Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that users can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that users can discover more things.
EUCLID - Providing Linked Data
CH
1
5. Linked Data Lifecycle
Source: Sören Auer. “The Semantic Data Web” (slides)
Source: José M. Alvarez. “My Linked Data Lifecycle”
Source: Michael Hausenblas. “Linked Data lifecycle”
6. Core Tasks for Providing Linked Data
Based on the proposed LD lifecycles and the LD principles, we can identify 3 main tasks for providing LD:
① Creating: includes data extraction, creation of HTTP URIs, and vocabulary selection. (LD principles 1 & 2)
② Interlinking: involves the creation of (RDF) links to external data sets. (LD principle 4)
③ Publishing: consists of creating the metadata and making the data set accessible. (LD principle 3)
7. Agenda
1. Creating Linked Data
2. Interlinking Linked Data
3. Publishing Linked Data
4. Linked Data publishing checklist
9. Extracting the Data
• The data of interest may be stored in a wide range of formats: spreadsheets or tabular data, databases, text
• Several tools support the process of mining data from different repositories, for example: R2RML
10. Using the RDF Data Model
• The RDF data model is used to represent the extracted information
• The nodes represent the concepts/entities within the data. A node corresponds to a URI, a blank node or a literal (literals may appear only in the object position)
• The relationships between the concepts/entities are modeled as arcs
[Figure: a Subject node connected to an Object node by a Predicate arc]
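The subject, predicate and object structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: terms are plain strings, literals are marked by a leading double quote, and the foaf:name property is chosen for the example rather than taken from the slides.

```python
from typing import NamedTuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO = "http://purl.org/ontology/mo/"
# Assumption: foaf:name is used here only to have a literal-valued property.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
BEATLES = "http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"


class Triple(NamedTuple):
    subject: str    # URI or blank node
    predicate: str  # always a URI
    obj: str        # URI, blank node, or literal


def is_literal(term: str) -> bool:
    # Convention of this sketch: literals are written with a leading quote.
    return term.startswith('"')


def is_valid(triple: Triple) -> bool:
    # Literals are allowed only in the object position.
    return not is_literal(triple.subject) and not is_literal(triple.predicate)


graph = [
    Triple(BEATLES, RDF_TYPE, MO + "MusicGroup"),
    Triple(BEATLES, FOAF_NAME, '"The Beatles"'),
]
```

Each Triple is one arc of the graph; a set of such arcs sharing nodes forms the RDF graph.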
11. Naming Things: URIs
• All the things or distinct entities within the data must be named
• According to the Linked Data principles, the standard mechanism to name entities is the URI
• Designing Cool URIs:
– Leave out information about the data regarding: author, technologies, status, access mechanisms, …
– Simplicity: short, mnemonic URIs
– Stability: maintain the URIs as long as possible
– Manageability: issue the URIs in a way that you can manage
Source: http://www.w3.org/TR/cooluris/
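One possible way to mint URIs along these guidelines is sketched below. The base URI http://example.org/id/ and the function names are hypothetical; the point is only that the URI carries a short, mnemonic slug and no file extension, technology name or status.

```python
import re
import unicodedata

# Hypothetical base URI; a real deployment would use a domain the publisher controls.
BASE = "http://example.org/id/"


def slugify(label: str) -> str:
    """Reduce a label to lowercase ASCII words joined by hyphens."""
    ascii_label = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")


def mint_uri(kind: str, label: str) -> str:
    """Build a short URI without author, technology or status information."""
    return f"{BASE}{kind}/{slugify(label)}"


beatles_uri = mint_uri("artist", "The Beatles")
```

Slugs derived from labels are easy to read but can collide or change when a label changes; opaque identifiers (as MusicBrainz uses) trade readability for stability.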
12. Selecting Vocabularies
• Vocabularies model the concepts and the relationships between them in a knowledge domain
• Terms from well-known vocabularies should be reused wherever possible
• New terms should be defined only if you cannot find the required terms in existing vocabularies
• A large number of vocabularies in RDF are openly available, e.g., Linked Open Vocabularies (LOV)
13. Selecting Vocabularies (2)
Linked Open Vocabularies: 322 vocabularies classified by domain
Source: http://lov.okfn.org/dataset/lov/
14. Selecting Vocabularies (3)
Linked Open Vocabularies: Analyzing MusicOntology
Source: http://lov.okfn.org/dataset/lov/details/vocabulary_mo.html
15. Selecting Vocabularies (4)
Other lists of well-known vocabularies are maintained by:
• W3C SWEO Linking Open Data community project
http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Library Linked Data Incubator Group: Vocabularies in the library domain
http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset-20111025
17. Interlinking Data Sets
• It’s one of the Linked Data principles! (“4. Include links to other URIs, so that users can discover more things.”)
• Involves the creation of RDF links between two different RDF data sets:
– Links at instance level (rdfs:seeAlso, owl:sameAs)
– Links at schema level (RDFS subclass/subproperty, OWL equivalent class/property, SKOS mapping properties)
• Appropriate links are detected via link discovery
18. Interlinking Data Sets (2)
Challenges for link discovery
• Linked Data sets are heterogeneous in terms of vocabularies, formats and data representation
• Large range of knowledge domains
• Scalability: LD is composed of a large number of data sets and RDF triples, hence it is not possible to compare every possible entity pair
Source: Robert Isele. “LOD2 Webinar Series: Silk”
19. Interlinking Data Sets (3)
Challenges for link discovery
• It corresponds to the entity resolution problem: deciding whether two entities correspond to the same object in the real world
• Name ambiguities: typos, misspellings, different languages, homonyms
• Structural ambiguities: same concepts/entities with different structures. Requires the application of ontology and schema matching techniques
20. Interlinking Data Sets (4)
RDF data sets can be interlinked:
Manually
• Involves the manual exploration of LD data sets and their RDF resources to identify linking targets
• May not be feasible when the number of entities within the data set is very large
Automatically
• Using tools that perform link discovery based on linkage rules, for example: Silk, Limes and xCurator
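To make the idea of a linkage rule concrete, here is a minimal sketch that is not how Silk, Limes or xCurator are implemented. It assumes each data set is a mapping from entity URI to label, uses the standard library's SequenceMatcher as the similarity measure, and treats the 0.9 threshold and the example.org URIs as illustrative choices.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def discover_links(source: dict, target: dict, threshold: float = 0.9) -> list:
    """Linkage rule: link each source entity to its best-matching target label."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, OWL_SAME_AS, best_uri))
    return links


# Hypothetical source data set and a small excerpt of target labels.
source = {
    "http://example.org/id/artist/the-beatles": "The Beatles",
    "http://example.org/id/artist/queen": "Queen",
}
target = {
    "http://dbpedia.org/resource/The_Beatles": "the beatles",
    "http://dbpedia.org/resource/Madonna": "Madonna",
}
links = discover_links(source, target)
```

This sketch compares every pair of entities, which is exactly what does not scale to large data sets; real link discovery tools reduce the number of comparisons, for example by blocking or indexing.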
21. owl:sameAs & rdfs:seeAlso
• owl:sameAs
– Creates links between individuals
– States that two URIs refer to the same individual
• rdfs:seeAlso
– States that a resource may provide additional information about the subject resource
• Links in MusicBrainz:
– owl:sameAs is used for music artists
– rdfs:seeAlso is used for albums
22. SKOS
• Simple Knowledge Organization System
– http://www.w3.org/TR/skos-reference/
• Data model for knowledge organization systems (thesauri, classification schemes, taxonomies)
• SKOS data is expressed as RDF triples
• Allows the creation of RDF links between different data sets with the usage of mapping properties
23. SKOS: Mapping Properties
These properties are used to link SKOS concepts (particularly instances) in different schemes:
• skos:closeMatch: links two concepts that are sufficiently similar (sometimes can be used interchangeably)
• skos:exactMatch: indicates that the two concepts can be used interchangeably.
– Axiom: It is a transitive property
• skos:relatedMatch: states an associative mapping link between two concepts
24. SKOS: Mapping Properties (2)
Example of SKOS exact match
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix dbpedia-ont: <http://dbpedia.org/ontology/> .
@prefix schema: <http://schema.org/> .

mo:MusicArtist skos:exactMatch dbpedia-ont:MusicalArtist .
mo:MusicGroup skos:exactMatch schema:MusicGroup .
mo:MusicGroup skos:exactMatch dbpedia-ont:Band .
25. SKOS: Mapping Properties (3)
Example of SKOS close match
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix dbpedia-ont: <http://dbpedia.org/ontology/> .
@prefix schema: <http://schema.org/> .

mo:SignalGroup skos:closeMatch schema:MusicAlbum .
mo:SignalGroup skos:closeMatch dbpedia-ont:Album .
26. SKOS: Mapping Properties (4)
Integrity conditions
• Guarantee consistency and avoid contradictions in the relationships between SKOS concepts
[Figure: partial mapping relation diagram with integrity conditions, showing skos:mappingRelation, skos:closeMatch (symmetric), skos:exactMatch (symmetric and transitive) and skos:relatedMatch (symmetric), with skos:exactMatch disjoint with skos:relatedMatch]
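The effect of these conditions can be illustrated with a small sketch. It assumes mappings are given as pairs of prefixed names, computes what follows from skos:exactMatch being symmetric and transitive, and then checks the disjointness with skos:relatedMatch. The function names and the relatedMatch statement in the example are hypothetical.

```python
def exact_match_closure(pairs):
    """Symmetric and transitive closure of skos:exactMatch statements."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    nodes = list(parent)
    return {(a, b) for a in nodes for b in nodes if a != b and find(a) == find(b)}


def disjointness_violations(exact_pairs, related_pairs):
    """Pairs that are both (entailed) exactMatch and relatedMatch."""
    related = set(related_pairs) | {(b, a) for a, b in related_pairs}
    return exact_match_closure(exact_pairs) & related


exact = [
    ("mo:MusicGroup", "schema:MusicGroup"),
    ("mo:MusicGroup", "dbpedia-ont:Band"),
]
closure = exact_match_closure(exact)
# Hypothetical statement that would contradict the entailed exact match.
violations = disjointness_violations(
    exact, [("schema:MusicGroup", "dbpedia-ont:Band")]
)
```

Because exactMatch is transitive, the two statements about mo:MusicGroup also entail a match between schema:MusicGroup and dbpedia-ont:Band, so a relatedMatch between those two concepts would be inconsistent.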
28. Publishing Linked Data
Once the RDF data set has been created and interlinked, the publishing process involves the following tasks:
1. Metadata creation for describing the data set
2. Making the data set accessible
3. Exposing the data set in Linked Data repositories
4. Validating the data set
29. Describing RDF Data Sets
• Consists of providing (machine-readable) metadata of RDF data sets which can be processed by engines
• This information allows for:
– Efficient and effective search of data sets
– Selection of appropriate data sets (for consumption or interlinking)
– Getting general statistics of the data sets
30. Describing RDF Data Sets (2)
• The common language for describing RDF data sets is VoID (Vocabulary of Interlinked Datasets)
• Defines an RDF data set with the class void:Dataset
• Covers 4 types of metadata:
– General metadata
– Structural metadata
– Descriptions of linksets
– Access metadata
31. VoID: General Metadata
• General metadata is used by users to identify appropriate data sets.
• Specifies information about the description of the data set, the contact person/organization, the license of the data set, the data subject and some technical features.
• VoID (re)uses predicates from the Dublin Core Metadata [1] and FOAF [2] vocabularies.
[1] http://dublincore.org/documents/2010/10/11/dcmi-terms/
[2] http://xmlns.com/foaf/spec/
32. VoID: General Metadata (2)
General Information
Contains information about the creation of the data set
Predicate | Range | Description
dcterms:title | Literal | Name of the data set.
dcterms:description | Literal | Description of the data set.
dcterms:source | RDF resource | Source from which the data set was derived.
dcterms:creator | RDF resource | Entity primarily responsible for creating the data set.
dcterms:date | xsd:date | Time associated with an event in the life-cycle of the resource.
dcterms:created | xsd:date | Date of creation of the data set.
dcterms:issued | xsd:date | Date of publication of the data set.
dcterms:modified | xsd:date | Date on which the data set was changed.
foaf:homepage | RDF resource | Homepage of the data set.
dcterms:publisher | RDF resource | Entity responsible for making the data set available.
dcterms:contributor | RDF resource | Entity responsible for making contributions to the data set.
Source: http://www.w3.org/TR/void/#metadata
33. VoID: General Metadata (3)
Other Information
• License of the data set: specifies the usage conditions of the data. The license can be referenced with the property dcterms:license
• Category of the data set: to specify the topics or domains covered by the data set, the property dcterms:subject can be used
• Technical features: the property void:feature can be used to express technical properties of the data (e.g. RDF serialization formats)
34. VoID: Structural Metadata
• Provides high-level information about the internal structure of the data set
• This metadata is useful when exploring or querying the data set
• Includes information about resources, vocabularies used in the data set, statistics and examples of resources in the data set
35. VoID: Structural Metadata (2)
Information about resources
• Example resources: allow users to get an impression of the kind of resources included in the data set. Examples can be shown with the property void:exampleResource
:MusicBrainz a void:Dataset ;
    void:exampleResource <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d> .
• Pattern for resource URIs: the void:uriSpace property can be used to state that all the entity URIs in a data set start with a given string
:MusicBrainz a void:Dataset ;
    void:uriSpace "http://musicbrainz.org/" .
36. VoID: Structural Metadata (3)
Vocabularies used in the data set
• The void:vocabulary property identifies the vocabulary or ontology that is used in a data set
• Typically, only the most relevant vocabularies are listed
• This property can only be used for entire vocabularies. It cannot be used to express that a subset of the vocabulary occurs in the data set.
:MusicBrainz a void:Dataset ;
    void:vocabulary <http://purl.org/ontology/mo/> .
37. VoID: Structural Metadata (4)
Statistics about a data set
Express numeric statistics about a data set:
Predicate | Range | Description
void:triples | Number | Total number of triples contained in the data set.
void:entities | Number | Total number of entities that are described in the data set. An entity must have a URI, and match the void:uriRegexPattern.
void:classes | Number | Total number of distinct classes in the data set.
void:properties | Number | Total number of distinct properties in the data set.
void:distinctSubjects | Number | Total number of distinct subjects in the data set.
void:distinctObjects | Number | Total number of distinct objects in the data set.
void:documents | Number | Total number of documents, in case the data set is published as a set of individual documents.
Source: http://www.w3.org/TR/void/#metadata
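As a rough illustration of where such numbers come from, the sketch below derives some of these statistics from an in-memory list of (subject, predicate, object) tuples. The representation and the function name are assumptions of the example; classes are counted as the distinct objects of rdf:type triples, and void:entities and void:documents are left out because they depend on the URI pattern and the publication layout.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def void_statistics(triples):
    """Compute a subset of the VoID statistics for a list of triples."""
    subjects = {s for s, _, _ in triples}
    properties = {p for _, p, _ in triples}
    objects = {o for _, _, o in triples}
    # Classes: distinct resources used as objects of rdf:type.
    classes = {o for _, p, o in triples if p == RDF_TYPE}
    return {
        "void:triples": len(triples),
        "void:classes": len(classes),
        "void:properties": len(properties),
        "void:distinctSubjects": len(subjects),
        "void:distinctObjects": len(objects),
    }


# Tiny illustrative data set; the example.org URIs are hypothetical.
triples = [
    ("http://example.org/id/artist/the-beatles", RDF_TYPE,
     "http://purl.org/ontology/mo/MusicGroup"),
    ("http://example.org/id/artist/the-beatles",
     "http://xmlns.com/foaf/0.1/name", '"The Beatles"'),
    ("http://example.org/id/artist/queen", RDF_TYPE,
     "http://purl.org/ontology/mo/MusicGroup"),
]
stats = void_statistics(triples)
```

For a data set held in a triple store, the same counts would normally be obtained with aggregate SPARQL queries rather than by loading all triples into memory.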
38. VoID: Structural Metadata (5)
Partitioned data sets
• The void:subset property provides a description of parts of a data set
:MusicBrainz a void:Dataset ;
    void:subset :MusicBrainzArtists .
• Data sets can be partitioned based on classes or properties:
– void:classPartition contains only instances of a particular class
– void:propertyPartition contains only triples with a particular predicate
:MusicBrainz a void:Dataset ;
    void:classPartition [ void:class mo:Release ] ;
    void:propertyPartition [ void:property mo:member ] .
39. VoID: Describing Linksets
• Linkset: collection of RDF links between two RDF data sets
[Figure: data sets :DS1 and :DS2 connected by the linksets :LS1 and :LS2 (owl:sameAs links). Image based on http://semanticweb.org/wiki/File:Void-linkset-conceptual.png]
@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

:DS1 a void:Dataset .
:DS2 a void:Dataset .
:DS1 void:subset :LS1 .
:LS1 a void:Linkset ;
    void:linkPredicate owl:sameAs ;
    void:target :DS1, :DS2 .
40. VoID: Describing Linksets (2)
Example
@prefix void: <http://rdfs.org/ns/void#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .

:MusicBrainz a void:Dataset .
:DBpedia a void:Dataset .
:MusicBrainz void:classPartition :MBArtists .
:MBArtists void:class mo:MusicArtist .
:MBArtists a void:Linkset ;
    void:linkPredicate skos:exactMatch ;
    void:target :MusicBrainz, :DBpedia .
41. VoID: Access Metadata
The access metadata describes the methods of accessing the actual RDF data set
Method | Predicate | Description
URI look up endpoint | void:uriLookupEndpoint | Specifies the URI of a service for accessing the data set (different from the SPARQL protocol)
Root resource | void:rootResource | URI of the top concepts (only for data sets structured as trees)
SPARQL endpoint | void:sparqlEndpoint | Provides access to the data set via the SPARQL protocol.*
RDF data dumps | void:dataDump | Specifies the location of the dump file. If the data set is split into multiple files, then several values of this property are provided.
* This assumes that the default graph of the SPARQL endpoint contains the data set. VoID cannot express that a data set is contained in a specific named graph. This can be specified with SPARQL 1.1 Service Description.
CH
5
42. Providing Access to the Data Set
The data set can be accessed via different mechanisms: dereferencing HTTP URIs, RDFa, RDF dump, SPARQL endpoint
43. Dereferencing HTTP URIs
• Allows for easily exploring certain resources contained in the data set
• What to return for a URI?
– Immediate description: triples where the URI is the subject.
– Backlinks: triples where the URI is the object.
– Related descriptions: information of interest in typical usage scenarios.
– Metadata: information such as author and licensing information.
– Syntax: RDF descriptions as RDF/XML and human-readable formats.
• Applications (e.g. LD browsers) render the retrieved information so it can be perceived by a user.
Source: How to Publish Linked Data on the Web - Chris Bizer, Richard Cyganiak, Tom Heath.
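The first two parts of such a response can be sketched as a simple selection over the triples of the data set. This is an illustration only: the in-memory list, the function name and the example.org vocabulary term are assumptions, and a real server would add related descriptions, metadata and content negotiation on top.

```python
def describe(uri, triples):
    """Select the immediate description and the backlinks of a URI."""
    return {
        # Triples where the URI is the subject.
        "immediate": [t for t in triples if t[0] == uri],
        # Triples where the URI is the object.
        "backlinks": [t for t in triples if t[2] == uri],
    }


BEATLES = "http://example.org/id/artist/the-beatles"   # hypothetical URI
ALBUM = "http://example.org/id/album/abbey-road"       # hypothetical URI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = [
    (BEATLES, RDF_TYPE, "http://purl.org/ontology/mo/MusicGroup"),
    # Hypothetical property linking an album to its performer.
    (ALBUM, "http://example.org/vocab/performedBy", BEATLES),
    (ALBUM, RDF_TYPE, "http://purl.org/ontology/mo/Release"),
]
description = describe(BEATLES, triples)
```

Returning backlinks lets a client navigate from the artist to the album even though the album triple has the artist only in the object position.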
CH
1
45. RDFa
• RDFa = “RDF in attributes”
• Extension to HTML5 for embedding RDF within HTML pages:
– The HTML is processed by the browser; the (human) consumer does not see the RDF data
– The RDF triples within the page are consumed by APIs to extract the (semi-)structured data
• It is considered the bridge between the Web of Data and the Web of Documents
• It is a complete serialization of RDF
46. RDFa: Attributes
Attribute role | Attribute | Description
Syntax | prefix | List of prefix-name IRI pairs
Syntax | vocab | IRI that specifies the vocabulary where the concept is defined
Subject | about | Specifies the subject of the relationship
Predicate | property | Expresses the relationship between the subject and the value
Predicate | rel | Defines a relation between the subject and a URL
Predicate | rev | Expresses reverse relationships between two resources
Resource | href | Specifies an object URI for the rel and rev attributes
Resource | resource | Same as href (used when href is not present)
Resource | src | Specifies the subject of a relationship
Literal | datatype | Expresses the datatype of the object of the property attribute
Literal | content | Supplies machine-readable content for a literal
Literal | xml:lang, lang | Specifies the language of the literal
Macro | typeof | Indicates the RDF type(s) to associate with a subject
Macro | inlist | An object is added to the list of a predicate.
47. RDFa: Example
Extracting RDF from HTML
HTML (+RDFa):
<div class="artistheader"
     about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
     typeof="http://purl.org/ontology/mo/MusicGroup">
…
</div>
RDF:
<http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/mo/MusicGroup> .
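The mapping from the about and typeof attributes to an rdf:type triple can be reproduced with Python's standard HTML parser. This is a deliberately narrow sketch and not a conforming RDFa processor: it assumes full IRIs in both attributes and ignores prefix, vocab, property and all other RDFa attributes. The class name TypeofExtractor is hypothetical.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TypeofExtractor(HTMLParser):
    """Collect (subject, rdf:type, class) triples from about/typeof pairs."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = attributes.get("about")
        types = attributes.get("typeof")
        if subject and types:
            # typeof may list several space-separated types.
            for rdf_class in types.split():
                self.triples.append((subject, RDF_TYPE, rdf_class))


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


html = (
    '<div class="artistheader" '
    'about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_" '
    'typeof="http://purl.org/ontology/mo/MusicGroup">…</div>'
)
extractor = TypeofExtractor()
extractor.feed(html)
extractor.close()
ntriples = to_ntriples(extractor.triples)
```

For real pages, a full RDFa processor such as the pyRdfa service referenced in these slides should be used, since it also resolves prefixes, vocabularies and nested subjects.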
50. RDFa: Example (2)
Extracting RDF from MusicBrainz.org
Page: http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
Extraction service (source): http://www.w3.org/2007/08/pyRdfa/
Request: http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Fmusicbrainz.org%2Fartist%2Fb10bbbfc-cf9e-42e0-be17-e2c3e1d2600d&format=nt
Watch the EUCLID screencast: http://vimeo.com/euclidproject
53. RDF Dump
• An RDF dump refers to a file which contains (part of) a data set specified in an RDF format (RDF/XML, N-Triples, N-Quads)
• The data set can be split into several RDF dumps
• A list of data sets available as RDF dumps can be found at:
– http://www.w3.org/wiki/DataSetRDFDumps
54. SPARQL Endpoint
• The SPARQL endpoint refers to the URI of the listener of the SPARQL protocol service, which handles requests for SPARQL protocol operations
• The user submits SPARQL queries to the SPARQL endpoint in order to retrieve only a desired subset of the RDF data set
• List of available SPARQL endpoints:
– http://www.w3.org/wiki/SparqlEndpoints
– http://labs.mondeca.com/sparqlEndpointsStatus/
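As an illustration of how a query reaches an endpoint, the sketch below builds the URL of a SPARQL protocol query via HTTP GET, where the query text is passed in the query parameter. The endpoint URI http://example.org/sparql and the query itself are hypothetical, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint URI, used only for illustration.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX mo: <http://purl.org/ontology/mo/>
SELECT ?artist WHERE { ?artist a mo:MusicGroup } LIMIT 10"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


request_url = build_query_url(ENDPOINT, QUERY)

# Decoding the URL again yields the original query text.
decoded = parse_qs(urlsplit(request_url).query)["query"][0]
```

A client would fetch request_url with an Accept header such as application/sparql-results+json; long queries are usually sent via POST instead.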
CH
2
55. Using Linked Data Catalogs
• Data catalogs, markets or repositories are platforms dedicated to providing access to a wide range of data sets from different domains
• Allow data consumers to easily find and use the data
• Usually the catalogs offer relevant metadata about the creation of the data set
56. Using Linked Data Catalogs (2)
How to publish an RDF data set into a catalog?
Create your own data catalog
• Recommended for big organizations/institutions aiming at providing a large number of data sets
• Use a data management system, for example: [image not preserved]
Upload your data set into an existing catalog
• Allows data consumers to easily find new data sets
• Common LD catalogs are:
– [image not preserved]
– The Linking Open Data Cloud
57. Validating Data Sets
There are different ways to validate the published RDF data set. The slide groups the tools under accessibility, parsing & syntax, and general validators:
• Vapour - Performs two types of tests: without content negotiation and requesting RDF/XML content
http://validator.linkeddata.org/vapour
• URI Debugger - Retrieves the HTTP responses of accessing a URI
http://linkeddata.informatik.hu-berlin.de/uridbg/
• RDF Triple-Checker - Dereferences namespaces associated with the resources used in the document
http://graphite.ecs.soton.ac.uk/checker/
• W3C RDF/XML Validation Service - Evaluates the syntax of RDF/XML documents and displays the RDF triples in them
http://www.w3.org/RDF/Validator/
• W3C Markup Validation Service - Checks syntactic correctness for web documents with RDFa markup
http://validator.w3.org/
• RDF:ALERTS - Validates syntax, undefined resources, datatype and other types of errors
http://swse.deri.org/RDFAlerts/
58. Validating Data Sets (2)
Example: Validating URIs with Vapour
[Screenshots of Vapour validation results. Source: http://idi.fundacionctic.org/vapour]
• HTML content: http://dbpedia.org/page/The_Beatles
• RDF document: http://dbpedia.org/data/The_Beatles.xml
62. Providing Linked Data: Checklist (1)
Creating Linked Data
o Were all the relevant entities/concepts effectively extracted from the raw data?
o Are all the created URIs dereferenceable?
o Are you reusing terms from widely accepted vocabularies?
63. Providing Linked Data: Checklist (2)
Interlinking Linked Data
o Is the data set linked to other RDF data sets?
o Are the created vocabulary terms linked to other vocabularies?
64. Providing Linked Data: Checklist (3)
Publishing Linked Data
o Do you provide data set metadata?
o Do you provide information about licensing?
o Do you provide additional access methods?
o Is the data set available in LD catalogs?
o Did the data set pass the validation tests?
65. Summary
In this chapter we studied:
• The Linked Data lifecycle:
– 3 core tasks: creating, interlinking and publishing
• Creation of Linked Data:
– Extracting relevant data, using URIs to name entities, selecting vocabularies, and expressing the data using the RDF data model
• Interlinking Linked Data:
– Challenges of link discovery, using Silk to create links between two data sets, and using SKOS links
• Publishing Linked Data:
– Creation of data set metadata; publishing the data set via RDF dumps, SPARQL endpoints or RDFa; using RDFa and schema.org to enrich search results; and uploading the data set to an LD catalog
66. The Web & Linked Data
• Linked Data catalogs
• Applications
67. CKAN
• CKAN is an open source platform for developing data set catalogs
• Implements useful tools for data publishers to support:
– Data harvesting
– Creation of metadata
– Access mechanisms to the data set
– Updating the data set
– Monitoring the access to the data set
70. The Data Hub
• The Data Hub is a community-run data catalog which contains more than 5,000 data sets [1]
• “(…) is an openly editable open data catalogue, in the style of Wikipedia”. [2]
• It is implemented on top of the CKAN platform
• Allows the creation of groups:
– The Linking Open Data Cloud group exclusively contains Linked Data sets
[1] According to the information presented in the portal in March 2013
[2] Source: http://datahub.io/about
72. The Data Hub (3)
Source: http://datahub.io/
73. The Linking Open Data Cloud
September 2011
Source: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch
74. The Linking Open Data Cloud
How to publish an RDF data set in this cloud?
1. The data set must follow the Linked Data principles
2. The data set must contain at least 1,000 RDF triples
3. The data set must contain at least 50 RDF links to a data set that is already in the diagram
4. Access to the data set must be provided
Once these criteria are met, the data publisher must add the data set to the Data Hub catalog, and contact the administrators of the Linking Open Data Cloud group
Source: http://lod-cloud.net/
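The four criteria are easy to turn into a pre-submission check. The sketch below assumes the publisher has already summarised the data set in a small record; the DatasetSummary type, its field names and the messages are hypothetical and only mirror the list above.

```python
from dataclasses import dataclass


@dataclass
class DatasetSummary:
    follows_ld_principles: bool
    triple_count: int
    # Number of RDF links per target data set that is already in the diagram.
    links_to_cloud_datasets: dict
    access_provided: bool


def unmet_criteria(ds: DatasetSummary) -> list:
    """Return the criteria from the list above that the data set fails."""
    problems = []
    if not ds.follows_ld_principles:
        problems.append("does not follow the Linked Data principles")
    if ds.triple_count < 1000:
        problems.append("fewer than 1,000 RDF triples")
    # At least 50 links must point to a single data set in the diagram.
    if not any(count >= 50 for count in ds.links_to_cloud_datasets.values()):
        problems.append("fewer than 50 RDF links to a data set in the diagram")
    if not ds.access_provided:
        problems.append("no access to the data set is provided")
    return problems


ready = DatasetSummary(True, 1500, {"DBpedia": 120}, True)
not_ready = DatasetSummary(True, 500, {"DBpedia": 10, "MusicBrainz": 45}, True)
```

Note that 10 links to one data set plus 45 to another do not satisfy the third criterion, because the threshold applies to links into one target data set.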
75. Linked Data & Search Engines
• Search engines collect information about web resources in order to produce richer search results by improving the display of the results
• This is only possible if the search engines are able to understand the content within the web pages
• The HTML pages must be annotated with machine-readable content to describe their content, which requires a mark-up format and a vocabulary
76. RDFa for marking up data
• RDFa is used to provide (semi-)structured Linked Data embedded in web content
• Examples:
– Some search engines use RDFa, e.g., Google, Yahoo! and Bing
– Facebook’s Open Graph is based on RDFa
77. Google Rich Snippets
• Embedding semantics via RDFa (or microformats/microdata) enhances search results
79. Schema.org
• Collection of schemas/vocabularies to mark up HTML pages
• It is recognized by Bing, Google, Yahoo! and Yandex
• Covers a wide range of knowledge domains
• It also offers an extension mechanism in case the publisher is interested in adding new concepts to the vocabularies
80. Schema.org (2)
The vocabularies cover the following topics: [image not preserved]
Source: http://schema.org/docs/schemas.html
“The world is too rich, complex and interesting for a single schema to describe fully on its own. With schema.org we aim to find a balance, by providing a core schema that covers lots of situations, alongside extension mechanisms for extra detail.” (Dan Brickley, schema.org)
81. Schema.org (3)
Integrates(/aligns) existing vocabularies where appropriate, e.g. rNews
Source: http://schema.org/Article
82. Google Knowledge Graph
• Users are able to find answers to their queries without browsing pages
• Provides detailed information
83. Google Knowledge Graph (2)
• Google Search results include structured data from Freebase
• Might disambiguate search terms
84. Freebase
• Knowledge base of structured data
• Data is stored as a graph
• Describes data from different domains
85. Bing Snapshot
• Provides structured data related to the search term
• Includes a significant number of entities from more domains
• Connects data from LinkedIn
• It is powered by the graph engine Trinity.RDF
87. Open Graph Protocol
• It was originally created by Facebook
• Allows describing web content as graph objects, establishing connections between people and objects
• The descriptions are embedded in the web page as RDFa data
• Supports description of several domains: basic metadata, music, video, articles, books, websites and user profiles
Source: http://ogp.me/
88. Open Graph Protocol (2)
Who is using Open Graph protocol?
• Consumers: Facebook, Google, Mixi
• Publishers: IMDb, Microsoft, NHL, Posterous, Rotten Tomatoes, TIME
Source: http://ogp.me/
89. Open Graph Protocol (3)
• Facebook expands the vocabulary of relationships beyond “friendship” and “like”: more actions!
Source: https://developers.facebook.com/docs/opengraph/
90. Open Graph Protocol & Facebook
List of domains and actions
• General: Like, Recommend, Follow
• Music: Listen, Create a playlist
• Movies & TV: Watch, Rate, Wants to watch
• Book: Rate, Read, Quote, Wants to read
• Games: Achieve, High score
• Fitness: Bike, Run, Walk
Source: https://developers.facebook.com/docs/opengraph/
How can we exploit these links and relationships?
91. Facebook Graph Search
• Focuses on people and their interests, exploiting how everything is related to each other
• Queries are specified using natural language
• Takes advantage of context and suggests possible queries
• Allows for building more complex (expressive) queries that are not possible with normal search:
– For example, “music liked by me and friends who live in my city”
92. Facebook Graph Search (2)
[Screenshots: context (information from profile) and graph search suggestions]
94. Facebook Graph Search (4)
Observations
• Allows for conjunctive queries (applying filter over intermediate results = “apply operator”)
• Disjunctive queries are not supported:
– For example: “My friends who like SemanticWeb.com OR ReadWrite”
• Post search is not supported
– It is not possible to search in post content submitted to the timeline
• User privacy settings affect the results
95. Tools for providing Linked Data
• Extracting data from spreadsheets: OpenRefine
• Extracting data from RDBMS: R2RML
• Extracting data from text: Zemanta, OpenCalais, GATE
• Interlinking data sets: Silk
96. EXTRACTING DATA FROM SPREADSHEETS WITH OPENREFINE
97. Integrate Chart Data
• Task: Integrate latest chart information into your RDF database.
• Data may be available in non-RDF formats:
– Plain text
– CSV, TSV, separator-based files
– HTML tables
– Spreadsheets (OpenDocument, Excel, …)
– XML
– JSON
– …
[Figure: integration pipeline with stages labelled data acquisition (CSV/TSV, HTML, spreadsheets, JSON), vocabulary mapping, cleansing, interlinking, integrated data set, publishing, and access (LD data set, SPARQL endpoint)]
98. Example Data
Source: http://en.wikipedia.org/wiki/List_of_best-selling_music_artists
CSV:
The Beatles, 250 million
Elvis Presley, 203.3 million
Michael Jackson, 157.4 million
Madonna, 160.1 million
Led Zeppelin, 135.5 million
Queen, 90.5 million
HTML tables:
Artist | Country of origin | Period active | Release-year of first charted record | Total certified units (from available markets)
The Beatles | United Kingdom | 1960–1970 | 1962 | 250 million
Elvis Presley | United States | 1954–1977 | 1954 | 203.3 million
Michael Jackson | United States | 1964–2009 | 1971 | 157.4 million
Madonna | United States | 1979–present | 1982 | 160.1 million
Led Zeppelin | United Kingdom | 1968–1980 | 1969 | 135.5 million
Queen | United Kingdom | 1971–present | 1973 | 90.5 million
JSON:
{
"artist": {
"class": "artist",
"name": "The Beatles"
},
"rank": 1,
"value": "250 million"
},
…
99. OpenRefine
Quick Facts
• transforms and cleans messy input data sets.
• is an open-source successor of Google Refine.
• allows for entity reconciliation against SPARQL endpoints or RDF data.
• is extended with plugins that enhance its functionality, e.g. for RDF support.
100. Use of OpenRefine
1. Messy input data is imported, transformed into a table representation and cleaned.
2. Entity reconciliation is applied to allow for interlinking with existing data sets.
3. Define the structure of the RDF output.
4. The data is exported into some RDF syntax.
CSV input:
The Beatles, 250 million
Elvis Presley, 203.3 million
Michael Jackson, 157.4 million
Madonna, 160.1 million
Led Zeppelin, 135.5 million
Queen, 90.5 million

RDF output:
musicbrainz:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d :totalSales "250000000"^^xsd:int .
musicbrainz:01809552-4f87-45b0-afff-2c6f0730a3be :totalSales "203300000"^^xsd:int .
musicbrainz:f27ec8db-af05-4f36-916e-3d57f91ecf5e :totalSales "157400000"^^xsd:int .
musicbrainz:79239441-bfd5-4981-a70c-55c3f15c1287 :totalSales "160100000"^^xsd:int .
musicbrainz:678d88b2-87b0-403b-b63d-5da7465aecc3 :totalSales "135500000"^^xsd:int .
musicbrainz:0383dadf-2a4e-4d10-a46a-e9e041da8eb3 :totalSales "90500000"^^xsd:int .
101. Data Transformation
Typical steps:
• Group and explore data items
• Delete columns or rows based on a filter condition
• Split columns into several columns based on a condition
• Modify messy data items with GREL, a powerful expression language
• Replay steps from a previous Refine project
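The cleaning steps above can be sketched outside OpenRefine as well. The following is a minimal Python illustration, not OpenRefine or GREL itself: it splits each line of the example CSV into two columns and turns values such as "203.3 million" into integers. The names `RAW`, `MULTIPLIERS`, `parse_units` and `clean_rows` are hypothetical helpers introduced only for this sketch.

```python
import csv
import io

# Three rows of the example CSV used in the slides.
RAW = """The Beatles, 250 million
Elvis Presley, 203.3 million
Queen, 90.5 million
"""

# Assumption: the unit word is one of these; anything else raises a KeyError.
MULTIPLIERS = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def parse_units(text):
    """Turn a string such as '203.3 million' into an integer count."""
    number, word = text.strip().split()
    return round(float(number) * MULTIPLIERS[word.lower()])


def clean_rows(raw):
    """Split each line into (artist, units) and normalise both values."""
    rows = []
    reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
    for artist, units in reader:
        rows.append((artist.strip(), parse_units(units)))
    return rows
```

Calling `clean_rows(RAW)` yields one `(artist, integer)` pair per input line, which is the tabular shape the later reconciliation and export steps work on.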
102. How to Generate RDF?
• Additional problem: data needs to be interlinked with existing MusicBrainz data
• This is the point where plugins come into play:
– RDF Refine: developed by DERI
– An extension of OpenRefine to support RDF
103. Core Capabilities
• Interlinking of data by entity reconciliation
– Against SPARQL endpoints, RDF dumps
– Discovery of relevant RDF data sets
• RDF export with the help of RDF skeletons
– Define the vocabulary and graph structure of the RDF serialization
– In Turtle, RDF/XML
104. Entity Reconciliation
Typical steps:
• Define a reconciliation service
• Select specific types to reconcile against
• Start reconciling a column against the service
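To make the idea of reconciliation concrete, here is a small Python sketch. It is not how OpenRefine or RDF Refine implement it: an in-memory dictionary (`CANDIDATES`, hypothetical) stands in for the reconciliation service, string similarity comes from the standard library's `difflib`, and the 0.85 threshold is an arbitrary assumption. The pairing of the two MusicBrainz identifiers with these artists assumes the RDF example lists artists in the same order as the CSV.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a reconciliation service: URI -> label.
CANDIDATES = {
    "http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_": "The Beatles",
    "http://musicbrainz.org/artist/0383dadf-2a4e-4d10-a46a-e9e041da8eb3#_": "Queen",
}


def normalise(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def reconcile(cell, candidates, threshold=0.85):
    """Return (uri, score) for the best match, or (None, score) if too weak."""
    best_uri, best_score = None, 0.0
    for uri, label in candidates.items():
        score = SequenceMatcher(None, normalise(cell), normalise(label)).ratio()
        if score > best_score:
            best_uri, best_score = uri, score
    if best_score < threshold:
        return None, best_score
    return best_uri, best_score
```

A cell such as `"the  beatles"` reconciles to the first candidate, while a value with no close label in the candidate set comes back as `None` and would be left for manual review.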
105. Define RDF Skeletons
• An RDF skeleton defines the structure of the RDF triples that are exported
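Conceptually, a skeleton is a template that is applied to every row of the cleaned table. The Python sketch below illustrates that idea only; RDF Refine defines skeletons through its user interface, and the `SKELETON` structure, the `export_row` function and the `http://example.org/vocab#totalSales` property are assumptions made for this example.

```python
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"

# Hypothetical skeleton: (predicate URI, source column, datatype URI) per triple.
SKELETON = [
    ("http://example.org/vocab#totalSales", "units", XSD_INT),
]


def export_row(subject_uri, row, skeleton=SKELETON):
    """Render one table row as N-Triples lines according to the skeleton."""
    lines = []
    for predicate, column, datatype in skeleton:
        lines.append(
            '<%s> <%s> "%s"^^<%s> .' % (subject_uri, predicate, row[column], datatype)
        )
    return lines
```

For example, `export_row("http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_", {"units": 250000000})` produces a single typed-literal triple for that subject.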
108. W3C RDB2RDF
• Task: Integrate data from relational DBMS with Linked Data
• Approach: map from relational schema to semantic vocabulary with R2RML
• Publishing: two alternatives:
– Translate SPARQL into SQL on the fly
– Batch transform data into RDF, index and provide SPARQL access in a triplestore
[Figure: Linked Data provisioning pipeline for relational data. Labels: LD Data set Access, Integrated Data in Triplestore, Interlinking, Cleansing, Vocabulary Mapping, SPARQL Endpoint, Publishing, Data acquisition, R2RML Engine, Relational DBMS]
109. W3C RDB2RDF
• In September 2012, the W3C published two Recommendations for mapping between relational databases and RDF:
– Direct Mapping exposes relational data directly as RDF
  • No allowance for vocabulary mapping
  • No allowance for interlinking (unless URIs are used in the relational data)
  • Not appropriate for this topic
– R2RML, the RDB to RDF Mapping Language
  • Allows vocabulary mapping (subject, predicate and object maps with class options)
  • Allows interlinking, since URIs can be constructed
  • A means to provide MusicBrainz as RDF/SPARQL itself
Source: http://www.w3.org/2001/sw/rdb2rdf/
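Why Direct Mapping leaves no room for vocabulary mapping can be seen in a short sketch. The Python below is a simplified illustration of the idea, assuming a single-column primary key and ignoring percent-encoding, literal datatypes and foreign keys; `direct_map_row` and the base IRI `http://example.org/db/` are hypothetical. Subject, class and property IRIs are all derived mechanically from table and column names.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map_row(base, table, pk, row):
    """Derive triples for one row purely from table and column names."""
    subject = "%s%s/%s=%s" % (base, table, pk, row[pk])
    triples = [(subject, RDF_TYPE, base + table)]
    for column, value in row.items():
        if value is not None:  # NULL columns produce no triple
            triples.append((subject, "%s%s#%s" % (base, table, column), value))
    return triples
```

Every predicate ends up looking like `http://example.org/db/artist#gid`, which is why a mapping language such as R2RML is needed to target an existing vocabulary like the Music Ontology.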
110. MusicBrainz Next Gen Schema
• Artist: as pre-NGS, but further attributes
• Artist Credit: allows joint credit
• Release Group: cf. ‘album’ versus:
• Release
• Medium
• Track
• Track List
• Work
• Recording
Source: https://wiki.musicbrainz.org/Next_Generation_Schema
111. Music Ontology
• OWL ontology with following core concepts (classes) and relationships (properties):
[Figure: Music Ontology core classes and properties]
Source: http://musicontology.com
112. R2RML Class Mapping
• Mapping tables to classes is ‘easy’:
lb:Artist a rr:TriplesMap ;
  rr:logicalTable [rr:tableName "artist"] ;
  rr:subjectMap [rr:class mo:MusicArtist ;
    rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap [rr:predicate mo:musicbrainz_guid ;
    rr:objectMap [rr:column "gid" ; rr:datatype xsd:string]] .
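What an R2RML engine does with this class mapping can be approximated in a few lines. The Python sketch below is an illustration under simplifying assumptions, not an R2RML processor: `expand` and `map_artist` are hypothetical helpers, template values are substituted without the percent-encoding a real engine applies, and `mo:` is taken to be the Music Ontology namespace `http://purl.org/ontology/mo/`.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO = "http://purl.org/ontology/mo/"  # assumed expansion of the mo: prefix
TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"


def expand(template, row):
    """Replace each {column} placeholder with the row's value for that column."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def map_artist(row):
    """Triples generated for one row of the artist table."""
    subject = expand(TEMPLATE, row)
    return [
        (subject, RDF_TYPE, MO + "MusicArtist"),        # from rr:class
        (subject, MO + "musicbrainz_guid", row["gid"]),  # from rr:column "gid"
    ]
```

Each row of the `artist` table thus yields one typing triple and one literal triple, both hanging off a subject IRI built from the row's `gid`.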
113. R2RML Property Mapping
• Mapping columns to properties can be easy:
lb:artist_name a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT artist.gid, artist_name.name
       FROM artist
       INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
  rr:subjectMap lb:sm_artist ;
  rr:predicateObjectMap [rr:predicate foaf:name ;
    rr:objectMap [rr:column "name"]] .
114. NGS Advanced Relations
• Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist)
• Each pairing of instances refers to a Link
• Links have types (cf. RDF properties) and attributes
Source: http://wiki.musicbrainz.org/Advanced_Relationship
115. R2RML Advanced Mapping
• Mapping advanced relationships (SQL joins):
lb:artist_member a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT a1.gid, a2.gid AS band
       FROM artist a1
       INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
       INNER JOIN link ON l_artist_artist.link = link.id
       INNER JOIN link_type ON link.link_type = link_type.id
       INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
       WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'
       AND link.ended = FALSE"""] ;
  rr:subjectMap lb:sm_artist ;
  rr:predicateObjectMap [rr:predicate mo:member_of ;
    rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
      rr:termType rr:IRI]] .
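The join in this mapping can be tried out on a toy database. The sketch below uses Python's built-in `sqlite3` with a heavily reduced, hypothetical version of the MusicBrainz tables (only the columns the query touches, and assuming the `link` table carries a `link_type` column). The artist `gid` values are placeholders, `ended` is stored as 0/1, and `mo:member_of` is expanded with the assumed namespace `http://purl.org/ontology/mo/`.

```python
import sqlite3

MEMBER_OF_BAND = "5be4c609-9afa-4ea0-910b-12ffb71e3821"  # link type gid from the mapping

SCHEMA = """
CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link_type (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link (id INTEGER PRIMARY KEY, link_type INTEGER, ended INTEGER);
CREATE TABLE l_artist_artist (id INTEGER PRIMARY KEY, link INTEGER,
                              entity0 INTEGER, entity1 INTEGER);
"""

QUERY = """
SELECT a1.gid, a2.gid AS band
FROM artist a1
INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
INNER JOIN link ON l_artist_artist.link = link.id
INNER JOIN link_type ON link.link_type = link_type.id
INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
WHERE link_type.gid = ? AND link.ended = 0
"""


def demo_connection():
    """Build an in-memory database with one current and one ended membership."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO artist VALUES (?, ?)",
                     [(1, "member-gid"), (2, "band-gid"), (3, "former-gid")])
    conn.execute("INSERT INTO link_type VALUES (1, ?)", (MEMBER_OF_BAND,))
    conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [(10, 1, 0), (11, 1, 1)])
    conn.executemany("INSERT INTO l_artist_artist VALUES (?, ?, ?, ?)",
                     [(100, 10, 1, 2), (101, 11, 3, 2)])
    return conn


def member_of_triples(conn):
    """Apply the subject and object templates to every row the join returns."""
    triples = []
    for gid, band in conn.execute(QUERY, (MEMBER_OF_BAND,)):
        triples.append((
            "http://musicbrainz.org/artist/%s#_" % gid,
            "http://purl.org/ontology/mo/member_of",
            "http://musicbrainz.org/artist/%s#_" % band,
        ))
    return triples
```

Only the membership whose link has not ended survives the `WHERE` clause, so `member_of_triples(demo_connection())` returns a single `mo:member_of` triple.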
117. OpenCalais
• Not easily customised/extended
• Domain-specific coverage varies
Source: http://viewer.opencalais.com/
118. DBpedia Spotlight
• Not easily customised/extended
• Is currently only available for English
Source: http://dbpedia-spotlight.github.com/demo/
http://dbpedia.org/page/Slowcore
http://dbpedia.org/page/Dorothy_Parker
119. Zemanta
• Common problem with general purpose, open-domain semantic annotation tools
• Best results require bespoke customisation
Source: http://www.zemanta.com/demo/
120. GATE
• General Architecture for Text Engineering
• Free open-source (LGPL) framework and development environment
• Started 1996, large developer community
• Used worldwide by many organisations to build bespoke solutions; e.g. Press Association and the National Archive
• Information Extraction in many languages
Source: http://www.gate.ac.uk/
121. GATE Example - LODIE
• Increases recall over DBpedia by deriving new lexicalisations for URIs from link anchor texts, disambiguation pages, and redirect pages
122. Precision and Recall
• Generic services typically have very low recall
• Combination is one solution
• The other solution is custom extraction

Precision / recall per entity type:
System | PER | LOC | ORG | TOTAL
DB Spotlight | 0.97 / 0.40 | 0.82 / 0.46 | 0.86 / 0.31 | 0.85 / 0.39
Zemanta | 0.96 / 0.84 | 0.89 / 0.62 | 0.82 / 0.57 | 0.90 / 0.68
LODIE | 0.81 / 0.82 | 0.73 / 0.76 | 0.56 / 0.59 | 0.71 / 0.74
Zemanta ∩ LODIE | 1.00 / 0.74 | 0.95 / 0.45 | 0.97 / 0.42 | 0.97 / 0.54
Zemanta ∪ LODIE | 0.94 / 0.93 | 0.77 / 0.76 | 0.72 / 0.71 | 0.82 / 0.81
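The effect of combining two annotators can be reproduced on toy data. The Python sketch below uses invented annotation sets (`GOLD`, `TOOL_A`, `TOOL_B` are hypothetical and unrelated to the figures in the table) to show the general pattern: intersecting outputs tends to raise precision at the cost of recall, while taking the union tends to raise recall at the cost of precision.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a set of predicted annotations against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations: "e*" are correct entities, "x*" are spurious ones.
GOLD = {"e1", "e2", "e3", "e4"}
TOOL_A = {"e1", "e2", "x1"}
TOOL_B = {"e2", "e3", "x2"}

both = precision_recall(TOOL_A & TOOL_B, GOLD)    # only annotations both tools agree on
either = precision_recall(TOOL_A | TOOL_B, GOLD)  # annotations from at least one tool
```

On this toy data each tool alone reaches a precision of 2/3 and a recall of 0.5; the intersection is fully precise but finds only one of four entities, and the union finds three of four while admitting both spurious annotations.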
123. Custom GATE Gazetteer
• Retrieve MusicBrainz entity/label/class with SPARQL query
124. GATECloud
• Custom (e.g. based around custom gazetteer) GATE pipelines can be executed on the cloud:
126. Interlinking with Silk
• Task: Create links between the data set and external Linked Data sources.
• Approach: Creation of specified links by querying the target data sets
• Alternatives:
– Manual creation of linkage rules by the user
– Automatic learning of linkage rules by submitting predefined SPARQL queries
[Figure: Linked Data provisioning pipeline. Labels: LD Data set Access, Integrated Data Set, Interlinking, Cleansing, Vocabulary Mapping, SPARQL Endpoint, Publishing, CSV/TSV, HTML, Spreadsheets, JSON, Data acquisition]
127. Link Discovery with Silk
• Open source tool for discovering RDF links between data items within different Linked Data sources
• It is based on the Silk Link Specification Language (Silk-LSL) for expressing linkage rules
• It accesses the target RDF data sets via SPARQL endpoints to generate RDF links
Source: Robert Isele. “LOD2 Webinar Series: Silk”
128. Silk Variants
• Silk Single Machine
  • Generates RDF links on a single machine
  • Data sets can reside either locally or in remote machines
  • Provides multithreading and caching
• Silk MapReduce
  • Uses a cluster composed of multiple machines
  • Based on Hadoop and designed to scale to big data sets
• Silk Server
  • Used within applications that consume Linked Data from the Web while keeping track of known entities
  • Provides an HTTP API for matching entities from an incoming stream
Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
129. Silk Workflow
Select LD data sets
• Identify suitable data sets in LD catalogs*
• Select the two data sets to link
Specify LD data sets
• Specify the access method to the data set (RDF dump, SPARQL endpoint)*
• Specify the entity types to be linked
Write linkage rule
• Specifies how to compare the resources
• Use Silk-LSL
• The rules can also be learnt
Generate RDF links
• Output links can be stored in a file or a triple store
• Can discover SKOS links
* See section “Publishing Linked Data”
Source: Silk workflow is partially based on “LOD2 Webinar Series: Silk - (Simplified) Linking Workflow” by Robert Isele.
130. Linkage Rule Components
• Linkage rules define the conditions to create the links between the data sets. These rules are composed of:
– RDF Paths
  • Describe the elements to be compared
  • Example: ?a/rdfs:label
– Transformations
  • Apply transformations to the result set of an RDF path
  • Examples: LowerCase, Concatenate, Replace, …
– Comparators
  • Compute the similarity of two inputs
  • Examples: String similarity metrics, Date similarity, …
– Aggregations
  • Compute an aggregated value from multiple comparators
  • Examples: Min, Max, Avg, various means, Euclidean distance, …
Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
131. Silk Workbench
• Web application built on top of Silk, which allows the creation of projects to manage the creation of links between RDF data sets
• The data sets can be stored locally or accessed remotely by specifying the SPARQL endpoint
• The user is able to create customized linking tasks:
– The tool offers a graphical editor to create linkage rules by combining the linkage rule components via drag & drop elements
– Includes support for (automatic) learning of linkage rules
132. Silk Workbench (2): Project configuration
[Figure: Silk Workbench project view with callouts 1 to 4]
1. Project: name and components (data sources, linking tasks and output tasks)
2. Data sources: specification of the data sets to be interlinked
3. Linking task: specification of the linkage rules and type of links to be created
4. Output task: mechanism to store the results from the linking process
133. Silk Workbench (3): Editing a linking task
[Figure: Silk Workbench linkage rule editor with callouts 1 to 4]
1. Linkage rule components
2. Graphical editor: the items from (1) are dragged & dropped in this area, and connected to compose the linkage rules
3. Generate links: based on the defined linkage rules in (2), the data sets are accessed to discover possible links
4. Learn: automatic learning of linkage rules
134. Silk Workbench (4): Adding a linkage rule
[Figure: linkage rule in the graphical editor with callouts 1 to 3]
The linkage rule in the figure states:
1. Retrieve the foaf:name values from MusicBrainz and the rdfs:label values from DBpedia
2. Apply the lower case transformation to the output of (1)
3. Compare the outputs from (2) using the “Levenshtein distance” metric. If the resulting similarity is greater than 0.90, then create a link.
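The three steps of this rule can be sketched in plain Python. This is an illustration of the logic, not Silk's implementation: the function names are hypothetical, and turning the Levenshtein edit distance into a similarity via `1 - distance / max(len)` is an assumption made here so that a 0.90 threshold is meaningful. Silk's own normalisation and threshold semantics may differ.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical strings, 0.0 for maximal distance."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def should_link(foaf_name, rdfs_label, threshold=0.90):
    """Steps 2 and 3 of the rule: lower-case both values, then compare."""
    return similarity(foaf_name.lower(), rdfs_label.lower()) > threshold
```

With this sketch, `should_link("The Beatles", "the beatles")` holds because the lower-cased strings are identical, whereas a label carrying a disambiguation suffix such as `"Queen (band)"` falls well below the threshold when compared with `"Queen"`.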
137. For exercises, quiz and further material visit our website: http://www.euclid-project.eu
eBook, Course
Other channels: @euclid_project, euclidproject, euclidproject