Lecture at the advanced course on Data Science of the SIKS research school, May 20, 2016, Vught, The Netherlands.
Contents
-Why do we create Linked Open Data? Example questions from the Humanities and Social Sciences
-Introduction to Linked Open Data
-Lessons learned about the creation of Linked Open Data (link discovery, knowledge representation, evaluation).
-Accessing Linked Open Data
2. Why do we create and use Linked Open Data?
Example questions from
the humanities and
social sciences
How did the debate about
the financial crisis in
Greece develop?
3. Searching the proceedings of the EU Parliament
[Figure: number of mentions of "Greece" per year in the plenary meetings of the European Parliament, 1999-2013.]
5. Search volumes on a search engine
Query = “Greece”
http://www.google.com/trends
We need access to data. Analysing
them gives us some useful insight.
But to answer the question properly
we would need to combine sources
and do more complex queries.
7. Why do we create and use Linked Open Data?
Example question 2
Which political debate in the
post-war period has attracted
most media attention?
9. “De Indonesische Quaestie” (“The Indonesian Question”)
To answer this question we need to
go through all newspaper articles
about all political debates.
-> we need access to combined
data sources, we need
structured queries.
11. Why do we create and use Linked Open Data?
Example question 3
What are the differences
between different media?
Example question 4
Has the coverage changed
over time?
12. Research goals and research questions
Our goal is to build an infrastructure to answer these kinds of questions.
1. How do we automatically link heterogeneous datasets?
2. How do we interpret links between datasets of different quality and certainty?
3. What can we conclude from usage statistics on these datasets?
4. Can we design interfaces that allow scholars to study the datasets
• including the links between them?
• while assessing the reliability of the findings?
Data Science - Big Data - Linked Open Data
14. Table of Contents
1. What is Linked Open Data (LOD)
2. Creating LOD
1. How to discover links
2. How to represent links on the Web
3. How to evaluate links
3. Access to LOD (from both the server and the client
perspective)
17. What is Linked Open Data?
A method of publishing structured data on the Web
in such a way that it can be linked and queried
by computers as well as humans.
19. The Web of Documents
• Documents identified by URIs (html, pdf, images, movies, etc.)
• with structured information for humans (tables, headers) and
• with hyperlinks between them
• The data is not machine readable, meant for humans
  • structure is implicit (what do the columns of a table mean?)
  • links are not typed (what is the relation between two documents?)
21. The Web of Data
• Everything identified by URIs (not just documents, but also classes, instances, relations/links)
• The data is machine readable:
  • in formal languages (RDF, RDFS, OWL, SKOS)
  • which enable machines to do reasoning, i.e. infer new statements from inserted statements.
22. Compared to a database table…
[Graph: Amsterdam -is a-> City; Amsterdam -has population-> “1364422”; Amsterdam -has airport-> Schiphol]
23. Compared to a database table…
Thing      Type  Population  Airport
Amsterdam  City  1364422     Schiphol
…          …     …           …
24. Differences:
• Statements can be distributed over the web
• Non-unique naming assumption
• Open World assumption
• Everyone can say anything about anything
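A rough illustration of the table-versus-graph comparison, not taken from the lecture: the Python sketch below turns one table row into subject-predicate-object statements. The `ex:` names and the helper `row_to_triples` are made up for this example.

```python
# Hypothetical helper: turn one table row into (subject, predicate, object) triples.
# The "ex:" prefix is a placeholder namespace, not a real vocabulary.
def row_to_triples(row, key="Thing"):
    subject = "ex:" + row[key]
    return [(subject, "ex:" + column, value)
            for column, value in row.items() if column != key]

row = {"Thing": "Amsterdam", "Type": "City",
       "Population": "1364422", "Airport": "Schiphol"}
triples = row_to_triples(row)
for triple in triples:
    print(triple)
```

Each cell becomes a statement of its own, so statements about Amsterdam could just as well come from different sources on the Web.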
25. Examples of URIs on the Web of Data
• documents:
• http://vu.nl/index.html
• http://example.org/cities#Leuven
• real world objects (a book in the library, a person)
• isbn://5031-4444-333
• http://eyaloren.org/foaf.rdf#me
• concepts:
• http://cyc.org/concept/Mammal
• http://cyc.org/concept/Dog
• http://www.w3.org/2006/03/wn/wn20/instances/synset-anniversary-noun-1
• relations:
• http://purl.org/linkedpolitics/vocabulary/speaker
26. RDF (the basics)
• A W3C recommendation to describe resources on the Web of Data called “Resource Description Framework”
• See https://www.w3.org/RDF/
• RDF data model: triples!
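RDF example in Turtle syntax:
BASE <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
<bob#me>
  a foaf:Person ;
  foaf:knows <alice#me> ;
  schema:birthDate "1990-07-04"^^xsd:date ;
  foaf:topic_interest wd:Q12418 .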
29. Vocabulary definition and reasoning with RDFS
[Diagram: classes A, B and C at the ontology / vocabulary / schema level and resource r at the data level.]
34. Vocabulary definition and reasoning with RDFS
IF
B rdfs:subClassOf A
C rdfs:subClassOf B
THEN
C rdfs:subClassOf A
IF
B rdfs:subClassOf A
r rdf:type B
THEN
r rdf:type A
Given:
<bob#me> rdf:type foaf:Person .
foaf:Person rdfs:subClassOf foaf:Agent .
Inferred:
<bob#me> a foaf:Agent .
Standard meaning
37. Vocabulary definition and reasoning with RDFS
IF
p rdfs:range R
A p B
THEN
B rdf:type R
<bob#me> foaf:knows <alice#me> .
foaf:knows rdfs:range foaf:Person .
<alice#me> rdf:type foaf:Person .
Standard meaning
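The three RDFS rules on these slides (subclass transitivity, type inheritance, range) can be tried out with a small forward-chaining sketch. This is an illustration only, assuming triples are plain Python tuples of strings; `rdfs_closure` is a hypothetical name and it covers just these three rules, not the full RDFS semantics.

```python
RDF_TYPE, SUBCLASS, RANGE = "rdf:type", "rdfs:subClassOf", "rdfs:range"

def rdfs_closure(triples):
    """Apply the three rules repeatedly until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # IF C subClassOf B and B subClassOf A THEN C subClassOf A
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # IF r type B and B subClassOf A THEN r type A
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
                # IF p range R and A p B THEN B type R
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new

data = {
    ("<bob#me>", RDF_TYPE, "foaf:Person"),
    ("foaf:Person", SUBCLASS, "foaf:Agent"),
    ("<bob#me>", "foaf:knows", "<alice#me>"),
    ("foaf:knows", RANGE, "foaf:Person"),
}
closure = rdfs_closure(data)
for triple in sorted(closure - data):
    print(triple)
```

The loop is quadratic per pass, which is fine for a handful of statements; real triple stores use indexed rule engines instead.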
38. SPARQL (the basics)
• A W3C recommendation for querying RDF graphs called “SPARQL Protocol And RDF Query Language”
• See http://www.w3.org/TR/rdf-sparql-query/ or http://www.w3.org/TR/sparql11-query/
Data: :JamesDean :playedIn :Giant .
Query: :JamesDean :playedIn ?what .
Answer: :Giant
Query: ?who :playedIn :Giant.
Answer: :JamesDean
Query: :JamesDean ?what :Giant.
Answer: :playedIn
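The idea behind these queries is matching a triple pattern with variables against the data. A minimal Python sketch of that matching step follows; it is not how a SPARQL engine is implemented, and the function name `match` is an assumption of this example.

```python
def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results

data = [(":JamesDean", ":playedIn", ":Giant")]
print(match((":JamesDean", ":playedIn", "?what"), data))
print(match(("?who", ":playedIn", ":Giant"), data))
print(match((":JamesDean", "?what", ":Giant"), data))
```

The three calls correspond to the three queries on the slide, with the variable in object, subject and predicate position.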
49. Linked Open Data
A method of publishing on the Web of Data: openly
available, in RDF, with links to other datasets.
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
51. Creating Linked Open Data
in the Talk of Europe project:
Discovering links, knowledge representation
53. The European Parliament as Linked Open Data
Laura Hollink Centrum Wiskunde & Informatica, Amsterdam
Astrid van Aggelen VU University Amsterdam
Martijn Kleppe Erasmus University Rotterdam
Henri Beunders Erasmus University Rotterdam
Jill Briggeman Erasmus University Rotterdam
Max Kemman University of Luxembourg
54. Talk of Europe goals
• To publish the entire plenary debates of the European
Parliament as Linked Open Data
• To improve access to the data
• To enable large scale analysis across time spans.
‣ For residents of the European Union, access to the proceedings of the European Parliament is a formal right.
A. van Aggelen, L. Hollink, M.
Kemman, M. Kleppe & H. Beunders.
The debates of the European
Parliament as Linked Open Data.
Semantic Web Journal. In press, 2016.
57. 1. Data in RDF
14M RDF statements about the 30K
speeches in 23 languages by 3K
speakers in 1K session days that
were held in the EU parliament
between 1999 and 2014
61. Example 1: speeches that contain a certain keyword
Query: all speeches that contain the phrase “open data”
…. So let us go for open data, let us
go for utilisation of all the instruments
available to that end! …..
…. but there too governments are
encouraging the use of open data to
increase transparency, accountability
and citizen participation ….
…. We already have many open data
projects in the Member States and
local authorities…..
62. Example 2: speeches that contain a certain keyword by date
[Figure: number of mentions of "Slovenia" per year in the plenary meetings of the European Parliament, 1999-2013.]
64. Example 2: speeches that contain a certain keyword by date
[Figure: frequency of mentions of 'human rights' by date, 1999-2013.]
65. Example 3: speeches that contain a certain keyword by country
[Figure: mentions of 'human rights' by country: AT BE BG CY CZ DE DK EE ES FI FR GB GR HR HU IE IT LT LU LV MT NL PL PT RO SE SI SK.]
66. Example 4: the number of speeches per EU country
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?c (COUNT(?c) AS ?count)
WHERE {
  ?x rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/Speech> .
  ?x <http://purl.org/linkedpolitics/vocabulary#speaker> ?p .
  ?p <http://purl.org/linkedpolitics/vocabulary#countryOfRepresentation> ?c
} GROUP BY ?c LIMIT 50
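To see what the GROUP BY / COUNT in this query computes, here is the same join and count over a few in-memory triples in Python. The data and the shortened `lpv:` names are invented for the illustration, and the sketch assumes one country per speaker, which the SPARQL query itself does not require.

```python
from collections import Counter

SPEECH = "lpv:Speech"  # shortened, hypothetical class name

# Made-up sample data: three speeches by two speakers.
triples = [
    ("speech1", "rdf:type", SPEECH), ("speech1", "lpv:speaker", "mep1"),
    ("speech2", "rdf:type", SPEECH), ("speech2", "lpv:speaker", "mep2"),
    ("speech3", "rdf:type", SPEECH), ("speech3", "lpv:speaker", "mep1"),
    ("mep1", "lpv:countryOfRepresentation", "NL"),
    ("mep2", "lpv:countryOfRepresentation", "DE"),
]

def speeches_per_country(triples):
    # ?x rdf:type Speech
    speeches = {s for s, p, o in triples if p == "rdf:type" and o == SPEECH}
    # ?p countryOfRepresentation ?c (simplification: one country per speaker)
    country = {s: o for s, p, o in triples if p == "lpv:countryOfRepresentation"}
    counts = Counter()
    # ?x speaker ?p, joined with the two patterns above, then grouped by ?c
    for s, p, o in triples:
        if p == "lpv:speaker" and s in speeches and o in country:
            counts[country[o]] += 1
    return counts

counts = speeches_per_country(triples)
print(counts)
```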
67. Example 5: background info about the MEPs
• MEPs that were not born in Europe.
Members of Parliament
Integrate data from
the EU parliament
with external datasets
76. Linking Members of Parliament to Wikipedia / DBpedia
• String matching is the most important feature in the linking process.
• “nearly all [alignment systems] use a string similarity metric” [12]
• Stop-word removal and stemming are not helpful, and neither is using WordNet synonyms. [12]
[12] Cheatham, M., & Hitzler, P. String similarity metrics for ontology alignment. ISWC 2013.
http://www.dbpedia.org/resource/Judith_Sargentini
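As a sketch of string-based link discovery, the Python fragment below compares a name with candidate DBpedia resource names. It uses the similarity ratio of `difflib.SequenceMatcher` from the standard library as a stand-in; the lecture does not say which metric was used, and the candidate list and the 0.8 threshold are assumptions of this example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (difflib's ratio, a stand-in metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(name, candidates, threshold=0.8):
    """Return the most similar candidate, or None if nothing reaches the threshold."""
    scored = [(similarity(name, c.replace("_", " ")), c) for c in candidates]
    score, candidate = max(scored)
    return candidate if score >= threshold else None

# Hypothetical candidate list of DBpedia resource names.
candidates = ["Judith_Sargentini", "Martin_Schulz", "Catherine_Stihler"]
print(best_match("Judith Sargentini", candidates))
```

A threshold keeps weak matches from turning into links; choosing it is exactly the kind of decision the evaluation methods later in the lecture are meant to inform.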
78. How to relate a speech to a speaker and party?
Why is this not a good solution?
1. A person might be a member of more than one party (at different times)
2. Since there is no link between a speech and a party, queries for all speeches
spoken by the members of a certain party become very complicated.
[Graph:]
lp:eu/plenary/2009-10-21/Speech_140 lpv:speaker lp:EUmember_1023 .
lp:EUmember_1023 lpv:hasParty lp:EUParty/SomeParty .
85. How to relate a speech to a speaker and party?
[Graph:]
lp:eu/plenary/2009-10-21/Speech_140 lpv:speaker lp:EUmember_1023 .
lp:eu/plenary/2009-10-21/Speech_140 lpv:spokenAs lp:politicalFunction101 , lp:politicalFunction102 .
lp:EUmember_1023 lpv:politicalFunction lp:politicalFunction101 , lp:politicalFunction102 .
lp:politicalFunction101 rdf:type lpv:PoliticalFunction ;
  lpv:institution lp:EUParty/NI ;
  lpv:role lp:Role/member ;
  lpv:beginning "20071114"^^xsd:date ;
  lpv:end "20111126"^^xsd:date .
lp:politicalFunction102 rdf:type lpv:PoliticalFunction ;
  lpv:institution lp:EUCommittee/Committee_on_Legal_Affairs ;
  lpv:role lp:Role/substitute ;
  lpv:beginning "20090716"^^xsd:date ;
  lpv:end "20111126"^^xsd:date .
Note: this is a common “design pattern”
referred to as n-ary relations or
relations as classes
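A small Python sketch of why the n-ary pattern helps: once each political function is a record of its own with a start and end date, finding the functions a speaker held on the day of a speech is a simple filter. The record layout and the function name `functions_at` are hypothetical, and the pairing of dates with functions is read off the slide as well as the extraction allows.

```python
from datetime import date

# One record per lpv:PoliticalFunction node (layout is an assumption of this sketch).
functions = [
    {"member": "lp:EUmember_1023", "institution": "lp:EUParty/NI",
     "role": "lp:Role/member",
     "beginning": date(2007, 11, 14), "end": date(2011, 11, 26)},
    {"member": "lp:EUmember_1023",
     "institution": "lp:EUCommittee/Committee_on_Legal_Affairs",
     "role": "lp:Role/substitute",
     "beginning": date(2009, 7, 16), "end": date(2011, 11, 26)},
]

def functions_at(member, day):
    """Political functions the member held on the given day."""
    return [f for f in functions
            if f["member"] == member and f["beginning"] <= day <= f["end"]]

# Speech_140 was held on 2009-10-21.
spoken_as = functions_at("lp:EUmember_1023", date(2009, 10, 21))
for f in spoken_as:
    print(f["institution"], f["role"])
```

Storing the result as direct lpv:spokenAs links on the speech, as in the graph, saves every later query from repeating this date comparison.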
86. Intermezzo: one-question Quiz
Reasoning on the Web of Data
Question: What can we conclude from this graph?
A. Stihler is a member of exactly 3 parties
B. Stihler is a member of at least 3 parties
C. Stihler is a member of at most 3 parties
D. None of the above
E. All of the above
F. Other, namely ….
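[Graph:]
<http://purl.org/linkedpolitics/EUmember_4545> foaf:name "Catherine Stihler" .
<http://purl.org/linkedpolitics/EUmember_4545> :memberOf <http://purl.org/linkedpolitics/EUParty/PES> .
<http://purl.org/linkedpolitics/EUmember_4545> :memberOf <http://dbpedia.org/resource/Party_of_European_Socialists> .
<http://purl.org/linkedpolitics/EUmember_4545> :memberOf <http://dbpedia.org/resource/Progressive_Alliance_of_Socialists_and_Democrats> .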
87. Creating Linked Open Data
in the PoliMedia project:
Discovering links, knowledge representation, evaluation
91. Which political debate in
the post-war period has
attracted most media
attention?
What are the differences
between different media?
Has the coverage changed
over time?
92. Transcriptions of all 9,294 meetings of the Dutch parliament between 1945 and 1995, consisting of 1,208,903 speeches.
Roughly 1.8 million news bulletins between 1937 and 1984.
(We only use 1945-1995)
Archives of hundreds of newspapers, with a very large number of newspaper issues, or tens of millions of articles, between 1618 and 1995.
(We only use 1945-1995)
Transcriptions of all
meetings of the
European Parliament
between 1999 and
2014.
95. Discovering links between politics and news
[Pipeline diagram: Debates → detect topics in speeches (Topics) and detect Named Entities in speeches (Named Entities); Topics, Named Entities, name of speaker and date of debate → create queries → Queries → search newspaper archive → Candidate articles → rank candidate articles → Links between speeches and articles.]
97. Step 2: generate links
Intuition 1: The name of the speaker should
appear in the article and the article should
be published within a week of the debate
Intuition 2: the more the article and the
speech overlap in terms of topics and
named entities, the more they are related.
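The two intuitions can be sketched in Python as a filter followed by a ranking. This is not the PoliMedia implementation: the record fields, the reading of "within a week" as zero to seven days after the debate, and the use of Jaccard overlap as the score are all assumptions made for the example, and the sample data is invented.

```python
from datetime import date, timedelta

def is_candidate(speech, article):
    """Intuition 1: speaker name in the text, published within a week of the debate."""
    delay = article["published"] - speech["date"]
    return (speech["speaker"].lower() in article["text"].lower()
            and timedelta(0) <= delay <= timedelta(days=7))

def overlap(speech, article):
    """Intuition 2: Jaccard overlap of topics and named entities (a stand-in score)."""
    a, b = set(speech["terms"]), set(article["terms"])
    return len(a & b) / len(a | b) if a | b else 0.0

def rank(speech, articles):
    candidates = [art for art in articles if is_candidate(speech, art)]
    return sorted(candidates, key=lambda art: overlap(speech, art), reverse=True)

# Invented sample data.
speech = {"speaker": "Jansen", "date": date(1950, 3, 1),
          "terms": {"harbour", "budget", "Jansen"}}
articles = [
    {"id": "a1", "published": date(1950, 3, 2),
     "text": "Jansen defends the harbour plans", "terms": {"harbour", "Jansen"}},
    {"id": "a2", "published": date(1950, 3, 3),
     "text": "Minister Jansen opens a school", "terms": {"school", "Jansen"}},
    {"id": "a3", "published": date(1950, 4, 20),
     "text": "Jansen on the harbour", "terms": {"harbour", "Jansen"}},
]
ranked = rank(speech, articles)
print([art["id"] for art in ranked])
```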
99. Representation of links
• Note: this is another example of the “design pattern” referred to as n-ary relations or relations as classes!
• It allows us to save provenance information about the statements we create.
[Graph: :speech123 and :newsArticle456 connected by a single :isAbout link]
[Graph: :link001 connects :speech123 and :newsArticle456 and carries the link type :quotes, :concept1 and :concept2, :creationDate 01-02-2013 and :madeBy :PoliMedia_Linking_Engine.]
104. Evaluation of links
1. Manually rating (a sample of) links
• relatively cheap and easy to interpret
• only precision, no recall
2. Comparison to a reference linkset
• precision and recall
• used in OAEI on the SEALS platform
• more expensive if a reference alignment has to be
created (but: crowd sourcing!)
3. End-to-end evaluation (a.k.a. evaluating an application
that uses the mappings)
• arguably the best method!
• need to have access to an application + users
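For method 2, precision and recall against a reference linkset come down to set arithmetic. A minimal Python sketch, with invented link pairs and a hypothetical function name:

```python
def precision_recall(found, reference):
    """Precision and recall of a set of links against a reference linkset."""
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall

# Invented (speech, article) links for illustration.
found = {("s1", "a1"), ("s1", "a2"), ("s2", "a3"), ("s3", "a9")}
reference = {("s1", "a1"), ("s1", "a2"), ("s2", "a3"), ("s2", "a4"), ("s4", "a5")}
precision, recall = precision_recall(found, reference)
print(precision, recall)
```

Manual rating of a sample (method 1) only gives the first of the two numbers, because the raters never see the links that were missed.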
106. Evaluation of links: beyond precision / recall
Generalized precision and Generalized recall
• Instead of a binary classification into correct/incorrect mappings, take into account how wrong a link is:
• where r(a) is the semantic distance between correspondence a and correspondence a’ in the reference alignment, A is the number of correspondences.
Laura Hollink, Mark van Assem, Shenghui Wang, Antoine Isaac, Guus Schreiber. Two Variations on Ontology Alignment Evaluation: Methodological Issues. ESWC 2008.
107. Evaluation of links in PoliMedia
How good are the links?
• We ask 2 raters to manually score pairs of
newspaper articles and speeches.
• a pilot study showed that we needed
more than a 2 point scale.
• inter-rater agreement: 0.5 ->
acceptable, but not high.
• Precision: 80%
Setting 1  Setting 2  Setting 3
0.48       0.62       0.8
How many links did we miss?
• We ask the raters to manually search the KB archives for related articles.
• Recall: 62%
116. Online database:
“SPARQL endpoint”
• A service to query a knowledge
base using the SPARQL query
language.
“All speeches with more
than 60 associated news
items.”
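From the client side, querying a SPARQL endpoint over HTTP means sending the query text as the `query` parameter of a request, as defined by the SPARQL Protocol. The Python sketch below only builds such a request URL and does not contact any server; the endpoint address is a placeholder and the query is a generic one, not the query quoted on the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://example.org/sparql"  # placeholder, not a real endpoint
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"

def build_request_url(endpoint, query):
    """URL for a SPARQL Protocol GET request carrying the query text."""
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Sending this URL with an `Accept: application/sparql-results+json` header asks the endpoint for the result table in JSON.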
117. Access to Linked Open Data: how to serve and
how to consume Linked Open Data
122. Access to LOD 2: follow-your-nose
[Graph, built up step by step:]
lp:eu/plenary/2013-11-20/AgendaItem_6
  dc:title "Award of the Sakharov Prize (formal sitting)."@en ;
  dc:hasPart lp:eu/plenary/2013-11-20/Speech_103 , lp:eu/plenary/2013-11-20/Speech_104 .
lp:eu/plenary/2013-11-20/Speech_103
  lpv:spokenText "...the fittest need to struggle for the survival of the weak.[...]"@en ;
  lpv:speaker lp:Speaker_Malala_Yousafzai ;
  lpv:hasSubsequent lp:eu/plenary/2013-11-20/Speech_104 .
lp:eu/plenary/2013-11-20/Speech_104
  lpv:spokenText "...Ich glaube, das war ein außergewöhnlicher Moment für uns alle hier in diesem Parlament[...]"@en ;
  lpv:speaker lp:Martin_Schulz .
(Translation of the German text: "I believe that was an extraordinary moment for all of us here in this Parliament")
lp:Martin_Schulz owl:sameAs http://dbpedia.org/resource/Martin_Schulz .
http://dbpedia.org/resource/Martin_Schulz dbp:children "2" .
http://dbpedia.org/resource/Martin_Schulz is also linked to dbc:Officiers_of_the_Légion_d'honneur
128. Access to LOD 2: follow-your-nose
[Figure: the same RDF graph around lp:eu/plenary/2013-11-20/AgendaItem_6 as on slide 127, ending at http://dbpedia.org/resource/Martin_Schulz.]
From server logs we know the requested URI:
GET /Martin_Schulz HTTP/1.0
Accept: application/rdf+xml
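The follow-your-nose pattern on this slide can be sketched as a small crawler: dereference a URI, read the triples that come back, and follow the links found in them. This is only a minimal illustration, not the Talk of Europe implementation. To keep it runnable offline, the hypothetical `fetch` function looks triples up in a toy dictionary that mirrors part of the slide's graph; a real client would issue an HTTP GET with an `Accept: application/rdf+xml` header and parse the response.

```python
from collections import deque

# Toy stand-in for the Web: URI -> triples returned when it is dereferenced.
# The content mirrors part of the graph on the slide; it is illustrative only.
TOY_WEB = {
    "lp:eu/plenary/2013-11-20/Speech_104": [
        ("lp:eu/plenary/2013-11-20/Speech_104", "lpv:speaker", "lp:Martin_Schulz"),
    ],
    "lp:Martin_Schulz": [
        ("lp:Martin_Schulz", "owl:sameAs", "http://dbpedia.org/resource/Martin_Schulz"),
    ],
    "http://dbpedia.org/resource/Martin_Schulz": [
        ("http://dbpedia.org/resource/Martin_Schulz", "dbp:children", '"2"'),
    ],
}


def fetch(uri):
    """Hypothetical dereferencing step: return the triples describing `uri`."""
    return TOY_WEB.get(uri, [])


def follow_your_nose(start, max_hops=3):
    """Breadth-first traversal: dereference a URI, then every URI it links to."""
    seen = {start}
    triples = []
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        for s, p, o in fetch(uri):
            triples.append((s, p, o))
            # Literals (quoted in this toy data) are end points; only
            # resources are followed, and only up to max_hops links away.
            if not o.startswith('"') and o not in seen and hops < max_hops:
                seen.add(o)
                queue.append((o, hops + 1))
    return triples


if __name__ == "__main__":
    for triple in follow_your_nose("lp:eu/plenary/2013-11-20/Speech_104"):
        print(triple)
```

Each call to `fetch` corresponds to one request that would show up in the server log, which is why the logs reveal the requested URIs one at a time.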
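129. Access to LOD: 3. SPARQL
Count the agenda items in which at least one MEP from France spoke.
# The lpv: prefix is assumed from the full AgendaItem IRI used below.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX lpv: <http://purl.org/linkedpolitics/vocabulary/>
SELECT (COUNT(DISTINCT ?ai) AS ?count)
WHERE {
  ?ai rdf:type <http://purl.org/linkedpolitics/vocabulary/eu/plenary/AgendaItem> .
  ?ai dcterms:hasPart ?speech .
  ?speech lpv:speaker ?speaker .
  ?speaker lpv:countryOfRepresentation ?country .
  ?country rdfs:label ?label .
  FILTER(?label = "France"@en)
}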
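To make the join in this query concrete, the following sketch evaluates the same chain of triple patterns over a handful of in-memory triples in plain Python. The data are invented for illustration (identifiers such as `ex:speakerA` are hypothetical), and a real client would send the query to a SPARQL endpoint instead of looping over tuples.

```python
AGENDA_ITEM = "http://purl.org/linkedpolitics/vocabulary/eu/plenary/AgendaItem"

# Invented toy triples; all ex: identifiers are hypothetical.
TRIPLES = [
    ("ex:item1", "rdf:type", AGENDA_ITEM),
    ("ex:item2", "rdf:type", AGENDA_ITEM),
    ("ex:item1", "dcterms:hasPart", "ex:speech1"),
    ("ex:item1", "dcterms:hasPart", "ex:speech2"),
    ("ex:item2", "dcterms:hasPart", "ex:speech3"),
    ("ex:speech1", "lpv:speaker", "ex:speakerA"),
    ("ex:speech2", "lpv:speaker", "ex:speakerA"),
    ("ex:speech3", "lpv:speaker", "ex:speakerB"),
    ("ex:speakerA", "lpv:countryOfRepresentation", "ex:france"),
    ("ex:speakerB", "lpv:countryOfRepresentation", "ex:greece"),
    ("ex:france", "rdfs:label", ("France", "en")),
    ("ex:greece", "rdfs:label", ("Greece", "en")),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the toy data."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def count_agenda_items(country_label):
    """Count distinct agenda items with at least one speaker from the country."""
    matching = set()
    for item, p, o in TRIPLES:
        if p != "rdf:type" or o != AGENDA_ITEM:
            continue
        for speech in objects(item, "dcterms:hasPart"):
            for speaker in objects(speech, "lpv:speaker"):
                for country in objects(speaker, "lpv:countryOfRepresentation"):
                    if country_label in objects(country, "rdfs:label"):
                        matching.add(item)  # the set plays the role of DISTINCT
    return len(matching)


if __name__ == "__main__":
    print(count_agenda_items(("France", "en")))
```

The set is what keeps an agenda item with two French speeches from being counted twice, which is the job of `DISTINCT` in the query.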
131. From server logs we know the query
-some context of the requested URIs
-variable names (?)
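As a rough illustration of what can be recovered from such a log, the sketch below pulls the SPARQL query and its variable names out of one request line. The log line, the `/sparql` path and the query are made up for the example, and the line is assumed to follow the Apache combined log format; the regular expression for variables is a simplification that would also match a `?` inside a string or IRI.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical log line; the endpoint path and the query are invented.
LOG_LINE = (
    'xxx.xxx.xxx.xxx - - [20/May/2016:10:00:00 +0000] '
    '"GET /sparql?query=SELECT+%3Fai+WHERE+%7B+%3Fai+dcterms%3AhasPart+%3Fspeech+.+%7D '
    'HTTP/1.1" 200 512'
)

REQUEST = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')
VARIABLE = re.compile(r'[?$](\w+)')


def extract_query(log_line):
    """Return the URL-decoded value of the `query` parameter, or None."""
    match = REQUEST.search(log_line)
    if match is None:
        return None
    params = parse_qs(urlsplit(match.group(1)).query)
    values = params.get("query")
    return values[0] if values else None


def variable_names(query):
    """Return the sorted, de-duplicated variable names used in a query."""
    return sorted(set(VARIABLE.findall(query)))


if __name__ == "__main__":
    query = extract_query(LOG_LINE)
    print(query)
    print(variable_names(query))
```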
135. Access to LOD: 4. Linked Data Fragments
xxx.xxx.xxx.xxx - - [17/Oct/2014:07:43:02 +0000]
"GET /2014/en?subject=&predicate=&object=dbpedia%3AAustin
HTTP/1.1" 200 1309 "http://fragments.dbpedia.org/2014/en"
…
From server logs we know the triple patterns that were
requested
-some context of the requested URIs
-variable names (?)
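The triple pattern behind a Linked Data Fragments request can be read directly from the query string. The sketch below parses the log line shown on the slide; an empty `subject`, `predicate` or `object` parameter is treated as an unbound position (returned as `None`). The combined log format and the helper names are assumptions made for this example.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Log line taken from the excerpt on the slide.
LOG_LINE = (
    'xxx.xxx.xxx.xxx - - [17/Oct/2014:07:43:02 +0000] '
    '"GET /2014/en?subject=&predicate=&object=dbpedia%3AAustin '
    'HTTP/1.1" 200 1309 "http://fragments.dbpedia.org/2014/en"'
)

REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+"')


def triple_pattern(log_line):
    """Return (subject, predicate, object); None marks an unbound position."""
    match = REQUEST.search(log_line)
    if match is None:
        return None
    # keep_blank_values keeps 'subject=' in the result instead of dropping it.
    params = parse_qs(urlsplit(match.group(1)).query, keep_blank_values=True)

    def term(name):
        value = params.get(name, [""])[0]
        return value or None

    return term("subject"), term("predicate"), term("object")


if __name__ == "__main__":
    print(triple_pattern(LOG_LINE))
```

Unlike a SPARQL log, no variable names survive here: the request only says which positions of the pattern were left open.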
137. What do we know about usage of Linked Open
Data?
139. 1. Yearly datasets of server logs released for research purposes, 2011-2016
Luczak-Roesch, Markus, Aljaloud, Saud, Berendt, Bettina and Hollink, Laura (2016)
USEWOD 2016 Research Dataset. doi:10.5258/SOTON/385344
2. Yearly workshops for researchers on Usage Data and the Web of Data, 2011-2016
Laura Hollink, Markus Luczak-Roesch, Bettina Berendt, et al.
http://usewod.org/
USEWOD2011
2016
Linked Open Data query log analysis?
Licensing + Anonymization:
replace all IPs with a
country code and an
identifier
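One possible way to implement this kind of anonymization is sketched below. It is an assumption-laden illustration, not the procedure actually used for the USEWOD datasets: the `country_of` lookup is a hypothetical stand-in for a GeoIP database, and the `CC-n` output format is invented. The point is only that the same IP always maps to the same pseudonym, so requests from one client can still be grouped.

```python
from itertools import count


def make_anonymizer(country_of):
    """Build a function mapping each IP to a stable 'country-identifier' string.

    `country_of` is a hypothetical callable from IP address to country code.
    """
    pseudonyms = {}
    counter = count(1)

    def anonymize(ip):
        # Assign a fresh identifier the first time an IP is seen,
        # and reuse it on every later occurrence.
        if ip not in pseudonyms:
            pseudonyms[ip] = f"{country_of(ip)}-{next(counter)}"
        return pseudonyms[ip]

    return anonymize


# Toy lookup table using documentation-only IP ranges; the countries are invented.
TOY_GEO = {"192.0.2.1": "NL", "198.51.100.7": "DE"}

if __name__ == "__main__":
    anonymize = make_anonymizer(lambda ip: TOY_GEO.get(ip, "ZZ"))
    for ip in ["192.0.2.1", "198.51.100.7", "192.0.2.1"]:
        print(anonymize(ip))
```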
141. What has been found so far?
• Efficient index generation [1]
• Caching [2]
• Auto-completion [3]
• Hardware scaling at peak times [4]
• Modularisation of data [4]
Issues:
• What is the difference between queries by machines and humans? [5]
• What is the meaning of repeated queries by tools? Bots?
• A lot of the usage is invisible due to data dump download [6]
[1] Arias, M., Fernández, J. D., Martínez-Prieto, M. A., & de la Fuente, P. (2011). An empirical study of real-world SPARQL queries. USEWOD2011
[2] Lorey, J., & Naumann, F. (2013). Caching and prefetching strategies for SPARQL queries. USEWOD2013
[3] Kramer, K., Dividino, R. Q., & Gröner, G. (2013). SPACE: SPARQL Index for Efficient Autocompletion. ISWC2013 (Posters & Demos)
[4] Luczak-Rösch, M., & Bischoff, M. (2011). Statistical analysis of web of data usage. EvoDyn2011
[5] Rietveld, L., & Hoekstra, R. (2014). Man vs. Machine: Differences in SPARQL Queries. USEWOD2014
[6] Huelss, J., & Paulheim, H. (2015). What SPARQL Query Logs Tell and do not Tell about Semantic Relatedness in LOD. NoISE @ ESWC 2015
142. Reflection: to what extent can we now answer these questions?
How did the debate about the financial crisis in Greece develop?
Which political event has attracted most media attention?
What are the differences between different media?
Has the coverage changed over time?
Yes, but:
• What is the influence of the selection of newspapers available at the National Library?
• What was the quality of the digitisation process (OCR)?
• How good is our linking approach (based on automatically detected entities and topics)?
➡ How to handle these uncertainties is one of our research questions! We call this Tool Criticism.
144. Resources:
PoliMedia demo: http://polimedia.nl/
PoliMedia project video: https://youtu.be/u24oRCj7xrQ
Talk of Europe project: http://talkofeurope.eu/
Talk of Europe data: purl.org/linkedpolitics
Talk of Europe project video: https://youtu.be/GxA53gkCe0o
USEWOD workshop: http://usewod.org/
My website: http://homepages.cwi.nl/~hollink/
I’d be happy to answer your questions!