Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
Methods for Intrinsic Evaluation of Links in the Web of Data
Cristina Sarasua, Steffen Staab, Matthias Thimm
ESWC 2017
Cristina Sarasua Methods for Intrinsic Evaluation of Links in the Web of Data 2
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
[Berners-Lee, 2006]
“The Semantic Web isn't just about putting data on the Web. It is about making links, so that a person or machine can explore the Web of Data.”
Current Link Analyses
Link Properties
● Symmetry: “the symmetry of entity links varies between different pairs of datasets.” [Hu et al., 2015, on life sciences data sets]
● Transitivity: “the transitivity of entity links is often topic-dependent.” [Hu et al., 2015, on life sciences data sets]
Descriptive Statistics
● In- and out-degree: “Bibsonomy has links to 91 external data sets.”
● Overall link predicate usage: “owl:sameAs as the most widely used linking predicate.” [Schmachtenberg et al., 2014]
● Link count: “The Eurostat data set has 149 owl:sameAs links to DBpedia.” [CKAN] [Ermilov et al., 2013] [Hogan et al., 2012]
Current Link Quality Assessment Methods
Semantic Accuracy
● Network measures and link properties count as proxy: “centrality, clustering coefficient, sameAs chains are shown to be partially effective at detecting semantically correct / incorrect links.” [Guéret et al., 2012]
● Crowdsourcing: “hybrid methods can improve semantic accuracy.” [Demartini et al., 2012] [Sarasua et al., 2012] [Acosta et al., 2013]
Completeness
● Entity connectivity: “50 % of the entities in the source data set contain links to external entities.” [Albertoni et al., 2013]
Availability
● Dead links: “we found 302,855,189 unverified links, and 12,430,800 dead links.” [Neto et al., 2016]
To what extent do existing links add “more things” to the source entities?
Why?
Help
● to understand the impact of existing links
● to spot weak points
● with guidance for improving existing links
Encourage iterative maintenance
Data Publisher (predisposed to improve her links)
Our Task
Given a data set D containing the interlinking I
➔ compare D and D \ I (D without the interlinking I), and
➔ analyse the value that I gives to the source data
in terms of the principles for data interlinking in the Web of Data.
[Figure: Bibsonomy entities linked to ACM entities by owl:sameAs links]
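The comparison of D with and without its interlinking can be sketched as a simple partition of triples. This is only an illustration under stated assumptions: triples are plain tuples of prefixed names, a link is any triple with a predicate from the hypothetical set `LINK_PREDICATES` that leaves the source namespace, and the `bib:` / `acm:` terms are invented sample data rather than real Bibsonomy or ACM identifiers.

```python
# Triples are (subject, predicate, object) tuples of prefixed names.
# LINK_PREDICATES and all sample terms below are assumptions for illustration.
LINK_PREDICATES = {"owl:sameAs"}


def namespace(term):
    """Return the prefix of a prefixed name such as 'bib:paper1'."""
    return term.split(":", 1)[0]


def split_interlinking(triples, source_prefix):
    """Partition D into the interlinking I and the remainder D \\ I."""
    links, rest = [], []
    for s, p, o in triples:
        is_link = (
            p in LINK_PREDICATES
            and namespace(s) == source_prefix
            and namespace(o) != source_prefix
        )
        (links if is_link else rest).append((s, p, o))
    return links, rest


D = [
    ("bib:paper1", "dc:title", "Some title"),
    ("bib:paper1", "owl:sameAs", "acm:paper9"),
    ("bib:author1", "owl:sameAs", "acm:author4"),
]
I, D_without_I = split_interlinking(D, "bib")
```

Any measure can then be computed once on `D_without_I` and once on the full `D`, and the two results compared.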
Principles
Extend entity description (P1)
Data set 1 (source):
d1:nn foaf:name “Natasha Noy”
d1:nn dbo:affiliation d1:stan
d1:nn dbo:swrc:publication d1:p2012
Link: d1:nn owl:sameAs d2:nn
Data set 2, first variant:
d2:nn foaf:name “Natasha Noy”
Data set 2, second variant (better!):
d2:nn foaf:name “Natalya Noy”
d2:nn dbo:affiliation d2:goog
d2:nn dbo:swrc:publication d2:p2015
Data set 2, third variant (even better!):
d2:nn foaf:name “Natalya Noy”
d2:nn dbo:affiliation d2:goog
d2:nn dbo:swrc:publication d2:p2015
d2:nn rdf:type vivo:Female
Principles
Extend entity connectivity (P2)
First variant:
Data set 1:
d1:nn foaf:name “Natasha Noy”
d1:nn dbo:affiliation d1:stan
d1:nn dbo:swrc:publication d1:p2012
Data set 2:
d2:nn foaf:name “Natasha Noy”
d2:nn dbo:affiliation d2:stan
Link: d1:nn owl:sameAs d2:nn
Second variant (better!):
Data set 1:
d1:nn foaf:name “Natasha Noy”
d1:nn dbo:affiliation d1:goog
d1:nn dbo:swrc:publication d1:p2012
Data set 2:
d2:nn foaf:name “Natalya Noy”
d2:nn dbo:affiliation d2:goog
Data set 3:
d3:nn foaf:name “Natalya Noy”
d3:nn dbo:affiliation d3:goog
Links: two owl:sameAs links connecting d1:nn, d2:nn and d3:nn
Principles
Increase number of vocabularies used (P3)
Data set 1 (source):
d1:nn foaf:name “Natasha Noy”
d1:nn dbo:affiliation d1:stan
d1:nn dbo:swrc:publication d1:p2012
Link: d1:nn owl:sameAs d2:nn
Data set 2, first variant:
d2:nn foaf:name “Natasha Noy”
d2:nn foaf:pastProject d2:protege
d2:nn foaf:project d2:bioportal
d2:nn rdf:type foaf:Person
Data set 2, second variant (better!):
d2:nn foaf:name “Natasha Noy”
d2:nn rdf:type foaf:Person
d2:nn rdf:type proton:Human
sioc:creator edge between d2:nn and d2:post2
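The P3 example can be sketched in a few lines. The snippet below is a simplified illustration, not the SeaStar implementation: it assumes that a "vocabulary" is identified by the prefix of a property or of an rdf:type class, and it uses a reduced copy of the slide's example data. The function name `vocabularies` is hypothetical.

```python
def vocabularies(description):
    """Collect prefixes of properties and of rdf:type classes.

    Assumption: one prefix stands for one vocabulary.
    """
    vocabs = set()
    for prop, value in description:
        vocabs.add(prop.split(":", 1)[0])
        if prop == "rdf:type":
            vocabs.add(value.split(":", 1)[0])
    return vocabs


# Reduced copy of the source description of d1:nn.
source = [("foaf:name", "Natasha Noy"), ("dbo:affiliation", "d1:stan")]

# First variant of data set 2: mostly FOAF again.
target_first = [
    ("foaf:name", "Natasha Noy"),
    ("foaf:pastProject", "d2:protege"),
    ("foaf:project", "d2:bioportal"),
    ("rdf:type", "foaf:Person"),
]

# Second variant: the direction of the sioc:creator edge is not recoverable
# from the slide; it is treated here as part of the description.
target_second = [
    ("foaf:name", "Natasha Noy"),
    ("sioc:creator", "d2:post2"),
    ("rdf:type", "foaf:Person"),
    ("rdf:type", "proton:Human"),
]

new_first = vocabularies(target_first) - vocabularies(source)
new_second = vocabularies(target_second) - vocabularies(source)
```

Under these assumptions the first variant contributes one new prefix and the second contributes three, which matches the slide's "Better!" verdict.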
SeaStar Framework
Part of the Koldfish framework
Our Solution
i) We create views of the data
Example graph:
Data set 1:
d1:nn foaf:name “Natasha Noy”
d1:nn dbo:affiliation d1:stan
d1:nn dbo:swrc:publication d1:p2012
Data set 2:
d2:nn foaf:name “Natasha Noy”
d2:nn foaf:pastProject d2:protege
d2:nn foaf:project d2:bioportal
d2:nn rdf:type foaf:Person
Link: d1:nn owl:sameAs d2:nn
Views:
● Property, values in description
● Properties in description
● Classes in description
● Targeted entities
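Three of the four views can be sketched as projections over an entity description. This is a minimal illustration, assuming a description is a list of (property, value) pairs and that views are kept as lists so that repeated elements stay visible. The function names are hypothetical, the data is a reduced copy of the slide's example, and the "targeted entities" view is left out because the slide does not define it precisely.

```python
def property_value_view(description):
    """View of (property, value) pairs, duplicates preserved."""
    return list(description)


def property_view(description):
    """View of the properties only."""
    return [prop for prop, _ in description]


def class_view(description):
    """View of the classes, read from rdf:type statements."""
    return [value for prop, value in description if prop == "rdf:type"]


# Description of d1:nn without links (reduced copy of data set 1).
d1_description = [
    ("foaf:name", "Natasha Noy"),
    ("dbo:affiliation", "d1:stan"),
]

# Statements about d2:nn, reachable through the owl:sameAs link.
d2_description = [
    ("foaf:name", "Natasha Noy"),
    ("foaf:pastProject", "d2:protege"),
    ("foaf:project", "d2:bioportal"),
    ("rdf:type", "foaf:Person"),
]

with_links = d1_description + d2_description
properties_with_links = property_view(with_links)
classes_with_links = class_view(with_links)
```

Keeping duplicates matters here: the repeated foaf:name statement is exactly the kind of redundancy the measures are meant to expose.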
Our Solution
ii) Measure what each source entity gains for each view V, signaling redundancy.
$\mathrm{Gain} = \mathrm{diversity}(V_{\text{with links}}) - \mathrm{diversity}(V_{\text{without links}})$
Diversity as Shannon Entropy
E.g.:
$\mathrm{desc}_m(\text{d1:nn}, D_{\text{wo links}}) = \{(\text{foaf:name “Natasha Noy”}), (\text{o:likes d1:ISWC})\}$
$\mathrm{desc}_m(\text{d1:nn}, D_{\text{w links}}) = \{(\text{foaf:name “Natasha Noy”}), (\text{o:likes d1:ISWC}), (\text{foaf:name “Natasha Noy”})\}$
Given $DE$, $DE'$: $\mathrm{Gain} = H(DE') - H(DE) = -0.082$
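The worked example can be reproduced with a short sketch. It assumes that the description is treated as a multiset, that probabilities are relative frequencies of its elements, and that the logarithm is base 2. The slide does not state the base; base 2 is assumed here because it reproduces the value shown. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(items):
    """Shannon entropy (base 2) of the relative frequencies in a multiset."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def gain(view_without_links, view_with_links):
    """Diversity gained by following the links: H(with) - H(without)."""
    return shannon_entropy(view_with_links) - shannon_entropy(view_without_links)


without_links = [("foaf:name", "Natasha Noy"), ("o:likes", "d1:ISWC")]
# The link only contributes a statement the source already has.
with_links = without_links + [("foaf:name", "Natasha Noy")]

example_gain = gain(without_links, with_links)
print(round(example_gain, 3))  # -0.082
```

The entropy drops from 1 bit (two equally frequent statements) to about 0.918 bits, so the gain is negative: the link added a redundant statement rather than new information.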
Empirical Analysis
Methodology
● Data:
○ 35 data sets of the Linked Data Crawl 2014 [Schmachtenberg et al., 2014]
○ 5 topical domains (e.g. publications, gov)
● Computed all measurements for all entities in the source data sets
See also: https://goo.gl/6VVHBD
Methodology
● Measure Validation (following the methodology of [Beckhamal et al., 2014])
○ Discriminative Measures
■ different values across data sets
○ Independent Measures
■ Spearman correlation
● Data Set Analysis (showcase)
See also: https://goo.gl/eD5kKX
[Figure: matrix with the measures C, P,O, O, E, D and V on both axes, grouped into extended description, extended connectivity, and increase number of vocabs]
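The independence check via Spearman correlation can be sketched without external libraries: rank both series (ties get the average rank) and compute the Pearson correlation of the ranks. The sample values are invented for illustration and are not results from the study; the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation of two equally long series."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-entity values of two measures.
measure_a = [0.10, 0.35, 0.20, 0.50]
measure_b = [1, 3, 2, 4]
rho = spearman(measure_a, measure_b)
```

Two measures whose correlation is close to 1 or -1 carry largely the same signal; values near 0 suggest they capture independent aspects.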
Results
Gain in description - identity links (analysing only properties)
Data sets: SW company, L3S, DWS Mannheim, DNB, Bibsonomy
#Entities   #Links   Avg. links per entity
16          50       3.12
520         1059     2.04
49          71       1.5
2782        3577     1.3
19888       35646    1.8
Results
Gain in description - identity links (analysing only properties)
[Figure: gain per data set for SW company, L3S, DWS Mannheim, DNB and Bibsonomy; annotation: Heterogeneity]
Results
Gain in description - identity links (analysing properties and values)
Extending the description (P1) is the principle with the highest gain, and identity links are the type of links with the highest gain.
Results
Gain in description - identity links (analysing only properties)
[Figure: actual values per data set for SW company, L3S, DWS Mannheim, DNB and Bibsonomy]
Results
Gain in description - relationship links (analysing only properties)
Redundancy appears when analysing only properties (e.g. rdfs:seeAlso statements).
Results
Gain in # vocabularies - relationship and identity links
The number of vocabularies used only increases in a limited way.
Limitations and Future Work
● We do not include ontology mappings.
● An evaluation with data publishers should be done
○ relevance of each measurement
○ observing if (and how) they improve existing links based on them
● Would publishing the measurements as reports help data consumers in deciding what links to traverse?
Conclusions
● It is valuable to spot redundancy; counting new statements or new vocabularies is not sufficient.
● There is room for improvement
○ homogenizing what entities in a data set gain
○ enabling a higher gain
● If no match is found, other types of links can also enrich the entities in various dimensions.
● Main Contributions:
○ Implemented and validated measures to assess the value (description, connectivity, classification) that links give to source entities, signaling redundancy.
○ Empirical analysis of 35 real LOD data sets.
Thank you!
csarasua@uni-koblenz.de
References
Berners-Lee, T. (2006). Linked Data: Design Issues. https://www.w3.org/DesignIssues/LinkedData.html
Schmachtenberg, M., Bizer, C. and Paulheim, H. (2014). Adoption of the Linked Data Best Practices in Different Topical Domains. In Proceedings of the 13th International Semantic Web Conference (ISWC 2014).
Ermilov, I., Martin, M., Lehmann, J. and Auer, S. (2013). Linked open data statistics: Collection and exploitation. In International Conference on Knowledge Engineering and the Semantic Web (pp. 242-249). Springer Berlin Heidelberg.
Hu, W., Qiu, H. and Dumontier, M. (2015). Link Analysis of Life Science Linked Data. In International Semantic Web Conference (pp. 446-462). Springer International Publishing.
Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J. and Auer, S. Quality assessment for linked data: A survey. Semantic Web 7 (1), 63-93.
Neto, C.B., Kontokostas, D., Hellmann, S., Müller, K. and Brümmer, M. (2016). Assessing quantity and quality of links between linked data datasets. In Proceedings of the Workshop on Linked Data on the Web, co-located with the 25th International World Wide Web Conference (WWW 2016).
Guéret, C., Groth, P., Stadler, C. and Lehmann, J. (2012). Assessing linked data mappings using network measures. In Proceedings of the 9th International Conference on The Semantic Web: Research and Applications (ESWC'12).
Albertoni, R. and Gómez Pérez, A. (2013). Assessing linkset quality for complementing third-party datasets. In Proceedings of the Joint EDBT/ICDT 2013 Workshops (EDBT '13). ACM, New York, NY, USA, 52-59. DOI: http://dx.doi.org/10.1145/2457317.2457327
Shannon, C.E. and Weaver, W. (1949). The Mathematical Theory of Communication. Univ. of Illinois Press. ISBN 0-252-72548-4

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 

Kürzlich hochgeladen (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

Methods for Intrinsic Evaluation of Links in the Web of Data

  • 1. Institute for Web Science and Technologies · University of Koblenz-Landau, Germany Methods for Intrinsic Evaluation of Links in the Web of Data Cristina Sarasua, Steffen Staab, Matthias Thimm ESWC 2017
  • 2. Linked Data principles: 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). 4. Include links to other URIs, so that they can discover more things. “The Semantic Web isn't just about putting data on the Web. It is about making links, so that a person or machine can explore the Web of Data.” [Berners-Lee, 2006]
  • 3. Current Link Analyses. Link Properties: Symmetry: “the symmetry of entity links varies between different pairs of datasets.” Transitivity: “the transitivity of entity links is often topic-dependent.” [Hu et al., 2015, on life sciences data sets]. Descriptive Statistics: In- and out-degree: “Bibsonomy has links to 91 external data sets.” Overall link predicate usage: “owl:sameAs as the most widely used linking predicate.” [Schmachtenberg et al., 2014]. Link count: “The Eurostat data set has 149 owl:sameAs links to DBpedia.” [CKAN] [Ermilov et al., 2013] [Hogan et al., 2012]
  • 5. Current Link Quality Assessment Methods. Semantic Accuracy: Network measures and link properties count as proxy: “centrality, clustering coefficient, sameAs chains are shown to be partially effective at detecting semantically correct / incorrect links.” [Guéret et al., 2012]. Crowdsourcing: “hybrid methods can improve semantic accuracy.” [Demartini et al., 2012] [Sarasua et al., 2012] [Acosta et al., 2013]. Completeness: Entity connectivity: “50 % of the entities in the source data set contain links to external entities.” [Albertoni et al., 2013]. Availability: Dead links: “we found 302,855,189 unverified links, and 12,430,800 dead links.” [Neto et al., 2016]. To what extent do existing links add “more things” to the source entities?
  • 6. Why? Help the data publisher (predisposed to improve her links) ● to understand the impact of existing links ● to spot weak points ● with guidance for improving existing links. Encourage iterative maintenance.
  • 7. Our Task. Given a data set D containing the interlinking I ➔ compare D with and without I and ➔ analyse the value that I gives to the source data, in terms of the principles for data interlinking in the Web of Data. [Figure: Bibsonomy connected to ACM by owl:sameAs links]
  • 11. Principles. Data set 1: d1:nn with foaf:name “Natasha Noy”, dbo:affiliation d1:stan, dbo:swrc:publication d1:p2012.
  • 12. Principles: Extend entity description (P1). Data set 1 as on slide 11; d1:nn is linked via owl:sameAs to d2:nn in Data set 2, which only has foaf:name “Natasha Noy”.
  • 13. Principles: Extend entity description (P1). Data set 1 as on slide 11; d1:nn is linked via owl:sameAs to d2:nn in Data set 2, with foaf:name “Natalya Noy”, dbo:affiliation d2:goog, dbo:swrc:publication d2:p2015. Better!
  • 14. Principles: Extend entity description (P1). Data set 1 as on slide 11; d1:nn is linked via owl:sameAs to d2:nn in Data set 2, with foaf:name “Natalya Noy”, dbo:affiliation d2:goog, dbo:swrc:publication d2:p2015, rdf:type vivo:Female. Even better!
  • 15. Principles: Extend entity connectivity (P2). Data set 1 as on slide 11; d1:nn is linked via owl:sameAs to d2:nn in Data set 2, with foaf:name “Natasha Noy” and dbo:affiliation d2:stan.
  • 16. Principles: Extend entity connectivity (P2). Data set 1: d1:nn with foaf:name “Natasha Noy”, dbo:affiliation d1:goog, dbo:swrc:publication d1:p2012. owl:sameAs links lead to d2:nn in Data set 2 (foaf:name “Natalya Noy”, dbo:affiliation d2:goog) and to d3:nn in Data set 3 (foaf:name “Natalya Noy”, dbo:affiliation d3:goog). Better!
  • 17. Principles: Increase number of vocabularies used (P3). Data set 1 as on slide 11; d1:nn is linked via owl:sameAs to d2:nn in Data set 2, with foaf:name “Natasha Noy”, foaf:pastProject d2:protege, foaf:project d2:bioportal, rdf:type foaf:Person.
  • 18. Principles: Increase number of vocabularies used (P3). Data set 1 as on slide 11; d1:nn is linked via owl:sameAs to d2:nn in Data set 2, with foaf:name “Natasha Noy”, sioc:creator d2:post2, rdf:type foaf:Person, rdf:type proton:Human. Better!
  • 19. SeaStar Framework: part of the Koldfish framework.
  • 20. Our Solution. i) We create views of the data. Example: d1:nn in Data set 1 (foaf:name “Natasha Noy”, dbo:affiliation d1:stan, dbo:swrc:publication d1:p2012) is linked via owl:sameAs to d2:nn in Data set 2 (foaf:name “Natasha Noy”, foaf:pastProject d2:protege, foaf:project d2:bioportal, rdf:type foaf:Person). View: properties and values in the description.
  • 21. Our Solution. i) We create views of the data (same example). View: properties in the description.
  • 22. Our Solution. i) We create views of the data (same example). View: classes in the description.
  • 23. Our Solution. i) We create views of the data (same example). View: targeted entities.
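The four views on slides 20 to 23 can be sketched on the slide's own example. This is only an illustration, not the SeaStar implementation: the triple representation, the function names, and the choices to read classes from rdf:type objects and targeted entities from link objects are all assumptions.

```python
# Triples are (subject, predicate, object) tuples, copied from the slide example.
# The data layout and every function name below are hypothetical.
D2_TRIPLES = [
    ("d2:nn", "foaf:name", "Natasha Noy"),
    ("d2:nn", "foaf:pastProject", "d2:protege"),
    ("d2:nn", "foaf:project", "d2:bioportal"),
    ("d2:nn", "rdf:type", "foaf:Person"),
]
LINKS = [("d1:nn", "owl:sameAs", "d2:nn")]


def description(entity, triples):
    """All (property, value) pairs whose subject is the given entity."""
    return [(p, o) for s, p, o in triples if s == entity]


def property_value_view(desc):
    """View 1: properties together with their values."""
    return list(desc)


def property_view(desc):
    """View 2: properties only."""
    return [p for p, _ in desc]


def class_view(desc):
    """View 3: classes, assumed here to be the objects of rdf:type."""
    return [o for p, o in desc if p == "rdf:type"]


def targeted_entities(entity, links):
    """View 4: entities that the links of the source entity point to."""
    return [o for s, _, o in links if s == entity]


desc_d2 = description("d2:nn", D2_TRIPLES)
print(property_view(desc_d2))
print(class_view(desc_d2))
print(targeted_entities("d1:nn", LINKS))
```

Keeping each view as a plain list (rather than a set) preserves repeated elements, which matters once diversity is measured over the view.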
  • 24. Our Solution. ii) Measure what each source entity gains for each view V, signaling redundancy. Gain = diversity(V_with_links) - diversity(V_without_links), with diversity measured as Shannon entropy. E.g.: descm(d1:nn, D_wo_links) = {(foaf:name “Natasha Noy”), (o:likes d1:ISWC)}; descm(d1:nn, D_w_links) = {(foaf:name “Natasha Noy”), (o:likes d1:ISWC), (foaf:name “Natasha Noy”)}. Given DE, DE’: Gain = H(DE’) - H(DE) = -0.082.
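The gain in the example on slide 24 can be reproduced with a few lines of Python. This is a minimal sketch, assuming that the description is treated as a multiset and that the entropy uses a base-2 logarithm (that base yields the -0.082 shown on the slide); the function names are illustrative, not taken from the authors' code.

```python
from collections import Counter
from math import log2


def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of a multiset."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def gain(view_without_links, view_with_links):
    """Gain = diversity with links minus diversity without links."""
    return shannon_entropy(view_with_links) - shannon_entropy(view_without_links)


# Description of d1:nn without links, as on the slide.
without_links = [("foaf:name", "Natasha Noy"), ("o:likes", "d1:ISWC")]
# The link only contributes a statement that is already present (redundancy).
with_links = without_links + [("foaf:name", "Natasha Noy")]

print(round(gain(without_links, with_links), 3))  # -0.082
```

The negative value shows why counting new statements is not enough: the linked description is larger, yet its diversity drops because the added statement repeats an existing one.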
  • 26. Methodology ● Data: ○ 35 data sets of the Linked Data Crawl 2014 [Schmachtenberg et al., 2014] ○ 5 topical domains (e.g. publications, gov) ● Computed all measurements for all entities in the source data sets. See also: https://goo.gl/6VVHBD
  • 27. Methodology ● Measure validation (following the methodology by [Beckhamal et al., 2014]) ○ Discriminative measures: different values across data sets ○ Independent measures: Spearman correlation ● Data set analysis (showcase). See also: https://goo.gl/eD5kKX [Figure: measures grouped by principle (extended description, extended connectivity, increase number of vocabs)]
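The independence check on slide 27 relies on Spearman correlation between pairs of measures. The sketch below computes it in plain Python as the Pearson correlation of average ranks; the per-entity values are made up for illustration and are not results from the study.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if one of the inputs is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-entity values of two measures (illustration only).
gain_properties = [0.10, 0.40, 0.35, 0.80]
gain_classes = [0.05, 0.20, 0.30, 0.90]
print(spearman(gain_properties, gain_classes))
```

A coefficient close to 1 or -1 would suggest that two measures carry largely the same information, while values near 0 support treating them as independent.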
  • 28. Results. Gain in description, identity links (analysing only properties). Data sets (#entities, #links, average links per entity): SW company (16, 50, 3.12); L3S (520, 1059, 2.04); DWS MANNHEIM (49, 71, 1.5); DNB (2782, 3577, 1.3); BIBSONOMY (19888, 35646, 1.8).
  • 29. Results. Gain in description, identity links (analysing only properties): heterogeneity. [Figure: SW company, L3S, DWS MANNHEIM, DNB, BIBSONOMY]
  • 30. Results. Gain in description, identity links (analysing properties and values). Extending the description (P1) is the principle with the highest gain, and identity links are the type of links with the highest gain.
  • 31. Results. Gain in description, identity links (analysing only properties): actual values. [Figure: SW company, L3S, DWS MANNHEIM, DNB, BIBSONOMY]
  • 32. Results. Gain in description, relationship links (analysing only properties). Redundancy appears when analysing only properties (e.g. rdfs:seeAlso statements).
  • 33. Results. Gain in number of vocabularies, relationship and identity links. The number of vocabularies used increases only to a limited extent.
  • 34. Limitations and Future Work ● We do not include ontology mappings. ● An evaluation with data publishers should be done, covering ○ the relevance of each measurement ○ whether (and how) they improve existing links based on the measurements. ● Would publishing the measurements as reports help data consumers in deciding what links to traverse?
  • 35. Conclusions ● It is valuable to spot redundancy; counting new statements / new vocabularies is not sufficient. ● There is room for improvement ○ homogenizing what entities in a data set gain ○ enabling a higher gain ● If no match is found, other types of links can also enrich the entities in various dimensions. ● Main contributions: ○ Implemented and validated measures to assess the value (description, connectivity, classification) that links give to source entities, signaling redundancy. ○ Empirical analysis of 35 real LOD data sets.
  • 36. Institute for Web Science and Technologies · University of Koblenz-Landau, Germany Thank you! csarasua@uni-koblenz.de
  • 37. References
      Berners-Lee, T. Linked Data: Design Issues. https://www.w3.org/DesignIssues/LinkedData.html
      Schmachtenberg, M., Bizer, C. and Paulheim, H., 2014. Adoption of the Linked Data Best Practices in Different Topical Domains. In Proceedings of the 13th International Semantic Web Conference (ISWC 2014).
      Ermilov, I., Martin, M., Lehmann, J. and Auer, S., 2013, October. Linked open data statistics: Collection and exploitation. In International Conference on Knowledge Engineering and the Semantic Web (pp. 242-249). Springer Berlin Heidelberg.
      Hu, W., Qiu, H. and Dumontier, M., 2015, October. Link Analysis of Life Science Linked Data. In International Semantic Web Conference (pp. 446-462). Springer International Publishing.
      Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J. and Auer, S. Quality assessment for linked data: A survey. Semantic Web 7 (1), 63-93.
  • 38. References
      Neto, C.B., Kontokostas, D., Hellmann, S., Müller, K. and Brümmer, M., 2016, April. Assessing quantity and quality of links between linked data datasets. In Proceedings of the Workshop on Linked Data on the Web, co-located with the 25th International World Wide Web Conference (WWW 2016).
      Guéret, C., Groth, P., Stadler, C. and Lehmann, J., 2012. Assessing linked data mappings using network measures. In Proceedings of the 9th International Conference on The Semantic Web: Research and Applications (ESWC'12).
      Albertoni, R. and Gómez Pérez, A., 2013. Assessing linkset quality for complementing third-party datasets. In Proceedings of the Joint EDBT/ICDT 2013 Workshops (EDBT '13). ACM, New York, NY, USA, 52-59. DOI=http://dx.doi.org/10.1145/2457317.2457327
  • 39. References
      Shannon, C.E. and Weaver, W., 1949. The Mathematical Theory of Communication. Univ of Illinois Press. ISBN 0-252-72548-4