How links can make your
open data even greater
Cristina Sarasua
Institute for Web Science and Technologies (WeST)
University of Koblenz-Landau, DE
Open Data Day 2017
Zurich, CH
Goals
● Spread the word about Semantic Web and
Linked Data technologies
● Share tips on how to link your data properly
Open Data
● Data published in any open format
○ CSV/TSV, XML, JSON, etc.
● Made available for free
● By any organisation or individual person
● “Usable, reusable and distributable” (Open Definition, by OKFN [1])
● Findable
○ Registered in data repositories
○ Via search engines
❖ Boosts transparency
❖ Enables reinterpretation of the data
❖ Facilitates the development of new applications
[1] http://opendefinition.org/
The Web, an ocean of data
Logos:
SBB: http://www.sbb.ch/en/home.html
Deutsche Bahn:https://www.bahn.de/p/view/index.shtml
Wikipedia: https://commons.wikimedia.org/wiki/File:Wikipedia-logo.png
How many trains go from Zurich to Basel SBB daily?
How frequently do trains arrive at Basel Bad Bf?
What are the most populous cities in the south of Germany?
Which trains are, on average, more punctual in locations with more than 50,000 inhabitants: German or Swiss?
The Web, an ocean of data
● Collect data
● Put the data together
● Transform data
● Enable joint query
The main effort is on the data consumer side
Formats: CSV, JSON, HTML
Attribute names: Land, Staat, Country
Values: Schweiz, CH, Switzerland
Tables: Linien (lines), Haltestelle (stop), Cities
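The consumer-side steps above (collect, put together, transform, enable joint query) can be sketched roughly as follows. This is only an illustration: the two inline extracts, the column names `station` and `country`, and both mapping tables are assumptions made up for the example, not the actual SBB or Deutsche Bahn data.

```python
import csv
import io
import json

# Hypothetical extracts from two sources that describe stations differently.
SBB_CSV = "Haltestelle,Land\nBasel SBB,Schweiz\nZürich HB,Schweiz\n"
DB_JSON = '[{"station": "Basel Bad Bf", "Staat": "CH"}]'

# Hand-written mapping from each source's column names to a shared schema.
COLUMN_MAP = {"Haltestelle": "station", "station": "station",
              "Land": "country", "Staat": "country"}
# Hand-written mapping from each source's country labels to one code.
COUNTRY_MAP = {"Schweiz": "CH", "Switzerland": "CH", "CH": "CH"}


def normalise(record):
    """Rename columns and unify country labels for one record."""
    out = {COLUMN_MAP[key]: value for key, value in record.items()}
    out["country"] = COUNTRY_MAP[out["country"]]
    return out


def integrate():
    """Collect both sources, transform them, and put them together."""
    rows = list(csv.DictReader(io.StringIO(SBB_CSV)))
    rows += json.loads(DB_JSON)
    return [normalise(row) for row in rows]


# Joint query over the integrated records.
stations = integrate()
swiss = [s["station"] for s in stations if s["country"] == "CH"]
print(swiss)
```

Every consumer has to write and maintain mapping tables like these, which is the effort the slide refers to.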
The Web of Data
● Publish an explicit description of the schema separately
● Follow the RDF data model [2]
● Align schemas, link to other entities in distributed data sources
The data publisher takes on the data integration effort
[2] https://www.w3.org/RDF/
Concepts: Land, Staat, Country (equivalent)
Entities: Schweiz, CH, Switzerland (same)
Typed relations between concepts and between entities from distributed and
(possibly) heterogeneous data sources.
subject                  predicate     object
nyt:Zürich               owl:sameAs    dbpedia:Zürich
dbpedia:Tim_Berners-Lee  rdf:type      foaf:Person
uzh:hackzhodd            geo:location  geop:Point564
uzh:hackzhodd            rdfs:seeAlso  wdt:Q25112115
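To make the subject, predicate, object structure concrete, here is a minimal sketch that stores the triples from the slide as plain tuples and looks them up by pattern. The `match` helper is a hypothetical name for this illustration; a real application would more likely use an RDF library and SPARQL.

```python
# Triples from the slide, stored as (subject, predicate, object) tuples.
TRIPLES = [
    ("nyt:Zürich", "owl:sameAs", "dbpedia:Zürich"),
    ("dbpedia:Tim_Berners-Lee", "rdf:type", "foaf:Person"),
    ("uzh:hackzhodd", "geo:location", "geop:Point564"),
    ("uzh:hackzhodd", "rdfs:seeAlso", "wdt:Q25112115"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every non-None position."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Everything stated about one subject.
about_event = match(TRIPLES, s="uzh:hackzhodd")
# Every identity link, whatever its subject and object.
identity_links = match(TRIPLES, p="owl:sameAs")
print(about_event)
print(identity_links)
```

Leaving a position as `None` plays the role of a variable in a triple pattern.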
Links
[4] https://www.w3.org/DesignIssues/LinkedData.html
[3] LOD diagram by Abele et al. 2017
http://lod-cloud.net/versions/2017-02-20/lod.svg
Links
Some key advantages
✔ The data publisher knows her data
✔ No need to integrate schemas upfront
✔ Data and metadata can be easily extended and modified
✔ Applications may query the schema description
✔ Structured search
Read more about it: Franklin et al. 2005, Heath et al. 2011
I have data, what should I do?
Standard process
● Start from your data (e.g. CSV)
● Transform it into RDF: vocabulary, data with metadata
HowTo: Best Practices for Publishing Linked Data, by Hyland et al. 2014.
Comparison of technology: Nentwig et al. 2015, Survey of Current Link Discovery Frameworks.
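As a rough illustration of the "transform it into RDF" step, the sketch below turns a small CSV extract into N-Triples lines using only the standard library. The `http://example.org/` namespaces, the `country` property, and the sample rows are assumptions for the example; in practice you would reuse an existing vocabulary and a dedicated conversion tool.

```python
import csv
import io

BASE = "http://example.org/station/"   # hypothetical namespace for entities
VOCAB = "http://example.org/vocab#"    # hypothetical vocabulary namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical input data.
CSV_DATA = "id,name,country\n1,Basel SBB,CH\n2,Zürich HB,CH\n"


def literal(text):
    """Serialise a plain string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def csv_to_ntriples(text):
    """Emit one N-Triples line for the name and one for the country of each row."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row['id']}>"
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(row['name'])} .")
        lines.append(f"{subject} <{VOCAB}country> {literal(row['country'])} .")
    return lines


ntriples = csv_to_ntriples(CSV_DATA)
print("\n".join(ntriples))
```

Each row becomes one entity with its own URI, and each remaining column becomes a predicate.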
● Link to other entities (also called data interlinking, link discovery or entity resolution)
Open source framework for interlinking (Isele et al. 2009-2017): http://silkframework.org
To configure the interlinking, specify:
1. Target data set(s)
2. Type of entities to be connected (e.g. Persons and Humans)
3. Link predicate (e.g. owl:sameAs)
4. Interlinking criteria (e.g. if similar names)
● Publish
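The four inputs listed above can be read as the parameters of a link discovery function. The sketch below is a naive version that compares names with `difflib.SequenceMatcher` from the standard library; it is not how the Silk framework works internally, and the entities, the `ex:` identifiers and the 0.9 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical source and target data sets: id -> (entity type, name).
SOURCE = {"uzh:p1": ("Person", "Tim Berners-Lee"),
          "uzh:p2": ("Person", "Ada Lovelace")}
TARGET = {"ex:A": ("Human", "Tim Berners Lee"),
          "ex:B": ("Human", "Alan Turing"),
          "ex:C": ("City", "Ada")}


def interlink(source, target, source_type, target_type, predicate, threshold):
    """Link entities of the given types whose names are similar enough."""
    found = []
    for s_id, (s_type, s_name) in source.items():
        if s_type != source_type:
            continue
        for t_id, (t_type, t_name) in target.items():
            if t_type != target_type:
                continue
            # Interlinking criterion: case-insensitive name similarity.
            score = SequenceMatcher(None, s_name.lower(), t_name.lower()).ratio()
            if score >= threshold:
                found.append((s_id, predicate, t_id))
    return found


result = interlink(SOURCE, TARGET, "Person", "Human", "owl:sameAs", 0.9)
print(result)
```

Comparing every source entity with every target entity is quadratic, so this only suits small data sets; link discovery frameworks typically add blocking or indexing to avoid that.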
Common “mistakes” in data interlinking
● Link only to “popular” data sets
● Think solely of owl:sameAs links
● Focus on target data sets of similar
topical domain and provenance
● No documented links
● Minimum number of links to appear in
the LOD diagram
● No link maintenance
[5] https://www.w3.org/TR/void/
➢ Gaining visibility is good, but that’s not the only reason for interlinking.
➢ uzh:r_user_group: maybe no one has described it yet!
➢ Specify your outlinks in the data set description to help Web data crawlers! VoID [5]:
    :UZH a void:Linkset ;
        void:target :Wikidata ;
        void:linkPredicate rdfs:seeAlso ;
        void:triples 100 .
➢ Target data sets die, and new data sets appear all the time.
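A linkset description like the VoID example can be derived from the links themselves, so that the documented count stays in step with the data. The sketch below is one possible approach, assuming that the prefix before the colon identifies the data set; the sample links, the `:linksetN` names and the `void_linksets` helper are hypothetical. It uses `void:subjectsTarget` and `void:objectsTarget` to state the direction of the links.

```python
from collections import Counter

# Hypothetical outlinks; the prefix before ":" is taken as the data set name.
LINKS = [
    ("uzh:hackzhodd", "rdfs:seeAlso", "wdt:Q25112115"),
    ("uzh:r_user_group", "rdfs:seeAlso", "wdt:hypothetical_item"),
    ("uzh:zurich", "owl:sameAs", "dbpedia:Zürich"),
]


def void_linksets(links, source_prefix):
    """Build one Turtle block per (target data set, link predicate) pair."""
    counts = Counter((o.split(":", 1)[0], p) for s, p, o in links
                     if s.startswith(source_prefix + ":"))
    blocks = []
    for i, ((target, predicate), n) in enumerate(sorted(counts.items()), 1):
        blocks.append(
            f":linkset{i} a void:Linkset ;\n"
            f"    void:subjectsTarget :{source_prefix} ;\n"
            f"    void:objectsTarget :{target} ;\n"
            f"    void:linkPredicate {predicate} ;\n"
            f"    void:triples {n} ."
        )
    return blocks


blocks = void_linksets(LINKS, "uzh")
print("\n\n".join(blocks))
```

Regenerating the description whenever links are added or removed also helps with the link maintenance point above.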
# Tip 1: Answer these questions and design the interlinking accordingly
● Who should benefit from the interlinking?
a. You, as data publisher
b. Applications (and end-users) consuming your data
c. Applications (and end-users) consuming a collection of data sets, yours among others
● Why do you want to interlink your data?
a. To gain visibility (via other data sets)
b. To complement your data
c. To enable “on update cascade”
● What things do you want to connect?
a. Are there alternative ways of naming such things? E.g. Person, Human
b. Are there more general / more specific terms to label such things? E.g. Animals, Mammals.
# Tip 2: When implementing the interlinking
● Assess the quality of target data sets, or your own data quality will be damaged. (See Zaveri et al. 2015 for quality issues and quality control methods.)
● Publish outlinks, but also send link requests to others for inlinks.
● Check how others interlinked by querying link repositories [6] or the data sets.
○ Consider declared data set IDs and not raw pay-level domains (PLDs), e.g. http://ns.nature.com/subjects/
[6] http://sameas.org/ , http://www.linklion
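The difference between a declared data set ID and a raw domain can be shown with a short sketch. It assumes a small registry of declared namespaces; only `http://ns.nature.com/subjects/` comes from the slide, while the second namespace, the data set names and both helper functions are hypothetical. The host name is used as a rough stand-in for the PLD, since computing the real PLD needs a public suffix list.

```python
from urllib.parse import urlparse

# Hypothetical registry of declared data set namespaces.
DATASETS = {
    "http://ns.nature.com/subjects/": "nature-subjects",
    "http://ns.nature.com/articles/": "nature-articles",  # assumed namespace
}


def by_host(uri):
    """Identify a data set by host name only (a rough stand-in for the PLD)."""
    return urlparse(uri).netloc


def by_declared_id(uri, datasets=DATASETS):
    """Identify a data set by the longest declared namespace that prefixes the URI."""
    matches = [ns for ns in datasets if uri.startswith(ns)]
    return datasets[max(matches, key=len)] if matches else None


subject_uri = "http://ns.nature.com/subjects/physics"
article_uri = "http://ns.nature.com/articles/some-article"
print(by_host(subject_uri), by_host(article_uri))
print(by_declared_id(subject_uri), by_declared_id(article_uri))
```

Both URIs share a host, so grouping links by domain would merge two data sets that the declared namespaces keep apart.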
Figure labels: uzh, nyt, wdt
Wikidata
“Wikidata records what other sources say” (Lydia Pintscher, 2016 [7]).
Introduction to Wikidata, Sarasua 2016: https://goo.gl/gGzMzK
[7] https://goo.gl/On9Qz1
# Tip 3: Link, improve, repeat.
● The stop criterion should not be the percentage of entities interlinked.
a. If you have very specific data, being able to connect 1% of source entities might be normal.
● Improve quality in these two dimensions
a. semantic accuracy
❌ cch:koblenz owl:sameAs cde:koblenz
b. links should “enable the discovery of more things” (4th LD Principle)
[4] https://www.w3.org/DesignIssues/LinkedData.html
Semantic accuracy
● Let humans revise the links with microtask crowdsourcing.
Detailed crowdsourcing tutorial by Demartini et al.: https://itsgettingcrowded.wordpress.com/
See Demartini et al. 2012, 2013; Sarasua et al. 2012, 2015
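One simple way to combine several workers' answers about a link is majority voting, sketched below. This is a deliberately basic stand-in and not the probabilistic approach of the cited papers; the judgements, the answer labels and the `majority` helper are assumptions for the example.

```python
from collections import Counter

# Hypothetical crowd judgements: link -> answers from independent workers.
JUDGEMENTS = {
    ("cch:koblenz", "owl:sameAs", "cde:koblenz"): ["wrong", "wrong", "correct"],
    ("nyt:Zürich", "owl:sameAs", "dbpedia:Zürich"): ["correct", "correct", "correct"],
}


def majority(answers):
    """Return the most frequent answer, or None for no answers or a tie at the top."""
    ranked = Counter(answers).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Keep only the links that most workers judged to be correct.
accepted = [link for link, answers in JUDGEMENTS.items()
            if majority(answers) == "correct"]
print(accepted)
```

Returning `None` on a tie lets undecided links be sent out for more judgements instead of being accepted or rejected by chance.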
Enable the discovery of more things
● Link to entities that make you learn
something new and non-redundant
about the source entity
○ New value
○ New classification
○ New way of describing
metadata
● The more entities you link to, the better.
● The more data sets you connect to, the better.
See also Sarasua et al. 2017.
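What a link lets you learn can be approximated by comparing the descriptions of the source and target entities. The sketch below is a simplified illustration and not the evaluation method of Sarasua et al. 2017; the two descriptions, the `ex:` predicates and the `gain` helper are assumptions.

```python
# Hypothetical entity descriptions: predicate -> set of values.
SOURCE_ENTITY = {"rdfs:label": {"Zürich"}, "ex:country": {"CH"}}
TARGET_ENTITY = {"rdfs:label": {"Zürich", "Zurich"},
                 "ex:country": {"CH"},
                 "ex:population": {"hypothetical-value"}}


def gain(source, target):
    """Split what the target adds into new predicates and new values."""
    # Predicates the source does not use at all.
    new_predicates = {p: v for p, v in target.items() if p not in source}
    # Extra values for predicates the source already uses.
    new_values = {p: v - source[p] for p, v in target.items()
                  if p in source and v - source[p]}
    return new_predicates, new_values


new_predicates, new_values = gain(SOURCE_ENTITY, TARGET_ENTITY)
print(new_predicates)
print(new_values)
```

A link whose target yields two empty dictionaries is redundant in this sense, however accurate it may be.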
When you publish your open data, consider using
Semantic Web / Linked Data technologies and
linking your data to other people’s data.
Thanks! Danke! Grazie! Merci!
Cristina Sarasua
E-mail: csarasua@uni-koblenz.de
Twitter: @csarasuagar
Don’t forget that you can also become a Wikimedia member and donate :)
https://wikimedia.de/wiki/Mitgliedschaft
References
Franklin et al. 2005. From databases to dataspaces. ACM SIGMOD Record.
https://homes.cs.washington.edu/~alon/files/dataspacesDec05.pdf
Heath et al. 2011. Linked data: Evolving the web into a global data space. Morgan &
Claypool.
http://www.morganclaypool.com/doi/abs/10.2200/s00334ed1v01y201102wbe001
Hyland et al. 2014. Best Practices for Publishing Linked Data https://www.w3.org/TR/ld-bp/
Nentwig et al. 2015. Survey of Current Link Discovery Frameworks. Semantic Web Journal.
http://www.semantic-web-journal.net/system/files/swj1029.pdf
Zaveri et al. 2015. Quality Assessment for Linked Data: A Survey. Semantic Web Journal.
http://www.semantic-web-journal.net/system/files/swj773.pdf
Demartini et al. 2012. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for
Large-Scale Entity Linking, WWW2012.
https://diuf.unifr.ch/main/xi/sites/diuf.unifr.ch.main.xi/files/fp0982-demartini.pdf
Demartini et al. 2013. Large-scale linked data integration using probabilistic reasoning and
crowdsourcing. VLDB Journal. https://link.springer.com/article/10.1007/s00778-013-0324-z
Sarasua et al. 2012. CrowdMap: Crowdsourcing Ontology Alignment with Microtasks. ISWC2012.
http://web.stanford.edu/~natalya/papers/iswc2012_crowdmap.pdf
Sarasua 2015. Programmatic Access to Crowdsourced Human Computation for Designing and Enhancing
Interlinking. SemWebDev, ESWC 2015. http://ceur-ws.org/Vol-1361/paper6.pdf
Sarasua et al. 2017. Methods for Intrinsic Evaluation of Links in the Web of Data. ESWC 2017. Upcoming.

🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

How links can make your open data even greater

  • 1. How links can make your open data even greater Cristina Sarasua Institute for Web Science and Technologies (WeST) University of Koblenz-Landau, DE Open Data Day 2017 Zurich, CH
  • 2. Goals ● Spread the word about Semantic Web and Linked Data technologies ● Share tips on how to link your data properly
  • 3. Open Data ● Data published in any open format ○ CSV/TSV, XML, JSON etc. ● Made available for free ● By any organisation or individual person ● “Usable, reusable and distributable” (Open Definition, by OKFN [1]) ● Findable ○ Registered in data repositories ○ Via search engines ❖ Boosts transparency ❖ Enables reinterpretation of the data ❖ Facilitates the development of new applications [1] http://opendefinition.org/
  • 4. The Web, an ocean of data Logos: SBB: http://www.sbb.ch/en/home.html Deutsche Bahn:https://www.bahn.de/p/view/index.shtml Wikipedia: https://commons.wikimedia.org/wiki/File:Wikipedia-logo.png
  • 5. The Web, an ocean of data Logos: SBB: http://www.sbb.ch/en/home.html Deutsche Bahn: https://www.bahn.de/p/view/index.shtml Wikipedia: https://commons.wikimedia.org/wiki/File:Wikipedia-logo.png How many trains go from Zurich to Basel SBB daily? How frequently do trains arrive at Basel Bad Bf? What are the most populated cities in the south of Germany?
  • 6. The Web, an ocean of data Logos: SBB: http://www.sbb.ch/en/home.html Deutsche Bahn: https://www.bahn.de/p/view/index.shtml Wikipedia: https://commons.wikimedia.org/wiki/File:Wikipedia-logo.png How many trains go from Zurich to Basel SBB daily? How frequently do trains arrive at Basel Bad Bf? What are the most populated cities in the south of Germany? Which are on average more punctual in locations with more than 50,000 inhabitants, German or Swiss trains?
  • 7. The Web, an ocean of data ● Collect data ● Put the data together ● Transform data ● Enable joint queries The main effort is on the data consumer side Logos: SBB: http://www.sbb.ch/en/home.html Deutsche Bahn: https://www.bahn.de/p/view/index.shtml Wikipedia: https://commons.wikimedia.org/wiki/File:Wikipedia-logo.png Figure labels: formats CSV, JSON, HTML; terms Land, Schweiz, CH, Staat, Country, Switzerland, Linien, Haltestelle, Cities
  • 12. The Web of Data ● Publish an explicit description of the schema separately ● Follow the RDF data model [2] ● Align schemas, link to other entities in distributed data sources The data publisher takes on part of the data integration effort [2] https://www.w3.org/RDF/ Figure: the concepts Land, Staat and Country are marked as equivalent; the entities Schweiz, CH and Switzerland are marked as the same.
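Once "equivalent" and "same" links like those in the figure are published, a consumer can group identifiers that refer to the same thing. The following is a minimal sketch of that grouping with a union-find structure. The exact pairs are an assumption read off the figure labels, and the function name `equivalence_classes` is made up for illustration.

```python
def equivalence_classes(pairs):
    """Group identifiers connected by symmetric, transitive links."""
    parent = {}

    def find(x):
        # Follow parent pointers to the representative, halving the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())


# Assumed links: concept equivalences and entity identity from the figure.
LINKS = [
    ("Land", "Staat"),
    ("Staat", "Country"),
    ("Schweiz", "CH"),
    ("CH", "Switzerland"),
]

for group in equivalence_classes(LINKS):
    print(group)
```

Each printed group can then be treated as one concept or one entity when querying several sources together.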
  • 13. Links: typed relations between concepts and between entities from distributed and (possibly) heterogeneous data sources. Example triples (subject predicate object): nyt:Zürich owl:sameAs dbpedia:Zürich; dbpedia:Tim_Berners-Lee rdf:type foaf:Person; uzh:hackzhodd geo:location geop:Point564; uzh:hackzhodd rdfs:seeAlso wdt:Q25112115. [3] LOD diagram by Abele et al. 2017 http://lod-cloud.net/versions/2017-02-20/lod.svg [4] https://www.w3.org/DesignIssues/LinkedData.html
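To make the subject, predicate, object structure concrete, here is a small sketch that stores the four example triples from the slide as Python tuples and filters them by pattern. It keeps prefixed names as plain strings and does not use an RDF library; the helper names `match` and `outlinks` are hypothetical.

```python
# The example triples from the slide, kept as (subject, predicate, object).
TRIPLES = [
    ("nyt:Zürich", "owl:sameAs", "dbpedia:Zürich"),
    ("dbpedia:Tim_Berners-Lee", "rdf:type", "foaf:Person"),
    ("uzh:hackzhodd", "geo:location", "geop:Point564"),
    ("uzh:hackzhodd", "rdfs:seeAlso", "wdt:Q25112115"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def outlinks(triples, source_prefix):
    """Triples whose subject uses source_prefix and whose object does not."""
    marker = source_prefix + ":"
    return [
        (s, p, o) for s, p, o in triples
        if s.startswith(marker) and not o.startswith(marker)
    ]


print(match(TRIPLES, s="uzh:hackzhodd"))
print(outlinks(TRIPLES, "nyt"))
```

The second call shows the idea of a link in this sense: a triple that connects something in one namespace to something described elsewhere.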
  • 14. Links
  • 15. Some key advantages ✔ The data publisher knows her data ✔ No need to integrate schemas upfront ✔ Data and metadata can be easily extended and modified ✔ Applications may query the schema description ✔ Structured search Read more about it: Franklin et al. 2005, Heath et al. 2011
  • 16. I have data, what should I do?
  • 17. Standard process ● Transform it into RDF: from CSV to data with metadata, using a vocabulary. HowTo: Best Practices for Publishing Linked Data, by Hyland et al. 2014. ● Link to other entities (also called data interlinking, link discovery or entity resolution). Decide on: 1. Target data set(s) 2. Type of entities to be connected (e.g. Persons and Humans) 3. Link predicate (e.g. owl:sameAs) 4. Interlinking criteria (e.g. if similar names). Comparison of technology: Nentwig et al. 2015, Survey of Current Link Discovery Frameworks. Open source framework for interlinking (Isele et al. 2009-2017): http://silkframework.org ● Publish
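The first two steps of the process can be sketched in plain Python. This is an illustration only and not how the Silk framework is configured: the CSV columns, the base IRI, the target data set and the 0.8 threshold are all assumptions, and name similarity comes from the standard library's `difflib`.

```python
import csv
import io
from difflib import SequenceMatcher

# Hypothetical source data and base IRI (assumptions for illustration).
SOURCE_CSV = "id,name\n1,Zürich HB\n2,Basel SBB\n"
BASE = "http://example.org/station/"

# Hypothetical target data set: IRI -> label.
TARGET = {
    "http://example.com/target/Zurich_HB": "Zurich HB",
    "http://example.com/target/Bern": "Bern",
}

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def csv_to_ntriples(text):
    """Step 1: turn each CSV row into one rdfs:label triple."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        f'<{BASE}{row["id"]}> <{RDFS_LABEL}> "{escape(row["name"])}" .'
        for row in rows
    ]


def similarity(a, b):
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def interlink(text, target, predicate, threshold=0.8):
    """Step 2: link source rows to target entities with similar names."""
    links = []
    for row in csv.DictReader(io.StringIO(text)):
        for iri, label in target.items():
            if similarity(row["name"], label) >= threshold:
                links.append(f'<{BASE}{row["id"]}> <{predicate}> <{iri}> .')
    return links


print("\n".join(csv_to_ntriples(SOURCE_CSV)))
print("\n".join(interlink(SOURCE_CSV, TARGET, OWL_SAMEAS)))
```

The four decisions listed on the slide map to the arguments here: the target data set is `TARGET`, the entity type is implied by which rows are compared, the link predicate is `OWL_SAMEAS`, and the interlinking criterion is the similarity threshold.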
  • 18. Common “mistakes” in data interlinking ● Link only to “popular” data sets ● Think solely of owl:sameAs links ● Focus on target data sets of similar topical domain and provenance ● No documented links ● Minimum number of links to appear in the LOD diagram ● No link maintenance ➢ Gaining visibility is good, but that’s not the only reason for interlinking. ➢ uzh:r_user_group: Maybe no one described it yet! ➢ Specify your outlinks in the data set description to help Web data crawlers! VoID [5]: :UZH a void:Linkset; void:target :Wikidata; void:linkPredicate rdfs:seeAlso; void:triples 100 . ➢ Target data sets die, and new data sets appear all the time. [5] https://www.w3.org/TR/void/
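A description like the VoID fragment on the slide can be generated from the links themselves, so the documented count stays in step with the data. The sketch below counts links per predicate and prints one fragment per predicate, following the shape of the slide's example rather than the full VoID vocabulary. The function name, the `_links_N` naming scheme and the second sample link are assumptions.

```python
from collections import Counter


def void_linkset(name, target, links):
    """Build Turtle fragments describing outlinks, one per link predicate.

    links is an iterable of (subject, predicate, object) prefixed names.
    """
    counts = Counter(predicate for _, predicate, _ in links)
    blocks = []
    for i, (predicate, n) in enumerate(sorted(counts.items()), start=1):
        blocks.append(
            f":{name}_links_{i} a void:Linkset ;\n"
            f"    void:target :{target} ;\n"
            f"    void:linkPredicate {predicate} ;\n"
            f"    void:triples {n} ."
        )
    return "\n\n".join(blocks)


# The first link is from the slides; the second is a made-up placeholder.
SAMPLE_LINKS = [
    ("uzh:hackzhodd", "rdfs:seeAlso", "wdt:Q25112115"),
    ("uzh:example_event", "rdfs:seeAlso", "wdt:Q1"),
]

print(void_linkset("UZH", "Wikidata", SAMPLE_LINKS))
```

Prefix declarations are left out here; a published description would need them along with the rest of the data set metadata.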
  • 19. # Tip 1: Answer these questions and design the interlinking accordingly ● Who should benefit from the interlinking? a. You, as data publisher b. Applications (and end-users) consuming your data c. Applications (and end-users) consuming a collection of data sets, yours among others ● Why do you want to interlink your data? a. To gain visibility (via other data sets) b. To complement your data c. To enable “on update cascade” ● What things do you want to connect? a. Are there alternative ways of naming such things? E.g. Person, Human b. Are there more general / more specific terms to label such things? E.g. Animals, Mammals.
  • 20. # Tip 2: When implementing the interlinking ● Assess the quality of target data sets, or your own data quality will be damaged. (See Zaveri et al. 2015 for quality issues and quality control methods). ● Publish outlinks, but also send link requests to others for inlinks. ● Check how others interlinked by querying link repositories [6] or the data sets. ○ Consider declared data set IDs and not raw pay-level domains (PLDs), e.g. http://ns.nature.com/subjects/ [6] http://sameas.org/, http://www.linklion Figure labels: uzh, nyt, wdt
  • 21. Wikidata “Wikidata records what other sources say” Lydia Pintscher, 2016 [7]. Introduction to Wikidata, Sarasua 2016: https://goo.gl/gGzMzK [7] https://goo.gl/On9Qz1
  • 22. # Tip 3: Link, improve, repeat. ● The stop criterion should not be the percentage of entities interlinked. a. If you have very specific data, being able to connect 1% of source entities might be normal. ● Improve quality in these two dimensions a. Semantic accuracy (example of an incorrect link: ❌ cch:koblenz owl:sameAs cde:koblenz) b. Links should “enable the discovery of more things” (4th LD Principle) [4] https://www.w3.org/DesignIssues/LinkedData.html
  • 23. Semantic accuracy ● Let humans review the links, for example with microtask crowdsourcing. Detailed Crowdsourcing Tutorial by Demartini et al.: https://itsgettingcrowded.wordpress.com/ See Demartini et al. 2012, 2013; Sarasua et al. 2012, 2015
  • 24. Enable the discovery of more things ● Link to entities that make you learn something new and non-redundant about the source entity ○ New value ○ New classification ○ New way of describing metadata ● The more entities you link to, the better. ● The more data sets you connect to, the better. See also Sarasua et al. 2017.
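One rough way to check whether a link teaches something new is to compare the statements about the source entity with those about the linked entity. The sketch below is a simplification and not the evaluation method of Sarasua et al. 2017; the `ex:` names and all values are made-up placeholders.

```python
def new_information(source_desc, target_desc):
    """Split the target statements that the source lacks into three kinds.

    Both arguments are sets of (predicate, value) pairs.
    """
    source_predicates = {p for p, _ in source_desc}
    novel = set(target_desc) - set(source_desc)
    return {
        # Another value for a property the source already uses.
        "new_values": sorted(
            (p, v) for p, v in novel
            if p in source_predicates and p != "rdf:type"
        ),
        # A class the source does not assign to the entity.
        "new_classifications": sorted(
            (p, v) for p, v in novel if p == "rdf:type"
        ),
        # A property the source does not use at all.
        "new_properties": sorted(
            (p, v) for p, v in novel
            if p not in source_predicates and p != "rdf:type"
        ),
    }


# Placeholder descriptions of one source entity and one linked entity.
SOURCE = {("rdfs:label", "Koblenz"), ("rdf:type", "ex:City")}
TARGET = {
    ("rdfs:label", "Koblenz"),
    ("rdfs:label", "Coblence"),
    ("rdf:type", "ex:Municipality"),
    ("ex:population", "placeholder-value"),
}

for kind, statements in new_information(SOURCE, TARGET).items():
    print(kind, statements)
```

A link whose three lists are all empty adds only redundancy, which is the case the slide advises against.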
  • 25. When you publish your open data, consider using Semantic Web / Linked Data technologies and linking your data to other people’s data.
  • 26. Thanks! Danke! Grazie! Merci! Cristina Sarasua E-mail: csarasua@uni-koblenz.de Twitter: @csarasuagar Don’t forget that you can also become a Wikimedia member and donate :) https://wikimedia.de/wiki/Mitgliedschaft
  • 27. References Franklin et al. 2005. From databases to dataspaces. ACM SIGMOD Record. https://homes.cs.washington.edu/~alon/files/dataspacesDec05.pdf Heath et al. 2011. Linked data: Evolving the web into a global data space. Morgan & Claypool. http://www.morganclaypool.com/doi/abs/10.2200/s00334ed1v01y201102wbe001 Hyland et al. 2014. Best Practices for Publishing Linked Data https://www.w3.org/TR/ld-bp/ Nentwig et al. 2015. Survey of Current Link Discovery Frameworks. Semantic Web Journal. http://www.semantic-web-journal.net/system/files/swj1029.pdf Zaveri et al. 2015. Quality Assessment for Linked Data: A Survey. Semantic Web Journal. http://www.semantic-web-journal.net/system/files/swj773.pdf
  • 28. References Demartini et al. 2012. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. WWW 2012. https://diuf.unifr.ch/main/xi/sites/diuf.unifr.ch.main.xi/files/fp0982-demartini.pdf Demartini et al. 2013. Large-scale linked data integration using probabilistic reasoning and crowdsourcing. VLDB Journal. https://link.springer.com/article/10.1007/s00778-013-0324-z Sarasua et al. 2012. CrowdMap: Crowdsourcing Ontology Alignment with Microtasks. ISWC 2012. http://web.stanford.edu/~natalya/papers/iswc2012_crowdmap.pdf Sarasua 2015. Programmatic Access to Crowdsourced Human Computation for Designing and Enhancing Interlinking. SemWebDev, ESWC 2015. http://ceur-ws.org/Vol-1361/paper6.pdf Sarasua et al. 2017. Methods for Intrinsic Evaluation of Links in the Web of Data. ESWC 2017. Upcoming.