Link Sets And Why They Are Important (EDF2012)
1. Link Sets and Why They ARE Important
Anja Jentzsch, Freie Universität Berlin
6 June 2012
Realising and Exploiting the EU data cloud
European Data Forum, Copenhagen, Denmark
3. Links
• 4th Linked Data principle: set RDF links to other data sources on the Web
• fundamental to the Web of Data
• connect data islands into a global, interconnected data space
• enable discovery of additional data sources
4. Links
• Definition: An external RDF link is an RDF triple in which the subject of the triple
is a URI reference in the namespace of one data set, while the predicate and/or
object of the triple are URI references pointing into the namespaces of other
data sets.
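The definition above can be sketched as a small check. This is a minimal illustration, not part of the original slides: it assumes that a data set's namespace can be approximated by the scheme and host of a URI, and that all three terms of the triple are URIs (no literals). The helper names `namespace` and `is_external_link` are hypothetical, and the sample URIs are illustrative.

```python
from urllib.parse import urlparse


def namespace(uri: str) -> str:
    """Approximate a data set's namespace by the URI's scheme and host (an assumption)."""
    parts = urlparse(uri)
    return f"{parts.scheme}://{parts.netloc}"


def is_external_link(subject: str, predicate: str, obj: str) -> bool:
    """Return True if the predicate and/or the object lie outside the subject's namespace."""
    home = namespace(subject)
    return namespace(predicate) != home or namespace(obj) != home


# Illustrative triple: a DBpedia resource linked to a GeoNames resource.
triple = (
    "http://dbpedia.org/resource/Berlin",
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://sws.geonames.org/2950159/",
)
print(is_external_link(*triple))  # True
```

Real data sets often share a host or span several, so a production check would compare against each data set's declared URI space rather than the host alone.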
5. Link Types
1. Relationship Links point at related things in other data sources, for instance,
other people, places or genes.
2. Identity Links point at URI aliases used by other data sources to identify the
same real-world object or abstract concept.
3. Vocabulary Links point from data to the definitions of the vocabulary terms
that are used to represent the data, as well as from these definitions to the
definitions of related terms in other vocabularies.
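As a rough illustration of the three link types, the sketch below sorts links by their predicate. The mapping is a heuristic assumed for this example, not a rule from the slides: `owl:sameAs` is treated as an identity link, a handful of RDF, RDFS and OWL schema predicates as vocabulary links, and everything else as a relationship link. The function name `link_type` is hypothetical.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Predicates assumed to signal identity links (URI aliases for the same thing).
IDENTITY_PREDICATES = {OWL + "sameAs"}

# Predicates assumed to signal vocabulary links (data to term definitions,
# or definitions to related definitions).
VOCABULARY_PREDICATES = {
    RDF + "type",
    RDFS + "subClassOf",
    RDFS + "subPropertyOf",
    OWL + "equivalentClass",
    OWL + "equivalentProperty",
}


def link_type(predicate: str) -> str:
    """Classify a link by its predicate; anything unrecognised counts as a relationship link."""
    if predicate in IDENTITY_PREDICATES:
        return "identity"
    if predicate in VOCABULARY_PREDICATES:
        return "vocabulary"
    return "relationship"


print(link_type(OWL + "sameAs"))                          # identity
print(link_type(RDFS + "subClassOf"))                     # vocabulary
print(link_type("http://xmlns.com/foaf/0.1/based_near"))  # relationship
```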
6. Motivation
• Web of Data is a single global data space because data sources are connected by links
• Over 30 billion triples published as Linked Open Data (as of 19 September 2011)
• But:
• Less than 500 million links
• Most publishers only link to one other dataset
Figure: LOD data sets by the number of other data sources that are the target of outgoing RDF links.
7. State of the LOD Cloud
http://lod-cloud.net/state
8. Challenges for Link Discovery
• Large range of domains
• 277 data sources in the LOD cloud from a variety of domains
Figure: Link distribution by topical domain
9. Link Discovery Tools
• Tools enable data publishers to set links
• Most tools generate links based on user-defined linkage rules
• A linkage rule specifies the conditions data items must fulfill in order to be
interlinked
• Popular Link Discovery Tools:
• Silk Link Discovery Framework
• LIMES
• Others: http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining
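To make the notion of a linkage rule concrete, here is a minimal sketch in plain Python. It is not the rule language of Silk or LIMES. It assumes entities are dictionaries with hypothetical `label` and `year` fields, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure with an arbitrarily chosen threshold.

```python
from difflib import SequenceMatcher


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def linkage_rule(source: dict, target: dict, threshold: float = 0.9) -> bool:
    """Condition for interlinking: labels are nearly equal and the years are identical."""
    return (
        label_similarity(source["label"], target["label"]) >= threshold
        and source["year"] == target["year"]
    )


movie_a = {"label": "Blade Runner", "year": 1982}
movie_b = {"label": "Blade runner", "year": 1982}
movie_c = {"label": "Blade Runner 2049", "year": 2017}

print(linkage_rule(movie_a, movie_b))  # True
print(linkage_rule(movie_a, movie_c))  # False
```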
10. (Simplified) Linking Workflow
Select Datasets
• Select two data sources
• Select the entity types to be interlinked
Write Linkage Rule
• Specifies how two entities are compared
• Can be written manually or learned
Generate Links
• Locally or on a Hadoop cluster
• Write links to a file or a triple store
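The three workflow steps can be sketched end to end in a few lines. This is an illustrative toy, not how any of the tools is implemented: the entity records and `example.org` URIs are made up, the rule is a simple normalised label match, every source entity is compared with every target entity, and the links are emitted as `owl:sameAs` statements in N-Triples syntax.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Step 1: select two data sources and the entity type to interlink (movies here).
source_movies = [
    {"uri": "http://example.org/a/BladeRunner", "label": "Blade Runner"},
    {"uri": "http://example.org/a/Alien", "label": "Alien"},
]
target_movies = [
    {"uri": "http://example.org/b/film/1", "label": "blade runner"},
    {"uri": "http://example.org/b/film/2", "label": "Solaris"},
]


# Step 2: write a linkage rule that says how two entities are compared.
def same_movie(source: dict, target: dict) -> bool:
    return source["label"].strip().lower() == target["label"].strip().lower()


# Step 3: generate links by applying the rule to every source/target pair.
def generate_links(sources: list, targets: list, rule) -> list:
    return [
        f'<{s["uri"]}> <{OWL_SAME_AS}> <{t["uri"]}> .'
        for s in sources
        for t in targets
        if rule(s, t)
    ]


links = generate_links(source_movies, target_movies, same_movie)
print("\n".join(links))  # one N-Triples line per generated link
```

Comparing all pairs is quadratic in the number of entities, which is one reason a tool might offer distributed execution for large data sets.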
11. Silk Workbench
• Web application which guides the user through the process of interlinking
different data sources
• Enables the user to manage different sets of data sources and linking tasks
• Offers a graphical editor which enables the user to easily create and edit linkage
rules
• Offers tools to evaluate the current linkage rule
• Includes support for learning linkage rules
13. LATC Workbench
• A project in the workspace consists of:
  • Data Sources
    • Each holds all information needed to retrieve entities from it
    • E.g. a file dump or a SPARQL endpoint
  • Linking Tasks
    • Each interlinks one type of entity between two data sources
    • E.g. interlinking movies in DBpedia and LinkedMDB
14. LATC Linkage Rule Editor
• Allows users to view and edit linkage rules
• Linkage Rules are shown as a tree
• Editing using drag & drop
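One way to picture a linkage rule as a tree is with comparison nodes at the leaves and aggregation nodes above them. The sketch below is loosely inspired by that idea and is not the editor's internal data model: the class names `Comparison` and `Aggregation`, the field names, and the sample entities are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Union


@dataclass
class Comparison:
    """Leaf node: compares one property of two entities and returns a score in [0, 1]."""
    prop: str
    metric: Callable[[str, str], float]

    def score(self, source: dict, target: dict) -> float:
        return self.metric(str(source[self.prop]), str(target[self.prop]))


@dataclass
class Aggregation:
    """Inner node: combines the scores of its children into a single score."""
    combine: Callable[[List[float]], float]
    children: List[Union["Aggregation", Comparison]]

    def score(self, source: dict, target: dict) -> float:
        return self.combine([child.score(source, target) for child in self.children])


def string_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def equality(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


# Tree: min( similarity(label), equality(year) )
rule = Aggregation(min, [
    Comparison("label", string_similarity),
    Comparison("year", equality),
])

movie_a = {"label": "Blade Runner", "year": 1982}
movie_b = {"label": "Blade runner", "year": 1982}
movie_c = {"label": "Blade Runner", "year": 2017}

print(rule.score(movie_a, movie_b))  # 1.0
print(rule.score(movie_a, movie_c))  # 0.0
```

Because aggregations can contain other aggregations, editing a rule amounts to moving, adding, or removing subtrees, which maps naturally onto drag and drop.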
15. Learning Linkage Rules
• Linkage Rules can be learned interactively
• Learning can be used to generate new linkage rules or to improve existing ones
• Learned linkage rules can be viewed and edited by the user
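The slides do not say which learning algorithm is used, so the following is only a toy stand-in for the general idea of learning from confirmed and rejected links. It fits a single similarity threshold by trying a few candidate values against a small, made-up set of reference links. The names `reference_links` and `learn_threshold` and the candidate values are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Made-up reference links: (source label, target label, confirmed as a match?)
reference_links = [
    ("Blade Runner", "Blade runner", True),
    ("Alien", "Alien (film)", True),
    ("Alien", "Aliens vs. Predator", False),
    ("Solaris", "Stalker", False),
]


def learn_threshold(examples, candidates):
    """Pick the candidate threshold that classifies the most reference links correctly."""
    def correct(threshold):
        return sum(
            (similarity(a, b) >= threshold) == is_match
            for a, b, is_match in examples
        )
    return max(candidates, key=correct)


best = learn_threshold(reference_links, [0.3, 0.5, 0.7, 0.9])
print(best)  # 0.5
```

The learned value stays visible as a plain number, in keeping with the point above that a learned rule remains something the user can inspect and edit.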