1. Linked Data:
Enabler of Semantic Web
2011.06.30
Sung-Kook Han
Semantic Technology Lab
Won Kwang Univ.
skhan@wku.ac.kr 1
2. Outline
Introduction to Semantic Technology
Semantic Technology + Web Technology
• Semantic Web
• Web 2.0
• Linked Data
Design and Publication of Linked Data
• 9 steps towards Linked Open Data
3. Why Semantic Technology??
the ways of thinking, cognition…
George Boole: An Investigation of the Laws of Thought (1854)
Claude Shannon: 1937 master's thesis,
A Symbolic Analysis of Relay and Switching Circuits
John von Neumann Kurt Gödel Alan Turing
6. Communication
Human vs. Human
Human vs. Alien
Human vs. Computer
Computer vs. Computer
7. Semantic Technology
Semantic technology has been a distinct research field for more
than 40 years.
Formal Logic (since Russell and Frege)
Knowledge Representation Systems in AI
Semantic Networks and ATN (William Woods, 1975)
DARPA and European Commission programs in information integration
Development of simple tractable logics
Relational Algebras and Schemas in Database Systems
Library Science (classifications, thesauri, taxonomies)
New challenges of Semantic Technology: Semantic Web
A massive store of information that computers cannot use
A way to get around needing the “big data warehouse”
Another place where “a little semantics can go a long way”...
cf: The Relationship Between Web 2.0 And the Semantic Web - Dr. Mark Greaves, Vulcan, Inc.
8. Ontology Spectrum
[Figure: the ontology spectrum, ordered from weak semantics to strong semantics]
Taxonomy (Is Sub-Classification of): Relational Model, XML; Syntactic Interoperability
Thesaurus (Has Narrower Meaning Than): ER, Extended ER, DB Schemas, XML Schema; Structural Interoperability
Conceptual Model (Is Subclass of): RDF/S, XTM, UML; Semantic Interoperability
Logical Theory (Is Disjoint Subclass of, with transitivity property): DAML+OIL, OWL, Description Logic, First Order Logic, Modal Logic
Example taxonomy terms: Animal, Mammal, Reptile, Bird, Snake, Dog, Cat, Cocker Spaniel, Lady
Based on Leo Obrst, The Ontology Spectrum & Semantic Models
9. Semantic Technology
Goals: Intelligence, Integration, Interoperability
Machine-processable semantics: Ontology, Metadata, controlled vocabulary
Digital information resources: Web resources, Services, Image, Audio/Video, Documents
[Figure: Semantic Technology connects machine-processable semantics with digital information resources]
10. Web Technology
Web of machine-processable Data
• Common vocabularies: Metadata and Ontology
• Query and reasoning
Web of Services
• Internet of Services
• Internet of Things
Classic Web (Web of Documents)
• HTML as document format
• HTTP URLs as globally unique IDs
• Hyperlinks to connect everything
Social Web
• Connects human beings
Web as a platform
• Programmable APIs and proprietary interfaces
• Mashups based on a fixed set of data sources
11. Semantic Web
Standardizations
Trio of Semantic Web
Metadata / Ontology: RDF, RDFS, OWL
Query Language: SPARQL
Rule Language: RIF (SWRL)
SKOS, RDFa, GRDDL, WSMO,…
SOAP/ REST
Tools and Systems
Authoring, Reasoning Engines,…
835 items in Sweet Tools
Best Practices
Linked Open Data
Semantic MediaWiki
NEPOMUK, SIOC, Garlik
W3C Semantic Web Use cases
Sweet Tools: http://www.mkbergman.com/new-version-sweet-tools-sem-web/
W3C Semantic Web Case Studies and Use Cases: http://www.w3.org/2001/sw/sweo/public/UseCases/
12. Semantic Applications
Semantic Wave 2008, Industry Roadmap to Web 3.0, Project10X
http://www.mkbergman.com/new-version-sweet-tools-sem-web/
13. Web 2.0
Resharpen the way of viewing the Web
Web as the platform
Web as the social media
Web as the collaboration tool
Web as ……
Web 2.0 Manifestation
Openness / Sharing
Participation / Collaboration
Web 2.0 Syndrome
Library 2.0
Government 2.0
Enterprise 2.0
……
New Web applications
wiki, blog, RSS,…
15. Semantic Web Today
Major future issues:
• Vocabularies
• Scalability
• Provenance
• Personal Infospheres
• Mobile and Real World Networks
16. Web 2.0 APIs Today
No Single global space: Web APIs slice the Web into Walled Gardens.
• Mashups of APIs are proprietary.
• No links between data.
[Figure: a mashup built on top of three separate Web APIs (A, B, C)]
Christian Bizer: Pay-as-you-go Data Integration (21/9/2010)
17. The Web is Dead??
http://www.wired.com/magazine/2010/08/ff_webrip/
18. Long Live the Web !
http://www.scientificamerican.com/article.cfm?id=long-live-the-web
19. Lessons Learned
Data is more important than API code.
Data is the Intel Inside.
Open data is more important than open source
Structured data is more valuable than unstructured.
We should seek to structure our data well.
Metadata will play a core role of data structure.
A little semantics goes a long way.
Be aware of the usefulness of shallow ontologies, as shown in LOD.
Linking data and services is essential.
Link everything.
Rich user experiences are the key to adoption.
We should consider mobile computing and personalization.
Visualize and navigate.
21. Web of Documents
A global file system of documents (document silos on the Web).
Implicit semantics of content and links
Designed for human consumption
Disconnected data
22. Architecture: Web of Documents
Analogy: a global file system
Designed for: human consumption
Primary objects: documents
Links between: documents (or sub-parts of documents)
Degree of structure in objects: fairly low
Main usage: search and browsing
Semantics of content and links: implicit
[Figure: Web browsers and search engines access HTML documents over HTTP via URLs; the documents are connected by hyperlinks (document links) and sit on top of separate databases DB-A, DB-B, DB-C]
23. Machine-Processible Data
Information resources
• Documents: human-processable (Web of Documents)
• Data (databases): machine-processable (Web of Data)
Open the data silos and get rid of the repository-centric mindset
Publish data of public interest on the Web
In a way that other applications can access and interpret the data
Using common Web technologies
24. Semantic Web: Web of Data
The vision of a Semantic Web:
building a global Web of machine-readable data
Berners-Lee, Hendler & Lassila, 2001; Marshall & Shipman, 2003
The first step is putting data on the Web in a form that machines can
naturally understand, or converting it to that form. This creates what I call a
Semantic Web - a web of data that can be processed directly or indirectly by
machines. Therefore, while the Semantic Web, or Web of Data, is the goal or
the end result of this process, Linked Data provides the means to reach that
goal. -- Tim Berners-Lee, et al., http://linkeddata.org/docs/ijswis-special-issue, Jan, 2009
Linked Data as a foundation
• It can lower the barrier to reuse, integration and application of data from multiple, distributed and heterogeneous sources.
• With this foundation in place, the more sophisticated proposals associated with the Semantic Web vision, such as intelligent agents, may become a reality.
25. Linked Data: Web of Data
Goal: Web-scale Data Integration
Alternative to classic data integration systems, in order to cope with the growing number of data sources.
Querying across data sources
Global distributed database
Extend the Web with a single global data space
Giant Global Graph (GGG)
Demonstrate the possibility of the Semantic Web
By using RDF to publish structured data
By setting links between data
[Figure: many RDF data sources linked into a single universal information space]
26. Architecture: Linked Data
Analogy: a global database
Designed for: machines first, humans later
Primary objects: things (or descriptions (data) of things)
Links between: things
Degree of structure in (descriptions of) things: high
Main usage: query, navigation and reasoning
Semantics of content and links: explicit
[Figure: Linked Data browsers, Linked Data mashups and search engines access RDF triples over HTTP via URIs; the data is connected by RDF links (data links) and is drawn from databases DB-A, DB-B, DB-C]
27. Linked Data Principles
Set of best practices for publishing structured data on the Web in accordance with
the general architecture of the Web.
Use URIs as names for things, not just for documents or homepages.
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful RDF information.
Include RDF statements that link to other URIs so that they can discover related things.
[Figure: HTTP URIs connected by RDF links; each RDF triple relates a URI to information or to another URI]
28. Linked Open Data
Community effort to
publish existing open license datasets as Linked Data on the Web
interlink things between different data sources
develop clients that consume Linked Data from the Web
began early 2007
29. LOD Data sets on the Web
25 billion RDF triples, which are interlinked by around 395 million RDF links (Sep. 2010).
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.svg
30. Summary: Web of Linked Data
A global, distributed database built on a simple set of
standards
RDF, URI, HTTP
Explicit semantics of content and links
Resources are connected by semantic links.
creating a single global data graph that spans data sources
enables the discovery of new data sources
Provides for data co-existence
Anyone can publish data to the Web of Linked Data
Data publishers are not constrained in choice of vocabularies with
which to represent data.
Designed for computers first, humans later
32. Europeana
European digital library: Europeana: This European Commission initiative
encompasses not only libraries but also museums, archives and other holders of cultural
heritage material.
http://version1.europeana.eu/web/europeana-project
33. Linked Library Cloud
Libraries have been producing metadata for ages.
Libraries (often) produce high-quality metadata.
Libraries develop and use many metadata standards, such as DC, SKOS, BIBO and OAI-ORE, as well as MARC 21, MODS, FRBR,…
Integrate library catalogues on a global scale.
http://code4lib.org/conference/2010/singer
34. Linking Open Drug Data
Linking the various sources of drug data together to answer interesting scientific and business questions.
Survey publicly available data sets about drugs
Publish and interlink these data sets on the Web
Explore interesting questions that could be answered if the data sets are linked.
8 million RDF triples, interlinked by more than 370,000 RDF links (as of August 2009)
35. BBC Semantic Project
Publish program / music data as RDF/XML or RDFa
Build semantically linked and annotated web pages about artists and
singers whose songs are played on BBC radio stations.
semantically interconnected
36. DBpedia Mobile
Show map with information about nearby locations
Linked data browser
GPS + Google Maps + DBpedia + Flickr + Revyu
37. Attention by Search Engines
Yahoo!
crawls Linked Data in its RDFa serialization, as well as Microformats
uses Yahoo SearchMonkey to make search results more useful and visually appealing
provides access to crawled data through the Yahoo BOSS API
Google
uses the Social Graph API
is developing Google Squared and Google Fusion Tables
acquired Metaweb, which manages Freebase, a DBpedia/YAGO competitor
Rich Snippets
40. 9 Steps to publishing Linked Data
Step 1: Understand the principles
Step 2: Set up your Infrastructure for Linked Data
Step 3: Understand your data
Step 4: Create Vocabularies
Step 5: Choose URIs for Things in your Data
Step 6: Triplify Data Sets
Step 7: Link to other Data Sets
Step 8: Describe your Data Sets
Step 9: Publicize your Data Sets
42. Linked Data: Overview
Benefits of Linked Data: it enables web-scale, distributed data publication with web-based discovery mechanisms.
Linked Data Web resources are generic real-world data objects or entities:
People, places, and other physical things
Abstract concepts (e.g., emotion, notion,…)
Subject matter (e.g., science, economics, arts,…)
Linked Data is not just structured data published on the Web.
Linked Data is based on well-established Web standards.
Linked Data adds value: less redundancy, greater discoverability, network effects.
43. Linked Data Principles (TimBL, 2006)
Use URIs as names for things
not just for documents
http://dbpedia.org/resource/ontology
you are not your homepage
http://mentalist.com/actor/patrick_jane
Use HTTP URIs
globally unique names, distributed ownership
allows people to look up those names
Provide useful information in RDF
when someone looks up a URI
Include RDF links to other URIs
to enable discovery of related information
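The principles above can be checked mechanically on a small scale. The following is a minimal sketch, not part of the original slides: the example triples and the helper names `is_http_uri` and `outgoing_links` are hypothetical, and triples are represented as plain Python tuples rather than with an RDF library.

```python
from urllib.parse import urlparse

# Hypothetical example data: (subject, predicate, object) triples.
TRIPLES = [
    ("http://www.example.com/id/alice",
     "http://xmlns.com/foaf/0.1/name",
     "Alice"),
    ("http://www.example.com/id/alice",
     "http://xmlns.com/foaf/0.1/based_near",
     "http://dbpedia.org/resource/Berlin"),
]


def is_http_uri(value):
    """True if value looks like an HTTP(S) URI (principles 1 and 2)."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def outgoing_links(triples, own_host):
    """Objects that are HTTP URIs on another host (principle 4)."""
    return [o for (_s, _p, o) in triples
            if is_http_uri(o) and urlparse(o).netloc != own_host]


if __name__ == "__main__":
    # Every subject should be an HTTP URI that people can look up.
    print(all(is_http_uri(s) for (s, _p, _o) in TRIPLES))
    # Links that lead out of our own data set.
    print(outgoing_links(TRIPLES, "www.example.com"))
```

Principle 3 (returning useful RDF when a URI is looked up) is a server-side concern and is not covered by this sketch.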
44. 5 Star rating
1 star: On the web, open licensed. Available on the web (whatever format), but with an open license
2 stars: Machine-readable data. Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
3 stars: Non-proprietary format (e.g. CSV instead of Excel)
4 stars: RDF standards. Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
5 stars: Linked RDF. Link your data to other people's data to provide context
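The rating can be read as a cumulative checklist. A minimal sketch follows, assuming that each star requires all the previous ones; the function name `star_rating` and its parameters are hypothetical and not taken from the slides.

```python
def star_rating(on_web_open_license, machine_readable, non_proprietary,
                uses_rdf_standards, linked_to_other_data):
    """Count stars for a data set.

    Assumption of this sketch: stars are cumulative, so counting stops
    at the first criterion that is not met.
    """
    criteria = [on_web_open_license, machine_readable, non_proprietary,
                uses_rdf_standards, linked_to_other_data]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


if __name__ == "__main__":
    # An openly licensed CSV file on the Web: three stars.
    print(star_rating(True, True, True, False, False))
```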
45. Linked Data Core Stack
http://linkeddata-specs.info/
RFC 2616: Hypertext Transfer Protocol, HTTP/1.1
• Defines HTTP, a generic and stateless application-level protocol for distributed, collaborative, hypermedia information systems.
RFC 3986: Uniform Resource Identifier (URI): Generic Syntax
• Defines a generic URI syntax and a process for resolving URI references that might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.
RDF Concepts and Abstract Syntax
• Defines the RDF graph data model and key concepts.
SPARQL Query Language for RDF
• Defines the syntax and semantics of the SPARQL query language for RDF.
46. Core Technology
Uniform Resource Identifier (URI)
Names (identifiers) for resources in an open Web environment
Resource Description Framework (RDF)
a model for representing metadata on the web
triple structure
RDF Schema and OWL
languages for defining vocabularies
RDF/XML, N3, Turtle,…
serialization and de-serialization of RDF triples for exchanging RDF
data
Simple Knowledge Organization System (SKOS)
a language for describing controlled vocabularies
SPARQL
a query language and protocol for accessing RDF data via the Web
47. Linked Data Modeling
Data Modeling: RDF data model to publish structured data on the Web
Data Linking: RDF links to interlink data from different data sources
RDF triple: subject, predicate, and object
Subject: URI identifying the described resource
Predicate: the relation that exists between subject and object; predicates come from vocabularies, collections of URIs that can be used to represent information about a certain domain
Object: a simple literal value, or the URI of another resource that is related to the subject
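The triple structure described above can be sketched in a few lines. This is an illustration only: the `Triple` type, the `to_ntriples` helper and the book URI under www.example.com are hypothetical, and the N-Triples output format is chosen here for its simplicity (the slides themselves mention RDF/XML, N3 and Turtle).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str      # URI identifying the described resource
    predicate: str    # URI taken from a vocabulary
    obj: str          # literal value, or URI of a related resource
    obj_is_uri: bool  # True if obj is a URI, False if it is a literal


def to_ntriples(t):
    """Serialize one triple as an N-Triples line.

    Simplification: only plain string literals without newlines
    are handled.
    """
    if t.obj_is_uri:
        obj = f"<{t.obj}>"
    else:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


if __name__ == "__main__":
    book = "http://www.example.com/book/1"  # hypothetical URI
    print(to_ntriples(Triple(book, "http://dbpedia.org/property/title",
                             "The Lord of the Rings", False)))
    print(to_ntriples(Triple(book, "http://www.w3.org/2004/02/skos/core#subject",
                             "http://www.example.com/subject/English_novels", True)))
```

The first triple has a literal object; the second has a URI object and is therefore a link that a client can follow.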
48. Linked Data Model
Flexible graph-based model: RDF graph
The HTTP protocol brings together identification and retrieval again.
URI: global primary key
skos:subject = http://www.w3.org/2004/02/skos/core#subject
dbp-prop:title = http://dbpedia.org/property/title
[Figure: RDF graph about the book http://.../isbn/46316. Node labels: "The Lord of the rings", "English novels", "J.R.R. Tolkien", dbpedia:Allen&Unwin, wkp-en:J.R.R.Tolkien, "London", fb:guid…..92df7, "83 Alexander St". Edge labels: dbp-prop:title, skos:subject, dbp-prop:author, dbp-prop:publisher, dbp-prop:name, foaf:homepage, opencyc:headquarter, dbp-prop:city, fb:creator, fb:street_address. Following the links leads deeper into the Web.]
50. Basic Infrastructure
[Figure: basic infrastructure for Linked Data, in layers]
Data/Content: packaging, extraction, conversion, link generation
RDF Triple Base: DB, triple store, index, SPARQL Query Engine
Interface: Framework + APIs
Delivery: Web Server (Apache)
Application: browser, navigator, search
Supported functions: search, discovery, navigation
51. Infrastructure Construction
Configuration of Web server
Configure the server to send the correct MIME type for RDF/XML: application/rdf+xml
Code samples for content negotiation (ConNeg) and 303 redirects:
http://linkeddata.org/tools
Use cURL (http://curl.haxx.se/) to check the Apache configuration
Configure for hash URI or Slash URI
Testing your content negotiation
Install the LiveHTTPHeaders and Modify Headers extensions for
Firefox
Try LiveHTTPHeaders against my URI
http://www.skyhigh.com/id/hong
do the same with URIs from other data sets
Modify your headers to ask for application/rdf+xml
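The header-based test above can also be scripted. A minimal sketch using only the Python standard library follows; the URI is a hypothetical placeholder, and the request is only built, not sent, so the example runs without network access.

```python
import urllib.request


def rdf_request(uri):
    """Build (but do not send) a GET request that asks for RDF/XML."""
    return urllib.request.Request(
        uri, headers={"Accept": "application/rdf+xml"})


if __name__ == "__main__":
    req = rdf_request("http://www.example.com/id/alice")  # hypothetical URI
    print(req.get_method(), req.full_url, req.get_header("Accept"))
    # To send it against a real server:
    #   with urllib.request.urlopen(req) as response:
    #       print(response.geturl(), response.headers["Content-Type"])
    # urlopen follows 303 redirects, so geturl() shows the document URI
    # that was finally served.
```

If content negotiation is configured as intended, the final URL should be the RDF document and the Content-Type should be application/rdf+xml.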
52. Supporting Technologies
Linked Data Browsers
provide for navigating between data sources and for exploring the dataspace.
Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE), OpenLink RDF
Browser (OpenLink, UK), Zitgist RDF Browser (Zitgist, USA), Disco
Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland)
Web of Data Search Engines
crawl the data space and provide best-effort query answers over crawled data.
Falcons (IWS, China), Sig.ma (DERI, Ireland), Swoogle (UMBC, USA),
VisiNav (DERI, Ireland), Watson (Open University, UK), TAP, Sindice
53. Supporting Technologies
Describing data set
discovery and usage of linked datasets
voiD, Ding
Registry
an open registry of data and content packages
CKAN
Linking tool
discovering relationships between data items within different Linked Data
sources
Silk
Mapping tool
mapping database to RDF triples
Triplify, D2R Server
LOD platform
D2R Server, Virtuoso Universal Server,
Talis Platform, Pubby, …
54. 3. Understand Data to be published
• Review of Data to be published
• Requirement analysis
55. Review of Data to be published
What
think about the key things to be presented in Linked Data
analysis of data properties
What vocabularies can be used to describe these?
Why
purposes and goals of linked data to be published
What for
how to use and apply linked data (use cases)
How to serve
Serving Linked Data as Static RDF/XML Files
Serving Linked Data as RDF Embedded in HTML Files
Serving RDF and HTML with Custom Server-Side Scripts
Serving Linked Data from Relational Databases
Serving Linked Data from RDF Triple Stores
Serving Linked Data by Wrapping Existing Application or Web APIs
57. Guideline for Vocabulary Creation
Do not define new vocabularies from scratch, but complement existing
vocabularies with additional terms (in your own namespace) to represent your
data as required.
Provide for both humans and machines. Use rdfs:comment for each term
invented. Always provide a label for each term using the rdfs:label property.
Make term URIs de-referenceable following the W3C Best Practice Recipes
for Publishing RDF Vocabularies.
Make use of other people's terms, or provide mappings to them, by means of
rdfs:subClassOf or rdfs:subPropertyOf.
State all important information explicitly. For example, state all ranges and
domains explicitly.
Do not create over-constrained, brittle models; leave some flexibility for
growth. Unless you know exactly what you are doing, do not use the full
expressivity of OWL; use RDF Schema to define vocabularies.
58. Potential Ontologies / Vocabularies
Friend-of-a-Friend (FOAF), vocabulary for describing people.
Dublin Core (DC) defines general metadata attributes. See also their new
domains and ranges draft.
Semantically-Interlinked Online Communities (SIOC), vocabulary for
representing online communities.
Description of a Project (DOAP), vocabulary for describing projects.
Simple Knowledge Organization System (SKOS), vocabulary for
representing taxonomies and loosely structured knowledge.
Music Ontology provides terms for describing artists, albums and tracks.
Review Vocabulary, vocabulary for representing reviews.
Creative Commons (CC), vocabulary for describing license terms
Geo, vocabulary for describing geographical locations
GoodRelations, vocabulary for describing products
60. Definition of Vocabulary
# Prefix declarations needed to parse this snippet as Turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Definition of the class "Lover"
<http://sites.movie.org/pub/LoveVocabulary#Lover>
rdf:type rdfs:Class ;
rdfs:label "Lover"@en ;
rdfs:label "Liebender"@de ;
rdfs:comment "A person who loves somebody."@en ;
rdfs:comment "Eine Person die Jemanden liebt."@de ;
rdfs:subClassOf foaf:Person .

# Definition of the property "loves"
<http://sites.movie.org/pub/LoveVocabulary#loves>
rdf:type rdf:Property ;
rdfs:label "loves"@en ;
rdfs:label "liebt"@de ;
rdfs:comment "Relation between a lover and a loved person."@en ;
rdfs:subPropertyOf foaf:knows ;
rdfs:domain <http://sites.movie.org/pub/LoveVocabulary#Lover> ;
rdfs:range foaf:Person .
61. Tools for Vocabulary Definition
Ontology editors
Protégé:
an open-source ontology editor with a dedicated OWL plug-in
Neologism:
Web-based tool for creating, managing and publishing simple RDFS
vocabularies.
open-source and implemented in PHP on top of the Drupal platform.
TopBraid Composer:
a powerful commercial modeling environment for developing Semantic
Web ontologies
NeOn Toolkit:
an open-source ontology engineering environment with an extensive set of
plug-ins.
62. 5. Choose URIs
• Resource Identification
• Types of URIs
• De-Referencing
• Common URI Patterns
63. Resource Identification
Separation of Identity and Representation
Identity
Identity (URI) of an Object or Entity should be unambiguous and globally unique
Representation
On the Web a URI should provide an unambiguous data access path
Access
Reference to abstract (physically inaccessible) Objects or Entities is only achievable via conduit documents that carry representations of entity descriptions (which at best are facets of an entire description)
URI Requirements:
Keep out of other people's namespaces
Use a namespace that you control
Abstract away from implementation details (Short is better…)
Stable and persistent
Hash or Slash
Use common URI patterns
64. URI
URI: Uniform Resource Identifier
[Figure: what does http://www.example.com/people/alice identify: a home page (Web document) or an information object?]
URI: identification of people, products, places, ideas and concepts such as
ontology classes, including URLs for Web documents
Two approaches: hash URI and slash URI
65. Hash / Slash URI
Hash URI
URIs can contain a fragment, a special part that is separated from the
rest of the URI by a hash symbol ("#").
examples:
http://www.example.com/products/BiBimBab#this
http://www.travel.com/nation/Korea/KyungJu#main
simply publish a description document containing RDF about the things
at the base URI (the URI without the fragment)
Slash URI
examples:
http://www.example.com/products/BiBimBab
http://www.travel.com/nation/Korea/KyungJu
must publish your description document at another, distinct URI.
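The difference between the two URI styles shows up as soon as a client dereferences them: the fragment is stripped before the HTTP request is made. A minimal sketch with the standard library; the helper names `document_uri` and `is_hash_uri` are hypothetical.

```python
from urllib.parse import urldefrag


def document_uri(thing_uri):
    """URI that an HTTP client actually requests for a given thing URI.

    The fragment part is never sent to the server, so a hash URI
    resolves to the document at its base URI.
    """
    url, _fragment = urldefrag(thing_uri)
    return url


def is_hash_uri(thing_uri):
    """True if the URI carries a non-empty fragment."""
    return urldefrag(thing_uri).fragment != ""


if __name__ == "__main__":
    for uri in ("http://www.example.com/products/BiBimBab#this",
                "http://www.example.com/products/BiBimBab"):
        print(uri, is_hash_uri(uri), document_uri(uri))
```

For a slash URI the requested URI and the thing URI are identical, which is why the server has to redirect to a separate description document.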
66. hash URI
Separating identification and naming from representation
Entity (GilDong), identified by:
http://www.skyhigh.com/person/GilDong#this
Document that describes the entity:
http://www.skyhigh.com/person/GilDong
Metadata: content-type: application/xhtml+xml
Data: <html xmlns=".. <head> <title> Our hero… </html>
67. slash URI
Separating identification and naming from representation
Entity (GilDong), identified by:
http://www.skyhigh.com/person/hero/GilDong/id
HTML document that describes the entity:
http://www.skyhigh.com/person/hero/GilDong/page
Metadata: content-type: application/xhtml+xml
Data: <html xmlns=".. <head> <title> Our hero… </html>
RDF document that describes the entity:
http://www.skyhigh.com/person/hero/GilDong/data
Metadata: content-type: application/rdf+xml
68. Slash vs. Hash
Slash URI
HTTP redirection (30X response) is required in order for resource "Identity" to be separated from "representation":
http://www.skyhigh.com/person/hero/GilDong/id (URI of the Entity)
http://www.skyhigh.com/person/hero/GilDong/page (HTML representation of the Entity description)
http://www.skyhigh.com/person/hero/GilDong/data (RDF representation that describes the Entity, which could use a Turtle, N3, RDF/XML, etc. based data serialization)
Hash URI
HTTP redirection isn't required in order for resource "Identity" to be separated from "representation":
http://demo.openlinksw.com/Northwind/Customer/ALFKI#this (URI of an Organization Entity)
http://demo.openlinksw.com/Northwind/Customer/ALFKI (a document: HTML, Turtle, N3 or RDF/XML representation of the Entity description)
69. DeReferencing Hash URI
Without content negotiation
http://www.example.com/about#alice (ID)
automatic truncation of fragment
http://www.example.com/about (RDF)
With content negotiation
http://www.example.com/about#alice (ID)
automatic truncation of fragment
http://www.example.com/about (content negotiation)
application/rdf+xml wins: http://www.example.com/about.rdf (RDF)
text/html wins: http://www.example.com/about.html (HTML)
70. DeReferencing Slash URI
One generic document
http://www.example.com/id/alice (ID)
303 redirected to http://www.example.com/doc/alice (generic document)
content negotiation
application/rdf+xml wins: http://www.example.com/doc/alice.rdf (RDF)
text/html wins: http://www.example.com/doc/alice.html (HTML)
Different documents
http://www.example.com/id/alice (ID)
303 redirected with content negotiation
application/rdf+xml wins: http://www.example.com/doc/alice.rdf (RDF)
text/html wins: http://www.example.com/doc/alice.html (HTML)
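The "different documents" flow for slash URIs can be sketched as a pure function from request to response. This is a simplified illustration, not production server code: the function names and the /id/ and /doc/ path layout follow the example URIs above, and the Accept parsing ignores wildcards such as */*.

```python
def preferred_type(accept_header):
    """Pick application/rdf+xml or text/html by highest q-value.

    Simplification: wildcards are ignored, and text/html is the
    default when neither type is listed.
    """
    best, best_q = "text/html", -1.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0], 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in ("application/rdf+xml", "text/html") and q > best_q:
            best, best_q = media_type, q
    return best


def dereference(path, accept_header):
    """Return (status, location) for a thing URI under /id/."""
    if not path.startswith("/id/"):
        return 404, None
    name = path[len("/id/"):]
    if preferred_type(accept_header) == "application/rdf+xml":
        return 303, f"/doc/{name}.rdf"
    return 303, f"/doc/{name}.html"


if __name__ == "__main__":
    print(dereference("/id/alice", "application/rdf+xml"))
    print(dereference("/id/alice", "text/html,application/xhtml+xml;q=0.9"))
```

In a real deployment the same decision is usually expressed in the Web server configuration rather than in application code.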
73. Common URI Pattern
http://dbpedia.org/resource/New_York_City Thing
http://dbpedia.org/data/New_York_City RDF data
http://dbpedia.org/page/New_York_City HTML page
http://revyu.com/people/tom Thing
http://revyu.com/people/tom/about/rdf RDF data
http://revyu.com/people/tom/about/html HTML page
http://www.bbc.co.uk/music/artists/db4624cf#artist Thing
http://www.bbc.co.uk/music/artists/db4624cf.rdf RDF data
http://www.bbc.co.uk/music/artists/db4624cf.html HTML page
http://id.dbpedia.org/Berlin Thing
http://data.dbpedia.org/Berlin RDF Data
http://page.dbpedia.org/Berlin HTML page
http://www4.wiwiss.fu-berlin.de/bookmashup/books/006251587X ISBN
74. Choosing URI
http://www.culture.com/LOD/{class}/{member}
http://www.culture.com/LOD/{class}/{member}.rdf
http://www.culture.com/LOD/{class}/{member}.html
Examples:
URI of an Organization Entity
http://demo.openlinksw.com/Northwind/Customer/ALFKI/id
HTML representation of Entity description
http://demo.openlinksw.com/Northwind/Customer/ALFKI/page
RDF representation that describes the Entity, which could use a Turtle, N3, RDF/XML, etc. based data serialization
http://demo.openlinksw.com/Northwind/Customer/ALFKI/data
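The {class}/{member} pattern above can be applied with a small helper. A minimal sketch: the function name `mint_uris` and the example class and member values are hypothetical, while the base URI is taken from the pattern on the slide.

```python
from urllib.parse import quote

BASE = "http://www.culture.com/LOD"  # base taken from the pattern above


def mint_uris(cls, member):
    """Return the thing URI plus its RDF and HTML document URIs."""
    # Percent-encode both segments so that spaces and slashes in names
    # cannot break the path structure.
    thing = f"{BASE}/{quote(cls, safe='')}/{quote(member, safe='')}"
    return {"thing": thing, "rdf": thing + ".rdf", "html": thing + ".html"}


if __name__ == "__main__":
    for kind, uri in mint_uris("museum", "Gyeongju National Museum").items():
        print(kind, uri)
```

Deriving all three URIs from one function keeps the pattern consistent across the data set, which helps keep the URIs stable.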
75. 6. Triplify Data Sets
• Publication Strategies
• Conversion of Database
76. Linked Data Publication
Types of data: Structured Data; Text
Data preparation: RDF-izers for CSV, XML, Excel; Entity Extractor (e.g. Calais)
Data storage: Relational Database; Data Source with API; RDF Store; RDF files
Data publication: RDB-to-RDF Wrapper (e.g. D2R); Custom Wrapper; Linked Data Interface (e.g. Pubby); Web Server (e.g. Apache); CMS with RDFa Output (e.g. Drupal)
Result: Linked Data on the Web
77. Publication Strategy
Strategy
From unstructured sources
use NLP, text mining, annotation,…
OpenCalais, Ontos
From semi-structured sources
DBpedia, LinkedGeoData, SCOVO,…
efficient bi-directional synchronization
From structured sources (relational database)
Declarative syntax and semantics of data model translation
RDB2RDF,…
78. Conversion of Database
Schema
Books (ID, Author, Title, Publisher, Year)
Authors (ID, Name, Homepage)
Publishers (ID, PublisherName, City)

Books
ID                | Author | Title            | Publisher | Year
ISBN0-00-651409-X | id_xyz | The Glass Palace | id_qpr    | 2000

Authors
ID     | Name          | Homepage
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

Publishers
ID     | PublisherName  | City
id_qpr | Harper Collins | London
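The tables above map naturally onto triples: each row becomes a subject, each column a predicate, and each foreign key a link to another resource. The following is a hand-written sketch of that idea, assuming a hypothetical namespace and vocabulary under www.example.com; mapping tools such as D2R Server express the same mapping declaratively.

```python
BASE = "http://www.example.com/"         # hypothetical namespace
VOCAB = "http://www.example.com/vocab#"  # hypothetical vocabulary

# The Books row from the table above.
BOOKS = [{"ID": "ISBN0-00-651409-X", "Author": "id_xyz",
          "Title": "The Glass Palace", "Publisher": "id_qpr",
          "Year": "2000"}]

# Columns that hold foreign keys, mapped to the table they point to.
FOREIGN_KEYS = {"Author": "authors", "Publisher": "publishers"}


def row_to_triples(table, row, foreign_keys):
    """Turn one row into (subject, predicate, object, object_is_uri) tuples."""
    subject = f"{BASE}{table}/{row['ID']}"
    triples = []
    for column, value in row.items():
        if column == "ID":
            continue  # the primary key is already part of the subject URI
        predicate = f"{VOCAB}{column}"
        if column in foreign_keys:
            # A foreign key becomes a link to another resource.
            target = f"{BASE}{foreign_keys[column]}/{value}"
            triples.append((subject, predicate, target, True))
        else:
            triples.append((subject, predicate, value, False))
    return triples


if __name__ == "__main__":
    for triple in row_to_triples("books", BOOKS[0], FOREIGN_KEYS):
        print(triple)
```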
79. Conversion of Database
Tools for mapping RDB to Linked Data
D2R Server for customizable mappings from relational databases to ontologies
[Bizer, Cyganiak 06]
Browser-based tools for defining RDB-to-RDF mappings
[Zhou, Xu, Chen, Idehen 08]
Triplify [Auer, Dietzold, Lehmann, Hellmann, Aumueller 09]
OpenLink Data Spaces [Idehen, Erling 08]
80. RDF Features Best Avoided
Do not use the full expressivity of the RDF data model; use a subset of the RDF features.
No blank nodes: it is impossible to set external RDF links to a blank node.
Do not use RDF reification: its semantics are unclear and it is cumbersome to query with the SPARQL query language.
Metadata can be attached to the information resource instead.
Be careful before using RDF collections or RDF containers: they do not work well together with SPARQL.
81. 7. Link to other Data sets
• Types of Linking
• Linking manually
• Automatic generation of Links
82. Link ! Reuse !!
Reuse. Do not reinvent the wheel…
The URIs are de-referenceable.
For instance, using the DBpedia URI http://dbpedia.org/page/Doom to
identify the computer game Doom gives you an extensive description of
the game including abstracts in 10 different languages and various
classifications.
The URIs are already linked to URIs from other data sources.
For instance, you can navigate from the DBpedia URI
http://dbpedia.org/resource/Innsbruck to data about Innsbruck provided by
Geonames and EuroStat.
Therefore, by using concept URIs from these datasets, you interlink your
data with a rich and fast-growing network of other data sources.
83. Types of Linking to other Data Sets
Relationship Links
point at related things in other data sources, for instance, other people, places or
genes.
<http://www.skyhigh.com/people/GilDong>
rdf:type foaf:Person ;
foaf:name "Hong, Gil-Dong" ;
foaf:based_near <http://dbpedia.org/resource/Seoul> ;
foaf:topic_interest <http://dbpedia.org/resource/Justice> ;
foaf:knows <http://dbpedia.org/resource/HalBingDang> .
Identity Links
point at URI aliases used by other data sources to identify the same real-world
object or abstract concept.
<http://www.skyhigh.com/people/GilDong> <http://www.w3.org/2002/07/owl#sameAs>
<http://www.korea.org/history/hero> .
Vocabulary Links
point to the definitions of related terms in other vocabularies
<http://www.university.org/terms/professor>
rdf:type rdfs:Class ;
rdfs:subClassOf <http://dbpedia.org/ontology/Person> ;
rdfs:subClassOf <http://sw.opencyc.org/concept/Mx4rvbGdrcN5Y29ycA> ;
owl:equivalentClass <http://rdf.dictionary.com/entry/facultyMember> .
84. Link to other Data Sets
URI aliases
In an open environment like the Web it often happens that different
information providers talk about the same non-information resource. As
they do not know about each other, they introduce different URIs for
identifying the same real-world object.
http://dbpedia.org/resource/Berlin
http://sws.geonames.org/2950159/
URI aliases provide an important social function to the Web of Data as they
are de-referenced to different descriptions of the same non-information
resource and thus allow different views and opinions to be expressed.
owl:sameAs
Common Properties
rdfs:seeAlso, foaf:knows, foaf:based_near, foaf:topic_interest,…
Two approaches for linking data:
RDF Links Manually
Auto-generating RDF Links
85. RDF Links Manually
Find suitable data sets as linking targets, then manually search in
these for the URI references you want to link to.
If a data source doesn't provide a search interface, you can use Linked
Data browsers like Tabulator or Disco to explore the dataset and find
the right URIs.
Useful sites:
Sindice and Falcons provide indexes to identify candidate URIs for linking.
CKAN site : a registry of open linked data and projects.
Uriqr - A URI Search Engine: http://dev.uriqr.com/
Freebase: http://www.freebase.com
MOAT: Meaning Of A Tag Framework
For manually interlinking tags with Semantic Web URIs (such as URIs from
DBpedia, Geonames … or any knowledge base)
Remember that data sources might use HTTP-303 redirects to redirect
clients from URIs identifying non-information resources to URIs
identifying information resources that describe the non-information
resources.
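The 303 behaviour described above can be observed with a few lines of standard-library Python. This is a sketch under assumptions: the names NoRedirect, probe, and classify are hypothetical, the classification ignores hash URIs, and probe needs network access, so it is not called here.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 303 itself stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def probe(uri, accept="application/rdf+xml"):
    """Dereference uri once; return (status code, Location header or None)."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(uri, headers={"Accept": accept})
    try:
        with opener.open(request, timeout=10) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def classify(status, location):
    """Interpret the result of probe() for a candidate link target."""
    if status == 303 and location:
        return f"non-information resource, described by {location}"
    if status == 200:
        return "information resource (a document)"
    return f"unclassified (HTTP {status})"


# Example use (requires network access):
# status, location = probe("http://dbpedia.org/resource/Berlin")
# print(classify(status, location))
```

When a URI answers with 303, the URI you requested is the one that identifies the real-world object, and the Location target is only the document describing it.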
86. Auto-generating RDF Links
Various approaches
Pattern-based Algorithms
Similarity-based Approaches
Complex property-based Algorithms
Yves Equivalence Miner: interlinking Jamendo and Musicbrainz.
Equivalence Mining and Matching Frameworks
Silk - A Link Discovery Framework for the Web of Data.
Silk can be run on a single machine or on a Hadoop cluster (for instance
Amazon EC2).
LIMES - Link Discovery Framework for Metric Spaces.
time-efficient and lossless approaches for large-scale link discovery based on
the characteristics of metric spaces.
DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
TopBraid Composer
a wizard for linking ontology instances to corresponding DBpedia concepts.
SemMF
a flexible framework for calculating semantic similarity between objects that
are represented as arbitrary RDF graphs.
http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining
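As an illustration of the similarity-based approach, the sketch below compares labels from two data sets and proposes link candidates above a threshold. It is not how Silk, LIMES, or any tool listed here works internally; the function names, the 0.9 threshold, and the example.org URIs are assumptions made for this example, and the string measure is simply Python's difflib.

```python
from difflib import SequenceMatcher


def normalise(label):
    """Lower-case a label and collapse whitespace before comparing."""
    return " ".join(label.lower().split())


def similarity(a, b):
    """String similarity between two labels, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def candidate_links(source, target, threshold=0.9):
    """source and target map URI -> label.

    For each source URI, keep the best-matching target URI if its
    similarity score reaches the threshold.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((s_uri, best_uri, best_score))
    return links


if __name__ == "__main__":
    # Hypothetical data sets, not taken from the slides.
    source = {"http://example.org/a/berlin": "Berlin"}
    target = {"http://example.org/b/1": "berlin ",
              "http://example.org/b/2": "Paris"}
    for s_uri, t_uri, score in candidate_links(source, target):
        print(s_uri, "owl:sameAs", t_uri, round(score, 2))
```

Candidates produced this way are usually reviewed or filtered further before being published as owl:sameAs links, since label similarity alone can match different things that share a name.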
87. 8. Describe Data Sets
• Metadata for Description
88. Publishing Descriptions of a Data set
Help others discover and index your data
Apply a license or waiver to your data set
Metadata about the published linked data set
authorship of the data set, its currency (i.e., how recently the data set was updated),
its provenance, and its licensing terms
Important issues:
Provenance:
the ability to track the origin of data
key component in building trustworthy, reliable applications
Open Provenance Model
Licenses vs. Waivers
Norms: a means for data publishers who waive their legal rights (through
application of a waiver) to state the expectations they have about how the data is used
Two primary mechanisms
Semantic Sitemaps: http://sw.deri.org/2007/07/sitemapextension/
voiD : http://semanticweb.org/wiki/VoiD
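A data set description in voiD is itself a small RDF document. The sketch below generates one in Turtle using terms from the voiD and Dublin Core Terms vocabularies (void:Dataset, void:sparqlEndpoint, void:dataDump, dcterms:title, dcterms:license). The function name void_description and all example.org URIs, including the license URI, are hypothetical.

```python
def void_description(dataset_uri, title, license_uri,
                     sparql_endpoint=None, data_dump=None):
    """Return a Turtle document describing a data set with voiD terms."""
    # Escape backslashes and quotes so the title is a valid Turtle string.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    properties = [
        "a void:Dataset",
        f'dcterms:title "{safe_title}"',
        f"dcterms:license <{license_uri}>",
    ]
    if sparql_endpoint:
        properties.append(f"void:sparqlEndpoint <{sparql_endpoint}>")
    if data_dump:
        properties.append(f"void:dataDump <{data_dump}>")
    body = " ;\n    ".join(properties)
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n\n"
        f"<{dataset_uri}>\n    {body} .\n"
    )


if __name__ == "__main__":
    print(void_description(
        "http://example.org/dataset",            # hypothetical data set URI
        "Example data set",
        "http://example.org/licenses/open",      # hypothetical license URI
        sparql_endpoint="http://example.org/sparql",
    ))
```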
89. Description
Metadata: Metadata about the published data, such as a URI identifying the author
and licensing information.
Description: All triples from the data set that have the resource's URI as the subject.
Backlinks: All triples from the data set that have the resource's URI as the object.
This is redundant, but it allows browsers and crawlers to traverse links
in either direction.
Related descriptions: Any additional information about related resources, e.g., answering
a request for information about a book with information about its author as well.
Use a moderate approach and do not overload the response.
Syntax: There are various ways to serialize RDF descriptions.
At least provide RDF descriptions as RDF/XML, which is the only
official syntax for RDF.
Additionally provide Turtle, TriX, and other serializations.
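When several syntaxes are offered, the server has to pick one per request. The sketch below selects between RDF/XML and Turtle from an HTTP Accept header, falling back to RDF/XML. It is a simplified illustration: the function name choose_serialization is hypothetical, and wildcard media types such as */* are deliberately not handled.

```python
# Media types this hypothetical server can produce.
SUPPORTED = {
    "application/rdf+xml": "RDF/XML",
    "text/turtle": "Turtle",
}
DEFAULT = "application/rdf+xml"


def choose_serialization(accept_header):
    """Return the supported media type with the highest q-value, else DEFAULT."""
    best_type, best_q = DEFAULT, 0.0
    for part in accept_header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # q defaults to 1.0 when not given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in SUPPORTED and q > best_q:
            best_type, best_q = media_type, q
    return best_type


if __name__ == "__main__":
    header = "text/turtle;q=0.9, application/rdf+xml;q=0.5"
    chosen = choose_serialization(header)
    print(chosen, "->", SUPPORTED[chosen])
```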
90. Data Set Description: Example
# Metadata and Licensing Information
<http://dbpedia.org/data/Alec_Empire>
rdfs:label "RDF description of Alec Empire" ;
rdf:type foaf:Document ;
dc:publisher <http://dbpedia.org/resource/DBpedia> ;
dc:date "2007-07-13"^^xsd:date ;
dc:rights <http://en.wikipedia.org/wiki/WP:GFDL> .
# The description
<http://dbpedia.org/resource/Alec_Empire>
foaf:name "Empire, Alec" ;
rdf:type foaf:Person ;
rdf:type <http://dbpedia.org/class/yago/musician> ;
rdfs:comment
"Alec Empire (born May 2, 1972) is a German musician who is ..."@en ;
rdfs:comment
"Alec Empire (eigentlich Alexander Wilke) ist ein deutscher Musiker. ..."@de ;
dbpedia:genre <http://dbpedia.org/resource/Techno> ;
dbpedia:associatedActs <http://dbpedia.org/resource/Atari_Teenage_Riot> ;
foaf:page <http://en.wikipedia.org/wiki/Alec_Empire> ;
foaf:page <http://dbpedia.org/page/Alec_Empire> ;
rdfs:isDefinedBy <http://dbpedia.org/data/Alec_Empire> ;
owl:sameAs <http://zitgist.com/music/artist/d71ba53b-23b0-4870-a429-cce6f345763b> .
91. Data Set Description: Example
# Backlinks
<http://dbpedia.org/resource/60_Second_Wipeout>
dbpedia:producer <http://dbpedia.org/resource/Alec_Empire> .
<http://dbpedia.org/resource/Limited_Editions_1990-1994>
dbpedia:artist <http://dbpedia.org/resource/Alec_Empire> .
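The split between "description" and "backlinks" in the example above comes down to whether the resource's URI appears as the subject or as the object of a triple. A minimal sketch, assuming triples are held as plain (subject, predicate, object) tuples; the function name describe is hypothetical and the prefixed property names are kept as written on the slides.

```python
def describe(resource, triples):
    """Split triples into the resource's description and its backlinks.

    description: triples with the resource as subject
    backlinks:   triples with the resource as object
    """
    description = [t for t in triples if t[0] == resource]
    backlinks = [t for t in triples if t[2] == resource and t[0] != resource]
    return description, backlinks


if __name__ == "__main__":
    alec = "http://dbpedia.org/resource/Alec_Empire"
    triples = [
        (alec, "dbpedia:genre", "http://dbpedia.org/resource/Techno"),
        ("http://dbpedia.org/resource/60_Second_Wipeout", "dbpedia:producer", alec),
        ("http://dbpedia.org/resource/Limited_Editions_1990-1994", "dbpedia:artist", alec),
    ]
    description, backlinks = describe(alec, triples)
    print(len(description), "description triple(s)")
    print(len(backlinks), "backlink triple(s)")
```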
92. 9. Publish Data Sets
• Serialization
• Linked Data Storage
• Test and Debugging
93. Publishing Linked Data
Serialization of Data
Publication Method | Advantages | Disadvantages
RDF/XML Document | Oldest, best supported | Confusingly like normal XML
Turtle (N3) Document | Simplest | Not technically a standard yet
HTML Document with RDFa | Fits inside HTML, but also RDF | Can get very complicated
JSON | Normal JSON, but also RDF | Promising, but still being developed
GRDDL | Use the XML you have/want | Needs to download+run XSLT
SPARQL Query Protocol Query Protocol
RDF files shouldn't be larger than, say, a few hundred kilobytes. Break
larger data sets up into several RDF files.
Make sure multiple RDF files are linked to each other through RDF
triples.
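One way to follow both recommendations is to split an N-Triples dump by size and then connect consecutive files with a link triple. The sketch below is an assumption-laden illustration: the function names, the part0.nt naming scheme, the 200,000-byte default, and the choice of rdfs:seeAlso as the linking property are all choices made for this example, not prescribed by the slides.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def split_triples(lines, max_bytes=200_000):
    """Group N-Triples lines into chunks of at most max_bytes each."""
    chunks, current, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if current and size + line_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        chunks.append(current)
    return chunks


def link_chunks(chunks, base_uri):
    """Give each chunk a document URI and link it to the next chunk.

    The added link triple makes a file slightly larger than max_bytes
    in the worst case.
    """
    linked = []
    for i, chunk in enumerate(chunks):
        doc_uri = f"{base_uri}part{i}.nt"
        out = list(chunk)
        if i + 1 < len(chunks):
            next_uri = f"{base_uri}part{i + 1}.nt"
            out.append(f"<{doc_uri}> <{RDFS_SEE_ALSO}> <{next_uri}> .")
        linked.append((doc_uri, out))
    return linked


if __name__ == "__main__":
    lines = ["<a> <b> <c> ."] * 5  # placeholder triples
    for doc_uri, content in link_chunks(split_triples(lines, max_bytes=30),
                                        "http://example.org/data/"):
        print(doc_uri, len(content), "line(s)")
```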
97. Linked Data Storage
RDB to RDF Middleware
D2R Server
Native RDF Storage (manage it yourself)
4Store
AllegroGraph
Bigdata
BigOWLIM
Jena TDB
Neo4j
Sesame
Virtuoso
Native RDF Storage (managed)
Talis Platform
Pubby
Linked Data front-end for SPARQL Endpoints
Paget Framework
98. Testing and Debugging Linked Data
Test published data to ensure it adheres to the Linked Data principles and best
practices.
Checking that URIs dereference correctly
Vapour Linked Data Validator at http://idi.fundacionctic.org/vapour
RDF:Alerts at http://swse.deri.org/RDFAlerts/
Sindice Inspector at http://inspector.sindice.com/
manual validation and debugging of Linked Data
cURL, Firefox browser extensions LiveHTTPHeaders and
ModifyHeaders
Linked Data browsers can be used for technical debugging and validation.
Tabulator, Marbles, LOD Browser Switch
99. Summary: Linked Data
Semantic Technologies need to go where the data is!
Long Live Semantic Technology!
Early adoption of Semantic Technology is king!
Growth in data volumes is very rapid.
Link, Integrate, Reuse
Linked Data is a truly Web-friendly way of publishing data.
Linked Data is the common global data space.
Gun for killer apps of semantic technology…
Catalyst and enabler to make semantic technology real…
Unlimited opportunities ahead…
100. References
Keith Alexander, Richard Cyganiak, Michael Hausenblas, and Jun Zhao, Describing linked datasets, In
Proceedings of the WWW2009 Workshop on Linked Data on the Web, 2009.
Tim Berners-Lee, Linked Data - Design Issues, 2006, http://www.w3.org/DesignIssues/LinkedData.html.
Tim Berners-Lee, Giant global graph, http://dig.csail.mit.edu/breadcrumbs/node/215, 2007.
Christian Bizer, Tom Heath, and Tim Berners-Lee, Linked data - the story so far, Int. J. Semantic Web Inf.
Syst., 5(3):1–22, 2009.
Chris Bizer, Richard Cyganiak, and Tom Heath, How to Publish Linked Data on the Web,
http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
W3C Working Draft, Cool URIs for the Semantic Web,
http://www.w3.org/TR/2008/WD-cooluris-20080321/
http://data.gov.uk/linked-data
http://www.w3.org/2001/sw/Specs.html
Auer, S., Dietzold, S., Lehmann, J., Hellmann, S., and Aumueller, D. (2009). Triplify: lightweight linked
data publication from relational databases. In Proceedings of the 18th International Conference on World
Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009.
A Survey of current approaches for mapping of relational databases to RDF:
http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt
Miles et al.: Best Practices Recipes for Publishing RDF Vocabularies, Available at:
http://www.w3.org/TR/swbp-vocab-pub/
101. Semantic Technology
Your World, Your Way
skhan@wku.ac.kr