SlideShare ist ein Scribd-Unternehmen logo
1 von 66
Linked Data -
Evolving the Web into a Global Data Space
Anja Jentzsch - @anjeve
Hasso-Plattner-Institute, Potsdam, Germany
1st Linked Data Meetup @ Foo Café, Malmö
2103/05/23
Outline
1. Foundations of Linked Data
• What is the vision and goal?
2. The Web of Linked Data
• What data is out there?
• What is being done with the data?
3. How to publish and consume Linked Data?
Architecture of the classic Web
B C
HTMLHTML
Web
Browsers
Search
Engines
hyper-
links
A
HTML
• Single global document space
• Small set of simple standards
• HTML as document format
• HTTP URLs as globally unique IDs
• Retrieval mechanism: Hyperlinks to connect
everything
Web 2.0 APIs and Mashups
No single global dataspace
Shortcomings
1. APIs have proprietary interfaces
2. Mashups are based on a fixed set of
data sources
3. No hyperlinks between data items
within different APIs
Web
API
A
Mashup
Web
API
B
Web
API
C
Web
API
D
Web APIs slice the Web into Walled Gardens
Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY
Extend the Web with a single global dataspace
1. by using RDF to publish structured data on the Web
2. by setting links between data items within different data sources
Linked Data
B C
RDF
RDF
Links
A D E
RDF
Links
RDF
Links
RDF
Links
RDF
RDF
RDF
RDF
RDF RDF
RDF
RDF
RDF
Linked Data Principles
Set of best practices for publishing structured data on the Web in
accordance with the general architecture of the Web.
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover
related things.
Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006
The RDF Data Model
Anja Jentzsch
dbpedia:Berlin
foaf:name
foaf:based_near
foaf:Person
rdf:type
ns:anja
Data Items are identified with HTTP URIs
ns:anja = http://www.anjeve.de#anja
dbpedia:Berlin = http://dbpedia.org/resource/Berlin
foaf:name
foaf:based_near
foaf:Person
rdf:type
ns:anja
dbpedia:Berlin
Anja Jentzsch
Resolving URIs over the Web
dp:Cities_in_Germany
3.499.879
dp:population
skos:subject
dbpedia:Berlin
foaf:name
foaf:based_near
foaf:Person
rdf:type
ns:anja
Anja Jentzsch
Dereferencing URIs over the Web
dbpedia:Hamburg
dbpedia:Muenchen
skos:subject
skos:subject
dp:Cities_in_Germany
3.499.879
dp:population
skos:subject
dbpedia:Berlin
foaf:name
foaf:based_near
foaf:Person
rdf:type
ns:anja
Anja Jentzsch
RDF Representation Formats
• RDF/XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<foaf:Person rdf:about="http://anjeve.de#anja">
<foaf:name>Anja Jentzsch</foaf:name>
</foaf:Person>
• RDF N-Triples
<http://anjeve.de#anja> <http://www.w3.org/1999/02/22-rdf-syntax-
ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://anjeve.de#anja> <http://xmlns.com/foaf/0.1/name> „Anja
Jentzsch“ .
• RDF N3
RDF Representation Formats
<http://anjeve.de#anja> <http://www.w3.org/1999/02/22-
rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/
Person> .
• Subject Predicate Object
• In the end it‘s all triples!
foaf:Person
rdf:type
ns:anja
Properties of the Web of Linked Data
• Global, distributed dataspace build on a simple set of standards
• RDF, URIs, HTTP
• Entities are connected by links
• creating a global data graph that spans data sources and
• enables the discovery of new data sources
• Provides for data-coexistence
• Everyone can publish data to the Web of Linked Data
• Everyone can express their personal view on things
• Everybody can use the vocabularies/schemas that they like
W3C Linking Open Data Project
• Grassroots community effort to
• publish existing open license datasets as Linked Data on the Web
• interlink things between different data sources
LOD Data Sets on the Web: May 2007
• 12 data sets
• Over 500 million RDF triples
• Around 120,000 RDF links between data sources
LOD Data Sets on the Web: November 2007
• 28 data sets
LOD Data Sets on the Web: September 2008
• 45 data sets
• Over 2 billion RDF triples
LOD Data Sets on the Web: July 2009
• 95 data sets
• Over 6.5 billion RDF triples
LOD Data Sets on the Web: September 2010
• 203 data sets
• Over 24,7 billion RDF triples
• Over 436 million RDF links between data sources
LOD Data Sets on the Web: September 2011
• 295 data sets
• Over 31 billion RDF triples
• Over 504 million RDF links between data sources http://lod-cloud.net
The Growth in Numbers
0
35000000000
2007 2008 2009 2010 2011
Year Data Sets Triples Growth
2007 12 500,000,000
2008 45 2,000,000,000 300%
2009 95 6,726,000,000 236%
2010 203 26,930,509,703 300%
2011 295 31,634,213,770 33%
LOD Data Set statistics as of 09/2011
LOD Cloud Data Catalog on the Data Hub
• http://datahub.io/group/lodcloud
More statistics
• http://lod-cloud.net/state/
DBpedia –The Hub
on the Web of Data
• DBpedia is a joint project with the following goals
• extracting structured information from Wikipedia
• publish this information under an open license on the Web
• setting links to other data sources
• Partners
• Freie Universität Berlin (Germany)
• Universität Leipzig (Germany)
• OpenLink Software (UK)
• neofonie (Germany)
Extracting structured data from Wikipedia
Extracting structured data from Wikipedia
dbpedia:Berlin rdf:type dbpedia-owl:City ,
dbpedia-owl:PopulatedPlace ,
dbpedia-owl:Place ;
rdfs:label "Berlin"@en , "Berlino"@it ;
dbpedia-owl:population 3499879 ;
wgs84:lat 52.500557 ;
wgs84:long 13.398889 .
!
dbpedia:SoundCloud dbpedia-owl:location dbpedia:Berlin .
• access to DBpedia data:
• dumps
• SPARQL endpoint
• Linked Data interface
The DBpedia Data Set
• Information on more than 3.77 million “things”
• 764,000 persons
• 192,000 organisations
• 573,000 places
• 112,000 music albums
• 72,000 movies
• 202,000 species
• overall more than 1 billion RDF triples
• title and abstract in 111 different languages
• 8,000,000 links to images
• 24,400,000 links to external web pages
• 27,200,000 links to other Linked Data sets
DBpedia Mappings
• since March 2010 collaborative editing of
• DBpedia ontology
• mappings from Wikipedia infoboxes and tables to DBpedia ontology
• curated in a public wiki with instant validation methods
• http://mappings.dbpedia.org
• multi-langual mappings to the DBpedia ontology:
• ar, bg, bn, ca, cs, de, el, en, es, et, eu, fr, ga, hi, hr, hu, it, ja, ko, nl, pl, pt, ru, sl,
tr
• allows for a significant increase of the extracted data’s quality
• each domain has its experts
DBpedia Use Cases
1. Improvement of Wikipedia search
2. Data source for applications and mashups
3. Text analysis and annotation
4. Hub for the growing Web of Data
DBpedia Mobile
• displays Wikipedia data on map
• aggregates different data sources
Faceted Wikipedia Search
• faceted browsing and free text search
http://spotlight.dbpedia.org
Uptake in the Government Domain
• The EU is pushing Linked Data (LOD2, LATC, EuroStat)
• W3C eGovernment Interest Group
Uptake in the Libraries Community
• Institutions publishing Linked Data
• Library of Congress (subject headings)
• German National Library (PND dataset and subject headings)
• Swedish National Library (Libris - catalog)
• Hungarian National Library (OPAC and Digital Library)
• Europeana project just released data about 4 million artifacts
• W3C Library Linked Data Incubator Group
• OKFN Working Group on Bibliographic Data
• Goals:
• Integrate library catalogs on global scale
• Interconnect resources between repositories (by topic, by location, by
historical period, by ...)
Uptake in Life Sciences
• W3C Linking Open Drug Data Effort
• Bio2RDF Project
• Allen Brain Atlas
• Goal: Smoothly integrate internal and external data in a pay-as-you-go-fashion.
Uptake in the Media Industry
• Publish data as RDF/XML or RDFa
• Goal: Drive traffic to websites via
search engines
Linked Data Applications
Linked Data Browsers
• Tabulator Browser (MIT, USA)
• Marbles (FU Berlin, DE)
• OpenLink RDF Browser (OpenLink, UK)
• Zitgist RDF Browser (Zitgist, USA)
• Humboldt (HP Labs, UK)
• Disco Hyperdata Browser (FU Berlin, DE)
• Fenfire (DERI, Irland)
Web of Data Search Engines
• Sig.ma (DERI, Ireland)
• Falcons (IWS, China)
• Swoogle (UMBC, USA)
• VisiNav (DERI, Ireland)
• Watson (Open University, UK)
schema.org
• jointly proposed vocabularies for embedding data into HTML pages (Microdata)
• available since June 2011
• What does Linked Data mean to startups?
Economic Opportunities
Huge Reservoir of Free Content
• can be used to provide background facts about
• places,
• people,
• topics
• …
Background Facts
Lower Data Integration Costs
The overall data integration effort is split
between the data publisher, the data consumer
and third parties.
• Data Publisher
– publishes data as RDF
– sets identity links
– reuses terms or publishes mappings
• Third Parties
– set identity links pointing at your data
– publish mappings to the Web
• Data Consumer
– has to do the rest
– using record linkage and schema matching
techniques
Options for Search Companies
• new vertical search engines
• Sig.ma, FalconS, …
• Yahoo and Google are not sleeping
• improving text search with background knowledge
Options for Data Integration Intermediaries
• business model
• crawl, integrate and clean Web data
• sell clean data to subscribers
• existing Linked Data integration intermediaries
Is your data 5 star?
Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2010
Make your stuff available on the Web (whatever format) under
an open license.
Make it available as structured data (e.g., Excel instead of image
scan of a table) so that it can be reused.
Use non-proprietary, open formats (e.g., CSV instead of Excel).
Use URIs to identify things, so that people can point at your stuff
and serve RDF from it.
Link your data to other data to provide context.
★
★ ★
★ ★ ★
★ ★ ★ ★
★ ★ ★ ★ ★
How to publish Linked Data
1. Tasks: Make data available as RDF via HTTP
2. Set RDF links pointing at other data sources
3. Make your data self-descriptive
4. Reuse common vocabularies
Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space
http://linkeddatabook.com/
Linked Data Publishing Patterns
Make Data available as RDF via HTTP
•Ready to use tools (examples)
• D2R Server
• provides for mapping relational
databases into RDF and for
serving them as Linked Data
• Pubby
• Linked Data Frontend for
SPARQL Endpoints
• More tools
• http://esw.w3.org/TaskForces/
CommunityProjects/
LinkingOpenData/PublishingTools
Set RDF links to other data sources
• Examples of RDF links
<http://dbpedia.org/resource/Berlin> owl:sameAs <http://
sws.geonames.org/2950159> .
<http://richard.cyganiak.de/foaf.rdf#cygri> foaf:topic_interest
<http://dbpedia.org/resource/Semantic_Web> .
<http://example-bookshop.com/book006251587X> owl:sameAs <http://
www4.wiwiss.fu-berlin.de/bookmashup/books/006251587X> .
How to generate RDF links?
• Pattern-based approaches
• Exploit naming conventions within URIs (for instance ISBNs, ISINs, …)
• Similarity-based approaches
• Compare items within different data sources using various similarity metrics
• Ready to use tools (Examples)
• Silk Link Discovery Framework
• provides a declarative language for specifying link conditions
which may combine different similarity metrics
• Silk Single Machine, Silk MapReduce
• More tools
• http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/
Make your Data Self-Descriptive
• Increase the usefulness of your data and ease data integration
• Aspects of self-descriptiveness
• Enable clients to retrieve the schema
• Reuse terms from common vocabularies
• Publish schema mappings for proprietary terms
• Provide provenance metadata
• Provide licensing metadata
• Provide data-set-level metadata using voiD
• Refer to additional access methods using voiD
Enable Clients to retrieve the Schema
Clients can resolve the URIs that identify
vocabulary terms in order to get their RDFS or
OWL definitions.
<http://xmlns.com/foaf/0.1/Person>
rdf:type owl:Class ;
rdfs:label "Person";
rdfs:subClassOf <http://xmlns.com/foaf/0.1/Agent> ;
rdfs:subClassOf <http://xmlns.com/wordnet/1.6/Agent> .
<http://richard.cyganiak.de/foaf.rdf#cygri>
foaf:name "Richard Cyganiak" ;
rdf:type <http://xmlns.com/foaf/0.1/Person> .
RDFS or OWL definition
Some data on the Web
Resolve unknown term http://xmlns.com/foaf/0.1/Person
ReuseTerms from CommonVocabularies
• CommonVocabularies
• Friend-of-a-Friend for describing people and their social network
• SIOC for describing forums and blogs
• SKOS for representing topic taxonomies
• Organization Ontology for describing the structure of organizations
• GoodRelations provides terms for describing products and business entities
• Music Ontology for describing artists, albums, and performances
• ReviewVocabulary provides terms for representing reviews
• Common sources of identifiers (URIs) for real world objects
• LinkedGeoData and Geonames locations
• GeneID and UniProt life science identifiers
How to consume Linked Data
Integrated Linked Data Consumption
Tools
• LDIF – Linked Data Integration Framework
• provides for data translation and identity resolution
• http://www4.wiwiss.fu-berlin.de/bizer/ldif/
• ALOE - Assisted Linked Data Consumption framework
• provides for data translation and identity resolution
• http://aksw.org/projects/aloe
• Information Workbench
• provides for visualization and application development
• http://iwb.fluidops.com/
62
Finding Linked Data sets
• Search engines
• find data sets based on keywords
• Data catalogs / directories
• explore data sets and faceted search
• Data Marketplaces
• explore and consume data sets
63
The Data Hub
• Manually curated list of (>3.500) data sets, at least 356 Linked Data Sets
• Various metadata for each data set
64
Conclusion
• The Web of Data is growing fast
• Linked Data provides a standardized data access interface
• Linked Data allows for the development of a variety of tools to integrate,
enhance and and view the data
• Web search is evolving into query answering
• Search engines will increasingly rely on structured data from the Web
• Next step: Linked Data within enterprises
• alternative to data warehouses and EAI middleware
• advantages: schema-less data model, pay-as-you go data integration
Thanks!
References:
• Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space
http://linkeddatabook.com/
• Linking Open Data Project Wiki
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
Email: anja@anjeve.de
Twitter: @anjeve

Weitere ähnliche Inhalte

Was ist angesagt?

2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data intro
vafopoulos
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked data
vafopoulos
 
Social semantic web
Social semantic webSocial semantic web
Social semantic web
Vlad Posea
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
Martin Hepp
 

Was ist angesagt? (20)

Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked data
 
Linked Data for Libraries: Experiments between Cornell, Harvard and Stanford
Linked Data for Libraries: Experiments between Cornell, Harvard and StanfordLinked Data for Libraries: Experiments between Cornell, Harvard and Stanford
Linked Data for Libraries: Experiments between Cornell, Harvard and Stanford
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sources
 
2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data intro
 
LIBRIS - Linked Library Data
LIBRIS - Linked Library DataLIBRIS - Linked Library Data
LIBRIS - Linked Library Data
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning Issue
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked data
 
Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?
 
Building Linked Data Applications
Building Linked Data ApplicationsBuilding Linked Data Applications
Building Linked Data Applications
 
Social semantic web
Social semantic webSocial semantic web
Social semantic web
 
Linking Open Data
Linking Open DataLinking Open Data
Linking Open Data
 
Towards digitizing scholarly communication
Towards digitizing scholarly communicationTowards digitizing scholarly communication
Towards digitizing scholarly communication
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
 
DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublin
 
What can linked data do for digital libraries
What can linked data do for digital librariesWhat can linked data do for digital libraries
What can linked data do for digital libraries
 
Semantic Web (Web 3.0)
Semantic Web (Web 3.0)Semantic Web (Web 3.0)
Semantic Web (Web 3.0)
 
McDanold-1-jun15
McDanold-1-jun15McDanold-1-jun15
McDanold-1-jun15
 
Linked Data Snowball, or Why We Need Reconciliation
Linked Data Snowball, or Why We Need ReconciliationLinked Data Snowball, or Why We Need Reconciliation
Linked Data Snowball, or Why We Need Reconciliation
 
Interaction with Linked Data
Interaction with Linked DataInteraction with Linked Data
Interaction with Linked Data
 
Linked data for Enterprise Data Integration
Linked data for Enterprise Data IntegrationLinked data for Enterprise Data Integration
Linked data for Enterprise Data Integration
 

Ähnlich wie Linked Data (1st Linked Data Meetup Malmö)

Connecting the Dots: Linking Digitized Collections Across Metadata Silos
Connecting the Dots: Linking Digitized Collections Across Metadata SilosConnecting the Dots: Linking Digitized Collections Across Metadata Silos
Connecting the Dots: Linking Digitized Collections Across Metadata Silos
OCLC
 
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
Anja Jentzsch
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
Bernhard Haslhofer
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
Peter Haase
 
The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?
Anna Fensel
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
Sören Auer
 

Ähnlich wie Linked Data (1st Linked Data Meetup Malmö) (20)

Linked Data
Linked DataLinked Data
Linked Data
 
Linked Data Basics
Linked Data BasicsLinked Data Basics
Linked Data Basics
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commons
 
NISO Webinar: Library Linked Data: From Vision to Reality
NISO Webinar: Library Linked Data: From Vision to RealityNISO Webinar: Library Linked Data: From Vision to Reality
NISO Webinar: Library Linked Data: From Vision to Reality
 
Linked Open Data in Romania
Linked Open Data in RomaniaLinked Open Data in Romania
Linked Open Data in Romania
 
Linked Data
Linked DataLinked Data
Linked Data
 
Connecting the Dots: Linking Digitized Collections Across Metadata Silos
Connecting the Dots: Linking Digitized Collections Across Metadata SilosConnecting the Dots: Linking Digitized Collections Across Metadata Silos
Connecting the Dots: Linking Digitized Collections Across Metadata Silos
 
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
 
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
NISO/DCMI Webinar: Schema.org and Linked Data: Complementary Approaches to Pu...
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us?
 
Linked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareLinked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the Software
 
Linked data 20171106
Linked data 20171106Linked data 20171106
Linked data 20171106
 
Linked (Open) Data
Linked (Open) DataLinked (Open) Data
Linked (Open) Data
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application Development
 
CLARIAH Toogdag 2018: A distributed network of digital heritage information
CLARIAH Toogdag 2018: A distributed network of digital heritage informationCLARIAH Toogdag 2018: A distributed network of digital heritage information
CLARIAH Toogdag 2018: A distributed network of digital heritage information
 
The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?The Semantic Web Exists. What Next?
The Semantic Web Exists. What Next?
 
Open Data Masterclass - Europeana and LOD
Open Data Masterclass - Europeana and LODOpen Data Masterclass - Europeana and LOD
Open Data Masterclass - Europeana and LOD
 
The web of interlinked data and knowledge stripped
The web of interlinked data and knowledge strippedThe web of interlinked data and knowledge stripped
The web of interlinked data and knowledge stripped
 
Open data and linked data
Open data and linked dataOpen data and linked data
Open data and linked data
 

Mehr von Anja Jentzsch (6)

Wikidata
WikidataWikidata
Wikidata
 
LODOP - Multi-Query Optimization for Linked Data Profiling Queries
LODOP - Multi-Query Optimization for Linked Data Profiling QueriesLODOP - Multi-Query Optimization for Linked Data Profiling Queries
LODOP - Multi-Query Optimization for Linked Data Profiling Queries
 
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
 
Link Sets And Why They Are Important (EDF2012)
Link Sets And Why They Are Important (EDF2012)Link Sets And Why They Are Important (EDF2012)
Link Sets And Why They Are Important (EDF2012)
 
Visualizing Web Data Query Results
Visualizing Web Data Query ResultsVisualizing Web Data Query Results
Visualizing Web Data Query Results
 
Finding Data Sets
Finding Data SetsFinding Data Sets
Finding Data Sets
 

Linked Data (1st Linked Data Meetup Malmö)

  • 1. Linked Data - Evolving the Web into a Global Data Space Anja Jentzsch - @anjeve Hasso-Plattner-Institute, Potsdam, Germany 1st Linked Data Meetup @ Foo Café, Malmö 2103/05/23
  • 2. Outline 1. Foundations of Linked Data • What is the vision and goal? 2. The Web of Linked Data • What data is out there? • What is being done with the data? 3. How to publish and consume Linked Data?
  • 3. Architecture of the classic Web B C HTMLHTML Web Browsers Search Engines hyper- links A HTML • Single global document space • Small set of simple standards • HTML as document format • HTTP URLs as globally unique IDs • Retrieval mechanism: Hyperlinks to connect everything
  • 4. Web 2.0 APIs and Mashups No single global dataspace Shortcomings 1. APIs have proprietary interfaces 2. Mashups are based on a fixed set of data sources 3. No hyperlinks between data items within different APIs Web API A Mashup Web API B Web API C Web API D
  • 5. Web APIs slice the Web into Walled Gardens Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY
  • 6. Extend the Web with a single global dataspace 1. by using RDF to publish structured data on the Web 2. by setting links between data items within different data sources Linked Data B C RDF RDF Links A D E RDF Links RDF Links RDF Links RDF RDF RDF RDF RDF RDF RDF RDF RDF
  • 7. Linked Data Principles Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web. 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful RDF information. 4. Include RDF statements that link to other URIs so that they can discover related things. Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006
  • 8. The RDF Data Model Anja Jentzsch dbpedia:Berlin foaf:name foaf:based_near foaf:Person rdf:type ns:anja
  • 9. Data Items are identified with HTTP URIs ns:anja = http://www.anjeve.de#anja dbpedia:Berlin = http://dbpedia.org/resource/Berlin foaf:name foaf:based_near foaf:Person rdf:type ns:anja dbpedia:Berlin Anja Jentzsch
  • 10. Resolving URIs over the Web dp:Cities_in_Germany 3.499.879 dp:population skos:subject dbpedia:Berlin foaf:name foaf:based_near foaf:Person rdf:type ns:anja Anja Jentzsch
  • 11. Dereferencing URIs over the Web dbpedia:Hamburg dbpedia:Muenchen skos:subject skos:subject dp:Cities_in_Germany 3.499.879 dp:population skos:subject dbpedia:Berlin foaf:name foaf:based_near foaf:Person rdf:type ns:anja Anja Jentzsch
  • 12. RDF Representation Formats • RDF/XML <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:foaf="http://xmlns.com/foaf/0.1/"> <foaf:Person rdf:about="http://anjeve.de#anja"> <foaf:name>Anja Jentzsch</foaf:name> </foaf:Person> • RDF N-Triples <http://anjeve.de#anja> <http://www.w3.org/1999/02/22-rdf-syntax- ns#type> <http://xmlns.com/foaf/0.1/Person> . <http://anjeve.de#anja> <http://xmlns.com/foaf/0.1/name> „Anja Jentzsch“ . • RDF N3
  • 13. RDF Representation Formats <http://anjeve.de#anja> <http://www.w3.org/1999/02/22- rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/ Person> . • Subject Predicate Object • In the end it‘s all triples! foaf:Person rdf:type ns:anja
  • 14. Properties of the Web of Linked Data • Global, distributed dataspace build on a simple set of standards • RDF, URIs, HTTP • Entities are connected by links • creating a global data graph that spans data sources and • enables the discovery of new data sources • Provides for data-coexistence • Everyone can publish data to the Web of Linked Data • Everyone can express their personal view on things • Everybody can use the vocabularies/schemas that they like
  • 15. W3C Linking Open Data Project • Grassroots community effort to • publish existing open license datasets as Linked Data on the Web • interlink things between different data sources
  • 16. LOD Data Sets on the Web: May 2007 • 12 data sets • Over 500 million RDF triples • Around 120,000 RDF links between data sources
  • 17. LOD Data Sets on the Web: November 2007 • 28 data sets
  • 18. LOD Data Sets on the Web: September 2008 • 45 data sets • Over 2 billion RDF triples
  • 19. LOD Data Sets on the Web: July 2009 • 95 data sets • Over 6.5 billion RDF triples
  • 20. LOD Data Sets on the Web: September 2010 • 203 data sets • Over 24,7 billion RDF triples • Over 436 million RDF links between data sources
  • 21. LOD Data Sets on the Web: September 2011 • 295 data sets • Over 31 billion RDF triples • Over 504 million RDF links between data sources http://lod-cloud.net
  • 22. The Growth in Numbers 0 35000000000 2007 2008 2009 2010 2011 Year Data Sets Triples Growth 2007 12 500,000,000 2008 45 2,000,000,000 300% 2009 95 6,726,000,000 236% 2010 203 26,930,509,703 300% 2011 295 31,634,213,770 33%
  • 23. LOD Data Set statistics as of 09/2011 LOD Cloud Data Catalog on the Data Hub • http://datahub.io/group/lodcloud More statistics • http://lod-cloud.net/state/
  • 24. DBpedia –The Hub on the Web of Data • DBpedia is a joint project with the following goals • extracting structured information from Wikipedia • publish this information under an open license on the Web • setting links to other data sources • Partners • Freie Universität Berlin (Germany) • Universität Leipzig (Germany) • OpenLink Software (UK) • neofonie (Germany)
  • 25. Extracting structured data from Wikipedia
  • 26. Extracting structured data from Wikipedia dbpedia:Berlin rdf:type dbpedia-owl:City , dbpedia-owl:PopulatedPlace , dbpedia-owl:Place ; rdfs:label "Berlin"@en , "Berlino"@it ; dbpedia-owl:population 3499879 ; wgs84:lat 52.500557 ; wgs84:long 13.398889 . ! dbpedia:SoundCloud dbpedia-owl:location dbpedia:Berlin . • access to DBpedia data: • dumps • SPARQL endpoint • Linked Data interface
  • 27. The DBpedia Data Set • Information on more than 3.77 million “things” • 764,000 persons • 192,000 organisations • 573,000 places • 112,000 music albums • 72,000 movies • 202,000 species • overall more than 1 billion RDF triples • title and abstract in 111 different languages • 8,000,000 links to images • 24,400,000 links to external web pages • 27,200,000 links to other Linked Data sets
  • 28. DBpedia Mappings • since March 2010 collaborative editing of • DBpedia ontology • mappings from Wikipedia infoboxes and tables to DBpedia ontology • curated in a public wiki with instant validation methods • http://mappings.dbpedia.org • multi-langual mappings to the DBpedia ontology: • ar, bg, bn, ca, cs, de, el, en, es, et, eu, fr, ga, hi, hr, hu, it, ja, ko, nl, pl, pt, ru, sl, tr • allows for a significant increase of the extracted data’s quality • each domain has its experts
  • 29. DBpedia Use Cases 1. Improvement of Wikipedia search 2. Data source for applications and mashups 3. Text analysis and annotation 4. Hub for the growing Web of Data
  • 30.
  • 31. DBpedia Mobile • displays Wikipedia data on map • aggregates different data sources
  • 32. Faceted Wikipedia Search • faceted browsing and free text search
  • 33.
  • 35. Uptake in the Government Domain • The EU is pushing Linked Data (LOD2, LATC, EuroStat) • W3C eGovernment Interest Group
  • 36. Uptake in the Libraries Community • Institutions publishing Linked Data • Library of Congress (subject headings) • German National Library (PND dataset and subject headings) • Swedish National Library (Libris - catalog) • Hungarian National Library (OPAC and Digital Library) • Europeana project just released data about 4 million artifacts • W3C Library Linked Data Incubator Group • OKFN Working Group on Bibliographic Data • Goals: • Integrate library catalogs on global scale • Interconnect resources between repositories (by topic, by location, by historical period, by ...)
  • 37. Uptake in Life Sciences • W3C Linking Open Drug Data Effort • Bio2RDF Project • Allen Brain Atlas • Goal: Smoothly integrate internal and external data in a pay-as-you-go-fashion.
  • 38. Uptake in the Media Industry • Publish data as RDF/XML or RDFa • Goal: Drive traffic to websites via search engines
  • 40. Linked Data Browsers • Tabulator Browser (MIT, USA) • Marbles (FU Berlin, DE) • OpenLink RDF Browser (OpenLink, UK) • Zitgist RDF Browser (Zitgist, USA) • Humboldt (HP Labs, UK) • Disco Hyperdata Browser (FU Berlin, DE) • Fenfire (DERI, Irland)
  • 41.
  • 42. Web of Data Search Engines • Sig.ma (DERI, Ireland) • Falcons (IWS, China) • Swoogle (UMBC, USA) • VisiNav (DERI, Ireland) • Watson (Open University, UK)
  • 43.
  • 44.
  • 45. schema.org • jointly proposed vocabularies for embedding data into HTML pages (Microdata) • available since June 2011
  • 46. • What does Linked Data mean to startups? Economic Opportunities
  • 47. Huge Reservoir of Free Content • can be used to provide background facts about • places, • people, • topics • … Background Facts
  • 48. Lower Data Integration Costs The overall data integration effort is split between the data publisher, the data consumer and third parties. • Data Publisher – publishes data as RDF – sets identity links – reuses terms or publishes mappings • Third Parties – set identity links pointing at your data – publish mappings to the Web • Data Consumer – has to do the rest – using record linkage and schema matching techniques
  • 49. Options for Search Companies • new vertical search engines • Sig.ma, FalconS, … • Yahoo and Google are not sleeping • improving text search with background knowledge
  • 50. Options for Data Integration Intermediaries • business model • crawl, integrate and clean Web data • sell clean data to subscribers • existing Linked Data integration intermediaries
  • 51. Is your data 5 star? Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2010 Make your stuff available on the Web (whatever format) under an open license. Make it available as structured data (e.g., Excel instead of image scan of a table) so that it can be reused. Use non-proprietary, open formats (e.g., CSV instead of Excel). Use URIs to identify things, so that people can point at your stuff and serve RDF from it. Link your data to other data to provide context. ★ ★ ★ ★ ★ ★ ★ ★ ★ ★ ★ ★ ★ ★ ★
  • 52. How to publish Linked Data 1. Tasks: Make data available as RDF via HTTP 2. Set RDF links pointing at other data sources 3. Make your data self-descriptive 4. Reuse common vocabularies Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space http://linkeddatabook.com/
  • 54. Make Data available as RDF via HTTP •Ready to use tools (examples) • D2R Server • provides for mapping relational databases into RDF and for serving them as Linked Data • Pubby • Linked Data Frontend for SPARQL Endpoints • More tools • http://esw.w3.org/TaskForces/ CommunityProjects/ LinkingOpenData/PublishingTools
  • 55. Set RDF links to other data sources • Examples of RDF links <http://dbpedia.org/resource/Berlin> owl:sameAs <http:// sws.geonames.org/2950159> . <http://richard.cyganiak.de/foaf.rdf#cygri> foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web> . <http://example-bookshop.com/book006251587X> owl:sameAs <http:// www4.wiwiss.fu-berlin.de/bookmashup/books/006251587X> .
  • 56. How to generate RDF links? • Pattern-based approaches • Exploit naming conventions within URIs (for instance ISBNs, ISINs, …) • Similarity-based approaches • Compare items within different data sources using various similarity metrics • Ready to use tools (Examples) • Silk Link Discovery Framework • provides a declarative language for specifying link conditions which may combine different similarity metrics • Silk Single Machine, Silk MapReduce • More tools • http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/
  • 57. Make your Data Self-Descriptive • Increase the usefulness of your data and ease data integration • Aspects of self-descriptiveness • Enable clients to retrieve the schema • Reuse terms from common vocabularies • Publish schema mappings for proprietary terms • Provide provenance metadata • Provide licensing metadata • Provide data-set-level metadata using voiD • Refer to additional access methods using voiD
  • 58. Enable Clients to retrieve the Schema Clients can resolve the URIs that identify vocabulary terms in order to get their RDFS or OWL definitions. <http://xmlns.com/foaf/0.1/Person> rdf:type owl:Class ; rdfs:label "Person"; rdfs:subClassOf <http://xmlns.com/foaf/0.1/Agent> ; rdfs:subClassOf <http://xmlns.com/wordnet/1.6/Agent> . <http://richard.cyganiak.de/foaf.rdf#cygri> foaf:name "Richard Cyganiak" ; rdf:type <http://xmlns.com/foaf/0.1/Person> . RDFS or OWL definition Some data on the Web Resolve unknown term http://xmlns.com/foaf/0.1/Person
  • 59. ReuseTerms from CommonVocabularies • CommonVocabularies • Friend-of-a-Friend for describing people and their social network • SIOC for describing forums and blogs • SKOS for representing topic taxonomies • Organization Ontology for describing the structure of organizations • GoodRelations provides terms for describing products and business entities • Music Ontology for describing artists, albums, and performances • ReviewVocabulary provides terms for representing reviews • Common sources of identifiers (URIs) for real world objects • LinkedGeoData and Geonames locations • GeneID and UniProt life science identifiers
  • 60. How to consume Linked Data
  • 61. Integrated Linked Data Consumption Tools • LDIF – Linked Data Integration Framework • provides for data translation and identity resolution • http://www4.wiwiss.fu-berlin.de/bizer/ldif/ • ALOE - Assisted Linked Data Consumption framework • provides for data translation and identity resolution • http://aksw.org/projects/aloe • Information Workbench • provides for visualization and application development • http://iwb.fluidops.com/
  • 62. 62 Finding Linked Data sets • Search engines • find data sets based on keywords • Data catalogs / directories • explore data sets and faceted search • Data Marketplaces • explore and consume data sets
  • 63. 63 The Data Hub • Manually curated list of (>3.500) data sets, at least 356 Linked Data Sets • Various metadata for each data set
  • 64. 64
  • 65. Conclusion • The Web of Data is growing fast • Linked Data provides a standardized data access interface • Linked Data allows for the development of a variety of tools to integrate, enhance and and view the data • Web search is evolving into query answering • Search engines will increasingly rely on structured data from the Web • Next step: Linked Data within enterprises • alternative to data warehouses and EAI middleware • advantages: schema-less data model, pay-as-you go data integration
  • 66. Thanks! References: • Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space http://linkeddatabook.com/ • Linking Open Data Project Wiki http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData Email: anja@anjeve.de Twitter: @anjeve