Adoption of the Linked Data Best Practices in Different Topical Domains

Chris Bizer, Professor at University of Mannheim
Max Schmachtenberg 
Christian Bizer 
Heiko Paulheim 
Adoption of the Linked Data Best Practices 
in Different Topical Domains 
Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 1
The Linked Data Best Practices 
Central idea of Linked Data: Ease data discovery and integration by complying with a set of best practices.
1. Linking Best Practices
• Set RDF links pointing at instances in other data sources.
2. Vocabulary Best Practices
• Reuse terms from widely-used vocabularies.
• Make definitions of proprietary terms dereferencable.
• Link vocabulary terms to terms in other vocabularies.
3. Metadata Best Practices
• Publish machine-readable provenance and licensing metadata.
• Publish metadata about alternative access methods (SPARQL endpoints, dumps).
State of the LOD Cloud Report - 2011
• http://lod-cloud.net/state/
• Based on information provided by dataset publishers via the datahub.io catalog.
LOD Cloud - 2011 
Consists of 295 datasets.
Outline 
Goal: Update the State of the LOD Cloud report 
and LOD Cloud itself to 2014. 
1. Methodology 
2. Adoption of the Linking Best Practices 
3. Adoption of the Vocabulary Best Practices 
4. Adoption of the Metadata Best Practices 
5. Conclusions (in Relation to Schema.org) 
1. Methodology: Crawl of the Linked Data Web
• Crawler: LDSpider, Crawl Date: April 2014
• Seeds: 560,000 seed URIs from
  1. Example URIs in the datahub.io catalog
  2. URIs from the BTC2012 dataset
  3. URIs from datasets advertised on the public-lod@w3.org mailing list
• Crawled Data Corpus
  • 900,000 documents containing
  • 8,038,000 resources
  • 1014 datasets
  • 77 datasets prevent crawling via robots.txt
[Figure: Distribution of documents (red line) and resources (blue line) by dataset]
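The robots.txt restriction mentioned in the crawl statistics can be illustrated with a minimal Python sketch. It uses the standard library's robots.txt parser to decide whether a crawler may fetch a URI. The user agent name `ldspider`, the example URIs, and the two robots.txt bodies are assumptions for illustration only; this is not how LDSpider itself is implemented.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt of a dataset that blocks all crawlers.
BLOCKING = "User-agent: *\nDisallow: /\n"
# Hypothetical robots.txt of a dataset that allows everything.
OPEN = "User-agent: *\nDisallow:\n"

if __name__ == "__main__":
    for name, body in (("blocking", BLOCKING), ("open", OPEN)):
        allowed = may_crawl(body, "ldspider", "http://example.org/resource/x")
        print(name, "->", "crawlable" if allowed else "not crawlable")
```

A dataset whose robots.txt resembles the blocking variant would be counted among those that prevent crawling.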
Categorization by Topical Domain
• Used categorization from datahub.io for existing datasets.
• Manually categorized remaining datasets.
• Added new category Social Networking.
• Growth without new category Social Networking: 94 %
• LODstats (http://stats.lod2.eu/) discovered a similar number of datasets: 1048
2. Adoption of the Linking Best Practices
Data publishers should set RDF links because:
1. Discoverability depends on being linked.
2. RDF links ease data integration.
Degrees
• 56% of all datasets set RDF links pointing to other datasets.
  • The remaining 44% are either only the target of RDF links from other datasets or are isolated.
• [Table: Datasets with top in- and outdegrees]
• Most widely used linking predicates: owl:sameAs, rdfs:seeAlso, foaf:knows
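The degree figures above can be made concrete with a small Python sketch. It computes dataset-level in- and outdegrees from a list of (source dataset, target dataset) link pairs and the share of datasets that set at least one outgoing link. The dataset names and links are toy data, and counting each linked dataset once (rather than each RDF triple) is an assumption about how degrees are defined.

```python
from collections import defaultdict


def degrees(links):
    """Map each dataset to the set of datasets it links to / is linked from."""
    out_neighbours = defaultdict(set)
    in_neighbours = defaultdict(set)
    for source, target in links:
        if source == target:
            continue  # links inside one dataset do not count as interlinking
        out_neighbours[source].add(target)
        in_neighbours[target].add(source)
    return out_neighbours, in_neighbours


def share_with_outlinks(datasets, links):
    """Fraction of datasets that set at least one link to another dataset."""
    out_neighbours, _ = degrees(links)
    linking = sum(1 for d in datasets if out_neighbours.get(d))
    return linking / len(datasets)


# Toy data: C is only a link target, D is isolated.
DATASETS = ["A", "B", "C", "D"]
LINKS = [("A", "B"), ("A", "C"), ("B", "A")]

if __name__ == "__main__":
    out_n, in_n = degrees(LINKS)
    for d in DATASETS:
        print(d, "outdegree", len(out_n.get(d, ())), "indegree", len(in_n.get(d, ())))
    print("share with outgoing links:", share_with_outlinks(DATASETS, LINKS))
```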
“Crawlable” LOD Cloud 2014
Degree Distributions
• Dotted line: Social Networking (status.net, etc.)
• Solid line: Cross-Domain datasets (DBpedia, etc.)
• Largest Strongly Connected Component: 36% (377 datasets)
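For readers unfamiliar with the term, a strongly connected component is a set of datasets in which every dataset can reach every other one by following RDF links. The following Python sketch finds the components of a toy link graph with Kosaraju's algorithm. The graph and the choice of algorithm are illustrative assumptions; the slides do not say which method was used in the study.

```python
def strongly_connected_components(nodes, edges):
    """Return a list of node sets, one per strongly connected component."""
    graph = {n: [] for n in nodes}
    reverse = {n: [] for n in nodes}
    for source, target in edges:
        graph[source].append(target)
        reverse[target].append(source)

    # Pass 1: record nodes in order of DFS completion (iterative DFS).
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            advanced = False
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    advanced = True
                    break
            if not advanced:
                order.append(node)
                stack.pop()

    # Pass 2: explore the reversed graph in decreasing completion order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        component = {start}
        assigned.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in reverse[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    component.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


# Toy graph: A, B, C link in a cycle; D and E are only reachable downstream.
NODES = ["A", "B", "C", "D", "E"]
EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]

if __name__ == "__main__":
    largest = max(strongly_connected_components(NODES, EDGES), key=len)
    print(sorted(largest), len(largest) / len(NODES))
```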
Conclusion concerning Linking Best Practices
• Some datasets put a lot of effort into linking.
• Many datasets only link to a small number of other datasets or do not set RDF links at all.
• Similar situation to 2011.
3. Adoption of the Vocabulary Best Practices
Goal: Help applications understand the data by
1. Reusing terms from widely-used vocabularies.
2. Making definitions of proprietary terms dereferencable.
3. Linking vocabulary terms to terms in other vocabularies.
Widely-Used and Proprietary Vocabularies
• Strong agreement on some vocabularies.
• Proprietary vocabularies are used in addition to common ones, as data is often very specific.
[Tables: Widely-Used Vocabularies; Proprietary Vocabularies]
Dereferencability of Term URIs and Vocabulary Linking
• 28% of the proprietary vocabularies provide dereferencable URIs.
• 21% set RDF links to other vocabularies (8% in 2011).
  • Popular linking predicates: rdfs:range, rdfs:subClassOf
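As a rough illustration of what a dereferencability check involves, the Python sketch below builds an HTTP request that asks for RDF via content negotiation and classifies a response by status code and content type. The list of accepted media types, the example term URI, and the pass criterion (HTTP 200 with an RDF media type) are assumptions for illustration, not the criteria used in the study.

```python
import urllib.request

# Media types treated as RDF in this sketch (an assumption, not exhaustive).
RDF_TYPES = (
    "text/turtle",
    "application/rdf+xml",
    "application/n-triples",
    "application/ld+json",
)


def build_request(term_uri: str) -> urllib.request.Request:
    """Prepare a GET request that asks the server for an RDF serialization."""
    return urllib.request.Request(term_uri, headers={"Accept": ", ".join(RDF_TYPES)})


def looks_dereferencable(status: int, content_type: str) -> bool:
    """Classify a response: success status plus an RDF media type."""
    media_type = content_type.split(";")[0].strip().lower()
    return status == 200 and media_type in RDF_TYPES


if __name__ == "__main__":
    request = build_request("http://example.org/vocab#Term")  # hypothetical URI
    print(request.get_header("Accept"))
    # Sending the request is left out so the sketch runs offline; with
    # network access one would pass `request` to urllib.request.urlopen and
    # feed the response status and Content-Type header into the check below.
    print(looks_dereferencable(200, "text/turtle; charset=utf-8"))
    print(looks_dereferencable(200, "text/html"))
```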
4. Adoption of the Metadata Best Practices
1. Publish machine-readable provenance information.
2. Publish machine-readable licensing information.
3. Publish metadata about alternative access methods (SPARQL endpoints, RDF dumps).
Provenance and Licensing Metadata
• 37% of the datasets provide provenance information.
  • Dublin Core is used more than W3C Prov.
• 10% provide machine-readable licensing information.
  • Most used predicates: dc:license, cc:license
Dataset Level Metadata (VoID)
• 15% of the datasets publish VoID descriptions.
• Via these descriptions, it is possible to discover SPARQL endpoints and dumps for about 10% of the data sources.
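To show how a VoID description exposes alternative access methods, the Python sketch below scans a description serialized as N-Triples for the VoID properties `void:sparqlEndpoint` and `void:dataDump`. The regular expression only handles triples whose object is a URI, which is a deliberate simplification, and the sample description with its example.org URIs is hypothetical.

```python
import re

VOID = "http://rdfs.org/ns/void#"
# Simplified N-Triples pattern: subject, predicate and object are all URIs.
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def access_methods(ntriples: str) -> dict:
    """Collect SPARQL endpoint and dump URIs advertised in a VoID description."""
    found = {"sparqlEndpoint": [], "dataDump": []}
    for line in ntriples.splitlines():
        match = TRIPLE.match(line)
        if not match:
            continue  # skip blank lines, comments and literal-valued triples
        _, predicate, obj = match.groups()
        if predicate.startswith(VOID):
            name = predicate[len(VOID):]
            if name in found:
                found[name].append(obj)
    return found


# Hypothetical VoID description of one dataset.
SAMPLE = """\
<http://example.org/void#ds> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/ns/void#Dataset> .
<http://example.org/void#ds> <http://rdfs.org/ns/void#sparqlEndpoint> <http://example.org/sparql> .
<http://example.org/void#ds> <http://rdfs.org/ns/void#dataDump> <http://example.org/dump.nt.gz> .
"""

if __name__ == "__main__":
    print(access_methods(SAMPLE))
```

A production crawler would use a full RDF parser instead of a regular expression; the regex keeps the sketch free of third-party dependencies.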
Conclusion concerning Metadata Best Practices
• Applications cannot rely on the availability of metadata, as only a small fraction of all data sources publish such data.
• The Government and Library domains are positive exceptions.
• Similarly low numbers as in 2011.
“Full” LOD Cloud Diagram 
570 datasets
• 374 from datahub.io
• 196 from our crawl
http://lod-cloud.net/
Growth of the “Full” LOD Cloud Diagram
• 2011: 295 datasets
• 2014: 570 datasets (+ 93 %)
http://lod-cloud.net/
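The growth figure follows directly from the two dataset counts. A short Python check, using only the numbers stated on the slide:

```python
def growth_percent(before: int, after: int) -> float:
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100


if __name__ == "__main__":
    # 295 datasets in 2011, 570 datasets in 2014 (figures from the slide).
    print(round(growth_percent(295, 570)))
```

The result rounds to 93, matching the stated increase.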
Comparison of Linked Data and Schema.org
Schema.org
1. does not expect data publishers to set data links.
2. relies on marking up data in HTML pages.
3. benefits from strong application pull by Google, Microsoft, Yahoo!
Adoption
WebDataCommons, 2013*:
463,000 websites (PLDs) provide Microdata annotations.
Google, 2014**:
5 million websites provide Schema.org data.
• Orders of magnitude more Schema.org data sources than Linked Data sources.
* WebDataCommons extracts Microdata, RDFa, and Microformat data from the CommonCrawl (2.2 billion HTML pages from 12.8 million PLDs).
** Guha in LDOW2014 Keynote
Schema.org Topical Focus
Different topics compared to Linked Data.
Class / Property Distribution
[Figure: class / property distribution, Microdata 2012]
• Only a small set of classes / properties is actually used.
• Less variety compared to Linked Data.
Shallowness of the Schema.org Data
schema:Product
Product Names
• AppleMacBook Air MC968/A 11.6-Inch Laptop
• Apple MacBook Air 11-in, Intel Core i5 1.60GHz, 64 GB, Lion 10.7
schema:JobPosting
JobPostings
• More specific properties like skills are hardly used.
• 57% of all hiringOrganizations are strings, not instances.
Conclusion
Linked Data | Schema.org
~ 1,000 sources | > 460,000 sources
covers wider range of specific topics (government, libraries, science) | topics focused on search engines (products, organizations)
contains more complex data structures | very simple and shallow data structures
partial ontology agreement | strong ontology agreement
identity resolution eased by RDF links | identity resolution often requires value parsing
Thank you.
References
• Report: http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
• Catalog: http://linkeddatacatalog.dws.informatik.uni-mannheim.de/
Acknowledgement
• This work was supported by
Adoption of the Linked Data Best Practices in Different Topical Domains

  • 1. Max Schmachtenberg Christian Bizer Heiko Paulheim Adoption of the Linked Data Best Practices in Different Topical Domains Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 1
  • 2. The Linked Data Best Practices Central idea of Linked Data: Ease data discovery and integration by complying to a set of best practices. 1. Linking Best Practices • Set RDF links pointing at instances in other data sources. 2. Vocabulary Best Practices • Reuse terms from widely-used vocabularies. • Make definitions of proprietary terms dereferencable. • Link vocabulary terms to terms in other vocabularies. 3. Metadata Best Practices • Publish machine-readable provenance and licensing metadata. • Publish metadata about alternative access methods (SPARQL, dumps) Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 2
  • 3. State of the LOD Cloud Report - 2011  http://lod-cloud.net/state/  Based on information by provided dataset publishers via the datahub.io catalog Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 3
  • 4. LOD Cloud - 2011 Consists of 295 datasets. Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 4
  • 5. Outline Goal: Update the State of the LOD Cloud report and LOD Cloud itself to 2014. 1. Methodology 2. Adoption of the Linking Best Practices 3. Adoption of the Vocabulary Best Practices 4. Adoption of the Metadata Best Practices 5. Conclusions (in Relation to Schema.org) Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 5
  • 6. 1. Methodology: Crawl of the Linked Data Web  Crawler: LDSpider, Crawl Date: April 2014  Seeds: 560,000 seed URIs from 1. Example URIs in datahub.io catalog 2. URIs from BTC2012 dataset 3. URIs from datasets advertised on public-lod@w3.org mailing list  Crawled Data Corpus • 900,000 documents containing • 8,038,000 resources • 1014 datasets • 77 datasets prevent crawling via robots.txt • Distribution by dataset • Red line: documents • Blue line: resources Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 6
  • 7. Categorization by Topical Domain  Used categorization from datahub.io for existing datasets.  Manually categorized remaining datasets.  Added new category Social Networking  Growth without new category Social Networking: 94 %  LODstats (http://stats.lod2.eu/) discovered similar number of datasets: 1048 Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 7
  • 8. 2. Adoption of the Linking Best Practices Data publishers should set RDF links as: 1. Discoverability depends on being linked. 2. RDF links ease data integration. Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 8
  • 9. Degrees  56% of all datasets set RDF links pointing to other datasets. • The remaining 44% are either only the target of RDF links from other datasets or are isolated.  Datasets with Top In- and Outdegrees:  Most widely used linking predicates: owl:sameAs, rdfs:seeAlso, foaf:knows Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 9
  • 10. “Crawlable” LOD Cloud 2014
  • 11. Degree Distributions
      • Dotted line: Social Networking (status.net, etc.)
      • Solid line: Cross-Domain datasets (DBpedia, etc.)
      • Largest Strongly Connected Component: 36% (377 datasets)
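A strongly connected component is a set of datasets in which every dataset can reach every other one by following RDF links. The following is a minimal sketch of how the largest such component could be found with Kosaraju's two-pass algorithm. It is not the tooling used for the report, and the node names and edges are toy assumptions.

```python
def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm; every edge endpoint must appear in nodes."""
    graph = {n: [] for n in nodes}
    reverse = {n: [] for n in nodes}
    for source, target in edges:
        graph[source].append(target)
        reverse[target].append(source)

    # Pass 1: iterative depth-first search recording nodes in finishing order.
    visited = set()
    order = []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours handled: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing finishing order.
    assigned = set()
    components = []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component = []
        todo = [start]
        while todo:
            node = todo.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


# Toy link graph: a, b, c link in a cycle; d and e are only reachable downstream.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]

components = strongly_connected_components(toy_nodes, toy_edges)
largest = max(components, key=len)
largest_share = len(largest) / len(toy_nodes)
```

Here the cycle a, b, c forms the largest component, covering three of the five toy datasets.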
  • 12. Conclusion concerning Linking Best Practices
      • Some datasets put a lot of effort into linking.
      • Many datasets only link to a small number of other datasets or do not set RDF links at all.
      • Similar situation as in 2011.
  • 13. 3. Adoption of the Vocabulary Best Practices
      Goal: Help applications understand the data by
          1. Reusing terms from widely-used vocabularies.
          2. Making definitions of proprietary terms dereferencable.
          3. Linking vocabulary terms to terms in other vocabularies.
  • 14. Widely-Used and Proprietary Vocabularies
      • Strong agreement on some vocabularies.
      • Proprietary vocabularies are used in addition to common ones, as data is often very specific
      Tables: Widely-Used Vocabularies; Proprietary Vocabularies
  • 15. Dereferencability of Term URIs and Vocabulary Linking
      • 28% of the proprietary vocabularies provide dereferencable URIs.
      • 21% set RDF links to other vocabularies (8% in 2011)
          • Popular linking predicates: rdfs:range, rdfs:subClassOf
  • 16. 4. Adoption of the Metadata Best Practices
      1. Publish machine-readable provenance information.
      2. Publish machine-readable licensing information.
      3. Publish metadata about alternative access methods (SPARQL endpoints, RDF dumps)
  • 17. Provenance and Licensing Metadata
      • 37% of the datasets provide provenance information
          • Dublin Core is used more than W3C Prov
      • 10% provide machine-readable licensing information
          • Most used predicates dc:license, cc:license
  • 18. Dataset Level Metadata (VoID)
      • 15% of the datasets publish VoID descriptions.
      • Via these descriptions, it is possible to discover SPARQL endpoints and dumps for about 10% of the data sources.
  • 19. Conclusion concerning Metadata Best Practices
      • Applications cannot rely on the availability of metadata, as only a small fraction of all data sources publishes such data.
      • The Government and Library domains are positive exceptions.
      • Similarly low numbers as in 2011.
  • 20. “Full” LOD Cloud Diagram
      570 datasets
      • 374 datahub.io
      • 196 our crawl
      http://lod-cloud.net/
  • 21. Growth of the “Full” LOD Cloud Diagram
      • 2011: 295 datasets
      • 2014: 570 datasets (+ 93 %)
      http://lod-cloud.net/
  • 22. Comparison of Linked Data and Schema.org
      Schema.org
          1. does not expect data publishers to set data links.
          2. relies on marking up data in HTML pages.
          3. has strong application pull from Google, Microsoft, Yahoo!
  • 23. Adoption
      WebDataCommons, 2013*: 463,000 websites (PLDs) provide Microdata annotations.
      Google, 2014**: 5 million websites provide Schema.org data.
      • Orders of magnitude more Schema.org data sources.
      * WebDataCommons extracts Microdata, RDFa, Microformat data from the CommonCrawl (2.2 billion HTML pages from 12.8 million PLDs).
      ** Guha in LDOW2014 Keynote
  • 24. Schema.org Topical Focus
      Different topics compared to Linked Data.
  • 25. Class / Property Distribution Microdata 2012
      • Only a small set of classes / properties is actually used.
      • Less variety compared to Linked Data.
  • 26. Shallowness of the Schema.org Data
      schema:Product: Product Names
          • Apple MacBook Air MC968/A 11.6-Inch Laptop
          • Apple MacBook Air 11-in, Intel Core i5 1.60GHz, 64 GB, Lion 10.7
      schema:JobPosting: JobPostings
          • More specific properties like skills are hardly used.
          • 57% of all hiringOrganizations are strings, not instances.
  • 27. Conclusion
      Linked Data | Schema.org
      ~ 1,000 sources | > 460,000 sources
      covers wider range of specific topics (government, libraries, science) | topics focused on search engines (products, organizations)
      contains more complex data structures | very simple and shallow data structures
      partial ontology agreement | strong ontology agreement
      identity resolution eased by RDF links | identity resolution often requires value parsing
  • 28. Thank you.
      References
      • Report http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
      • Catalog http://linkeddatacatalog.dws.informatik.uni-mannheim.de/
      Acknowledgement
      • This work was supported by