SlideShare ist ein Scribd-Unternehmen logo
1 von 58
09/30/15 Heiko Paulheim 1
What the Adoption of schema.org
Tells about Linked Open Data
Heiko Paulheim, Data and Web Science Group, University of Mannheim
09/30/15 Heiko Paulheim 2
Synopsis
• Microdata/schema.org vs. Linked Open Data
– Microdata/schema.org in a Nutshell
– Commonalities & Differences
• schema.org adoption and conformance
• Co-evolution of schema.org and deployed data
• Conclusion and Take-aways
Disclaimer:
I do not speak for the
schema.org community.
All views expressed in
this talk are my own.
09/30/15 Heiko Paulheim 3
Microdata in a Nutshell
• Adding structured information to web pages
– By marking up contents
– Arbitrary vocabularies are possible
• Similar to RDFa
<div itemscope
itemtype="http://schema.org/PostalAddress">
<span itemprop="name">Data and Web Science Group</span>
<span itemprop="addressLocality">Mannheim</span>,
<span itemprop="postalCode">68131</span>
<span itemprop="addressCountry">Germany</span>
</div>
09/30/15 Heiko Paulheim 4
Microdata in a Nutshell
• Markup can be extracted to RDF
– See W3C Interest Group Note: Microdata to RDF [1]
[1] http://www.w3.org/TR/microdata-rdf/
<div itemscope
itemtype="http://schema.org/PostalAddress">
<span itemprop="name">Data and Web Science Group</span>
<span itemprop="addressLocality">Mannheim</span>,
<span itemprop="postalCode">68131</span>
<span itemprop="addressCountry">Germany</span>
</div>
_:1 a <http://schema.org/PostalAddress> .
_:1 <http://schema.org/name> "Data and Web Science Group" .
_:1 <http://schema.org/addressLocality> "Mannheim" .
_:1 <http://schema.org/postalCode> "68131" .
_:1 <http://schema.org/adressCounty> "Germany" .
09/30/15 Heiko Paulheim 5
schema.org in a Nutshell
• Microdata can be used with any vocabulary
– Practically, only schema.org is deployed on a large scale
– Plus its historical predecessor, data-vocabulary.org
• schema.org
– Vocabulary for marking up web content
– promoted and used by major search engines
– Google, Bing, Yahoo!, and Yandex
– Can be used with Microdata and RDFa
• ...but RDFa is hardly used
09/30/15 Heiko Paulheim 6
schema.org in a Nutshell
• schema.org (as of May 2015, release 2.0)
– 675 classes
– 965 properties
09/30/15 Heiko Paulheim 7
schema.org in a Nutshell
• schema.org has incorporated some popular vocabularies, including
– Good Relations (2012)
– W3C BibExtend (2014)
– MusicBrainz vocabulary (2015)
– Automotive Ontology (2015)
09/30/15 Heiko Paulheim 8
So...
09/30/15 Heiko Paulheim 9
Microdata/schema.org vs. LOD
• Commonalities
– Both encode machine-interpretable knowledge
– Schema.org uses a standard vocabulary
– Both can be encoded as RDF
09/30/15 Heiko Paulheim 10
Microdata/schema.org vs. LOD
• Differences
– Microdata is embedded in the DOM tree
• i.e., the resulting RDF is always a set of trees
• not a general directed graph
• no cycles, no reification
– Microdata uses only blank nodes and literals
09/30/15 Heiko Paulheim 11
Microdata/schema.org vs. LOD
• Linked Data Principles (TimBL 2006)
– Use URIs as names for things
– Use HTTP URIs that can be looked up
– When someone looks up a HTTP URI,
provide useful information using a standard
– Include links to other URIs
HTML5+MD is a standard
Blank nodes cannot be looked up
MD2RDF creates blank nodes
This is possible with
schema.org/sameas
09/30/15 Heiko Paulheim 12
Microdata/schema.org vs. LOD
• Linked Data Principles (TimBL 2006)
– Use URIs as names for things
– Use HTTP URIs that can be looked up
– When someone looks up a HTTP URI,
provide useful information using a standard
HTML5+MD is a standard
Blank nodes cannot be looked up
MD2RDF creates blank nodes
<div itemscope
itemtype="http://schema.org/PostalAddress">
<span itemprop="name">Data and Web Science Group</span>
<span itemprop="addressLocality">Mannheim</span>,
<span itemprop="postalCode">68131</span>
<span itemprop="addressCountry">Germany</span>
</div>
<http://foo.bar/#1> a <http://schema.org/PostalAddress> .
<http://foo.bar/#1> <http://schema.org/name> "Data and Web
Science Group" .
<http://foo.bar/#1> <http://schema.org/addressLocality>
"Mannheim" .
<http://foo.bar/#1> <http://schema.org/postalCode> "68131" .
<http://foo.bar/#1> <http://schema.org/adressCounty> "Germany" .
09/30/15 Heiko Paulheim 13
Microdata/schema.org vs. LOD
• Linked Data Principles (TimBL 2006)
– Use URIs as names for things
– Use HTTP URIs that can be looked up
– When someone looks up a HTTP URI,
provide useful information using a standard
– Include links to other URIs
This is possible with
schema.org/sameas
• Linkage within schema.org Microdata:
– Only 0.02% of all data providers
use schema.org/sameas
09/30/15 Heiko Paulheim 14
Microdata/schema.org vs. LOD
• Five Star Scheme (TimBL 2010)
* Available on the web with an open license
** Available as machine-readable, structured data
*** as (**), using a non-proprietary format
**** plus: using open standards by the W3C
***** plus: links to other datasets
• What's the license of web data?
09/30/15 Heiko Paulheim 15
Analyzing schema.org Microdata
• Web Data Commons project
– Extracts different sorts of data sets from public web crawl
– RDFa, Microdata, Microformats
– Graph structure of the Web
– Web Content Tables
• http://webdatacommons.org/
We use that corpus
for quantitative analysis
09/30/15 Heiko Paulheim 16
Microdata/schema.org vs. LOD
LOD Cloud 2012 LOD Cloud 2014
09/30/15 Heiko Paulheim 17
...why we should care
• MD/schema.org and LOD have a lot in common
• schema.org is a success story
– Hundreds of thousands of adopters
• schema.org is large-scale
– Billions of triples (using the same schema!)
09/30/15 Heiko Paulheim 18
schema.org vs. LOD: Topical Domains
• Domains by number of datasets in Linked Open Data
– As of 2014
– Classified based on data provider tags
– More than half of the datasets is social web (mostly FOAF files)
Social web
Government
Publications
Life sciences
User-generated content
Cross-domain
Media
Geographic
09/30/15 Heiko Paulheim 19
WebPage
Blog
PostalAddress
Product
Article
BlogPosting
Offer
LocalBusiness
Organization
AggregateRating
Person
ImageObject
Review
other
schema.org vs. LOD: Topical Domains
• Main topics of schema.org:
– Meta information on web page content (web page, blog...)
– Business data (products, offers, …)
– Contact data (businesses, persons, ...)
– (Product) reviews and ratings
• ...and a massive long tail
09/30/15 Heiko Paulheim 20
schema.org vs. LOD: Topical Domains
• schema.org
– High increases in some areas
09/30/15 Heiko Paulheim 21
schema.org Compliance vs. LOD Compliance
• A closer look:
– How is schema.org actually used?
– Where is the schema violated?
• Comparison with LOD
– Hogan et al. (2010): Weaving the Pedantic Web
– Schmachtenberg et al. (2014): Adoption of the Linked Data Best
Practices in Different Topical Domains
09/30/15 Heiko Paulheim 22
schema.org Compliance vs. LOD Compliance
• Compliance dimensions analyzed
– Usage of wrong namespaces, such as http:/schema.org/
– Usage of undefined types
– Usage of undefined properties
– Confusion of datatype properties and object properties
– Property domain violations
– Object property range violations
– Datatype range violations
09/30/15 Heiko Paulheim 23
Building a Corpus of schema.org Microdata
• Starting point: all pages in the CommonCrawl that contain Microdata
• What could be (meant to be) schema.org?
– Everything that contains “schema.org” as a substring in a namespace
– Everything that contains URIs where the protocol and authority is
similar to http://schema.org/ (within an EditDistance of 1)
– Noise filtering: removing all namespaces that occur only once
09/30/15 Heiko Paulheim 24
Building a Corpus of schema.org Microdata
• More than 98% of the preselected pages use a correct namespace
• Frequent namespace variations
– http://www.schema.org/
– https://schema.org
– http://SChema.org/
– http:/schema.org
– ...
debated!
09/30/15 Heiko Paulheim 25
Building a Corpus of schema.org Microdata
• Final corpus of schema.org Microdata
– 6.4 Billion triples
– extracted from 217,018,636 URLs
– belonging to 398,542 PLDs
• This corresponds to ~86% of the overall Microdata collection
09/30/15 Heiko Paulheim 26
schema.org Compliance vs. LOD Compliance
• Usage of undefined types (6.07% of all PLDs)
• Typical causes:
– Misspellings: http://schema.orgStore
– Miscapitalization: http://schema.org/localbusiness
• Comparison:
– 5.8% of all Microdata documents
– 38.8% of all LOD documents
(Hogan et al., 2010)
09/30/15 Heiko Paulheim 27
schema.org Compliance vs. LOD Compliance
• Usage of undefined properties (3.9% of all PLDs)
• Main causes:
– Miscapitalization: http://schema.org/contentURL
– Close, but miss
• http://schema.org/currency
• http://schema.org/fax
– Made up
• http://schema.org/blogId
• http://schema.org/postId
• Comparison:
– 9.7% of all Microdata documents
– 72.4% of all LOD documents
(Hogan et al., 2010)
contentUrl
faxNumber
priceCurrency
09/30/15 Heiko Paulheim 28
schema.org Compliance vs. LOD Compliance
• Using Object Properties as Data Properties (56.6% of all PLDs!)
– i.e., using an object property with a string value
• Typical properties:
– http://schema.org/addresscountry
– http://schema.org/manufacturer
– http://schema.org/author
– http://schema.org/brand
• Comparison:
– 24.35% of all Microdata documents
– 8% of all LOD documents
(Hogan et al., 2010)
09/30/15 Heiko Paulheim 29
schema.org Compliance vs. LOD Compliance
• Using Data Properties as Object Properties
– i.e., using a data property with a complex object
• Neglectable on Microdata (0.2% of all PLDs)
• Comparison:
– 0.6% of all Microdata documents
– 2.2% of all LOD documents
(Hogan et al., 2010)
09/30/15 Heiko Paulheim 30
schema.org Compliance vs. LOD Compliance
• Property Domain Violations (4.0% of all PLDs)
– i.e., using a property with a subject not included in its domain
• Typical problem: shortcuts, e.g.
– price on used on Product
– streetAddress used on LocalBusiness
– …
• Difficult to compare directly
– Semantics are different
– schema.org: domain list is exhaustive
– LOD: open world assumption
Product → Offer → price
LocalBusiness
→ PostalAddress
→ streetAddress
09/30/15 Heiko Paulheim 31
schema.org Compliance vs. LOD Compliance
• Datatype range violations (9.6% of all PLDs)
– i.e., using a data property with an “incompatible” literal
– we use an optimistic parser for URLs, numbers and dates
• Top 20 problems
– 13/20 are dates
– Three are URLS
– Two each are numbers and times
• Comparison:
– 12.06% of all Microdata documents
– 4.6% of all LOD documents
(Hogan et al., 2010)
– Dates are the main problem in both cases
09/30/15 Heiko Paulheim 32
schema.org Compliance vs. LOD Compliance
• Object Property range violations (8.6% of all PLDs)
– i.e., using an Object Property with an object type outside its range
• Typical problem:
– mainContentOfPage with http://schema.org/Blog
– ...instead of http://schema.org/WebPageElement
– Hints at a missing hierarchy relation?
• Difficult to compare directly
– Semantics are different
– schema.org: domain list is exhaustive
– LOD: open world assumption
09/30/15 Heiko Paulheim 33
schema.org Compliance vs. LOD Compliance
• Microdata compliance to schema.org is surprisingly high
– Providers are often not technology evangelists
– (unlike for LOD)
• Most often higher than for LOD
– Notable exception: confusion of Datatype and Object Properties
09/30/15 Heiko Paulheim 34
Fixing Microdata on the Fly
• Many of the problems presented can be fixed
– On the fly on the consumer side
– With a set of simple heuristics
• Advertisement:
– Robert Meusel, Heiko Paulheim:
Heuristics for Fixing Common Errors
in Deployed schema.org Microdata
– Thursday, 2pm, Room Emerald 1
09/30/15 Heiko Paulheim 35
Digging for Reasons
• So, Microdata is often more schema compliant
• But why? Some hypotheses…
– Availability of documentation
– Tool support
– Business incentive
– Schema flexibility
• Can we confirm/reject those from looking at the data?
09/30/15 Heiko Paulheim 36
A Diachronic Perspective
• Versions of schema.org are archived over time
– Plus: there are several crawl releases per year
– i.e., we can look at change over time
• If we look at both schema and deployed data, we may observe
– Adoption rates of schema changes
– Data-first changes to the schema
– Convergence or divergence of deployed data
09/30/15 Heiko Paulheim 37
A Diachronic Perspective
• Three releases of WDC Microdata corpus
– 2012, 2013, and 2014
• Versions of schema.org that were valid
– At the beginning of the crawl
– At the end of the crawl
09/30/15 Heiko Paulheim 38
Overall Convergence
• Measuring convergence
– i.e., homogeneity of descriptions of classes
– Example: two instances of LocalBusiness
_:1
_:2
rdf:type
s:address rdf:type
“Birmingham”
“Main Street 24”
s:LocalBusiness
s:PostalAddress
s:address-
Locality
s:street-
Address
_:1
_:2
rdf:type
s:address rdf:type
“Liverpool”
“Church Street 1”
s:LocalBusiness
s:PostalAddress
s:address-
Locality
s:street-
Address
09/30/15 Heiko Paulheim 39
Overall Convergence
• Recap
– RDF from Microdata is a set of trees
– i.e., we can enumerate all paths to leaf nodes
(omitting literals)
• Example:
– rdf:type-s:LocalBusiness,
s:address-rdf:type-s:PostalAddress,
s:address-s:addressLocality,
s:address-s:streetAddress _:1
_:2
rdf:type
s:address rdf:type
“Liverpool”
“Church Street 1”
s:LocalBusiness
s:PostalAddress
s:address-
Locality
s:street-
Address
09/30/15 Heiko Paulheim 40
Overall Convergence
• Using all paths, we can compute the entropy for each class as
• A low entropy refers to a high homogeneity
• We normalize both by maximum entropy
and the total number of paths
– i.e., we use normalized entropy rate as a measure for homogeneity
09/30/15 Heiko Paulheim 41
Overall Convergence
• Observations
– Overall entropy decreases over time
• Classes with high convergence rates
– WebSite, Blog, …
– Hotel, Restaurant, …
– Product, Offer, …
– Rating, Review
Influence of CMS adoption
Yellow pages
Google Rich Snippets
...all of the above
09/30/15 Heiko Paulheim 42
Top-down Adoption
• How fast are changes in the schema adopted?
– New classes/properties
– Deprecations
– Domain/range changes
• Measuring adoption: challenges
– Different crawls
– Overall growth of deployed Microdata
• Measure: normalized usage increase (nui) from i to j:
– nui(s)>1.05: usage of schema element s has increased significantly
– nui(s)<0.95: usage of schema element s has decreased significantly
09/30/15 Heiko Paulheim 43
Top-down Adoption
• Adoption of new classes and properties
– Almost half of all introduced classes are never used!
– Similar for new properties
• Reasons
– Bulk-addition of vocabularies
• not every term is equally needed
• e.g., medical vocabulary
– Blind spot of our approach
• some terms are mainly for e-mail markup
• e.g., Actions
SURPRISE!
09/30/15 Heiko Paulheim 44
Top-down Adoption
• Main domains of positive adoption
– Meta data for web content
(schema.org/Website has the highest NUI)
– Broadcasting (e.g., TV Episodes)
– Questions & Answers
– Postal addresses
• Classes featured in Google Rich Snippets
– Still growth on high level (tens of thousands of PLDs)
– But NUI<0.95
Yellow Pages
Search Engine Listings
Collaboration
with BBC and EBU
Influence of CMS adoption
Q&A Pages, such as
Stackoverflow
09/30/15 Heiko Paulheim 45
Top-down Adoption
• Adoption of domain/range changes
– Again: rather low overall adoption
• Adopted well for
– Products (height, width, itemCondition, …)
– Broadcasting domain (episode, season, actor, ...)
Search Engine Listings
Collaboration
with BBC and EBU
09/30/15 Heiko Paulheim 46
Top-down Adoption
• Adoption of deprecations
– Works well (29 out of 32 have a significantly low NUI)
• Exceptions
– s:map (← s:hasMap)
– s:maps (← s:hasMap)
For Google Maps
(lots of outdated tutorials)
09/30/15 Heiko Paulheim 47
Bottom-up Evolution
• Martin Luther
– Started the protestant church
– A success story, too (like schema.org)
(i.e., 800 million adopters worldwide)
• Famous quote:
– “Man muss […] dem gemeinen Mann aufs Maul schauen”
(roughly:
“You have to listen to the way the common man really speaks.”)
Martin Luther,
1483-1546
Disclaimer:
I do not speak for the
protestant church.
09/30/15 Heiko Paulheim 48
Bottom-up Evolution
• Are new features in the schema first used “inofficially”?
– New classes/properties
– Domain/range changes
• Instrument for measurement: ROC curves
– True positives mapped against false positives
– tp: elements used before
– fp: elements not used before
– Ranking by #PLDs
09/30/15 Heiko Paulheim 49
Bottom-up Evolution
• There are some mild influences observable
– Stronger for domain/range changes
• especially range changes
– Weaker for new classes/properties
2012→ 2013 2013→ 2014 2012→ 2014
classes properties domains ranges
09/30/15 Heiko Paulheim 50
Bottom-up Evolution
• Extension mechanism
– Allows for user-defined classes/properties
– Those become subclasses implicitly
• Analysis over time
– No measurable impact on standard evolution
– “Inofficial” use is likelier than use of extension mechanism
09/30/15 Heiko Paulheim 51
Key Adoption Drivers
• Search Engine Optimization
– Web site providers want to be high in Google rankings
– Direct business incentive!
• Tool adoption
– Major CMSs use schema.org
• Standard Agility
– schema.org: 25 revisions in last three years
– cf. FOAF: six revisions in last eight years
09/30/15 Heiko Paulheim 52
What Does that Mean for LOD?
• Schema.org has been a success story
• Mainly because of
– Business incentive
– Documentation
– Platform adoption (marketing!)
– Standard agility
09/30/15 Heiko Paulheim 53
What Does that Mean for LOD?
• Business Incentive for Data Providers
– not: the cool thing I can do if you provide your data as LOD
– but: what's in for you if you provide your data as LOD
• Documentation
– schema.org has copy-and-paste snippets for HTML5
– what's the effort of setting up a LOD endpoint?
• Platform Adoption
– Think: What if MySQL had an out-of-the-box LOD endpoint?
• Standard Agility
– Observe how your schemas are “abused”
– ...and don't blame the abusers
09/30/15 Heiko Paulheim 54
Wrap-up
• We have looked at the deployment of schema.org
– From a synchronic and a diachronic perspective
• From looking at the data, we can observe
– Driving factors for adoption
– Patterns in standard evolution
– Influence of data consumers, documentation, tool adoption
• ...and identify factors that lead to the wide adoption
• LOD and schema.org/MD have a lot in common
– so we may take away some lessons for boosting the adoption of LOD...
09/30/15 Heiko Paulheim 55
Acknowledgements
• Colleagues at Data&Web Science Group
who contributed to this talk
– Robert Meusel (Number crunching)
– Christian Bizer (Fruitful discussions)
• ...and the Common Crawl Foundation
– Who makes such an analysis possible
09/30/15 Heiko Paulheim 56
Advertisement
• Want to get your hands dirty with data?
• Linked Data for Information Extraction (LD4IE) challenge 2015
– Use schema.org Microdata
– Train a web-scale IE system
– Present at LD4IE workshop (co-located with ISWC 2015)
– Win a prize sponsored by Springer!
• Details
– http://oak.dcs.shef.ac.uk/ld4ie2015
09/30/15 Heiko Paulheim 57
Thank you! Questions?
09/30/15 Heiko Paulheim 58
What the Adoption of schema.org
Tells about Linked Open Data
Heiko Paulheim, Data and Web Science Group

Weitere ähnliche Inhalte

Was ist angesagt?

Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Heiko Paulheim
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
Martin Hepp
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked data
vafopoulos
 
2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data intro
vafopoulos
 

Was ist angesagt? (19)

Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Ld4 dh tutorial
Ld4 dh tutorialLd4 dh tutorial
Ld4 dh tutorial
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMiner
 
Linked Data Snowball, or Why We Need Reconciliation
Linked Data Snowball, or Why We Need ReconciliationLinked Data Snowball, or Why We Need Reconciliation
Linked Data Snowball, or Why We Need Reconciliation
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublin
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
 
DBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataDBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of Data
 
Linked open data for cultural heritage
Linked open data for cultural heritageLinked open data for cultural heritage
Linked open data for cultural heritage
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked data
 
2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data intro
 
The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?
 
An introduction to Linked Open Data
An introduction to Linked Open DataAn introduction to Linked Open Data
An introduction to Linked Open Data
 
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...Slow-cooked data and APIs in the world of Big Data: the view from a city per...
Slow-cooked data and APIs in the world of Big Data: the view from a city per...
 
TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...
TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...
TIB AV-Portal: Semantic Content Mining with Semi-Automatic Metadata Editing. ...
 

Andere mochten auch

Schemas >> Schema.org >> Take Your Website to a New Level with Schema Markup
Schemas >> Schema.org >> Take Your Website to a New Level with Schema Markup Schemas >> Schema.org >> Take Your Website to a New Level with Schema Markup
Schemas >> Schema.org >> Take Your Website to a New Level with Schema Markup
Sante J. Achille
 
Jaws in Space - How to develop & pitch creative ideas
Jaws in Space - How to develop & pitch creative ideasJaws in Space - How to develop & pitch creative ideas
Jaws in Space - How to develop & pitch creative ideas
Hannah Smith
 
10 Ways to Build a Link in 20 Minutes Flat + Bonus Content
10 Ways to Build a Link in 20 Minutes Flat + Bonus Content10 Ways to Build a Link in 20 Minutes Flat + Bonus Content
10 Ways to Build a Link in 20 Minutes Flat + Bonus Content
Matthew Barby
 

Andere mochten auch (20)

#RIMC15 - The Other Side of Optimisation: Improving Internal Productivity & E...
#RIMC15 - The Other Side of Optimisation: Improving Internal Productivity & E...#RIMC15 - The Other Side of Optimisation: Improving Internal Productivity & E...
#RIMC15 - The Other Side of Optimisation: Improving Internal Productivity & E...
 
Publishing Linked Data using Schema.org
Publishing Linked Data using Schema.orgPublishing Linked Data using Schema.org
Publishing Linked Data using Schema.org
 
Schema.org - HTML semântico - Front in Maceio 2012
Schema.org - HTML semântico - Front in Maceio 2012Schema.org - HTML semântico - Front in Maceio 2012
Schema.org - HTML semântico - Front in Maceio 2012
 
Schema.org
Schema.orgSchema.org
Schema.org
 
Statistics: the grammar of Data Science
Statistics: the grammar of Data ScienceStatistics: the grammar of Data Science
Statistics: the grammar of Data Science
 
Schemas >> Schema.org >> Take Your Website to a New Level with Schema Markup
Schemas >> Schema.org >> Take Your Website to a New Level with Schema Markup Schemas >> Schema.org >> Take Your Website to a New Level with Schema Markup
Schemas >> Schema.org >> Take Your Website to a New Level with Schema Markup
 
Jaws in Space - How to develop & pitch creative ideas
Jaws in Space - How to develop & pitch creative ideasJaws in Space - How to develop & pitch creative ideas
Jaws in Space - How to develop & pitch creative ideas
 
10 Ways to Build a Link in 20 Minutes Flat + Bonus Content
10 Ways to Build a Link in 20 Minutes Flat + Bonus Content10 Ways to Build a Link in 20 Minutes Flat + Bonus Content
10 Ways to Build a Link in 20 Minutes Flat + Bonus Content
 
ADVANCED COMPETITIVE ANALYSIS - Three questions only the SERPs can answer (Br...
ADVANCED COMPETITIVE ANALYSIS - Three questions only the SERPs can answer (Br...ADVANCED COMPETITIVE ANALYSIS - Three questions only the SERPs can answer (Br...
ADVANCED COMPETITIVE ANALYSIS - Three questions only the SERPs can answer (Br...
 
BrightonSEO - David Naylor 10th April 2015
BrightonSEO - David Naylor 10th April 2015BrightonSEO - David Naylor 10th April 2015
BrightonSEO - David Naylor 10th April 2015
 
BrightonSEO - Proactive Competitive Intelligence or “Where the *^%&# Should I...
BrightonSEO - Proactive Competitive Intelligence or “Where the *^%&# Should I...BrightonSEO - Proactive Competitive Intelligence or “Where the *^%&# Should I...
BrightonSEO - Proactive Competitive Intelligence or “Where the *^%&# Should I...
 
Social Image Sharing for BrightonSEO 2015
Social Image Sharing for BrightonSEO 2015Social Image Sharing for BrightonSEO 2015
Social Image Sharing for BrightonSEO 2015
 
#BrightonSEO: Guerilla User Testing for Search - Stephen Kenwright
#BrightonSEO: Guerilla User Testing for Search - Stephen Kenwright#BrightonSEO: Guerilla User Testing for Search - Stephen Kenwright
#BrightonSEO: Guerilla User Testing for Search - Stephen Kenwright
 
Brighton SEO > The Head Term is Dead > Leveraging Content to Own the Long Tai...
Brighton SEO > The Head Term is Dead > Leveraging Content to Own the Long Tai...Brighton SEO > The Head Term is Dead > Leveraging Content to Own the Long Tai...
Brighton SEO > The Head Term is Dead > Leveraging Content to Own the Long Tai...
 
Brighton April 2015 - Schema, JSON-LD & the semantic web
Brighton April 2015 - Schema, JSON-LD & the semantic webBrighton April 2015 - Schema, JSON-LD & the semantic web
Brighton April 2015 - Schema, JSON-LD & the semantic web
 
SEO | The four main types of Cannibalisation affecting the visibility of your...
SEO | The four main types of Cannibalisation affecting the visibility of your...SEO | The four main types of Cannibalisation affecting the visibility of your...
SEO | The four main types of Cannibalisation affecting the visibility of your...
 
Schema, JSON-LD & the semantic web - Brighton SEO April 2015 - Kirsty Hulse -...
Schema, JSON-LD & the semantic web - Brighton SEO April 2015 - Kirsty Hulse -...Schema, JSON-LD & the semantic web - Brighton SEO April 2015 - Kirsty Hulse -...
Schema, JSON-LD & the semantic web - Brighton SEO April 2015 - Kirsty Hulse -...
 
Making your Competitions Fun - Iain Haywood - Brighton SEO 2015 Presentation
Making your Competitions Fun - Iain Haywood - Brighton SEO 2015 PresentationMaking your Competitions Fun - Iain Haywood - Brighton SEO 2015 Presentation
Making your Competitions Fun - Iain Haywood - Brighton SEO 2015 Presentation
 
BrightonSEO 2015: How good UX can improve SEO
BrightonSEO 2015: How good UX can improve SEOBrightonSEO 2015: How good UX can improve SEO
BrightonSEO 2015: How good UX can improve SEO
 
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
Ecommerce SEO: Boosting visibility with faceted navigation | Slides from Brig...
 

Ähnlich wie What the Adoption of schema.org Tells about Linked Open Data

Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
eswcsummerschool
 
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
Robert Meusel
 

Ähnlich wie What the Adoption of schema.org Tells about Linked Open Data (20)

Heuristics for Fixing Common Errors in Deployed schema.org Microdata
Heuristics for Fixing Common Errors in Deployed schema.org MicrodataHeuristics for Fixing Common Errors in Deployed schema.org Microdata
Heuristics for Fixing Common Errors in Deployed schema.org Microdata
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
 
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
The Web Data Commons Microdata, RDFa, and Microformat Dataset Series @ ISWC2014
 
Overview of modern software ecosystem for big data analysis
Overview of modern software ecosystem for big data analysisOverview of modern software ecosystem for big data analysis
Overview of modern software ecosystem for big data analysis
 
Achieving the Digital Thread through PLM and ALM Integration using OSLC
Achieving the Digital Thread through PLM and ALM Integration using OSLCAchieving the Digital Thread through PLM and ALM Integration using OSLC
Achieving the Digital Thread through PLM and ALM Integration using OSLC
 
Achieving the digital thread through PLM and ALM integration using oslc
Achieving the digital thread through PLM and ALM integration using oslcAchieving the digital thread through PLM and ALM integration using oslc
Achieving the digital thread through PLM and ALM integration using oslc
 
Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13Fox-Keynote-Now and Now of Data Publishing-nfdp13
Fox-Keynote-Now and Now of Data Publishing-nfdp13
 
SKOS as the focal point of linked data strategies
SKOS as the focal point of linked data strategiesSKOS as the focal point of linked data strategies
SKOS as the focal point of linked data strategies
 
Weaving a Web of Linked Data - September 26th, 2019
Weaving a Web of Linked Data - September 26th, 2019Weaving a Web of Linked Data - September 26th, 2019
Weaving a Web of Linked Data - September 26th, 2019
 
Linked data tooling XML
Linked data tooling XMLLinked data tooling XML
Linked data tooling XML
 
Linked Energy Data Generation
Linked Energy Data GenerationLinked Energy Data Generation
Linked Energy Data Generation
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflows
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commons
 
Linked data-tooling-xml
Linked data-tooling-xmlLinked data-tooling-xml
Linked data-tooling-xml
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
 
WEBINAR: Open Research Data in Horizon 2020
WEBINAR: Open Research Data in Horizon 2020WEBINAR: Open Research Data in Horizon 2020
WEBINAR: Open Research Data in Horizon 2020
 
Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...
Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...
Industry Ontologies: Case Studies in Creating and Extending Schema.org for In...
 
Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org Industry Ontologies: Case Studies in Creating and Extending Schema.org
Industry Ontologies: Case Studies in Creating and Extending Schema.org
 
Introduction to W3C Linked Data Platform
Introduction to W3C Linked Data PlatformIntroduction to W3C Linked Data Platform
Introduction to W3C Linked Data Platform
 
Approaches to combining supplementary datasets across multiple trusted resear...
Approaches to combining supplementary datasets across multiple trusted resear...Approaches to combining supplementary datasets across multiple trusted resear...
Approaches to combining supplementary datasets across multiple trusted resear...
 

Mehr von Heiko Paulheim

Mehr von Heiko Paulheim (16)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly Detection
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge Discovery
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
 
Detecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpediaDetecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpedia
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
 
Type Inference on Noisy RDF Data
Type Inference on Noisy RDF DataType Inference on Noisy RDF Data
Type Inference on Noisy RDF Data
 
Extending DBpedia with Wikipedia List Pages
Extending DBpedia with Wikipedia List PagesExtending DBpedia with Wikipedia List Pages
Extending DBpedia with Wikipedia List Pages
 

Kürzlich hochgeladen

75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
Asmae Rabhi
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Monica Sydney
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
pxcywzqs
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
ydyuyu
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Monica Sydney
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
ayvbos
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 

Kürzlich hochgeladen (20)

75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
 
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
一比一原版(Offer)康考迪亚大学毕业证学位证靠谱定制
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolino
 
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
APNIC Policy Roundup, presented by Sunny Chendi at the 5th ICANN APAC-TWNIC E...
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
Best SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency DallasBest SEO Services Company in Dallas | Best SEO Agency Dallas
Best SEO Services Company in Dallas | Best SEO Agency Dallas
 
Microsoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck MicrosoftMicrosoft Azure Arc Customer Deck Microsoft
Microsoft Azure Arc Customer Deck Microsoft
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
 

What the Adoption of schema.org Tells about Linked Open Data

  • 1. 09/30/15 Heiko Paulheim 1 What the Adoption of schema.org Tells about Linked Open Data Heiko Paulheim, Data and Web Science Group, University of Mannheim
  • 2. 09/30/15 Heiko Paulheim 2 Synopsis • Microdata/schema.org vs. Linked Open Data – Microdata/schema.org in a Nutshell – Commonalities & Differences • schema.org adoption and conformance • Co-evolution of schema.org and deployed data • Conclusion and Take-aways Disclaimer: I do not speak for the schema.org community. All views expressed in this talk are my own.
  • 3. 09/30/15 Heiko Paulheim 3 Microdata in a Nutshell • Adding structured information to web pages – By marking up contents – Arbitrary vocabularies are possible • Similar to RDFa <div itemscope itemtype="http://schema.org/PostalAddress"> <span itemprop="name">Data and Web Science Group</span> <span itemprop="addressLocality">Mannheim</span>, <span itemprop="postalCode">68131</span> <span itemprop="addressCountry">Germany</span> </div>
  • 4. 09/30/15 Heiko Paulheim 4 Microdata in a Nutshell • Markup can be extracted to RDF – See W3C Interest Group Note: Microdata to RDF [1] [1] http://www.w3.org/TR/microdata-rdf/ <div itemscope itemtype="http://schema.org/PostalAddress"> <span itemprop="name">Data and Web Science Group</span> <span itemprop="addressLocality">Mannheim</span>, <span itemprop="postalCode">68131</span> <span itemprop="addressCountry">Germany</span> </div> _:1 a <http://schema.org/PostalAddress> . _:1 <http://schema.org/name> "Data and Web Science Group" . _:1 <http://schema.org/addressLocality> "Mannheim" . _:1 <http://schema.org/postalCode> "68131" . _:1 <http://schema.org/adressCounty> "Germany" .
  • 5. 09/30/15 Heiko Paulheim 5 schema.org in a Nutshell • Microdata can be used with any vocabulary – Practically, only schema.org is deployed on a large scale – Plus its historical predecessor, data-vocabulary.org • schema.org – Vocabulary for marking up web content – promoted and used by major search engines – Google, Bing, Yahoo!, and Yandex – Can be used with Microdata and RDFa • ...but RDFa is hardly used
  • 6. 09/30/15 Heiko Paulheim 6 schema.org in a Nutshell • schema.org (as of May 2015, release 2.0) – 675 classes – 965 properties
  • 7. 09/30/15 Heiko Paulheim 7 schema.org in a Nutshell • schema.org has incorporated some popular vocabularies, including – Good Relations (2012) – W3C BibExtend (2014) – MusicBrainz vocabulary (2015) – Automotive Ontology (2015)
  • 9. 09/30/15 Heiko Paulheim 9 Microdata/schema.org vs. LOD • Commonalities – Both encode machine-interpretable knowledge – Schema.org uses a standard vocabulary – Both can be encoded as RDF
  • 10. 09/30/15 Heiko Paulheim 10 Microdata/schema.org vs. LOD • Differences – Microdata is embedded in the DOM tree • i.e., the resulting RDF is always a set of trees • not a general directed graph • no cycles, no reification – Microdata uses only blank nodes and literals
  • 11. 09/30/15 Heiko Paulheim 11 Microdata/schema.org vs. LOD • Linked Data Principles (TimBL 2006) – Use URIs as names for things – Use HTTP URIs that can be looked up – When someone looks up a HTTP URI, provide useful information using a standard – Include links to other URIs HTML5+MD is a standard Blank nodes cannot be looked up MD2RDF creates blank nodes This is possible with schema.org/sameas
  • 12. 09/30/15 Heiko Paulheim 12 Microdata/schema.org vs. LOD • Linked Data Principles (TimBL 2006) – Use URIs as names for things – Use HTTP URIs that can be looked up – When someone looks up a HTTP URI, provide useful information using a standard HTML5+MD is a standard Blank nodes cannot be looked up MD2RDF creates blank nodes <div itemscope itemtype="http://schema.org/PostalAddress"> <span itemprop="name">Data and Web Science Group</span> <span itemprop="addressLocality">Mannheim</span>, <span itemprop="postalCode">68131</span> <span itemprop="addressCountry">Germany</span> </div> <http://foo.bar/#1> a <http://schema.org/PostalAddress> . <http://foo.bar/#1> <http://schema.org/name> "Data and Web Science Group" . <http://foo.bar/#1> <http://schema.org/addressLocality> "Mannheim" . <http://foo.bar/#1> <http://schema.org/postalCode> "68131" . <http://foo.bar/#1> <http://schema.org/adressCounty> "Germany" .
  • 13. 09/30/15 Heiko Paulheim 13 Microdata/schema.org vs. LOD • Linked Data Principles (TimBL 2006) – Use URIs as names for things – Use HTTP URIs that can be looked up – When someone looks up a HTTP URI, provide useful information using a standard – Include links to other URIs This is possible with schema.org/sameas • Linkage within schema.org Microdata: – Only 0.02% of all data providers use schema.org/sameas
  • 14. 09/30/15 Heiko Paulheim 14 Microdata/schema.org vs. LOD • Five Star Scheme (TimBL 2010) * Available on the web with an open license ** Available as machine-readable, structured data *** as (**), using a non-proprietary format **** plus: using open standards by the W3C ***** plus: links to other datasets • What's the license of web data?
  • 15. 09/30/15 Heiko Paulheim 15 Analyzing schema.org Microdata • Web Data Commons project – Extracts different sorts of data sets from public web crawl – RDFa, Microdata, Microformats – Graph structure of the Web – Web Content Tables • http://webdatacommons.org/ We use that corpus for quantitative analysis
  • 16. 09/30/15 Heiko Paulheim 16 Microdata/schema.org vs. LOD LOD Cloud 2012 LOD Cloud 2014
  • 17. 09/30/15 Heiko Paulheim 17 ...why we should care • MD/schema.org and LOD have a lot in common • schema.org is a success story – Hundreds of thousands of adopters • schema.org is large-scale – Billions of triples (using the same schema!)
  • 18. 09/30/15 Heiko Paulheim 18 schema.org vs. LOD: Topical Domains • Domains by number of datasets in Linked Open Data – As of 2014 – Classified based on data provider tags – More than half of the datasets is social web (mostly FOAF files) Social web Government Publications Life sciences User-generated content Cross-domain Media Geographic
  • 19. 09/30/15 Heiko Paulheim 19 WebPage Blog PostalAddress Product Article BlogPosting Offer LocalBusiness Organization AggregateRating Person ImageObject Review other schema.org vs. LOD: Topical Domains • Main topics of schema.org: – Meta information on web page content (web page, blog...) – Business data (products, offers, …) – Contact data (businesses, persons, ...) – (Product) reviews and ratings • ...and a massive long tail
  • 20. 09/30/15 Heiko Paulheim 20 schema.org vs. LOD: Topical Domains • schema.org – High increases in some areas
  • 21. 09/30/15 Heiko Paulheim 21 schema.org Compliance vs. LOD Compliance • A closer look: – How is schema.org actually used? – Where is the schema violated? • Comparison with LOD – Hogan et al. (2010): Weaving the Pedantic Web – Schmachtenberg et al. (2014): Adoption of the Linked Data Best Practices in Different Topical Domains
  • 22. 09/30/15 Heiko Paulheim 22 schema.org Compliance vs. LOD Compliance • Compliance dimensions analyzed – Usage of wrong namespaces, such as http:/schema.org/ – Usage of undefined types – Usage of undefined properties – Confusion of datatype properties and object properties – Property domain violations – Object property range violations – Datatype range violations
  • 23. 09/30/15 Heiko Paulheim 23 Building a Corpus of schema.org Microdata • Starting point: all pages in the CommonCrawl that contain Microdata • What could be (meant to be) schema.org? – Everything that contains “schema.org” as a substring in a namespace – Everything that contains URIs where the protocol and authority is similar to http://schema.org/ (within an EditDistance of 1) – Noise filtering: removing all namespaces that occur only once
  • 24. 09/30/15 Heiko Paulheim 24 Building a Corpus of schema.org Microdata • More than 98% of the preselected pages use a correct namespace • Frequent namespace variations – http://www.schema.org/ – https://schema.org – http://SChema.org/ – http:/schema.org – ... debated!
  • 25. 09/30/15 Heiko Paulheim 25 Building a Corpus of schema.org Microdata • Final corpus of schema.org Microdata – 6.4 Billion triples – extracted from 217,018,636 URLs – belonging to 398,542 PLDs • This corresponds to ~86% of the overall Microdata collection
  • 26. 09/30/15 Heiko Paulheim 26 schema.org Compliance vs. LOD Compliance • Usage of undefined types (6.07% of all PLDs) • Typical causes: – Misspellings: http://schema.orgStore – Miscapitalization: http://schema.org/localbusiness • Comparison: – 5.8% of all Microdata documents – 38.8% of all LOD documents (Hogan et al., 2010)
  • 27. 09/30/15 Heiko Paulheim 27 schema.org Compliance vs. LOD Compliance • Usage of undefined properties (3.9% of all PLDs) • Main causes: – Miscapitalization: http://schema.org/contentURL – Close, but miss • http://schema.org/currency • http://schema.org/fax – Made up • http://schema.org/blogId • http://schema.org/postId • Comparison: – 9.7% of all Microdata documents – 72.4% of all LOD documents (Hogan et al., 2010) contentUrl faxNumber priceCurrency
  • 28. 09/30/15 Heiko Paulheim 28 schema.org Compliance vs. LOD Compliance • Using Object Properties as Data Properties (56.6% of all PLDs!) – i.e., using an object property with a string value • Typical properties: – http://schema.org/addresscountry – http://schema.org/manufacturer – http://schema.org/author – http://schema.org/brand • Comparison: – 24.35% of all Microdata documents – 8% of all LOD documents (Hogan et al., 2010)
  • 29. 09/30/15 Heiko Paulheim 29 schema.org Compliance vs. LOD Compliance • Using Data Properties as Object Properties – i.e., using a data property with a complex object • Neglectable on Microdata (0.2% of all PLDs) • Comparison: – 0.6% of all Microdata documents – 2.2% of all LOD documents (Hogan et al., 2010)
  • 30. 09/30/15 Heiko Paulheim 30 schema.org Compliance vs. LOD Compliance • Property Domain Violations (4.0% of all PLDs) – i.e., using a property with a subject not included in its domain • Typical problem: shortcuts, e.g. – price on used on Product – streetAddress used on LocalBusiness – … • Difficult to compare directly – Semantics are different – schema.org: domain list is exhaustive – LOD: open world assumption Product → Offer → price LocalBusiness → PostalAddress → streetAddress
  • 31. 09/30/15 Heiko Paulheim 31 schema.org Compliance vs. LOD Compliance • Datatype range violations (9.6% of all PLDs) – i.e., using a data property with an “incompatible” literal – we use an optimistic parser for URLs, numbers and dates • Top 20 problems – 13/20 are dates – Three are URLS – Two each are numbers and times • Comparison: – 12.06% of all Microdata documents – 4.6% of all LOD documents (Hogan et al., 2010) – Dates are the main problem in both cases
  • 32. 09/30/15 Heiko Paulheim 32 schema.org Compliance vs. LOD Compliance • Object Property range violations (8.6% of all PLDs) – i.e., using an Object Property with an object type outside its range • Typical problem: – mainContentOfPage with http://schema.org/Blog – ...instead of http://schema.org/WebPageElement – Hints at a missing hierarchy relation? • Difficult to compare directly – Semantics are different – schema.org: domain list is exhaustive – LOD: open world assumption
  • 33. 09/30/15 Heiko Paulheim 33 schema.org Compliance vs. LOD Compliance • Microdata compliance to schema.org is surprisingly high – Providers are often not technology evangelists – (unlike for LOD) • Most often higher than for LOD – Notable exception: confusion of Datatype and Object Properties
  • 34. 09/30/15 Heiko Paulheim 34 Fixing Microdata on the Fly • Many of the problems presented can be fixed – On the fly on the consumer side – With a set of simple heuristics • Advertisement: – Robert Meusel, Heiko Paulheim: Heuristics for Fixing Common Errors in Deployed schema.org Microdata – Thursday, 2pm, Room Emerald 1
  • 35. 09/30/15 Heiko Paulheim 35 Digging for Reasons • So, Microdata is often more schema compliant • But why? Some hypotheses… – Availability of documentation – Tool support – Business incentive – Schema flexibility • Can we confirm/reject those from looking at the data?
  • 36. 09/30/15 Heiko Paulheim 36 A Diachronic Perspective • Versions of schema.org are archived over time – Plus: there are several crawl releases per year – i.e., we can look at change over time • If we look at both schema and deployed data, we may observe – Adoption rates of schema changes – Data-first changes to the schema – Convergence or divergence of deployed data
  • 37. 09/30/15 Heiko Paulheim 37 A Diachronic Perspective • Three releases of WDC Microdata corpus – 2012, 2013, and 2014 • Versions of schema.org that were valid – At the beginning of the crawl – At the end of the crawl
  • 38. 09/30/15 Heiko Paulheim 38 Overall Convergence • Measuring convergence – i.e., homogeneity of descriptions of classes – Example: two instances of LocalBusiness _:1 _:2 rdf:type s:address rdf:type “Birmingham” “Main Street 24” s:LocalBusiness s:PostalAddress s:address- Locality s:street- Address _:1 _:2 rdf:type s:address rdf:type “Liverpool” “Church Street 1” s:LocalBusiness s:PostalAddress s:address- Locality s:street- Address
  • 39. 09/30/15 Heiko Paulheim 39 Overall Convergence • Recap – RDF from Microdata is a set of trees – i.e., we can enumerate all paths to leaf nodes (omitting literals) • Example: – rdf:type-s:LocalBusiness, s:address-rdf:type-s:PostalAddress, s:address-s:addressLocality, s:address-s:streetAddress _:1 _:2 rdf:type s:address rdf:type “Liverpool” “Church Street 1” s:LocalBusiness s:PostalAddress s:address- Locality s:street- Address
  • 40. 09/30/15 Heiko Paulheim 40 Overall Convergence • Using all paths, we can compute the entropy for each class as • A low entropy refers to a high homogeneity • We normalize both by maximum entropy and the total number of paths – i.e., we use normalized entropy rate as a measure for homogeneity
  • 41. 09/30/15 Heiko Paulheim 41 Overall Convergence • Observations – Overall entropy decreases over time • Classes with high convergence rates – WebSite, Blog, … – Hotel, Restaurant, … – Product, Offer, … – Rating, Review Influence of CMS adoption Yellow pages Google Rich Snippets ...all of the above
  • 42. 09/30/15 Heiko Paulheim 42 Top-down Adoption • How fast are changes in the schema adopted? – New classes/properties – Deprecations – Domain/range changes • Measuring adoption: challenges – Different crawls – Overall growth of deployed Microdata • Measure: normalized usage increase (nui) from i to j: – nui(s)>1.05: usage of schema element s has increased significantly – nui(s)<0.95: usage of schema element s has decreased significantly
  • 43. 09/30/15 Heiko Paulheim 43 Top-down Adoption • Adoption of new classes and properties – Almost half of all introduced classes are never used! – Similar for new properties • Reasons – Bulk-addition of vocabularies • not every term is equally needed • e.g., medical vocabulary – Blind spot of our approach • some terms are mainly for e-mail markup • e.g., Actions SURPRISE!
  • 44. 09/30/15 Heiko Paulheim 44 Top-down Adoption • Main domains of positive adoption – Meta data for web content (schema.org/Website has the highest NUI) – Broadcasting (e.g., TV Episodes) – Questions & Answers – Postal addresses • Classes featured in Google Rich Snippets – Still growth on high level (tens of thousands of PLDs) – But NUI<0.95 Yellow Pages Search Engine Listings Collaboration with BBC and EBU Influence of CMS adoption Q&A Pages, such as Stackoverflow
  • 45. 09/30/15 Heiko Paulheim 45 Top-down Adoption • Adoption of domain/range changes – Again: rather low overall adoption • Adopted well for – Products (height, width, itemCondition, …) – Broadcasting domain (episode, season, actor, ...) Search Engine Listings Collaboration with BBC and EBU
  • 46. 09/30/15 Heiko Paulheim 46 Top-down Adoption • Adoption of deprecations – Works well (29 out of 32 have a significantly low NUI) • Exceptions – s:map (← s:hasMap) – s:maps (← s:hasMap) For Google Maps (lots of outdated tutorials)
  • 47. 09/30/15 Heiko Paulheim 47 Bottom-up Evolution • Martin Luther – Started the protestant church – A success story, too (like schema.org) (i.e., 800 million adopters worldwide) • Famous quote: – “Man muss […] dem gemeinen Mann aufs Maul schauen” (roughly: “You have to listen to the way the common man really speaks.”) Martin Luther, 1483-1546 Disclaimer: I do not speak for the protestant church.
  • 48. 09/30/15 Heiko Paulheim 48 Bottom-up Evolution • Are new features in the schema first used “inofficially”? – New classes/properties – Domain/range changes • Instrument for measurement: ROC curves – True positives mapped against false positives – tp: elements used before – fp: elements not used before – Ranking by #PLDs
  • 49. 09/30/15 Heiko Paulheim 49 Bottom-up Evolution • There are some mild influences observable – Stronger for domain/range changes • especially range changes – Weaker for new classes/properties 2012→ 2013 2013→ 2014 2012→ 2014 classes properties domains ranges
  • 50. 09/30/15 Heiko Paulheim 50 Bottom-up Evolution • Extension mechanism – Allows for user-defined classes/properties – Those become subclasses implicitly • Analysis over time – No measurable impact on standard evolution – “Inofficial” use is likelier than use of extension mechanism
  • 51. 09/30/15 Heiko Paulheim 51 Key Adoption Drivers • Search Engine Optimization – Web site providers want to be high in Google rankings – Direct business incentive! • Tool adoption – Major CMSs use schema.org • Standard Agility – schema.org: 25 revisions in last three years – cf. FOAF: six revisions in last eight years
  • 52. 09/30/15 Heiko Paulheim 52 What Does that Mean for LOD? • Schema.org has been a success story • Mainly because of – Business incentive – Documentation – Platform adoption (marketing!) – Standard agility
  • 53. 09/30/15 Heiko Paulheim 53 What Does that Mean for LOD? • Business Incentive for Data Providers – not: the cool thing I can do if you provide your data as LOD – but: what's in for you if you provide your data as LOD • Documentation – schema.org has copy-and-paste snippets for HTML5 – what's the effort of setting up a LOD endpoint? • Platform Adoption – Think: What if MySQL had an out-of-the-box LOD endpoint? • Standard Agility – Observe how your schemas are “abused” – ...and don't blame the abusers
  • 54. 09/30/15 Heiko Paulheim 54 Wrap-up • We have looked at the deployment of schema.org – From a synchronic and a diachronic perspective • From looking at the data, we can observe – Driving factors for adoption – Patterns in standard evolution – Influence of data consumers, documentation, tool adoption • ...and identify factors that lead to the wide adoption • LOD and schema.org/MD have a lot in common – so we may take away some lessons for boosting the adoption of LOD...
  • 55. 09/30/15 Heiko Paulheim 55 Acknowledgements • Colleagues at Data&Web Science Group who contributed to this talk – Robert Meusel (Number crunching) – Christian Bizer (Fruitful discussions) • ...and the Common Crawl Foundation – Who makes such an analysis possible
  • 56. 09/30/15 Heiko Paulheim 56 Advertisement • Want to get your hands dirty with data? • Linked Data for Information Extraction (LD4IE) challenge 2015 – Use schema.org Microdata – Train a web-scale IE system – Present at LD4IE workshop (co-located with ISWC 2015) – Win a prize sponsored by Springer! • Details – http://oak.dcs.shef.ac.uk/ld4ie2015
  • 57. 09/30/15 Heiko Paulheim 57 Thank you! Questions?
  • 58. 09/30/15 Heiko Paulheim 58 What the Adoption of schema.org Tells about Linked Open Data Heiko Paulheim, Data and Web Science Group