A Web-scale Study of the Adoption and
Evolution of the schema.org Vocabulary
over Time
Robert Meusel, Christian Bizer and
Heiko Paulheim
Motivation - LOD Cloud with 1,000 data providers
A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time - WIMS 2015
Motivation - schema.org Microdata with 700k data providers
Microdata in a Nutshell
▪ Adding structured information to web pages
  • By marking up contents and entities
▪ Arbitrary vocabularies are possible
  • Practically, only schema.org is deployed on a large scale
  • Plus its historical predecessor: data-vocabulary.org
▪ Similar to RDFa
<div itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="name">Data and Web Science Group</span>
<span itemprop="addressLocality">Mannheim</span>,
<span itemprop="postalCode">68131</span>
<span itemprop="addressCountry">Germany</span>
</div>
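The markup above can be read back with a few lines of code. The following is a minimal sketch, not a conforming Microdata parser: it assumes flat markup with a single itemscope and text-only property values. The names MicrodataParser and extract_properties are illustrative only; the parsing itself relies on Python's standard html.parser module.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Collects (itemtype, itemprop, text) tuples from flat Microdata markup."""

    def __init__(self):
        super().__init__()
        self.item_type = None     # itemtype of the enclosing itemscope
        self.current_prop = None  # itemprop of the element currently being read
        self.properties = []      # collected (itemtype, itemprop, value) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        if "itemprop" in attrs:
            self.current_prop = attrs["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self.current_prop and text:
            self.properties.append((self.item_type, self.current_prop, text))
            self.current_prop = None  # ignore text that follows the closing tag


SNIPPET = """
<div itemscope itemtype="http://schema.org/PostalAddress">
<span itemprop="name">Data and Web Science Group</span>
<span itemprop="addressLocality">Mannheim</span>,
<span itemprop="postalCode">68131</span>
<span itemprop="addressCountry">Germany</span>
</div>
"""


def extract_properties(html):
    """Parse an HTML fragment and return its Microdata properties."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.properties


for item_type, prop, value in extract_properties(SNIPPET):
    print(item_type, prop, value)
```

Nested items, such as an Offer inside a Product, would need a stack of open itemscopes instead of the single item_type field used here.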
Schema.org in a Nutshell
▪ Vocabulary for marking up entities on web pages
  • 675 classes and 965 properties (as of May 2015, release 2.0)
▪ Promoted and consumed by major search engine companies
  • Google, Bing, Yahoo!, and Yandex
  • Google Rich Snippets
▪ Community-driven evolution and development
Schema.org in a Nutshell – Coverage
▪ Schema.org has incorporated some popular vocabularies, such as:
  • GoodRelations (2012)
  • W3C BibExtend (2014)
  • MusicBrainz vocabulary (2015)
  • Automotive Ontology (2015)
Microdata with Schema.org in HTML Pages
<html>
…
<body>
…
<div id="main-section" class="performance left" data-sku="M17242_580">
<h1> Predator Instinct FG Fußballschuh</h1>
<div>
<meta content="EUR">
<span data-sale-price="219.95">219,95</span>
…
</body>
</html>
HTML pages embed markup directly to annotate items, using different vocabularies
<html>
…
<body>
…
<div id="main-section" class="performance left" data-sku="M17242_580" itemscope itemtype="http://schema.org/Product">
<h1 itemprop="name"> Predator Instinct FG Fußballschuh</h1>
<div itemscope itemtype="http://schema.org/Offer" itemprop="offers">
<meta itemprop="priceCurrency" content="EUR">
<span itemprop="price" data-sale-price="219.95">219,95</span>
…
</body>
</html>
_:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> .
_:node1 <http://schema.org/Product/name> "Predator Instinct FG Fußballschuh"@de .
_:node1 <http://schema.org/Product/offers> _:node2 .
_:node2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Offer> .
_:node2 <http://schema.org/Offer/price> "219,95"@de .
_:node2 <http://schema.org/Offer/priceCurrency> "EUR" .
…
Wrap-Up
▪ Semantic annotations are used by more and more websites
▪ Entities on websites become machine-readable and machine-understandable
▪ schema.org together with Microdata is a success story
  • Promoted by search engine companies
  • Deployed by over 17% of all websites [1] (over 700k data providers)
▪ Usage complies more closely with the schema than, e.g., LOD [2]
[1] http://webdatacommons.org/structureddata/2014-12/stats/stats.html
[2] Meusel and Paulheim, ESWC 2015
Digging for Reasons
▪ So, Microdata is deployed more often and is often more schema-compliant, although there are millions of uncontrolled providers with different skill sets
▪ But why? Some hypotheses…
  • Availability of documentation
  • Tool support
  • Business incentive
  • Schema flexibility
▪ Can we confirm or reject these by looking at the data?
A Diachronic Perspective
▪ Versions of schema.org are archived over time
  • Plus: there are several crawl releases per year
  • i.e., we can look at change over time
▪ If we look at both schema and deployed data, we may observe
  • Adoption rates of schema changes
  • Data-first changes to the schema
  • Convergence or divergence of deployed data
A Diachronic Perspective
▪ Three releases of WDC Microdata corpus [1]
  • 2012, 2013, and 2014
▪ Versions of schema.org that were valid
  • At the beginning of the crawl
  • At the end of the crawl
[1] http://webdatacommons.org/structureddata
Top-down Adoption
▪ How fast are changes in the schema adopted?
  • New classes/properties
  • Deprecations
  • Domain/range changes
▪ Measuring adoption: challenges
  • Different crawls
  • Overall growth of deployed schema.org
▪ Measure: normalized usage increase (nui) from i to j:
  • nui(s) > 1.05: usage of schema element s has increased significantly
  • nui(s) < 0.95: usage of schema element s has decreased significantly
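The slide gives only the thresholds for nui. One plausible reading, assumed here and not confirmed by the slides, is that nui compares the share of data providers using element s in crawl j with the corresponding share in crawl i, so that overall corpus growth cancels out. The function names and the example counts below are hypothetical.

```python
def normalized_usage_increase(users_i, total_i, users_j, total_j):
    """Assumed definition: usage share of element s in crawl j divided by its share in crawl i.

    users_*: data providers using element s; total_*: all schema.org data providers.
    """
    if min(users_i, total_i, total_j) <= 0:
        raise ValueError("counts for the earlier crawl and both totals must be positive")
    return (users_j / total_j) / (users_i / total_i)


def classify(nui, low=0.95, high=1.05):
    """Apply the significance thresholds stated on the slide."""
    if nui > high:
        return "increased"
    if nui < low:
        return "decreased"
    return "no significant change"


# Hypothetical counts: usage grows from 100 to 150 providers while the corpus doubles.
nui = normalized_usage_increase(users_i=100, total_i=1000, users_j=150, total_j=2000)
print(round(nui, 2), classify(nui))
```

Under this reading, an element can gain data providers in absolute terms and still have a nui below 0.95 when the corpus as a whole grows faster.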
Top-down Adoption
▪ Adoption of new classes and properties
  • Almost half of all introduced classes are never used!
  • Similar for new properties
▪ Reasons
  • Bulk-addition of vocabularies
    • not every term is equally needed
    • e.g., medical vocabulary
  • Blind spot of our approach
    • some terms are mainly for e-mail markup
    • e.g., Actions
SURPRISE!
Top-down Adoption
▪ Main domains of positive adoption
  • Metadata for web content (schema.org/WebSite has the highest nui)
  • Broadcasting (e.g., TV episodes)
  • Questions & Answers
  • Postal addresses
▪ Classes featured in Google Rich Snippets
  • Still growing at a high level (tens of thousands of data providers)
  • But nui(s) < 0.95
Yellow Pages
Search Engine Listings
Collaboration with BBC and EBU
Influence of CMS adoption
Q&A pages, such as Stack Overflow
Top-down Adoption
▪ Adoption of domain/range changes
  • Again: rather low overall adoption
▪ Adopted well for
  • Products (height, width, itemCondition, …)
  • Broadcasting domain (episode, actor, ...)
Search Engine Listings
Collaboration with BBC and EBU
Top-down Adoption
▪ Adoption of deprecations
  • Works well (29 out of 32 have a significantly low nui)
▪ Exceptions
  • s:map (superseded by s:hasMap)
  • s:maps (superseded by s:hasMap)
For Google Maps (lots of outdated tutorials)
Bottom-up Evolution
▪ Martin Luther
  • Started the Protestant church
  • A success story, too (like schema.org)
  • (i.e., 800 million adopters worldwide)
▪ Famous quote:
  • “Man muss […] dem gemeinen Mann aufs Maul schauen”
  • (roughly: “You have to listen to the way the common man really speaks.”)
Martin Luther, 1483-1546
Disclaimer: I do not speak for the Protestant church.
Bottom-up Evolution
▪ Are new features in the schema first used “unofficially”?
  • New classes/properties
  • Domain/range changes
▪ Instrument for measurement: ROC curves
  • True positives plotted against false positives
    • tp: elements used before
    • fp: elements not used before
  • Ranking by number of pay-level domains (#PLDs)
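One possible reading of this setup, given here as a sketch under stated assumptions rather than as the authors' procedure: candidate elements are ranked by the number of PLDs using them, each candidate carries a yes/no label (which label counts as a true positive is treated here as an input), and the ROC curve is traced by walking down the ranking. The function names and the sample data are hypothetical, and ties in the ranking are not treated specially.

```python
def roc_points(candidates):
    """candidates: list of (num_plds, is_positive) pairs; returns (fpr, tpr) points."""
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    positives = sum(1 for _, is_positive in ranked if is_positive)
    negatives = len(ranked) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative candidate")
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, is_positive in ranked:
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical candidates: (number of PLDs, label)
sample = [(500, True), (300, False), (200, True), (10, False)]
curve = roc_points(sample)
print(curve, auc(curve))
```

An area close to 0.5 would mean that the ranking by PLDs says little about the label, while an area well above 0.5 would point to a bottom-up influence.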
Bottom-up Evolution
▪ Some mild influences are observable
  • Stronger for domain/range changes
    • especially range changes
  • Weaker for new classes/properties
[Figure: ROC curves for 2012→2013, 2013→2014, and 2012→2014, covering classes, properties, domains, and ranges]
Bottom-up Evolution
▪ Extension mechanism
  • Allows for user-defined classes/properties
  • These implicitly become subclasses
▪ Analysis over time
  • No measurable impact on standard evolution
  • “Unofficial” use is likelier than use of the extension mechanism
s:Product/ElectronicProduct
s:price/reducedPrice
Overall Convergence
▪ Measuring convergence
  • i.e., homogeneity of descriptions of classes
  • Example: two instances of s:LocalBusiness
[Figure: two s:LocalBusiness instances (_:1), each with an s:PostalAddress node (_:2): “Birmingham”, “Main Street 24” and “Liverpool”, “Church Street 1”]
Overall Convergence
▪ Recap
  • RDF from Microdata is a set of trees
  • i.e., we can enumerate all paths to leaf nodes (omitting literals)
▪ Example:
[Figure: s:LocalBusiness instance _:1 with s:PostalAddress node _:2 (“Liverpool”, “Church Street 1”)]
rdf:type-s:LocalBusiness,
s:address-rdf:type-s:PostalAddress,
s:address-s:addressLocality,
s:address-s:streetAddress
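The path enumeration can be sketched as follows, assuming an entity is held as a nested dictionary in which nested items are dictionaries and all other values are literals (a representation chosen for illustration, not one prescribed by the slides). The function name enumerate_paths is hypothetical. Literal values are dropped, while the object of rdf:type is kept, matching the four paths listed above.

```python
def enumerate_paths(node, prefix=()):
    """Return all property paths of a tree-shaped entity, omitting literal values."""
    paths = []
    for prop, value in node.items():
        step = prefix + (prop,)
        if isinstance(value, dict):
            # Nested item: descend and extend the current path.
            paths.extend(enumerate_paths(value, step))
        elif prop == "rdf:type":
            # Keep the class name, since it is not a literal.
            paths.append("-".join(step + (value,)))
        else:
            # Literal leaf: keep the path, drop the value.
            paths.append("-".join(step))
    return paths


business = {
    "rdf:type": "s:LocalBusiness",
    "s:address": {
        "rdf:type": "s:PostalAddress",
        "s:addressLocality": "Liverpool",
        "s:streetAddress": "Church Street 1",
    },
}

for path in enumerate_paths(business):
    print(path)
```

With this representation, the Birmingham and Liverpool instances yield the same set of paths, because they differ only in their literal values.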
Overall Convergence
▪ Using all paths, we can compute the entropy for each class
▪ Low entropy indicates high homogeneity
▪ We normalize by both the maximum entropy and the total number of paths
  • i.e., we use the normalized entropy rate as a measure of homogeneity
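A minimal sketch of the entropy step, under assumptions that go beyond what the slide states: each entity of a class is summarized by its set of paths, the entropy is the Shannon entropy over the distribution of distinct path sets, and the maximum entropy is log2(n) for n entities. The further normalization by the total number of paths is left out because the slide does not spell it out. The function name normalized_entropy is hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(path_sets):
    """Shannon entropy of the path-set distribution, divided by its maximum log2(n)."""
    counts = Counter(frozenset(paths) for paths in path_sets)
    total = sum(counts.values())
    if total <= 1:
        return 0.0  # zero or one entity: nothing to compare
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return entropy / math.log2(total)


# Hypothetical entities of one class, each given by its paths
homogeneous = [["s:address-s:addressLocality", "s:address-s:streetAddress"]] * 4
heterogeneous = [["s:name"], ["s:address"], ["s:telephone"], ["s:url"]]

print(normalized_entropy(homogeneous))
print(normalized_entropy(heterogeneous))
```

In this sketch, a class whose entities all share the same paths scores 0, and a class in which every entity is described differently scores 1.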
Overall Convergence
▪ Observations
  • Overall entropy decreases over time
▪ Classes with high convergence rates
  • WebSite, Blog, …
  • Hotel, Restaurant, …
  • Product, Offer, …
  • Rating, Review
Influence of CMS adoption
Yellow pages
Google Rich Snippets
...all of the above
Key Adoption Drivers
▪ Search Engine Optimization
  • Website providers want to rank high in Google
  • Direct business incentive!
▪ Tool adoption
  • Major CMSs use schema.org
▪ Standard agility
  • schema.org: 25 revisions in the last three years
  • cf. FOAF: six revisions in the last eight years
Summary
▪ Both directions, top-down adoption and bottom-up evolution, can be observed
▪ Homogeneity of the deployed schema increases over time
▪ The empirical, data-driven study reveals valuable insights into how and why schema.org is a success story
▪ The observed key drivers and obstacles can also help to understand and analyze the adoption of other standards, e.g., LOD
▪ More fine-grained insights might be revealed by extending the analysis corpus to the mailing list archive and issue tracker
Thank you! Questions? Feedback?
Raw data can be found on the website of WebDataCommons:
http://webdatacommons.org/structureddata/
More interesting datasets and analysis:
http://webdatacommons.org/index.html
Acknowledgement
The extraction and analysis of the datasets were supported by an AWS in Education Grant.

 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girls
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squares
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and Functions
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 

A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time

  • 1. A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time Robert Meusel, Christian Bizer and Heiko Paulheim
  • 2. Motivation - LOD Cloud with 1,000 data providers. A Web-scale Study of the Adoption and Evolution of the schema.org Vocabulary over Time - WIMS 2015
  • 3. Motivation - schema.org Microdata (MD) with 700k data providers
  • 4. Microdata in a Nutshell
      • Adding structured information to web pages
          • By marking up contents and entities
      • Arbitrary vocabularies are possible
          • In practice, only schema.org is deployed on a large scale
          • Plus its historical predecessor: data-vocabulary.org
      • Similar to RDFa
      • Example:
          <div itemscope itemtype="http://schema.org/PostalAddress">
            <span itemprop="name">Data and Web Science Group</span>
            <span itemprop="addressLocality">Mannheim</span>,
            <span itemprop="postalCode">68131</span>
            <span itemprop="addressCountry">Germany</span>
          </div>
  • 5. Schema.org in a Nutshell
      • Vocabulary for marking up entities on web pages
          • 675 classes and 965 properties (as of May 2015, release 2.0)
      • Promoted and consumed by major search engine companies
          • Google, Bing, Yahoo!, and Yandex
          • Google Rich Snippets
      • Community-driven evolution and development
  • 6. Schema.org in a Nutshell – Coverage
      • Schema.org has incorporated some popular vocabularies, such as:
          • Good Relations (2012)
          • W3C BibExtend (2014)
          • MusicBrainz vocabulary (2015)
          • Automotive Ontology (2015)
  • 7. Microdata with Schema.org in HTML Pages
      • HTML page without annotations:
          <html>
          …
          <body>
          …
          <div id="main-section" class="performance left" data-sku="M17242_580">
            <h1> Predator Instinct FG Fußballschuh </h1>
            <div>
              <meta content="EUR">
              <span data-sale-price="219.95">219,95</span>
          …
          </body>
          </html>
      • HTML pages directly embed markup languages to annotate items using different vocabularies:
          <html>
          …
          <body>
          …
          <div id="main-section" class="performance left" data-sku="M17242_580" itemscope itemtype="http://schema.org/Product">
            <h1 itemprop="name"> Predator Instinct FG Fußballschuh </h1>
            <div itemscope itemtype="http://schema.org/Offer" itemprop="offers">
              <meta itemprop="priceCurrency" content="EUR">
              <span itemprop="price" data-sale-price="219.95">219,95</span>
          …
          </body>
          </html>
      • Statements extracted from the annotated page:
          _:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Product> .
          _:node1 <http://schema.org/Product/name> "Predator Instinct FG Fußballschuh"@de .
          _:node1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Offer> .
          _:node1 <http://schema.org/Offer/price> "219,95"@de .
          _:node1 <http://schema.org/Offer/priceCurrency> "EUR" .
          …
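The step from annotated HTML to statements can be illustrated with a small extractor. This is a minimal sketch, not the extraction pipeline used for the study: it relies only on Python's standard `html.parser`, the class and function names (`MicrodataSketch`, `extract_microdata`) are hypothetical, and it handles only the attributes that appear on the slide (`itemscope`, `itemtype`, `itemprop`, `content`).

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not be pushed on the element stack.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataSketch(HTMLParser):
    """Collects (item_type, property, value) triples from Microdata attributes."""

    def __init__(self):
        super().__init__()
        self.stack = []      # per open element: the item type it opens, or None
        self.pending = None  # (item_type, property) still waiting for its text
        self.triples = []

    def _current_item(self):
        # The innermost enclosing element that opened an item.
        for item_type in reversed(self.stack):
            if item_type is not None:
                return item_type
        return None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        owner = self._current_item()
        opened = attributes.get("itemtype") if "itemscope" in attributes else None
        prop = attributes.get("itemprop")
        if prop and owner:
            if opened:                      # property whose value is a nested item
                self.triples.append((owner, prop, opened))
            elif "content" in attributes:   # e.g. <meta itemprop=... content=...>
                self.triples.append((owner, prop, attributes["content"]))
            else:                           # value is the element's text content
                self.pending = (owner, prop)
        if tag not in VOID_TAGS:
            self.stack.append(opened)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()
        self.pending = None

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((*self.pending, data.strip()))
            self.pending = None


def extract_microdata(html):
    parser = MicrodataSketch()
    parser.feed(html)
    return parser.triples


if __name__ == "__main__":
    page = """
    <div itemscope itemtype="http://schema.org/Product">
      <h1 itemprop="name"> Predator Instinct FG Fußballschuh </h1>
      <div itemscope itemtype="http://schema.org/Offer" itemprop="offers">
        <meta itemprop="priceCurrency" content="EUR">
        <span itemprop="price" data-sale-price="219.95">219,95</span>
      </div>
    </div>
    """
    for triple in extract_microdata(page):
        print(triple)
```

Unlike the statements on the slide, the sketch keys each value by the item type rather than by a blank node identifier, which keeps the example short at the cost of not distinguishing two items of the same type.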
  • 8. Wrap-Up
      • Semantic annotations are used by more and more websites
      • Entities on websites become machine-readable and machine-understandable
      • schema.org together with Microdata is a success story
          • Promoted by search engine companies
          • Deployed by over 17% of all websites [1] (over 700k data providers)
      • Usage is more compliant with the schema than, e.g., LOD [2]
      [1] http://webdatacommons.org/structureddata/2014-12/stats/stats.html
      [2] Meusel and Paulheim, ESWC 2015
  • 9. Digging for Reasons
      • So, Microdata is deployed more often and is often more schema-compliant, although there are millions of uncontrolled providers with different skill sets
      • But why? Some hypotheses…
          • Availability of documentation
          • Tool support
          • Business incentive
          • Schema flexibility
      • Can we confirm/reject those by looking at the data?
  • 10. A Diachronic Perspective
      • Versions of schema.org are archived over time
          • Plus: there are several crawl releases per year
          • i.e., we can look at change over time
      • If we look at both the schema and the deployed data, we may observe
          • Adoption rates of schema changes
          • Data-first changes to the schema
          • Convergence or divergence of deployed data
  • 11. A Diachronic Perspective
      • Three releases of the WDC Microdata corpus [1]
          • 2012, 2013, and 2014
      • Versions of schema.org that were valid
          • At the beginning of the crawl
          • At the end of the crawl
      [1] http://webdatacommons.org/structureddata
  • 12. Top-down Adoption
      • How fast are changes in the schema adopted?
          • New classes/properties
          • Deprecations
          • Domain/range changes
      • Measuring adoption: challenges
          • Different crawls
          • Overall growth of deployed schema.org
      • Measure: normalized usage increase (nui) from i to j
          • nui(s) > 1.05: usage of schema element s has increased significantly
          • nui(s) < 0.95: usage of schema element s has decreased significantly
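The transcript does not preserve the nui formula from the slide. The following is a minimal sketch of one plausible definition, assuming nui compares the share of data providers using element s in the later crawl j with its share in the earlier crawl i, so that overall growth between crawls cancels out. The function and parameter names are hypothetical; only the 1.05 and 0.95 thresholds come from the slide.

```python
import math


def nui(users_i, providers_i, users_j, providers_j):
    """Assumed definition: share of providers using s in crawl j
    divided by the share of providers using s in crawl i."""
    share_i = users_i / providers_i
    share_j = users_j / providers_j
    if share_i == 0:
        # Element was not deployed in the earlier crawl; the ratio is unbounded.
        return math.inf
    return share_j / share_i


def classify(value, upper=1.05, lower=0.95):
    """Applies the significance thresholds stated on the slide."""
    if value > upper:
        return "increased"
    if value < lower:
        return "decreased"
    return "stable"


if __name__ == "__main__":
    # Made-up counts: 100 of 1,000 providers in crawl i, 300 of 2,000 in crawl j.
    value = nui(100, 1000, 300, 2000)
    print(value, classify(value))
```

Dividing by the total number of providers in each crawl addresses both challenges listed above: crawls of different size and the overall growth of schema.org deployment.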
  • 13. Top-down Adoption
      • Adoption of new classes and properties
          • Almost half of all introduced classes are never used!
          • Similar for new properties
      • Reasons
          • Bulk addition of vocabularies
              • Not every term is equally needed
              • e.g., medical vocabulary
          • Blind spot of our approach
              • Some terms are mainly for e-mail markup
              • e.g., Actions
      • Slide annotation: SURPRISE!
  • 14. Top-down Adoption
      • Main domains of positive adoption
          • Metadata for web content (schema.org/Website has the highest nui)
          • Broadcasting (e.g., TV episodes)
          • Questions & Answers
          • Postal addresses
      • Classes featured in Google Rich Snippets
          • Still growing at a high level (tens of thousands of data providers)
          • But nui(s) < 0.95
      • Slide annotations: Yellow Pages; Search Engine Listings; Collaboration with BBC and EBU; Influence of CMS adoption; Q&A pages, such as Stackoverflow
  • 15. Top-down Adoption
      • Adoption of domain/range changes
          • Again: rather low overall adoption
      • Adopted well for
          • Products (height, width, itemCondition, …)
          • Broadcasting domain (episode, actor, ...)
      • Slide annotations: Search Engine Listings; Collaboration with BBC and EBU
  • 16. Top-down Adoption
      • Adoption of deprecations
          • Works well (29 out of 32 have a significantly low nui)
      • Exceptions
          • s:map (← s:hasMap)
          • s:maps (← s:hasMap)
      • Slide annotation: For Google Maps (lots of outdated tutorials)
  • 17. Bottom-up Evolution
      • Martin Luther, 1483-1546
          • Started the Protestant church
          • A success story, too (like schema.org)
          • (i.e., 800 million adopters worldwide)
      • Famous quote:
          • “Man muss […] dem gemeinen Mann aufs Maul schauen”
          • (roughly: “You have to listen to the way the common man really speaks.”)
      • Disclaimer: I do not speak for the Protestant church.
  • 18. Bottom-up Evolution
      • Are new features in the schema first used “unofficially”?
          • New classes/properties
          • Domain/range changes
      • Instrument for measurement: ROC curves
          • True positives plotted against false positives
          • tp: elements used before
          • fp: elements not used before
          • Ranking by #PLDs (number of pay-level domains)
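The ROC instrument above can be sketched as follows. This is one possible reading of the slide, not the authors' exact procedure: candidate elements are ranked by the number of PLDs deploying them, each candidate carries a boolean label, and the curve is traced by walking down the ranking. What counts as a positive label follows the slide's tp/fp definition; the element names, counts, and labels below are made up for illustration, and ties in the ranking are not treated specially.

```python
def roc_points(candidates):
    """candidates: list of (element, pld_count, is_positive).
    Returns the ROC curve as (false positive rate, true positive rate) points."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    positives = sum(1 for c in ranked if c[2])
    negatives = len(ranked) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative element")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, _, is_positive in ranked:
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve; 0.5 corresponds to a random ranking."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical elements with made-up PLD counts and labels.
    sample = [
        ("s:hypotheticalA", 50, True),
        ("s:hypotheticalB", 40, False),
        ("s:hypotheticalC", 30, True),
        ("s:hypotheticalD", 10, False),
    ]
    curve = roc_points(sample)
    print(curve)
    print(area_under_curve(curve))
```

An area clearly above 0.5 would indicate that widely deployed elements are more likely to be positives than a random ranking would suggest.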
  • 19. Bottom-up Evolution
      • Some mild influences are observable
          • Stronger for domain/range changes
              • Especially range changes
          • Weaker for new classes/properties
      • [Figure: curves for classes, properties, domains, and ranges; panels 2012→2013, 2013→2014, 2012→2014]
  • 20. Bottom-up Evolution
      • Extension mechanism
          • Allows for user-defined classes/properties
          • Those become subclasses implicitly
          • Examples: s:Product/ElectronicProduct, s:price/reducedPrice
      • Analysis over time
          • No measurable impact on standard evolution
          • “Unofficial” use is likelier than use of the extension mechanism
  • 21. Overall Convergence
      • Measuring convergence
          • i.e., homogeneity of descriptions of classes
          • Example: two instances of s:LocalBusiness
      • [Figure: two graphs, each with a node _:1 (s:LocalBusiness) and a node _:2 (s:PostalAddress); literals “Birmingham” and “Main Street 24” in the first, “Liverpool” and “Church Street 1” in the second]
  • 22. Overall Convergence
      • Recap
          • RDF from Microdata is a set of trees
          • i.e., we can enumerate all paths to leaf nodes (omitting literals)
      • Example: the s:LocalBusiness instance with the literals “Liverpool” and “Church Street 1” yields the paths
          • rdf:type-s:LocalBusiness
          • s:address-rdf:type-s:PostalAddress
          • s:address-s:addressLocality
          • s:address-s:streetAddress
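The path enumeration can be sketched in a few lines. This is a minimal illustration that assumes each item is available as a nested dictionary (property to value, where a nested item is itself a dictionary); that representation and the function name `enumerate_paths` are assumptions made for the example. Type values are kept as the end of a path and literal values are dropped, as on the slide.

```python
def enumerate_paths(node, prefix=()):
    """Returns all property paths from the root to the leaves of one item tree.
    rdf:type values are kept, literal values are omitted."""
    paths = []
    for prop, value in node.items():
        if isinstance(value, dict):
            # Nested item: continue the path below this property.
            paths.extend(enumerate_paths(value, prefix + (prop,)))
        elif prop == "rdf:type":
            paths.append("-".join(prefix + (prop, value)))
        else:
            # Literal leaf: the path ends at the property, the literal is dropped.
            paths.append("-".join(prefix + (prop,)))
    return paths


if __name__ == "__main__":
    local_business = {
        "rdf:type": "s:LocalBusiness",
        "s:address": {
            "rdf:type": "s:PostalAddress",
            "s:addressLocality": "Liverpool",
            "s:streetAddress": "Church Street 1",
        },
    }
    for path in enumerate_paths(local_business):
        print(path)
```

Because literals are dropped, the Birmingham and Liverpool instances from the earlier example produce the same set of paths, which is what makes the paths usable for comparing how homogeneously a class is described.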
  • 23. Overall Convergence
      • Using all paths, we can compute the entropy for each class (the formula is not preserved in this transcript)
      • A low entropy indicates high homogeneity
      • We normalize by both the maximum entropy and the total number of paths
          • i.e., we use the normalized entropy rate as a measure of homogeneity
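Since the entropy formula itself is missing from the transcript, the following sketch substitutes the standard Shannon entropy over the distribution of paths observed for a class, divided by the maximum entropy for the number of distinct paths. It is an assumption about the general shape of the measure, not the authors' exact normalized entropy rate; in particular, the additional normalization by the total number of paths is not reproduced. The function name is hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(paths):
    """Shannon entropy of the path distribution, scaled to [0, 1] by the
    maximum entropy log2(number of distinct paths)."""
    counts = Counter(paths)
    total = sum(counts.values())
    if len(counts) <= 1:
        # Zero or one distinct path: perfectly homogeneous.
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(counts))


if __name__ == "__main__":
    # Paths collected over all instances of one class (made-up data).
    homogeneous = ["s:address-s:addressLocality"] * 4
    mixed = ["s:address-s:addressLocality"] * 3 + ["s:telephone"]
    print(normalized_entropy(homogeneous))
    print(normalized_entropy(mixed))
```

Lower values mean that instances of a class concentrate on a few paths, which matches the slide's reading of low entropy as high homogeneity.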
  • 24. Overall Convergence
      • Observations
          • Overall entropy decreases over time
      • Classes with high convergence rates
          • WebSite, Blog, …
          • Hotel, Restaurant, …
          • Product, Offer, …
          • Rating, Review
      • Slide annotations: Influence of CMS adoption; Yellow pages; Google Rich Snippets; ...all of the above
  • 25. Key Adoption Drivers
      • Search Engine Optimization
          • Website providers want to rank high in Google
          • Direct business incentive!
      • Tool adoption
          • Major CMSs use schema.org
      • Standard agility
          • schema.org: 25 revisions in the last three years
          • cf. FOAF: six revisions in the last eight years
  • 26. Summary
      • Adoption can be observed both ways, top-down and bottom-up
      • Homogeneity of the deployed schema increases over time
      • The described empirical, data-driven study reveals valuable insights into how and why schema.org is a success story
      • The observed key drivers and obstacles can also help to understand and analyze the adoption of other standards, e.g., LOD
      • More fine-grained insights might be revealed by extending the analysis corpus to the mailing list archive and issue tracker
  • 27. Thank you! Questions? Feedback?
      • Raw data can be found on the website of WebDataCommons: http://webdatacommons.org/structureddata/
      • More interesting datasets and analyses: http://webdatacommons.org/index.html
      • Acknowledgement: The extraction and analysis of the datasets was supported by an AWS in Education Grant.