The document discusses the current state and future of the Semantic Web and linked data initiatives. It notes several successes, such as the Linked Open Data cloud and schemas like Schema.org and GoodRelations. However, it argues that the original vision of the Semantic Web, which aimed to let computers help us process information by applying structured data standards at Web scale, has not been fully realized. Schemas like Schema.org focus more on information extraction than on direct data consumption. The document calls for challenging assumptions through empirical analysis rather than ideological debate.
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
1. The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
Martin Hepp, @mfhepp
mfhepp@gmail.com
2. Semantic Web: A Decade of Achievement?
• Linked Open Data Cloud
• Schema.org
• Google Knowledge Graph
• Bing Satori
• Linked Data in Libraries
• Linked Data in Public Data Initiatives
• Etc.
Semantic Web and Linked Data Success Stories
http://www.heppresearch.com
3. The LOD Cloud
A hard-wired, small-scale data integration project with no quality-of-service guarantees.
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
4. Web Data Commons
A pretty outdated RDF representation of information extracted from a biased sample of popular Web pages, missing much of the data on deep detail pages.
2015-04-02: RDFa, Microdata, and Microformat data sets extracted from the December 2014 Common Crawl corpus available for download.
5. The Old Testament of the Semantic Web
Mostly WHAT a better Web should allow
§ Computers should be able to help us process information from the Web
6. The New Testament of the Semantic Web
Detailed technical assumptions about the HOW
§ Largely driven by applying principles from small-scale, controlled settings to the Web.
§ Need for extensions of the old paradigms acknowledged.
§ But the fundamental question of whether these paradigms match the Web ecosystem remains largely unchallenged.
7. The Modern Sects and their Cults
Turned assumptions and drafts into laws
§ Linked Data Principles
– URIs over strings
§ Entity identifiers
§ Qualitative values (enumerations)
– Page vs. Entity / Conneg / Redirects
– Open Licenses
§ SPARQL endpoints
§ Reuse visible content in RDFa and Microdata
Berners-Lee, Tim: Linked Data, http://www.w3.org/DesignIssues/LinkedData.html
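The "Page vs. Entity / Conneg / Redirects" principle refers to the Linked Data practice of answering a request for an entity URI with a 303 See Other redirect to a document about that entity. The document format (HTML or RDF) is chosen from the HTTP Accept header. The sketch below shows only that decision logic. The URI layout (entity URI plus a file suffix) and the FORMATS table are hypothetical conventions, not those of any particular server.

```python
# Sketch of the Linked Data "303 See Other + content negotiation" pattern.
# The URI layout (entity URI + file suffix) is a hypothetical convention.

# Media types this hypothetical server can produce, mapped to document suffixes
FORMATS = {
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
    "text/html": ".html",
}


def resolve(entity_uri: str, accept: str) -> tuple:
    """Return (status, location) for a GET on a URI that names an entity.

    The entity itself (a person, a product) is not a Web document, so the
    server does not answer 200 but redirects with 303 See Other to a document
    describing it. The first supported media type in the Accept header wins;
    q-values are ignored for brevity.
    """
    base = entity_uri.rstrip("/")
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in FORMATS:
            return 303, base + FORMATS[media_type]
    # Nothing we support was requested explicitly: default to the HTML page
    return 303, base + ".html"


print(resolve("http://example.org/id/alice", "text/turtle"))
# (303, 'http://example.org/id/alice.ttl')
```

Hash URIs such as http://example.org/doc#alice avoid the redirect altogether, because the client strips the fragment before sending the request.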
8. And now they fight a useless war over the details of their interpretation…
2nd Commandment: Thou shalt not make unto thee any graven image
§ Exodus 20:4-6
§ Minimal ontological commitment, folks!
§ Occam's razor
§ Ludwig Wittgenstein: Tractatus Logico-Philosophicus:
– “Occam's Razor is, of course, not an arbitrary rule nor one justified by its practical success. It simply says that unnecessary elements in a symbolism mean nothing. Signs which serve one purpose are logically equivalent; signs which serve no purpose are logically meaningless.” (*)
Image Credit: PD, https://en.wikipedia.org/?title=Crusades#/media/File:Albigensian_Crusade_01.jpg
(*) Taken from https://en.wikipedia.org/wiki/Occam's_razor#Ludwig_Wittgenstein
9. What is schema.org? What is GoodRelations?
1. Official Characterization
2. Purpose:
§ Focus on information extraction on the Web
§ Other uses as a by-product
3. Knowledge Representation Perspective
§ Entity Types
§ Relationship Types
§ Weak Domain / Range Semantics
§ Syntax-independent Meta-Model
And how are they related?
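"Weak Domain / Range Semantics" can be made concrete as follows. In RDFS, `p rdfs:range C` licenses the inference that every value of p is a C, and several range statements combine as a conjunction. schema.org instead uses schema:domainIncludes and schema:rangeIncludes, which only list acceptable types. The sketch contrasts the two readings over plain Python tuples. The one-property vocabulary excerpt is illustrative only; check schema.org for current definitions.

```python
# Contrast: strict RDFS range inference vs. schema.org-style rangeIncludes hints.
# The excerpt below is illustrative; consult schema.org for current definitions.

RANGES = {"schema:author": {"schema:Person", "schema:Organization"}}


def rdfs_range_inference(triples, ranges):
    """Strict RDFS reading: the object of p gets *every* class listed as range of p."""
    inferred = set()
    for s, p, o in triples:
        for cls in ranges.get(p, ()):
            inferred.add((o, "rdf:type", cls))
    return inferred


def range_includes_check(triples, known_types, ranges):
    """Weak reading: expected types are hints; report mismatches, infer nothing."""
    warnings = []
    for s, p, o in triples:
        expected = ranges.get(p)
        if expected and not (known_types.get(o, set()) & expected):
            warnings.append(f"{o} (value of {p}) is not typed as any of {sorted(expected)}")
    return warnings


triples = [("#book", "schema:author", "#alice")]
# Strict reading makes #alice both a Person and an Organization:
print(sorted(rdfs_range_inference(triples, RANGES)))
# Weak reading: #alice is a Person, one of the acceptable types, so no warning:
print(range_includes_check(triples, {"#alice": {"schema:Person"}}, RANGES))  # []
```

The weak reading lets publishers use whichever of the listed types fits, at the cost of pushing validation and interpretation to the consumer.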
Questions? Suggestions? Contact me at @mfhepp!
10. Official Characterization from http://schema.org
This site provides a collection of schemas that webmasters can use to markup HTML pages in ways recognized by major search providers, and that can also be used for structured data interoperability (e.g. in JSON). Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right Web pages.
Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure.
11. Overview and Motivation: There is REAL Momentum
A lot of data
§ Since 2011, schema.org markup has been added to more than 25% of the product detail pages of top-ranked e-commerce sites.
§ RDF-based representations are specified.
Table: Random sample of n=73 product detail pages from high-ranking Google results.
Note that these numbers have a strong bias towards popular, professionally operated sites.
12. Schema.org: A Data Publication Ontology
Not designed for raw data consumption (only as a by-product)
§ Historically, ontologies in computer science aimed at harmonizing the conceptualization and representation of data for both publishers and consumers of the data.
§ Implicit goal of the traditional Semantic Web stack: more or less, the direct consumption of raw data.
§ This requires detailed consensus on data granularity and data semantics at scale, as well as high data quality.
§ Schema.org does not make this assumption, since its sponsors have the power to work on semi-structured data at Web scale.
[Figure: Ontology vs. schema.org]
13. Schema.org: The Semantic Web Vision Come True?
1. No OWL. Not even an ontology in the narrow sense.
2. Direct consumption difficult
§ Crawling
§ Cleansing
§ Lifting
3. No broad use of Linked Data principles
§ Mostly no global entity identifiers
§ Page = Entity (vs. httpRange-14)
§ No vocabulary reuse (*)
Likely not what the Semantic Web community had hoped for.
14. Web Ontology Engineering Patterns
1. Dynamic Degree of Disambiguation
2. Dynamic Data Granularity
3. Sweet Spots Rule
§ Distinctions that can be populated reliably and with little effort
§ Distinctions that are hard to reconstruct by the recipient
Hepp (2015, forthcoming)
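One possible reading of "Dynamic Degree of Disambiguation" and "Dynamic Data Granularity" is that a property may carry its value at different levels of detail, and the consumer must cope with each. schema.org does this for many properties. For instance, schema:address accepts either plain Text or a PostalAddress. The sketch below assumes the markup has already been parsed into Python strings and dicts. The function name and the "granularity" flag are hypothetical, and this is an illustrative interpretation, not the formulation in Hepp (2015).

```python
# A consumer that accepts the same property at different granularities.
# normalize_address and the "granularity" flag are hypothetical illustrations.
from typing import Union

AddressValue = Union[str, dict]

# schema.org PostalAddress properties this sketch keeps when they are present
FIELDS = ("streetAddress", "postalCode", "addressLocality", "addressCountry")


def normalize_address(value: AddressValue) -> dict:
    """Return a dict with a 'granularity' flag plus whatever detail is available."""
    if isinstance(value, str):
        # Coarse: the publisher gave only text; do not guess the parts here.
        return {"granularity": "text", "text": value.strip()}
    # Fine-grained: keep the structured fields, ignore everything else.
    result = {"granularity": "structured"}
    result.update({k: value[k] for k in FIELDS if k in value})
    return result


print(normalize_address(" Kuppelnaustrasse 5, 88212 Ravensburg "))
print(normalize_address({"streetAddress": "Kuppelnaustrasse 5", "postalCode": "88212"}))
```

Keeping the coarse value as text, instead of forcing a parse, leaves the hard disambiguation step to a later, optional stage.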
15. The Fallacy of Raw Consumption of Web Data
Naïve Type Membership Interpretation: SPARQL
# Goal: find former STI members who are professors.
# Naïve approach: trust every rdf:type dbpedia-owl:Professor assertion.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT * {?s a dbpedia-owl:Professor} LIMIT 100
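The query above matches every subject that has an rdf:type dbpedia-owl:Professor triple, no matter which source asserted it. The sketch below emulates that basic graph pattern over an in-memory list of quads (subject, predicate, object, graph). The subjects and graph names are hypothetical. It contrasts the naïve reading with a variant that accepts only assertions from explicitly trusted graphs.

```python
# In-memory emulation of:  SELECT ?s WHERE { ?s a dbpedia-owl:Professor } LIMIT 100
# The quads (subject, predicate, object, graph) are hypothetical examples.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROFESSOR = "http://dbpedia.org/ontology/Professor"

QUADS = [
    ("http://dbpedia.org/resource/Example_Professor", RDF_TYPE, PROFESSOR, "http://dbpedia.org"),
    ("http://www.example.org/home.html#person", RDF_TYPE, PROFESSOR, "http://www.example.org/"),
]


def naive_type_query(quads, cls, limit=100):
    """Naïve reading: any asserted rdf:type triple counts, whatever its origin."""
    return [s for s, p, o, g in quads if p == RDF_TYPE and o == cls][:limit]


def source_aware_type_query(quads, cls, trusted_graphs, limit=100):
    """Same pattern, but only assertions from the given graphs are accepted."""
    return [s for s, p, o, g in quads
            if p == RDF_TYPE and o == cls and g in trusted_graphs][:limit]


print(naive_type_query(QUADS, PROFESSOR))
print(source_aware_type_query(QUADS, PROFESSOR, {"http://dbpedia.org"}))
```

Filtering by graph is only the crudest form of source awareness. It still treats every statement from a trusted graph as true.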
16. Naïve Type Membership Interpretation: SPARQL
Find all professors from Web markup
<html prefix="schema: http://schema.org/
  dbpedia: http://dbpedia.org/ontology/">
<!-- .. -->
<div typeof="schema:Person dbpedia:Professor" about="#person">
  <span property="schema:honorificPrefix">Prof. Dr.</span>
  <span property="schema:givenName">Zaphod</span>
  <span property="schema:familyName">Beeblebrox</span>
</div>
</html>
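In the markup above, the publisher asserts that #person is a dbpedia:Professor. A naïve consumer simply trusts such typeof statements. The sketch below mimics that behaviour with Python's standard html.parser. It is a deliberately simplified reading of RDFa: it handles only the prefix, about and typeof attributes, with no vocab, nesting or base-IRI resolution. It is not a conforming RDFa processor, and the class name is hypothetical.

```python
# Naïve type extraction from RDFa-style markup using only the standard library.
# Simplification: reads prefix, about and typeof only; not a full RDFa 1.1 processor.
from html.parser import HTMLParser


class NaiveTypeExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.prefixes = {}  # prefix name -> namespace IRI
        self.types = []     # (subject, type IRI) pairs, trusted as stated

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser lowercases names, so typeOf arrives as typeof
        if attrs.get("prefix"):
            tokens = attrs["prefix"].split()
            # RDFa prefix syntax: "name1: iri1 name2: iri2 ..."
            for name, iri in zip(tokens[0::2], tokens[1::2]):
                self.prefixes[name.rstrip(":")] = iri
        if attrs.get("typeof"):
            subject = attrs.get("about") or "_:blank"
            for curie in attrs["typeof"].split():
                pfx, sep, local = curie.partition(":")
                # Expand known CURIEs; leave anything else unchanged
                iri = self.prefixes[pfx] + local if sep and pfx in self.prefixes else curie
                self.types.append((subject, iri))


MARKUP = """<html prefix="schema: http://schema.org/ dbpedia: http://dbpedia.org/ontology/">
<div typeof="schema:Person dbpedia:Professor" about="#person">
  <span property="schema:givenName">Zaphod</span>
  <span property="schema:familyName">Beeblebrox</span>
</div>
</html>"""

parser = NaiveTypeExtractor()
parser.feed(MARKUP)
professors = [s for s, t in parser.types if t == "http://dbpedia.org/ontology/Professor"]
print(professors)  # ['#person']
```

Nothing in this pipeline checks whether the claim is plausible. That gap is the one the type-membership model on the next slide addresses.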
17. Type Membership as a Machine Learning Problem
Supervised Learning: Logistic Regression
§ Input:
– Entity e
– Type t
– Origin (Graph / Domain / URI) o
– Optional: Properties and property values [(p1,v1), (p2,v2),…]
§ Output:
– t’(e) = f(e, t, o)
– p(t(e) == True)
Example data:
(http://www.acme.org/, …#person, http://schema.org/EducationEvent)
(http://munich.eventful.com/, …#event1, http://schema.org/MusicEvent)
Hepp (2015b, forthcoming)
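The slide frames type membership as estimating p(t(e) == True) from features of the entity, the type and the origin. The sketch below is a minimal logistic regression trained by batch gradient descent in pure Python. The two features in the hypothetical vector phi(e, t, o) and the six labeled rows are invented for illustration. They are not the features or data of Hepp (2015b).

```python
# Logistic regression sketch for f(e, t, o) ~ p(t(e) == True).
# Features and toy rows are invented for illustration; they are not from Hepp (2015b).
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; each row of X starts with a bias 1.0."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Estimated probability that entity e really is an instance of type t."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical feature vector phi(e, t, o) = [bias, x1, x2] where
#   x1: share of entities from origin o that a reference source also types as t
#   x2: share of the properties typical for t that e actually carries
X = [[1.0, 0.9, 0.8], [1.0, 0.7, 0.9], [1.0, 0.8, 0.6],   # labeled: t(e) true
     [1.0, 0.1, 0.2], [1.0, 0.2, 0.1], [1.0, 0.0, 0.3]]   # labeled: t(e) false
y = [1, 1, 1, 0, 0, 0]

w = train(X, y)
print(round(predict(w, [1.0, 0.85, 0.85]), 2), round(predict(w, [1.0, 0.05, 0.1]), 2))
```

Logistic regression is used here because it returns a calibrated-looking probability rather than a hard yes/no. A consumer can then choose a threshold per use case instead of trusting every typeof.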
18. Let’s Do Science, not Cult!
§ Challenge paradigms and approaches
§ Use hard data, not beliefs and assumptions (neither your own nor those inherited from the old folks)
CC BY-SA 3.0 / Nicor / https://en.wikipedia.org/wiki/North_Korea's_cult_of_personality#/media/File:Mansudae_Grand_Monument_08.JPG
19. Thank you.
HEPP RESEARCH GmbH
Prof. Dr. Martin Hepp, CEO
Contact us!
Kuppelnaustrasse 5
88212 Ravensburg, Germany
Phone +49 751 2708 5256-0
Fax +49 751 2708 5256-9
www.heppresearch.com
contact@heppresearch.com