Invited Talk at Summer School on Semantic Web, Bertinoro, 2015
Abstract:
Two decades ago, the discussion was about how to build seamless digital workflows
in which the medium for data would not switch between paper, fax, phone,
and digital, because each transcription from one medium to another is
laborious and costly. The issue was thus avoiding *medium discontinuities*.
Today we have all-digital data workflows, but we still have plenty of *semantic discontinuities*.
In this talk, I first want to describe the reasons for these discontinuities, including the autonomy of
data providers, the need for agility and flexibility, and decentralized organizations in
worldwide data spaces.
Then I want to describe several semantic discontinuities and some efforts to
ameliorate them by:
1. Semantic programming (Horizontal workflow paradigm)
2. Core ontologies (Vertical workflow paradigm)
3. Semantic data production and consumption (Sticky semantics)
1. Steffen Staab Seamless Semantics 1
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany &
Web and Internet Science Group · ECS · University of Southampton, UK
Seamless Semantics:
Avoiding Semantic Discontinuity
Steffen Staab
University of Southampton
&
Universität Koblenz-Landau
2. Steffen Staab Seamless Semantics 2
A professorial colleague:
"I have bought myself a 30-inch screen,
because half of my work is re-typing existing
material."
Why do we have
to re-type at all?
3. Steffen Staab Seamless Semantics 3
Why do we have to retype at all?
Examples:
• CVs
• Bibliographies
• Visa entry forms
• Addresses
• Purchase orders
• ...
Reasons:
• No semantics
• Other semantics
• Other formats
• Other importance ranking
• (Partial) incompleteness
6. Steffen Staab Seamless Semantics 6
Why do we know the "how"
(see talk by Maria-Esther & Axel)
but not the "what"?
7. Steffen Staab Seamless Semantics 7
Traditional Information System
Business
Logics
Structured Data
Unstructured
Data
Presentation and
Interaction
Characteristics:
• Processes known
• Data structures known
• Meaning of data in schema and implicit in code
8. Steffen Staab Seamless Semantics 8
Information ecosystems nowadays
Examples
• Open Data
• 1000s of DBs per company
• Ad-hoc data
Characteristics
• Some structure
• Late structure
• Social context
• Meaning of data most important
9. Steffen Staab Seamless Semantics 9
How does data receive meaning?
Explicit:
• Formal schema/ontology
– By someone else?
Implicit:
• Names are just used for describing
Social:
• Communities converge
– By discussion
– By emergence
Meaning?
10. Steffen Staab Seamless Semantics 10
Talking many languages...
Sub languages
• For consumer
– title,...
• Global retailers
– barcode
• US food industry
– serving size, calories,...
• Producer
– batch number
...
https://www.youtube.com/watch?v=ga1aSJXCFe0
Depending on who you are – you encounter the
(un)expected, the (un)known, the (un)understandable,...
11. Steffen Staab Seamless Semantics 11
The Unknown
https://www.flickr.com/photos/wrobel/8175902444/
12. Steffen Staab Seamless Semantics 12
We know a bit....
1. URIs as identifiers
2. http lookup
3. RDF (triples)
4. relations, also to other locations
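The four points above can be sketched in a few lines. This is a minimal illustration, not a real Linked Data client: the `example.org` URIs and the data are invented, and `describe` only stands in for an HTTP lookup by reading from an in-memory list.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data: the example.org URIs are invented for illustration.
TRIPLES = [
    ("http://example.org/people/alice", FOAF + "name", "Alice"),
    ("http://example.org/people/alice", FOAF + "knows", "http://example.org/people/bob"),
    ("http://example.org/people/bob", FOAF + "name", "Bob"),
]


def describe(uri, triples):
    """Stand-in for an HTTP lookup: collect predicate -> objects for one URI."""
    description = defaultdict(list)
    for s, p, o in triples:
        if s == uri:
            description[p].append(o)
    return dict(description)


def linked_uris(uri, triples):
    """Objects that are themselves URIs, i.e. links to other locations."""
    return [o for s, p, o in triples if s == uri and o.startswith("http://")]


alice = "http://example.org/people/alice"
print(describe(alice, TRIPLES))
print(linked_uris(alice, TRIPLES))
```

Following the URIs returned by `linked_uris` and describing them in turn is the link-following step that point 4 refers to.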
13. Steffen Staab Seamless Semantics 13
What is/should Linked Data be good for?
• Data integration is (relatively) easy
– Migrating different data sources to linked data is (relatively) easy
• Late schema is easy
– Just add some more fields
• Ignoring data is easy
– Think of crisps
• Serendipitous use
– Discover new information & new sources by following links
• Data repurposing / pointing
– Use what others have done at both schema and data level
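As a rough sketch of "late schema" and "ignoring data", consider the crisps packet from the earlier slide. The product URI, predicate names, and values below are all hypothetical. The point is only that new fields can be appended without touching existing consumers, and that each consumer reads just the predicates it understands.

```python
# Hypothetical product description; URI, predicates and values are invented.
PRODUCT = "http://example.org/product/crisps-42"

triples = [
    (PRODUCT, "title", "Salted Crisps"),
    (PRODUCT, "barcode", "0000000000000"),
]

# Late schema: further fields are simply added as more triples.
triples.append((PRODUCT, "batchNumber", "B-17"))
triples.append((PRODUCT, "calories", "150"))


def view(subject, triples, known_predicates):
    """Keep only the predicates a given consumer understands; ignore the rest."""
    return {p: o for s, p, o in triples if s == subject and p in known_predicates}


consumer_view = view(PRODUCT, triples, {"title"})
retailer_view = view(PRODUCT, triples, {"title", "barcode"})
print(consumer_view)
print(retailer_view)
```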
14. Steffen Staab Seamless Semantics 14
Issue: From Data Publishing to Understanding
De-contextualization Re-contextualization
Publishing data the structure of which you know is
easier than understanding what you do not know
15. Steffen Staab Seamless Semantics 15
Agenda
1. Reducing language friction
2. Reducing re-use friction
3. Reducing information loss
20. Steffen Staab Seamless Semantics 21
2 Modularization
Multimedia (@WeST)
• FOAF
• F event ontology
• COMM
• ..
Sensors (@Galway)
• SSN ontology
• COMM
• F
Italian
Spanish
French
[Scherp et al]
[Leggieri et al]
21. Steffen Staab Seamless Semantics 22
2 Pattern as Micro-Module for Image Tagging
[Scherp&Saathoff, WWW-2010]
[Troncy et al 2007]
22. Steffen Staab Seamless Semantics 23
3 Understanding via generalization
Fracture of Femur Fracture of bone
Femur is bone in your
upper leg
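One way to make this generalization step explicit is a description-logic reading. The role name hasLocation and the two concept definitions are assumptions made for illustration; the slide itself only names the two fracture concepts and states that the femur is a bone.

```latex
\begin{align*}
\mathit{FractureOfFemur} &\equiv \mathit{Fracture} \sqcap \exists\, \mathit{hasLocation}.\mathit{Femur} \\
\mathit{FractureOfBone}  &\equiv \mathit{Fracture} \sqcap \exists\, \mathit{hasLocation}.\mathit{Bone} \\
\mathit{Femur} &\sqsubseteq \mathit{Bone} \\
\Rightarrow\quad \exists\, \mathit{hasLocation}.\mathit{Femur} &\sqsubseteq \exists\, \mathit{hasLocation}.\mathit{Bone} \\
\Rightarrow\quad \mathit{FractureOfFemur} &\sqsubseteq \mathit{FractureOfBone}
\end{align*}
```

Under these assumed definitions, a consumer that only knows "fracture of bone" can still make use of data about a "fracture of femur".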
23. Steffen Staab Seamless Semantics 24
3 Generalization/Specialization
DOLCE
Ontology
of Plans
Core
Software Ontology
Core Ontology
of Web Services
Core Ontology of
Software Components
specificity
generic
core
reused
ontology modules
Ontology of
Information Objects
Descriptions
& Situations
contributed
ontology modules
http://cos.ontoware.org
28. Steffen Staab Seamless Semantics 29
4 OntoMDE Workflow
Model of Ontologies (MoOn)
Adding declarative layer:
Structuring the ontologies into
semantic units
Ontology API Model (OAM)
Adding declarative layer:
Structuring pragmatic units specifying how entities are to be
used together
30. Steffen Staab Seamless Semantics 31
Example scenario: Jamendo
Data about license free music
• ~ 1 Million triples
• classes and predicates
from 18 different ontologies
– FOAF, Tag ontology,
music ontology, …
Simple programming task:
• List for every music artist,
all the records they made
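The task can be sketched over a handful of in-memory triples. The data is invented, and the assumption that artists are linked to their records via `foaf:made` and typed as `mo:MusicArtist` / `mo:Record` is illustrative; the real Jamendo dataset should be explored before relying on it.

```python
RDF_TYPE = "rdf:type"

# Hypothetical sample data in prefixed-name form.
triples = [
    ("ex:artist1", RDF_TYPE, "mo:MusicArtist"),
    ("ex:artist1", "foaf:name", "Artist One"),
    ("ex:artist1", "foaf:made", "ex:record1"),
    ("ex:artist1", "foaf:made", "ex:record2"),
    ("ex:artist2", RDF_TYPE, "mo:MusicArtist"),
    ("ex:artist2", "foaf:made", "ex:record3"),
    ("ex:record1", RDF_TYPE, "mo:Record"),
    ("ex:record2", RDF_TYPE, "mo:Record"),
    ("ex:record3", RDF_TYPE, "mo:Record"),
    # A non-artist who made a non-record: must not show up in the result.
    ("ex:person1", RDF_TYPE, "foaf:Person"),
    ("ex:person1", "foaf:made", "ex:doc1"),
]


def records_by_artist(triples):
    """Map every music artist to the list of records they made."""
    artists = {s for s, p, o in triples if p == RDF_TYPE and o == "mo:MusicArtist"}
    records = {s for s, p, o in triples if p == RDF_TYPE and o == "mo:Record"}
    result = {artist: [] for artist in sorted(artists)}
    for s, p, o in triples:
        if p == "foaf:made" and s in artists and o in records:
            result[s].append(o)
    return result


for artist, made in records_by_artist(triples).items():
    print(artist, made)
```

Even this small version needs three pieces of schema knowledge (two class names and one property name), which is what makes the "simple" task depend on exploring the data source first.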
31. Steffen Staab Seamless Semantics 32
Software Development Process Overview
data model
design
revised data
model design
data model
prototype
data
queries
final data
model
Creation of
initial data
model
Exploration
of the data
source
Creation of
model in
code
Query design /
implementation
Mapping
of query
results
33. Steffen Staab Seamless Semantics 34
From artists to songs
Observations
• SPARQL queries are strings
• Results are strings
• Requires good understanding of the data source
RDF Typing is lost
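A minimal sketch of why string-based access is fragile, using invented data and a hypothetical helper `objects` in place of a real SPARQL client: a misspelt predicate is just another string, so nothing fails and the result is silently empty.

```python
# Hypothetical sample data.
triples = [
    ("ex:artist1", "foaf:made", "ex:record1"),
    ("ex:artist1", "foaf:made", "ex:record2"),
]


def objects(triples, subject, predicate):
    """String in, strings out: no type information survives the lookup."""
    return [o for s, p, o in triples if s == subject and p == predicate]


good = objects(triples, "ex:artist1", "foaf:made")
typo = objects(triples, "ex:artist1", "foaf:maed")  # misspelt, yet no error is raised

print(good)
print(typo)
```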
34. Steffen Staab Seamless Semantics 35
Programming Language Support for RDF Access
Static Typing
Errors detected before
execution
Misspelling discovered
by compiler!
Anecdote: 2nd place
because of misspelt code
Static types are a form of
documentation
Less knowledge about
data source required
Better IDE integration /
autocompletion
Code generation
• Sommer
• Winter
• OntoMDE
Dynamic Typing
E.g. ActiveRDF
(Oren et al 2007)
“convention over
configuration”
dynamic
metaprogramming
allows for slick code
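The two columns can be contrasted in a small sketch. Both classes are hypothetical: `MusicArtist` stands for what a code generator might emit, and `DynamicResource` imitates the dynamic-metaprogramming style in Python; neither reproduces the API of ActiveRDF or of the generators named above.

```python
from dataclasses import dataclass, field

# Hypothetical sample data.
TRIPLES = [
    ("ex:artist1", "foaf:name", "Artist One"),
    ("ex:artist1", "foaf:made", "ex:record1"),
]


@dataclass
class MusicArtist:
    """Static style: fields are declared, so a type checker or IDE can flag
    a misspelt attribute such as `artist.maed` before the program runs."""
    uri: str
    name: str
    made: list = field(default_factory=list)


class DynamicResource:
    """Dynamic style: attributes are resolved from the data at run time."""

    def __init__(self, uri, triples):
        self._uri = uri
        self._triples = triples

    def __getattr__(self, local_name):
        # Only called when normal attribute lookup fails.
        values = [o for s, p, o in self._triples
                  if s == self._uri and p.split(":")[-1] == local_name]
        if not values:
            raise AttributeError(local_name)  # a misspelling surfaces only here
        return values


static_artist = MusicArtist(uri="ex:artist1", name="Artist One", made=["ex:record1"])
dynamic_artist = DynamicResource("ex:artist1", TRIPLES)
print(static_artist.made, dynamic_artist.made)
```

The dynamic version needs no declarations, which keeps the code short, but a misspelt attribute is only detected when that line actually executes.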
36. Steffen Staab Seamless Semantics 37
Programming with Linked Data
Tasks of the Programmer
1 Schema exploration
2 Programming
code types
3 Programming queries
4 Programming procedures
for
• creating,
• manipulating,
• persisting
objects
37. Steffen Staab Seamless Semantics 38
Node Path Query Language Using Autocompletion
Exploration of classes
38. Steffen Staab Seamless Semantics 39
Node Path Query Language Using Autocompletion
Exploration of classes
Exploration of relations
39. Steffen Staab Seamless Semantics 40
Node Path Query Language: Query Formulation
Exploration of classes
Exploration of relations
Querying for instances
Type
set of mo:MusicArtist
No definition or
declaration needed
40. Steffen Staab Seamless Semantics 41
Node Path Query Language for Code Development
Exploration of classes
Exploration of relations
Querying for instances
Developing code with queries
All translated into SPARQL queries at
• Development time
• Type inference at compile time
(but also as part of IDE)
• Querying again at run time
One language to bind them all
41. Steffen Staab Seamless Semantics 42
Node Path Query Language for Code Development
Exploration of classes
Exploration of relations
Querying for instances
Developing code with queries
Developing code with new classes
All translated into SPARQL queries at
• Development time
• Run time update
• Persistence!
42. Steffen Staab Seamless Semantics 43
NPQL
NPQL (Node Path Query Language)
• Intensional Queries
Describing RDF classes and properties
for reuse in IDE and in host language
metaprogramming
• Extensional Queries
Class instances and property instances
• Compilation to SPARQL for reuse of existing
endpoints
Ongoing discussion
about details of NPQL
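The difference between the two query kinds can be illustrated without NPQL syntax, which is not shown here. In this sketch the function names `intensional` and `extensional`, the `ex:` vocabulary, and the use of `rdfs:domain` to find a class's properties are all assumptions; an actual implementation would compile such requests to SPARQL.

```python
# Hypothetical schema and instance data.
SCHEMA = [
    ("ex:name", "rdfs:domain", "ex:MusicArtist"),
    ("ex:made", "rdfs:domain", "ex:MusicArtist"),
    ("ex:title", "rdfs:domain", "ex:Record"),
]
DATA = [
    ("ex:artist1", "rdf:type", "ex:MusicArtist"),
    ("ex:artist2", "rdf:type", "ex:MusicArtist"),
    ("ex:record1", "rdf:type", "ex:Record"),
]


def intensional(schema, cls):
    """Describe a class: which properties are declared for it?"""
    return sorted(p for p, rel, c in schema if rel == "rdfs:domain" and c == cls)


def extensional(data, cls):
    """Enumerate a class: which instances does it have?"""
    return sorted(s for s, p, o in data if p == "rdf:type" and o == cls)


print(intensional(SCHEMA, "ex:MusicArtist"))
print(extensional(DATA, "ex:MusicArtist"))
```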
43. Steffen Staab Seamless Semantics 44
LITEQ
NPQL (Node Path Query Language)
• Intensional Queries
• Extensional Queries
• Compilation to SPARQL
LITEQ (Language Integrated Types, Extensions and Queries)
• Implementation of NPQL as F# Type Provider in Visual Studio
• Autocompletion using NPQL queries
• Automatic typing
of extensional query results
by intensional queries
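The idea of typing extensional results by intensional queries can be imitated in Python by building a class from the schema at run time. This is only an analogy: LITEQ is an F# type provider and does this at development and compile time, and the `ex:` vocabulary, `provide_type`, and `load_instances` below are hypothetical.

```python
from dataclasses import field, make_dataclass

# Hypothetical schema and instance data.
SCHEMA = [
    ("ex:name", "rdfs:domain", "ex:MusicArtist"),
    ("ex:made", "rdfs:domain", "ex:MusicArtist"),
]
DATA = [
    ("ex:artist1", "rdf:type", "ex:MusicArtist"),
    ("ex:artist1", "ex:name", "Artist One"),
    ("ex:artist1", "ex:made", "ex:record1"),
]


def local(name):
    """Strip the prefix from a prefixed name, e.g. 'ex:made' -> 'made'."""
    return name.split(":", 1)[1]


def provide_type(schema, cls):
    """Intensional step: derive a record type from the declared properties."""
    props = sorted(p for p, rel, c in schema if rel == "rdfs:domain" and c == cls)
    fields = [("uri", str)]
    fields += [(local(p), list, field(default_factory=list)) for p in props]
    return make_dataclass(local(cls), fields)


def load_instances(data, schema, cls):
    """Extensional step: fetch the instances and give them the derived type."""
    typ = provide_type(schema, cls)
    props = {p for p, rel, c in schema if rel == "rdfs:domain" and c == cls}
    instances = {s: typ(uri=s) for s, p, o in data if p == "rdf:type" and o == cls}
    for s, p, o in data:
        if s in instances and p in props:
            getattr(instances[s], local(p)).append(o)
    return list(instances.values())


artists = load_instances(DATA, SCHEMA, "ex:MusicArtist")
print(artists[0])
```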
44. Steffen Staab Seamless Semantics 45
Outlook: Programming with Linked Data
• More expressive query languages
– Derived data types in tractable description logics!
• More precise combined type inference
– (derived) type from data source
– type inference in programming language
• Programming across data sources
– Federated queries
– Link-traversal-based queries (the unknown sources)
• Integration of schema induction
– Low quality of schema/ontologies
• Improved autocompletion
46. Steffen Staab Seamless Semantics 47
Issue: From Data Publishing to
Unknown Data Understanding
Cognition
Storytelling
Pragmatics
Ontology Patterns
Conceptual Modeling
Metamodels
...
Quantity
Pertinence
Manner
47. Steffen Staab Seamless Semantics 48
What is missing?
...a lot...
• Indexing
• Search
• Data and schema quality
• Pragmatics
• ...
48. Steffen Staab Seamless Semantics 49
Semantic
Web
Social Web &
Web Retrieval
Interactive Web &
Human Computing
Web &
Economy
Software &
Services
Computational
Social Science
Thank You!