On the need for a W3C community group on RDF Stream Processing
1. On the need for a W3C community group on RDF Stream Processing
ISWC2013 Workshop on Ordering and Reasoning,
Sydney, 22/10/2013
Oscar Corcho
ocorcho@fi.upm.es, ocorcho@localidata.com
@ocorcho
http://www.slideshare.net/ocorcho/
2. Disclaimer…
This presentation expresses my own view, which is not necessarily shared by
the rest of the group (although I hope it is similar)
3. Acknowledgements
• All those from whom I have “stolen” slides, material and ideas, in particular:
  • Emanuele Della Valle
  • Daniele Dell’Aglio
  • Marco Balduini
  • Jean Paul Calbimonte
  • And many others who have already started contributing…
4. Why set up a community group?
Heterogeneity:
• In RDF stream models (timestamps, events, time intervals, triple-based, graph-based, …)
• In RDF stream query languages (windows, stream selection, CEP-based operators, …)
• In implementations (RDF native, query rewriting, continuous query registration, scalability, static vs. streaming data, …)
• In operational semantics (tick, window content, report)
5. You may think that we do not like heterogeneity…
6. But at least I love it…
• However, we need to tell people what to expect from each system, and smooth
out differences when they are not crucial…
7. The solution…
• Let’s create a W3C community group…
  • To better understand those differences
  • To understand the requirements on which we are based
  • And to explain them to others
  • …
  • And maybe get some “recommendation” out
8. The W3C RDF Stream Processing Comm. Group
• http://www.w3.org/community/rsp/
9. W3C RSP Community Group mission
“The mission of the RDF Stream Processing
Community Group (RSP) is to define a common model
for producing, transmitting and continuously querying
RDF Streams. This includes extensions to both RDF
and SPARQL for representing streaming data, as well
as their semantics. Moreover this work envisions an
ecosystem of streaming and static RDF data sources
whose data can be combined through standard models,
languages and protocols. Complementary to related
work in the area of databases, this Community Group
looks at the dynamic properties of graph-based data,
i.e., graphs that are produced over time and which may
change their shape and data over time.”
10. Use cases
• We have started collecting them
• And I hope that by the end of my talk you will
consider contributing some more…
11. A template to describe use cases (I)
• Streaming information
  • Type: environmental data (temperatures, pressures, salinity, acidity, fluid velocities, etc.)
  • Nature:
    • Relational stream: yes
    • Text stream: no
  • Origin: data is produced by sensors in oil wells and on oil and gas platform equipment. Each oil platform has an average of 400.000.
  • Frequency of update:
    • From sub-second to minutes
    • In triples/minute: [10000-10] t/min
  • Quality: it varies, due to instrument/sensor issues
  • Management/access:
    • Technology in use: dedicated (relational and proprietary) stores
    • Problems: users' ability to access data from different sources is limited by an insufficient description of the context
    • Means of improvement: add context (metadata) to the data so that it becomes meaningful, and use reasoning techniques to process that metadata
12. A template to describe use cases (II)
• [optional] Static information required to interpret the streaming information
  • Type: topology of the sensor network, position of each sensor, descriptions of the oil platform
  • Origin: oil and gas production operations
  • Dimension:
    • 100s of MB as a PostGIS dump
    • In triples: 10^8
  • Quality: good
  • Management/access:
    • Technology in use: RDBMS, proprietary technologies
    • Available ontologies and vocabularies: Reference Semantic Model (RSM), based on ISO 15926
13. A tale of four heterogeneities
15. What is an RDF stream?
• Several possibilities:
  • An RDF stream is an infinite sequence of timestamped events (triples or graphs), where timestamps are non-decreasing:
    … <event_i, t_i>, <event_{i+1}, t_{i+1}>, <event_{i+2}, t_{i+2}> …
  • An RDF stream is an infinite sequence of triple occurrences <<s,p,o>, tα, tω>, where <s,p,o> is an RDF triple and tα and tω are the start and end of the interval
• How are timestamps assigned?
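The two stream models above can be sketched as plain data types. This is a minimal illustration, assuming integer timestamps, a single triple per event, and hypothetical names (TimestampedEvent, TripleOccurrence, non_decreasing) that do not come from any of the systems discussed; it only checks the two invariants the definitions state: timestamps never go backwards, and an interval never ends before it starts.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple

# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class TimestampedEvent:
    """Point-in-time model: <event_i, t_i>, here with a single triple as the event."""
    event: Triple
    t: int


@dataclass(frozen=True)
class TripleOccurrence:
    """Interval model: <<s,p,o>, t_alpha, t_omega>."""
    triple: Triple
    t_alpha: int  # start of the interval
    t_omega: int  # end of the interval

    def __post_init__(self) -> None:
        if self.t_alpha > self.t_omega:
            raise ValueError("interval ends before it starts")


def non_decreasing(stream: Iterable[TimestampedEvent]) -> Iterator[TimestampedEvent]:
    """Lazily pass events through, rejecting any timestamp that goes backwards.

    Consuming an iterator (rather than a list) mirrors the fact that the
    stream is conceptually infinite.
    """
    last = None
    for item in stream:
        if last is not None and item.t < last:
            raise ValueError(f"timestamp {item.t} precedes {last}")
        last = item.t
        yield item
```

Equal timestamps are accepted, since the definition only asks for non-decreasing (not strictly increasing) timestamps.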
16. Some examples…
• What would be the best (or even a possible) RDF stream representation for the following types of problems?
  • Does Alice meet Bob before Carl?
  • Who does Carl meet first?
  • How many people has Alice met in the last 5m?
  • Does Diana meet Bob and then Carl within 5m?
  • Which meetings last less than 5m?
  • Which meetings have conflicts?
Figure: the meetings :alice :isWith :bob, :alice :isWith :carl, :bob :isWith :diana and :diana :isWith :carl, drawn once as events e1 to e4 at t = 1, 3, 6, 9 and once as time intervals.
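With the point-in-time model, the first questions can be answered by scanning a time-ordered stream. The sketch below is illustrative only: the slide does not say which triple belongs to which event, so the assignment of triples to t = 1, 3, 6, 9 (in minutes) is a hypothetical one, and :isWith is assumed to be symmetric. The interval-based questions (meeting duration, conflicts) would need the second model and are not covered here.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Event = Tuple[Triple, int]  # (triple, timestamp in minutes)

# Hypothetical assignment of the slide's triples to timestamps 1, 3, 6, 9.
STREAM: List[Event] = [
    ((":alice", ":isWith", ":bob"), 1),
    ((":alice", ":isWith", ":carl"), 3),
    ((":bob", ":isWith", ":diana"), 6),
    ((":diana", ":isWith", ":carl"), 9),
]


def met(stream: List[Event], person: str, now: int, width: int) -> Set[str]:
    """People seen with `person` in the time window (now - width, now].

    :isWith is treated as symmetric (an assumption).
    """
    people: Set[str] = set()
    for (s, p, o), t in stream:
        if p != ":isWith" or not (now - width < t <= now):
            continue
        if s == person:
            people.add(o)
        elif o == person:
            people.add(s)
    return people


def first_meeting(stream: List[Event], a: str, b: str) -> Optional[int]:
    """Timestamp of the first event in which a and b are together (stream is time-ordered)."""
    for (s, p, o), t in stream:
        if p == ":isWith" and {s, o} == {a, b}:
            return t
    return None


def first_partner(stream: List[Event], person: str) -> Optional[str]:
    """The first person that `person` is seen with."""
    for (s, p, o), _ in stream:
        if p == ":isWith" and person in (s, o):
            return o if s == person else s
    return None
```

For example, `len(met(STREAM, ":alice", now=5, width=5))` counts the people Alice met in the last 5 minutes at t = 5, and comparing `first_meeting` for (Alice, Bob) and (Alice, Carl) answers the first question under this hypothetical timing.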
17. Data types for semantic streams - Summary
• Multiple notions of RDF stream have been proposed:
  • Ordered sequence (implicit timestamp)
  • One timestamp per triple (point-in-time semantics)
  • Two timestamps per triple (interval-based semantics)
• Comparison between existing approaches:

System                | Data item | Time model    | # of timestamps
INSTANS               | triple    | Implicit      | 0
C-SPARQL              | triple    | Point in time | 1
SPARQLstream          | triple    | Point in time | 1
CQELS                 | triple    | Point in time | 1
Sparkwave             | triple    | Point in time | 1
Streaming Linked Data | RDF graph | Point in time | 1
ETALIS                | triple    | Interval      | 2

• More investigation is required to agree on an RDF stream model
19. Existing RDF Stream Processing systems
• C-SPARQL: RDF store + stream processor
  • Combined architecture: C-SPARQL query → translator → RDF store (static data) and stream processor (streaming data) → continuous results
• CQELS: implemented from scratch, with a focus on performance
  • Native + adaptive joins for static and streaming data
  • Architecture: CQELS query → native RSP engine → continuous results
• CQELS-Cloud: reusing Storm
  • Paper presentation on Thursday
  • Architecture: CQELS query → Storm topology → continuous results
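Both architectures above have to combine static data with a window over the stream. The toy sketch below shows only that idea: a hash join between a static set of triples (standing in for the RDF store) and the content of a time window (standing in for the stream processor). The IRIs, readings and function names are hypothetical, loosely inspired by the oil-platform use case; this is not how C-SPARQL or CQELS implement their joins.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Static data, as it might sit in an RDF store (hypothetical IRIs).
STATIC: Set[Tuple[str, str, str]] = {
    (":s1", ":locatedIn", ":platformA"),
    (":s2", ":locatedIn", ":platformA"),
    (":s3", ":locatedIn", ":platformB"),
}

# Streaming data: (triple, timestamp) pairs with hypothetical readings.
STREAM: List[Tuple[Tuple[str, str, int], int]] = [
    ((":s1", ":hasTemperature", 70), 1),
    ((":s3", ":hasTemperature", 55), 2),
    ((":s2", ":hasTemperature", 72), 4),
    ((":s1", ":hasTemperature", 90), 8),
]


def window(stream, now: int, width: int):
    """Streaming side: the triples whose timestamp falls in (now - width, now]."""
    return [triple for triple, t in stream if now - width < t <= now]


def join_static(window_triples, static) -> Dict[str, List[int]]:
    """Join each streamed reading with the static location of its sensor.

    The static side is indexed by subject once, then probed for every
    windowed triple (a simple hash join on ?sensor).
    """
    location = {s: o for s, p, o in static if p == ":locatedIn"}
    readings: Dict[str, List[int]] = defaultdict(list)
    for s, p, o in window_triples:
        if p == ":hasTemperature" and s in location:
            readings[location[s]].append(o)
    return dict(readings)
```

Indexing the static side once and probing it per window is one simple choice; an adaptive engine could instead reorder or switch join strategies as the stream rate changes.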
20. Existing RSP systems
• EP-SPARQL: complex-event detection
  • SEQ, EQUALS operators
  • Architecture: EP-SPARQL query → translator → Prolog engine → continuous results
• SPARQLStream: ontology-based stream query answering
  • Virtual RDF views, using R2RML mappings
  • SPARQL stream queries over the original data streams
  • Architecture: SPARQLStream query → rewriter (using R2RML mappings) → DSMS/CEP → continuous results
• Instans: RETE-based evaluation
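The SPARQLStream idea of querying the original streams through virtual RDF views can be illustrated with a tiny rewriter. In this sketch the mapping is a hypothetical Python dictionary loosely inspired by R2RML (it is not R2RML syntax), and a single triple pattern (?s predicate ?o) is rewritten into an operation over relational rows, so no RDF triples are ever materialised.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Row = Dict[str, object]

# Hypothetical, R2RML-inspired mapping (this is not R2RML syntax): each
# predicate is produced from one column of the relational stream, and the
# subject IRI is built from a template over the row.
MAPPING: Dict[str, Dict[str, str]] = {
    ":hasTemperature": {"subject": ":sensor/{sensor_id}", "column": "temp"},
    ":hasPressure": {"subject": ":sensor/{sensor_id}", "column": "pressure"},
}


def rewrite(predicate: str) -> Callable[[Iterable[Row]], Iterator[Tuple[str, object]]]:
    """Rewrite the triple pattern (?s predicate ?o) into an operation over raw rows.

    Each row is turned directly into a (?s, ?o) binding instead of first
    building triples, which is the essence of query rewriting over virtual views.
    """
    rule = MAPPING[predicate]

    def evaluate(rows: Iterable[Row]) -> Iterator[Tuple[str, object]]:
        for row in rows:
            value = row.get(rule["column"])
            if value is not None:  # a NULL cell yields no binding
                yield rule["subject"].format(**row), value

    return evaluate
```

In a real system the rewritten query would be handed to the DSMS/CEP engine in its own language; here a Python generator plays that role for illustration.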
21. Query languages for semantic streams - Summary
• Different architectural choices
  • It is not clear which choice is best for which type of use case
• Wrappers over existing systems
  • C-SPARQL, ETALIS, SPARQLstream, CQELS-Cloud
  • Better reliability and maintainability?
• Native implementations
  • CQELS, Streaming Linked Data, INSTANS
  • Better scalability: optimizations that are not possible in other systems
• Different operational semantics
  • See later
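23. Querying data streams (from CQL to SPARQL-X)
Figure: the CQL model of continuous queries, adapted to RDF.
• Streams: infinite, unbounded bags of elements <s,τ>
• Relations: finite bags; a relation R(t) is a mapping T → R
• Stream-to-relation (S2R) operators turn streams into relations, relation-to-relation (R2R) operators transform relations, and relation-to-stream (R2S) operators turn relations back into streams (<s1>, <s2>, <s3>, …)
• For RDF: RDF streams → S2R (window operators) → RDF → R2R (SPARQL operators) → R2S operators → RDF streams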
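The S2R and R2R steps of this model can be sketched in a few lines: a time-based window turns the stream into a finite bag of triples, and a basic graph pattern matcher plays the role of the SPARQL (R2R) operators. This is a simplified illustration under stated assumptions: windows are the half-open range (now - width, now], variables are strings starting with "?", and the example data is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def s2r_time_window(stream: Sequence[Tuple[Triple, int]], now: int, width: int) -> List[Triple]:
    """S2R: the finite bag of triples with a timestamp in (now - width, now]."""
    return [tr for tr, t in stream if now - width < t <= now]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p, v in zip(pattern, triple):
        if p.startswith("?"):
            # A variable either binds to v or must already be bound to v.
            if result.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return result


def r2r_bgp(patterns: Sequence[Triple], relation: List[Triple]) -> List[Binding]:
    """R2R: evaluate a basic graph pattern over the window content (nested-loop join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for tr in relation
                    if (b2 := match(pattern, tr, b)) is not None]
    return bindings
```

Re-running `r2r_bgp` on `s2r_time_window(stream, now, width)` at each evaluation time gives the relation R(t); an R2S operator then decides what is streamed out.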
24. Output: relation
• Case 1: the output is a set of timestamped mappings
Queries registered in the RSP engine:
  SELECT ?a ?b …
  FROM ….
  WHERE ….

  CONSTRUCT {?a :prop ?b }
  FROM ….
  WHERE ….
Output bindings (SELECT): ?a … ?b … [t1], ?a … ?b … [t3], ?a … ?b … [t5], ?a … ?b … [t7]
Output triples (CONSTRUCT): <… :prop …> [t1], <… :prop …> [t3], <… :prop …> [t5], <… :prop …> [t7]
25. Output: stream
• Case 2: the output is a stream
• R2S operators
Query registered in the RSP engine:
  CONSTRUCT RSTREAM {?a :prop ?b }
  FROM ….
  WHERE ….
Output stream: … <… :prop …> [t1], <… :prop …> [t1], <… :prop …> [t3], <… :prop …> [t5], <… :prop …> [t7] …
ISTREAM: streams out the data in the current (last) step that was not in the previous step
DSTREAM: streams out the data in the previous step that is not in the current (last) step
RSTREAM: streams out all the data in the current (last) step
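The three R2S operators can be written as set differences between consecutive snapshots of the query result. This sketch simplifies relations to sets rather than bags and uses hypothetical snapshot data; `run` simply timestamps whatever each operator emits at each step.

```python
from typing import Callable, List, Set, Tuple

Snapshot = Set[str]
R2S = Callable[[Snapshot, Snapshot], Snapshot]


def istream(previous: Snapshot, current: Snapshot) -> Snapshot:
    """Data in the current step that was not in the previous step."""
    return current - previous


def dstream(previous: Snapshot, current: Snapshot) -> Snapshot:
    """Data in the previous step that is not in the current step."""
    return previous - current


def rstream(previous: Snapshot, current: Snapshot) -> Snapshot:
    """All data in the current step."""
    return set(current)


def run(steps: List[Tuple[int, Snapshot]], op: R2S) -> List[Tuple[str, int]]:
    """Apply an R2S operator to successive relation snapshots, timestamping each output."""
    out: List[Tuple[str, int]] = []
    previous: Snapshot = set()
    for t, current in steps:
        for item in sorted(op(previous, current)):  # sorted only for a stable order
            out.append((item, t))
        previous = current
    return out
```

For snapshots `[(1, {"a"}), (3, {"a", "b"}), (5, {"b"})]`, ISTREAM emits `a` at 1 and `b` at 3, DSTREAM emits `a` at 5, and RSTREAM re-emits everything at every step, which is why the same triple can appear several times in an RSTREAM output.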
26. Other operators
• Sequence operators and the CEP world
Figure: events e1 to e4 on stream S at t = 1, 3, 6, 9, illustrating events in sequence and simultaneous events.
SEQ: joins e[ti,tf] and e'[ti',tf'] if e' occurs after e
EQUALS: joins e[ti,tf] and e'[ti',tf'] if they occur simultaneously
OPTIONALSEQ, OPTIONALEQUALS: optional join variants
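SEQ, EQUALS and their optional variants are joins over interval events. The sketch below assumes one possible reading of the slide: "e' occurs after e" means e' starts strictly after e ends (tf < ti'), and "simultaneously" means identical start and end times. Engines such as EP-SPARQL may define these conditions differently; the event names and intervals are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    name: str
    ti: int  # start time
    tf: int  # end time


def seq(left: List[Event], right: List[Event]) -> List[Tuple[Event, Event]]:
    """SEQ: pairs (e, e') where e' starts after e has finished (assumed: e.tf < e'.ti)."""
    return [(e, e2) for e in left for e2 in right if e.tf < e2.ti]


def equals(left: List[Event], right: List[Event]) -> List[Tuple[Event, Event]]:
    """EQUALS: pairs of events with the same start and end (assumed definition)."""
    return [(e, e2) for e in left for e2 in right if e.ti == e2.ti and e.tf == e2.tf]


def optional_seq(left: List[Event], right: List[Event]) -> List[Tuple[Event, Optional[Event]]]:
    """OPTIONALSEQ: like SEQ, but an e without any match is kept, paired with None."""
    out: List[Tuple[Event, Optional[Event]]] = []
    for e in left:
        matches = [(e, e2) for e2 in right if e.tf < e2.ti]
        out.extend(matches or [(e, None)])
    return out
```

With e1 = [1, 2], e2 = [3, 5] and e4 = [3, 5], `seq([e1], [e2])` pairs e1 with e2 and `equals([e2], [e4])` pairs e2 with e4.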
27. Query languages for semantic streams - Summary
• Comparison between existing approaches:

System                | S2R                      | R2R              | Time-aware                                                     | R2S
INSTANS               | Based on time events     | SPARQL update    | Based on time events                                           | Ins only
C-SPARQL Engine       | Logical and triple-based | SPARQL 1.1 query | timestamp function                                             | Batch only
SPARQLstream          | Logical and triple-based | SPARQL 1.1 query | no                                                             | Ins, batch, del
CQELS                 | Logical and triple-based | SPARQL 1.1 query | no                                                             | Ins only
Sparkwave             | Logical                  | SPARQL 1.0       | no                                                             | Ins only
Streaming Linked Data | Logical and graph-based  | SPARQL 1.1       | no                                                             | Batch only
ETALIS                | no                       | SPARQL 1.0       | SEQ, PAR, AND, OR, DURING, STARTS, NOT, EQUALS, MEETS, FINISHES | Ins only

• Is it time to converge on a standard?
28. Query languages for semantic streams - Issues
• Different syntax for S2R operators
• The semantics of the query languages is similar, but not identical
• Lack of R2S operators in some cases
• Different support for time-aware operators
31. Operational Semantics
Where are both Alice and Bob in the last 5s?
Figure: stream S with elements S1 to S4 at t = 1, 3, 6, 9, carrying the triples :alice :isIn :hall, :bob :isIn :hall, :alice :isIn :kitchen and :bob :isIn :kitchen.
System 1: :hall [5], :kitchen [10]
System 2: :hall [3], :kitchen [9]
Both correct?
See "On Correctness in RDF stream processor benchmarking" by Daniele Dell’Aglio, Jean-Paul Calbimonte, Marco Balduini, Oscar Corcho and Emanuele Della Valle (ISWC 2013 evaluation track).
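One way both answers can be "correct" is that the systems report at different moments. The sketch below reproduces the two outputs under stated assumptions: S1 to S4 are taken to be, in order, Alice in the hall, Bob in the hall, Alice in the kitchen and Bob in the kitchen (the figure does not make the mapping explicit); windows are tumbling, 5s wide and aligned at 0; and the two report policies are hypothetical simplifications, one reporting when a window closes and one reporting whenever new content enters the window.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping of S1..S4 to triples and timestamps.
STREAM: List[Tuple[Triple, int]] = [
    ((":alice", ":isIn", ":hall"), 1),
    ((":bob", ":isIn", ":hall"), 3),
    ((":alice", ":isIn", ":kitchen"), 6),
    ((":bob", ":isIn", ":kitchen"), 9),
]
WIDTH = 5  # tumbling windows (0, 5], (5, 10], ...


def both_in(window: List[Triple]) -> List[str]:
    """Places where both :alice and :bob are, according to the window content."""
    alice = {o for s, p, o in window if s == ":alice" and p == ":isIn"}
    bob = {o for s, p, o in window if s == ":bob" and p == ":isIn"}
    return sorted(alice & bob)


def window_close(stream, width: int, horizon: int) -> List[Tuple[str, int]]:
    """Report policy 1 (assumed): evaluate once, when each window closes."""
    out: List[Tuple[str, int]] = []
    for end in range(width, horizon + 1, width):
        content = [tr for tr, t in stream if end - width < t <= end]
        out += [(place, end) for place in both_in(content)]
    return out


def content_change(stream, width: int) -> List[Tuple[str, int]]:
    """Report policy 2 (assumed): evaluate every time an element enters the window."""
    out: List[Tuple[str, int]] = []
    for _, t in stream:
        end = -(-t // width) * width  # closing time of the window containing t
        content = [tr for tr, u in stream if end - width < u <= t]
        out += [(place, t) for place in both_in(content)]
    return out
```

Under these assumptions `window_close(STREAM, WIDTH, horizon=10)` matches System 1 and `content_change(STREAM, WIDTH)` matches System 2: same query, same data, different operational semantics.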
33. Next steps in the community group…
• Agree on an RDF model?
  • Metamodel?
  • Timestamps in graphs?
  • Timestamp intervals
  • Compatibility with normal (static) RDF
• Additional operators for SPARQL?
  • Windows (not only time-based?)
  • CEP operators
  • Semantics
• Go Web
  • Volatile URIs
  • Serialization: terse, compact
  • Protocols: HTTP, WebSockets?