This document discusses order-aware reasoning over streaming data. It presents three use cases that require reasoning over ordered data streams: space situational awareness, jet engine design, and intelligent surveillance. These applications share features like ordered, streaming data and the need for immediate, inferred answers over massive amounts of data. The document surveys different approaches to handling ordered data and continuous queries, including data stream management systems and complex event processing. It argues that a fully order-aware approach is needed to optimize performance on queries over ordered data.
2. Acknowledgements
§ This talk presents the content of a joint paper with
Stefan Schlobach (b), Markus Krötzsch (c), Alessandro Bozzon (a),
Stefano Ceri (a), and Ian Horrocks (c), to appear in SWJ
(a) Politecnico di Milano
(b) Vrije Universiteit Amsterdam
(c) University of Oxford
§ I also want to thank Frank van Harmelen (b) for his important
contribution to the discussion, and Tony Lee (Saltlux), Andreas
Schreiber (DLR) and Achim Basermann (DLR) for the valuable
discussion on concrete examples of problems that require
order-aware reasoning. Moreover, I want to thank Sara Magliacane (b)
for her work on SPARQL-RANK and the slides I use in this
presentation, and Marco Balduini (a), Davide Barbieri (a), and
Daniele Braga (a) for their work on C-SPARQL
§ Check out the paper:
• http://www.semantic-web-journal.net/content/order-matters-harnessing-world-orderings-reasoning-over-massive-data
Trento, Italy, 6.11.2012 Emanuele Della Valle - http://streamreasoning.org/
3. References
§ The numbers in square brackets refer to references
in the SWJ paper
• http://www.semantic-web-journal.net/content/order-matters-harnessing-world-orderings-reasoning-over-massive-data
§ A short selection of references to my papers is
available at the end of the presentation.
4. The problem, three use cases, and …
§ More and more applications require real-time
processing of massive, dynamically generated data
• Space Situational Awareness
• Jet Engine Design
• Intelligent Surveillance
5. The Problem
Use case: space junk
[source http://wordlesstech.com/2011/03/26/space-junk/ ]
6. The Problem
Use case: jet engine design
[Source: http://www.sae.org/mags/aem/10018/ ]
7. The Problem
Use case: intelligent surveillance
[Source: http://youtu.be/I3iDBfB_ZC0 ]
8. The Problem
… and four common features!
§ their data is ordered,
• naturally ordered by recency, proximity, etc.
• intrinsically ordered by precision, popularity, provenance,
certainty, trust, etc.
• and, in any case, it is explicitly sortable through attribute
values
§ the answers are also required to come
in an ordered fashion
• engineers surveying a satellite orbit need to know the largest
pieces of debris in closest proximity with maximal certainty,
measured with highest precision, etc.
§ they require immediate answers at runtime
• flight paths have to be adapted once an object on a collision
course is detected
§ and, they require inference
• rich ontological models describing complex domain
knowledge are often used to pose the queries and to interpret
the results
9. The Problem
Performance targets
[Figure: answer quality at time t (marks: target, fully correct answers) against computation time t (marks: real-time behaviour, max runtime), with one curve for the current situation and one for the desired situation.]
Note: completeness may not be necessary if all relevant answers are found
10. The Problem
A running example
§ Imagine a system which
• listens to all micro-posts that are published,
• knows the geographic location of social media
users,
• is able to detect the topic of each micro-post, and
• has modelled relationships between topics in an
expressive ontological language
§ Let us suppose that each of us asks such a system
a query like the following:
• Which users of social media, currently leading
popular discussions on fashion-related topics, are
closest to my current location? What are they
saying about the shopping district nearby?
11. The solution space
[Figure: the solution space as a grid. Vertical axis, types of orders: no ordering, natural, cheap to enforce, expensive to enforce, combinations. Horizontal axis, types of reasoning: no reasoning, data-driven, query-driven, combinations. A further dimension: approximation and parallelisation.]
12. The solution space
no ordering, no reasoning
[Figure: the solution-space grid (types of orders vs. types of reasoning).]
13. The solution space
no ordering, no reasoning
§ Most of the big data solutions currently
on the market
• BSP (Bulk Synchronous Parallel)
• PRAM (Parallel Random Access Machine)
• PGAS (Partitioned Global Address Space)
• Map-Reduce implementations
• and data-centric workflow systems based on them
§ Some (e.g., Hive and Pig) allow the specification of
ordering constraints, but no specific optimisation is
provided for top-k or streaming queries
§ W.r.t. the running example
• Good performance and scalability
• Limited ability to harness orderings
• Missing inference capability
14. The solution space
Order aware data management
[Figure: the solution-space grid (types of orders vs. types of reasoning), with the region "Order-aware data management" labelled.]
15. The solution space
Order aware data management
§ When treating massive data order matters!
• Data as a sortable entity, where we can enforce orderings easily and logically
• e.g., order by sortable literals, popularity, uncertainty, trust
• Most relevant answers first
• Streaming algorithms
§ If N is the size of the input, a problem is considered to be
“well-solved” if a streaming algorithm exists which
requires at most O(poly(log(N))) space and time [31]
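As a rough illustration of why order helps a streaming algorithm, the sketch below keeps only the k highest-scoring items seen so far in a bounded min-heap, so its memory depends on k and not on the stream length N. The function name `top_k_stream` and the sample scores are assumptions made for this example; they do not come from the paper.

```python
import heapq
from typing import Iterable, List, Tuple

def top_k_stream(items: Iterable[Tuple[float, str]], k: int) -> List[Tuple[float, str]]:
    """Return the k highest-scoring (score, label) pairs, best first."""
    heap: List[Tuple[float, str]] = []  # min-heap holding at most k items
    for item in items:
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # The new item beats the weakest retained one: swap them.
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)

# Hypothetical stream of (score, label) pairs, consumed one item at a time.
stream = [(0.2, "a"), (0.9, "b"), (0.5, "c"), (0.7, "d"), (0.1, "e")]
best = top_k_stream(iter(stream), k=2)
```

Each arriving item costs O(log k) time, and nothing but the current top k is ever stored.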
16. The solution space
Order aware data management and approximation
§ approximate, streaming algorithms can outperform
classical, data-bound approaches to this problem by
several orders of magnitude [6,14].
§ Such approximations can be asymptotic, so that
arbitrary accuracy can be achieved [6].
[Figure: answer accuracy at computation time t, plotted against computation time t, with the level of fully correct answers marked.]
17. The solution space
Harnessing natural orderings
[Figure: the solution-space grid (types of orders vs. types of reasoning).]
18. The solution space
Harnessing natural orderings
§ Continuous queries are registered over streams that, in most
cases, are observed through windows
[Figure: input streams (unbounded and time-varying) are observed through a window by a registered continuous query, which produces streams of answers.]
§ Assumption: recent information is more relevant, as it describes
the current state of a dynamic system
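A minimal sketch of a continuous query over a time-based sliding window is given below, assuming integer timestamps in seconds and a query that simply counts the items currently in the window. The names `sliding_window_counts` and `range_s` and the sample posts are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

def sliding_window_counts(
    stream: Iterable[Tuple[int, str]], range_s: int
) -> Iterator[Tuple[int, int]]:
    """For each arriving (timestamp, item), yield (timestamp, items in window)."""
    window: Deque[Tuple[int, str]] = deque()
    for ts, item in stream:
        window.append((ts, item))
        # Evict elements that fell out of the time range (ts - range_s, ts].
        while window[0][0] <= ts - range_s:
            window.popleft()
        yield ts, len(window)

# Hypothetical micro-posts with non-decreasing timestamps (seconds).
posts = [(0, "p1"), (10, "p2"), (25, "p3"), (40, "p4"), (41, "p5")]
counts = list(sliding_window_counts(posts, range_s=30))
```

The query is evaluated again at every arrival, and old data is discarded as soon as it leaves the window, which is what distinguishes continuous semantics from one-time queries over stored data.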
19. The solution space
Harnessing natural orderings
§ The nature of streams requires a
paradigmatic change*
• from persistent data
– to be stored and queried on demand
– a.k.a. one-time semantics
• to transient data
– to be consumed on the fly by continuous queries
– a.k.a. continuous semantics
* This paradigmatic change first arose in the DB community [31]
20. The solution space
Harnessing natural orderings
§ Two types of solutions
• Data Stream Management Systems (DSMS)
• Complex Event Processors (CEP)
§ Research Prototypes
• Amazon/Cougar (Cornell) – sensors
• Aurora (Brown/MIT) – sensor monitoring, dataflow
• Gigascope: AT&T Labs – Network Monitoring
• Hancock (AT&T) – Telecom streams
• Niagara (OGI/Wisconsin) – Internet DBs & XML
• OpenCQ (Georgia) – triggers, view maintenance
• Stream (Stanford) – general-purpose DSMS
• Stream Mill (UCLA) - power & extensibility
• Tapestry (Xerox) – publish/subscribe filtering
• Telegraph (Berkeley) – adaptive engine for sensors
• Tribeca (Bellcore) – network monitoring
§ High-tech startups
• StreamBase, Coral8, Apama, Truviso
§ Major DBMS vendors are all adding stream extensions as well
• IBM InfoSphere Streams
• Microsoft StreamInsight
• Oracle CEP
21. The solution space
Harnessing natural orderings
§ DSMSs are optimised for the simplest portion of the
query in our running example
• retrieve the micro-posts that have been posted recently
22. The solution space
Harnessing other types of orders
[Figure: the solution-space grid (types of orders vs. types of reasoning).]
23. The solution space
Harnessing other types of orders
§ W.r.t. the running example, solutions studied in these
two areas make it possible to efficiently
• retrieve nearby shops that are discussed by popular social
media users.
§ This is a typical top-k query
• a limited number of results k
• ordered by a scoring function
• that combines several criteria
– e.g., nearby and most discussed
24. The solution space - Harnessing other types of orders
Treating order as a first class citizen
§ Traditional query evaluation scheme: materialize then sort
• Materialize the join results and order them all by proximity of the
discussed shop to the issuer and popularity of the social media user
• Limit to K
§ Order-aware query evaluation scheme: split and interleave
• Order shops by proximity to the issuer
• Order social media users by popularity
• Limit to K
[Figure: the two query plans over the inputs "shops" and "social media user", annotated with intermediate result sizes ([10s], [1,000s], [100,0000s]).]
25. The solution space - Harnessing other types of orders
The split-and-interleave scheme
§ State-of-the-art
• Literature in RDBMS (for a survey see [35]) presents the
split-and-interleave scheme:
1. Split the evaluation of the scoring function
into the evaluation of the single criteria
2. Interleave them with other operators
3. Use partial orders to incrementally construct the final order
§ Standard assumptions:
• Monotone increasing scoring function
• Sorted access for each criterion
• Random access, when possible, is expensive
• No uncertainty in the scores
• No uncertainty in the scoring function
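To make these assumptions concrete, the sketch below implements the classic threshold algorithm, a well-known member of the top-k family; it is offered only as an illustration and is not claimed to be the specific method surveyed in [35]. It assumes every object has a value for every criterion, uses a monotone scoring function (`sum`), reads each criterion through sorted access, and uses a dictionary lookup as random access. All names and values are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

def threshold_top_k(
    criteria: Sequence[Dict[str, int]],
    score: Callable[[Sequence[int]], int],
    k: int,
) -> Tuple[List[Tuple[int, str]], int]:
    """Return (top-k (score, object) pairs, number of sorted-access rounds)."""
    # Sorted access: one list per criterion, best partial score first.
    sorted_lists = [
        sorted(c.items(), key=lambda kv: kv[1], reverse=True) for c in criteria
    ]
    top: List[Tuple[int, str]] = []  # min-heap of the best k objects so far
    seen = set()
    rounds = 0
    for depth in range(len(sorted_lists[0])):
        rounds += 1
        frontier = []
        for lst in sorted_lists:
            obj, partial = lst[depth]
            frontier.append(partial)
            if obj not in seen:
                seen.add(obj)
                # Random access: fetch the object's value for every criterion.
                total = score([c[obj] for c in criteria])
                heapq.heappush(top, (total, obj))
                if len(top) > k:
                    heapq.heappop(top)
        # By monotonicity, no unseen object can score above score(frontier).
        if len(top) == k and top[0][0] >= score(frontier):
            break
    return sorted(top, reverse=True), rounds

# Hypothetical scores (0 to 100) for five shops.
proximity = {"s1": 90, "s2": 80, "s3": 30, "s4": 20, "s5": 10}
popularity = {"s1": 70, "s2": 90, "s3": 40, "s4": 10, "s5": 20}
result, rounds = threshold_top_k([proximity, popularity], sum, k=2)
```

In this example the top 2 shops are found after two rounds of sorted access, without ever scoring the three remaining shops.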
26. The solution space - Harnessing other types of orders
Be aware, it’s a trade-off
[Figure: trade-off chart; annotation: orders of magnitude.]
NOTE: Typically users are interested in 1 <= k <= 100
27. The solution space
Harnessing all types of orders together
[Figure: the solution-space grid (types of orders vs. types of reasoning).]
28. The solution space
Harnessing all types of orders together
§ W.r.t. the running example, solutions studied in this
area make it possible to efficiently
• retrieve the nearby shops that popular social media users
are currently posting positively about.
§ This is a typical continuous monitoring of top-k
queries over sliding windows [45]
§ A very promising and little explored research area in
data management
29. The solution space
Wrapping up order-aware data mng.
§ Two parts of the query in the running example
remain difficult to express:
• knowing which topics are related to fashion
– requires at least a taxonomy of fashion-related topics
• computing which recent discussions on social media
are popular
– requires computing the transitive closure of the discussion
§ Both
• are difficult to model without an expressive ontological
language (such as OWL 2) and
• require complex algorithms that an ontology
reasoner can handle natively
§ Moreover, order-aware data management
techniques do not cope with heterogeneity
• i.e., data must be translated into one common representation
before order-aware data management techniques can be
applied.
30. The solution space
[Figure: the solution-space grid (types of orders vs. types of reasoning), with the region "Scalable reasoning" labelled.]
31. The Solution Space
Scalable Reasoning
§ Why?
• handling heterogeneity in the input data through
ontology-based information integration
§ In the running example,
• ontological background knowledge can be used to model
relationships between more specific and more general topics
of interest, which can be used to infer which concrete topics
are related to fashion
§ How?
• Data-driven methods
– Scalable methods available in the state-of-the-art
• Query-driven methods
– research trend, implementations are appearing
• Combinations of the previous two
– mostly theoretical results
32. The Solution Space – Scalable Reasoning
Data-driven
§ Ontological Language:
• OWL 2 RL
– aimed at applications that require scalable reasoning without sacrificing
too much expressive power
– http://www.w3.org/TR/owl2-profiles/#OWL_2_RL
§ Reasoning approach
• Forward chaining: from asserted data to all possible entailments
§ Pros: low query latency
§ Cons: does not take the actual information need into account
§ Implementations
• OWLIM, Virtuoso, AllegroGraph, and OntoBroker
§ Research trend
• Parallelization using Map-Reduce as the main paradigm
– e.g. [33,65] for OWL 2 RL or a fragment thereof [32,64,66,38]
• Applying similar techniques to more expressive fragments of OWL
– e.g., ELK reasoner for OWL EL [37]
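The data-driven idea can be sketched as a naive fixpoint computation that materializes every entailment up front, so that a later query becomes a plain lookup. The two rules used here (subclass transitivity and type inheritance) are only a tiny, RDFS-style subset chosen for illustration, and the vocabulary (`subClassOf`, `type`, the class names) is hypothetical; production reasoners use far more efficient strategies.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Apply two subclass rules until no new triple can be derived."""
    closure = set(triples)
    while True:
        new: Set[Triple] = set()
        for s, p, o in closure:
            if p != "subClassOf":
                continue
            for s2, p2, o2 in closure:
                # (s subClassOf o), (o subClassOf o2) => (s subClassOf o2)
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, "subClassOf", o2))
                # (s2 type s), (s subClassOf o) => (s2 type o)
                if p2 == "type" and o2 == s:
                    new.add((s2, "type", o))
        if new <= closure:
            return closure
        closure |= new

asserted = {
    ("Handbag", "subClassOf", "Accessory"),
    ("Accessory", "subClassOf", "Fashion"),
    ("post1", "type", "Handbag"),
}
closure = materialize(asserted)
```

The cost is paid before any query arrives, which is why latency is low at query time and why the work is done regardless of what is actually asked.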
33. The Solution Space – Scalable Reasoning
Query-driven
§ Ontological Language
• OWL 2 QL
– designed for query answering in LOGSPACE w.r.t. the size of the data,
with the expressivity of conceptual models (e.g., UML class diagrams)
– http://www.w3.org/TR/owl2-profiles/#OWL_2_QL
§ Reasoning approach
• Backward chaining: from query to asserted facts
• Query rewriting: from ontological query to a set of SQL queries
§ Pros: limits the search space by considering the actual query
§ Cons: the number of rewritings can grow exponentially
§ Implementations
• QuOnto, Owlgres, and Requiem
§ Research trend
• Extending query rewriting to more expressive ontology languages
– e.g., Datalog± [27,4]
• Parallelization using Map-Reduce
– e.g., QueryPIE
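By contrast, a query-driven approach leaves the data untouched and expands the query instead. The sketch below rewrites a query for instances of one class into a union over that class and its subclasses, then evaluates the union on the asserted facts only. It handles just a class hierarchy, a small fraction of what OWL 2 QL rewriting covers; the function names and the vocabulary are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

def rewrite(target: str, sub_class_of: Dict[str, List[str]]) -> List[str]:
    """Return target plus every class that is directly or indirectly below it.

    sub_class_of maps each class to its direct superclasses.
    """
    rewritings = [target]
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for sub, supers in sub_class_of.items():
            if current in supers and sub not in rewritings:
                rewritings.append(sub)
                frontier.append(sub)
    return rewritings

def answer(target: str, sub_class_of: Dict[str, List[str]], asserted: Set[Triple]) -> Set[str]:
    """Evaluate the union of rewritten queries over the asserted facts only."""
    classes = set(rewrite(target, sub_class_of))
    return {s for s, p, o in asserted if p == "type" and o in classes}

ontology = {"Handbag": ["Accessory"], "Accessory": ["Fashion"]}
facts = {("post1", "type", "Handbag"), ("post2", "type", "Shoes")}
fashion_posts = answer("Fashion", ontology, facts)
```

Only classes relevant to the query are explored, but the number of rewritten queries grows with the ontology, which is the drawback noted above.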
34. The Solution Space – Scalable Reasoning
Combinations
§ Ontological Language
• Subject to research
§ Reasoning approach
• combine the advantages of data- and query-driven approaches
§ State-of-the-art
• Magic Sets technique [1]
§ Recent theoretical results
• for a limited fragment of OWL EL [44]
• for existential rules [4]
35. The Solution Space – Scalable Reasoning
Approximation
§ Many rule-based systems compute only part of the
entailed consequences by employing a set of rules that cannot
derive all results
• E.g., Jena, Sesame, OWLIM, and Virtuoso
§ A typical approach is to approximate the input information
by restricting to a simpler ontology language that is then
processed with a more efficient, sound and complete algorithm
• e.g., TrOWL [48] and SCREECH [62].
§ Approximate reasoning is used as a sub-method in many
sound and complete reasoners,
• e.g., the OWL reasoner HermiT first computes the syntactically told class
hierarchy before using more complex algorithms for a complete subsumption
check.
§ None of the above, however, deals with or takes advantage of
orderings of any kind.
§ A number of interesting research challenges thus remain open.
36. The solution space
Wrap up of the talk so far
[Figure: the solution-space grid (types of orders vs. types of reasoning), with the regions "Order-aware data management" and "Scalable reasoning" labelled.]
37. The solution space
Reasoning with streaming algorithms
[Figure: the solution-space grid (types of orders vs. types of reasoning), with the regions "Order-aware data management", "Scalable reasoning", "Stream reasoning", "Top-k reasoning" and "Order-aware reasoning" labelled.]
38. The solution space
Reasoning with streaming algorithms
[Figure: the solution-space grid (types of orders vs. types of reasoning), with the regions "Order-aware data management", "Scalable reasoning" and "Stream reasoning" labelled.]
39. The solution space
Stream Reasoning [IEEE-IS2009]
§ W.r.t. the running example, solutions studied in this
area make it possible to efficiently
• compute which recent discussions on social media are
popular
§ For instance, how many micro-posts discussed (by either
replying to or retweeting) my tweet?
[Figure: a graph of micro-posts t1 to t8 connected by reply and retweet links, each of which also counts as a discuss link; the answer shown is 7.]
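The counting question above amounts to a transitive closure over a "discuss" relation that subsumes both reply and retweet. The sketch below shows one way to compute it by graph traversal. The edge list is a hypothetical graph made up for this example, not the one drawn on the slide, and the function name `discussing` is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def discussing(target: str, edges: List[Tuple[str, str, str]]) -> Set[str]:
    """Return every post that directly or transitively discusses target."""
    # reply and retweet are both treated as sub-properties of discuss.
    children: Dict[str, List[str]] = defaultdict(list)
    for src, prop, dst in edges:
        if prop in ("reply", "retweet"):
            children[dst].append(src)
    found: Set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# Hypothetical (source, property, destination) links between micro-posts.
edges = [
    ("t2", "reply", "t1"), ("t3", "retweet", "t1"),
    ("t4", "reply", "t2"), ("t5", "reply", "t3"),
    ("t6", "retweet", "t3"), ("t7", "reply", "t4"),
    ("t8", "reply", "t5"),
]
count = len(discussing("t1", edges))
```

A stream reasoner has to keep such an answer up to date as new replies and retweets arrive and old ones leave the window, rather than recomputing it once over stored data.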
40. The solution space
Stream Reasoning features
[Table: features that traditional data processing, stream processing and automatic reasoning each offer, and that stream reasoning aims at. Rows: processing streams; handling large datasets; reactivity (real-time); expressing fine-grained queries; capturing knowledge; access to persistent data. The per-cell marks did not survive extraction.]
41. The solution space
Stream Reasoning definition
§ Making sense [IEEE-IS2010]
• in real time
• of multiple, heterogeneous, gigantic and inevitably noisy
data streams
• in order to support the decision process of extremely
large numbers of concurrent users
§ Note: making sense of streams necessarily requires processing
them against rich background knowledge, an unsolved problem
in databases
42. The solution space
Architecture of a Stream Reasoner
§ Continuous reasoning tasks are registered over
streams that, in most cases, are observed
through windows
[Figure: input streams are observed through a window by registered continuous reasoning tasks, which produce streams of answers.]
43. The solution space
Stream Reasoning PoliMi’s Achievements
§ RDF Stream data type [WWW2009]
• (virtually) represent heterogeneous data streams
§ C-SPARQL query language [WWW2009]
• express fine-grained continuous queries
• It is “compiled down” to maintain high performance
§ Incremental RDFS++ Reasoning [ESWC2010]
• allows for domain knowledge exploitation
§ C-SPARQL Engine [EDBT2010]
• Fully operational prototype
• Deployed in award-winning applications (e.g., Bottari [JWS2012])
44. The solution space
Stream Reasoning PoliMi’s Achievements
[Figure: the solution-space grid (types of orders vs. types of reasoning), with the regions "Order-aware data management" and "Scalable reasoning" labelled.]
45. The solution space – Stream Reasoning “alla PoliMi”
RDF Stream
§ RDF Stream Data Type
• Ordered sequence of pairs, where each pair is made of an
RDF triple and its timestamp
§ Timestamps are not required to be unique, but they must be
non-decreasing
§ E.g.,
(<:Alice :posts :post1 >, 2010-02-12T13:34:41)
(<:post1 :talksAboutPositively :LaScala>, 2010-02-12T13:34:41)
(<:Bob :posts :post2 >, 2010-02-12T13:36:28)
(<:post2 :talksAboutNegatively :Duomo>, 2010-02-12T13:36:28)
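The data type can be sketched as a sequence of (triple, timestamp) pairs together with a check of the non-decreasing constraint. The class name `StreamElement` and the function `is_valid_rdf_stream` are hypothetical names for this illustration; only the example triples and timestamps come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

@dataclass(frozen=True)
class StreamElement:
    triple: Triple
    timestamp: datetime

def is_valid_rdf_stream(elements: Iterable[StreamElement]) -> bool:
    """Check that timestamps never decrease (equal timestamps are allowed)."""
    previous = None
    for element in elements:
        if previous is not None and element.timestamp < previous:
            return False
        previous = element.timestamp
    return True

t1 = datetime.fromisoformat("2010-02-12T13:34:41")
t2 = datetime.fromisoformat("2010-02-12T13:36:28")
stream = [
    StreamElement((":Alice", ":posts", ":post1"), t1),
    StreamElement((":post1", ":talksAboutPositively", ":LaScala"), t1),
    StreamElement((":Bob", ":posts", ":post2"), t2),
    StreamElement((":post2", ":talksAboutNegatively", ":Duomo"), t2),
]
valid = is_valid_rdf_stream(stream)
```

Two triples sharing one timestamp, as in the first two elements, are accepted; only a step backwards in time is rejected.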
47. The solution space – Stream Reasoning “alla PoliMi”
Where C-SPARQL Extends SPARQL
48. The solution space – Stream Reasoning “alla PoliMi”
An Example of C-SPARQL Query
Who are the opinion makers? i.e., the users who are likely to
influence the behavior of other users who follow them
REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
CONSTRUCT { ?opinionMaker sd:about ?resource }
FROM STREAM <http://streamingsocialdata.org/interactions>
[RANGE 30m STEP 5m]
WHERE {
?opinionMaker ?opinion ?resource .
?follower sioc:follows ?opinionMaker.
?follower ?opinion ?resource.
FILTER ( cs:timestamp(?follower) >
cs:timestamp(?opinionMaker)
&& ?opinion != sd:accesses )
}
HAVING ( COUNT(DISTINCT ?follower) > 3 )
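To make the intent of the query easier to follow, the sketch below emulates a single evaluation of it in plain Python over one window. It is not how the C-SPARQL Engine evaluates queries. It assumes that results are grouped by opinion maker and resource, that the follows relation is given as a set of pairs, and that timestamps are integer seconds; all names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Action = Tuple[str, str, str, int]  # (user, opinion, resource, timestamp in s)

def opinion_makers(
    actions: List[Action],
    follows: Set[Tuple[str, str]],  # (follower, followed) pairs
    now: int,
    range_s: int = 1800,            # RANGE 30m
    min_followers: int = 3,         # HAVING COUNT(DISTINCT ?follower) > 3
) -> Set[Tuple[str, str]]:
    """Return (opinion maker, resource) pairs for the window ending at now."""
    window = [a for a in actions if now - range_s < a[3] <= now]
    imitators: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for maker, opinion, resource, ts in window:
        if opinion == "accesses":
            continue
        for follower, f_opinion, f_resource, f_ts in window:
            # The follower repeats the same opinion on the same resource later.
            if ((follower, maker) in follows and f_opinion == opinion
                    and f_resource == resource and f_ts > ts):
                imitators[(maker, resource)].add(follower)
    return {key for key, who in imitators.items() if len(who) > min_followers}

actions = [
    ("alice", "likes", "LaScala", 100),
    ("bob", "likes", "LaScala", 200), ("carol", "likes", "LaScala", 300),
    ("dave", "likes", "LaScala", 400), ("erin", "likes", "LaScala", 500),
    ("frank", "likes", "Duomo", 100), ("gina", "likes", "Duomo", 200),
]
follows = {("bob", "alice"), ("carol", "alice"), ("dave", "alice"),
           ("erin", "alice"), ("gina", "frank")}
makers = opinion_makers(actions, follows, now=600)
```

In the registered query this computation is repeated every 5 minutes as the 30-minute window slides.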
49. The solution space – Stream Reasoning “alla PoliMi”
An Example of C-SPARQL Query
Who are the opinion makers? i.e., the users who are likely to
influence the behavior of other users who follow them
REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
CONSTRUCT { ?opinionMaker sd:about ?resource }
FROM STREAM <http://streamingsocialdata.org/interactions>
[RANGE 30m STEP 5m]
WHERE {
?opinionMaker ?opinion ?resource .
?follower sioc:follows ?opinionMaker.
?follower ?opinion ?resource.
FILTER ( cs:timestamp(?follower) >
cs:timestamp(?opinionMaker)
&& ?opinion != sd:accesses )
}
HAVING ( COUNT(DISTINCT ?follower) > 3 )
Annotations on the query:
• REGISTER STREAM ... COMPUTED EVERY: query registration (for continuous execution)
• REGISTER STREAM with CONSTRUCT: RDF Stream added as new output format
• FROM STREAM: the FROM STREAM clause
• [RANGE 30m STEP 5m]: the window
• cs:timestamp: built-in to access timestamps
• HAVING ( COUNT ... ): aggregates as in SPARQL 1.1
50. The solution space – Stream Reasoning “alla PoliMi”
Efficiency of C-SPARQL Query Evaluation
§ The window-based selection of C-SPARQL outperforms the
standard FILTER-based selection
51. The solution space – Stream Reasoning “alla PoliMi”
Efficiency of C-SPARQL Query Evaluation
§ The C-SPARQL algebra allows pushing of filters and projections
52. The solution space – Stream Reasoning “alla PoliMi”
High Throughputs of C-SPARQL Engine
53. The solution space – Stream Reasoning “alla PoliMi”
Incremental Materialization evaluation
§ baseline: re-computing the materialization from scratch
§ state-of-the-art (materialized view incremental maintenance)
§ PoliMi’s incremental stream approach [ESWC2010]
[Figure: the three approaches compared as a function of the % of the materialization changed when the window slides.]
54. The solution space – Stream Reasoning “alla PoliMi”
Incremental Maintenance and Query Latency
§ Comparison of the average time needed to answer
a C-SPARQL query using
• a backward reasoner
• the naive approach of re-computing the materialization
• PoliMi’s incremental-stream approach
Average time (ms.)   backward reasoner   naive approach   incremental-stream
query                5.82                1.61             1.61
materialization      0                   15.91            0.28
55. The solution space
Stream Reasoning Community Achievements
§ RDF Stream data type
• Adopted by most of the research groups active on Stream
Reasoning
• Alternative solution based on two time stamps used in eTalis
§ Continuous query language
• C-SPARQL was extended by the community
• Alternative solutions have been studied
– without FROM STREAM clause [CQELS]
– oriented to complex event processing [2]
§ Reasoning
• Data-driven for RDFS++ [ESWC2010]
• Goal-driven for temporal logics (eTalis) [2]
• Time-decaying logic programs [26]
• Inductive reasoning [IEEE-IS2010]
§ Implementation Experiences
• C-SPARQL Engine
• eTalis / EP-SPARQL
• CQELS
• S2R
56. The solution space
Stream Reasoning next steps
§ Scientific
• Notions of soundness and completeness
• More expressive reasoning
– with minor loss in throughput
– and predictable loss in scalability
• Dealing with incomplete & noisy data
• Parallelization and distribution of the processing
§ Technical
• Prove effectiveness and efficacy in specific application
domains
• Better integrate continuous semantics with Linked Data
• Design and develop a software framework to simplify stream
reasoning application development
§ Organizational
• Standardize RDF Stream, C-SPARQL, Streaming Linked
Data, etc.
57. The solution space
Wrap-up of Stream Reasoning
[Figure: the solution-space grid (types of orders vs. types of reasoning), with the regions "Order-aware data management", "Scalable reasoning" and "Stream reasoning" labelled.]
58. The solution space
Top-k reasoning
[Figure: the solution-space grid (types of orders vs. types of reasoning), with the regions "Order-aware data management", "Scalable reasoning", "Stream reasoning" and "Top-k reasoning" labelled.]
59. The solution space
Top-k reasoning approach
§ In traditional reasoning, ranking of results is
normally considered a task that makes scaling
inference to massive data sets even more hopeless
§ Top-k reasoning should instead overcome this
common view and interleave ordering and
reasoning
§ W.r.t. the running example, top-k reasoning should
make it possible to efficiently
• compute the top-k social media users who are
well known for leading discussions on fashion-related topics and
are closest to the requester's current location.
60. The solution space
Top-k reasoning attempts
§ SoftFacts [60]
• an ontology-mediated top-k information retrieval system over
relational databases
§ SPARQL-RANK [13]
• adds order to the SPARQL algebra as a first-class citizen and
experimentally shows the performance gain
§ AnQL [41]
• extends SPARQL to query RDFS annotated with values from a bounded
lattice (and thus comes with a partial ordering).
§ Notion of exact top-k closure of an ontology w.r.t. a
query and a scoring function [53]
61. The solution space
Top-k queries in SPARQL 1.1
§ Retrieve the best 10 offers ordered by a function of
user ratings of the product and offer price:
SELECT ?product ?offer
       (g1(?avgRat1) + g2(?avgRat2) + g3(?price) AS ?score)
WHERE {
  ?product hasAvgRat1 ?avgRat1 .
  ?product hasAvgRat2 ?avgRat2 .
  ?product hasName ?name .
  ?product hasOffers ?offer .
  ?offer hasPrice ?price
}
ORDER BY DESC(?score)
LIMIT 10
§ Slow: tens of seconds on 5M triples (could be improved to
milliseconds)
62. The solution space - Top-k queries in SPARQL 1.1
Challenges
§ Adapting SQL optimizations to SPARQL is not
straightforward:
• Different algebra
• Different cost of data access in native RDF triplestores
– Sorted access is slow, random access is fast
• Additional optimization dimensions
– Pushing the evaluation of BGPs into the storage layer
§ Research tasks
• A new algebra for SPARQL where order is a first-class citizen
• New algorithms, and
• Optimization techniques
63. The solution space - Top-k queries in SPARQL 1.1
The SPARQL-Rank algebra
§ Extends the standard SPARQL algebra
§ Ranked set of mappings: set of mappings augmented
with an order relation
[Figure: the new and extended operators and equivalences of the algebra.]
66. The solution space – SPARQL-Rank algebra
Rank Join Algorithms
§ Different algorithms based on available access in
the inputs:
• Hash Rank-Join
– e.g. HRJN [Ilyas2004]
• Random Access Rank-Join
– e.g. RA-HRJN [Ilyas2004]
• RankSequence (e.g., RSEQ), NEW [ISWC2012]
– Minimum sorted access
– Leverages random access
[Figure: (a) RankJoin over two sortedAccess inputs; (b) RankSequence over one sortedAccess and one randomAccess input; (c) RA-RankJoin over two inputs that each offer sortedAccess and randomAccess.]
67. The solution space – SPARQL-Rank algebra
The new Algebraic Equivalences
Split
68. The solution space – SPARQL-Rank algebra
The new Algebraic Equivalences
Interleave
69. The solution space – SPARQL-Rank algebra
Planning Strategies
§ Apply algebraic equivalences
§ Result: three possible strategies
1. Rank of BGPs
2. Interleaved
3. Rank Join
71. The solution space – SPARQL-Rank algebra
Planning Strategies: Interleaved (INTER)
§ Separate the pattern in two groups:
• Triple patterns that influence the ranking
• Triple patterns that don’t influence the ranking
[Figure: three query plans (a), (b) and (c) for the example query, each returning ?pr, ?of, ?score, built from the operators SLICE [0,10], ORDER [?score], EXTEND [?score = g1(?a1)+g2(?a2)+g3(?p1)], Sequence, RankJoin, seqScan and orderScan_a1.]
72. The solution space – SPARQL-Rank algebra
Planning Strategies: Rank-Join (RJ)
§ Split into one pattern for each ranking criterion
§ Use the most appropriate join based on type of access
[Figure: two query plans (a) and (b) for the example query, each returning ?pr, ?of, ?score. Plan (a) applies SLICE [0,10] over ORDER [?score] over EXTEND [?score = g1(?a1)+g2(?a2)+g3(?p1)] on the whole graph pattern. Plan (b) applies SLICE [0,10] over RankJoin operators on ?pr = ?pr, with one input per ranking criterion g1(?a1), g2(?a2) and g3(?p1).]
73. The solution space – SPARQL-Rank algebra
Experimental evidence of performance improvements
§ Example query, 5M triples dataset
§ Assumption: availability of sorted access indexes
[Chart annotation: two orders of magnitude better]
74. The solution space – SPARQL-Rank algebra
Experimental evidence of performance improvements
§ Benchmark: 8 queries on an extension of BSBM
75. The solution space
Wrap-up of Top-k Reasoning
[Figure: solution-space chart. Axis "Types of orders": No ordering, Natural, Cheap to enforce, Expensive to enforce, Combinations. Axis "Types of reasoning": No reasoning, Data-driven, Query-driven, Combinations. Regions labelled: Scalable reasoning, Stream reasoning, Top-k Reasoning, Order-aware data management.]
76. The solution space
Full-fledged order-aware reasoning
[Figure: solution-space chart. Axis "Types of orders": No ordering, Natural, Cheap to enforce, Expensive to enforce, Combinations. Axis "Types of reasoning": No reasoning, Data-driven, Query-driven, Combinations. Regions labelled: Scalable reasoning, Stream reasoning, Top-k Reasoning, Order-aware data management, Order-aware reasoning.]
77. The solution space
Full-fledged order-aware reasoning
§ In full-fledged order-aware reasoning, data- and
query-driven inference methods have to deal with
combinations of natural, cheap-to-enforce and
expensive-to-enforce types of orders.
• the naive assumption that orderings are independent would
have to be relaxed
• theories and methods that exploit the mutual relationships
between the three types of orders have to be rethought
§ Considering our running example, methods
implementing order-aware reasoning are the only
ones able to answer the query
• Which users of social media, currently leading popular
discussions on fashion-related topics, are closest to my
current location? What are they saying about the shopping
district nearby?
78. The solution space
Full-fledged order-aware reasoning
§ State-of-the-art
• None
§ Promising work
• The Answer Set Programming (ASP) community has recently
proposed a streaming algorithm for ASP [25] that
1. ranks the constants referring to domain elements and
2. fetches them, increasing the domain size until an answer set is
found.
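The two steps above can be sketched as a generic loop. This is only an illustration of the "rank, then grow the domain" idea under strong simplifying assumptions: `find_answer` is a hypothetical callback standing in for an ASP solver call, and the toy problem (find two constants that sum to 8) is not an answer set program. It does not reproduce the algorithm of [25].

```python
from itertools import combinations


def solve_with_growing_domain(constants, rank, find_answer, step=1):
    """Rank the constants, then try ever larger prefixes of the ranking.

    constants:   iterable of domain elements
    rank:        key function; lower values are fetched first
    find_answer: hypothetical solver callback; returns an answer for the
                 given domain, or None if the domain is still too small
    Returns (domain_used, answer), or None if no domain yields an answer.
    """
    ordered = sorted(constants, key=rank)           # step 1: rank constants
    for size in range(step, len(ordered) + step, step):
        domain = ordered[:size]                     # step 2: grow the domain
        answer = find_answer(domain)
        if answer is not None:
            return domain, answer
    return None


def pair_summing_to_8(domain):
    """Toy stand-in for a solver: find two constants that sum to 8."""
    for a, b in combinations(domain, 2):
        if a + b == 8:
            return (a, b)
    return None


outcome = solve_with_growing_domain([7, 3, 9, 1, 5], rank=lambda c: c,
                                    find_answer=pair_summing_to_8)
```

In this toy run the search stops at the domain `[1, 3, 5]` with the answer `(3, 5)`, without ever considering the constants 7 and 9.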
§ Challenges
• a theoretical framework that unifies and generalises those
defined for stream reasoning and top-k reasoning
• designing and testing scalable data- and query-driven methods
that allow efficient answering of queries that involve all
types of orders
79. The solution space
Wrap-up of Top-k Reasoning
[Figure: solution-space chart. Axis "Types of orders": No ordering, Natural, Cheap to enforce, Expensive to enforce, Combinations. Axis "Types of reasoning": No reasoning, Data-driven, Query-driven, Combinations. Regions labelled: Scalable reasoning, Stream reasoning, Top-k Reasoning, Order-aware data management, Order-aware reasoning.]
80. References
My papers
[IEEE-IS2009] E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel.
It's a Streaming World! Reasoning upon Rapidly Changing Information.
IEEE Intelligent Systems 24(6): 83-89 (2009)
[EDBT2010] D.F. Barbieri, D. Braga, S. Ceri, M. Grossniklaus.
An Execution Environment for C-SPARQL Queries. EDBT 2010
[WWW2009] D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus.
C-SPARQL: SPARQL for continuous querying. WWW 2009: 1061-1062
[IEEE-IS2010] D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser.
Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics.
IEEE Intelligent Systems, 30 Aug. 2010
[JWS2012] M. Balduini, I. Celino, E. Della Valle, D. Dell'Aglio, Y. Huang, T. Lee, S. Kim, V. Tresp.
BOTTARI: an Augmented Reality Mobile Application to deliver Personalized and Location-based
Recommendations by Continuous Analysis of Social Media Streams. JWS 2012. IN PRESS
[ESWC2010] D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus.
Incremental Reasoning on Streams and Rich Background Knowledge. ESWC 2010
[SWJ2012] E. Della Valle, S. Schlobach, M. Krötzsch, A. Bozzon, S. Ceri, I. Horrocks.
Order Matters! Harnessing a World of Orderings for Reasoning over Massive Data. IN PRESS
[ISWC2012] S. Magliacane, A. Bozzon, E. Della Valle.
Efficient Execution of Top-k SPARQL Queries. ISWC 2012. IN PRESS
81. Downloads
§ C-SPARQL Engine (no reasoning support)
• A ready-to-go pack for Eclipse
– http://streamreasoning.org/download
• Source code available on request
§ SPARQL-Rank Engine (ARQ-Rank)
• Source code and experimental data
– http://sparqlrank.search-computing.org/
82. Thank You!
Any questions? emanuele.dellavalle@polimi.it
Keep an eye on
http://www.streamreasoning.org
There’s much more to come!