Abstract. Many approaches have been proposed for Stream Reasoning (SR). Some of them combine information flow processing (IFP) techniques and semantic technologies to make sense, in real time, of noisy, vast and heterogeneous data streams that come from complex domains. More recent works have shown the presence of a trade-off between throughput and reasoning expressiveness. Indeed, systems with IFP-like performance are not very expressive (e.g. up to an RDFS subset), and vice versa. For static data, Information Integration (II) systems have already approached the problem. The idea consists in spreading the reasoning complexity over the different layers of a hierarchical architecture and treating it where it is easier to do so. Is it possible to realize expressive and efficient stream reasoning (E2SR) by defining a hierarchical approach that adapts II techniques to the streaming scenario? In this paper, I discuss my plan towards E2SR, the intuition of adapting Information Integration techniques to the streaming scenario, and Stream Reasoning's need for comparative analysis to support its technological progress.
A Hierarchical approach towards Efficient and Expressive Stream Reasoning
1. RR - 2016 - Aberdeen - Riccardo Tommasini (Politecnico di Milano)
Riccardo Tommasini (Ph.D. Student at Politecnico di Milano, DEIB)
Advisor: Emanuele Della Valle (Assistant Professor at Politecnico di Milano, DEIB)
Web Reasoning and Rule Systems Conf. 2016,
Doctoral Consortium
Stream Reasoning
Supports decision making in complex domains in real time (reactively),
i.e., making sense of vast and heterogeneous, noisy and incomplete streams of data.
Vision
Stream Processing and Reasoning
Data Stream Management Systems (DSMS), e.g., Esper, Flink.
Complex Event Processing Engines (CEP), e.g., Drools Fusion, Esper.
RDF Stream Processing (RSP), e.g., C-SPARQL, CQELS, SKB.
Rule Based Systems (RBS), e.g., EP-SPARQL, Sparkwave.
Ontology Based Data Access (OBDA), e.g., Morphstream, STARQL.
Incremental Maintenance of Ontology Materialisation (IMOM), e.g., RDFox, TrOWL.
State-of-the-art
SR DSMS CEP RSP RBS OBDA IMOM
Vast x x x
Heterogeneous x x x x x
Noisy x x
Incomplete x x x x
Stream x x x
Time-Aware x x x
Complex Domains x x x
Approaches VS Challenges
Research Question
Can we realise expressive and efficient stream reasoning?
Still unanswered!
Research Question
Can we realise expressive and efficient stream reasoning
using a hierarchical approach?
Cascading Reasoning
Stuckenschmidt, H., Ceri, S., Della Valle, E., & Van Harmelen, F.
(2010). Towards expressive stream reasoning
Cascading Reasoning vs State-of-the-art
Stuckenschmidt, H., Ceri, S., Della Valle, E., & Van Harmelen, F.
(2010). Towards expressive stream reasoning
[Figure: cascading reasoning layers annotated with state-of-the-art systems: C-SPARQL, EP-SPARQL, TrOWL, Esper]
Information Integration Systems
The role of II systems is to provide a uniform view of the data in the sources.
[Figure labels: Query, Integrated Conceptual Model (ICM), Mappings, Wrappers, Data Sources]
Information Integration Systems at a glance
Integrated Conceptual Model (ICM), i.e., a common, formally defined vocabulary that enables query answering.
Mapping, i.e., (typically) FOL statements that establish links between the ICM and the data sources.
Wrapper, i.e., an interface that reinterprets a data source into a data model that enables the mapping.
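The three II components can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the CSV-like source, the `icm:` and `sensor:` terms, and the function names are hypothetical, and the mapping is a plain function rather than FOL statements.

```python
from typing import Dict, List, Tuple

# A triple expressed in the (hypothetical) ICM vocabulary.
Triple = Tuple[str, str, str]


def csv_wrapper(raw: str) -> List[Dict[str, str]]:
    """Wrapper: reinterpret a raw CSV-like source as a list of records."""
    header, *rows = [line.split(",") for line in raw.strip().splitlines()]
    return [dict(zip(header, row)) for row in rows]


def sensor_mapping(record: Dict[str, str]) -> List[Triple]:
    """Mapping: link the fields of one source record to ICM terms."""
    sensor = f"sensor:{record['id']}"
    return [
        (sensor, "rdf:type", "icm:Sensor"),
        (sensor, "icm:hasTemperature", record["temp"]),
    ]


def query(triples: List[Triple], predicate: str) -> List[Tuple[str, str]]:
    """Query answering over the uniform view: select by ICM predicate."""
    return [(s, o) for s, p, o in triples if p == predicate]


raw_source = "id,temp\ns1,21.5\ns2,19.0"
triples = [t for rec in csv_wrapper(raw_source) for t in sensor_mapping(rec)]
readings = query(triples, "icm:hasTemperature")
```

The query is written only against ICM terms; the source layout is hidden behind the wrapper and the mapping, which is the uniform view the slide refers to.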
Cascading Reasoning VS Information Integration
Stuckenschmidt, H., Ceri, S., Della Valle, E., & Van Harmelen, F.
(2010). Towards expressive stream reasoning
[Figure labels: ICM, Wrapping, Mapping]
Research Plan
Research Questions
Q.1, Can we extend the mapping language to include time-related operators (e.g. windows) and the engines' operational semantics?
Q.2, Can we extend the ontological language to include time operators without degenerating into intractability?
Q.3, Can we enable a systematic comparative research approach for stream reasoners?
Research Questions: Q.1
Stuckenschmidt, H., Ceri, S., Della Valle, E., & Van Harmelen, F.
(2010). Towards expressive stream reasoning
Q.1 relates to rewriting and interpretation
Q.1 Research Plan
Extending the mapping language to:
(i) include continuous semantics, to enable continuous querying over virtual RDF Stream data sources;
(ii) include time-aware operators, e.g. windows, to enable rewriting over continuous query languages, e.g. EPL;
(iii) enable the description of stream processors' execution semantics.
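As an illustration of the time-aware operators mentioned in (ii), the sketch below implements a time-based sliding window in Python. It is a simplification under stated assumptions: the stream is a finite list of `(timestamp, payload)` pairs, windows are half-open intervals aligned at time 0, and the names `sliding_windows`, `width` and `slide` are hypothetical. It is not the semantics of any specific engine or of EPL.

```python
from typing import Iterator, List, Tuple

# An event is a (timestamp, payload) pair; timestamps are integers here.
Event = Tuple[int, str]


def sliding_windows(
    stream: List[Event], width: int, slide: int
) -> Iterator[Tuple[int, int, List[str]]]:
    """Yield (open, close, contents) for time-based windows [open, close).

    Assumption: the first window opens at time 0 and a new window
    opens every `slide` time units until the last event is covered.
    """
    if not stream:
        return
    last_ts = max(ts for ts, _ in stream)
    start = 0
    while start <= last_ts:
        end = start + width
        contents = [payload for ts, payload in stream if start <= ts < end]
        yield (start, end, contents)
        start += slide


stream = [(1, "a"), (3, "b"), (6, "c"), (8, "d")]
windows = list(sliding_windows(stream, width=5, slide=2))
```

With `width=5` and `slide=2`, consecutive windows overlap, so the same event can appear in more than one window. When each window opens and when its content is reported are exactly the kind of execution-semantics details that item (iii) proposes to describe.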
Research Questions: Q.2
Stuckenschmidt, H., Ceri, S., Della Valle, E., & Van Harmelen, F.
(2010). Towards expressive stream reasoning
Q.2 relates to reasoning and abstraction
Q.2 Research Plan
Extending the ICM language to:
(i) identify meaningful OWL 2 DL fragments for Stream Reasoning;
(ii) consider temporal extensions of DLs that do not degenerate into intractability;
(iii) exploit time-related operators typical of complex event processing or event calculus to provide rule-based reasoning.
Evaluation Plan
A good evaluation, by Nico Matentzoglu
Stream Reasoning Benchmarking
Mostly related to RDF Stream Processing
Focused on query answering
Limited Entailment (RDFS subsets)
Lack of expressive benchmarks
Lack of shared approaches
No absolute winner (RSP)
Research Questions
Q.1, Can we extend the mapping language to include time-related operators (e.g. windows) and the engines' operational semantics?
Q.2, Can we extend the ontological language to include time operators without degenerating into intractability?
Q.3, Can we enable a systematic comparative research approach for stream reasoners benchmarking?
Benchmark Principles
The goal of a domain specific benchmark is to foster technological progress by guaranteeing a fair assessment.
(Jim Gray, The Benchmark Handbook for Database and Transaction Processing Systems, 1993)
Experiment Design for Stream Reasoning
E is the engine used as subject in the experiment;
T is an ontology and any data not subject to change during the experiment;
D is the description of the input data streams;
Q is the set of continuous queries registered into the engine;
K is the set of key performance indicators (KPIs) to collect.
The result of the execution of an experiment is a Report R that captures the engine dynamics.
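The experiment tuple and its report can be written down as plain data structures. The Python sketch below is only an illustration: the field types, the `record` method, and all example values (engine name, file name, query string, KPI names) are assumptions, not part of the framework described in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Experiment:
    engine: str                           # E: the engine under test
    static_data: List[str]                # T: ontology and unchanging data
    stream_description: Dict[str, float]  # D: description of input streams
    queries: List[str]                    # Q: registered continuous queries
    kpis: List[str]                       # K: KPIs to collect


@dataclass
class Report:
    """R: measurements collected while an experiment runs."""
    experiment: Experiment
    measurements: Dict[str, List[float]] = field(default_factory=dict)

    def record(self, kpi: str, value: float) -> None:
        # Only KPIs declared in K may be recorded.
        if kpi not in self.experiment.kpis:
            raise ValueError(f"undeclared KPI: {kpi}")
        self.measurements.setdefault(kpi, []).append(value)


# Hypothetical example values, for illustration only.
experiment = Experiment(
    engine="baseline-engine",
    static_data=["ontology.ttl"],
    stream_description={"events_per_second": 100.0},
    queries=["SELECT ?s WHERE { ?s a :Sensor }"],
    kpis=["latency_ms", "memory_mb"],
)
report = Report(experiment)
report.record("latency_ms", 12.0)
report.record("latency_ms", 15.0)
```

Keeping one list of values per KPI, rather than a single aggregate, matches the idea that the report should capture the dynamics of the engine over the run.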
Test Stand Architecture
RSP Baselines
The minimal meaningful approaches to realise an RSP engine:
They are a pipeline of a DSMS and a reasoner;
They support reasoning under the ρDF entailment regime;
Data can flow from the DSMS to the reasoner via snapshots (Figure 2-A) or differences (Figure 2-B);
They exploit absolute time, i.e. their internal clock can be externally controlled.
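The two ways of passing data from the DSMS to the reasoner can be contrasted with a small Python sketch. It is a toy model under assumptions: window contents are sets of triples, the "reasoner input" is just a fact set (no entailment is computed), and the class and function names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def diff(previous: Set[Triple], current: Set[Triple]):
    """Return (additions, deletions) between two consecutive windows."""
    return current - previous, previous - current


class SnapshotInput:
    """Snapshot style: the whole window content replaces the fact set."""

    def __init__(self) -> None:
        self.facts: Set[Triple] = set()

    def load(self, window: Set[Triple]) -> None:
        self.facts = set(window)


class DifferenceInput:
    """Difference style: only additions and deletions are applied."""

    def __init__(self) -> None:
        self.facts: Set[Triple] = set()

    def apply(self, additions: Set[Triple], deletions: Set[Triple]) -> None:
        self.facts -= deletions
        self.facts |= additions


w1 = {("s1", "type", "Sensor"), ("s1", "temp", "21")}
w2 = {("s1", "type", "Sensor"), ("s1", "temp", "22")}

snapshot, incremental = SnapshotInput(), DifferenceInput()
previous: Set[Triple] = set()
for window in (w1, w2):
    snapshot.load(window)
    additions, deletions = diff(previous, window)
    incremental.apply(additions, deletions)
    previous = window
```

Both styles end with the same fact set. The difference lies in how much data crosses the boundary at each step: the snapshot style resends the unchanged `type` triple, while the difference style sends only the changed temperature reading.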
Achievements and Future Works
Conclusion
Lessons Learned
- Stream Reasoning benchmarking requires further investigation
- RSP research is mature (there is an active W3C group), but its role can still be further investigated
Achievements
- Publication: Heaven: a framework for systematic comparative research approach for RSP engines (ESWC 2016)
- Promising work for semantic Complex Event Processing
- First steps towards a “naïve” implementation of cascading reasoning (collaboration with UGENT)