Paper presented at the European Semantic Web Conference (ESWC), 3-7 June 2018, held in Heraklion, Crete, Greece (Aldemar Knossos Royal & Royal Villa).
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
Dynamic planning for link discovery - ESWC 2018
1. Dynamic Planning for Link Discovery
Kleanthi Georgala, Daniel Obraczka and Axel-Cyrille Ngonga Ngomo
AKSW Research Group, University of Leipzig, Germany
Data Science Group (DICE), Paderborn University, Germany
June 2nd, 2018
Heraklion, Crete, Greece
Georgala, Obraczka & Ngonga Ngomo (DICE) CONDOR June 5, 2018 1 / 29
2. Overview
1 Motivation
2 Approach
3 Evaluation
4 Conclusions and Future Work
5. What is Link Discovery?
4th Linked Data principle: include links to other URIs so that they can discover more things.
Definition (Link Discovery)
Given sets S and T of resources and relation R
Find M = {(s, t) ∈ S × T : R(s, t)}
Example: R = :failureType
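The definition can be read directly as an algorithm. The minimal Python sketch below assumes R is given as a Boolean function over resource identifiers; the identifiers and the toy relation are hypothetical. It tests every pair in S × T, so it needs |S|·|T| checks, which is why M is expensive to compute directly on large datasets.

```python
from typing import Callable, Iterable, Set, Tuple, TypeVar

S = TypeVar("S")  # type of source resources
T = TypeVar("T")  # type of target resources


def link_discovery(
    source: Iterable[S],
    target: Iterable[T],
    relation: Callable[[S, T], bool],
) -> Set[Tuple[S, T]]:
    """Return M = {(s, t) in S x T : R(s, t)} by testing every pair."""
    targets = list(target)  # materialise once; it is scanned for every s
    return {(s, t) for s in source for t in targets if relation(s, t)}


# Hypothetical example: link resources whose local names end in the same digit.
source = ["ex:pump1", "ex:pump2"]
target = ["ex:fail1", "ex:fail2", "ex:fail3"]
links = link_discovery(source, target, lambda s, t: s[-1] == t[-1])
# links == {("ex:pump1", "ex:fail1"), ("ex:pump2", "ex:fail2")}
```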
9. Declarative Link Discovery
M is difficult to compute directly
instead, compute an approximation M′ = {(s, t) ∈ S × T : σ(s, t) ≥ θ}
use Link Specification (LS)
describe conditions for which R(s, t) holds
Similarity measure m : S × T → [0, 1]
Specification operators op: ⊔ (union), ⊓ (intersection), \ (difference)
Example LS (tree figure):
Operator node with threshold θ = 0.73
Left Child: atomic LS trigrams(:type, :type) with threshold 0.87
Right Child: atomic LS with similarity measure cosine(:label, :label) and threshold 0.46
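A minimal Python sketch of how such an LS could be evaluated, assuming LIMES-style semantics that the slide does not spell out: an atomic LS keeps the pairs whose similarity reaches its threshold, union takes the maximum score, intersection the minimum, difference keeps the left child's links that the right child does not return, and the operator node's own threshold filters the combined scores. The operator of the example tree is not named on the slide, so using intersection is an assumption, as are the simple trigram and cosine implementations, the property values, and the names run, Atomic and Complex.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union

Pair = Tuple[str, str]
Mapping = Dict[Pair, float]            # (s, t) -> similarity in [0, 1]
Resources = Dict[str, Dict[str, str]]  # URI -> {property: value}


def trigrams(a: str, b: str) -> float:
    # Jaccard overlap of character trigrams (one simple variant, assumed)
    ga = {a[i:i + 3] for i in range(len(a) - 2)}
    gb = {b[i:i + 3] for i in range(len(b) - 2)}
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0


def cosine(a: str, b: str) -> float:
    # cosine similarity of whitespace-token count vectors
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


@dataclass
class Atomic:
    measure: Callable[[str, str], float]  # m : S x T -> [0, 1]
    prop: str                             # property compared on both sides
    theta: float


@dataclass
class Complex:
    op: str  # "union", "intersection" or "difference" (assumed names)
    left: "LS"
    right: "LS"
    theta: float


LS = Union[Atomic, Complex]


def run(ls: LS, S: Resources, T: Resources) -> Mapping:
    if isinstance(ls, Atomic):
        # keep every pair whose similarity reaches the atomic threshold
        out: Mapping = {}
        for s, sp in S.items():
            for t, tp in T.items():
                score = ls.measure(sp[ls.prop], tp[ls.prop])
                if score >= ls.theta:
                    out[(s, t)] = score
        return out
    left, right = run(ls.left, S, T), run(ls.right, S, T)
    if ls.op == "union":
        merged = {p: max(left.get(p, 0.0), right.get(p, 0.0)) for p in left.keys() | right.keys()}
    elif ls.op == "intersection":
        merged = {p: min(left[p], right[p]) for p in left.keys() & right.keys()}
    elif ls.op == "difference":
        merged = {p: left[p] for p in left.keys() - right.keys()}
    else:
        raise ValueError(f"unknown operator {ls.op!r}")
    # the complex LS's own threshold filters the combined scores
    return {p: v for p, v in merged.items() if v >= ls.theta}


# Hypothetical resources and the example LS from the slide
S = {"ex:s1": {":type": "bearing failure", ":label": "worn bearing"}}
T = {"ex:t1": {":type": "bearing failure", ":label": "bearing worn out"},
     "ex:t2": {":type": "pump leak", ":label": "seal leak"}}
spec = Complex("intersection",
               Atomic(trigrams, ":type", 0.87),
               Atomic(cosine, ":label", 0.46),
               theta=0.73)
result = run(spec, S, T)  # links ex:s1 to ex:t1 only
```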
10. Why is it difficult?
Accuracy: correct links
Genetic programming
Refinement operators
. . .
Time efficiency: fast and scalable linking
Runtime reduction of the atomic similarity measures
Planning algorithms (e.g. HELIOS [1])
Use of cost functions to approximate runtime of LS
No exploitation of global knowledge about the LS
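To illustrate what a static, cost-based planner does, here is a hypothetical Python sketch for an intersection of two atomic LSs. It compares two plans, executing both children or executing the left child and checking only its links against the right measure, using nothing but estimated costs. The plan names, the cost model and its parameters are assumptions for illustration, not the actual cost functions of Helios [1]; the point is that the choice is made once, before any step runs, so a bad estimate cannot be corrected.

```python
from typing import Dict, Tuple


def plan_intersection(
    est_cost_left: float,
    est_cost_right: float,
    est_links_left: float,
    est_filter_cost_per_link: float,
) -> Tuple[str, float]:
    """Choose a plan for op = intersection from estimates only (hypothetical cost model)."""
    plans: Dict[str, float] = {
        # run both atomic LSs over S x T, then intersect their link sets
        "execute-both": est_cost_left + est_cost_right,
        # run the left LS, then compute the right measure only for its links
        "left-then-filter": est_cost_left + est_links_left * est_filter_cost_per_link,
    }
    best = min(plans, key=plans.__getitem__)
    return best, plans[best]


# The plan is fixed before execution: if the left child actually returns
# far more links than estimated, a static planner has no way to switch.
print(plan_intersection(2.0, 3.0, 4, 0.5))    # ('left-then-filter', 4.0)
print(plan_intersection(2.0, 3.0, 100, 0.5))  # ('execute-both', 5.0)
```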
11. Our Contributions
Intuition
The execution engine knows more about runtimes than the planner once it has executed a portion of the specification.
First dynamic planner for LD (Condor)
Mutable plans by re-shaping
Feedback loop between the planner and the engine
Duplicated steps are executed once
Dependencies between steps of the plan
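One way to make duplicated steps run only once is a result cache keyed by the atomic LS. The Python sketch below also applies the subsumption idea named in the conclusions, under the assumption that it means the following: if the same measure on the same property was already executed with a lower threshold, the result for a higher threshold can be obtained by filtering the cached links instead of re-running the measure. The class ResultCache, its key layout and the scores are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Pair = Tuple[str, str]
Mapping = Dict[Pair, float]  # (s, t) -> similarity
Key = Tuple[str, str]        # (measure, property): hypothetical key layout


class ResultCache:
    """Run each atomic LS at most once; derive stricter thresholds by filtering."""

    def __init__(self) -> None:
        self._store: Dict[Key, Dict[float, Mapping]] = {}  # key -> {theta: links}
        self.executions = 0  # number of atomic LSs actually executed

    def lookup(self, key: Key, theta: float) -> Optional[Mapping]:
        cached = self._store.get(key, {})
        if theta in cached:
            return cached[theta]  # identical step: reuse the stored links
        lower = [t for t in cached if t < theta]
        if not lower:
            return None
        # Subsumption (assumed semantics): links with score >= theta are a
        # subset of those with score >= any lower threshold, so filter the
        # result of the closest lower threshold.
        base = cached[max(lower)]
        return {p: v for p, v in base.items() if v >= theta}

    def get_or_run(self, key: Key, theta: float,
                   execute: Callable[[float], Mapping]) -> Mapping:
        hit = self.lookup(key, theta)
        if hit is not None:
            return hit
        self.executions += 1
        result = execute(theta)
        self._store.setdefault(key, {})[theta] = result
        return result


# Hypothetical similarity scores of trigrams(:type, :type)
scores = {("s1", "t1"): 0.9, ("s1", "t2"): 0.5, ("s2", "t2"): 0.7}


def execute(theta: float) -> Mapping:
    return {p: v for p, v in scores.items() if v >= theta}


cache = ResultCache()
cache.get_or_run(("trigrams", ":type"), 0.5, execute)           # executed
cache.get_or_run(("trigrams", ":type"), 0.5, execute)           # duplicate: reused
strict = cache.get_or_run(("trigrams", ":type"), 0.8, execute)  # subsumed: filtered
# cache.executions == 1 and strict == {("s1", "t1"): 0.9}
```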
33. Dynamic execution of LS
Execution Engine:
Execute Left Child
Cache intermediate results
Replace the estimated cost of Left Child with its real cost
Set Left Child and its sub-LSs as executed
Condor:
Receive feedback from Execution Engine
Re-evaluate plan
Real costs of executed steps take precedence over runtime estimations
Re-plan remaining steps
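The feedback loop on this slide can be sketched for an intersection of two atomic LSs. In this hypothetical Python sketch, the engine executes the left child, caches its links, overwrites the estimated cost with the measured runtime and marks the step as executed; the planner then re-plans the remaining step from the real number of left links, choosing between executing the right child and filtering the left links with the right measure. Step, replan, run_intersection and the per-link filter cost are assumptions, not Condor's actual interfaces.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]
Mapping = Dict[Pair, float]  # (s, t) -> similarity


@dataclass
class Step:
    """An atomic sub-LS of the plan (hypothetical structure)."""
    name: str
    run: Callable[[], Mapping]      # execute the atomic LS over S x T
    score: Callable[[Pair], float]  # similarity of one pair, used for filtering
    theta: float
    est_cost: float                 # runtime estimate in seconds
    executed: bool = False


def replan(right: Step, left_links: int, filter_cost_per_link: float) -> str:
    # Only the remaining work is compared; the left child is already paid for.
    return "filter" if left_links * filter_cost_per_link < right.est_cost else "execute"


def run_intersection(left: Step, right: Step, theta: float, filter_cost_per_link: float,
                     cache: Dict[str, Mapping]) -> Tuple[Mapping, str]:
    # Execution engine: execute the left child unless its links are cached.
    if left.name not in cache:
        start = time.perf_counter()
        cache[left.name] = left.run()                # cache the intermediate result
        left.est_cost = time.perf_counter() - start  # replace the estimate with the real cost
    left.executed = True                             # mark the left child as executed
    left_links = cache[left.name]

    # Planner: use the feedback (real size of the left result) to re-plan the rest.
    choice = replan(right, len(left_links), filter_cost_per_link)
    merged: Mapping = {}
    if choice == "filter":
        for pair, v in left_links.items():  # score only the pairs the left child kept
            r = right.score(pair)
            if r >= right.theta:
                merged[pair] = min(v, r)
    else:
        right_links = right.run()
        right.executed = True
        for pair in left_links.keys() & right_links.keys():
            merged[pair] = min(left_links[pair], right_links[pair])
    # The intersection node's own threshold filters the combined scores.
    return {p: v for p, v in merged.items() if v >= theta}, choice


# Hypothetical scores for the two children of an example LS.
left_scores = {("s1", "t1"): 0.9}
right_scores = {("s1", "t1"): 0.8, ("s2", "t2"): 0.95}
left = Step("trigrams(:type, :type)", lambda: dict(left_scores),
            lambda p: left_scores.get(p, 0.0), 0.87, est_cost=5.0)
right = Step("cosine(:label, :label)", lambda: dict(right_scores),
             lambda p: right_scores.get(p, 0.0), 0.46, est_cost=1.0)
links, choice = run_intersection(left, right, 0.73, filter_cost_per_link=0.001, cache={})
# one left link is cheaper to check than running the right child, so choice == "filter"
```

Filtering pays off when the executed child returns few links; because the decision uses the measured result instead of an estimate made before execution, a wrong initial estimate no longer fixes the rest of the plan.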
41. Experiment set-up
Datasets:
4 benchmark datasets: Abt-Buy, Amazon-GP, DBLP-ACM and DBLP-Scholar
Scalability: MOVIES, TOWNS and VILLAGES
Input LS:
100 LSs for each dataset, generated by Eagle
Unsupervised version
LSs with high accuracy
Comparison with Canonical and Helios
All planners achieved 100% F-measure
Evaluation metric: Runtime
43. Experiment 1
Q1: Does Condor achieve better runtimes for LSs?
Condor outperforms both static planners in all datasets
Wilcoxon signed-rank test on cumulative runtimes: statistically significant differences
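For readers who want to reproduce this kind of test, here is a stdlib-only Python sketch of the two-sided Wilcoxon signed-rank test on paired runtimes. It drops zero differences, ranks absolute differences with average ranks for ties, and computes an exact p-value by enumerating all sign patterns, which is only feasible for small samples; for larger ones a normal approximation or scipy.stats.wilcoxon would be used instead. The function names are hypothetical, and the runtimes in the example are made up, not the paper's measurements.

```python
from itertools import product
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values in ascending order; ties get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        i = j + 1
    return ranks


def wilcoxon_exact(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided exact Wilcoxon signed-rank test for paired samples x, y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0 or n > 16:
        raise ValueError("exact enumeration needs 1 <= n <= 16 non-zero differences")
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)  # test statistic
    # Under H0 every difference is equally likely to be positive or negative,
    # so enumerate all 2^n sign assignments and count those at least as extreme.
    extreme = 0
    for signs in product((False, True), repeat=n):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(s, total - s) <= w + 1e-9:
            extreme += 1
    return w, extreme / 2 ** n


# Made-up cumulative runtimes (seconds) of two planners on the same 6 workloads.
planner_a = [12.0, 30.5, 8.2, 44.0, 19.9, 27.3]
planner_b = [14.1, 35.0, 9.0, 51.2, 21.0, 30.0]
w, p = wilcoxon_exact(planner_a, planner_b)
```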
45. Experiment 2
Q2: How much time does Condor spend planning?
Condor needs less than 10 ms for planning
Best average performance in Amazon-GP
4.6 times faster than Canonical
8 times faster than Helios
0.1% of overall runtime used in planning
Highest absolute difference in DBLP-Scholar
600 s less runtime than Canonical
110 s less runtime than Helios
0.0005% of overall runtime used in planning
47. Experiment 3
Q3: How do the different sizes of LSs affect Condor's runtime?
LSs of size 1: same runtime for all planners
LSs of size 3: Condor is 7.5% faster than the static planners
LSs of size 5 or more: Condor needs 30.5% less time than Canonical and 55.7% less time than Helios
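The slides do not define LS size. One reading that fits the reported sizes 1, 3 and 5 (odd numbers, as produced by binary operators) is the number of nodes in the LS tree, counting atomic LSs and operators alike. The Python sketch below computes that count; both the definition and the class names Atomic and Complex are assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Atomic:
    measure: str  # e.g. "cosine(:label, :label)"
    theta: float


@dataclass
class Complex:
    op: str       # operator name (hypothetical labels)
    left: "LS"
    right: "LS"
    theta: float


LS = Union[Atomic, Complex]


def size(ls: LS) -> int:
    """Number of nodes in the LS tree: atomic LSs plus operators (assumed definition)."""
    if isinstance(ls, Atomic):
        return 1
    return 1 + size(ls.left) + size(ls.right)


a = Atomic("trigrams(:type, :type)", 0.87)
b = Atomic("cosine(:label, :label)", 0.46)
print(size(a), size(Complex("intersection", a, b, 0.73)),
      size(Complex("union", Complex("intersection", a, b, 0.73), a, 0.6)))  # 1 3 5
```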
49. Conclusions and Future Work
Condor, a dynamic planner for link discovery:
Combination of dynamic planning with subsumption and result caching
Comparison with state-of-the-art: Canonical and Helios
Evaluation:
Experiments on 7 datasets varying in size and classes
Significantly better runtimes than existing planning solutions
Up to 2 orders of magnitude faster
Requires less than 0.1% of the total runtime of a given LS for plan generation
Future Work:
Improvement of the cost function
Parallel execution of plans
50. Thank you!
Visit http://aksw.org/Projects/LIMES.html
https://twitter.com/DiceResearch
Questions?
Kleanthi Georgala
georgala@informatik.uni-leipzig.de
AKSW Research Group at Leipzig University
DICE Group at Paderborn University
http://aksw.org/KleanthiGeorgala.html
This work has been supported by H2020 projects SLIPO (GA no. 731581) and HOBBIT (GA no. 688227) as well as the DFG project LinkingLOD (project no. NG 105/3-2) and the BMWI Project GEISER (project no. 01MD16014).
51. References
[1] A.-C. Ngonga Ngomo. HELIOS - Execution Optimization for Link Discovery. In The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part I, pages 17–32. Springer, 2014.