PAGOdA (Pay-as-you-go OWL Query Answering Using a Triple Store) presentation by Bernardo Cuenca Grau
Abstract: We present an enhanced hybrid approach to OWL query answering that combines an RDF triple-store with an OWL reasoner in order to provide scalable pay-as-you-go performance. The enhancements presented here include an extension to deal with arbitrary OWL ontologies, and optimisations that significantly improve scalability. We have implemented these techniques in a prototype system, a preliminary evaluation of which has produced very encouraging results.
2. Ontology-mediated Query Answering
[Figure: a query Q over an ontology T and RDF Data; diagram labels A, B, C, D and a, b]
• (Meta)-data published in RDF
• RDF resources reference an OWL 2 ontology
• The ontology describes the meaning of data
RDF and OWL 2 well-established
• Thousands of available OWL 2 ontologies
• RDF ubiquitous on the Web
3. Ontology-mediated Query Answering
Ontology languages offer a wide range of modeling constructs
High expressive power → high worst-case complexity of reasoning
How can we provide scalable query answering?
• Restrict our ontology to a lightweight fragment of OWL: the EL, QL or RL profiles
• Tolerate incompleteness
• Rely on highly optimised pay-as-you-go systems
• Worst case optimal for lightweight fragments
• Rapidly computes easy answers
• Performance gracefully degrades with harder instances
4. Datalog and the OWL 2 Profiles
Datalog is the quintessential rule-based KR language
• Reasoning typically implemented via materialisation
• Our in-house system RDFox shows excellent performance
Query answering within the OWL 2 profiles
• RL ontologies equivalent to Datalog programs
• EL and QL ontologies can be strengthened using Datalog
Query answering requires an additional filtration step
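The materialisation mentioned on this slide can be illustrated with a naive fixpoint computation. This is only a toy sketch, not how RDFox works internally; the rule encoding (tuples with `?`-prefixed variables), the example predicates and the `materialise` function are all hypothetical.

```python
def is_var(term):
    # Assumed convention for this sketch: variables start with "?"
    return isinstance(term, str) and term.startswith("?")

def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals `fact`; return None on failure."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def materialise(rules, facts):
    """Apply (body, head) rules until no new fact is derived (naive evaluation)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            bindings = [{}]
            for atom in body:
                # Join the partial bindings with every fact matching this body atom
                bindings = [b2 for b in bindings for f in facts
                            if (b2 := match(atom, f, b)) is not None]
            for b in bindings:
                new_fact = (head[0],) + tuple(b.get(t, t) for t in head[1:])
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts

# Hypothetical example: two rules and two data facts
rules = [
    ([("GradStudent", "?x")], ("Student", "?x")),
    ([("Student", "?x"), ("advisor", "?x", "?y")], ("Professor", "?y")),
]
data = {("GradStudent", "alice"), ("advisor", "alice", "bob")}
result = materialise(rules, data)
```

Production engines avoid re-deriving known facts in every round (for instance with semi-naive evaluation); the sketch keeps the naive loop for readability.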
5. Incomplete Reasoning
• RL / EL reasoning w.r.t. an arbitrary OWL ontology O, dataset D and query q gives (in general) an incomplete answer L
✓ Profile-specific reasoning via Datalog is (relatively) scalable
✗ Answers may be incomplete
✗ The degree of incompleteness is unknown
✗ Incompleteness may be pathological (empty answers)
$L = \mathrm{cert}(q, \langle O', D \rangle) \subseteq \mathrm{cert}(q, \langle O, D \rangle)$ with $O \models O'$
6. The idea behind PAGOdA
Redistribute reasoning workload
Datalog reasoner
Fully-fledged OWL 2 reasoner
Resort to expensive OWL 2 reasoning as little as possible (if at all)
Ensure sound and complete answers
Do not restrict ontology language
7. Step 1: Lower and Upper Bounds
[Figure: the Ontology, Data and Query are fed to two Datalog Engines, which compute the Lower (ELHO Lower) and Upper bounds]
Profile-specific reasoning via Datalog gives a lower bound
L is a subset of $\mathrm{cert}(q, \langle O, D \rangle)$
We transform O into a strictly stronger Datalog ontology $O^u$
• Normalise the ontology into Datalog$^{\pm,\vee}$ rules
• Eliminate ∨ by transforming it to ∧
• Replace existential variables with Skolem constants
Datalog reasoning w.r.t. $O^u$ gives an upper bound answer U:
$\mathrm{cert}(q, \langle O, D \rangle) \subseteq \mathrm{cert}(q, \langle O^u, D \rangle) = U$
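The transformation into the stronger ontology can be sketched as a rewriting of rules. The block below is a minimal illustration under assumed conventions, not PAGOdA's actual code: a rule is a pair of body atoms and head disjuncts, variables start with `?`, and the Skolem naming scheme `sk_<rule>_<variable>` is hypothetical.

```python
def is_var(term):
    # Assumed convention for this sketch: variables start with "?"
    return isinstance(term, str) and term.startswith("?")

def to_upper_bound(rules):
    """Over-approximate disjunctive existential rules by plain Datalog rules."""
    datalog = []
    for index, (body, disjuncts) in enumerate(rules):
        body_vars = {t for atom in body for t in atom[1:] if is_var(t)}
        # Disjunction -> conjunction: every disjunct is derived,
        # written here as one Datalog rule per head atom.
        for atom in disjuncts:
            head = [atom[0]]
            for term in atom[1:]:
                if is_var(term) and term not in body_vars:
                    # Existential variable -> Skolem constant (hypothetical naming)
                    head.append(f"sk_{index}_{term[1:]}")
                else:
                    head.append(term)
            datalog.append((body, tuple(head)))
    return datalog

# Hypothetical input: Person -> Male OR Female, and Student -> EXISTS c. takes(x, c)
rules = [
    ([("Person", "?x")], [("Male", "?x"), ("Female", "?x")]),
    ([("Student", "?x")], [("takes", "?x", "?c")]),
]
upper_rules = to_upper_bound(rules)
```

Both rewritings only add consequences, which is why every certain answer under the original ontology is still derived and the result is an upper bound.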
8. Step 2: Module extraction
Checking possible answers in $U \setminus L$ is expensive
Compute a fragment of the ontology and data sufficient to check each answer in $U \setminus L$.
Fragment computation involves proof tracing in $O^u$
Achieved also using Datalog materialisation
Relevant fragments are typically much smaller
Size of the problem substantially reduced
[Figure: the Datalog Engine takes U and D and produces the Fragment]
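Proof tracing can be pictured as walking derivations backwards from the candidate answers. The sketch below assumes the derivations were recorded explicitly during materialisation, which is a simplification: the slide states that PAGOdA performs the tracing via Datalog materialisation itself. The `derivations` structure, rule identifiers and fact names are hypothetical.

```python
def extract_fragment(goals, derivations, dataset):
    """Collect the rules and data facts occurring in some derivation of the goals."""
    rules, data, seen = set(), set(), set()
    stack = list(goals)
    while stack:
        fact = stack.pop()
        if fact in seen:
            continue
        seen.add(fact)
        if fact in dataset:
            data.add(fact)
        # Follow every recorded way of deriving this fact
        for rule_id, premises in derivations.get(fact, []):
            rules.add(rule_id)
            stack.extend(premises)
    return rules, data

# Hypothetical materialisation trace: derived fact -> [(rule id, premises)]
dataset = {("A", "a"), ("R", "a", "b"), ("B", "c")}
derivations = {
    ("C", "a"): [("r1", [("A", "a")])],
    ("Ans", "a"): [("r2", [("C", "a"), ("R", "a", "b")])],
    ("D", "c"): [("r3", [("B", "c")])],
}
fragment_rules, fragment_data = extract_fragment([("Ans", "a")], derivations, dataset)
```

In this example rule `r3` and the fact `B(c)` play no part in deriving the candidate, so they are left out of the fragment.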
9. Step 3: Summarisation
[Figure: Fragment → Summarisation → Summary, checked by the Full Reasoner for query Q]
Further reduce problem size by summarising the fragment
• Technique introduced by the SHER team at IBM
• “Merge” constants that are instances of same concepts
• Check answers against summary using OWL 2 reasoner
• The summary of the fragment is typically very small
This over-approximation is orthogonal to the previous ones
We further reduce the size of $U \setminus L$
Sometimes we even make it empty!
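The merging step can be sketched as follows. This is a simplified illustration in the spirit of summarisation, not the SHER or PAGOdA implementation; the function name, the fact encoding and the choice of the alphabetically first constant as representative are assumptions.

```python
def summarise(concept_facts, role_facts):
    """Merge constants that are instances of exactly the same concepts."""
    types = {}
    for concept, constant in concept_facts:
        types.setdefault(constant, set()).add(concept)
    for _, subject, obj in role_facts:
        # Constants that only occur in role assertions have an empty concept set
        types.setdefault(subject, set())
        types.setdefault(obj, set())
    representative = {}  # concept set -> summary constant
    mapping = {}         # original constant -> summary constant
    for constant in sorted(types):
        key = frozenset(types[constant])
        mapping[constant] = representative.setdefault(key, constant)
    summary_concepts = {(c, mapping[x]) for c, x in concept_facts}
    summary_roles = {(r, mapping[s], mapping[o]) for r, s, o in role_facts}
    return mapping, summary_concepts, summary_roles

# Hypothetical fragment data
concept_facts = {("Student", "s1"), ("Student", "s2"),
                 ("Course", "c1"), ("Course", "c2")}
role_facts = {("takes", "s1", "c1"), ("takes", "s2", "c2")}
mapping, summary_concepts, summary_roles = summarise(concept_facts, role_facts)
```

Every original fact maps to a fact of the summary, so a candidate whose image is not entailed over the summary can be discarded without checking it against the full fragment.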
10. Step 4: Dependency analysis
[Figure: fragment F → Dependency Analysis → F, then the Full Reasoner with query Q → Output]
Group remaining candidate answers
• If a and b are in the same group then a is an answer iff b is
• We can also establish dependencies between groups
Check group representatives against the fragment using the fully-fledged reasoner.
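The saving from grouping can be sketched as below. How PAGOdA actually computes the groups is not described on this slide, so `group_of` is a hypothetical stand-in, `check` stands for a call to the full reasoner, and dependencies between groups are not modelled.

```python
def verify_by_groups(candidates, group_of, check):
    """Call the expensive `check` once per group and reuse its verdict."""
    verdict = {}
    answers = []
    for candidate in candidates:
        group = group_of(candidate)
        if group not in verdict:
            # The first member seen acts as the group representative
            verdict[group] = check(candidate)
        if verdict[group]:
            answers.append(candidate)
    return answers

# Hypothetical usage: group by first letter and record the expensive calls
calls = []

def expensive_check(candidate):
    calls.append(candidate)
    return candidate.startswith("a")

answers = verify_by_groups(["a1", "a2", "b1"], lambda c: c[0], expensive_check)
```

Three candidates are settled with two reasoner calls here; the Groups column of 1 in the table on slide 13 corresponds to a single call per query.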
11. Features of PAGOdA
PAGOdA provides pay-as-you-go (PAYG) query answering for OWL 2:
• Uses a Datalog reasoner “out of the box”
• Efficiently computes sound partial answers
• In “easy” cases, efficiently computes complete answers
• In “harder” cases, applies increasingly powerful but less scalable reasoning techniques as needed to completely answer the query
• The last step, involving the full reasoner, is rarely needed in practice
• Recent improvements: increasingly tight upper bounds and increasingly small modules
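The pay-as-you-go behaviour can be summarised in a few lines. This is a schematic sketch only: `lower` and `upper` stand for the answer sets L and U, and `check` is a hypothetical placeholder for the later, more expensive stages (fragment extraction, summarisation, full reasoner).

```python
def pay_as_you_go(lower, upper, check):
    """Return all answers, invoking `check` only for candidates in the gap."""
    lower, upper = set(lower), set(upper)
    if not lower <= upper:
        raise ValueError("the lower bound must be contained in the upper bound")
    gap = upper - lower
    if not gap:
        return lower  # bounds coincide: the answers are already complete
    return lower | {candidate for candidate in gap if check(candidate)}

# Hypothetical usage: "b" is the only gap candidate confirmed by the check
checked = []

def full_check(candidate):
    checked.append(candidate)
    return candidate == "b"

easy = pay_as_you_go({"a"}, {"a"}, full_check)
hard = pay_as_you_go({"a"}, {"a", "b", "c"}, full_check)
```

In the easy case the check is never invoked, which matches the "easy cases" bullet above.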
12. Queries answered by each technique
         LUBM  UOBM  FLY  DBPedia  NPD
Total      24    15    6      441  329
Bounds     22    12    5      439  326
Sum        22    14    5      440  329
Full       24    15    6      441  329
Scalability for lower and upper bound computation
           Importing  Lower Mat  Upper Mat  Ave QA
LUBM1000        313s       190s       269s     12s
UOBM500         356s       346s       734s      4s
13. Queries that require full reasoning
               Lower  Upper  Gap  Sum  Groups
LUBM100_q20        0     26   26   26       1
LUBM100_q22        0     14   14   14       1
UOBM1_q14       6271   6535  264  264       1
FLY_q5             0    344  344  344       1
DBPedia_q404       0      2    2    2       1
14. Time distribution and fragment size
              Lower  Upper   Frag  Size (%)   Sum    Full
LUBM100_q20    0.2s   0.3s  14.5s  .005/.04  1.2s  190.1s
LUBM100_q22    0.3s   0.2s  10.0s  .005/.04  0.8s   46.1s
UOBM1_q14      0.1s   0.1s   0.7s  .17/.076  0.5s    5.4s
FLY_q5         0.0s   0.0s  16.0s   .34/.01  0.1s    0.2s
15. PAGOdA Team
• Yujiao Zhou
• Yavor Nenov
• Bernardo Cuenca Grau
• Ian Horrocks