Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Horn Concerto – AKSW Colloquium

340 Aufrufe

Veröffentlicht am

Preliminary results of an ongoing work titled "Efficient Rule Mining on RDF Data". University of Leipzig, AKSW Colloquium, April 3rd, 2017.

Veröffentlicht in: Daten & Analysen
  • Als Erste(r) kommentieren

  • Gehören Sie zu den Ersten, denen das gefällt!

Horn Concerto – AKSW Colloquium

  1. 1. HORN CONCERTO Efficient Rule Mining on RDF Data Tommaso Soru AKSW Colloquium, 03.04.2017
  2. 2. RDF Rule Mining • RDFS/OWL rules are given as schema. • Schema-free datasets might have an implicit schema. • Why “mining”? Because rules are not visible in data. 2
  3. 3. Motivation/1 Link Prediction Problem.
 
 Given a union of graphs G = G1 ∪ … ∪ GN,
 find new edges among vertices s and t in G. 3
  4. 4. Motivation/2 Markov Logic Networks.
 
 Given a collection of first-order statements (evidence) and a set of weighted first-order rules, build an undirected weighted graph where nodes are statements and edges indicate dependency. 4
  5. 5. Motivation/3 RDFS/OWL Interpretation Rule Mining Weight Learning Grounding Inference Input Dataset(s) Predicted TriplesMANDOLIN’s pipeline Markov Logic Networks 5
  6. 6. Rule Mining & Weight Learning Given a directed labelled graph, find rules
 and weights associated with them. w rule w1 p(x, y) ← q(x, y) w2 p(x, y) ← q(y, x) w3 p(x, y) ← q(x, z) ^ r(y, z) w4 p(x, y) ← q(x, z) ^ r(z, y) w5 p(x, y) ← q(z, x) ^ r(y, z) w6 p(x, y) ← q(z, x) ^ r(z, y) Horn Clauses 6
  7. 7. Horn Clauses A Horn clause is a clause (a disjunction of literals patterns)
 with at most one positive, i.e. unnegated, literal pattern. p ∨ ¬q1 ∨ ¬q2 ∨ ... ∨ ¬qn p ← q1 ∧ q2 ∧ ... ∧ qn head body p(x,y) 7
  8. 8. Confidence score (weight) The confidence score of a rule is defined as the rate of
 the occurrences of head and body together
 over
 the occurrences of the body. p ← q1 ∧ q2 ∧ ... ∧ qn head body 8
  9. 9. The HORN CONCERTO approach P(A | B) = P(A∩ B) P(B) Bayes’ Theorem p ← q1 ∧ q2 ∧ ... ∧ qn Horn Clause Event-based confidence score P(p | ! q) = P(p ∩ ! q) P( ! q) ≈ p ∧ ! q{ } ! q{ } 9
  10. 10. Rules with p ∧ ! q{ } ! q{ } SELECT ?p (COUNT(*) AS ?c) WHERE { ?x ?p ?y . ?x <target_q> ?y . FILTER(?p != <target_q> ) } GROUP BY ?p SELECT ?q (COUNT(*) AS ?c) WHERE { [] ?q [] } GROUP BY ?q ! q = 1 p(x, y) ← q(x, y) 10
  11. 11. Rules with p ∧ ! q{ } ! q{ } SELECT ?q ?r (COUNT(*) AS ?c) WHERE { ?x ?q ?z . ?z ?r ?y . ?x <target_p> ?y } GROUP BY ?q ?r SELECT (COUNT(*) AS ?c) WHERE { ?x <target_q> ?z . ?z <target_r> ?y } ! q = 2 p(x, y) ← q(x, z) ∧ r(z, y) 11
  12. 12. Optimizations • Select only top N properties. • Order by descending score and prune when it’s lower than a threshold T. • Cache scores in-memory, as there might exist p1, p2 such that: pi(x,y) ← q(?,?), r(?,?). • Parallelize algorithm by rule type. 12
  13. 13. Evaluation Setup • 8 CPUs, 32 GB RAM, Ubuntu 16.04 • Scalability study DBpedia Person (7 million triples) DBpedia 2016-04 (397 million triples) FUTURE • Rule effectiveness for link prediction FB15k (592 thousand triples) WN18 (151 thousand triples) • Rule quality (human judgment?) 13
  14. 14. Preliminary results – DBpedia Person Runtime (s) # Rules Used RAM (GB) AMIE+ > 10 days 6,337 4 Ontological PF > 3 hours > 1,000 4 HORN CONCERTO single-thread 1.7 hours 3,125 client: 0.2 server: 1.0 14
  15. 15. Preliminary results – DBpedia 2016-04 Runtime (s) # Rules Used RAM (GB) HORN CONCERTO single-thread 11 hours 887 client: 0.2 server: N/A 15
  16. 16. Discussion • AMIE+ Cons: indexes the graph in-memory. • Ontological PF Pros: Very fast Cons: Relies on schema data (types domain, range) • Horn Concerto Pros: Works with SPARQL endpoint, fast also single-threaded,
 may be able to overperform Ontological PF in RDF datasets with available schema. 16
  17. 17. Thank you.

×