Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Focused Exploration of Geospatial Context on Linked Open Data

436 Aufrufe

Veröffentlicht am

Talk by Dr. Thomas Gottron at the IESD 2014 workshop in Riva del Garda (at ISWC).


Abstract The Linked Open Data cloud provides a wide range of different types of information which are interlinked and connected. When a user or application is interested in specific types of information under time constraints it is best to ex- plore this vast knowledge network in a focused and directed way. In this paper we address the novel task of focused exploration of Linked Open Data for geospatial resources, helping journalists in real-time during breaking news stories to find contextual geospatial information related to geoparsed content. After formalising the task of focused exploration, we present and evaluate five approaches based on three different paradigms. Our results on a dataset with 425,338 entities show that focused exploration on the Linked Data cloud is feasible and can be implemented at very high levels of accuracy of more than 98%.

Veröffentlicht in: Technologie
  • Als Erste(r) kommentieren

  • Gehören Sie zu den Ersten, denen das gefällt!

Focused Exploration of Geospatial Context on Linked Open Data

  1. 1. Focused Exploration of Geospatial Context on Linked Open Data Thomas Gottron, Johannes Schmitz, Stuart E. Middleton 20 October 2014 IESD workshop, Riva del Garda Thomas Gottron Focused Institute for Web Science and Technolo Egxieplso r a·t i oUnn oifv LeOrDs ity of Koblenz-Landau, Germany 1
  2. 2. Challenge: Focused Exploration of LOD • Linked Data entities Thomas Gottron Focused Exploration of LOD 2
  3. 3. Challenge: Focused Exploration of LOD • Linked Data entities • (Semantic) link structure Thomas Gottron Focused Exploration of LOD 3
  4. 4. Challenge: Focused Exploration of LOD • Linked Data entities • (Semantic) link structure • „Relevant“ entities Thomas Gottron Focused Exploration of LOD 4
  5. 5. Challenge: Focused Exploration of LOD • Linked Data entities • (Semantic) link structure • „Relevant“ entities • Seed entity Thomas Gottron Focused Exploration of LOD 5
  6. 6. Challenge: Focused Exploration of LOD • Linked Data entities • (Semantic) link structure • „Relevant“ entities • Seed entity ? ? ? ? ? ? Classification: Which links lead to relevant entities? Ranking: How probable is a link leading to a relevant entity? Use Cases: Guided exploration Focused LOD crawler Thomas Gottron Focused Exploration of LOD 6
  7. 7. Focused exploration of Geospatial Context Rovereto Relevant entities: Locations semantically related to seed entities Bensheim (Germany) Thomas Gottron Focused Exploration of LOD 7
  8. 8. Focused Exploration: Formalisation • E: set of entities (URIs) • R: set of RDF triples (s,p,o) s∈ L – Restricted to s,o ∈ E wgs84:long • L⊆E: relevant entities -1.404 – For us: Locations with coordinates • Task: for given s‘ and all (s‘,p,o) ∈ R – Classification: Predict which o are in L – Ranking: Sort object entities o starting from the one presumed most probable to be relevant wgs84:lat 50.897 Thomas Gottron Focused Exploration of LOD 8
  9. 9. 5 Approaches • Based on 3 paradigms: – Schema semantics (1 approach) – Supervised machine learning (2 approaches) – Information Retrieval inspired (2 approaches) Thomas Gottron Focused Exploration of LOD 9
  10. 10. Exploration based on Schema Semantics • Exploit rdfs:range definitions of link predicates rdfs:range dbpedia:Place rdfs:subClassOf dbponto:twinCity dbpedia:City • Follow links which lead to locations Thomas Gottron Focused Exploration of LOD 10
  11. 11. Exploration based on Schema Semantics s Classification p1 p2 • Range of any pi is a location? àLabel = relevant o pm Ranking Location? • Re-use classification: – Relevant before irrelevant ... Thomas Gottron Focused Exploration of LOD 11
  12. 12. Supervised Machine Learning • Use incoming link predicates as features – Learn predicates which typically leading to locations p4 p6 p2 p3 o‘ o xxx wgs84:lat yyy wgs84:long • Train a classifier (e.g. Naive Bayes) 2 Variations: Use all or only observed predicates Thomas Gottron Focused Exploration of LOD 12
  13. 13. Supervised Machine Learning s Classification • p1 P(o ∈ L) > P(o ∉ L)? àLabel = relevant o pm Ranking Location? • Rank by odds: p2 ... O(o ∈ L) = P(o ∈ L) P(o ∉ L) Thomas Gottron Focused Exploration of LOD 13
  14. 14. IR Inspired Approaches • Discriminativeness of predicates (inspired by tf-idf) • Property relevance frequency: • Inverse property frequency • Combine into prf-ipf and prr-ipf • Total score ρ: aggregate over all predicates prf = c(p, L) ipf = log c(∗,∗) c(p,∗) " # $ Thomas Gottron Focused Exploration of LOD 14 % & ' o p3 2nd Variation: prr: normalised prf
  15. 15. IR Inspired Approaches s Classification p1 p2 • Determine threshold – Nearest centroid o pm Ranking Location? • Rank by score ... ρ prr-ipf (o) Thomas Gottron Focused Exploration of LOD 15
  16. 16. Evaluation • Metrics: – Ranking: • ROC curves • AUC – Classification: • Precision • Recall • F1 • Accuracy • Cross validation: – 10-times / 10-fold – Averages 99,951 entities 1,728,633 links 425,338 entities 128,171 relevant Seed Exploration owl:sameAs Thomas Gottron Focused Exploration of LOD 16
  17. 17. Performance (Ranking) 1 0.8 0.6 0.4 0.2 0 ROC 1 0.975 0.95 0 0.025 0.05 0 0.2 0.4 0.6 0.8 1 random Schema Semantics NB (all predicates) NB (present predicates) prf-ipf prr-ipf Thomas Gottron Focused Exploration of LOD 17
  18. 18. Performance (Classification & Ranking) 2. Average performance of approaches († indicates significant improvements confidence level ⇢ = 0.01) Method Recall Precision F1 Accuracy AUC Schema Scemantics 0.1188 0.8119 0.2073 0.7262 0.5552 NB (all predicates) 0.9906 0.9491 † 0.9694 † 0.9812 0.9970 NB (observed predicates) 0.9943 0.9436 0.9683 0.9804 0.9968 prf-ipf 0.8512 † 0.9754 0.9091 0.9487 0.9958 prr-ipf † 0.9973 0.9240 0.9592 0.9745 0.9769 performance in bold. Furthermore, we marked the results where we had a significant over the second best method at confidence level of ⇢ = 0.01. The aggregated basically Thomas Gottron confirm the observations Focused Exploration made of above. LOD In general, when considering 18
  19. 19. Summary • Focused exploration feasible • ML approach performing best • Future work: – Other data sets – Generalise scenario (more than locations) – Better approaches using more features Thomas Gottron Focused Exploration of LOD 19
  20. 20. Questions? Thomas Gottron Institute for Web Science and Technologies Universität Koblenz-Landau gottron@uni-koblenz.de Thomas Gottron Focused Institute for Web Science and Technolo Egxieplso r a·t i oUnn oifv LeOrDs ity of Koblenz-Landau, Germany 20

×