8. Overview
• Silk as Contextualisation Tool
• System Integration
• Contextualisation Progress and Results
• Challenges
• Applicability and Reusability
• Future Plans
18.06.2014
9. Contextualisation with Silk
• Silk: Link Discovery Framework (UMA)
• Definition of linkage rules to create links between Linked
Data resources
• http://context.dm2e.eu
10. Integration of Silk
• Silk is integrated into OmNom as a web service
[Figure labels: "use generated configuration"; "show links"]
11. Access to Contextualisation Results
• Contextualisation results (linksets) are kept separate
from the ingested data
• Linksets are further described and versioned
• Additional linkset properties (tbd):
– Automatically created
– Manually created
– Recall-oriented (exploratory, but may contain wrong links)
– Precision-oriented (incomplete, but high quality)
12. Used Linked Data Resources
• Resources: Geonames, GND, LCSH, DBPedia, Freebase, DDC, Linked Geodata
• Categories: Places, Subjects, Agents
13. Example Process
• Manual creation of linkage rules, e.g. compare
skos:prefLabel with rdfs:label using Levenshtein
distance, link if distance < 2
• Let Silk run to find the links
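The rule above can be sketched in plain Python. This is an illustration only, not Silk's implementation or its rule syntax: the function names, the lowercasing step, and the example URIs are assumptions made for the sketch. Only the comparison itself (Levenshtein distance between two labels, link if distance < 2) comes from the slide.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # drop char_a
                               current[j - 1] + 1,       # add char_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def find_links(source_labels, target_labels, max_distance=2):
    """Pair every source URI with every target URI whose label is
    fewer than max_distance edits away (assumption: case-insensitive)."""
    links = []
    for source_uri, source_label in source_labels.items():
        for target_uri, target_label in target_labels.items():
            distance = levenshtein(source_label.lower(), target_label.lower())
            if distance < max_distance:
                links.append((source_uri, target_uri))
    return links


# Hypothetical data: skos:prefLabel values on the left, rdfs:label on the right.
pref_labels = {"http://example.org/place/1": "Frankfurt am Main"}
rdfs_labels = {
    "http://example.org/geo/a": "Frankfurt am Mein",
    "http://example.org/geo/b": "Frankfurt (Oder)",
}
links = find_links(pref_labels, rdfs_labels)
```

The nested loop compares every pair of labels, which is fine for a sketch but would not scale to full datasets.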
14. Results
• Contextualised all datasets that are currently ingested
-> no qualitative analysis so far
• Increased the number of existing links by 20%
(performance requirement)
• Different amounts of links were found
– Dingler (UBER) 134 unique links
– Deutsches Textarchiv (BBAW) 9946 unique links
• Potential to find more links
19. Challenges
• In most cases, only a preferred label is available
– Nancy France vs. Nancy Kentucky
• Very specific rules for different spellings/abbreviations
required
– Frankfurt am Main vs. Frankfurt a.M. vs. Frankfurt a/M
• Unstructured data is not captured
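One way to handle the spelling variants above is to normalise labels before comparing them. The following is a minimal sketch under stated assumptions: the abbreviation table and the function name are hypothetical, cover only the Frankfurt example from the slide, and are not part of Silk. It does not address the ambiguity problem (Nancy, France vs. Nancy, Kentucky), which needs information beyond the label.

```python
# Hypothetical abbreviation table; a real rule set would be dataset-specific.
ABBREVIATIONS = {
    "a.m.": "am main",
    "a/m": "am main",
}


def normalise_label(label: str) -> str:
    """Lowercase, collapse whitespace and expand a known trailing abbreviation."""
    text = " ".join(label.lower().split())
    for short, full in ABBREVIATIONS.items():
        if text.endswith(" " + short):
            text = text[: -len(short)] + full
            break
    return text


variants = ["Frankfurt am Main", "Frankfurt a.M.", "Frankfurt a/M"]
normalised = {normalise_label(v) for v in variants}
```

After normalisation the three variants collapse to one string, so an exact or near-exact comparison can match them.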
20. Unstructured Data
• Place: Wren Library, Trinity College Cambridge
• Agent: Georg Tanner, Maximilian II.
22. Applicability and Reusability
• Created linkage rules can be reused, but adaptation
may be necessary
• Knowledge about the Silk framework and the similarity
functions is required
• Access to the datasets is required (as dump or in a
triplestore)
• Quality of the links is not ensured
23. Future Work
• Evaluation of the detected links
– Iterative process to improve the links
• Can we use existing information, e.g. already known
connections, to strengthen/weaken links?
• Questions that can be answered based on the links?
– Where have the resources been published?
– MarineLives – map of the ship routes