Melinda: Methods and tools for Web Data Interlinking
1. Introduction Framework Tools Application Conclusions
Melinda
Methods and tools for Web data Interlinking
François Scharffe
December
@ STI Innsbruck
3. Introduction Framework Tools Application Conclusions
Publishing datasets on the Web
Four publication principles
1 Resources are identified by URIs.
2 URIs are dereferenceable.
3 When a URI is dereferenced, a description of the identified
resource should be returned, ideally adapted through content
negotiation.
4 Published Web datasets must contain links to other Web
datasets.
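Principles 2 and 3 can be illustrated with a minimal Python sketch, which prepares (without sending) an HTTP request that asks for an RDF description through content negotiation. The function name build_rdf_request, the example URI and the chosen media types are assumptions made for illustration, not part of the original slides.

```python
from urllib.request import Request


def build_rdf_request(uri: str) -> Request:
    """Prepare a GET request that prefers RDF serializations over HTML."""
    # The Accept header drives content negotiation: the server is asked
    # for Turtle first, then RDF/XML, and HTML only as a last resort.
    accept = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.1"
    return Request(uri, headers={"Accept": accept})


# Hypothetical resource URI, used only to show the shape of the request.
request = build_rdf_request("http://dbpedia.org/resource/Johann_Sebastian_Bach")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would actually dereference the URI; that step is left out so the sketch runs without network access.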
4. Introduction Framework Tools Application Conclusions
Interlinking datasets
Links are contained in specific datasets
<http://www.example.org/linkset/DBPedia-MB>
    a void:Linkset ;
    void:target <http://www.dbpedia.org> ;
    void:target <http://www.musicbrainz.org> .
http://www.example.org/linkset/DBPedia-MB:
<http://www.dbpedia.org/resource/Johann_Sebastian_Bach>
    owl:sameAs
    <http://www.musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c> .
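As a rough sketch of how such a linkset could be produced programmatically, the following Python snippet serializes pairs of URIs as owl:sameAs statements in N-Triples. The helper names same_as_triple and build_linkset are hypothetical; only the two URIs come from the slide.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(source: str, target: str) -> str:
    """Serialize one owl:sameAs link as an N-Triples statement."""
    return f"<{source}> <{OWL_SAME_AS}> <{target}> ."


def build_linkset(pairs) -> str:
    """Serialize an iterable of (source, target) URI pairs, one triple per line."""
    return "\n".join(same_as_triple(source, target) for source, target in pairs)


links = [
    (
        "http://www.dbpedia.org/resource/Johann_Sebastian_Bach",
        "http://www.musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c",
    )
]
linkset = build_linkset(links)
print(linkset)
```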
5. Introduction Framework Tools Application Conclusions
Web Data Cloud
6. Introduction Framework Tools Application Conclusions
Good news: Open Data is on the rise
data.gov, US Data Act
data.gov.uk, with Sir Tim Berners-Lee on board
Other initiatives: from the EU, further Open Data initiatives
7. Introduction Framework Tools Application Conclusions
What do we do?
We propose a framework capturing the various data
interlinking methods
We study existing tools and position them in the framework
We propose an architecture that combines ontology
alignment and interlinking tools
8. Introduction Framework Tools Application Conclusions
General approach
[Figure: data interlinking produces an owl:sameAs link between URI1 and URI2.]
Fig.: The data interlinking problem.
9. Introduction Framework Tools Application Conclusions
Manual resource alignment
[Figure: a URI transformation produces the owl:sameAs link between URI1 and URI2.]
Fig.: URI transformation.
10. Introduction Framework Tools Application Conclusions
Matching identifiers - Example
[Figure: URI alignment links http://www.lastfm.fr/music/Johann+Sebastian+Bach to http://dbpedia.org/resource/Johann_Sebastian_Bach with owl:sameAs.]
Fig.: URI transformation example.
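The transformation in this example can be sketched in a few lines of Python: the artist name is taken from the Last.fm URI path, "+" is decoded to a space, and spaces are replaced by underscores to form the DBpedia resource URI. The function name lastfm_to_dbpedia and the "/music/" path prefix handling are assumptions for illustration; names that need special escaping on either side are not handled.

```python
from urllib.parse import unquote_plus, urlsplit

LASTFM_PREFIX = "/music/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def lastfm_to_dbpedia(lastfm_uri: str) -> str:
    """Rewrite a Last.fm artist URI into the corresponding DBpedia resource URI."""
    path = urlsplit(lastfm_uri).path
    if not path.startswith(LASTFM_PREFIX):
        raise ValueError(f"not a Last.fm artist URI: {lastfm_uri}")
    # "Johann+Sebastian+Bach" -> "Johann Sebastian Bach"
    name = unquote_plus(path[len(LASTFM_PREFIX):])
    # DBpedia resource names use underscores instead of spaces.
    return DBPEDIA_RESOURCE + name.replace(" ", "_")


dbpedia_uri = lastfm_to_dbpedia("http://www.lastfm.fr/music/Johann+Sebastian+Bach")
print(dbpedia_uri)
```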
11. Introduction Framework Tools Application Conclusions
Datasets sharing a common ontology
[Figure: resource matching of datasets described by the same ontology O1 produces an owl:sameAs link between URI1 and URI2.]
Fig.: Matching two datasets described according to the same ontology.
12. Introduction Framework Tools Application Conclusions
Datasets sharing a common ontology - Example
[Figure: the DBPedia resource URI1 and the Musicbrainz resource URI2 both have type mo:MusicArtist. URI1 has first "Johann-Sebastian" and last "Bach"; URI2 has first "Jean-Sébastien" and last "Bach". A resource matching algorithm for datasets described according to a common ontology compares the two.]
Fig.: Matching data sharing a common ontology.
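When both datasets use the same ontology, the same properties can be compared directly. The Python sketch below averages a string similarity over the shared "first" and "last" properties. The similarity measure (difflib's SequenceMatcher ratio) is a stand-in chosen for illustration, not a technique named in the slides, and the Mozart record is made-up contrast data.

```python
from difflib import SequenceMatcher

SHARED_PROPERTIES = ("first", "last")


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(resource_a: dict, resource_b: dict) -> float:
    """Average the similarity of every property shared through the common ontology."""
    scores = [similarity(resource_a[p], resource_b[p]) for p in SHARED_PROPERTIES]
    return sum(scores) / len(scores)


dbpedia_bach = {"first": "Johann-Sebastian", "last": "Bach"}
musicbrainz_bach = {"first": "Jean-Sébastien", "last": "Bach"}
# Made-up record, added only to contrast a likely match with an unlikely one.
musicbrainz_mozart = {"first": "Wolfgang Amadeus", "last": "Mozart"}

bach_score = match_score(dbpedia_bach, musicbrainz_bach)
mozart_score = match_score(dbpedia_bach, musicbrainz_mozart)
print(f"Bach vs Bach: {bach_score:.2f}, Bach vs Mozart: {mozart_score:.2f}")
```

A link would then be emitted for pairs whose score exceeds a chosen threshold.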
13. Introduction Framework Tools Application Conclusions
Matching datasets having heterogeneous ontologies
[Figure: resource matching of datasets described by different ontologies O1 and O2, related by an implicit alignment, produces an owl:sameAs link between URI1 and URI2.]
Fig.: Two datasets matched using an implicit alignment.
14. Introduction Framework Tools Application Conclusions
Example
[Figure: the OpenCyc resource URI1 has type Classical Music Performer and English ID "Johann Sebastian Bach"; the Musicbrainz resource URI2 has type mo:MusicArtist, givenname "Jean-Sébastien" and name "Bach".]
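With heterogeneous ontologies, the matcher first needs to know which classes and properties correspond. The Python sketch below makes that alignment explicit as plain data: a class correspondence, plus one function per dataset that extracts a comparable full name. All identifiers (CLASS_ALIGNMENT, the dictionary keys, the helper functions) are hypothetical, and the string measure is again difflib's SequenceMatcher ratio used as a stand-in.

```python
from difflib import SequenceMatcher

# Assumed class correspondence between the two ontologies.
CLASS_ALIGNMENT = {"ClassicalMusicPerformer": "mo:MusicArtist"}


def opencyc_full_name(resource: dict) -> str:
    """OpenCyc stores the whole name in a single property."""
    return resource["englishID"]


def musicbrainz_full_name(resource: dict) -> str:
    """Musicbrainz splits the name into a given name and a name."""
    return f'{resource["givenname"]} {resource["name"]}'


def match_score(opencyc_resource: dict, musicbrainz_resource: dict) -> float:
    """Compare two resources only if their classes are aligned."""
    expected_type = CLASS_ALIGNMENT.get(opencyc_resource["type"])
    if expected_type != musicbrainz_resource["type"]:
        return 0.0
    left = opencyc_full_name(opencyc_resource).lower()
    right = musicbrainz_full_name(musicbrainz_resource).lower()
    return SequenceMatcher(None, left, right).ratio()


uri1 = {"type": "ClassicalMusicPerformer", "englishID": "Johann Sebastian Bach"}
uri2 = {"type": "mo:MusicArtist", "givenname": "Jean-Sébastien", "name": "Bach"}
score = match_score(uri1, uri2)
print(f"score: {score:.2f}")
```

Keeping the correspondences outside the matching function is the design point: the same comparison code can be reused once the alignment is supplied separately.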
15. Introduction Framework Tools Application Conclusions
General interlinking framework
[Figure: data interlinking produces an owl:sameAs link between URI1 and URI2; ontology matching produces an alignment between the ontologies O1 and O2.]
Fig.: General framework for data interlinking involving ontology matching.
16. Introduction Framework Tools Application Conclusions
Processes and specifications
level       process                result
instance    link specification     linkset
class       matcher                alignment
Tab.: Matching process, interlinks, and their results.
17. Introduction Framework Tools Application Conclusions
Analysis criteria
Degree of automation
Is the tool completely automatic?
Does the tool need to be parametrized by the user? With what kind
of parameters (data matching techniques, ontology
alignment)?
Matching techniques used
String matching?
External functions (value conversion, data transformations)?
Similarity propagation?
Other techniques?
Domain: Is the tool specific to a given domain?
18. Introduction Framework Tools Application Conclusions
Analysis criteria
Ontologies
Does the tool take into account the ontologies associated with the
datasets?
Does the tool allow interlinking datasets described according
to different ontologies?
If the ontologies differ, does the tool perform
ontology alignment?
Output
What does the tool produce as output?
Does the tool offer to merge the two input datasets?
Postprocessing: Does the tool perform any post-processing
operations?
19. Introduction Framework Tools Application Conclusions
Six interlinking tools
RKB-CRS: Coreference resolution service of the RKB RDF
Knowledge Base.
LD-Mapper: Interlinking tool for the music ontology MO.
ODD-Linker: Interlinking tool based on SQL record matching.
RDF-AI: Interlinking and data fusion tool.
Silk and Silk LSL: Interlinking tool and link specification language.
Knofuss architecture: Interlinking and data fusion tool with
ontology alignment.
20. Introduction Framework Tools Application Conclusions
Six interlinking tools
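[Figure: the general framework (URI 1 and URI 2 linked by owl:sameAs through a resource comparison method; O1 and O2 related by an implicit or explicit alignment; ontology matching system) annotated with the tools LD-Mapper, ODD-Linker, RKB-CRS, Silk, RDF-AI and Knofuss.]
Fig.: Tools positioned in the defined framework.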
22. Introduction Framework Tools Application Conclusions
Application
The alignment is implicitly contained in the link specification.
:dbp-geo a align:Alignment;
  align:onto1 <http://dbpedia.org/ontology/>;
  align:onto2 <http://www.geonames.org/ontology#>;
  align:map [ :map1 a align:Cell;
    align:entity1 dbpedia:City;
    align:entity2 gn:P;
    align:relation align:subsumedBy.
  ];
  align:map [ :map2 a align:Cell;
    align:entity1 dbpedia:populationTotal;
    align:entity2 gn:population;
    align:relation align:equivalent.
  ];
  align:map [ :map3 a align:Cell;
    align:entity1 rdfs:label;
    align:entity2 gn:name;
    align:relation align:equivalent.
  ].

align:map [ :map2 a align:Cell;
  align:entity1 [ a align:Property;
    edoal:and dbpedia:populationTotal.
    edoal:and [ a edoal:PropertyDomainRestriction;
      edoal:domain dbpedia:City.
    ];
  align:entity2 [ a align:Property;
    edoal:and gn:population;
    edoal:and [ a edoal:PropertyDomainRestriction;
      edoal:domain gn:P. ];
  align:relation align:equivalent.
];
align:map [ :map2 a align:Cell;
  align:entity1 [ a align:Property;
    edoal:and rdfs:label.
    edoal:and [ a edoal:PropertyDomainRestriction;
      edoal:domain dbpedia:City.
    ];
  align:entity2 [ a align:Property;
    edoal:and gn:name;
    edoal:and [ a edoal:PropertyDomainRestriction;
      edoal:domain gn:P. ];
  align:relation align:equivalent.
].
23. Introduction Framework Tools Application Conclusions
Application
Using the alignment, the link specification can be simplified.
<UseAlignment rdf:resource="#dbp-geo" />
<Interlink id="cities">
  <LinkType>owl:sameAs</LinkType>
  <LinkCell rdf:resource="#map1" />
  <LinkCondition>
    <AVG>
      <Compare metric="jaroSimilarity">
        <CellParam rdf:resource="#map2" />
      </Compare>
      <Compare metric="numSimilarity">
        <CellParam rdf:resource="#map3" />
      </Compare>
    </AVG>
  </LinkCondition>
  <Thresholds accept="0.9" verify="0.7" />
  <Output acceptedLinks="accepted_links.n3"
          verifyLinks="verify_links.n3"
          mode="truncate" />
</Interlink>
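The link condition above averages two comparisons and sorts each candidate pair by the accept and verify thresholds. A minimal Python sketch of that decision logic follows. It is not the tool's implementation: difflib's SequenceMatcher ratio stands in for jaroSimilarity, the definition of num_similarity is an assumption, the string measure is applied to the names and the numeric one to the populations, and the city records are made-up values.

```python
from difflib import SequenceMatcher

ACCEPT_THRESHOLD = 0.9  # from Thresholds accept
VERIFY_THRESHOLD = 0.7  # from Thresholds verify


def string_similarity(a: str, b: str) -> float:
    """Stand-in string measure in [0, 1] (not the Jaro measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def num_similarity(a: float, b: float) -> float:
    """Assumed numeric measure: 1 minus the relative difference, clamped at 0."""
    largest = max(abs(a), abs(b))
    if largest == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / largest)


def classify(score: float) -> str:
    """Map an aggregated score to the output it would be written to."""
    if score >= ACCEPT_THRESHOLD:
        return "accept"
    if score >= VERIFY_THRESHOLD:
        return "verify"
    return "reject"


def link_decision(dbpedia_city: dict, geonames_city: dict):
    """Average the two comparisons (the AVG aggregation) and classify the pair."""
    name_score = string_similarity(dbpedia_city["label"], geonames_city["name"])
    population_score = num_similarity(
        dbpedia_city["populationTotal"], geonames_city["population"]
    )
    score = (name_score + population_score) / 2
    return score, classify(score)


# Made-up records for illustration only.
dbpedia_city = {"label": "Innsbruck", "populationTotal": 120000}
geonames_city = {"name": "Innsbruck", "population": 118000}
score, decision = link_decision(dbpedia_city, geonames_city)
print(f"{score:.3f} -> {decision}")
```

Pairs classified as "verify" correspond to the links written to the verification file for manual review.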
24. Introduction Framework Tools Application Conclusions
Conclusions
We propose a framework for data interlinking on the Web of
data.
We have presented existing tools and positioned them with respect to the
framework.
We propose a simplification of the interlinking task and
demonstrate it on an example.
Our current work aims at more interoperability for link
specifications:
Is it possible to construct more generic link specifications, i.e.
specifications attached to datasets or ontologies?
Is it possible to automatically find the key properties
that identify matching pairs?
25. Introduction Framework Tools Application Conclusions
For more
http://melinda.inrialpes.fr
François Scharffe and Jérôme Euzenat. Linked data meets
ontology matching: enhancing data interlinking through
ontology alignments. (Submitted to WWW'2010.)