The document presents Instance-Based Ontology Matching by Instance Enrichment (IBOMbIE), a technique for ontology matching that uses instance data. IBOMbIE enriches instances from one ontology with attributes from matching instances in another ontology to generate dually annotated instances for comparison. The document outlines IBOMbIE's approach, experiments comparing different instance similarity measures and parameters, and results showing IBOMbIE performed comparably to other ontology matching techniques while being faster.
Instance-based Ontology Matching by Instance Enrichment
1. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Instance-Based Ontology Matching
By Instance Enrichment
Balthasar A.C. Schopman
–
supervisors:
Antoine Isaac
Shenghui Wang
Stefan Schlobach
Vrije Universiteit Amsterdam
June 29, 2009
2. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Outline
1 Ontology matching
2 Instance-based OM
3 IBOMbIE
4 Experiments
5 Comparison other OM
6 Conclusions
3. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Research questions
General research questions:
How do different algorithm design options of
IBOMbIE influence the final result?
How does the performance of IBOMbIE relate to other OM
algorithms?
4. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Questions from the audience
Crucial questions: please interrupt me.
Other questions: after presentation please.
5. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Introduction
Ontology
Definition of an ontology (by Euzenat and Shvaiko):
An ontology typically (1) defines a vocabulary relevant in
a certain domain of interest, (2) specifies the meaning of
terms and (3) specifies relations between terms.
Ontologies:
controlled vocabulary
thesaurus
database schema
canonical semantic web ontology: a set of typed, interrelated
concepts defined in a formal language
7. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Introduction
Ontology Matching (OM)
Ontologies ...
facilitate interoperability between parties
do not solve heterogeneity problem, but raise it to a higher
level: the OM level
Elementary OM techniques:
terminological
structure-based
semantic-based
instance-based
9. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Introduction
Instance-based OM (IBOM)
Variants of IBOM:
1 use dually annotated instances (DAI)
2 create DAI
3 use extension of concepts (DAI not required)
General pros and cons:
Con: does not deduce specific relations
Con: suitable instances rarely available
Pro: focus on active part of ontology
Pro: able to deal with ambiguous linguistic phenomena:
synonym, homonym
11. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Intro
Definitions of ‘instance of’-relation
Example definitions:
Canonical semantic web definition
Library definition
[Figure: RDF graph with the following triples]
someone:Peter rdf:type foaf:Person
someone:Peter foaf:name "Peter"
someone:Peter foaf:knows someone:Nate
12. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Intro
Definitions of ‘instance of’-relation
Example definitions:
Canonical semantic web definition
Library definition
[Figure: an ontology / vocabulary with concepts c1, c2, c3, ...; objects o1, o2, ... are each annotated with one or more of these concepts]
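Under the library definition, an object counts as an instance of every concept it is annotated with, so the extension of a concept is simply the set of objects that carry it. A minimal Python sketch of that inversion follows; the object names, concept names and the `extensions` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical catalogue: each object (e.g. a book) lists the concepts
# it is annotated with. The concrete assignments are made up.
annotations = {
    "o1": {"c1"},
    "o2": {"c1", "c2"},
}


def extensions(annotations):
    """Invert object -> concepts into concept -> set of objects (the extension)."""
    ext = defaultdict(set)
    for obj, concepts in annotations.items():
        for concept in concepts:
            ext[concept].add(obj)
    return dict(ext)


ext = extensions(annotations)
```

Here `ext["c1"]` holds both objects, while `ext["c2"]` holds only `o2`.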
13. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Intro
Application
Two library scenarios: KB and TEL
match controlled vocabularies
data-sets: book catalogs
multi-lingual
14. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
IBOM
IBOM: measuring similarity
[Figure: concepts c1 and c2]
18. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
IBOM
Jaccard coefficient
Jaccard coefficient:
\[ J(c_1, c_2) = \frac{|i_1 \cap i_2|}{|i_1 \cup i_2|} \]
quantifies the overlap of the extension of concepts
→ relatedness between concepts
Con: no multi-sets
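As a small illustration of the Jaccard coefficient, the sketch below computes it over two concept extensions represented as Python sets. The instance identifiers are invented, and returning 0 for two empty extensions is an assumption rather than something the slides specify.

```python
def jaccard(ext1, ext2):
    """Jaccard coefficient of two concept extensions (sets of instance ids)."""
    union = ext1 | ext2
    if not union:
        return 0.0  # assumption: two empty extensions count as unrelated
    return len(ext1 & ext2) / len(union)


# Hypothetical extensions of concepts c1 and c2.
i1 = {"book1", "book2", "book3"}
i2 = {"book2", "book3", "book4"}
score = jaccard(i1, i2)
```

Because the inputs are sets, an instance that would contribute the same annotation several times is counted only once, which is the "no multi-sets" limitation noted above.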
20. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
IBOM
Creating dually annotated instances (DAI)
Jaccard needs DAI
If DAI unavailable:
exact instance matching → merge annotations
approximate instance matching → enrich instances
22. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Instance matching
Approximate instance matching
Instance similarity measures:
Lucene
vector space model (VSM)
23. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Enriching instances
Basic instance enrichment (IE)
[Figure: instance i1 in data-set D1, annotated with a and b, matches instance i2 in data-set D2, annotated with A and B; after enrichment i1 also carries A and B]
25. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Enriching instances
IE parameter: topN
[Figure: instance i1 in data-set D1 (annotated a, b) and its three best matches in data-set D2: 1st i2 (A, B), 2nd i3 (D), 3rd i4 (A, C). With topN = 3, i1 is enriched step by step with the annotations of i2, then i3, then i4]
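The topN parameter can be sketched in a few lines of Python, using the ranked matches from the slide (i2 with A, B; i3 with D; i4 with A, C). The function name `enrich` and its signature are hypothetical, and the ranking is taken as given rather than computed.

```python
def enrich(instance_annotations, ranked_matches, target_annotations, top_n=1):
    """Return the instance's own annotations plus those of its top_n best matches.

    ranked_matches: ids of instances in the other data-set, best match first.
    target_annotations: id -> set of concepts from the other ontology.
    """
    enriched = set(instance_annotations)
    for match_id in ranked_matches[:top_n]:
        enriched |= target_annotations[match_id]
    return enriched


d2 = {"i2": {"A", "B"}, "i3": {"D"}, "i4": {"A", "C"}}
ranking = ["i2", "i3", "i4"]  # best match first
top1 = enrich({"a", "b"}, ranking, d2, top_n=1)
top3 = enrich({"a", "b"}, ranking, d2, top_n=3)
```

With `top_n=3` the annotation A arrives twice (from i2 and from i4) but is stored once, since this sketch uses plain sets.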
29. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Enriching instances
IE parameter: similarity threshold (ST)
[Figure: instance i1 in data-set D1 (annotated a, b) and candidate matches in data-set D2: sim(i1, i2) = 0.8 with i2 annotated A, B; sim(i1, i3) = 0.4 with i3 annotated D; sim(i1, i4) = 0.2 with i4 annotated A, C. Successive builds add the annotations of i2, then i3, then i4 to i1]
33. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Experimental questions
Experimental questions
Instance similarity measure
topN parameter
ST parameter
combining topN + ST parameters
performance as compared to other OM algorithms
34. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Evaluation
Alignment evaluation
Methods:
Gold standard := good alignment
Reindexing
Measures:
Precision
Recall
f-measure
35. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Results of experiments
Results: instance similarity measure - quality
[Figure: precision (P), recall (R) and f-measure (F) of VSM and Lucene against mapping rank (log scale); panels (a) Gold standard and (b) Reindex]
Virtually equal
41. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Results of experiments
Results: combining parameters
Using both parameters performs well in TEL, but not in KB,
possibly because:
a more selective IBOMbIE pays off in TEL, where the vocabularies
and instance annotations differ more than in the KB scenario.
[Figure: f-measure against mapping rank (log scale) for the baseline and for combinations of topN = 1, 2, 3 with ST = µ − 0.5σ, µ, µ + 0.5σ; panels (m) KB and (n) TEL; evaluation method: reindexing]
42. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
OAEI
Ontology alignment evaluation initiative (OAEI)
terminological   structure-based   semantic-based   instance-based
DSSim #
Lily #
TaxoMap #
IBOMbIE # # #
DSSim, Lily and TaxoMap:
consider KB ontologies “huge”
feature functionality to deal with large ontologies
43. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
OAEI
Performance comparison: quality
[Figure: precision (P) and recall (R) of IBOMbIE (topN=1), DSSim, Lily and TaxoMap against mapping rank (0 to 10000)]
45. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Conclusions + discussion
IBOMbIE algorithm is quite promising:
Relatively low run-time
Able to deal with large ontologies
Amount + quality of mappings
Pros of IBOM
Able to align ontologies using disjoint data-sets
Basic instance enrichment appears to be the best-performing method.
Possible cause: Jaccard coefficient does not support multi-sets.
46. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Fin
Thank you... any questions?
47. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Vocabularies
scenario  vocabulary  size
KB        GTT         35K
KB        Brinkman    5K
TEL       LCSH        340K
TEL       Rameau      155K
TEL       SWD         805K
48. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
IE parameter: similarity threshold (ST)
scenario  D1 annotated with  D2 annotated with  µ      σ
KB        O1                 O2                 0.297  0.106
KB        O2                 O1                 0.279  0.101
TEL       O1                 O2                 0.260  0.097
TEL       O2                 O1                 0.232  0.084
standard ST: µ
step-size: ½σ
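To make the threshold settings concrete, the sketch below derives candidate ST values as µ plus or minus half a standard deviation and filters candidate matches against one of them. The µ and σ values are the KB figures listed above and the similarities are the 0.8 / 0.4 / 0.2 example from the ST slide; the helper names, and the choice of "at least" rather than "strictly above" the threshold, are assumptions.

```python
def thresholds(mu, sigma, steps=(-1, 0, 1)):
    """Candidate similarity thresholds mu + k * sigma / 2 for each step k."""
    return [mu + k * sigma / 2 for k in steps]


def above_threshold(similarities, st):
    """Keep the ids of candidate matches whose similarity is at least st."""
    return [inst for inst, sim in similarities.items() if sim >= st]


# KB scenario, D1 annotated with O1 and D2 with O2: mu = 0.297, sigma = 0.106.
kb = thresholds(0.297, 0.106)

# Similarities of i1 to its candidates, as in the ST example.
kept = above_threshold({"i2": 0.8, "i3": 0.4, "i4": 0.2}, st=kb[1])
```

With the standard setting ST = µ, only i2 and i3 pass, so only their annotations would be added to i1.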
49. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
VSM
Weights are components of vectors:
term frequency - inverse document frequency: TF-IDF
e.g. audiovisual features
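\[ \text{tfidf}_{w,d} = \text{tf}_{w,d} \cdot \text{idf}_w \]
\[ \text{tf}_{w,d} = \frac{\sqrt{n_{w,d}}}{|d|} \]
\[ \text{idf}_w = \log \frac{|D|}{|\{d \in D : w \in d\}|} \]
VSM cosine similarity
\[ \text{cosine sim}(d_1, d_2) = \frac{d_1 \cdot d_2}{|d_1|\,|d_2|} = \frac{\sum_{i=1}^{n} w_{i,d_1}\, w_{i,d_2}}{\sqrt{\sum_i w_{i,d_1}^2}\,\sqrt{\sum_i w_{i,d_2}^2}} \]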
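The vector space model can be sketched in plain Python as follows. This is an illustrative implementation, not the one used in the experiments: it assumes tf is the square root of the term count divided by the document length and idf is the log of the collection size over the document frequency, and the three toy documents stand in for instance descriptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Assumption: tf = sqrt(count) / document length,
            # idf = log(|D| / number of documents containing the term).
            term: (math.sqrt(n) / len(doc)) * math.log(n_docs / df[term])
            for term, n in counts.items()
        })
    return vectors


def cosine(v1, v2):
    """Cosine similarity of two sparse weight vectors."""
    dot = sum(w * v2.get(term, 0.0) for term, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


# Toy instance descriptions (hypothetical book titles).
docs = [
    "history of amsterdam".split(),
    "amsterdam city history".split(),
    "introduction to organic chemistry".split(),
]
vecs = tfidf_vectors(docs)
```

The first two titles share terms and get a positive similarity, while the first and third share none and score 0.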
50. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Evaluation method: gold standard
Gold standard := good alignment
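\[ P = \text{precision} = \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|} \]
\[ R = \text{recall} = \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{reference}\}|} \]
\[ F = \text{f-measure} = 2 \cdot \frac{P \cdot R}{P + R} \]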
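A short Python sketch of the gold-standard measures, treating an alignment as a set of concept pairs. The example mappings are invented, and returning 0 when a denominator is empty is an assumption.

```python
def precision_recall_f(reference, retrieved):
    """reference, retrieved: sets of mappings, e.g. (concept1, concept2) pairs."""
    correct = len(reference & retrieved)
    p = correct / len(retrieved) if retrieved else 0.0
    r = correct / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical gold standard and algorithm output.
reference = {("a", "x"), ("b", "y"), ("c", "z")}
retrieved = {("a", "x"), ("b", "z")}
p, r, f = precision_recall_f(reference, retrieved)
```

One of the two retrieved mappings is correct (P = 1/2) and one of the three reference mappings is found (R = 1/3), giving F = 0.4.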
51. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
Evaluation method: reindexing
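[Figure: reindexing of a dually annotated instance i_dual between ontology o_1 (concepts a, b, c) and ontology o_2 (concepts x, y, z)]
\[ P = \frac{\sum_{\text{dually annotated instances}} \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}}{|\{\text{reindexed instances}\}|} \]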
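\[ R = \frac{\sum_{\text{dually annotated instances}} \frac{|\{\text{reference}\} \cap \{\text{retrieved}\}|}{|\{\text{reference}\}|}}{|\{\text{dually annotated instances}\}|} \]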
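The reindexing evaluation can be sketched roughly as below. This is a reading of the slide under stated assumptions, not the evaluated implementation: the alignment is modelled as a dict from o_1 concepts to sets of o_2 concepts, an instance counts as reindexed when the alignment yields at least one o_2 concept for it, precision is averaged over reindexed instances, and recall is averaged over all dually annotated instances. The function name and data are hypothetical.

```python
def reindex_eval(instances, alignment):
    """Average per-instance precision and recall of reindexed annotations.

    instances: list of (o1_annotations, o2_annotations) pairs, i.e. dually
        annotated instances; the o2 annotations serve as the reference.
    alignment: dict mapping an o_1 concept to a set of o_2 concepts.
    """
    p_sum = r_sum = 0.0
    reindexed = 0
    for src, reference in instances:
        # Translate the o_1 annotations through the alignment.
        retrieved = set()
        for concept in src:
            retrieved |= alignment.get(concept, set())
        correct = len(reference & retrieved)
        if retrieved:
            reindexed += 1
            p_sum += correct / len(retrieved)
        if reference:
            r_sum += correct / len(reference)
    precision = p_sum / reindexed if reindexed else 0.0
    # Assumption: recall is averaged over all dually annotated instances.
    recall = r_sum / len(instances) if instances else 0.0
    return precision, recall


alignment = {"a": {"x"}, "b": {"y"}}
instances = [({"a", "b"}, {"x", "z"}), ({"c"}, {"z"})]
precision, recall = reindex_eval(instances, alignment)
```

In this toy case only the first instance can be reindexed: it retrieves {x, y} against the reference {x, z}, so precision is 0.5, while recall is halved to 0.25 by the second instance, for which nothing is retrieved.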
52. Ontology matching Instance-based OM IBOMbIE Experiments Comparison other OM Conclusions
IbOM by IM algorithm overview
Whole algorithm
Start: two data-sets Dx and Dy
1 Enrich instances of Dx with annotations of instances of Dy
    For every instance a:
        1 Find N best matching instances {b} in Dy
        2 Add annotations of {b} to a
2 Enrich vice versa
3 Merge data-sets into one dually annotated data-set
4 Apply Jaccard measure
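The four steps above can be strung together in a compact Python sketch. It is only illustrative: a token-overlap score stands in for the Lucene and VSM similarity measures, instances are modelled as (description, concepts) pairs, and all names and data are hypothetical.

```python
from itertools import product


def token_sim(a, b):
    """Stand-in instance similarity: Jaccard overlap of description tokens."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def enrich_all(source, target, top_n=1):
    """Steps 1 and 2: add the annotations of the top_n best matches in target."""
    enriched = []
    for text, concepts in source:
        best = sorted(target, key=lambda inst: token_sim(text, inst[0]),
                      reverse=True)[:top_n]
        extra = set().union(*(c for _, c in best))
        enriched.append((text, concepts | extra))
    return enriched


def match(dx, dy, top_n=1):
    """Steps 3 and 4: merge the enriched data-sets and rank concept pairs."""
    merged = enrich_all(dx, dy, top_n) + enrich_all(dy, dx, top_n)
    cx = set().union(*(c for _, c in dx))
    cy = set().union(*(c for _, c in dy))
    scores = {}
    for c1, c2 in product(cx, cy):
        e1 = {i for i, (_, cs) in enumerate(merged) if c1 in cs}
        e2 = {i for i, (_, cs) in enumerate(merged) if c2 in cs}
        scores[(c1, c2)] = len(e1 & e2) / len(e1 | e2)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Two toy catalogues annotated with different (hypothetical) vocabularies.
dx = [("history of amsterdam", {"x:history"}),
      ("organic chemistry basics", {"x:chemistry"})]
dy = [("amsterdam city history", {"y:geschiedenis"}),
      ("introduction to organic chemistry", {"y:scheikunde"})]
ranking = match(dx, dy, top_n=1)
```

The ranked list plays the role of the "mapping rank" axis in the result plots: the highest-scoring concept pairs come first, and cutting the list at a given rank yields an alignment to evaluate.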