This document outlines techniques for expanding and enriching knowledge graphs through data linking, key discovery, and link invalidation. It begins with an introduction to linked open data and knowledge graphs. The technical part discusses approaches to data linking, including instance-based methods that consider attribute similarity and graph-based methods that propagate similarity across object properties. Supervised methods use training data while rule-based methods apply expert-defined rules. Evaluation measures effectiveness using recall, precision and F1-score, and efficiency. The document also covers similarity measures and techniques for knowledge graph expansion and enrichment.
Discover Hidden Connections with Knowledge Graph Linking
1. KNOWLEDGE GRAPH EXPANSION
AND ENRICHMENT
LRI, UNIVERSITÉ PARIS SUD, CNRS, UNIVERSITÉ PARIS
SACLAY
TUTORIAL@BDA 2017
FATIHA SAÏS
2. OUTLINE
§ Linked Open Data
§ Knowledge Graphs
§ Technical Part
1. Data Linking
2. Key Discovery
3. Link invalidation/Requalification
§ Conclusion and Future Challenges
3. FROM THE WWW TO THE
WEB OF DATA
- applying the principles of the WWW to data
data is links,
not only properties
Linked Open Data
4. LINKED DATA
PRINCIPLES
① Use HTTP URIs as identifiers for resources
→ so people can look up the data
② Provide data at the location of URIs
→ to provide data for interested parties
③ Include links to other resources
→ so people can discover more things
→ bridging disciplines and domains
⇒ Unlock the potential of island repositories
Tim Berners-Lee, 2006
5. RDF – RESOURCE
DESCRIPTION FRAMEWORK
• Statements of <subject predicate object>
<http://dbpedia.org/resource/Airbus> dbo:label “Airbus”
(subject) (predicate) (object)
… is called a triple
6. LINKED OPEN DATA
Linked Data - Datasets
under an open access
- 1,139 datasets
- over 100B triples
- about 500M links
- several domains
Ex. DBPedia : 1.5 B
triples
"Linking Open Data cloud diagram 2017, by Andrejs Abele, John
P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak.
http://lod-cloud.net/"
8. THE ROLE OF KNOWLEDGE IN AI
[Artificial Intelligence 47 (1991)]
The knowledge principle: “if a program is
to perform a complex task well, it must
know a great deal about the world in
which it operates.”
9. ONTOLOGY, A DEFINITION
“An ontology is an explicit, formal specification of a shared
conceptualization.”
[Thomas R. Gruber, 1993]
Conceptualization: abstract model of domain related expressions
Specification: domain related
Explicit: semantics of all expressions is clear
Formal: machine-readable
Shared: consensus (different people have different perceptions)
11. OWL ONTOLOGY
OWL – Web Ontology Language
• Represents rich and complex knowledge about things
• Based on First Order Logic (FOL)
• Can be used to verify the consistency of knowledge
• Can make implicit knowledge explicit
• Classes: concepts or collections of objects (individuals)
• Properties:
• owl:DatatypeProperty (attribute)
• owl:ObjectProperty (relation)
• Hierarchy
• rdfs:subClassOf
• rdfs:subPropertyOf
• Individuals: ground-level of the ontology (instances)
13. OWL ONTOLOGY - AXIOMS
• Axioms: knowledge definitions that are explicitly stated in the ontology and taken as true without being proven.
• Reasoning over an ontology
→ Implicit knowledge can be made explicit by logical reasoning
• Example:
Pompidou museum is an Art Museum
<Pompidou_museum rdf:type ArtMuseum> .
Pompidou museum contains Hallucination partielle
<Pompidou_museum ao:contains Hallucination_partielle> .
• Infer that:
⇒ Pompidou museum is a CulturalPlace
<Pompidou_museum rdf:type CulturalPlace> .
Because: Museum subsumes ArtMuseum and CulturalPlace subsumes Museum
⇒ Hallucination partielle is a Work
<Hallucination_partielle rdf:type ao:Work> .
Because: the range of the object property contains is the class Work.
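The two inferences of the Pompidou example can be reproduced with a very small forward-chaining loop. This is only a sketch under simplifying assumptions: it handles just the subclass rule and the range rule, and the names `SUBCLASS_OF`, `RANGE` and `saturate` are hypothetical helpers, not part of any reasoner mentioned in this tutorial.

```python
# Toy forward chaining over two RDFS-style rules (subclass and range).
# Class and property names follow the slide; helper names are hypothetical.
SUBCLASS_OF = {("ArtMuseum", "Museum"), ("Museum", "CulturalPlace")}
RANGE = {"ao:contains": "ao:Work"}

facts = {
    ("Pompidou_museum", "rdf:type", "ArtMuseum"),
    ("Pompidou_museum", "ao:contains", "Hallucination_partielle"),
}


def saturate(triples):
    """Apply both rules repeatedly until no new triple appears (fix-point)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            if p == "rdf:type":
                # x type C, C subClassOf D  =>  x type D
                new |= {(s, "rdf:type", sup) for sub, sup in SUBCLASS_OF if sub == o}
            elif p in RANGE:
                # x p y, range(p) = C  =>  y type C
                new.add((o, "rdf:type", RANGE[p]))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred


closure = saturate(facts)
for triple in sorted(closure - facts):
    print(triple)
```

A real OWL reasoner covers many more constructs; the loop only illustrates what "making implicit knowledge explicit" means operationally.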
14. LINKED OPEN
VOCABULARIES (LOV)
Linked Open Vocabularies
• Keeps track of available open
ontologies and provides them
as a graph
• Search for available
ontologies, open for reuse
• Example:
http://lov.okfn.org/dataset/lov/vocabs/foaf
15. ONTOLOGY PREDICATES SPREAD
ON THE SEMANTIC WEB
• RDFa (or Resource Description Framework in Attributes)
• Top 50 web sites publishing Semantic Web data, clustered by
predicates used.
FOAF
16. OUTLINE
§ Linked Open Data
§ Knowledge Graphs
§ Technical Part
§ Data Linking
§ Key Discovery
§ Link invalidation/requalification
§ Conclusion and Future Challenges
17. WHO IS DEVELOPING
KNOWLEDGE GRAPHS?
[Figure: knowledge graph projects with their launch years (2007 to 2016), grouped into an academic side and a commercial side]
22. TOWARDS A KNOWLEDGE-
POWERED DIGITAL ASSISTANT
Cortana (Microsoft)
• Natural access and storage of knowledge
• Chat bots
• Personalization
• Emotion
24. KNOWLEDGE GRAPH: A
DEFINITION …
[L. Ehrlinger and W. Wöß, SEMANTiCS’2016]
[16] H. Paulheim. Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods. Semantic Web Journal, (Preprint):1–20, 2016.
[12] M. Kroetsch and G. Weikum. Journal of Web Semantics: Special Issue on Knowledge Graphs. http://www.websemanticsjournal.org/index.php/ps/announcement/view/19 [August, 2016].
[17] J. Pujara, H. Miao, L. Getoor, and W. Cohen. Knowledge Graph Identification. In Proceedings of the 12th International Semantic Web Conference - Part I, ISWC ’13, pages 542–557, New York, USA, 2013. Springer.
[3] A. Blumauer. From Taxonomies over Ontologies to Knowledge Graphs, July 2014. https://blog.semanticweb.at/2014/07/15/from-taxonomies-over-ontologiesto-knowledge-graphs [August, 2016].
[7] M. Farber, B. Ell, C. Menne, A. Rettinger, and F. Bartscherer. Linked Data Quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semantic Web Journal, 2016. http://www.semantic-web-journal.net/content/linked-data-quality-dbpedia-freebaseopencyc-wikidata-and-yago [August, 2016] (revised version, under review).
→ Populated Ontology
→ Populated Ontology
→ RDF Graph
→ RDF Graph
→ Extracted RDF Graph
25. KNOWLEDGE GRAPH (KG)
Ontology hierarchy
[Figure: is-a hierarchy over the classes Thing, dbo:Agent, dbo:Person, dbo:Artist, dbo:Painting, dbo:Museum and dbo:PopulatedPlace, with the properties dbo:museum-of, dbo:author and dbo:city, above the RDF graphs]
Ontology axioms and rules
owl:equivalentClass(dbo:Municipality, dbo:Place)
owl:equivalentClass(dbo:Place, dbo:Wikidata:Q532)
owl:equivalentClass(dbo:Village, dbo:PopulatedPlace)
owl:equivalentClass(dbo:PopulatedPlace, dbo:Municipality)
owl:disjointClass(dbo:PopulatedPlace, dbo:Artist)
owl:disjointClass(dbo:PopulatedPlace, dbo:Painting)
owl:FunctionalProperty(dbo:city)
owl:InverseFunctionalProperty(dbo:museum-of)
dbo:birthPlace(X, Y) => dbo:citizenOf(X, Y)
dbo:parentOf(X, Y) => dbo:child(Y, X)
Querying (SPARQL)
PREFIX dbo: <http://dbpedia.org/ontology#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?m ?p
WHERE { ?m rdf:type dbo:Museum . ?m dbo:museum-of ?p . }
Reasoners (Pellet, FaCT++, HermiT, etc.)
- KG saturation: infer whatever can be inferred from the KG.
- KG consistency checking: no contradictions
- KG repairing
- …
26. KNOWLEDGE GRAPH EXPANSION
AND ENRICHMENT
• Expansion: knowledge graphs are incomplete
• Data linking (entity resolution, duplicate detection, reference
reconciliation)
• Link prediction: add relations
• Ontology matching: connect graphs
• Missing values prediction/inference
• Validation: knowledge graphs may contain errors
• Link invalidation
• Dealing with errors and ambiguities
• Enrichment: can new knowledge emerge from knowledge graphs?
• Knowledge discovery
• Automatic reasoning and planning
27. OUTLINE
§ Linked Open Data
§ Knowledge Graphs
§ Technical Part
1. Data Linking
2. Key Discovery
3. Link invalidation/requalification
§ Conclusion and Future Challenges
29. DATA LINKING
• Data linking or Identity link detection consists in detecting whether two descriptions of
resources refer to the same real world entity (e.g. same person, same article, same gene).
31. DATA LINKING: DIFFICULTIES
• Different vocabularies
• Misspelling errors
• Incomplete information:
- date and place of birth ?
- museum phone number ?
- …. ?
32. IDENTITY LINK DETECTION
PROBLEM
• Identity link detection consists in detecting whether two descriptions of
resources refer to the same real world entity (e.g. same person, same article,
same gene).
• Definition (Link Discovery)
• Given two sets U1 and U2 of resources
• Find a partition of U1 × U2 such that:
• S = {(u1, u2) ∈ U1 × U2 : owl:sameAs(u1, u2)} and
• D = {(u1, u2) ∈ U1 × U2 : owl:differentFrom(u1, u2)}
• A method is said to be total when (S ∪ D) = (U1 × U2)
• A method is said to be partial when (S ∪ D) ⊂ (U1 × U2)
• Naïve complexity ∈ O(|U1| × |U2|), i.e. O(n²)
33. SOME HISTORY …
A problem that has existed for as long as data has, under different terminologies: record linkage, entity resolution, data cleaning, object coreference, duplicate detection, ….
Record linkage: used to indicate the bringing together of two or more separately recorded pieces of information concerning a particular individual or family.
[NKAJ, Science 1959]
35. DATA LINKING IS MORE COMPLEX
FOR GRAPHS THAN TABLES (WHY?)
Criterion | Databases | Semantic Web
Schema/Ontologies | Same schema | Possibly different ontologies in the same dataset
Multiple types | Single relation | Several classes
Open World Assumption | No | Yes
UNA (Unique Name Assumption) | Yes | Possibly not
Data volume | XX thousands | XX millions/billions (e.g., DBpedia has 1.5 billion triples)
Multiple values for a property | No | Yes, e.g. P1 hasAuthor “Michel Chein” and P1 hasAuthor “Marie-Christine Rousset”
• Can propagate similarity decisions ⇒ more expensive but better performance
• Can be generic and use domain knowledge, e.g. ontology axioms
36. DATA LINKING APPROACHES:
DIFFERENT CONTEXTS
• Datasets conforming to the same ontology
• Datasets conforming to different ontologies
• Datasets without ontologies
37. DATA LINKING APPROACHES
• Instance-based approaches: consider only data type properties
(attributes)
• Graph-based approaches: consider data type properties
(attributes) as well as object properties (relations) to propagate
similarity scores/linking decisions (collective data linking)
• Supervised approaches: need an expert to build samples of
linked data to train models (manual and interactive approaches)
• Rule-based approaches: need knowledge to be declared in the
ontology or in other format given by an expert
38. DATA LINKING APPROACHES
• Instance-based approaches: consider only data type properties
(attributes)
• String comparison
[Figure: two museum descriptions compared on their literal values only (=?). m1 (source S1): “Musée du Louvre”, address “99 rue de Rivoli, Paris”, created “1793”, architects Jacques_Lemercier, Luis_le_Vau and Ange_Jacues_Gabriel, categories Art_Museum and Art_Antiquity. m2 (source S2): “Louvre Abou Dabi”, address “Saadiyat Island, Abu Dhabi”, “Nov. 8th 2017”, architect Jean_Nouvel.]
39. DATA LINKING APPROACHES
• Graph-based approaches:
• consider data type properties (attributes) as well as
• object properties (relations) to propagate similarity scores/linking decisions
(collective data linking)
[Figure: the museum descriptions m1 (source S1, “Musée du Louvre”) and m2 (source S2, “Louvre Abou Dabi”) with their addresses, dates, architects and categories; the comparison (=?) now covers both the literal values and the resources reached through object properties (architects, categories), so that similarity can be propagated along the graph.]
40. DATA LINKING APPROACHES
• Supervised approaches: need an expert to build samples of identity
links to train models (manual and interactive approaches)
[Figure: examples of identity links between S1 and S2 → learning of parameters, similarity functions, thresholds, … → identity link detection → identity links]
41. DATA LINKING APPROACHES
• Rule-based approaches: need knowledge to be declared in
the ontology or in other format given by an expert
• homepage(w1, y) ∧ homepage(w2, y) ⇒ sameAs(w1, w2)
• sameAs(Restaurant11, Restaurant21)
• sameAs(Restaurant12, Restaurant22)
• sameAs(Restaurant13, Restaurant23)
First dataset:
… | homepage
Restaurant11 | www.kitchenbar.com
Restaurant12 | www.jardin.fr
Restaurant13 | www.gladys.fr
Restaurant14 | …
Second dataset:
homepage | …
www.kitchenbar.com | Restaurant21
www.jardin.fr | Restaurant22
www.gladys.fr | Restaurant23
… | Restaurant24
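The homepage rule can be applied without comparing every pair of restaurants, by indexing one dataset on the homepage value. The sketch below uses the three complete rows of the slide (the rows whose values are elided are left out); `link_by_homepage` is a hypothetical helper name.

```python
from collections import defaultdict

# Rows copied from the slide; Restaurant14 and Restaurant24 are omitted
# because their homepage values are elided in the source.
dataset1 = {
    "Restaurant11": "www.kitchenbar.com",
    "Restaurant12": "www.jardin.fr",
    "Restaurant13": "www.gladys.fr",
}
dataset2 = {
    "Restaurant21": "www.kitchenbar.com",
    "Restaurant22": "www.jardin.fr",
    "Restaurant23": "www.gladys.fr",
}


def link_by_homepage(d1, d2):
    """homepage(w1, y) and homepage(w2, y) => sameAs(w1, w2)."""
    by_homepage = defaultdict(list)
    for resource, homepage in d2.items():
        by_homepage[homepage].append(resource)
    # One dictionary lookup per resource of d1 instead of a nested loop.
    return [(w1, w2) for w1, homepage in d1.items() for w2 in by_homepage.get(homepage, [])]


links = link_by_homepage(dataset1, dataset2)
for w1, w2 in links:
    print(f"sameAs({w1}, {w2})")
```

The index plays the role of a simple blocking step: only resources that share a homepage value are ever paired.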
42. DATA LINKING APPROACHES:
EVALUATION
• Effectiveness: evaluation of linking results in terms of recall and
precision
• Recall = (#correct-links-sys) /(#correct-links-groundtruth)
• Precision = (#correct-links-sys) /(#links-sys)
• F-measure (F1) = (2 x Recall x Precision) / (Recall +Precision)
• Efficiency: in terms of time and space (i.e. minimize the linking
search space and the interaction actions with an expert/user).
• Robustness: override errors/mistakes in the data
• Use of benchmarks, like those of OAEI (Ontology Alignment
Evaluation Initiative) or Lance
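The three effectiveness measures can be computed directly from two sets of links. The sketch below is illustrative only: the link pairs are made-up placeholders and `evaluate` is a hypothetical function name.

```python
def evaluate(system_links, ground_truth):
    """Return (precision, recall, F1) for a set of links produced by a system."""
    system_links, ground_truth = set(system_links), set(ground_truth)
    correct = system_links & ground_truth
    precision = len(correct) / len(system_links) if system_links else 0.0
    recall = len(correct) / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1


# Made-up example: 4 reference links, 3 links found, 2 of them correct.
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
found = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
precision, recall, f1 = evaluate(found, truth)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Here precision is 2/3, recall is 2/4 and F1 is 4/7.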
43. SIMILARITY MEASURES
• Token based (e.g. Jaccard, TF/IDF cosine):
The similarity depends on the set of tokens that appear in both S and T.
⇒ Efficient, but sensitive to spelling errors
• Edit based (e.g. Levenshtein, Jaro, Jaro-Winkler):
The similarity depends on the smallest sequence of edit operations that transforms S into T.
⇒ Less efficient, can cope with spelling errors, but sensitive to word order
• Hybrids (e.g. N-Grams, Jaro-Winkler/TF-IDF, Soundex)
For more details: William W. Cohen, Pradeep Ravikumar, and Stephen E. Fienberg. 2003. A comparison of string
distance metrics for name-matching tasks. In Proceedings of the 2003 International Conference on Information
Integration on the Web (IIWEB'03), Subbarao Kambhampati and Craig A. Knoblock (Eds.). AAAI Press 73-78.
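To make the contrast between the two families concrete, here is a minimal sketch of one token-based measure (Jaccard over whitespace tokens) and one edit-based measure (Levenshtein distance, normalised to a similarity). The tokenisation and the normalisation by the longer string are assumptions chosen for simplicity.

```python
def jaccard(s, t):
    """Token-based: shared tokens divided by all distinct tokens."""
    a, b = set(s.lower().split()), set(t.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def levenshtein(s, t):
    """Edit-based: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution or match
        prev = cur
    return prev[-1]


def edit_similarity(s, t):
    """Levenshtein distance turned into a score in [0, 1]."""
    longest = max(len(s), len(t))
    return 1.0 - levenshtein(s, t) / longest if longest else 1.0


print(jaccard("Musée du Louvre", "Louvre Abou Dabi"))
print(levenshtein("Abou", "Abu"))
print(jaccard("Jean Nouvel", "Nouvel Jean"), edit_similarity("Jean Nouvel", "Nouvel Jean"))
```

The last line shows the trade-off stated above: swapping the word order leaves the Jaccard score at 1.0 but lowers the edit-based score, while a one-letter spelling variant ("Abou" vs "Abu") costs a single edit yet makes the two tokens completely different for Jaccard.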
45. FRAMEWORK SILK [Volz et al'09]
• Provides a Link Specification Language(LSL)
• Allows specifying linking conditions between two datasets
• The linking conditions may be expressed in terms of:
• Elementary similarity measures (e.g., Jaccard, Jaro) and
• Aggregation functions (e.g. max, average) of the similarity scores
51. KNOFUSS (INSTANCE-BASED,
UNSUPERVISED) [Nikolov et al’12]
• Learns linking rules using genetic algorithms:
$\mathrm{Sim}(i_1, i_2) = f_{ag}\big(w_{11}\, sim_{11}(V_{11}, V_{21}), \ldots, w_{mn}\, sim_{mn}(V_{1m}, V_{2n})\big)$
• $f_{ag}$: aggregation function for the similarity scores
• $sim_{ij}$: similarity measure between values $V_{1i}$ and $V_{2j}$
• $w_{ij}$: weights in [0..1]
• Assumptions:
• Unique name assumption (UNA), i.e., two different URIs refer to two
different entities.
• Good coverage rate between the two datasets
• Normalized similarity scores in [0..1]
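The shape of such a linking rule can be sketched as follows. Only the evaluation of one fixed rule is shown; the genetic search that KnoFuss uses to choose the measures, weights and aggregation is not reproduced. The property names, weights, the second record and the helper names are all assumptions made for illustration.

```python
def exact(a, b):
    """Similarity 1.0 for identical values, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def token_jaccard(a, b):
    """Jaccard similarity over lower-cased whitespace tokens."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return len(x & y) / len(x | y) if x | y else 1.0


def aggregated_similarity(inst1, inst2, components, aggregate):
    """Sim(i1, i2) = f_ag(w_ij * sim_ij(V_1i, V_2j), ...)."""
    scores = [w * sim(inst1[p1], inst2[p2]) for p1, p2, sim, w in components]
    return aggregate(scores)


# Hypothetical rule: (property in dataset 1, property in dataset 2, measure, weight).
components = [
    ("name", "label", token_jaccard, 1.0),
    ("city", "location", exact, 0.5),
]
m1 = {"name": "Musée du Louvre", "city": "Paris"}
m2 = {"label": "Louvre Museum", "location": "Paris"}  # made-up second description

score = aggregated_similarity(m1, m2, components, max)
print(score)
```

Passing a different `aggregate` (for instance an average) changes the rule without touching the component measures, which is the degree of freedom a learning procedure can explore.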
53. GRAPH-BASED AND
INTERACTIVE APPROACH
D-Dupe [Kang et al’08]
Fig. 1 (from Kang et al., “Interactive Entity Resolution in Relational Data: A Visual Analytic Tool and Its Evaluation”): D-Dupe consists of three coordinated windows: the potential duplicate viewer on the left, the relational context viewer on the upper right corner, and the data detail viewer on the lower right corner.
54. LN2R: A LOGICAL AND
NUMERICAL METHOD FOR
REFERENCE RECONCILIATION
[Saïs et al’07, Saïs et al’09]
55. LN2R
(GRAPH BASED, UNSUPERVISED AND INFORMED)
[Saïs et al’07, Saïs et al’09]
• A combination of two methods:
• L2R, a Logical method for reference reconciliation: applies
logical rules to infer sure owl:sameAs and
owl:differentFrom links
• N2R, a Numerical method for reference reconciliation:
computes similarity scores for each pair of references
• Assumptions
• The datasets are conforming to the same ontology
• The ontology contains axioms
56. LN2R
(GRAPH BASED, UNSUPERVISED AND INFORMED)
[Saïs et al’07, Saïs et al’09]
Ontology axioms
• Disjointness axioms between classes, DISJOINT(C, D)
• Functional property axioms, PF(P)
• Inverse functional property axioms, PFI(P)
• Axioms stating that a set of properties is functional or inverse functional
Assumptions on the data
• Unique Name Assumption, UNA(src1)
• Local Unique Name Assumption, LUNA(R)
Example:
Authored(p, a1), Authored(p, a2), Authored(p, a3), …, Authored(p, an)
⇒ (a1 ≠ a2), (a1 ≠ a3), (a2 ≠ a3), …
58. N2R: A NUMERICAL METHOD FOR
REFERENCE RECONCILIATION
• N2R computes a similarity score for each pair of references, obtained from their common description.
• Uses known similarity measures, e.g. Jaccard, Jaro-Winkler.
• Exploits ontology knowledge in a way that is coherent with L2R.
• May consider the results of L2R: Reconcile(i, i’), ¬Reconcile(i, i’), SynVals(v, v’) and ¬SynVals(v, v’).
[Saïs et al’09]
62. N2R: RESULTS ON CORA
• Trec = 1: all the reconciliations obtained by L2R are also obtained by N2R.
• From Trec = 1 to Trec = 0.85, the recall increases by 33 % while the precision decreases by only 6 %.
• Trec = 0.85: the F-measure is 88 %:
• Better than the results obtained by the supervised method of [Singla and Domingos’05]
• Worse than those (97 %) obtained by the supervised method of [Dong et al.’05]
[Figure: recall, precision and F-measure (y-axis, from 0 to 1) as a function of the threshold Trec (x-axis, from 1 down to 0.55)]
[Saïs et al’09]
65. ONTOLOGY MATCHING
• Ontology alignment [Shvaiko,Euzenat13]: computes a set
A of mappings between elements (classes, properties) of two
ontologies O1 and O2:
f(O1,O2)=A
• The relations that are used to express a mapping can be:
owl:equivalentClass, owl:equivalentProperty,
rdfs:subClassOf, skos:closeMatch, skos:broader, etc.
• Example: A = { owl:equivalentClass(http://dbpedia.org/ontology/City, http://schema.org/City, 0.8) }
66. KINDS OF
INFORMATION
• Terminology: lexical information describing the ontology
elements (i.e. labels, comments, …)
• Example: Voie vs Voie souterraine
• Structure: hierarchy of classes and properties
(relations/attributes)
• Example: the sub-classes of Voie are very similar to the
sub-classes of Route
• Extension: the existence of common instances!!
67. PARIS [Suchanek et al. 12]
• Objective: instance-based ontology alignment and data linking (graph-
based, unsupervised and probabilistic)
• Inputs: two populated RDFS ontologies with UNA (two different URIs refer to two different entities)
• Principle:
• Compute the similarities between literal values (“12 cm”=“12”)
• Iterate (1) and (2) until a fix-point:
① Compute the probability that two instances are linked: P(i1 = i2)
② Compute the probabilities of subClassOf/subPropertyOf: P(Ci ⊆ Cj), P(Pi ⊆ Pj)
68. PARIS [Suchanek et al. 12]
• Property functionality degree (computed from data)
• The more functional a property is, the higher the probability that X = Y.
• Local functionality: Fun(p, x) = 1 / #{y : p(x, y)}
• Global functionality: Fun(p) = #{x : ∃y p(x, y)} / #{(x, y) : p(x, y)}
• Example:
city(m1,Londres), city(m1,Orsay), city(m2,Tokyo)
Fun(city,m1) = 1/2, Fun(city,m2) = 1
Fun(city) = 2/3
⇒ The same is done for inverse functionality (denoted fun-1)
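The functionality degrees of the `city` example can be checked with a few lines. This is a sketch of the two formulas only, not of the PARIS implementation; the function names are hypothetical.

```python
# Statements city(x, y) from the example, stored as (x, y) pairs.
city = [("m1", "Londres"), ("m1", "Orsay"), ("m2", "Tokyo")]


def local_functionality(pairs, x):
    """Fun(p, x) = 1 / number of distinct y such that p(x, y)."""
    values = {y for s, y in pairs if s == x}
    return 1 / len(values) if values else 0.0


def global_functionality(pairs):
    """Fun(p) = number of distinct subjects / number of distinct statements."""
    pairs = set(pairs)
    subjects = {x for x, _ in pairs}
    return len(subjects) / len(pairs) if pairs else 0.0


def inverse(pairs):
    """Swap subject and object, so the same functions give inverse functionality."""
    return [(y, x) for x, y in pairs]


print(local_functionality(city, "m1"), local_functionality(city, "m2"))
print(global_functionality(city))
print(global_functionality(inverse(city)))
```

On this tiny example every city has a single museum, so the inverse functionality of `city` comes out as 1.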
69. PARIS [Suchanek et al. 12]
Link probability computation
• Positive evidence (P1): there exists a highly inverse functional property whose range values are equal with a high probability
$$P_1(x = x') = 1 - \prod_{r(x,y),\; r(x',y')} \left(1 - Fun^{-1}(r) \cdot P(y = y')\right)$$
Example: isbn(x,isbn1), isbn(x’,isbn2), P(isbn1=isbn2) = 1, fun-1(isbn)=1 …
P1(x=x’) = 1 − ((1 − (1 × 1)) × …) = 1 − (0 × …) = 1
• Negative evidence (P2): there exists a highly functional property whose range values are equal with a very low probability.
• Combination: P(x=x’) = P1(x=x’) × P2(x=x’)
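The positive-evidence formula is a noisy-or over the pairs of statements that x and x’ have in common. The sketch below evaluates it for given inputs; it assumes the inverse functionalities and the value-equality probabilities are already known, and `positive_evidence` is a hypothetical name.

```python
def positive_evidence(shared):
    """P1(x = x') = 1 - product over shared statements of (1 - fun_inv(r) * P(y = y')).

    `shared` holds one (inverse functionality, probability that the two values
    are equal) pair per pair of statements r(x, y), r(x', y').
    """
    product = 1.0
    for inverse_functionality, p_values_equal in shared:
        product *= 1.0 - inverse_functionality * p_values_equal
    return 1.0 - product


# ISBN example: a fully inverse functional property with equal values.
print(positive_evidence([(1.0, 1.0)]))

# Two weaker, made-up pieces of evidence reinforce each other.
print(positive_evidence([(0.5, 0.8), (0.2, 1.0)]))
```

A single factor equal to zero (a perfectly inverse functional property with certainly equal values) is enough to push P1 to 1, which is what the ISBN example shows.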
70. PARIS [Suchanek et al. 12]
• The probabilities of the existence of a subsumption mapping between properties and between classes are also computed
• They are based on the proportion of common instances relative to the number of instances of the class (or property) on the left-hand side of the subsumption:
P(C ⊆ C') = #(C ∩ C') / #C
P(p ⊆ p') = #(p ∩ p') / #p
• To compute these probabilities, the probabilities of the existence of a sameAs link between instances are exploited.
71. PARIS - EXPERIMENTS
Ontology | #Instances | #Classes | #Relations
Yago | 2 795 289 | 292 206 | 67
DBpedia | 2 365 777 | 318 | 1 109
Results [Suchanek et al. 12]:
Instances | precision 90% | recall 73% | F-measure 81%
Classes | precision 94% (Yago ⊆ DBp) | precision 84% (DBp ⊆ Yago)
Relations | precision 100% (Yago ⊆ DBp) | precision 92% (DBp ⊆ Yago)
• Instances: DBpedia and Yago use the URIs of Wikipedia (so recall and precision can be computed)
• Classes/properties: sampling + expert
• 5 hours to compute the linking probabilities for instances in one iteration (2 hours for the classes and 20 minutes for the properties)
• A link or mapping is kept if its probability is > 0.4
72. SUMMARY
Data linking: numerous and different approaches …
• Supervised approaches: need samples of linked data
→ This can be avoided by using assumptions such as the UNA
• Graph-based approaches: decision propagation (good recall but highly time consuming)
• Logical approaches: good precision but partial
⇒ Few approaches generate differentFrom(i1,i2) or use dissimilarity evidence
• Informed approaches: need knowledge to be declared in the ontology (generality) and/or ad-hoc knowledge given by an expert (a selection of properties, similarity functions)
→ This kind of knowledge is not always available but can be learnt/discovered from the data (e.g., key/rule discovery approaches) [Symeonidou et al. 14, Symeonidou et al. 17, Galarraga et al. 13]
73. OUTLINE
§ Linked Open Data
§ Knowledge Graphs
§ Technical Part
1. Data Linking
2. Key Discovery
3. Link invalidation/requalification
§ Conclusion and Future Challenges
74. KEY DISCOVERY
PHD OF DANAI SYMEONIDOU IN COLLABORATION WITH:
NATHALIE PERNELLE (LRI, UNIV. PARIS SUD),
[Qualinca ANR project (2012-2016)]
77. DATA LINKING USING
RULES
Rules
• Logical Rules
• Ex. For instances of the class Restaurant
homepage(w1, y) ∧ homepage(w2, y) ⇒ sameAs(w1, w2)
{homepage}: discriminative property
• Complex Rules
• Ex. For instances of the class Restaurant
max(Jaccard(lat(w1, n); lat(w2, m)); Jaro(long(w1, x); long(w2, y))) > 0.8 ⇒ sameAs(w1, w2)
{lat, long}: discriminative property set
Rules contain discriminative properties ⇒ keys
78. KEYS
Key: A set of properties that uniquely identifies every instance in the data
Instance | FirstName | LastName | Birthdate | Profession
Person1 | Anne | Tompson | 15/02/88 | Actor
Person2 | Marie | Tompson | 02/09/75 | Researcher
Person3 | Marie | David | 15/02/85 | Teacher
Person4 | Vincent | Solgar | 06/12/90 | Teacher
Is [FirstName] a key? ✖
Is [LastName] a key? ✖
Is [FirstName,LastName] a key? ✔
80. KEYS - KEY
MONOTONICITY
Key monotonicity: When a set of properties is a key, all its supersets are also keys
Minimal key: A key that stops being a key as soon as any one of its properties is removed
Instance | FirstName | LastName | Birthdate | Profession
Person1 | Anne | Tompson | 15/02/88 | Actor
Person2 | Marie | Tompson | 02/09/75 | Researcher
Person3 | Marie | David | 15/02/85 | Teacher
Person4 | Vincent | Solgar | 06/12/90 | Teacher
Is [FirstName,LastName,Birthdate] a key? ✔
Is [FirstName,LastName,Birthdate] a minimal key? ✖
Minimal keys: [[FirstName,LastName], [FirstName,Profession], [Birthdate], [LastName,Profession]]
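The list of minimal keys can be verified with a brute-force, level-by-level search that uses monotonicity to skip supersets of keys already found. This is a naive sketch for the four-row table only (single-valued properties, no missing values); `is_key` and `minimal_keys` are hypothetical names.

```python
from itertools import combinations

PROPERTIES = ["FirstName", "LastName", "Birthdate", "Profession"]
rows = {
    "Person1": {"FirstName": "Anne", "LastName": "Tompson", "Birthdate": "15/02/88", "Profession": "Actor"},
    "Person2": {"FirstName": "Marie", "LastName": "Tompson", "Birthdate": "02/09/75", "Profession": "Researcher"},
    "Person3": {"FirstName": "Marie", "LastName": "David", "Birthdate": "15/02/85", "Profession": "Teacher"},
    "Person4": {"FirstName": "Vincent", "LastName": "Solgar", "Birthdate": "06/12/90", "Profession": "Teacher"},
}


def is_key(props, data):
    """True if no two instances agree on every property in props."""
    seen = set()
    for values in data.values():
        signature = tuple(values[p] for p in props)
        if signature in seen:
            return False
        seen.add(signature)
    return True


def minimal_keys(data, properties):
    """Enumerate candidates by increasing size, skipping supersets of known keys."""
    found = []
    for size in range(1, len(properties) + 1):
        for candidate in combinations(properties, size):
            if any(set(key) <= set(candidate) for key in found):
                continue  # a key by monotonicity, but not minimal
            if is_key(candidate, data):
                found.append(candidate)
    return found


keys = minimal_keys(rows, PROPERTIES)
print(keys)
```

Because candidates are visited by increasing size, any key that survives the superset test is minimal by construction.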
81. KEYS DECLARED BY EXPERTS
FOR DATA LINKING
Not an easy task:
• Experts are not aware of all the keys
Ex. {SSN}, {ISBN} easy to declare
Ex. {Name, DateOfBirth, BornIn} is it a key for the class Person?
• Erroneous keys can be given by experts
• As many keys as possible
• More keys => More linking rules
Goal: Discover keys automatically
82. OWL2 KEY, KEY IN
THE OPEN WORLD
OWL2 Key for a class: a combination of properties that uniquely identify
each instance of a class
• HasKey( CE ( OPE1 ... OPEm ) ( DPE1 ... DPEn ) )
HasKey(Book (Author) (Title)) means:
Book(x1) ∧ Book(x2) ∧ Author(x1, y) ∧ Author(x2, y) ∧ Title(x1, w) ∧ Title(x2, w)
⇒ sameAs(x1, x2)
83. KEYS IN THE OPEN
WORLD
Key in the Open World
• Key discovery: a set of properties is a key if no instance shares values for all of these properties with another instance
• Ex. firstName is a key in the Open World since
• There do not exist any people that share first names
• Ex. hasFriend is not a key in the Open World since
• There exist at least two people that share a friend
Instance | lastName | firstName | hasFriend
Person1 | Tompson | Manuel | Person2, Person3
Person2 | Tompson | Maria, Hellen | Person1
Person3 | David | George | Person2, Person4
Person4 | Solgar | Michel | Person3
86. KEYS IN THE CLOSED
WORLD
Key in the Closed World
• Value sets are compared as a whole: a set of properties is a key if no two instances have exactly the same set of values for every property in it
• Ex. hasFriend is a key in the Closed World since
{Person2,Person3} <> {Person1} <> {Person2,Person4} <> {Person3}
Instance | lastName | firstName | hasFriend
Person1 | Tompson | Manuel | Person2, Person3
Person2 | Tompson | Maria | Person1
Person3 | David | George | Person2, Person4
Person4 | Solgar | Michel | Person3
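The difference between the two readings comes down to how the value sets of two instances are compared: by intersection (open world) or by equality (closed world). The sketch below encodes the closed-world table above with one set per property; the function names are hypothetical and the handling of missing values is deliberately left out.

```python
from itertools import combinations

people = {
    "Person1": {"lastName": {"Tompson"}, "firstName": {"Manuel"}, "hasFriend": {"Person2", "Person3"}},
    "Person2": {"lastName": {"Tompson"}, "firstName": {"Maria"}, "hasFriend": {"Person1"}},
    "Person3": {"lastName": {"David"}, "firstName": {"George"}, "hasFriend": {"Person2", "Person4"}},
    "Person4": {"lastName": {"Solgar"}, "firstName": {"Michel"}, "hasFriend": {"Person3"}},
}


def is_key_open_world(props, data):
    """Not a key as soon as two instances share at least one value for every property."""
    return not any(
        all(a[p] & b[p] for p in props)
        for a, b in combinations(data.values(), 2)
    )


def is_key_closed_world(props, data):
    """Not a key only if two instances have identical value sets for every property."""
    return not any(
        all(a[p] == b[p] for p in props)
        for a, b in combinations(data.values(), 2)
    )


for props in (["firstName"], ["lastName"], ["hasFriend"]):
    print(props, is_key_open_world(props, people), is_key_closed_world(props, people))
```

`hasFriend` is the property on which the two readings disagree: Person1 and Person3 share the friend Person2, yet their complete friend sets differ.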
89. KEY DISCOVERY
APPROACHES
CWA approaches
• Keys and Pseudo-Keys Detection for Web Datasets Cleaning
and Interlinking
• ROCKER: A Refinement Operator for Key Discovery
OWA approaches
• KD2R: a Key Discovery method for reference reconciliation
• SAKey: Scalable almost key discovery in RDF data
• Linkkey: Data interlinking through robust Linkkey extraction
• VICKEY: Conditional key discovery
91. KEYS AND PSEUDO-KEYS DETECTION FOR
WEB DATASETS CLEANING AND
INTERLINKING [Atencia et al. 12]
Bottom-up approach that discovers
• Keys in the Closed World
• Pseudo keys - keys that tolerate exceptions
The lattice of property sets is explored level by level:
x
P1 P2 P3 P4
P1P2 P1P3 P1P4 P2P3 P2P4 P3P4
P1P2P3 P1P2P4 P2P3P4
P1P2P3P4
If P4 is a key => P1P4, P2P4, .., P1P2P3P4 are also keys
100. KEYS AND PSEUDO-KEYS DETECTION FOR
WEB DATASETS CLEANING AND
INTERLINKING [Atencia et al. 12]
To verify if a set of properties is a key
• Partition instances according to the values they share
• If each partition contains only one instance => Key
Key quality measures
• Support of a set of properties P:
$support(P) = \dfrac{\#\,\text{instances described by } P}{\#\,\text{all instances}}$
• Discriminability of a set of properties P (pseudo-keys):
101. [ATENCIA ET AL. 12]- EXAMPLE
Instance | Name | Actor | Director | ReleaseDate | Website | Language
film1 | “Ocean’s 11” | “B. Pitt”, “J. Roberts” | “S. Soderbergh” | “3/4/01” | www.oceans11.com | ---
film2 | “Ocean’s 12” | “B. Pitt”, “J. Roberts” | “S. Soderbergh”, “R. Howard” | “2/5/04” | www.oceans12.com | “english”
film3 | “Ocean’s 13” | “B. Pitt”, “G. Clooney” | “S. Soderbergh”, “R. Howard” | “30/6/07” | www.oceans13.com | “english”
film4 | “The descendants” | “N. Krause”, “G. Clooney” | “A. Payne” | “15/9/11” | --- | “english”
film5 | “Bourne Identity” | “D. Liman” | --- | “12/6/12” | www.bourneIdentity.com | “english”
102. [ATENCIA ET AL. 12]- EXAMPLE
Partitions using [Name] (film data as on slide 101):
{Film1}, {Film2}, {Film3}, {Film4}, {Film5}
⇒ [Name]: key
Support = 5/5 = 1
Discriminability = 5/5 = 1
104. [ATENCIA ET AL. 12]- EXAMPLE
Partitions using [Actor] (film data as on slide 101):
{Film1, Film2}, {Film3}, {Film4}, {Film5}
⇒ [Actor]: pseudo key
Support = 5/5 = 1
Discriminability = 3/4 = 0.75
105. [ATENCIA ET AL. 12] - EXAMPLE
Partitions using [Actor, Director] (film table as above)
⇒ {Film1} {Film2} {Film3} {Film4} {Film5}
[Actor, Director]: key
Support = 4/5 = 0.8
Discriminability = 5/5 = 1
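The support and discriminability figures of the film example can be reproduced with a short Python sketch. It is not the authors' implementation: it assumes that support is the fraction of instances having a value for every property of the candidate key, that two instances fall in the same partition when their value sets are identical on every property, and that discriminability is the number of singleton partitions divided by the number of partitions. An instance with a missing value is kept with an empty value set, which is an assumption chosen to match the 5/5 shown for [Actor, Director]. The names FILMS, support, partitions and discriminability are illustrative.

```python
from collections import defaultdict

# Film table from the slides; a missing value ("---") is an empty set.
FILMS = {
    "film1": {"Name": {"Ocean's 11"}, "Actor": {"B. Pitt", "J. Roberts"},
              "Director": {"S. Soderbergh"}},
    "film2": {"Name": {"Ocean's 12"}, "Actor": {"B. Pitt", "J. Roberts"},
              "Director": {"S. Soderbergh", "R. Howard"}},
    "film3": {"Name": {"Ocean's 13"}, "Actor": {"B. Pitt", "G. Clooney"},
              "Director": {"S. Soderbergh", "R. Howard"}},
    "film4": {"Name": {"The descendants"}, "Actor": {"N. Krause", "G. Clooney"},
              "Director": {"A. Payne"}},
    "film5": {"Name": {"Bourne Identity"}, "Actor": {"D. Liman"},
              "Director": set()},
}


def support(data, props):
    """Fraction of instances that have a value for every property in props."""
    covered = [i for i, desc in data.items() if all(desc.get(p) for p in props)]
    return len(covered) / len(data)


def partitions(data, props):
    """Group instances whose value sets are identical on every property."""
    groups = defaultdict(list)
    for inst, desc in data.items():
        signature = tuple(frozenset(desc.get(p, ())) for p in props)
        groups[signature].append(inst)
    return list(groups.values())


def discriminability(data, props):
    """Singleton partitions divided by the total number of partitions."""
    parts = partitions(data, props)
    singletons = sum(1 for part in parts if len(part) == 1)
    return singletons / len(parts)


for candidate in (["Name"], ["Actor"], ["Actor", "Director"]):
    print(candidate, support(FILMS, candidate), discriminability(FILMS, candidate))
```

Under these assumptions a candidate with discriminability 1 is a key, and a candidate just below 1, such as [Actor], is a pseudo key.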
106. KEY DISCOVERY
APPROACHES
CWA approaches
• Keys and Pseudo-Keys Detection for Web Datasets Cleaning
and Interlinking
• ROCKER: A Refinement Operator for Key Discovery
OWA approaches
• KD2R: a Key Discovery method for reference reconciliation
• SAKey: Scalable almost key discovery in RDF data
• Linkkey: Data interlinking through robust Linkkey extraction
• VICKEY: Conditional key discovery
106
107. SAKEY [Symeonidou et al. 14]
SAKey: Scalable Almost Key discovery approach for:
• Incomplete data (optimistic heuristic)
• Errors
• Duplicates
• Large datasets
Discovers almost keys
• Sets of properties that are not keys due to few exceptions
107
108. ALMOST KEY DISCOVERY
STRATEGY
• Naive automatic way to discover keys
• Examine all the possible combinations of properties
• Scan all instances for each candidate key
• Example: Class described by 15 properties ⇒ 2^15 − 1 = 32767 candidate keys
• Discover keys efficiently by:
• Reducing the combinations
• Partially scanning the data
108
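The naive strategy can be sketched in Python as follows. This is an illustration only, assuming single-valued properties and a closed-world reading in which a property set is a key when no two instances agree on all of its properties. The toy rows and the function names are hypothetical.

```python
from itertools import combinations


def candidate_count(n_properties):
    """Number of non-empty property combinations: 2^n - 1."""
    return 2 ** n_properties - 1


def is_key(instances, props):
    """True if no two instances have the same values on all of props."""
    seen = set()
    for inst in instances:
        signature = tuple(inst[p] for p in props)
        if signature in seen:
            return False
        seen.add(signature)
    return True


def naive_minimal_keys(instances, properties):
    """Try every combination, smallest first, scanning all instances each time."""
    keys = []
    for size in range(1, len(properties) + 1):
        for combo in combinations(properties, size):
            # A superset of an already found key is not minimal: skip it.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_key(instances, combo):
                keys.append(combo)
    return keys


# Hypothetical toy data, not taken from the slides.
rows = [
    {"a": 1, "b": "x", "c": "u"},
    {"a": 1, "b": "y", "c": "u"},
    {"a": 2, "b": "x", "c": "u"},
]
print(candidate_count(15))
print(naive_minimal_keys(rows, ["a", "b", "c"]))
```

Every candidate triggers a full scan of the instances, which is why the number of combinations and the amount of data scanned are the two things to reduce.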
109. KEY DISCOVERY
SAKey: Key discovery approach for very large RDF datasets that may
contain erroneous data or duplicates
Region Producer Colour
Wine1 Bordeaux Dupont White
Wine2 Bordeaux Baudin Rose
Wine3 Languedoc Dupont Red
Wine4 Languedoc Faure Red
109
Producer is a key?
110. KEY DISCOVERY
SAKey: Key discovery approach for very large RDF datasets that may
contain erroneous data or duplicates
Region Producer Colour
Wine1 Bordeaux Dupont White
Wine2 Bordeaux Baudin Rose
Wine3 Languedoc Dupont Red
Wine4 Languedoc Faure Red
110
Region is a non key?
• Interested only in maximal non keys:
• Every set of properties that is not included in a maximal non key is a key
• Example: class described by the properties p1, p2, p3, p4
Maximal non key = {{p1, p2}} ⇒ minimal keys = {{p3}, {p4}}
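The derivation of keys from maximal non keys can be illustrated with a brute-force Python sketch. It assumes only the property stated above: a set of properties is a key exactly when it is not included in any maximal non key. The function name is hypothetical, and the enumeration is exponential, so it only illustrates the principle and is not how SAKey derives keys.

```python
from itertools import combinations


def minimal_keys_from_maximal_non_keys(properties, maximal_non_keys):
    """Return the minimal property sets not included in any maximal non key."""
    non_keys = [frozenset(nk) for nk in maximal_non_keys]
    keys = []
    for size in range(1, len(properties) + 1):
        for combo in combinations(properties, size):
            candidate = frozenset(combo)
            if any(candidate <= nk for nk in non_keys):
                continue  # included in a non key, hence itself a non key
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, hence not minimal
            keys.append(candidate)
    return keys


result = minimal_keys_from_maximal_non_keys(
    ["p1", "p2", "p3", "p4"], [{"p1", "p2"}]
)
print([sorted(k) for k in result])
```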
113. KEY DISCOVERY
SAKey: Key discovery approach for very large RDF datasets that
may contain erroneous data or duplicates
• n-almost key: A set of properties for which at most n instances share
values with another instance
Region Producer Colour
Wine1 Bordeaux Dupont White
Wine2 Bordeaux Baudin Rose
Wine3 Languedoc Dupont Red
Wine4 Languedoc Faure Red
• Examples of keys
{Region, Producer}:
0-almost key
{Producer}:
2-almost key
Tool available at https://www.lri.fr/sakey
113
[Symeonidou et al. 14]
114. N-ALMOST KEYS
Exception of a key: an instance that shares values with another instance for
a given set of properties P
Films  HasName            HasActor                               HasDirector                   ReleaseDate  HasWebsite              HasLanguage
f1     “Ocean’s 11”       “B. Pitt”, “J. Roberts”                “S. Soderbergh”               “3/4/01”     www.oceans11.com        ---
f2     “Ocean’s 12”       “B. Pitt”, “G. Clooney”, “J. Roberts”  “S. Soderbergh”, “R. Howard”  “2/5/04”     www.oceans12.com        ---
f3     “Ocean’s 13”       “B. Pitt”, “G. Clooney”                “S. Soderbergh”, “R. Howard”  “30/6/07”    www.oceans13.com        ---
f4     “The descendants”  “N. Krause”, “G. Clooney”              “A. Payne”                    “15/9/11”    www.descendants.com     “english”
f5     “Bourne Identity”  “D. Liman”                             ---                           “12/6/12”    www.bourneIdentity.com  “english”
f6     “Ocean’s 12”       ---                                    “R. Howard”                   “2/5/04”     ---                     ---
[Symeonidou et al. 14]
116. N-ALMOST KEYS
Exception of a key: an instance that shares values with another instance for a
given set of properties P
• f1, f2 and f3 are three exceptions for the property set {HasActor}
Exception set EP: set of exceptions for P
• EP = {f1, f2, f3} ∪ {f2, f3, f4} = {f1, f2, f3, f4} for {HasActor}
[Symeonidou et al. 14]
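The exception set can be computed with a direct pairwise comparison, sketched below in Python. This is not the SAKey algorithm, only a check of the definition on the film table. It assumes that two instances share values for P when, for every property of P, their value sets have at least one value in common, and that a missing value never matches anything. Only the HasName and HasActor columns are encoded, and the function names are illustrative.

```python
# Two columns of the film table f1..f6; "---" is an empty set.
FILMS = {
    "f1": {"HasName": {"Ocean's 11"}, "HasActor": {"B. Pitt", "J. Roberts"}},
    "f2": {"HasName": {"Ocean's 12"},
           "HasActor": {"B. Pitt", "G. Clooney", "J. Roberts"}},
    "f3": {"HasName": {"Ocean's 13"}, "HasActor": {"B. Pitt", "G. Clooney"}},
    "f4": {"HasName": {"The descendants"},
           "HasActor": {"N. Krause", "G. Clooney"}},
    "f5": {"HasName": {"Bourne Identity"}, "HasActor": {"D. Liman"}},
    "f6": {"HasName": {"Ocean's 12"}, "HasActor": set()},
}


def exception_set(data, props):
    """Instances sharing at least one value with another one on every property."""
    exceptions = set()
    ids = list(data)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if all(data[a].get(p, set()) & data[b].get(p, set()) for p in props):
                exceptions.update((a, b))
    return exceptions


def is_n_almost_key(data, props, n):
    """A property set is an n-almost key when it has at most n exceptions."""
    return len(exception_set(data, props)) <= n


print(sorted(exception_set(FILMS, ["HasActor"])))
print(is_n_almost_key(FILMS, ["HasActor"], 4))
```

The pairwise loop is quadratic in the number of instances, so it is only meant for checking small examples by hand.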
121. N-ALMOST KEYS
n-almost key: a set of properties where |EP| ≤ n
• {HasActor} is a 4-almost key
n-non key: a set of properties where |EP| ≥ n
• Using all the maximal n-non keys we can derive all the
minimal (n-1)-almost keys
[Figure: all combinations of properties, split into 5-non keys (all sets of properties that contain at least 5 exceptions) and 4-almost keys (all sets of properties that contain fewer than 5 exceptions)]
[Symeonidou et al. 14]
128. N-NON KEY DISCOVERY: POTENTIAL N-NON KEYS
Combinations of properties that need not be explored
• Incomplete data
• Properties referring to different classes
• Potential n-non keys: Sets of properties that possibly refer
to n-non keys
[Figure: Lake and Mountain are each rdfs:subClassOf NaturalPlace; depth is a property of Lake, mountainRange a property of Mountain]
[Symeonidou et al. 14]
134. KEY MERGE: SEVERAL DATASETS
Goal: Keys valid in both datasets
• More sure keys
Intuition: Computation of Cartesian product of sets of keys
• Keep only minimal keys
KeysA = {[firstName, LastName], [hasFriend]}
KeysB = {[firstName]}
KeysAB = {[firstName, LastName], [hasFriend, firstName]}
[Symeonidou et al. 14]
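The merge intuition can be written in a few lines of Python. This is a minimal sketch assuming that each element of the Cartesian product is turned into the union of the two keys and that a merged key is dropped when another merged key is strictly included in it. The function name merge_keys is illustrative.

```python
def merge_keys(keys_a, keys_b):
    """Union every key of A with every key of B, then keep only minimal sets."""
    merged = {frozenset(a) | frozenset(b) for a in keys_a for b in keys_b}
    return {k for k in merged if not any(other < k for other in merged)}


keys_a = [{"firstName", "LastName"}, {"hasFriend"}]
keys_b = [{"firstName"}]
keys_ab = merge_keys(keys_a, keys_b)
print(sorted(sorted(k) for k in keys_ab))
```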
135. DATA LINKING USING
ALMOST KEYS
Goal: Compare linking results using almost keys with different n
Evaluation of linking using
• Recall
• Precision
• F-Measure
Datasets
• OAEI 2010
• OAEI 2013
Conclusion
• Linking results using n-almost keys are better than those using keys
135
[Symeonidou et al. 14]
138. KD2R VS. SAKEY - KEY DERIVATION
Class            # non keys  # keys  KD2R      SAKey (n=0)
DB:Lake          50          480     1min10s   1s
DB:Mountain      49          821     8min      1s
DB:BodyOfWater   220         3846    > 1 day   66s
DB:NaturalPlace  302         7011    > 2 days  5min
[Figure: DBpedia class = DB:BodyOfWater]
[Symeonidou et al. 14]
140. LINKKEY
[Figure: data of classA (Ontology1) and data of classB (Ontology2) are the inputs of Linkkey Discovery]
Given a pair of classes in two datasets conforming to two different ontologies:
1) Discover Linkkeys – maximal sets of property pairs that can link instances of two
different classes
[Atencia et al. 2014]
144. LINKKEY [ATENCIA ET AL. 2014]
Given a pair of classes in two datasets conforming to two different ontologies:
1) Discover Linkkeys – maximal sets of property pairs that can link instances of two
different classes
[Figure: Person1 (properties such as LastName, with literal values) and Person2 (properties such as LN, with literal values), related by owl:equivalentClass]
      LastName  Nationality  Profession
P11   Tompson   Greek
P12   Dupont    French       Researcher
      LN        NL             PR
P21   Tompson   Greek
P22   Tompson   Greek, French  Researcher
Ex. {<LastName,LN>,<Nationality,NL>}, {<Profession,PR>,<Nationality,NL>}
145. LINKKEY [ATENCIA ET AL. 2014]
Given a pair of classes in two datasets conforming to two different ontologies:
1) Discover Linkkeys – maximal sets of property pairs that can link instances of two
different classes
2) Linkkey quality evaluation:
1) use measures of discriminability and coverage for unsupervised linkkey
extraction
2) use an approximation of precision and recall for the supervised case.
Ex. {<LastName,LN>,<Nationality,NL>}, {<Profession,PR>,<Nationality,NL>}
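To make the role of a linkkey concrete, the following Python sketch generates the links induced by a candidate linkkey on the two small Person tables of the example. It is an illustration under an assumption that the slides do not state: two instances are linked when, for every property pair of the linkkey, they share at least one value. The function name and the data encoding are hypothetical.

```python
# Person tables of the example; an empty cell is an empty set.
CLASS_A = {
    "P11": {"LastName": {"Tompson"}, "Nationality": {"Greek"},
            "Profession": set()},
    "P12": {"LastName": {"Dupont"}, "Nationality": {"French"},
            "Profession": {"Researcher"}},
}
CLASS_B = {
    "P21": {"LN": {"Tompson"}, "NL": {"Greek"}, "PR": set()},
    "P22": {"LN": {"Tompson"}, "NL": {"Greek", "French"},
            "PR": {"Researcher"}},
}


def links_from_linkkey(data_a, data_b, linkkey):
    """Pairs of instances sharing a value on every property pair (assumed semantics)."""
    links = set()
    for id_a, desc_a in data_a.items():
        for id_b, desc_b in data_b.items():
            if all(desc_a.get(pa, set()) & desc_b.get(pb, set())
                   for pa, pb in linkkey):
                links.add((id_a, id_b))
    return links


name_nationality = [("LastName", "LN"), ("Nationality", "NL")]
print(sorted(links_from_linkkey(CLASS_A, CLASS_B, name_nationality)))
```

Comparing the generated links across candidate linkkeys is what the discriminability and coverage measures are meant to summarise.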
146. CONDITIONAL KEYS
COLLABORATION WITH:
LUIS GALARRAGA (INRIA RENNES)
NATHALIE PERNELLE (LRI, UNIV. PARIS SUD),
FABIAN SUCHANEK (TELECOM PARISTECH)
DANAI SYMEONIDOU (INRA, MONTPELLIER)
146
Symeonidou et al. 2017
147. KEYS IN KNOWLEDGE
BASES
To obtain keys found in Knowledge Bases (KBs)
• Different automatic key discovery approaches (eg. SAKey, KD2R)
Key limitations
• Cases of datasets having no keys
• Keys are generic, i.e., they are true for every instance of a class in a
dataset
FirstName LastName Gender Lab Nationality
instance1 Claude Dupont Female Paris-Sud France
instance2 Claude Dupont Male Paris-Sud Belgium
instance3 Juan Rodríguez Male INRA Spain, Italy
instance4 Juan Salvez Male INRA Spain
instance5 Anna Georgiou Female INRA Greece, France
instance6 Pavlos Markou Male Paris-Sud Greece
instance7 Marie Legendre Female INRA France
Instances of the
class Person
Key: {LastName,
Gender}
147
Symeonidou et al. 2017
148. PROBLEM STATEMENT
Motivating example
• In many countries of the world, a student can be supervised
by multiple supervisors
• In German Universities, a student can be supervised by only
one professor
• {supervises} is a key for the instances of the class Professor under the
“condition” that they are in a German university
Approximate keys [SAKey] express keys that do not uniquely
identify every instance of a class in a dataset
Approximate keys do not express the conditions under which they are true
148
Symeonidou et al. 2017
149. CONDITIONAL KEYS
BY VICKEY
Conditional key: a key, valid for instances of a class satisfying a
specific condition
• Condition part: pairs of property and value
Eg. {Lab=INRA}, {Gender=Male}, {Gender=Female, Lab=INRA} etc.
• Key part: a set of properties
FirstName LastName Gender Lab Nationality
instance1 Claude Dupont Female Paris-Sud France
instance2 Claude Dupont Male Paris-Sud Belgium
instance3 Juan Rodríguez Male INRA Spain, Italy
instance4 Juan Salvez Male INRA Spain
instance5 Anna Georgiou Female INRA Greece, France
instance6 Pavlos Markou Male Paris-Sud Greece
instance7 Marie Legendre Female INRA France
{LastName} is a key under the condition {Lab=INRA}
Instances of the
class Person
149
Symeonidou et al. 2017
150. CONDITIONAL KEY
QUALITY MEASURES
Support: #instances both satisfying condition part and instantiating key part
Coverage: Support/#all_Instances
Support: 4
Coverage: 4/7 = 0.57
FirstName LastName Gender Lab Nationality
instance1 Claude Dupont Female Paris-Sud France
instance2 Claude Dupont Male Paris-Sud Belgium
instance3 Juan Rodríguez Male INRA Spain, Italy
instance4 Juan Salvez Male INRA Spain
instance5 Anna Georgiou Female INRA Greece, France
instance6 Pavlos Markou Male Paris-Sud Greece
instance7 Marie Legendre Female INRA France
{LastName} is a key under the condition {Lab=INRA}
Instances of the
class Person
150
Symeonidou et al. 2017
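The conditional key and its two quality measures can be checked on the Person table with a small Python sketch. It follows the definitions given above and adds one assumption: two instances violate a key when they share at least one value on every property of the key part. The helper names are illustrative and do not come from VICKEY.

```python
def person(first, last, gender, lab, nationalities):
    """Build one row of the Person table as sets of values."""
    return {"FirstName": {first}, "LastName": {last}, "Gender": {gender},
            "Lab": {lab}, "Nationality": set(nationalities)}


PERSONS = {
    "instance1": person("Claude", "Dupont", "Female", "Paris-Sud", ["France"]),
    "instance2": person("Claude", "Dupont", "Male", "Paris-Sud", ["Belgium"]),
    "instance3": person("Juan", "Rodríguez", "Male", "INRA", ["Spain", "Italy"]),
    "instance4": person("Juan", "Salvez", "Male", "INRA", ["Spain"]),
    "instance5": person("Anna", "Georgiou", "Female", "INRA", ["Greece", "France"]),
    "instance6": person("Pavlos", "Markou", "Male", "Paris-Sud", ["Greece"]),
    "instance7": person("Marie", "Legendre", "Female", "INRA", ["France"]),
}


def satisfies(desc, condition):
    """True if the instance has the required value for each condition property."""
    return all(value in desc.get(prop, set()) for prop, value in condition.items())


def is_conditional_key(data, condition, key_part):
    """No two instances satisfying the condition share values on the key part."""
    selected = [d for d in data.values() if satisfies(d, condition)]
    for i, a in enumerate(selected):
        for b in selected[i + 1:]:
            if all(a.get(p, set()) & b.get(p, set()) for p in key_part):
                return False
    return True


def support(data, condition, key_part):
    """Instances satisfying the condition and instantiating the key part."""
    return sum(1 for d in data.values()
               if satisfies(d, condition) and all(d.get(p) for p in key_part))


def coverage(data, condition, key_part):
    return support(data, condition, key_part) / len(data)


print(is_conditional_key(PERSONS, {"Lab": "INRA"}, ["LastName"]))
print(support(PERSONS, {"Lab": "INRA"}, ["LastName"]))
print(round(coverage(PERSONS, {"Lab": "INRA"}, ["LastName"]), 2))
```

With an empty condition the same check shows that {LastName} alone is not a key of the whole class, since two instances are named Dupont.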
151. VICKEY: MINING EFFICIENTLY
CONDITIONAL KEYS
Goal of VICKEY
• Given all instances of a class in a dataset, a min_support, a min_coverage,
mine all minimal conditional keys with
• support ≥ min_support
• coverage ≥ min_coverage
Exponential search space, O(|V|^|P|), with
• V, the set of objects of a given class in the dataset
• P, the set of properties of a given class in the dataset
151
Symeonidou et al. 2017
Conditional keys can be efficiently discovered from the non-keys
152. VICKEY: MINING EFFICIENTLY
CONDITIONAL KEYS
Non-key: a set of properties where two instances share at least one value for
each property in the non-key (SAKey)
Given a maximal non-key
• Condition part: subset of properties of the non-key
• Key part: subset of the remaining properties of the non-key
FirstName LastName Gender Lab Nationality
instance1 Claude Dupont Female Paris-Sud France
instance2 Claude Dupont Male Paris-Sud Belgium
instance3 Juan Rodríguez Male INRA Spain, Italy
instance4 Juan Salvez Male INRA Spain
instance5 Anna Georgiou Female INRA Greece, France
instance6 Pavlos Markou Male Paris-Sud Greece
instance7 Marie Legendre Female INRA France
Non-key
{FirstName, Gender, Lab,
Nationality}
Instances of the
class Person
152
Symeonidou et al. 2017
155. CONDITIONAL KEY GRAPH EXPLORATION
• Given a non-key
• Step 1: Discover all minimal conditional keys with condition of size 1 {p=a}
• Step 2: Discover all minimal conditional keys with condition of size 2 {p1=a1 ∧ p2=a2}
• …
• Step n: Discover all minimal conditional keys with condition of size n {p1=a1 ∧ … ∧ pn=an}
Non-key: {FirstName, Gender, Lab, Nationality}
Condition: Gender=Female
Conditional key graph: {FirstName}, {Lab}, {Nationality}, {FirstName, Lab}, {FirstName, Nationality}, {Lab, Nationality}, {FirstName, Lab, Nationality}
Key {FirstName} for cond {Gender=Female}
Symeonidou et al. 2017
157. CONDITIONAL KEY GRAPH EXPLORATION
Non-key: {FirstName, Gender, Lab, Nationality}
Condition: Lab=INRA
Conditional key graph: {FirstName}, {Gender}, {Nationality}, {FirstName, Gender}, {FirstName, Nationality}, {Gender, Nationality}, {FirstName, Gender, Nationality}
Key {FirstName} for cond {Gender=Female}
Symeonidou et al. 2017
158. CONDITIONAL KEY GRAPH EXPLORATION
Condition: Gender=Female ∧ Lab=INRA
Conditional key graph: {Nationality}
Symeonidou et al. 2017
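Step 1 of the exploration (conditions of size 1) can be sketched in Python as below. This is a simplified illustration and not VICKEY itself: for each condition p=a built from a property of the non-key, it enumerates the subsets of the remaining properties by increasing size and keeps the minimal ones that are keys among the instances satisfying the condition. The min_support default, the sharing semantics (at least one common value per property) and all names are assumptions. Larger conditions and the pruning between steps are left out.

```python
from itertools import combinations


def person(first, gender, lab, nationalities):
    return {"FirstName": {first}, "Gender": {gender}, "Lab": {lab},
            "Nationality": set(nationalities)}


# Person table restricted to the properties of the non-key.
PERSONS = [
    person("Claude", "Female", "Paris-Sud", ["France"]),
    person("Claude", "Male", "Paris-Sud", ["Belgium"]),
    person("Juan", "Male", "INRA", ["Spain", "Italy"]),
    person("Juan", "Male", "INRA", ["Spain"]),
    person("Anna", "Female", "INRA", ["Greece", "France"]),
    person("Pavlos", "Male", "Paris-Sud", ["Greece"]),
    person("Marie", "Female", "INRA", ["France"]),
]


def is_key(instances, props):
    """No two instances share at least one value on every property of props."""
    for i, a in enumerate(instances):
        for b in instances[i + 1:]:
            if all(a.get(p, set()) & b.get(p, set()) for p in props):
                return False
    return True


def conditional_keys_size_1(data, non_key, min_support=2):
    """Minimal key parts for every condition p=a taken from the non-key."""
    results = []
    for cond_prop in non_key:
        rest = [p for p in non_key if p != cond_prop]
        values = sorted({v for d in data for v in d.get(cond_prop, set())})
        for value in values:
            selected = [d for d in data if value in d.get(cond_prop, set())]
            found = []
            for size in range(1, len(rest) + 1):
                for combo in combinations(rest, size):
                    if any(set(k) <= set(combo) for k in found):
                        continue  # not minimal for this condition
                    covered = [d for d in selected if all(d.get(p) for p in combo)]
                    if len(covered) >= min_support and is_key(covered, combo):
                        found.append(combo)
            results.extend(((cond_prop, value), key) for key in found)
    return results


found_keys = conditional_keys_size_1(
    PERSONS, ["FirstName", "Gender", "Lab", "Nationality"]
)
print([key for cond, key in found_keys if cond == ("Gender", "Female")])
```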
160. VICKEY’S
SCALABILITY
Run VICKEY on large datasets
Compare the runtime results with AMIE - a generic rule
mining approach adapted to mine conditional keys
• Both approaches take as input non-keys mined by SAKey to
reduce the search space of conditional key mining
Coverage 1%
160
Symeonidou et al. 2017
161. RUNTIME RESULTS -
VICKEY VS. AMIE[2]
Class* #Triples #Instances #Properties #NonKeys VICKEY AMIE #ConditionalKeys
Actor 57.2k 5.8k 71 137 4.52m 12.58h 311
Album 786.1k 85.3k 39 68 1.53h 3.90h 304
Book 258.4k 30.0k 51 95 11.84h >1d 419
Film 832.1k 82.6k 74 132 1.37h 3.64h 185
Mountain 127.8k 16.4k 58 47 2.86m 23.57m 257
Museum 12.9k 1.9k 65 17 1.46s 6.45s 58
Organization 1.82M 178.7k 553 3221 26.32h > 36h 28
Scientist 258.5k 19.7k 73 309 27.67m > 1d 582
University 85.5k 8.7k 89 140 14.45h >1d 941
*All used classes are obtained from DBpedia
161
Symeonidou et al. 2017
162. DATA LINKING -
VICKEY VS. SAKEY[1]
Goal: link two datasets using
• Classical keys discovered by SAKey[1]
• Conditional keys discovered by VICKEY
• Supervises(Prof1, stud1) ∧ Supervises(Prof2, stud1) ∧ teachesIn(Prof1,
“Germany”) ∧ teachesIn(Prof2, “Germany”) ⇒ sameAs(Prof1, Prof2)
• Both classical keys and conditional keys
Evaluate obtained links using the existing gold standard with
• Recall
• Precision
• F-Measure
162
Symeonidou et al. 2017
163. DATA LINKING -
VICKEY VS. SAKEY[1]
Link two knowledge bases containing information of Wikipedia
• Yago[3]
• DBpedia[4]
Used classes of DBpedia and Yago
• Actor
• Album
• Book
• Film
• Mountain
• Museum
• Organization
• Scientist
• University
163
Symeonidou et al. 2017
164. DATA LINKING - VICKEY VS. SAKEY[1]
Class   Keys used                  Recall  Precision  F-Measure
Actor   Keys[1]*                   0.27    0.99       0.43
        Conditional keys**         0.57    0.99       0.73
        Keys[1]+Conditional keys   0.6     0.99       0.75
Album   Keys[1]                    0       1          0.00
        Conditional keys           0.15    0.99       0.26
        Keys[1]+Conditional keys   0.15    0.99       0.26
Film    Keys[1]                    0.04    0.99       0.08
        Conditional keys           0.38    0.96       0.54
        Keys[1]+Conditional keys   0.39    0.98       0.55
Museum  Keys[1]                    0.12    1          0.21
        Conditional keys           0.25    1          0.40
        Keys[1]+Conditional keys   0.31    1          0.47
[Figure annotations: x 869, x 1.75, x 7.1, x 2.19]
*Keys[1] from SAKey **Conditional keys from VICKEY
Symeonidou et al. 2017
165. SUMMARY
Keys and almost keys:
• Better linking results in terms of recall with n-almost keys than with sure keys
• Optimistic heuristic gets better results than pessimistic one
• Can handle classes much larger than KD2R can (DB:NaturalPlace: more than
16 million triples)
Conditional keys:
• Better linking results in terms of recall when conditional keys are used.
• Besides data linking, may be used to refine a class hierarchy
Improvements:
• Numerical values
• Dataset heterogeneity
When neither (almost) keys nor conditional keys are discoverable from the dataset
⇒ Possible solution: Referring expressions
165
166. LINK
INVALIDATION /
REQUALIFICATION
WORK IN COLLABORATION WITH:
NATHALIE PERNELLE (LRI, UNIV. PARIS SUD),
LAURA PAPALEO (GENOVA PROVINCE, ITALY)
JOE RAAD (PHD STUDENT, AGROPARISTECH & UNIVERSITÉ PARIS SUD. )
166
167. IDENTITY CRISIS
• owl:sameAs indicates that two different
descriptions refer to the same entity
• owl:sameAs semantics is too strict
• Reflexive, symmetric, transitive and
• Property sharing:
∀X ∀Y owl:sameAs(X, Y) ∧ p(X, Z) ⇒ p(Y, Z)
• Automatic data linking tools do not guarantee 100%
precision, because of:
• Errors, missing information, data freshness, …
• [Halpin et al. 2010] showed that 37% of owl:sameAs
links randomly selected among 250 identity links
between books were incorrect.
• Problem: how to (semi)-automatically invalidate/requalify
owl:sameAs links?
[Figure: owl:sameAs links b1–b2, b4–b3, b1–b5 and b5–b6]
168. IDENTITY CRISIS: SOME SOLUTIONS
• [Halpin et al. 2010]: propose an ontology of identity and
the invalidation of identity links by crowdsourcing.
• [de Melo 2013]: uses the Unique Name Assumption
and the transitivity of links to detect inconsistencies in
the data.
• [Papaleo et al. 2014, Papageorgiou et al. 2017]: exploit
some ontology axioms to logically/numerically detect
invalid identity links.
• [Raad et al. 2017]: computes contextual identity links
for each pair of instances
170. LOGICAL AND NUMERICAL
APPROACH FOR LINK INVALIDATION
• Two ontology-based methods to detect invalid sameAs statements: a
logical method and a numerical method
• We build a contextual graph «around» each one of the two resources
involved in the sameAs by exploiting ontology axioms on:
• functionality and inverse functionality of properties and
• local completeness of some properties (e.g., the author list of a
book).
• We analyse the descriptions provided in these contextual graphs to
detect possible inconsistencies or high dissimilarities.
170
171. LOGICAL METHOD
F is the set of RDF facts enriched by a set of ¬synVals facts of the form
¬synVals(w1, w2), w1 and w2 being different literals.
EXAMPLES:
- notSynVals(‘231’, ‘100’) for a functional property numOfPages
- notSynVals(‘New York’, ‘Paris’) for a functional property cityName
… knowledge from experts or extracted.
[Papaleo et al. 2014]
172. LOGICAL METHOD
R is the set of rules
(inverse) functional properties:
sameAs(x,y) ∧ numOfPages(x,w1) ∧ numOfPages(y,w2) → SynVals(w1,w2)
local complete properties:
sameAs(x,y) ∧ hasAuthor(x,w1) → hasAuthor(y,w1)
[Papaleo et al. 2014]
173. LOGICAL INVALIDATION
• If the property nbPages is declared as functional then:
nbPages(b1, n1) ∧ nbPages(b2, n2) ∧ (b1=b2) ⇒ n1=n2.
⇒ owl:sameAs(b1, b2) is false.
owl:sameAs(b1, b2) ?
[Figure: b1 and b2 both have the title “A Semantic Web Primer” and pubYear 2007; their authors (a11, a12 and a21, a22) are named “G. Antoniou” / “Grigoris Antoniou” and “Paul Grauth” / “P. Grauth”; their nbPages values are 208 and 288]
[Papaleo et al. 2014]
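The functional-property check can be sketched in Python as follows. This is an illustration of the rule, not the authors' system: a sameAs statement is flagged when the two resources have values for a functional property and those values are declared as non-synonymous. The dictionaries, the NOT_SYN_VALS set and the function name are hypothetical, and which book carries which page count is an arbitrary choice here because the check is symmetric.

```python
# Descriptions restricted to single-valued (functional) properties.
b1 = {"title": "A Semantic Web Primer", "pubYear": "2007", "nbPages": "208"}
b2 = {"title": "A Semantic Web Primer", "pubYear": "2007", "nbPages": "288"}

FUNCTIONAL_PROPS = ["title", "pubYear", "nbPages"]

# Pairs of literals known to denote different values (notSynVals facts).
NOT_SYN_VALS = {("208", "288")}


def conflicting_properties(desc1, desc2, functional_props, not_syn_vals):
    """Functional properties whose values contradict sameAs(desc1, desc2)."""
    conflicts = []
    for prop in functional_props:
        if prop in desc1 and prop in desc2:
            w1, w2 = desc1[prop], desc2[prop]
            if (w1, w2) in not_syn_vals or (w2, w1) in not_syn_vals:
                conflicts.append(prop)
    return conflicts


conflicts = conflicting_properties(b1, b2, FUNCTIONAL_PROPS, NOT_SYN_VALS)
print("sameAs(b1, b2) invalid because of:", conflicts)
```

A non-empty result means the sameAs statement leads to a contradiction and can be invalidated. An empty result does not prove that the link is correct.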
174. NUMERICAL METHOD: EXAMPLE
owl:sameAs(b1, b2) ?
[Figure: contextual graphs of b1 and b2: title “A Semantic Web Primer” for both, pubYear 2008 and 2007, publishers e1 and e2 (pName “MIT Press” for both), authors a11, a12, … and a21, a22, … (aName “G. Antoniou” and “Grigoris Antoniou”), references r11 … r1m and r21 … r2n]
• P = {title, pubYear, publisher, author, aName, pName}
• Sim(“A Semantic Web …”, “A Semantic Web …”) = 1, Sim(“2007”, “2008”) = 0
• Sim(“MIT Press”, “MIT Press”) = 1 ⇒ CSim(e1, e2) = 1
• Sim(“Grigoris Antoniou”, “G. Antoniou”) = 0.5, ... ⇒ CSim(a11, a21) = 0.5, …
⇒ CSim(a12, a22) = 0.5
• CSim(b1, b2) = 0.62 if the aggregation function is the average and
• CSim(b1, b2) = 0 if the aggregation function is the minimum
[Papageorgiou et al. 2017]
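The effect of the aggregation function can be seen with a short Python sketch. It only aggregates the five similarity values that are written out in the example. The similarities elided on the slide (the "…") are not included, so the average computed here is not expected to equal the reported 0.62. The dictionary keys and the function name are illustrative.

```python
from statistics import mean

# Only the similarities that are explicit in the example.
explicit_scores = {
    "title": 1.0,       # Sim of the two titles
    "pubYear": 0.0,     # Sim("2007", "2008")
    "publisher": 1.0,   # CSim(e1, e2)
    "author_1": 0.5,    # CSim(a11, a21)
    "author_2": 0.5,    # CSim(a12, a22)
}


def csim(scores, aggregate):
    """Aggregate the property-level similarities into one contextual similarity."""
    return aggregate(list(scores.values()))


print("average:", csim(explicit_scores, mean))
print("minimum:", csim(explicit_scores, min))
```

The minimum is driven to 0 by the single disagreement on pubYear, while the average stays well above it, which matches the contrast shown in the example.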
175. COMPARISON LOGICAL/NUMERICAL
Logical method: [Papaleo, Pernelle and Saïs (2014)]
Numerical method (aggregation = average): [Papageorgiou, Pernelle and Saïs 2017]
            Logical method                  Numerical method
Datasets    Precision  Recall  F-measure    Precision  Recall  F-measure  Threshold
Person1     0.69       0.98    0.81         1.0        0.98    0.99       0.3
Person2     0.5        1.0     0.67         0.994      0.989   0.99       0.2
Restaurant  0.63       0.97    0.77         0.97       1.0     0.98       0.4
• Average gain of 23% F-measure (significant increase in precision, comparable recall)
176. CONTEXTUAL
IDENTITY
[JOE RAAD PHD
LIONES (2015-2018) PROJECT,
FUNDED BY CDS PARIS SACLAY]
WORK IN COLLABORATION WITH:
- NATHALIE PERNELLE (LRI, UNIV. PARIS SUD),
- JULIETTE DIBIE (AGROPARISTECH, INRA),
- LILIANA IBANESCU (AGROPARISTECH, INRA),
- CAROLINE PENICAUD (INRA PARIS)
176
177. CONTEXTUAL IDENTITY
• owl:sameAs indicates that two different descriptions refer to the same
entity
• owl:sameAs semantics is too strict
• Reflexive, symmetric, transitive and
• Property sharing:
∀X ∀Y owl:sameAs(X, Y) ∧ p(X, Z) ⇒ p(Y, Z)
• In some specific domains, the identity depends on the context.
Examples:
• same-components and same-quantity ⇒ same-juice. [Hypocaloric diet context]
• proportionally-same-components ⇒ same-juice. [Shopping list context]
• Problem: how to automatically compute the contextual identity links?
[Raad et al. 2014]
178. CONTEXTUAL IDENTITY
Proposition of a new contextual identity link
• A new predicate to represent identity links valid in explicit contexts
• Hierarchy of the contexts and the identity links
• Defining identity contexts for each class (local contexts)
• Defining identity contexts for each class and its connected graph
(global context)
[Raad et al. 2014]
180. CONTEXTUAL IDENTITY
Global Context
• π(c_k): local context of the class c_k, π(c_k) = (c_k, DP_ck, OP_ck)
• src: a given source class of the ontology
• G: connected sub-graph containing src
• C_G: set of classes belonging to G
• c_k: class belonging to C_G
Π(src) = ∪_{c_k ∈ C_G} π(c_k)
180
[Raad et al. 2014]
181. CONTEXTUAL IDENTITY
Πa(Juice) ≤ Πc(Juice)
Πb(Juice) ≤ Πc(Juice)
[Figure: hierarchy of the contexts Πa(Juice), Πb(Juice) and Πc(Juice)]
Each local context in Πc(Juice) is less specific or equal to its corresponding local
context in Πa(Juice)
[Raad et al. 2014]
182. DECIDE METHOD – DETECTION OF CONTEXTUAL IDENTITY
DECIDE (DEtection of Contextual IDEntity) automatically detects and adds these
contextual identity links in the knowledge graph
Inputs: Knowledge Graph, Source Class, Unwanted Properties, Paired Properties,
Necessary Properties
Output: for each pair of instances (i1, i2) of the source class, the set of the most
specific global contexts in which (i1, i2) are identical
[Raad et al. 2014]
184. EXPERIMENTS - LIONES PROJECT
Mixture
DECIDE
DEtection of Contextual IDEntity
CellExtraDry + Carredas
184
[Raad et al. 2014]
185. EXPERIMENTS - LIONES PROJECT
identiConTo<Π(Mixture)25>(i, j): two mixtures have the same nature (e.g. humid) and
the same components (water, yeast, mono-potassium, phosphate, and glucose),
with each of these components having identical weights (value + unit of measure)
Generation of rules for missing observations prediction
185
[Raad et al. 2014]
186. INVALIDATION /
REQUALIFICATION: SUMMARY
• Contextual identity is useful for profiling identity links
• Useful for giving explanations for possible expert validation
• Allows generating rules for predicting missing values.
• Future directions:
• Is Contextual difference relevant/useful?
• How to make explicit incomplete information for a pair of resources?
• Use when the ontologies are different
• SameAs link invalidation
• Improves the precision without decreasing the recall
• May be used for error detection and/or anomalies in ontology axioms
• Future directions:
• Provide comprehensible explanations for experts
• Use when the ontologies are different
186
187. SOME FUTURE CHALLENGES
• Data evolution
• ontology axiom evolution
• link evolution
• temporal data linking
• Data veracity ⇒ what is the truth?
• Knowledge discovery in complex data
• key graphs
• causality rules
• Explanation problem
• provenance of the data and the data/knowledge processes
189. REFERENCES (1)
[Atencia et al. 2014] Data interlinking through robust Linkkey extraction.
Atencia, Manuel, Jérôme David, and Jérôme Euzenat. ECAI, 2014.
[Atencia et al.’12] Keys and Pseudo-Keys Detection for Web Datasets Cleansing and Interlinking.
Manuel Atencia, Jérôme David, François Scharffe. In EKAW 2012
[Cohen et al. 2003] A comparison of string distance metrics for name-matching tasks.
William W. Cohen, Pradeep Ravikumar, and Stephen E. Fienberg.
In IIWEB@AAAI 2003.
[Ferrara13] Evaluation of instance matching tools: The experience of OAEI.
Alfio Ferrara, Andriy Nikolov, Jan Noessner, François Scharffe. OM@ISWC 2013
[Hu et al. 2011] A Self-Training Approach for Resolving Object Coreference on the Semantic Web.
Wei Hu, Jianfeng Chen, Yuzhong Qu. In WWW 2011
[Kang et al. 2008] Interactive Entity Resolution in Relational Data: A Visual Analytic Tool and Its
Evaluation. Kang, Getoor, Shneiderman, Bilgic, Licamele,
In IEEE Trans. Vis. Comput. Graph. 2008
189
190. REFERENCES (2)
[Lenat and Feigenbaum 1991] On the threshold of knowledge.
Douglas B. Lenat and Edward A. Feigenbaum
In Artificial Intelligence 47 (1991)
[Nikolov et al’12] Unsupervised Learning of Link Discovery Configuration
Andriy Nikolov, Mathieu d'Aquin, Enrico Motta. In ESWC 2012.
[Pernelle et al.’13] An Automatic Key Discovery Approach for Data Linking.
Nathalie Pernelle, Fatiha Saïs and Danai Symeonidou.
In Journal of Web Semantics
[Saïs et al.07] L2R: a Logical method for Reference Reconciliation.
Fatiha Saïs, Nathalie Pernelle and Marie-Christine Rousset. In AAAI 2007.
[Saïs et al.09] Combining a Logical and a Numerical Method for Data Reconciliation.
Fatiha Saïs., Nathalie Pernelle and Marie-Christine Rousset.
In Journal of Data Semantics.
[Shvaiko,Euzenat13] Ontology Matching: State of the Art and Future Challenges,
Pavel Shvaiko, Jérôme Euzenat. In TKDE 2013
190
191. REFERENCES (3)
[Suchanek11] PARIS: Probabilistic Alignment of Relations, Instances, and Schema
Fabian Suchanek, Serge Abiteboul, Pierre Senellart. In VLDB 2011.
[Soru et al. 2015] ROCKER: a refinement operator for key discovery.
Soru, Tommaso, Edgard Marx, and Axel-Cyrille Ngonga Ngomo.
In WWW, 2015.
[Symeonidou et al. 2014] SAKey: Scalable almost key discovery in RDF data.
Symeonidou, Danai, Vincent Armant, Nathalie Pernelle, and Fatiha Saïs.
In ISWC 2014.
[Symeonidou et al. 2017] VICKEY: Mining Conditional Keys on RDF datasets .
Danai Symeonidou, Luis Galarraga, Nathalie Pernelle, Fatiha Saïs and Fabian Suchanek.
In ISWC 2017.
[Papageorgiou et al. 2017] Approche numérique pour l'invalidation de liens d'identité (owl:SameAs).
Dimitrios Christaras Papageorgiou, Nathalie Pernelle and Fatiha Saïs. In IC 2017.
191
192. REFERENCES (4)
[Papaleo et al. 2014] Logical Detection of Invalid SameAs Statements in RDF Data,
Laura Papaleo, Nathalie Pernelle, Fatiha Saïs and Cyril Dumont. In EKAW 2014
[Volz et al'09] Silk – A Link Discovery Framework for the Web of Data.
Julius Volz, Christian Bizer et al. In WWW 2009.
[Zheng et al. 2013] Results for OAEI 2013
Qian Zheng, Chao Shao, Juanzi Li, Zhichun Wang and Linmei Hu. OM@ISWC 2010
192