Bootstrapping Entity Alignment with Knowledge Graph Embedding
1. Zequn Sun, Wei Hu, Qingheng Zhang and Yuzhong Qu
National Key Laboratory for Novel Software Technology
Nanjing University, China
{zqsun, qhzhang}.nju@gmail.com, {whu, yzqu}@nju.edu.cn
2. Background
• Entity Alignment
  ◦ Finds entities in different KGs that refer to the same real-world object
  ◦ Plays a vital role in automatically integrating multiple KGs
• Conventional approaches
  ◦ Compute entity similarities based on entity attributes
  ◦ Are not always effective because of semantic heterogeneity
• Embedding-based approaches
  ◦ Encode KGs into vector spaces
  ◦ Measure entity similarities via entity embeddings
3. Challenges
• Although embedding a single KG has been extensively studied in the past few years, alignment-oriented KG embedding remains largely unexplored.
• Embedding-based entity alignment usually relies on existing entity alignment (prior alignment) as training data. However, the accessible prior alignment usually accounts for only a small proportion of entities.
4. Framework
• We model entity alignment as a classification problem of using KG2 entities to label KG1 entities.
• To solve the two issues above, we propose a bootstrapping framework:
[Figure: overview of the bootstrapping framework. KG1 triples, KG2 triples and the prior alignment are combined into supervised triples by parameter swapping; alignment-oriented KG embeddings are trained on them; an alignment predictor labels likely alignment, which is edited and, again via parameter swapping, fed back as new supervised triples.]
5. Parameter Swapping
• We swap aligned entities in their triples to calibrate the embeddings of KG1 and KG2 in the unified vector space.
$$\mathbb{T}^{s}_{(x,y)} = \{(y, r, t) \mid (x, r, t) \in \mathbb{T}_1\} \cup \{(h, r, y) \mid (h, r, x) \in \mathbb{T}_1\} \cup \{(x, r, t) \mid (y, r, t) \in \mathbb{T}_2\} \cup \{(h, r, x) \mid (h, r, y) \in \mathbb{T}_2\},$$
where $\mathbb{T}_1$ and $\mathbb{T}_2$ denote KG1's and KG2's triples, respectively, and $(x, y)$ is an aligned pair with $x$ in KG1 and $y$ in KG2.
• The supervised triples are fed to our KG embedding model as positives.
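As an illustration, the swapping rule can be sketched in Python (the function and variable names below are my own, not the authors' code):

```python
# Hypothetical sketch of parameter swapping, assuming triples are (h, r, t)
# tuples and each alignment pair (x, y) has x in KG1 and y in KG2.
def swap_parameters(kg1_triples, kg2_triples, alignment):
    supervised = set()
    for x, y in alignment:
        for h, r, t in kg1_triples:
            if h == x:                      # (x, r, t) -> (y, r, t)
                supervised.add((y, r, t))
            if t == x:                      # (h, r, x) -> (h, r, y)
                supervised.add((h, r, y))
        for h, r, t in kg2_triples:
            if h == y:                      # (y, r, t) -> (x, r, t)
                supervised.add((x, r, t))
            if t == y:                      # (h, r, y) -> (h, r, x)
                supervised.add((h, r, x))
    return supervised
```

Because the swapped copies share the aligned entities' embeddings across both KGs, training on them pulls the two embedding spaces together.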
7. ε-Truncated Negative Sampling
• Conventional uniform negative sampling
  (Washington DC, capitalOf, USA) → (Tim Berners-Lee, capitalOf, USA)
  The replacing entity is randomly sampled from all entities, so the negative triple may be easily distinguished from its original.
• ε-Truncated negative sampling
  (Washington DC, capitalOf, USA) → (New York, capitalOf, USA)
  The sampling scope is limited to a group of candidates, i.e., the $s$-nearest neighbors of the replaced entity, where $s = \lceil (1 - \epsilon) N \rceil$ and $N$ is the number of entities.
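A minimal sketch of the idea (the neighbor lists and helper names are assumptions; in the paper, candidates are ranked by embedding similarity):

```python
import math
import random

# Sketch of epsilon-truncated negative sampling: corrupt the head of a
# positive triple, but only with one of the s-nearest neighbours of the
# original head, where s = ceil((1 - epsilon) * N) and N is the number
# of entities. neighbors[e] is assumed to list the other entities sorted
# by embedding distance to e (nearest first).
def truncated_negative(triple, neighbors, num_entities, epsilon=0.9):
    h, r, t = triple
    s = math.ceil((1 - epsilon) * num_entities)
    candidates = neighbors[h][:s]   # truncate the sampling scope
    return (random.choice(candidates), r, t)
```

With a large ε the scope shrinks to near neighbors, so the sampled negatives stay hard to distinguish from the positive.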
8. Likely Alignment Labeling
• We choose to label likely alignment at the t-th iteration by solving the following optimization problem:
$$\max \sum_{x \in X} \sum_{y \in Y} \pi\big(y \mid x;\, \Theta^{(t)}\big)\, \psi^{(t)}(x, y)$$
$$\text{s.t.}\quad \sum_{x' \in X} \psi^{(t)}(x', y) \le 1, \qquad \sum_{y' \in Y} \psi^{(t)}(x, y') \le 1, \qquad \forall x, y$$
• We transform it to max-weighted matching on bipartite graphs.
[Figure: bipartite graph between X and Y illustrating one-to-one labeling]
9. Likely Alignment Editing
• Labeling conflicts exist when accumulating the newly-labeled alignment of different iterations.
  ◦ $x$ is labeled as $y$ at the $t$-th iteration but as $y'$ at the $(t+1)$-th iteration.
• We calculate the following likelihood difference:
$$\Delta^{(t)}_{(x,y,y')} = \pi\big(y \mid x;\, \Theta^{(t)}\big) - \pi\big(y' \mid x;\, \Theta^{(t)}\big)$$
  ◦ If $\Delta^{(t)}_{(x,y,y')} > 0$, indicating that labeling $x$ as $y$ gives more alignment likelihood, we choose $y$ to label $x$; otherwise $y'$.
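The conflict-resolution rule can be sketched as follows (`pi_t` is an assumed lookup table of the model's current likelihoods):

```python
# Sketch of likely-alignment editing: when x was labeled y_old at
# iteration t but y_new at iteration t+1, keep whichever label has
# the higher likelihood under the current parameters Theta^(t).
def edit_alignment(x, y_old, y_new, pi_t):
    delta = pi_t[(x, y_old)] - pi_t[(x, y_new)]
    return y_old if delta > 0 else y_new
```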
10. Experiments
• Datasets
  ◦ DBP15K: three cross-lingual datasets built from the multilingual versions of DBpedia: DBPZH-EN (Chinese to English), DBPJA-EN (Japanese to English) and DBPFR-EN (French to English). Each dataset contains 15 thousand reference entity alignments.
  ◦ DWY100K: two large-scale datasets extracted from DBpedia, Wikidata and YAGO3, denoted by DBP-WD and DBP-YG. Each dataset has 100 thousand reference entity alignments.
11. Experiments
• Comparative Approaches
  ◦ MTransE [IJCAI 2017] learns a linear transformation between KGs.
  ◦ IPTransE [IJCAI 2017] is an iterative method for entity alignment.
  ◦ JAPE [ISWC 2017] combines relation and attribute embeddings for entity alignment.
• Metrics
  ◦ Hits@k: the percentage of correct alignments ranked in the top k
  ◦ MRR: the average of the reciprocal ranks of the correct results
13. Experiments
• F1-score w.r.t. Distribution of Relation Triple Numbers
  ◦ We divided the entity links in the testing data into several intervals based on the number of their relation triples.
  ◦ Performance was assessed by the F1-score within each interval.
  ◦ This analysis demonstrates that BootEA achieves promising results on sparse data, indicating its practical use for real KGs.
[Chart: F1-scores of MTransE, IPTransE, JAPE and BootEA over intervals of relation triple numbers ([1,6), [6,11), [11,16), [16,21), [21,∞)), together with the number of entity alignments within each interval.]
14. Conclusion
• In this paper, we studied embedding-based entity alignment.
  ◦ We introduced a KG embedding model to learn alignment-oriented embeddings across different KGs. It employs an ε-truncated uniform negative sampling method to improve alignment performance.
  ◦ We conducted entity alignment in a bootstrapping process, which labels likely alignment as training data and edits the labeled alignment during iterations.
  ◦ Our experimental results showed that the proposed approach significantly outperformed three state-of-the-art embedding-based approaches on three cross-lingual datasets and two new large-scale datasets.
15. Thanks for your attention!
• This work is supported by the National Key R&D Program of China (No. 2018YFB1004300).
• The code and datasets of BootEA are available at https://github.com/nju-websoft/BootEA
• Welcome to my poster (#1425)