Faegheh Hasibi, Krisztian Balog, Svein E. Bratsberg
ECIR 2017
ENTITY LINKING IN QUERIES:
Efficiency vs. Effectiveness
IAI_group
ENTITY LINKING
TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER PHILIP K. DICK
‣ Identifying entities in a text and linking them to their
corresponding entries in the knowledge base
Total Recall is a 1990 American science fiction action film
directed by Paul Verhoeven, starring Rachel Ticotin, Sharon
Stone, Arnold Schwarzenegger, Ronny Cox and Michael
Ironside. The film is loosely based on the Philip K. Dick
story…
TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER
ENTITY LINKING IN QUERIES (ELQ)
total recall arnold schwarzenegger
TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER
ENTITY LINKING IN QUERIES (ELQ)
total recall arnold schwarzenegger total recall
TOTAL RECALL (2012 FILM) TOTAL RECALL (1990 FILM)
Query Entity Linking Interpretation
‣ Identifying sets of entity linking interpretations
ENTITY LINKING IN QUERIES (ELQ)
[Carmel et al. 2014], [Hasibi et al. 2015], [Cornolti et al. 2016]
CONVENTIONAL APPROACH

Mention detection → Candidate Entity Ranking → Disambiguation

[Figure: pipeline example. Mention detection finds candidate mentions such as "new york" and "new york pizza"; candidate entity ranking scores each mention-entity pair s(m, e), e.g. NEW YORK (0.8) and NEW YORK CITY (0.7) for "new york", NEW YORK-STYLE PIZZA (0.9) and SICILIAN PIZZA (0.1) for "new york pizza"; disambiguation forms interpretation sets:
1. (new york, NEW YORK), (manhattan, MANHATTAN)
2. (new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)
…]
CONVENTIONAL APPROACH

Incorporating different signals:
‣ Contextual similarity between a candidate entity and the text (or the entity mention)
‣ Interdependence between all entity linking decisions (extracted from the underlying KB)

[Figure: graph of interlinked entities e1…e5, illustrating entity interdependence]
What is special about entity linking in queries?
What is really different when it comes to queries?
RESEARCH QUESTIONS
ELQ is an online process and should be done under strict
time constraints
If we are to allocate the
available processing time
between the two steps,
which one would yield the
highest gain?
RQ1?
Which group of features is
needed the most for effective
entity linking: contextual
similarity, interdependence
between entities, or both?
RQ2?
RESEARCH QUESTIONS
The context provided by the queries is limited
SYSTEMATIC INVESTIGATION
METHODS:
(1) Unsupervised (Greedy)
(2) Supervised (LTR)
METHODS:
(1) Unsupervised (MLMcg)
(2) Supervised (LTR)
Candidate Entity
Ranking
Disambiguation
1 2
CANDIDATE ENTITY RANKING

Goal:
(I) Identify all possible entities that can be linked in the query
(II) Rank them based on how likely they are to be link targets

Given: a mention m, an entity e, and a query q
Estimate: score(m, e, q)

‣ Lexical matching of query n-grams against a rich dictionary of entity name variants (see the sketch below)
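As a concrete illustration, here is a minimal Python sketch of this dictionary-based mention detection. The surface-form dictionary and its contents are illustrative assumptions, not the paper's actual resources.

# Minimal sketch of candidate mention detection via n-gram dictionary lookup.
# SURFACE_FORMS is a toy stand-in for the rich dictionary of name variants.

SURFACE_FORMS = {
    "total recall": ["TOTAL RECALL (1990 FILM)", "TOTAL RECALL (2012 FILM)"],
    "arnold schwarzenegger": ["ARNOLD SCHWARZENEGGER"],
}

def candidate_pairs(query):
    """Match every query n-gram against the surface-form dictionary."""
    terms = query.lower().split()
    pairs = []
    for i in range(len(terms)):                 # n-gram start
        for j in range(i + 1, len(terms) + 1):  # n-gram end (exclusive)
            mention = " ".join(terms[i:j])
            for entity in SURFACE_FORMS.get(mention, []):
                pairs.append(((i, j), mention, entity))
    return pairs

# e.g. candidate_pairs("total recall arnold schwarzenegger")
# -> [((0, 2), "total recall", "TOTAL RECALL (1990 FILM)"), ...]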
UNSUPERVISED [MLMcg]

Generative model for ranking entities based on the query and the specific mention [Hasibi et al. 2015]:

  score(m, e, q) = P(e|m) · P(q|e)

‣ P(e|m): probability of the mention being linked to the entity (a.k.a. commonness [22]), computed from the FACC collection [12]
‣ P(q|e): query likelihood, estimated using the query-length-normalized language model similarity [20]:

  P(q|e) = ∏_{t∈q} P(t|θe)^{P(t|q)} / ∏_{t∈q} P(t|C)^{P(t|q)}

where P(t|q) is the term's relative frequency in the query (i.e., n(t, q)/|q|), and the entity and collection language models, P(t|θe) and P(t|C), are computed using the Mixture of Language Models (MLM) approach [27].

Candidate entities and their mentions are obtained beforehand by lexical matching of query n-grams against a rich dictionary of entity name variants, which identifies candidates with close to perfect recall [16]; the focus here is on ranking these candidate (m, e) pairs with respect to the query, i.e., estimating score(m, e, q).
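A minimal Python sketch of this scoring, assuming the commonness value and the smoothed entity/collection term probabilities have been precomputed; all inputs are illustrative stand-ins, not the paper's released code.

import math

def mlm_cg_score(commonness, p_t_entity, p_t_collection, query):
    """score(m, e, q) = P(e|m) * P(q|e), with query-length-normalized P(q|e).

    commonness      -- P(e|m) for the candidate (m, e) pair
    p_t_entity      -- dict: smoothed MLM entity model P(t|theta_e)
    p_t_collection  -- dict: collection model P(t|C)
    """
    terms = query.split()
    log_pq_e = 0.0
    for t in set(terms):
        p_t_q = terms.count(t) / len(terms)  # P(t|q) = n(t, q) / |q|
        # contribution of [P(t|theta_e) / P(t|C)] ** P(t|q), in log space
        log_pq_e += p_t_q * (math.log(p_t_entity[t]) - math.log(p_t_collection[t]))
    return commonness * math.exp(log_pq_e)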
SUPERVISED [LTR]

28 features, mainly from the literature [Meij et al. 2012], [Medelyan et al. 2008]
Extracted based on: Mention (M), Entity (E), Mention-entity (ME), and Query (Q)

Table 2. Feature set used for ranking entities, categorized into mention (M), entity (E), mention-entity (ME), and query (Q) features.

Feature            Description                                                          Type
Len(m)             Number of terms in the entity mention                                M
NTEM(m)‡           Number of entities whose title equals the mention                    M
SMIL(m)‡           Number of entities whose title equals part of the mention            M
Matches(m)         Number of entities whose surface form matches the mention            M
Redirects(e)       Number of redirect pages linking to the entity                       E
Links(e)           Number of entity out-links in DBpedia                                E
Commonness(e, m)   Likelihood of entity e being the target link of mention m            ME
MCT(e, m)‡         True if the mention contains the title of the entity                 ME
TCM(e, m)‡         True if the title of the entity contains the mention                 ME
TEM(e, m)‡         True if the title of the entity equals the mention                   ME
Pos1(e, m)         Position of the 1st occurrence of the mention in the entity abstract ME
SimMf(e, m)†       Similarity between mention and field f of the entity; Eq. (1)        ME
LenRatio(m, q)     Mention-to-query length ratio: |m|/|q|                               Q
QCT(e, q)          True if the query contains the title of the entity                   Q
TCQ(e, q)          True if the title of the entity contains the query                   Q
TEQ(e, q)          True if the title of the entity equals the query                     Q
Sim(e, q)          Similarity between query and entity; Eq. (1)                         Q
SimQf(e, q)†       LM similarity between query and field f of the entity; Eq. (1)       Q

‡ Entity title refers to the rdfs:label predicate of the entity in DBpedia
† Computed for all individual DBpedia fields f ∈ F and also for the field "content" (cf. Sec 4.1)
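To make the feature definitions concrete, here is a small Python sketch of three of the simpler Table 2 features; entity titles are assumed to be plain rdfs:label strings, and the example inputs are illustrative only.

def TEM(entity_title, mention):
    """True if the title of the entity equals the mention."""
    return entity_title.lower() == mention.lower()

def MCT(entity_title, mention):
    """True if the mention contains the title of the entity."""
    return entity_title.lower() in mention.lower()

def LenRatio(mention, query):
    """Mention-to-query length ratio |m| / |q|, measured in terms."""
    return len(mention.split()) / len(query.split())

# e.g. TEM("Total Recall", "total recall") -> True
#      LenRatio("total recall", "total recall arnold schwarzenegger") -> 0.5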
TEST COLLECTIONS
‣ Y-ERD
• Based on the Yahoo Search Query Log to Entities dataset
• 2398 queries
‣ ERD-dev
• Released as part of the ERD challenge 2014
• 91 queries
[Carmel et al. 2014], [Hasibi et al. 2015]
Table 4. Candidate entity ranking results on the Y-ERD and ERD-dev datasets. Best scores for each metric are in boldface. Significance for line i > 1 is tested against lines 1..i−1 (▲/▽ mark significant increases/decreases; △ marks non-significant differences).

            ---------- Y-ERD ----------    --------- ERD-dev ---------
Method      MAP       R@5       P@1        MAP       R@5       P@1
MLM         0.7507    0.8556    0.6839     0.7675    0.8622    0.7333
CMNS        0.7831▲   0.8230▽   0.7779▲    0.7037    0.7222▽   0.7556
MLMcg       0.8536▲▲  0.8997▲▲  0.8280▲▲   0.8543△▲  0.9015▲   0.8444
LTR         0.8667▲▲▲ 0.9022▲▲  0.8479▲▲▲  0.8606△▲  0.9289△▲  0.8222

Table 5. End-to-end performance of ELQ systems on the Y-ERD and ERD-dev query sets. Significance for line i > 1 is tested against lines 1..i−1.

              -------------- Y-ERD --------------    ------------- ERD-dev -------------
Method        Prec      Recall    F1        Time     Prec      Recall    F1        Time
MLMcg-Greedy  0.709     0.709     0.709     0.058    0.724     0.712     0.713     0.085
MLMcg-LTR     0.725     0.724     0.724     0.893    0.725     0.731     0.728     1.185
LTR-LTR       0.731△    0.732△    0.731△    0.881    0.758     0.748     0.753     1.185
LTR-Greedy    0.786▲▲▲  0.787▲▲▲  0.787▲▲▲  0.382    0.852▲▲△  0.828▲△   0.840▲▲△  0.423
RESULTS
Both methods (MLMcg and LTR) are able to
find the vast majority of the relevant entities
and return them at the top ranks.
SYSTEMATIC INVESTIGATION
METHODS:
(1) Unsupervised (Greedy)
(2) Supervised (LTR)
METHODS:
(1) Unsupervised (MLMcg)
(2) Supervised (LTR)
Candidate Entity
Ranking
Disambiguation
1 2
DISAMBIGUATION

Goal: identify interpretation set(s) Ei = {(m1, e1), ..., (mk, ek)}, where the mentions of each set are non-overlapping.

Example interpretation sets:
1. (new york, NEW YORK), (manhattan, MANHATTAN)
2. (new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)
UNSUPERVISED [Greedy]

Algorithm 1: Greedy Interpretation Finding (GIF)
Input: ranked list of mention-entity pairs M; score threshold τs
Output: interpretations I = {E1, ..., Em}
begin
 1: M′ ← Prune(M, τs)
 2: M′ ← PruneContainmentMentions(M′)
 3: I ← CreateInterpretations(M′)
 4: return I
end

 1: function CreateInterpretations(M)
 2:   I ← {∅}
 3:   for (m, e) in M do
 4:     h ← 0
 5:     for E in I do
 6:       if ¬hasOverlap(E, (m, e)) then
 7:         E.add((m, e))
 8:         h ← 1
 9:       end if
10:     end for
11:     if h == 0 then
12:       I.add({(m, e)})
13:     end if
14:   end for
15:   return I
16: end function

(I) Pruning of mention-entity pairs
‣ Discard the pairs with score below the threshold τs
‣ For containment mentions, keep only the highest-scoring one
(II) Set generation
‣ Add mention-entity pairs in decreasing order of score to an interpretation set (see the Python sketch below)
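A runnable Python sketch of GIF, under the assumption that each candidate is a (mention_span, entity, score) triple and mention spans are (start, end) term offsets; the helper names are mine, not from the released code.

def _overlap(a, b):
    """Two term spans overlap if they share at least one term."""
    return a[0] < b[1] and b[0] < a[1]

def _contains(a, b):
    """Span a strictly contains span b."""
    return a != b and a[0] <= b[0] and b[1] <= a[1]

def gif(pairs, tau_s):
    """Greedy Interpretation Finding over (mention_span, entity, score) triples."""
    # (I) Prune pairs scoring below the threshold.
    pruned = [p for p in pairs if p[2] >= tau_s]
    # For containment mentions, keep only the highest-scoring one.
    pruned = [p for p in pruned
              if not any((_contains(q[0], p[0]) or _contains(p[0], q[0]))
                         and q[2] > p[2] for q in pruned)]
    # (II) Add pairs in decreasing score order to every interpretation they
    # do not overlap with; start a new interpretation if none fits.
    interpretations = []
    for m, e, s in sorted(pruned, key=lambda p: -p[2]):
        placed = False
        for E in interpretations:
            if not any(_overlap(m, m2) for m2, _ in E):
                E.append((m, e))
                placed = True
        if not placed:
            interpretations.append([(m, e)])
    return interpretations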
SUPERVISED

Collective disambiguation:
(I) Generate all interpretations (from the top-K mention-entity pairs)
(II) Collectively select the interpretations with a binary classifier

[Figure: worked example. From the top-K mention-entity pairs and their CER scores (e.g. new york → NEW YORK CITY, NEW YORK, SYRACUSE, NEW YORK, ALBANY, NEW YORK; manhattan → MANHATTAN, MANHATTAN (FILM); new york pizza → NEW YORK-STYLE PIZZA), all possible interpretation sets are generated, e.g. {(new york, NEW YORK), (manhattan, MANHATTAN)}, {(new york, NEW YORK CITY), (manhattan, MANHATTAN)}, {(new york, SYRACUSE, NEW YORK), (manhattan, MANHATTAN)}, {(new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)}, {(new york pizza, NEW YORK-STYLE PIZZA)}, … and the binary classifier marks each set as selected (✓) or rejected (✗).]
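A minimal Python sketch of step (I), assuming the same span-based pair representation as in the GIF sketch above; step (II), the binary classifier, is left abstract.

from itertools import combinations

def all_interpretations(top_k_pairs):
    """Enumerate every non-empty subset of (mention_span, entity) pairs
    whose mention spans are mutually non-overlapping.
    Exponential in K, which is acceptable since only the top-K pairs are used."""
    def overlap(a, b):
        return a[0] < b[1] and b[0] < a[1]
    sets = []
    for r in range(1, len(top_k_pairs) + 1):
        for combo in combinations(top_k_pairs, r):
            spans = [m for m, _ in combo]
            if all(not overlap(a, b) for a, b in combinations(spans, 2)):
                sets.append(list(combo))
    return sets  # each candidate set is then accepted/rejected by the classifier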
SUPERVISED

All possible interpretations are generated from the top-K mention-entity pairs (obtained from the CER step); mentions within the same interpretation are required not to overlap.

Table 3. Feature set used in the supervised disambiguation approach. Type is either query-dependent (QD) or query-independent (QI).

Set-based features (computed for the entire interpretation set):

CommonLinks(E)      Number of common links in DBpedia: |∩_{e∈E} out(e)|                    QI
TotalLinks(E)       Number of distinct links in DBpedia: |∪_{e∈E} out(e)|                  QI
JKB(E)              Jaccard similarity based on DBpedia: CommonLinks(E) / TotalLinks(E)    QI
Jcorpora(E)‡        Jaccard similarity based on FACC: |∩_{e∈E} doc(e)| / |∪_{e∈E} doc(e)|  QI
RelMW(E)‡           Relatedness similarity [25] according to FACC                          QI
P(E)                Co-occurrence probability based on FACC: |∩_{e∈E} doc(e)| / TotalDocs  QI
H(E)                Entropy of E: −P(E)·log(P(E)) − (1−P(E))·log(1−P(E))                   QI
Completeness(E)†    Completeness of set E as a graph: |edges(G_E)| / |edges(K_|E|)|        QI
LenRatioSet(E, q)§  Ratio of mention lengths to the query length: (Σ_{e∈E} |m_e|) / |q|    QD
SetSim(E, q)        Similarity between query and the entities in the set; Eq. (2)          QD

Entity-based features (individual entity features, aggregated over all entities in the set):

Links(e)            Number of entity out-links in DBpedia                                  QI
Commonness(e, m)    Likelihood of entity e being the target link of mention m              QD
Score(e, q)         Entity ranking score, obtained from the CER step                       QD
iRank(e, q)         Inverse of rank, obtained from the CER step: 1/rank(e, q)              QD
Sim(e, q)           Similarity between query and the entity; Eq. (1)                       QD
ContextSim(e, q)    Contextual similarity between query and entity; Eq. (3)                QD

‡ doc(e) represents all documents that have a link to entity e
† G_E is a DBpedia subgraph containing only entities from E; K_|E| is a complete graph of |E| vertices
§ m_e denotes the mention that corresponds to entity e
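For concreteness, here is a sketch of two of the query-independent set-based features, assuming the DBpedia out-links out(e) are available as Python sets; the toy link data is illustrative only.

import math

OUT_LINKS = {  # toy stand-in for DBpedia out-links
    "NEW YORK": {"UNITED STATES", "MANHATTAN", "HUDSON RIVER"},
    "MANHATTAN": {"NEW YORK", "UNITED STATES", "HUDSON RIVER"},
}

def jkb(entities):
    """JKB(E): Jaccard similarity of the entities' DBpedia out-links."""
    links = [OUT_LINKS[e] for e in entities]
    common, total = set.intersection(*links), set.union(*links)
    return len(common) / len(total) if total else 0.0

def entropy(p_E):
    """H(E) = -P(E) log P(E) - (1 - P(E)) log (1 - P(E))."""
    if p_E <= 0.0 or p_E >= 1.0:
        return 0.0
    return -p_E * math.log(p_E) - (1 - p_E) * math.log(1 - p_E)

# e.g. jkb(["NEW YORK", "MANHATTAN"]) -> 0.5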
Candidate entity ranking: Table 4 presents the results for CER on the Y-ERD and ERD-dev datasets. We find that commonness is a strong performer (in line with the findings of [1, 16]). Combining commonness with MLM in a generative model (MLMcg) delivers excellent performance, with MAP above 0.85 and R@5 around 0.9. The LTR approach brings further slight, but for Y-ERD significant, improvements. This means that both of our CER methods (MLMcg and LTR) are able to find the vast majority of the relevant entities and return them at the top ranks.

Disambiguation: Table 5 reports on the disambiguation results.
RESULTS

LTR-Greedy outperforms the other approaches significantly on both test sets, and is the second most efficient one.
RESULTS

Comparison with the top performers of the ERD challenge (on the official ERD test platform):

The LTR-Greedy approach performs on a par with the state-of-the-art systems; a remarkable finding considering the complexity of the other solutions.

Table 6. ELQ results on the official ERD test platform (LTR-Greedy vs. the top-3 systems of the challenge).

Method         F1
LTR-Greedy     0.699
SMAPH-2 [6]    0.708
NTUNLP [5]     0.680
Seznam [10]    0.669

Based on these results, LTR-Greedy is our overall recommended method. We compare it against the top performers of the ERD challenge (using the official challenge platform) in Table 6. For this comparison, we added spell checking, as this has also been employed by the best performing system (SMAPH-2) [6]. Our LTR-Greedy approach performs on a par with the state-of-the-art systems, which is remarkable taking into account the simplicity of our disambiguation algorithm vs. the considerably more complex solutions employed by others.

Overall, our results reveal that candidate entity ranking is of higher importance than disambiguation for ELQ. Hence, it is more beneficial to perform the (expensive) supervised learning early on in the pipeline, for the seemingly easier CER step; disambiguation can then be tackled successfully with an unsupervised (greedy) algorithm. (Simply taking the top-ranked entity does not yield an immediate solution; as shown, disambiguation is an indispensable step in ELQ.)
FEATURE IMPORTANCE

[Figure: most important features used in the supervised approaches, sorted by Gini score. (Left) Candidate entity ranking: Matches, Commonness, MCT, SimQ-content, QCT, Sim, SimQ-wikiLink, TEM, LenRatio, Redirects, Links, Len, SimQ-abstract, SimQ-subject, SimQ-label. (Right) Disambiguation, for LTR-LTR and MLMcg-LTR: Score/avg, Score/min, Score/max, iRank/avg, iRank/min, Commonness/min, iRank/max, Commonness/avg, ContextSim/max, ContextSim/min, H, ContextSim/avg, P, LenRatioSet, SetSim.]
WE ASKED …

RQ1: If we are to allocate the available processing time between the two steps, which one would yield the highest gain?

Answer: Candidate entity ranking is of higher importance than disambiguation. It is more beneficial to perform the (expensive) supervised learning early on, and to tackle disambiguation with an unsupervised algorithm.
WE ASKED …

RQ2: Which group of features is needed the most for effective entity disambiguation: contextual similarity, interdependence between entities, or both?

Answer: Contextual similarity features are the most effective for entity disambiguation. Entity interdependencies are helpful when sufficiently many entities are mentioned in the text, which is not the case for queries.
Efficiency vs. Effectiveness
Code and resources at:
http://bit.ly/ecir2017-elq
Thank you!
Questions?
Entity Linking in Queries: Efficiency vs. Effectiveness

  • 1. Faegheh Hasibi, Krisztian Balog, Svein E. Bratsberg ECIR 2017 ENTITY LINKING IN QUERIES: Efficiency vs. Effectiveness IAI_group
  • 2. ENTITY LINKING TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER PHILIP K. DICK ‣ Identifying entities in the text and linking them to the corresponding entry in the knowledge base Total Recall is a 1990 American science fiction action film directed by Paul Verhoeven, starring Rachel Ticotin, Sharon Stone, Arnold Schwarzenegger, Ronny Cox and Michael Ironside. The film is loosely based on the Philip K. Dick story…
  • 3. TOTAL RECALL (1990 FILM) ARNOLD SCHWARZENEGGER ENTITY LINKING IN QUERIES (ELQ) total recall arnold schwarzenegger
  • 4. ENTITY LINKING IN QUERIES (ELQ): total recall arnold schwarzenegger → TOTAL RECALL (1990 FILM), ARNOLD SCHWARZENEGGER; the shorter query total recall is ambiguous → TOTAL RECALL (1990 FILM) or TOTAL RECALL (2012 FILM)
  • 5. ENTITY LINKING IN QUERIES (ELQ) ‣ Identifying sets of entity linking interpretations for a query [Carmel et al. 2014], [Hasibi et al. 2015], [Cornolti et al. 2016]
  • 6. CONVENTIONAL APPROACH: a three-step pipeline: (1) Mention detection → (2) Candidate Entity Ranking → (3) Disambiguation. For a query like "new york pizza manhattan", mention detection finds mentions m such as "new york" and "new york pizza"; candidate entity ranking scores mention-entity pairs s(m, e) over candidates such as NEW YORK, NEW YORK CITY for "new york" and NEW YORK-STYLE PIZZA, SICILIAN PIZZA for "new york pizza"; disambiguation forms interpretation sets, e.g. {(new york, NEW YORK), (manhattan, MANHATTAN)} and {(new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)}.
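To make the flow concrete, here is a minimal runnable sketch of the first two steps, using a toy surface-form dictionary with made-up commonness scores (all names and numbers are illustrative, not the paper's data):

SURFACE_FORMS = {  # toy surface-form dictionary: mention -> {entity: commonness}
    "new york": {"NEW_YORK": 0.7, "NEW_YORK_CITY": 0.2},
    "new york pizza": {"NEW_YORK-STYLE_PIZZA": 0.9},
    "manhattan": {"MANHATTAN": 0.8},
}

def detect_mentions(query):
    # all query n-grams that match the surface-form dictionary
    terms = query.split()
    found = []
    for i in range(len(terms)):
        for j in range(i + 1, len(terms) + 1):
            ngram = " ".join(terms[i:j])
            if ngram in SURFACE_FORMS:
                found.append(((i, j), ngram))
    return found

def rank_candidates(mentions):
    # toy candidate entity ranking: use commonness P(e|m) as the score
    pairs = [(span, e, p)
             for (span, m) in mentions
             for e, p in SURFACE_FORMS[m].items()]
    return sorted(pairs, key=lambda x: -x[2])

print(rank_candidates(detect_mentions("new york pizza manhattan")))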
  • 7. CONVENTIONAL APPROACH: incorporating different signals: ‣ Contextual similarity between a candidate entity and the text (or entity mention) ‣ Interdependence between all entity linking decisions, extracted from the underlying KB (a graph over the candidate entities)
  • 8. What is special about entity linking in queries? What is really different when it comes to queries?
  • 9. RESEARCH QUESTIONS: ELQ is an online process and must be done under strict time constraints. RQ1: If we are to allocate the available processing time between the two steps, which one would yield the highest gain?
  • 10. RESEARCH QUESTIONS: The context provided by queries is limited. RQ2: Which group of features is needed the most for effective entity linking: contextual similarity, interdependence between entities, or both?

  • 11–12. SYSTEMATIC INVESTIGATION: each of the two components is instantiated with an unsupervised and a supervised alternative: (1) Candidate Entity Ranking, methods: unsupervised (MLMcg) or supervised (LTR); (2) Disambiguation, methods: unsupervised (Greedy) or supervised (LTR).
  • 13. CANDIDATE ENTITY RANKING. Goal: (I) identify all possible entities that can be linked in the query, and (II) rank them based on how likely they are link targets. Given a mention m, an entity e, and a query q, estimate score(m, e, q). Candidate identification: lexical matching of query n-grams against a rich dictionary of entity name variants.
  • 14. UNSUPERVISED [MLMcg]: a generative model that ranks entities based on the query and the specific mention [Hasibi et al. 2015]. It combines the probability of the mention being linked to the entity with the query likelihood given the entity: score(m, e, q) = P(e|m) · P(q|e), where P(e|m) is the probability of the mention being linked to the entity (a.k.a. commonness), computed from the FACC collection. The query likelihood P(q|e) is estimated using the query length-normalized language model similarity: P(q|e) = ∏_{t∈q} P(t|θ_e)^{P(t|q)} / ∏_{t∈q} P(t|C)^{P(t|q)}, where P(t|q) = n(t, q)/|q| is the term's relative frequency in the query, and the entity and collection language models, P(t|θ_e) and P(t|C), are computed using the Mixture of Language Models (MLM) approach.
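As an illustration, the MLMcg score can be computed as in the sketch below. This is a minimal sketch assuming the commonness statistics and the (smoothed) term probabilities of the entity and collection language models are precomputed and passed in as dictionaries; it is not the authors' implementation.

import math

def mlm_cg_score(mention, entity, query_terms, commonness, p_entity, p_collection):
    # score(m, e, q) = P(e|m) * P(q|e)
    # commonness:   dict (mention, entity) -> P(e|m)
    # p_entity:     dict term -> P(t | theta_e), smoothed entity language model
    # p_collection: dict term -> P(t | C), collection language model
    log_ql = 0.0
    for t in set(query_terms):
        p_t_q = query_terms.count(t) / len(query_terms)  # P(t|q) = n(t,q)/|q|
        log_ql += p_t_q * (math.log(p_entity[t]) - math.log(p_collection[t]))
    return commonness.get((mention, entity), 0.0) * math.exp(log_ql)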
  • 15–16. SUPERVISED [LTR]: learning to rank over 28 features, mainly from the literature [Meij et al. 2012], [Medelyan et al. 2008], extracted based on the mention (M), the entity (E), the mention-entity pair (ME), and the query (Q):

Table 2. Feature set used for ranking entities, categorized into mention (M), entity (E), mention-entity (ME), and query (Q) features.
  Len(m): number of terms in the entity mention (M)
  NTEM(m)‡: number of entities whose title equals the mention (M)
  SMIL(m)‡: number of entities whose title equals part of the mention (M)
  Matches(m): number of entities whose surface form matches the mention (M)
  Redirects(e): number of redirect pages linking to the entity (E)
  Links(e): number of entity out-links in DBpedia (E)
  Commonness(e, m): likelihood of entity e being the target link of mention m (ME)
  MCT(e, m)‡: true if the mention contains the title of the entity (ME)
  TCM(e, m)‡: true if the title of the entity contains the mention (ME)
  TEM(e, m)‡: true if the title of the entity equals the mention (ME)
  Pos1(e, m): position of the 1st occurrence of the mention in the entity abstract (ME)
  SimMf(e, m)†: similarity between the mention and field f of the entity; Eq. (1) (ME)
  LenRatio(m, q): mention to query length ratio: |m|/|q| (Q)
  QCT(e, q): true if the query contains the title of the entity (Q)
  TCQ(e, q): true if the title of the entity contains the query (Q)
  TEQ(e, q): true if the title of the entity equals the query (Q)
  Sim(e, q): similarity between the query and the entity; Eq. (1) (Q)
  SimQf(e, q)†: LM similarity between the query and field f of the entity; Eq. (1) (Q)
  ‡ Entity title refers to the rdfs:label predicate of the entity in DBpedia.
  † Computed for all individual DBpedia fields f ∈ F and also for field "content".
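Several of these features are simple string tests and ratios. The sketch below, with names following Table 2, shows how a few of them could be computed; it is illustrative only, and the field-based similarities (SimMf, SimQf) would additionally require a fielded entity index.

def cer_surface_features(mention, entity_title, query):
    # boolean and ratio features from Table 2 (M, ME, and Q groups)
    m, t, q = mention.lower(), entity_title.lower(), query.lower()
    return {
        "Len":      len(m.split()),               # terms in the mention
        "TEM":      t == m,                       # title equals mention
        "MCT":      t in m,                       # mention contains title
        "TCM":      m in t,                       # title contains mention
        "TEQ":      t == q,                       # title equals query
        "QCT":      t in q,                       # query contains title
        "TCQ":      q in t,                       # title contains query
        "LenRatio": len(m.split()) / len(q.split()),
    }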
  • 17. TEST COLLECTIONS ‣ Y-ERD • Based on the Yahoo Search Query Log to Entities dataset • 2398 queries ‣ ERD-dev • Released as part of the ERD challenge 2014 • 91 queries [Carmel et al. 2014], [Hasibi et al. 2015]
  • 18–19. RESULTS: candidate entity ranking. Both methods (MLMcg and LTR) are able to find the vast majority of the relevant entities and return them at the top ranks.

Table 4. Candidate entity ranking results on the Y-ERD and ERD-dev datasets (significance for line i > 1 is tested against lines 1..i−1).
  Method | Y-ERD MAP / R@5 / P@1     | ERD-dev MAP / R@5 / P@1
  MLM    | 0.7507 / 0.8556 / 0.6839 | 0.7675 / 0.8622 / 0.7333
  CMNS   | 0.7831 / 0.8230 / 0.7779 | 0.7037 / 0.7222 / 0.7556
  MLMcg  | 0.8536 / 0.8997 / 0.8280 | 0.8543 / 0.9015 / 0.8444
  LTR    | 0.8667 / 0.9022 / 0.8479 | 0.8606 / 0.9289 / 0.8222

Table 5. End-to-end performance of ELQ systems on the Y-ERD and ERD-dev query sets (significance for line i > 1 is tested against lines 1..i−1).
  Method       | Y-ERD Prec / Recall / F1 / Time | ERD-dev Prec / Recall / F1 / Time
  MLMcg-Greedy | 0.709 / 0.709 / 0.709 / 0.058   | 0.724 / 0.712 / 0.713 / 0.085
  MLMcg-LTR    | 0.725 / 0.724 / 0.724 / 0.893   | 0.725 / 0.731 / 0.728 / 1.185
  LTR-LTR      | 0.731 / 0.732 / 0.731 / 0.881   | 0.758 / 0.748 / 0.753 / 1.185
  LTR-Greedy   | 0.786 / 0.787 / 0.787 / 0.382   | 0.852 / 0.828 / 0.840 / 0.423
  • 20. SYSTEMATIC INVESTIGATION: focus now shifts to step (2). (1) Candidate Entity Ranking: unsupervised (MLMcg) or supervised (LTR); (2) Disambiguation: unsupervised (Greedy) or supervised (LTR).
  • 21. DISAMBIGUATION. Goal: identify interpretation set(s) Ei = {(m1, e1), ..., (mk, ek)} where the mentions of each set are non-overlapping. Example interpretation sets: {(new york, NEW YORK), (manhattan, MANHATTAN)} and {(new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)}.
  • 22. UNSUPERVISED [Greedy]: Greedy Interpretation Finding (GIF). (I) Pruning of mention-entity pairs: discard pairs with score below the threshold τs; for containment mentions, keep only the highest-scoring one. (II) Set generation: add mention-entity pairs, in decreasing order of score, to every interpretation set whose mentions they do not overlap; open a new set when none fits.

Algorithm 1: Greedy Interpretation Finding (GIF)
  Input: ranked list of mention-entity pairs M; score threshold τs
  Output: interpretations I = {E1, ..., Em}
  1: M′ ← Prune(M, τs)
  2: M′ ← PruneContainmentMentions(M′)
  3: I ← CreateInterpretations(M′)
  4: return I

  function CreateInterpretations(M):
    I ← {∅}
    for (m, e) in M:
      h ← 0
      for E in I:
        if ¬hasOverlap(E, (m, e)):
          E.add((m, e)); h ← 1
      if h = 0:
        I.add({(m, e)})
    return I
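The pseudocode translates almost directly to Python. Below is a minimal sketch assuming mentions are (start, end) term offsets and the input is already sorted by descending score; the helper names are illustrative, not the authors' code.

def overlaps(a, b):
    # two mention spans overlap if neither ends before the other starts
    return a[0] < b[1] and b[0] < a[1]

def contains(outer, inner):
    # strict containment of one mention span in another
    return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]

def gif(ranked_pairs, tau_s):
    # ranked_pairs: [((start, end), entity, score)], sorted by descending score
    # (I) prune by score threshold ...
    pruned = [(m, e, s) for (m, e, s) in ranked_pairs if s >= tau_s]
    # ... and for containment mentions keep only the highest-scoring one
    kept = []
    for m, e, s in pruned:
        if not any(contains(m, m2) or contains(m2, m) for (m2, _, _) in kept):
            kept.append((m, e, s))
    # (II) add each pair to every interpretation it does not overlap;
    # start a new interpretation if none accepts it
    interpretations = []
    for m, e, _ in kept:
        placed = False
        for E in interpretations:
            if not any(overlaps(m, m2) for (m2, _) in E):
                E.append((m, e))
                placed = True
        if not placed:
            interpretations.append([(m, e)])
    return interpretations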
  • 23. SUPERVISED: collective disambiguation. (I) Generate all interpretations from the top-K mention-entity pairs, e.g. NEW YORK, NEW YORK CITY, SYRACUSE, NEW YORK, ALBANY, NEW YORK for "new york"; NEW YORK-STYLE PIZZA for "new york pizza"; MANHATTAN, MANHATTAN (FILM) for "manhattan". (II) Collectively select the interpretations with a binary classifier, accepting sets such as {(new york, NEW YORK), (manhattan, MANHATTAN)} and {(new york pizza, NEW YORK-STYLE PIZZA), (manhattan, MANHATTAN)} (✓) and rejecting the others (✗).
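Step (I) can be sketched as exhaustive enumeration over the top-K pairs, keeping only sets whose mentions do not overlap (same (start, end) span assumption as above; the binary classifier of step (II) is not shown):

from itertools import combinations

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def all_interpretations(pairs, k=5):
    # pairs: [((start, end), entity, score)]; keep the top-K by score
    top = sorted(pairs, key=lambda p: -p[2])[:k]
    candidate_sets = []
    for r in range(1, len(top) + 1):
        for combo in combinations(top, r):
            spans = [m for (m, _, _) in combo]
            if all(not overlaps(a, b) for a, b in combinations(spans, 2)):
                candidate_sets.append([(m, e) for (m, e, _) in combo])
    return candidate_sets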
  • 24–25. SUPERVISED: features for the binary classifier (Table 3). Set-based features are computed for the entire interpretation set; entity-based features are individual entity features, aggregated (min/max/avg) over all entities in the set. Type is either query-dependent (QD) or query-independent (QI).

Table 3. Feature set used in the supervised disambiguation approach.
Set-based features:
  CommonLinks(E): number of common links in DBpedia: |∩_{e∈E} out(e)| (QI)
  TotalLinks(E): number of distinct links in DBpedia: |∪_{e∈E} out(e)| (QI)
  JKB(E): Jaccard similarity based on DBpedia: CommonLinks(E)/TotalLinks(E) (QI)
  Jcorpora(E)‡: Jaccard similarity based on FACC: |∩_{e∈E} doc(e)| / |∪_{e∈E} doc(e)| (QI)
  RelMW(E)‡: relatedness similarity [25] according to FACC (QI)
  P(E): co-occurrence probability based on FACC: |∩_{e∈E} doc(e)| / TotalDocs (QI)
  H(E): entropy of E: −P(E) log P(E) − (1−P(E)) log(1−P(E)) (QI)
  Completeness(E)†: completeness of set E as a graph: |edges(G_E)| / |edges(K_|E|)| (QI)
  LenRatioSet(E, q)§: ratio of mention lengths to the query length: Σ_{e∈E} |m_e| / |q| (QD)
  SetSim(E, q): similarity between the query and the entities in the set; Eq. (2) (QD)
Entity-based features:
  Links(e): number of entity out-links in DBpedia (QI)
  Commonness(e, m): likelihood of entity e being the target link of mention m (QD)
  Score(e, q): entity ranking score, obtained from the CER step (QD)
  iRank(e, q): inverse of rank, obtained from the CER step: 1/rank(e, q) (QD)
  Sim(e, q): similarity between the query and the entity; Eq. (1) (QD)
  ContextSim(e, q): contextual similarity between the query and the entity; Eq. (3) (QD)
  ‡ doc(e) represents all documents that have a link to entity e.
  † G_E is a DBpedia subgraph containing only entities from E; K_|E| is the complete graph on |E| vertices.
  § m_e denotes the mention that corresponds to entity e.
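A minimal sketch of a few of the query-independent set-based features, assuming the DBpedia out-link set of each entity in E is available as a Python set (names mirror Table 3; not the authors' implementation):

import math

def kb_set_features(outlinks):
    # outlinks: list of out-link sets, one per entity in the interpretation set E
    common = set.intersection(*outlinks)
    total = set.union(*outlinks)
    return {
        "CommonLinks": len(common),
        "TotalLinks": len(total),
        "JKB": len(common) / len(total) if total else 0.0,
    }

def entropy(p_E):
    # H(E) from the co-occurrence probability P(E)
    if p_E <= 0.0 or p_E >= 1.0:
        return 0.0
    return -p_E * math.log(p_E) - (1 - p_E) * math.log(1 - p_E)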
  • 26. RESULTS: disambiguation (Table 5 above). For candidate entity ranking, commonness is a strong performer (in line with the findings of [1, 16]); combining commonness with MLM in a generative model (MLMcg) delivers excellent performance, with MAP above 0.85 and R@5 around 0.9, and LTR brings further slight (for Y-ERD significant) improvements. End to end, LTR-Greedy significantly outperforms the other approaches on both test sets and is the second most efficient one.
  • 27. RESULTS: comparison with the top-3 performers of the ERD challenge, using the official ERD test platform. For this comparison the method is complemented with spell checking, as is also done by the best performing system (SMAPH-2) [6].

Table 6. ELQ results on the official ERD test platform (F1).
  LTR-Greedy 0.699 | SMAPH-2 [6] 0.708 | NTUNLP [5] 0.680 | Seznam [10] 0.669

The LTR-Greedy approach performs on a par with the state-of-the-art systems, a remarkable finding considering the simplicity of its disambiguation algorithm versus the considerably more complex solutions employed by others. Overall, the results reveal that candidate entity ranking is of higher importance for ELQ: it is more beneficial to spend the (expensive) supervised learning early in the pipeline, on the seemingly easier CER step, while disambiguation can be tackled successfully with an unsupervised (greedy) algorithm. (Simply taking the top-ranked entity does not yield an immediate solution; disambiguation remains an indispensable step in ELQ.)
  • 28. FEATURE IMPORTANCE: top features used in the supervised approaches, sorted by Gini score. (Left) Candidate entity ranking (LTR): Matches, Commonness, MCT, SimQ-content, QCT, Sim, SimQ-wikiLink, TEM, LenRatio, Redirects, Links, Len, SimQ-abstract, SimQ-subject, SimQ-label. (Right) Disambiguation (LTR-LTR and MLMcg-LTR): Score/avg, Score/min, Score/max, iRank/avg, iRank/min, Commonness/min, iRank/max, Commonness/avg, ContextSim/max, ContextSim/min, H, ContextSim/avg, P, LenRatioSet, SetSim.
  • 29. WE ASKED … RQ1: If we are to allocate the available processing time between the two steps, which one would yield the highest gain? Answer: Candidate entity ranking is of higher importance than disambiguation. It is more beneficial to perform the (expensive) supervised learning early on, and to tackle disambiguation with an unsupervised algorithm.
  • 30. WE ASKED … RQ2: Which group of features is needed the most for effective entity disambiguation: contextual similarity, interdependence between entities, or both? Answer: Contextual similarity features are the most effective for entity disambiguation. Entity interdependence features are helpful when sufficiently many entities are mentioned in the text, which is not the case for queries.
  • 32. Code and resources at: http://bit.ly/ecir2017-elq Thank you! Questions?