On the Reproducibility of the TAGME entity linking system

Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg
ECIR conference, March 2016

Entity linking
British expatriates brought football to Argentina in the 19th century. The rivalry between the England and Argentina national football teams, however, is generally traced back to the 1966 FIFA World Cup.
Linked entities: England national football team, Argentina national football team, 1966 FIFA World Cup

TAGME
• Cited:
‣ > 50 times according to the ACM DL
‣ > 200 times according to Google Scholar
• One of the must-have baselines
• Delivers competitive results
‣ The 1st and 2nd ranked systems of the ERD challenge used TAGME
- P. Ferragina and U. Scaiella. TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities). In Proc. of CIKM ’10, pages 1625–1628, 2010.
- P. Ferragina and U. Scaiella. Fast and accurate annotation of short texts with Wikipedia pages. CoRR, abs/1006.3498, 2010.

TAGME
Offers invaluable resources for reproducibility:
• Test collections
• Source code
• RESTful API
• GUI demo

In this talk …
✓ Repeatability: “Repeating a previous result under the original conditions (e.g., same dataset and system configuration).”
✓ Reproducibility: “Reproducing a previous result under different, but comparable conditions (e.g., different, but comparable dataset).”
✓ Generalizability: “Applying an existing, empirically validated technique to a different IR task/domain than the original.”
J. Arguello, F. Diaz, J. Lin, and A. Trotman. SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR). In Proc. of SIGIR '15, pages 1147–1148.

Question
What is the point of reproducibility experiments when the source code is made available?
• Does the code actually implement what is described in the paper?
• We need to (re)implement the entity linking method
• Integrating it into a larger framework
• Making a (fair) comparison between different entity linking approaches

Our goal is …
• Learn about reproducibility
• Verification
• Criticism

Agenda
• Overview of TAGME
• Repeatability
• Reproducibility
• Generalizability
• Lessons to be learned

Approach
Pipeline: Text → Parsing → Disambiguation → Pruning → Annotated text
Text: British expatriates brought football to Argentina in the 19th century. The rivalry between the England and Argentina national …
Parsing (candidate entities for the mention “England”):
- England
- England National Football Team
- England Cricket Team
…
Disambiguation and pruning (mention: entity):
British: United Kingdom
expatriates: Expatriate
century: Century (song)
England: England
…
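
The three stages can be sketched as a chain of small functions. This is only a toy illustration, not TAGME's actual code: the `CANDIDATES` dictionary, its scores, the single-token mention spotting, the "pick the highest score" disambiguation, and the default threshold are all assumptions made up for the example.

```python
# Toy mention dictionary: mention -> {candidate entity: score}. All scores are made up.
CANDIDATES = {
    "british": {"United Kingdom": 0.9},
    "expatriates": {"Expatriate": 0.8},
    "century": {"Century (song)": 0.1},
    "england": {
        "England": 0.6,
        "England National Football Team": 0.3,
        "England Cricket Team": 0.1,
    },
}

def parse(text):
    """Parsing: spot known mentions and attach their candidate entities."""
    tokens = [t.strip(".,").lower() for t in text.split()]
    return {t: CANDIDATES[t] for t in tokens if t in CANDIDATES}

def disambiguate(spots):
    """Disambiguation: keep one (entity, score) per mention.
    Picking the highest score is a stand-in, not TAGME's voting scheme."""
    return {m: max(cands.items(), key=lambda kv: kv[1]) for m, cands in spots.items()}

def prune(annotations, threshold):
    """Pruning: drop annotations whose score falls below the threshold."""
    return {m: entity for m, (entity, score) in annotations.items() if score >= threshold}

def annotate(text, threshold=0.2):
    return prune(disambiguate(parse(text)), threshold)

text = ("British expatriates brought football to Argentina in the 19th century. "
        "The rivalry between the England and Argentina national football teams")
result = annotate(text)
print(result)
```

With these made-up scores the low-scoring "century" annotation is dropped by the pruning step, while the other three mentions keep their top candidate.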

Test collections
• Wiki-Disamb30
‣ For evaluating disambiguation phase
‣ Each snippet is linked to a single entity
• Wiki-Annot30
‣ For evaluating end-to-end performance
‣ All entity mentions are annotated 
☞ The number of snippets deviates from what is reported in the paper

#Snippets        Original collection   Reported in paper
Wiki-Disamb30    2M                    1.4M
Wiki-Annot30     185K                  180K

Repeatability
Repeating previous results under the original conditions

Repeatability challenges
• Unavailability of the Wikipedia dump from Nov. 2009
‣ Could not be provided by the TAGME authors
• Unavailability of the training and test set splits
• Discrepancies in the number of snippets

Repeatability
TAGME results are not repeatable due to unavailability of data
Post-acceptance responses:
‣ Weka could load only 1.4M snippets of Wiki-Disamb30
‣ The whole Wiki-Annot30 is used; the difference is a matter of approximation

Reproducibility
Reproducing results under different, but comparable conditions

Reproducibility
TAGME paper results are compared with:
• TAGME API (similar to running the source code)
• Dexter's implementation of TAGME
• Our implementation
D. Ceccarelli, C. Lucchese, S. Orlando, R. Perego, and S. Trani. Dexter: An open source framework for entity linking. In Proc. of the Sixth International Workshop on Exploiting Semantic Annotations in Information Retrieval, pages 17–20, 2013.

Implementation
• The implementation is based on the paper
• Whenever in doubt, we checked the source code
• The closest available Wikipedia dump: April 2010

Implementation
Link probability:
(Number of times mention m appears as a link) / (Number of times mention m occurs in Wikipedia, as a link or not)
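
A minimal sketch of this ratio over a toy corpus. The input format (one `(mention, is_link)` pair per occurrence) and the function name are assumptions chosen for illustration; a real implementation would extract these occurrences from a Wikipedia dump.

```python
from collections import Counter

def link_probability(occurrences):
    """Occurrence-level link probability for every mention.

    occurrences: iterable of (mention, is_link) pairs, one per occurrence
    of the mention anywhere in the corpus.
    """
    linked, total = Counter(), Counter()
    for mention, is_link in occurrences:
        total[mention] += 1
        if is_link:
            linked[mention] += 1
    return {m: linked[m] / total[m] for m in total}

# Made-up counts: "england" occurs 4 times, 2 of them as a link.
occurrences = [
    ("england", True), ("england", False),
    ("england", False), ("england", True),
    ("century", False),
]
lp = link_probability(occurrences)
print(lp)  # england: 2/4, century: 0/1
```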

Implementation
Link probability:
‣ For efficiency reasons, TAGME makes estimations
‣ Denominator ➝ Number of articles containing the mention m
‣ Numerator ≈ Number of articles in which mention m is linked to an entity
(Wikipedia creates a link for the first occurrence of an entity)

Implementation
Link probability:
‣ In fact, TAGME implements Keyphraseness:
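
Reading the estimates on the preceding slides as an article-level ratio (articles in which m is linked, divided by articles containing m), the difference from the occurrence-level link probability can be sketched as follows. The data layout and function names are hypothetical, and the counts are made up.

```python
def keyphraseness(articles, mention):
    """Article-level ratio: articles where the mention is linked at least once,
    divided by articles that contain the mention at all."""
    containing = linked = 0
    for article in articles:
        flags = [is_link for m, is_link in article if m == mention]
        if flags:
            containing += 1
            if any(flags):
                linked += 1
    return linked / containing if containing else 0.0

def occurrence_link_probability(articles, mention):
    """Occurrence-level ratio: linked occurrences divided by all occurrences."""
    flags = [is_link for article in articles for m, is_link in article if m == mention]
    return sum(flags) / len(flags) if flags else 0.0

# Each article is a list of (mention, is_link) occurrences.
articles = [
    [("england", True), ("england", False), ("england", False)],  # linked once, then repeated
    [("england", False)],                                         # never linked
]
kp = keyphraseness(articles, "england")
lp = occurrence_link_probability(articles, "england")
print(kp, lp)
```

Because only the first occurrence in an article tends to be linked, the occurrence-level ratio is pulled down by repeated unlinked occurrences, while the article-level ratio is not.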

Implementation
Relatedness:
• Defined as: [formula]
• Implemented as: [formula]
Pruning based on commonness:
• TAGME performs an extra pruning in the parsing step
• We followed TAGME, as it makes the system considerably faster
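
A sketch of what commonness-based pruning of candidates could look like, assuming the usual definition of commonness (the fraction of a mention's links that point to a given entity). The function names, the counts, and the `min_commonness` value are illustrative assumptions, not values taken from TAGME.

```python
def commonness(link_counts):
    """link_counts: {entity: number of times the mention links to that entity}."""
    total = sum(link_counts.values())
    return {e: c / total for e, c in link_counts.items()} if total else {}

def prune_candidates(link_counts, min_commonness):
    """Keep only candidate entities whose commonness reaches the threshold."""
    return {e: p for e, p in commonness(link_counts).items() if p >= min_commonness}

# Made-up link counts for the mention "England".
counts = {
    "England": 900,
    "England National Football Team": 90,
    "England Cricket Team": 10,
}
kept = prune_candidates(counts, min_commonness=0.05)  # hypothetical threshold
print(kept)
```

Dropping rare candidates this early means fewer entity pairs have to be scored during disambiguation, which is where the speed-up would come from.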

Table 1
Results to be reproduced:
Approach:
• Submit Wiki-Disamb30 snippets to the TAGME API
• Set the pruning threshold to 0

Table 1 - Evaluation metrics
Several questions are left unanswered:
• Are the metrics micro- or macro-averaged?
• What are the matching criteria for the mentions?
‣ E.g., does “New York City” match “New York”?
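
Why the averaging question matters can be seen in a small sketch. The function name and the per-snippet counts are made up, and skipping snippets with no returned annotations in the macro average is one possible convention, not necessarily the one used by TAGME.

```python
def micro_macro_precision(per_snippet):
    """per_snippet: list of (num_correct, num_returned) pairs, one per snippet."""
    total_correct = sum(c for c, _ in per_snippet)
    total_returned = sum(r for _, r in per_snippet)
    # Micro: pool all annotations, then divide once.
    micro = total_correct / total_returned if total_returned else 0.0
    # Macro: compute precision per snippet, then average
    # (snippets with no returned annotations are skipped here).
    scored = [c / r for c, r in per_snippet if r]
    macro = sum(scored) / len(scored) if scored else 0.0
    return micro, macro

# Snippet 1: 1 of 1 correct. Snippet 2: 1 of 4 correct.
micro, macro = micro_macro_precision([(1, 1), (1, 4)])
print(micro, macro)
```

The same system output yields a micro-averaged precision of 2/5 and a macro-averaged precision of (1 + 1/4)/2, so a reported number cannot be compared without knowing which average was used.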

Table 1 - Evaluation metrics
We computed the upper bound:
• If any of the entities matches the ground truth:
‣ Precision = 1, Recall = 1
• Otherwise:
‣ Precision = 0, Recall = 0
Any other interpretation of precision or recall would result in a lower number.
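
The upper-bound scoring rule can be sketched as below. The input format and the averaging over snippets are assumptions made for the example; the slide only specifies the per-snippet rule.

```python
def upper_bound_scores(snippets):
    """snippets: list of (returned_entities, gold_entity) pairs.

    A snippet scores precision = recall = 1 if any returned entity
    matches the ground truth, and 0 otherwise.
    """
    hits = [1.0 if gold in returned else 0.0 for returned, gold in snippets]
    score = sum(hits) / len(hits) if hits else 0.0
    return score, score  # precision and recall coincide under this rule

# Made-up system output for three snippets.
snippets = [
    ({"England", "United Kingdom"}, "England"),         # hit
    ({"Century (song)"}, "Century"),                    # miss
    ({"Argentina national football team"}, "Argentina national football team"),  # hit
]
precision, recall = upper_bound_scores(snippets)
print(precision, recall)
```

Since a snippet counts as fully correct as soon as one returned entity matches, extra wrong annotations are never penalised, which is why this gives an upper bound.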

Table 1 - Results
Reproducing the disambiguation phase:
Given the magnitude of the differences, even against TAGME's own API, we did not go on to obtain results for our own implementation.

Table 1 - Results
Our initial guess:
• The discrepancy in the number of snippets caused the differences
‣ TAGME performs extra (undocumented) filtering before pruning
‣ The computation of the evaluation metrics is explained

Table 2
Results to be reproduced:

Table 2 - Results
TAGME results are reproducible through its own API.

Table 2 - Results
‣ TAGME uses wiki page-to-page link records, while our (and Dexter’s) implementation extracts links from the body of the pages.
‣ The TAGME API and source code correspond to a newer version (v.2)
‣ Several optimizations have been performed in v.2
‣ The evaluation metrics are micro-averaged

Generalizability
Applying an existing technique to a different IR task/domain

Entity Linking in Queries (ELQ)

Query                         Entity linking                Entity linking in queries
“new york pizza manhattan”    {New York City, Manhattan}    {New York City, Manhattan}, {New York-style Pizza, Manhattan}
“cambridge population”        {Cambridge}                   {Cambridge}, {Cambridge, Massachusetts}

- F. Hasibi, K. Balog, and S. E. Bratsberg. Entity Linking in Queries: Tasks and Evaluation. In Proc. of ICTIR ’15, pages 171–180, 2015.
- D. Carmel, M.-W. Chang, E. Gabrilovich, B.-J. P. Hsu, and K. Wang. ERD’14: Entity recognition and disambiguation challenge. SIGIR Forum, 48(2):63–77, 2014.

Why ELQ?
TAGME has great potential to be used for ELQ
✓ Designed to operate with short texts
✓ On-the-fly annotation

Generalizability
TAGME results are generalizable to the task of
entity linking in queries.
TAGME API > Dexter > TAGME-wp12 > TAGME-wp10

Lessons learned 1/2
• All technical details that affect performance should
be mentioned in the paper
• Differences between the published approach and the publicly available API/code should be made explicit

Lessons learned 2/2
• Evaluation metrics should be explained in detail
• Keep all data sources used in a published paper
Maintain an “online appendix” to a publication
• Extra details can be explained there
• Can be easily edited and extended

Thanks!
Questions?
Check our online appendix
http://bit.ly/tagme-rep
