On the Reproducibility of the TAGME entity linking system

Slides for the ECIR '16 paper: “On the Reproducibility of the TAGME Entity Linking System”

Reproducibility is a fundamental requirement of scientific research. In this paper, we examine the repeatability, reproducibility, and generalizability of TAGME, one of the most popular entity linking systems. By comparing results obtained from its public API with (re)implementations from scratch, we obtain the following findings. The results reported in the TAGME paper cannot be repeated due to the unavailability of data sources. Some of the results are reproducible through the provided API, while the rest are not. We further show that the TAGME approach is generalizable to the task of entity linking in queries. Finally, we provide insights gained during this process and formulate lessons learned to inform future reproducibility efforts.

  1. On the Reproducibility of the TAGME Entity Linking System. Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg. ECIR conference, March 2016
  2. Entity linking: "British expatriates brought football to Argentina in the 19th century. The rivalry between the England and Argentina national football teams, however, is generally traced back to the 1966 FIFA World Cup." Linked entities: England national football team, 1966 FIFA World Cup, Argentina national football team
  3. TAGME
  4. TAGME • Cited: ‣ >50 times according to the ACM DL ‣ >200 times according to Google Scholar • One of the must-have baselines • Delivers competitive results ‣ The 1st- and 2nd-ranked systems of the ERD challenge used TAGME - P. Ferragina and U. Scaiella. TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities). In Proc. of CIKM ’10, pages 1625–1628, 2010. - P. Ferragina and U. Scaiella. Fast and accurate annotation of short texts with Wikipedia pages. CoRR, abs/1006.3498, 2010.
  5. TAGME offers invaluable resources for reproducibility: • Test collections • Source code • RESTful API • GUI demo
  6. In this talk … ✓ Repeatability ✓ Reproducibility ✓ Generalizability - J. Arguello, F. Diaz, J. Lin, and A. Trotman. SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR). In Proc. of SIGIR '15, pages 1147–1148.
  7. In this talk … ✓ Repeatability: “Repeating a previous result under the original conditions (e.g., same dataset and system configuration).” ✓ Reproducibility: “Reproducing a previous result under different, but comparable conditions (e.g., different, but comparable dataset).” ✓ Generalizability: “Applying an existing, empirically validated technique to a different IR task/domain than the original.” - J. Arguello, F. Diaz, J. Lin, and A. Trotman. SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR). In Proc. of SIGIR '15, pages 1147–1148.
  8. Question: What is the point of reproducibility experiments when the source code is made available? • Does the code actually implement what is described in the paper? • We need to (re)implement the entity linking method • Integrating it into a larger framework • Making a (fair) comparison between different entity linking approaches
  9. Our goal is … • Learn about reproducibility • Verification • Criticism
  10. Agenda • Overview of TAGME • Repeatability • Reproducibility • Generalizability • Lessons to be learned
  11. Agenda • Overview of TAGME • Repeatability • Reproducibility • Generalizability • Lessons to be learned
  12. Approach: Text ➝ Parsing ➝ Disambiguation ➝ Pruning ➝ Annotated text. Example text: "British expatriates brought football to Argentina in the 19th century. The rivalry between the England and Argentina national …" • Parsing (candidates for "England"): England, England National Football Team, England Cricket Team … • Disambiguation: British: United Kingdom; expatriates: Expatriate; century: Century (song); England: England … • Pruning: British: United Kingdom; expatriates: Expatriate; century: Century (song); England: England …
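The disambiguation step can be illustrated with a small sketch of the voting scheme described in the TAGME paper: every other mention votes for a candidate entity, with the relatedness between entities weighted by commonness, and among the candidates whose total vote is close to the best one, the most common entity is chosen. This is not TAGME's code; the data layout, the toy relatedness function, the commonness values, and the way the tolerance `eps` is applied are all assumptions made for illustration.

```python
from typing import Callable, Dict

# Hypothetical layout: mention -> {candidate entity: commonness Pr(e | m)}
Candidates = Dict[str, Dict[str, float]]
Relatedness = Callable[[str, str], float]


def vote(entity_a: str, mention_b: str, cands: Candidates, rel: Relatedness) -> float:
    """Vote of mention_b for entity_a: mean over b's candidates of rel * commonness."""
    cands_b = cands[mention_b]
    if not cands_b:
        return 0.0
    return sum(rel(entity_a, e_b) * p for e_b, p in cands_b.items()) / len(cands_b)


def disambiguate(cands: Candidates, rel: Relatedness, eps: float = 0.3) -> Dict[str, str]:
    """Assign one entity per mention using the summed votes of all other mentions."""
    links = {}
    for m_a, cands_a in cands.items():
        if not cands_a:
            continue
        score = {
            e_a: sum(vote(e_a, m_b, cands, rel) for m_b in cands if m_b != m_a)
            for e_a in cands_a
        }
        best = max(score.values())
        # Assumption: keep candidates whose score is within eps (relative) of the best ...
        shortlist = [e for e, s in score.items() if s >= (1 - eps) * best]
        # ... and among them pick the one with the highest commonness.
        links[m_a] = max(shortlist, key=lambda e: cands_a[e])
    return links


# Toy relatedness (hypothetical): football entities are strongly related to each other.
FOOTBALL = {"England national football team", "Argentina national football team", "1966 FIFA World Cup"}


def toy_rel(a: str, b: str) -> float:
    return 1.0 if a in FOOTBALL and b in FOOTBALL else 0.1


candidates = {
    "England": {"England": 0.6, "England national football team": 0.3},
    "Argentina": {"Argentina": 0.7, "Argentina national football team": 0.2},
    "1966 FIFA World Cup": {"1966 FIFA World Cup": 0.9},
}
print(disambiguate(candidates, toy_rel))
```

With these toy numbers, both country mentions resolve to the national football teams even though the country entities have higher commonness, because the World Cup mention votes strongly for the football interpretations.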
  13. Test collections • Wiki-Disamb30 ‣ For evaluating the disambiguation phase ‣ Each snippet is linked to a single entity • Wiki-Annot30 ‣ For evaluating end-to-end performance ‣ All entity mentions are annotated
 ☞ The number of snippets deviates from what is reported in the paper. #Snippets (Original / Paper): • Wiki-Disamb30: 2M / 1.4M • Wiki-Annot30: 185K / 180K
  14. Repeatability: Repeating previous results under the original conditions
  15. Repeatability challenges • Unavailability of the Wikipedia dump from Nov. 2009 ‣ Could not be provided by the TAGME authors • Unavailability of the training and test set splits • Discrepancies in the number of snippets
  16. Repeatability: TAGME results are not repeatable due to the unavailability of data. Post-acceptance responses: ‣ Weka could load only 1.4M snippets of Wiki-Disamb30 ‣ The whole of Wiki-Annot30 is used; the difference is a matter of approximation
  17. Reproducibility: Reproducing results under different, but comparable conditions
  18. Reproducibility: TAGME paper results are compared with: • TAGME API (similar to running the source code) • Dexter's implementation of TAGME • Our implementation - D. Ceccarelli, C. Lucchese, S. Orlando, R. Perego, and S. Trani. Dexter: An open source framework for entity linking. In Proc. of the Sixth International Workshop on Exploiting Semantic Annotations in Information Retrieval, pages 17–20, 2013.
  19. Implementation • The implementation is based on the paper • Whenever in doubt: check the source code • The closest available Wikipedia dump: April 2010
  20. Implementation. Link probability: lp(m) = link(m) / freq(m), where link(m) is the number of times mention m appears as a link and freq(m) is the number of times m occurs in Wikipedia (as a link or not)
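The definition can be sketched directly by counting occurrences. This is a minimal illustration on a hypothetical two-article corpus, assuming each article is given as plain text plus the list of anchor texts it links; occurrences are counted with naive substring matching, which a real implementation would replace with proper tokenization.

```python
from typing import List, Tuple

# Hypothetical corpus: (article text, anchor texts that are linked in the article)
Corpus = List[Tuple[str, List[str]]]


def link_probability(mention: str, corpus: Corpus) -> float:
    """lp(m) = times m appears as a link / times m occurs (as a link or not)."""
    link_count = sum(anchors.count(mention) for _, anchors in corpus)
    # Naive substring count; "New England" would also match "England".
    occurrences = sum(text.count(mention) for text, _ in corpus)
    return link_count / occurrences if occurrences else 0.0


corpus = [
    ("England beat Argentina. England won the cup.", ["England"]),
    ("The England squad trained.", []),
]
print(link_probability("England", corpus))  # 1 link / 3 occurrences
```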
  21. Implementation. Link probability: ‣ For efficiency reasons, TAGME makes estimations: number of times m occurs in Wikipedia ➝ number of articles containing the mention m
  22. Implementation. Link probability: ‣ For efficiency reasons, TAGME makes estimations: number of times m occurs in Wikipedia ➝ number of articles containing the mention m; number of times m appears as a link ≈ number of articles in which m is linked to an entity (Wikipedia creates a link for the first occurrence of an entity)
  23. Implementation. Link probability: ‣ For efficiency reasons, TAGME makes estimations ‣ In fact, TAGME implements keyphraseness:
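The article-level estimate (keyphraseness) can be sketched the same way, counting articles instead of occurrences. The corpus layout, the function name, and the naive substring test are assumptions for illustration. On the toy corpus below the mention occurs three times and is linked once, so the occurrence-based ratio would be 1/3, while the article-based estimate is 1/2.

```python
from typing import List, Set, Tuple

# Hypothetical corpus: (article text, set of anchor texts linked in the article)
Corpus = List[Tuple[str, Set[str]]]


def keyphraseness(mention: str, corpus: Corpus) -> float:
    """Articles in which m is linked / articles in which m occurs at all."""
    containing = [anchors for text, anchors in corpus if mention in text]
    if not containing:
        return 0.0
    linked = sum(1 for anchors in containing if mention in anchors)
    return linked / len(containing)


corpus = [
    ("England beat Argentina. England won the cup.", {"England"}),
    ("The England squad trained.", set()),
]
print(keyphraseness("England", corpus))
```

Counting each article once means repeated unlinked occurrences of a mention within the same article no longer lower its score.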
  24. Implementation. Relatedness: • Defined as:
  25. Implementation. Relatedness: • Defined as: • Implemented as:
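For orientation, relatedness between two entities can be illustrated with the Milne and Witten link-based measure, which the TAGME paper builds on; it compares the sets of Wikipedia articles linking to each entity (in-links). Whether this sketch matches the "defined" or the "implemented" variant on the slide is not established here; returning 0 when there are no shared in-links and clipping negative values to 0 are assumptions of this sketch.

```python
import math
from typing import Set


def mw_relatedness(in_a: Set[str], in_b: Set[str], num_articles: int) -> float:
    """Milne-Witten relatedness of two entities from their in-link sets.

    rel = 1 - (log max(|A|,|B|) - log |A & B|) / (log W - log min(|A|,|B|)),
    where W is the total number of Wikipedia articles.
    """
    common = len(in_a & in_b)
    if common == 0:
        return 0.0  # assumption: no shared in-links means unrelated
    big = max(len(in_a), len(in_b))
    small = min(len(in_a), len(in_b))
    denom = math.log(num_articles) - math.log(small)
    if denom <= 0:
        return 0.0  # degenerate case: an entity is linked from every article
    distance = (math.log(big) - math.log(common)) / denom
    return max(0.0, 1.0 - distance)  # assumption: clip negative values to 0


links_to_a = {"p1", "p2", "p3", "p4"}
links_to_b = {"p3", "p4"}
print(round(mw_relatedness(links_to_a, links_to_b, num_articles=100), 4))
```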
  26. Implementation. Relatedness: • Defined as: • Implemented as: Pruning based on commonness: • TAGME performs an extra pruning step during parsing • We followed TAGME, as this makes the system considerably faster
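Commonness, the prior probability Pr(e | m) that an anchor text m links to entity e, and a pruning step on it can be sketched as follows. The anchor-link counts and the threshold value 0.02 are hypothetical; the slide does not state which threshold TAGME uses.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def commonness(anchor_links: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Pr(e | m): share of the links with anchor text m that point to entity e."""
    counts: Dict[str, Counter] = {}
    for mention, entity in anchor_links:
        counts.setdefault(mention, Counter())[entity] += 1
    table = {}
    for mention, per_entity in counts.items():
        total = sum(per_entity.values())
        table[mention] = {e: c / total for e, c in per_entity.items()}
    return table


def prune_candidates(cands: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Drop candidate entities whose commonness is below the threshold."""
    return {e: p for e, p in cands.items() if p >= threshold}


# Hypothetical link counts for the anchor text "England".
links = (
    [("England", "England")] * 60
    + [("England", "England national football team")] * 39
    + [("England", "England cricket team")] * 1
)
table = commonness(links)
print(prune_candidates(table["England"], threshold=0.02))  # hypothetical threshold
```

Pruning before disambiguation shortens the candidate lists that the voting step compares pairwise, which is where the speed-up comes from.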
  27. Table 1. Results to be reproduced: Approach: • Submit Wiki-Disamb30 snippets to the TAGME API • Set the pruning threshold to 0
  28. Table 1: evaluation metrics. Several questions are left unanswered: • Are the metrics micro- or macro-averaged? • What are the matching criteria for mentions? ‣ E.g., does “New York City” match “New York”?
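The averaging question matters because the two schemes can give very different numbers on the same output. The sketch below computes both, assuming each snippet is evaluated by comparing the set of returned entities with the set of ground-truth entities (ignoring mention boundaries) and that an empty system output scores precision 0; both are assumptions.

```python
from typing import List, Set, Tuple

# One pair per snippet: (entities returned by the system, ground-truth entities)
Run = List[Tuple[Set[str], Set[str]]]


def micro_macro(run: Run) -> Tuple[float, float, float, float]:
    """Return (micro P, micro R, macro P, macro R) using exact entity matches."""
    if not run:
        return 0.0, 0.0, 0.0, 0.0
    tp = sum(len(sys & gold) for sys, gold in run)
    n_sys = sum(len(sys) for sys, _ in run)
    n_gold = sum(len(gold) for _, gold in run)
    micro_p = tp / n_sys if n_sys else 0.0
    micro_r = tp / n_gold if n_gold else 0.0
    # Macro: compute P and R per snippet first, then average over snippets.
    per_p = [len(sys & gold) / len(sys) if sys else 0.0 for sys, gold in run]
    per_r = [len(sys & gold) / len(gold) if gold else 0.0 for sys, gold in run]
    return micro_p, micro_r, sum(per_p) / len(run), sum(per_r) / len(run)


run = [
    ({"A", "B", "C", "D"}, {"A"}),  # one correct entity among four returned
    ({"X"}, {"X"}),                 # exact match
]
print(micro_macro(run))  # (0.4, 1.0, 0.625, 1.0)
```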
  29. Table 1: evaluation metrics. We computed an upper bound: • If any of the returned entities matches the ground truth: ‣ Precision = 1, Recall = 1 • Otherwise: ‣ Precision = 0, Recall = 0. Any other interpretation of precision or recall would result in lower numbers.
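The upper-bound scoring can be written down in a few lines: a snippet scores 1 for both precision and recall as soon as any returned entity is in the ground truth, and 0 otherwise. Averaging the per-snippet scores and representing outputs as sets are assumptions of this sketch.

```python
from typing import List, Set, Tuple


def upper_bound(run: List[Tuple[Set[str], Set[str]]]) -> Tuple[float, float]:
    """Lenient (upper-bound) precision and recall averaged over snippets."""
    if not run:
        return 0.0, 0.0
    hits = [1.0 if sys & gold else 0.0 for sys, gold in run]
    score = sum(hits) / len(hits)
    return score, score  # precision and recall coincide under this scoring


run = [
    ({"England national football team", "Argentina"}, {"England national football team"}),
    ({"Century (song)"}, {"Century"}),
]
print(upper_bound(run))  # (0.5, 0.5)
```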
  30. Table 1: results. Reproducing the disambiguation phase: Given the magnitude of the differences, even against TAGME's own API, we did not pursue results for our own implementation.
  31. Table 1: results. Our initial guess: • The discrepancy in the number of snippets caused the differences. Post-acceptance responses: ‣ TAGME performs extra (undocumented) filtering before pruning ‣ The computation of the evaluation metrics was explained
  32. Table 2. Results to be reproduced:
  33. Table 2: results. TAGME results are reproducible through its own API.
  34. Table 2: results. Post-acceptance responses: ‣ TAGME uses Wikipedia page-to-page link records, while our (and Dexter’s) implementation extracts links from the body of the pages ‣ The TAGME API and source code correspond to a newer version (v.2) ‣ Several optimizations have been performed in v.2 ‣ The evaluation metrics are micro-averaged
  35. Generalizability: Applying an existing technique to a different IR task/domain
  36. Entity Linking in Queries (ELQ) • “new york pizza manhattan”: Entity linking ➝ {New York City, Manhattan}; Entity linking in queries ➝ {New York City, Manhattan}, {New York-style Pizza, Manhattan} • “cambridge population”: Entity linking ➝ {Cambridge}; Entity linking in queries ➝ {Cambridge}, {Cambridge, Massachusetts} - F. Hasibi, K. Balog, and S. E. Bratsberg. Entity Linking in Queries: Tasks and Evaluation. In Proc. of ICTIR ’15, pages 171–180, 2015. - D. Carmel, M.-W. Chang, E. Gabrilovich, B.-J. P. Hsu, and K. Wang. ERD’14: Entity recognition and disambiguation challenge. SIGIR Forum, 48(2):63–77, 2014.
  37. Why ELQ? TAGME has great potential to be used for ELQ: ✓ Designed to operate on short texts ✓ On-the-fly annotation
  38. Generalizability: TAGME results are generalizable to the task of entity linking in queries. TAGME API > Dexter > TAGME-wp12 > TAGME-wp10
  39. Lessons learned
  40. Lessons learned 1/2 • All technical details that affect performance should be mentioned in the paper • Differences between the published approach and the public API/code should be made explicit
  41. Lessons learned 2/2 • Evaluation metrics should be explained in detail • Keep all data sources used in a published paper • Maintain an “online appendix” to a publication • Extra details can be explained there • It can be easily edited and extended
  42. Thanks! Questions? Check our online appendix: http://bit.ly/tagme-rep
