http://www.lattice.cnrs.fr | Demonstrations at NAACL HLT 2015, Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies, Denver, Colorado (US), May 31-June 5
Expression extractions should be improved and implemented on open source software. The careful use of natural language processing
algorithms could provide better filtering metrics and support in expression merging.
The manual filtering is crucial because it allows entities to be reduced to a set size appropriate for analysis, but also allows recovering
important entities that could have been excluded by the automatic filtering.
Expressed in [1] by social scientists from médialab (Paris Institute of Political Studies, SciencesPo)
LATTICE Lab
CNRS – Ecole Normale Supérieure
U Paris 3 Sorbonne Nouvelle
ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators
Pablo Ruiz, Thierry Poibeau and Frédérique Mélanie
pablo.ruiz.fabo@ens.fr
Our users’ needs in Entity Linking (EL)
The Problem
o Target users: social science researchers
o Performance of EL systems varies widely depending on corpus
characteristics and types of entities required
o Difficult for users to choose optimal EL system for their corpora
o Our target users wish to filter EL results, making informed
choices about entities to keep and discard
Our Approach
o Public open source tools
o Combine outputs of several tools to get complementary results
o Providing metrics for users to evaluate quality of an annotation
o Simultaneous access to metrics and text to validate annotations
o Besides manual selection, automatic selection also possible via
weighted voting of annotations
Demo features
TRAFFIC-LIGHT MATRIX FORMAT
o Annotation confidence scores provided by EL services
o Measures of coherence between an entity and the most
representative entities in the corpus
› Wikipedia Link-based Measure: Relatedness between two entities
as a function of Wikipedia pages linking to both and linking to one only
Milne-Witten [3] coherence between entities e1 and e2 (as in Hoffart et al. [4])
› Other possible measures
• Distance between entities’ categories in a Wikipedia
category graph
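The formula for the link-based measure did not survive in this text. As a minimal sketch, the Milne-Witten relatedness in the form used by Hoffart et al. [4] can be computed as below. The function name, the argument names, and the handling of the degenerate zero-denominator case are assumptions made for illustration; the inlink sets would come from a Wikipedia dump.

```python
import math


def milne_witten_coherence(inlinks_e1, inlinks_e2, total_pages):
    """Link-based relatedness between two entities, in [0, 1].

    inlinks_e1, inlinks_e2: collections of page ids linking to each entity.
    total_pages: total number of Wikipedia pages considered.
    """
    a, b = set(inlinks_e1), set(inlinks_e2)
    common = len(a & b)
    if common == 0:
        # No page links to both entities: treated as unrelated.
        return 0.0
    larger = max(len(a), len(b))
    smaller = min(len(a), len(b))
    denominator = math.log(total_pages) - math.log(smaller)
    if denominator <= 0:
        # Degenerate case (assumption): an entity linked from every page.
        return 0.0
    score = 1.0 - (math.log(larger) - math.log(common)) / denominator
    # Negative values are clipped to 0, as in Hoffart et al. [4].
    return max(0.0, score)


if __name__ == "__main__":
    # Toy inlink sets, not real Wikipedia data.
    print(milne_witten_coherence({1, 2, 3, 4}, {3, 4}, 1000))
```

Pages linking to both entities raise the score, while pages linking to only one of them enlarge the bigger inlink set without enlarging the intersection, which lowers it.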
Corpus: subset of PoliInformatics [2], about 2008 US financial crisis
(1) Query via Search Text displays:
• Document Panel: Documents matching the query
• Entity Panel: Entities extracted in the documents matching the
query displayed on doc. panel, plus:
(2) Confidence Scores for each annotator, normalized to a 0-1
range. (T=TagMe, S=Spotlight, W=Wikipedia Miner).
(3) Coherence score between the entity and a representative
subset of the corpus entities.
(4) Entities not coherent with the corpus are flagged in red.
(5) Query via Search Entities displays:
• Entity Panel: Entities matching the query.
• Document Panel: Documents containing one of the entities
displayed on the entity panel.
(6) Refine Search: Entities can be selected with a list of types
(like ORG) or selected individually with checkboxes.
(7) The Auto-Selection tab shows the output of an automatic
filtering via weighted voting of annotations.
(8) Charts: examples of co-occurrence networks, created offline
exploiting workflow information (sentence number, confidence, …)
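Items (2) to (4) can be illustrated with a small sketch. The poster does not say how scores are normalized or how the coherence score is aggregated, so min-max scaling, averaging relatedness over the representative entities, and the 0.2 threshold below are all assumptions, as are the function names.

```python
def min_max_normalize(scores):
    """Rescale raw confidence scores to the 0-1 range (assumed min-max scaling)."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores identical: no spread to rescale (assumption: map to 0.0).
        return {key: 0.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def corpus_coherence(entity, representative_entities, relatedness):
    """Mean relatedness between an entity and the representative entities."""
    others = [other for other in representative_entities if other != entity]
    if not others:
        return 0.0
    return sum(relatedness(entity, other) for other in others) / len(others)


def flag_incoherent(entities, representative_entities, relatedness, threshold=0.2):
    """Return the entities whose corpus coherence falls below the threshold."""
    return {
        entity
        for entity in entities
        if corpus_coherence(entity, representative_entities, relatedness) < threshold
    }


if __name__ == "__main__":
    # Toy relatedness table, not real data.
    table = {
        frozenset(["Lehman_Brothers", "Federal_Reserve"]): 0.7,
        frozenset(["Paris_Hilton", "Federal_Reserve"]): 0.05,
        frozenset(["Paris_Hilton", "Lehman_Brothers"]): 0.05,
    }
    relatedness = lambda x, y: table.get(frozenset([x, y]), 0.0)
    representative = ["Lehman_Brothers", "Federal_Reserve"]
    print(min_max_normalize({"T": 2.0, "S": 6.0, "W": 10.0}))
    print(flag_incoherent(["Lehman_Brothers", "Paris_Hilton"], representative, relatedness))
```

Any pairwise measure can be passed as `relatedness`, for example the Wikipedia link-based measure described under the metrics.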
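[Figure: demo screenshot showing the Document Panel and the Entity Panel, with callouts 1-8 matching the numbered items above and a color scale from 0.0 to 1.0]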
System workflows
o User always has access to full results, but the workflow can
select a subset of the annotations automatically.
o Workflow combines, via weighted voting, outputs of:
TagMe2, DBpedia Spotlight, Wikipedia Miner, AIDA, Babelfy
o Votes are weighted according to each annotator’s precision on
two reference corpora (IITB and AIDA/CONLL B), depending on
whether user requires annotations for common-noun entity
mentions or not.
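The voting step can be sketched as follows. This is a simplified illustration, not the procedure of [5]: each annotator votes for an entity on a mention span with a weight standing in for its precision, and the top entity is kept if its share of the total weight reaches a threshold. The weights, the 0.5 threshold, and all names are hypothetical.

```python
from collections import defaultdict


def weighted_vote(annotations_by_annotator, weights, min_share=0.5):
    """Select one entity per mention span by weighted voting.

    annotations_by_annotator: {annotator: [(start, end, entity), ...]}
    weights: {annotator: weight}, e.g. precision on a reference corpus.
    Returns {(start, end): entity} for spans whose best entity reaches min_share.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for annotator, annotations in annotations_by_annotator.items():
        # set() guards against an annotator repeating the same annotation.
        for start, end, entity in set(annotations):
            votes[(start, end)][entity] += weights[annotator]

    total_weight = sum(weights[name] for name in annotations_by_annotator)
    selected = {}
    for span, candidates in votes.items():
        entity, score = max(candidates.items(), key=lambda item: item[1])
        if total_weight > 0 and score / total_weight >= min_share:
            selected[span] = entity
    return selected


if __name__ == "__main__":
    # Hypothetical weights and annotations, for illustration only.
    weights = {"tagme": 0.6, "spotlight": 0.5, "wikiminer": 0.4}
    annotations = {
        "tagme": [(0, 15, "Lehman_Brothers"), (20, 24, "Bank")],
        "spotlight": [(0, 15, "Lehman_Brothers")],
        "wikiminer": [(0, 15, "Bankruptcy_of_Lehman_Brothers"), (20, 24, "Bank_(geography)")],
    }
    print(weighted_vote(annotations, weights))
```

In this toy input only the span (0, 15) is kept, because two annotators agree on it; the span (20, 24) is dropped since no candidate gathers half of the total weight.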
[Figure: system workflow diagram; the legend distinguishes components on demo from components not shown on demo]
Evaluation
o Automatic combination of EL systems improved on the results of each
individual system ([5], our *SEM poster).
o Assessed with strong annotation match and entity match [6] on
four different corpora: AIDA/CONLL B, IITB, MSNBC, AQUAINT.
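The two measures from [6] can be sketched for a single document as below. Under strong annotation match, a system annotation counts as correct only if both the mention boundaries and the entity equal a gold annotation; under entity match, only the set of entities is compared. The function names are assumptions, and aggregation across documents is left out.

```python
def precision_recall_f1(system_items, gold_items):
    """Precision, recall and F1 between two collections, compared as sets."""
    system_set, gold_set = set(system_items), set(gold_items)
    true_positives = len(system_set & gold_set)
    precision = true_positives / len(system_set) if system_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def strong_annotation_match(system, gold):
    """Score (start, end, entity) triples: boundaries and entity must both match."""
    return precision_recall_f1(system, gold)


def entity_match(system, gold):
    """Score entities only, ignoring where they were mentioned."""
    system_entities = {entity for _, _, entity in system}
    gold_entities = {entity for _, _, entity in gold}
    return precision_recall_f1(system_entities, gold_entities)


if __name__ == "__main__":
    # Toy annotations, for illustration only.
    gold = [(0, 15, "Lehman_Brothers"), (30, 45, "Federal_Reserve")]
    system = [(0, 15, "Lehman_Brothers"), (31, 45, "Federal_Reserve"), (50, 54, "Bank")]
    print(strong_annotation_match(system, gold))
    print(entity_match(system, gold))
```

In the toy input, the system annotation for Federal_Reserve is off by one character, so it fails strong annotation match but still counts under entity match.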
[1] T. Venturini & D. Guido. 2012. Once upon a text. An ANT [Actor-Network Theory] Tale in Text
Analytics. Sociologica, 3:1-17. Il Mulino, Bologna.
[2] N. Smith et al. 2014. Overview of the 2014 NLP Unshared Task in PoliInformatics. In Proc. ACL
LACSS Workshop.
[3] D. Milne & I. Witten. 2008. An effective, low-cost measure of semantic relatedness obtained from
Wikipedia links. In Proc. AAAI WS on Wikipedia and AI.
[4] J. Hoffart et al. 2011. Robust disambiguation of named entities in text. In Proc. EMNLP.
[5] P. Ruiz & T. Poibeau. 2015. Combining open source annotators for entity linking through
weighted voting. In Proc. *SEM.
[6] M. Cornolti, P. Ferragina & M. Ciaramita. 2013. A framework for benchmarking entity-annotation
systems. In Proc. of WWW, 249-260.
Metrics to assist in manual filtering
Annotation voting for automatic filtering
DEMO LINK: http://129.199.228.10/nav/gui/
