Named Entity Recognition - ACL 2011 Presentation
1. The Web is not a PERSON, Berners-Lee is not an ORGANIZATION, and African-Americans are not LOCATIONS: An Analysis of the Performance of Named-Entity Recognition
Robert Krovetz (Lexicalresearch.com), Paul Deane, Nitin Madnani (ETS)
A Review by Richard Littauer (UdS)
2-4. The Background
Named-Entity Recognition (NER) is normally judged in the context of Information Extraction (IE)
Various competitions
Recently:
◦ non-English languages
◦ improving unsupervised learning methods
5-6. The Background
“There are no well-established standards for evaluation of NER.”
◦ The criteria for NER systems change from competition to competition
◦ Proprietary software
8-9. The Background
KDM wanted to identify MWEs…
… but false positives and tagging inconsistencies stopped this.
IE derives Recall and Precision from Information Retrieval
NER is just a small part of IE, so it is rarely evaluated independently
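For reference, the two measures IE borrows from IR, in standard notation (TP = true positives, FP = false positives, FN = false negatives):

    \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
    \mathrm{Recall} = \frac{TP}{TP + FN}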
10. The Background
So, they want to test NER systems, and provide a unit test based on the problems encountered
11. Evaluation
Compared three NER taggers:
Stanford:
◦ CRF, 100m training corpus
University of Illinois (LBJ):
◦ Regularized average perceptron, Reuters 1996 News Corpus
BBN IdentiFinder (IdentiFinder):
◦ HMMs, commercial
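A minimal sketch of the kind of pairwise comparison this sets up, assuming each tagger's output has been normalized to a set of (start, end, label) spans; the function and data below are illustrative, not the paper's code.

    def agreement(spans_a, spans_b):
        # Jaccard-style agreement: (span, label) pairs found by both
        # taggers, over all distinct pairs that either tagger found.
        shared = spans_a & spans_b
        union = spans_a | spans_b
        return len(shared) / len(union) if union else 1.0

    stanford = {(0, 7, "PERSON"), (12, 15, "ORGANIZATION")}
    lbj = {(0, 7, "PERSON"), (12, 15, "LOCATION")}
    print(agreement(stanford, lbj))  # 1 shared of 3 distinct pairs -> 0.33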
14. Evaluation
Agreement on Classification
Ambiguity in Discourse
Stanford vs. LBJ on internal 425m ETS corpus
All three on the American National Corpus
23-24. Unit Test
Created two documents that can be used as test texts
◦ Different cases for true positives of PERSON, LOCATION, ORGANIZATION
◦ Entirely upper-case terms that are not NEs (e.g. AAARGH)
◦ Punctuated terms that are not NEs
◦ Terms with initials
◦ Acronyms (some expanded, some not)
◦ Last names in close proximity to first names
◦ Terms with prepositions (Mass. Inst. of Tech.)
◦ Terms with both location and organization senses (Amherst College)
Provided freely online.
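A minimal sketch of how such a unit test could be driven, with a hypothetical `tag` wrapper that returns a dict from tagged surface forms to labels; the cases echo the slide, but the harness itself is an assumption, not KDM's released test.

    CASES = [
        # (text, surface form, expected label; None means "should not be tagged")
        ("Tim Berners-Lee invented the Web.", "Tim Berners-Lee", "PERSON"),
        ("AAARGH, said the reviewer.", "AAARGH", None),
        ("She studied at Mass. Inst. of Tech.", "Mass. Inst. of Tech.", "ORGANIZATION"),
    ]

    def run_unit_test(tag):
        # `tag` wraps any of the three systems under comparison.
        for text, surface, want in CASES:
            got = tag(text).get(surface)
            print("ok" if got == want else f"FAIL (got {got})", repr(surface))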
25. One NE Tag per Discourse
Unusual for multiple occurrences of a token in a document to be different entities
True for homonyms
An exception: Location + sports team
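A minimal sketch of checking this property, assuming each tagger's per-document output has been reduced to a list of (surface form, label) pairs; the format and names are illustrative, not from KDM's code.

    from collections import defaultdict

    def discourse_conflicts(tagged_entities):
        # Collect every label assigned to each surface form in one document.
        labels = defaultdict(set)
        for surface, label in tagged_entities:
            labels[surface].add(label)
        # Tokens tagged with more than one type violate one-tag-per-discourse.
        return {s: tags for s, tags in labels.items() if len(tags) > 1}

    doc = [("Clinton", "PERSON"), ("Clinton", "LOCATION"), ("ETS", "ORGANIZATION")]
    print(discourse_conflicts(doc))  # {'Clinton': {'PERSON', 'LOCATION'}}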
26. One NE Tag per Discourse
Stanford and LBJ have features for non-local dependencies to help with this.
KDM: Two other uses for NLD:
◦ A source of error in evaluation
◦ A way to identify semantically related entities
These should be treated as exceptions
27. Discussion
There are guidelines for NER – but we need standards.
The community should focus on PERSON, ORGANIZATION, LOCATION, and MISC.
◦ Harder to deal with than Dates and Times.
◦ Disagreement between taggers.
◦ MISC is necessary.
◦ These have important value elsewhere.
28. Discussion
To improve intrinsic evaluation for NER:
1. Create test sets for diverse domains.
2. Use standardized sets for different phenomena.
3. Report accuracy for PERSON, ORGANIZATION, and LOCATION separately.
4. Establish uncertainty in the tagging system.
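A sketch of what recommendation 3 might look like in practice, assuming gold and predicted annotations are available as sets of (start, end, label) spans; the data format is an assumption for illustration, not the paper's evaluation script.

    def per_class_scores(gold, pred, classes=("PERSON", "ORGANIZATION", "LOCATION")):
        for cls in classes:
            g = {s for s in gold if s[2] == cls}
            p = {s for s in pred if s[2] == cls}
            tp = len(g & p)  # exact span-and-label matches only
            prec = tp / len(p) if p else 0.0
            rec = tp / len(g) if g else 0.0
            print(f"{cls}: precision={prec:.2f} recall={rec:.2f}")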
29. Conclusion
The reported 90% accuracy is not real.
We need to use only entities that are agreed on by multiple taggers.
Even in cases where the taggers disagree (Hint: future work.)
Unit test downloadable.
NER: the aim is to recognize and classify different types of entities (names, organizations, locations, dates, etc.)
Not sure why they focused on competitions, to be honest. But they mention the Message Understanding Conference, and CoNLL.
They give two possible reasons for this: evaluation criteria change from competition to competition, and much of the software is proprietary.
No gold standards exist for any of these. So, they compared on two levels:
How well do the taggers work on PERSON, ORGANIZATION, and LOCATION? How much do they agree? What mistakes do they make?
How frequently does each tagger produce multiple classifications for the same entity in a single document? Clinton as a person and as a place, for instance.
The ANC is already tagged for IdentiFinder.
However, this tagging was often not consistent.
IdentiFinder produced far more ORGANIZATION tags than the others. It also uses an extra class, Geo-Political Entity.
Existing taggers treat the non-local dependencies as a way of dealing with the sparse data problem, and as a way to resolve tagging differences by looking at how often one token is classified as one type versus another.
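A sketch of that frequency-based resolution, under the same assumed (surface form, label) format as the earlier sketches; this illustrates the idea, not either system's actual code.

    from collections import Counter, defaultdict

    def resolve_by_majority(tagged_entities):
        # Count how often each surface form receives each label...
        counts = defaultdict(Counter)
        for surface, label in tagged_entities:
            counts[surface][label] += 1
        # ...and keep only the most frequent label per surface form.
        return {s: c.most_common(1)[0][0] for s, c in counts.items()}

    doc = [("Clinton", "PERSON"), ("Clinton", "PERSON"), ("Clinton", "LOCATION")]
    print(resolve_by_majority(doc))  # {'Clinton': 'PERSON'}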
1. They didn’t do this. 2. And actually use them, not just one of them. 3. Report accuracy rates separately for the three major classes. Accuracy rates should be further broken down according to the items in the unit test that are designed to assess mistakes: orthography, acronym processing, frequent false positives, and knowledge-based classification. They go on to say that the ANC is doing it right, but is too small, hence their ETS corpus.