Named Entity Recognition - ACL 2011 Presentation
1. The Web is not a PERSON, Berners-Lee is not an ORGANIZATION, and African-Americans are not LOCATIONS: An Analysis of the Performance of Named-Entity Recognition
Robert Krovetz (Lexicalresearch.com), Paul Deane, Nitin Madnani (ETS)
A Review by Richard Littauer (UdS)
2-4. The Background
Named-Entity Recognition (NER) is normally judged in the context of Information Extraction (IE)
Various competitions
Recently:
◦ non-English languages
◦ improving unsupervised learning methods
5-6. The Background
“There are no well-established standards for evaluation of NER.”
◦ The criteria for NER systems change from competition to competition
◦ Proprietary software
8-9. The Background
KDM wanted to identify MWEs…
… but false positives and tagging inconsistencies stopped this.
IE derives Recall and Precision from Information Retrieval
NER is just a small part of IE, so it is rarely evaluated independently
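For reference, the two measures IE borrows from IR, in standard notation (TP = true positives, FP = false positives, FN = false negatives):

    \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
    \mathrm{Recall} = \frac{TP}{TP + FN}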
10. The Background
So, they want to test NER systems, and provide a unit test based on the problems encountered
11. Evaluation
Compared three NER taggers:
Stanford:
◦ CRF, 100m training corpus
University of Illinois (LBJ):
◦ Regularized average perceptron, Reuters 1996 News Corpus
BBN IdentiFinder (IdentiFinder):
◦ HMMs, commercial
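A minimal sketch of the kind of pairwise comparison this sets up, assuming each tagger's output has been normalized to a set of (start, end, label) spans; the function and data below are illustrative, not the paper's code.

    def agreement(spans_a, spans_b):
        # Jaccard-style agreement: (span, label) pairs found by both
        # taggers, over all distinct pairs that either tagger found.
        shared = spans_a & spans_b
        union = spans_a | spans_b
        return len(shared) / len(union) if union else 1.0

    stanford = {(0, 7, "PERSON"), (12, 15, "ORGANIZATION")}
    lbj = {(0, 7, "PERSON"), (12, 15, "LOCATION")}
    print(agreement(stanford, lbj))  # 1 shared of 3 distinct pairs -> 0.33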
14. Evaluation
Agreement on Classification
Ambiguity in Discourse
Stanford vs. LBJ on internal 425m ETS corpus
All three on the American National Corpus
23-24. Unit Test
Created two documents that can be used as test texts
◦ Different cases for true positives of PERSON, LOCATION, ORGANIZATION
◦ Entirely upper-case terms that are not NEs (e.g. AAARGH)
◦ Punctuated terms that are not NEs
◦ Terms with initials
◦ Acronyms (some expanded, some not)
◦ Last names in close proximity to first names
◦ Terms with prepositions (Mass. Inst. of Tech.)
◦ Terms with both location and organization senses (Amherst College)
Provided freely online.
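A minimal sketch of how such a unit test could be driven, with a hypothetical `tag` wrapper that returns a dict from tagged surface forms to labels; the cases echo the slide, but the harness itself is an assumption, not KDM's released test.

    CASES = [
        # (text, surface form, expected label; None means "should not be tagged")
        ("Tim Berners-Lee invented the Web.", "Tim Berners-Lee", "PERSON"),
        ("AAARGH, said the reviewer.", "AAARGH", None),
        ("She studied at Mass. Inst. of Tech.", "Mass. Inst. of Tech.", "ORGANIZATION"),
    ]

    def run_unit_test(tag):
        # `tag` wraps any of the three systems under comparison.
        for text, surface, want in CASES:
            got = tag(text).get(surface)
            print("ok" if got == want else f"FAIL (got {got})", repr(surface))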
25. One NE Tag per Discourse
Unusual for multiple occurrences of a token in a document to be different entities
True for homonyms
An exception: Location + sports team
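A minimal sketch of checking this property, assuming each tagger's per-document output has been reduced to a list of (surface form, label) pairs; the format and names are illustrative, not from KDM's code.

    from collections import defaultdict

    def discourse_conflicts(tagged_entities):
        # Collect every label assigned to each surface form in one document.
        labels = defaultdict(set)
        for surface, label in tagged_entities:
            labels[surface].add(label)
        # Tokens tagged with more than one type violate one-tag-per-discourse.
        return {s: tags for s, tags in labels.items() if len(tags) > 1}

    doc = [("Clinton", "PERSON"), ("Clinton", "LOCATION"), ("ETS", "ORGANIZATION")]
    print(discourse_conflicts(doc))  # {'Clinton': {'PERSON', 'LOCATION'}}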
26. One NE Tag per Discourse
Stanford and LBJ have features for non-local dependencies to help with this.
KDM: Two other uses for NLD:
◦ A source of error in evaluation
◦ A way to identify semantically related entities
These should be treated as exceptions
27. Discussion
There are guidelines for NER – but we need standards.
The community should focus on PERSON, ORGANIZATION, LOCATION, and MISC.
◦ Harder to deal with than Dates and Times.
◦ Disagreement between taggers.
◦ MISC is necessary.
◦ These have important value elsewhere.
28. Discussion
To improve intrinsic evaluation for NER:
1. Create test sets for diverse domains.
2. Use standardized sets for different phenomena.
3. Report accuracy for PERSON, ORGANIZATION, and LOCATION separately.
4. Establish uncertainty in the tagging system.
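A sketch of what recommendation 3 might look like in practice, assuming gold and predicted annotations are available as sets of (start, end, label) spans; the data format is an assumption for illustration, not the paper's evaluation script.

    def per_class_scores(gold, pred, classes=("PERSON", "ORGANIZATION", "LOCATION")):
        for cls in classes:
            g = {s for s in gold if s[2] == cls}
            p = {s for s in pred if s[2] == cls}
            tp = len(g & p)  # exact span-and-label matches only
            prec = tp / len(p) if p else 0.0
            rec = tp / len(g) if g else 0.0
            print(f"{cls}: precision={prec:.2f} recall={rec:.2f}")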
29. Conclusion
The reported 90% accuracy is not real.
We need to use only entities that are agreed on by multiple taggers.
Even in cases where the taggers disagree (Hint: future work.)
Unit test downloadable.
NER: the aim is to recognize and classify different types of entities (names, organizations, locations, dates, etc.)
Not sure why they focused on competitions, to be honest. But they mention the Message Understanding Conference, and CoNLL.
They give two possible reasons for this: evaluation criteria change from competition to competition, and much of the software is proprietary.
No gold standards exist for any of these. So, they compared on two levels:
How well do the taggers work on PERSON, ORGANIZATION, and LOCATION? How much do they agree? What mistakes do they make?
How frequently does each tagger produce multiple classifications for the same entity in a single document? Clinton as a person and as a place, for instance.
The ANC is already tagged for IdentiFinder.
However, this tagging was often not consistent.
IdentiFinder produced far more ORGANIZATION tags than the others. It also uses an extra class, Geo-Political Entity.
Existing taggers treat the non-local dependencies as a way of dealing with the sparse data problem, and as a way to resolve tagging differences by looking at how often one token is classified as one type versus another.
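A sketch of that frequency-based resolution, under the same assumed (surface form, label) format as the earlier sketches; this illustrates the idea, not either system's actual code.

    from collections import Counter, defaultdict

    def resolve_by_majority(tagged_entities):
        # Count how often each surface form receives each label...
        counts = defaultdict(Counter)
        for surface, label in tagged_entities:
            counts[surface][label] += 1
        # ...and keep only the most frequent label per surface form.
        return {s: c.most_common(1)[0][0] for s, c in counts.items()}

    doc = [("Clinton", "PERSON"), ("Clinton", "PERSON"), ("Clinton", "LOCATION")]
    print(resolve_by_majority(doc))  # {'Clinton': 'PERSON'}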
1. They didn’t do this. 2. And actually use them, not just one of them. 3. Report accuracy rates separately for the three major classes. Accuracy rates should be further broken down according to the items in the unit test that are designed to assess mistakes: orthography, acronym processing, frequent false positives, and knowledge-based classification. They go on to say that the ANC is doing it right, but is too small, hence their ETS corpus.