Anaphora resolution

  1. Nobal Niraula, 4th Oct 2010
  2. • Introduction to Anaphora and Anaphora Resolution (AR)
     • Types of Anaphora
     • Process of Anaphora Resolution
     • Tools
     • Issues
  3. • Ruslan Mitkov, School of Languages and European Studies, University of Wolverhampton, Stafford Street, UK
        ◦ Anaphora Resolution: The State of the Art
        ◦ Outstanding Issues in Anaphora Resolution
  4. • Anaphora in etymology
        ◦ Ancient Greek: anaphora (ἀναφορά)
          - ana (ἀνά) → back, in an upward direction
          - phora (φορά) → the act of carrying
     • Example:
        ◦ The Empress hasn't arrived yet but she should be here any minute.
          - she → anaphor
          - The Empress (NP) → antecedent
          - Empress (N) alone is NOT the antecedent!
          - Coreferent → both The Empress and she refer to the same REAL-WORLD ENTITY
  5. • Cataphora
        ◦ When the "anaphor" precedes the "antecedent"
        ◦ Because she was going to the post office, Julie was asked to post a small parcel.
  6. • Anaphora Resolution (AR) is the process of determining the antecedent of an anaphor.
        ◦ Anaphor – the referring expression that points back to a previous item
        ◦ Antecedent – the entity to which the anaphor refers
     • Needed to derive the correct interpretation of a text
     • A complicated problem in NLP!
  7. • Natural Language Interfaces
     • Machine Translation
     • Automatic Abstracting
     • Information Extraction
  8. • A coreferential chain arises when the anaphor and more than one of the preceding (or following) entities (usually noun phrases) have the same referent and are therefore pairwise coreferential.
  9. • Anaphora Resolution
        ◦ The system has to determine the antecedent of each anaphor.
     • Coreference Resolution
        ◦ Identify all coreference CHAINS.
        ◦ The task of AR is considered successful if any of the preceding entities in the coreferential chain is identified as an antecedent.
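The success criterion above can be made concrete with a small check. A minimal sketch in Python, assuming mentions are plain strings and chains are sets of mentions; this representation and the function name are illustrative, not taken from any of the tools mentioned later:

```python
# Coreference chains as sets of mentions (illustrative representation).
chains = [
    {"The Empress", "she"},   # one coreferential chain
    {"the parcel"},           # a singleton chain
]

def ar_success(anaphor, proposed_antecedent, chains):
    """AR counts as successful if the proposed antecedent lies in the
    same coreferential chain as the anaphor; any chain member will do."""
    return any(anaphor in chain and proposed_antecedent in chain
               for chain in chains)

print(ar_success("she", "The Empress", chains))  # True
print(ar_success("she", "the parcel", chains))   # False
```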
  10. • Pronominal anaphora
         ◦ The most widespread type of anaphora
         ◦ Realized by anaphoric pronouns
           - Computational linguists from many different countries attended the tutorial. They took extensive notes.
         ◦ Not all pronouns in English are anaphoric
         ◦ Example:
           - It is raining.
           - Here it is non-anaphoric (pleonastic); a rule-based detection sketch follows below.
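To illustrate the non-anaphoric case, here is a minimal rule-based sketch in Python for flagging pleonastic it. The surface patterns are illustrative assumptions only; real systems (cf. the pre-processing issues on slide 20) use richer syntactic cues:

```python
import re

# A few surface patterns that often signal pleonastic "it"
# (weather/time expressions and extraposition). Illustrative only.
PLEONASTIC_PATTERNS = [
    r"\bit\s+is\s+(raining|snowing|sunny|late|early)\b",
    r"\bit\s+(seems|appears|turns\s+out)\s+that\b",
    r"\bit\s+is\s+\w+\s+(to|that)\b",  # "it is hard to ...", "it is clear that ..."
]

def is_pleonastic(sentence):
    s = sentence.lower()
    return any(re.search(p, s) for p in PLEONASTIC_PATTERNS)

print(is_pleonastic("It is raining."))                   # True
print(is_pleonastic("The tutorial? It was excellent."))  # False
```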
  11. • Definite noun phrase anaphora
         ◦ When the antecedent is referred to by a definite noun phrase representing either the same concept (repetition) or a semantically close concept (e.g. synonyms, superordinates)
         ◦ Example:
           - Computational linguists from many different countries attended the tutorial. The participants found it hard to cope with the speed of the presentation.
  12. • One-anaphora
         ◦ One-anaphora is the case when the anaphoric expression is realized by a "one" noun phrase.
         ◦ Examples:
           - If you cannot attend a tutorial in the morning, you can go for an afternoon one.
           - Some are easier than others.
           - The red ones went in the side pocket.
  13. • Intrasentential anaphors
         ◦ Refer to an antecedent in the same sentence as the anaphor
      • Intersentential anaphors
         ◦ Refer to an antecedent in a different sentence from that of the anaphor
  14. • In principle, anaphors can have noun phrases, verb phrases, clauses, sentences or even paragraphs/discourse segments as antecedents
      • Most AR systems deal with identifying anaphors which have noun phrases as their antecedents
  15. • Typical process (a sketch follows below)
         ◦ Candidates
           - All noun phrases (NPs) preceding an anaphor are initially regarded as potential candidates for antecedents.
         ◦ Search scope
           - Most approaches look for NPs in the current and preceding sentence.
           - Antecedents as far as 17 sentences away from the anaphor have been reported!
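A minimal sketch of this candidate-collection step in Python, assuming spaCy with its en_core_web_sm model; the helper name and the one-sentence search window are illustrative choices, not part of the slides:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def candidate_antecedents(doc, pronoun, window=1):
    """Collect the NPs preceding `pronoun`, restricted to the current
    sentence and the `window` preceding sentences."""
    sents = list(doc.sents)
    # Index of the sentence containing the pronoun.
    sent_idx = next(i for i, s in enumerate(sents)
                    if s.start <= pronoun.i < s.end)
    scope_start = sents[max(0, sent_idx - window)].start
    return [np for np in doc.noun_chunks
            if np.start >= scope_start and np.end <= pronoun.i]

doc = nlp("Computational linguists attended the tutorial. They took notes.")
pronoun = next(t for t in doc if t.tag_ == "PRP")  # "They"
print(candidate_antecedents(doc, pronoun))
# -> NPs such as [Computational linguists, the tutorial]
```

A full resolver would then rank these candidates by constraints and preferences (gender/number agreement, recency, salience), which is where the approaches behind the tools on the next slides differ.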
  16. • GuiTAR
         ◦ http://cswww.essex.ac.uk/Research/nle/GuiTAR/gtarNew.html
         ◦ Uses Charniak's parser
         ◦ Input in Minimally Associated XML (MAS-XML) format
         ◦ Output in XML
         ◦ Written in Java
      • BART
         ◦ http://www.bart-coref.org/
         ◦ Johns Hopkins University tool for AR
         ◦ Uses Charniak's parser
         ◦ Ships with a machine learning model
         ◦ Takes XML input; produces XML output
      • MARS
         ◦ http://clg.wlv.ac.uk/demos/MARS/index.php
         ◦ Built on Mitkov's approach
         ◦ Uses certain linguistic rules
         ◦ If your data is noisy, don't use it.
      • JavaRAP (pronouns)
         ◦ http://aye.comp.nus.edu.sg/~qiu/NLPTools/JavaRAP.html
         ◦ Based on the Lappin and Leass (1994) algorithm
         ◦ Uses Charniak's parser
  17. • OpenNLP
         ◦ http://opennlp.sourceforge.net
      • CherryPicker: a coreference resolution tool
         ◦ http://www.hlt.utdallas.edu/~altaf/cherrypicker.html
      • Reconcile: a coreference resolution engine
         ◦ http://www.cs.utah.edu/nlp/reconcile/
  18. • Why BART?
         ◦ Open source
         ◦ It is beautiful (Beautiful Anaphora Resolution Toolkit) :)
         ◦ It works! :)
         ◦ http://www.bart-coref.org
  19. • Where do we stand today?
         ◦ Extensive domain-specific and linguistic knowledge is no longer a prerequisite
         ◦ Knowledge-poor anaphora resolution strategies have been developed
           - Enabled by the emergence of cheaper and more reliable corpus-based NLP tools such as POS taggers and shallow parsers, and of other NLP resources (ontologies)
         ◦ Use of modern approaches
           - Machine learning
           - Genetic algorithms
  20. • Accuracy of pre-processing is still TOO low
      • Pre-processing stage
         ◦ Morphological analysis / POS tagging – fair
         ◦ Named Entity Recognition – still a challenge
           - The best-performing NER reaches 96% when trained and tested on news about a SPECIFIC topic, and 93% when trained on news about one topic and tested on news about another
         ◦ Unknown word recognition
         ◦ NP extraction – has a long way to go
           - NP chunking: 90%-93% recall and precision
         ◦ Gender recognition – still a challenge
         ◦ Identification of pleonastic pronouns
         ◦ Parsing
           - Best accuracy in robust parsing of unrestricted texts = 87%
      • However accurate an AR algorithm we build, it will not perform well until the accuracy of pre-processing improves
  21. • The majority of anaphora resolution systems do not operate in fully automatic mode
      • Fully automatic AR is more difficult than previously thought
         ◦ 54.65% success rate in MARS
         ◦ It was as high as 90% for perfectly analyzed inputs
  22. • Need for annotated corpora
         ◦ Needed for training machine learning algorithms or statistical approaches
         ◦ Corpora annotated with anaphoric or coreferential links are not widely available
         ◦ Annotation tools
         ◦ Annotation schemes
  23. • Evaluation in anaphora resolution
      • Factors (constraints and preferences)
         ◦ Mutually dependent
      • Most people still work mainly on pronoun resolution
      • Multilingual context
      • Service to the research community
         ◦ Sharing software, experience, data
         ◦ There are just 3 demos for AR ;)
  24. • "NLP in general is very difficult, but after working hard on Anaphora Resolution we have learned that it is particularly difficult." ~ Mitkov
      • "A pessimist sees the difficulty in every opportunity; an optimist sees the opportunity in every difficulty." ~ Winston Churchill
      • "All we have to do is work more and HOPE for SLOW but STEADY progress. We just have to be PATIENT!" ~ Mitkov
  25. • Some papers
         ◦ AR in multi-person dialogue
      • Haven't found any paper specific to tutoring
  26. Massimo Poesio's slides: Anaphora Resolution for Practical Tasks, University of Trento
