
Building a Graph of Names and Contextual Patterns for Named Entity Classification (ECIR09 poster)


Authors: César de Pablo Sánchez, Paloma Martínez
ECIR 2009: Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, Toulouse, France (April 6-9, 2009)


César de Pablo Sánchez and Paloma Martínez
LABDA, Computer Science Dept., Universidad Carlos III de Madrid
{cdepablo,pmf}@inf.uc3m.es

Objectives
• NERC for multilingual applications
• Bootstrap a name list and indicative patterns
  – Large document collection
  – Few example seeds for every class ($N_{seeds} < 40$)
  – Language independence (as an aim)

Initial assumptions
• Dual bootstrapping
• One sense per entity type (name)
• Indelibility of class assignments
• Counter-training: learn several classes at once
• Query-based exploration of the indexed collection

PERSON(x): names, left patterns and right patterns

Num   Name               Num   Left pattern               Num   Right pattern
15    Fernando Arrabal   0     Gobierno del presidente    6     , ### esta tarde
64    Teodoro Obiang     1     Gobierno del ###           9     , vencedor
68    Salvador Allende   12    gobierno del presidente    21    y el ex
128   Peres              13    presidente del país ,      26    , viajará
156   Edouard Balladur   29    actual presidente          34    , y su colega
332   Grachov            47    palabras de                42    , visitará
423   Calderón           50    cuyo ### ,                 49    , y el líder
450   Colom              60    presidente ,               63    y el presidente
522   Joaquín Almunia    61    reunión con                65    se entrevistó

Direct Evaluation: Name Lists (AvgPrec)

Model   PER    LOC    ORG    M / T   Mean
PLO     94.8   52.7   67.1   –       71.5
PLOM    93.0   44.8   79.3   75.0    73.0
PLOT    94.8   87.4   81.1   40.9    76.0

Name Classification

Features            Model   P       R       F       Acc
baseline            CONLL   26.27   56.48   35.86   –
baseline            ORG     –       –       –       39.34
entities            PLO     77.33   54.34   63.83   64.04
entities            PLOM    78.85   51.53   62.36   66.24
entities            PLOT    78.72   41.58   54.42   62.18
entities+patterns   PLO     66.12   57.97   61.78   63.17
entities+patterns   PLOM    73.65   61.73   67.17   71.29
entities+patterns   PLOT    66.35   56.62   61.10   62.50

Algorithm

Pattern selection and evaluation
1. Rank by Support, filter min-support, select top-k
2. Evaluate min-Acc: $\mathrm{Acc}(p) = \frac{Pos}{Pos + Neg}$
3. Evaluate min-Conf: $\mathrm{Conf}(p) = \frac{Pos - Neg}{Pos + Neg + Unk}$

Entity selection and evaluation
1. Rank by Support, filter min-support, select top-k
2. Evaluate min-Conf: $\mathrm{Conf}_{slot}(a) = 1 - \prod_i \left(1 - \mathrm{Conf}_{pattern}(p_i)\right)$, $\mathrm{Conf}_{NE}(a) = \mathrm{Conf}_{left}(a) \cdot \mathrm{Conf}_{right}(a)$

Conclusions
• Efficient bootstrapping from large indexed collections with fewer seeds
• Already useful for NERC
• F-measure is lower than supervised machine learning
• More classes improve precision, but not always recall

Future work
• Other languages and domains
• Complex semantic models
• Language independence and NE Recognition
• Seed selection and improved effectiveness

Acknowledgements: This work has been supported by the Regional Government of Madrid under the Research Network MAVIR (S-0505/TIC-0267) and by the Spanish Ministry of Education under the project BRAVO (TIN2007-67407-C03-01).
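
The scoring formulas in the Algorithm part of the poster can be sketched in Python as follows. This is only an illustrative reading, not the authors' implementation. It assumes that the slot confidence combines pattern confidences with a product (a noisy-or), since the operator is lost in the transcript, and that the Support of a pattern is the total number of names it matches (Pos + Neg + Unk), which the poster does not define. All function and parameter names are hypothetical.

```python
from math import prod


def pattern_accuracy(pos: int, neg: int) -> float:
    """Acc(p) = Pos / (Pos + Neg); returns 0.0 when nothing is labeled."""
    total = pos + neg
    return pos / total if total else 0.0


def pattern_confidence(pos: int, neg: int, unk: int) -> float:
    """Conf(p) = (Pos - Neg) / (Pos + Neg + Unk)."""
    total = pos + neg + unk
    return (pos - neg) / total if total else 0.0


def slot_confidence(pattern_confs) -> float:
    """Noisy-or over pattern confidences (assumed reading of the formula).

    Inputs are assumed to lie in [0, 1], i.e. patterns that already
    passed a non-negative min-Conf threshold. An empty slot scores 0.0.
    """
    return 1.0 - prod(1.0 - c for c in pattern_confs)


def entity_confidence(left_confs, right_confs) -> float:
    """Conf_NE(a) = Conf_left(a) * Conf_right(a)."""
    return slot_confidence(left_confs) * slot_confidence(right_confs)


def select_patterns(stats, min_support, top_k, min_acc, min_conf):
    """Rank by support, filter, keep top-k, then apply both thresholds.

    stats maps a pattern to (pos, neg, unk) counts. Support is taken
    to be pos + neg + unk, which is an assumption of this sketch.
    """
    supported = [(p, c) for p, c in stats.items() if sum(c) >= min_support]
    supported.sort(key=lambda item: sum(item[1]), reverse=True)
    selected = []
    for pattern, (pos, neg, unk) in supported[:top_k]:
        if (pattern_accuracy(pos, neg) >= min_acc
                and pattern_confidence(pos, neg, unk) >= min_conf):
            selected.append(pattern)
    return selected
```

Because the name confidence multiplies the left and right slot scores, a name seen only with left patterns scores zero under this reading; a real system might treat missing slots differently.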
