
ANNIS workshop sfb 2014

These slides present the new features of ANNIS 3.1.6 and 3.1.7, together with a basic introduction to what ANNIS is and how to use it. ANNIS is an open-source, cross-platform (Linux, Mac, Windows), web browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.


  1. ANNIS: Search and Visualization in Multilayer Linguistic Corpora. Carolin Odebrecht & Florian Zipser, Humboldt-Universität zu Berlin. ANNIS workshop, 2014-08-26
  2. A brief introduction ● Search and visualization in multilayer linguistic corpora – Imports existing corpora ● Corpora must already be annotated; ANNIS only uses what is there ● No NLP!
  3. A brief introduction ● Search and visualization in multilayer linguistic corpora – Makes corpora searchable ● One query language for all corpora (AQL) ● Abstraction over linguistic data is necessary ● But: corpora have different annotations → a query has to match the annotations of the corpus
  4. A brief introduction ● Search and visualization in multilayer linguistic corpora – Displays corpora ● Many visualizations available ● Corresponding to the type of annotation (syntactic trees, phrase trees (RST), grids, coreferences ...)
  5. A brief introduction ● What ANNIS cannot do – It does not understand natural language → so you have to learn AQL
  6. A brief introduction ● What ANNIS cannot do – It does not understand natural language → so you have to learn AQL – ANNIS does not know any semantics → "NN", "NP", "sentence", "word", "my favorite annotation" … are just sequences of characters
  7. A brief introduction ● What ANNIS cannot do – It does not understand natural language → so you have to learn AQL – ANNIS does not know any semantics → "NN", "NP", "sentence", "word", "my favorite annotation" … are just sequences of characters – You need to be exact → e.g. "POS" != "pos" and "NN" != "NN " (note the trailing blank)
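
The exactness requirement on this slide can be illustrated outside ANNIS. The following Python sketch is only an analogy: it assumes that annotation names and values are compared as plain character sequences, as the slide describes, and it does not call any ANNIS API. The helper name `same_annotation` is made up for this illustration.

```python
# Analogy only: annotation names and values treated as plain strings.
# No case folding and no whitespace trimming is applied.
def same_annotation(a: str, b: str) -> bool:
    """Return True only if both strings are identical, character by character."""
    return a == b


pairs = [
    ("POS", "pos"),  # differs in case
    ("NN", "NN "),   # differs by a trailing blank
    ("NN", "NN"),    # identical
]

for left, right in pairs:
    relation = "==" if same_annotation(left, right) else "!="
    print(repr(left), relation, repr(right))
```

Printing with `repr` makes the trailing blank visible, which is the kind of difference that is easy to miss when typing a query.
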
  8. ANNIS basics
  9. Enter query; corpus list; previous queries; virtual keyboard (e.g. Arabic)
  10. Sample queries (corresponding to the corpus)
  11. Query result; visualizations
  12. Corpus metadata: corpus metadata window
  13. Document metadata: document metadata window
  14. ANNIS basics ● Basic principles of AQL (ANNIS Query Language) – Attributes and values ● Searching for exact character sequences ● Searching for patterns – Combinatory search
  15. Demo corpus ● Corpus for demonstration: pcc2 (a subcorpus of pcc) https://korpling.german.hu-berlin.de/annis3/#_c=cGNjMg ● Potsdam Commentary Corpus – German newspaper commentaries from the 'Märkische Allgemeine Zeitung' https://www.ling.uni-potsdam.de/acl-lab/Forsch/pcc/pcc.html – Multiple annotations
  16. ANNIS basics ● Different types of annotations – Token annotation – Span annotation – Pointing relation – Hierarchy annotation (trees) [Diagram labels: Token, Span, Node, Edge, Key]
  17. ANNIS basics ● Different types of annotations – Token annotation – Span annotation – Pointing relation – Hierarchy annotation (trees) [Diagram labels: Token, Span, Node, Edge, Key]
  18. Exact word forms ● Token annotation – Searching for the exact character sequence of a word form: "Jugendlichen", "jugendlichen"
  19. Exact word forms ● Token annotation – Searching for the exact character sequence of a word form: "Jugendlichen" (3 hits), "jugendlichen" (0 hits) → shorthand for tok="jugendlichen"
  20. Exact token annotation ● Token annotation – Searching for an exact part-of-speech tag: pos="NN" (pos is the attribute, "NN" the value) – Attributes can have more than one value – Searching for all values of an attribute
  21. Exact token annotation ● Token annotation – Searching for an exact part-of-speech tag: pos="NN", pos="ADJA"
  22. Exact token annotation ● Token annotation – Searching for an exact part-of-speech tag: pos="NN" (62 hits), pos="ADJA" (18 hits) – Searching for all values of an attribute: pos (399 hits)
  23. Exact span annotation ● Span annotation – Searching for sentences: Sent="s"
  24. Exact span annotation ● Span annotation – Searching for sentences: Sent="s" (28 hits)
  25. Metadata ● Sent="s" (28 hits) – You need to know which annotations a corpus contains
  26. Pattern ● Token annotation – Patterns: "." matches any single character; "*" matches zero or more of the preceding element – Searching for the beginning of a word: /Jugend.*/, /jugend.*/
  27. Pattern ● Token annotation – Patterns: "." matches any single character; "*" matches zero or more of the preceding element – Searching for the beginning of a word: /Jugend.*/ (5 hits, e.g. Jugendlichen, Jugendliche; compare "Jugendlichen": 3 hits), /jugend.*/ (0 hits; compare "jugendlichen": 0 hits)
  28. Pattern ● Token annotation – Patterns – Searching for all nouns: pos=/N./ includes NN & NE – Searching for all adjectives: pos=/ADJ./ includes ADJA & ADJD
  29. Pattern ● Token annotation – Patterns – Searching for all nouns: pos=/N./ (73 hits; compare pos="NN": 62 hits) – Searching for all adjectives: pos=/ADJ./ (32 hits; compare pos="ADJA": 18 hits)
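
The pattern searches above can be mimicked with Python's `re` module. This is a sketch under one assumption that the slides do not state explicitly: that a pattern has to match the whole token or annotation value, which `re.fullmatch` models here. The token and tag lists are made-up samples, not data from pcc2.

```python
import re


def matches(pattern: str, value: str) -> bool:
    """Whole-value regex match (assumed to mirror a /.../ search in AQL)."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical sample data for illustration only.
tokens = ["Jugendlichen", "Jugendliche", "jugendfrei", "Die"]
pos_tags = ["NN", "NE", "ADJA", "ADJD", "APPR", "N"]

upper = [t for t in tokens if matches(r"Jugend.*", t)]
lower = [t for t in tokens if matches(r"jugend.*", t)]
nouns = [p for p in pos_tags if matches(r"N.", p)]
adjectives = [p for p in pos_tags if matches(r"ADJ.", p)]

print(upper)       # word forms starting with capitalised "Jugend"
print(lower)       # word forms starting with lower-case "jugend"
print(nouns)       # two-character tags starting with "N"
print(adjectives)  # four-character tags starting with "ADJ"
```

Note that `N.` requires exactly one character after `N`, so a bare `N` tag would not match, and that the patterns stay case-sensitive, just like the exact searches.
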
  30. Relations between annotations ● Span annotation – Searching for all NPs: cat="NP" (41 hits; compare pos="NN": 62 hits), e.g. Die Jugendlichen in Zossen
  31. Relations between annotations ● Relations between attributes – Searching for all NPs which contain a preposition: cat="NP" (41 hits), pos="APPR" (19 hits), e.g. Die Jugendlichen in Zossen → on their own, the two searches are not related to each other!
  32. Relations between annotations ● Relations between attributes – Searching for all NPs which contain a preposition: cat="NP" (#1), pos="APPR" (#2), e.g. Die Jugendlichen in Zossen → NP includes APPR
  33. Relations between annotations ● Relations between attributes – Searching for all NPs which contain a preposition: cat="NP" & pos="APPR" & #1 _i_ #2, e.g. Die Jugendlichen in Zossen
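
The inclusion relation `_i_` can be pictured as a containment check on token ranges. The sketch below is a toy model, not ANNIS code: the `Span` type, the token indices and the second NP are assumptions made up for the example phrase.

```python
from typing import NamedTuple


class Span(NamedTuple):
    label: str
    start: int  # index of the first covered token
    end: int    # index of the last covered token (inclusive)


def includes(a: Span, b: Span) -> bool:
    """True if span a covers every token that span b covers."""
    return a.start <= b.start and b.end <= a.end


tokens = ["Die", "Jugendlichen", "in", "Zossen"]

# Hypothetical annotations of the example phrase.
noun_phrases = [Span("NP", 0, 3), Span("NP", 3, 3)]
prepositions = [Span("APPR", 2, 2)]

# Toy counterpart of: cat="NP" & pos="APPR" & #1 _i_ #2
hits = [
    " ".join(tokens[np.start : np.end + 1])
    for np in noun_phrases
    for appr in prepositions
    if includes(np, appr)
]
print(hits)
```

Only the NP spanning all four tokens is reported; the single-token NP "Zossen" does not cover the preposition "in".
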
  34. Hierarchy relations ● Relations between attributes – Searching for all NPs which are objects: cat="NP", e.g. Die Jugendlichen in Zossen → but this is a subject!
  35. Hierarchy relations ● Relations between attributes – Searching for all NPs which are objects – NP → node annotation – OA → edge annotation [Diagram labels: Token, Span, Node, Edge]
  36. Hierarchy relations ● Relations between attributes – Searching for all NPs which are objects: cat="NP" – The syntactic function in the tree: func="OA" → Note: there are at least two elements, which are related to each other!
  37. Hierarchy relations ● Relations between attributes – Searching for all NPs which are objects: node & cat="NP" & #1 >[func="OA"] #2, e.g. ein Musikcafé → an object!
  38. Used relations ● Relations we used: A _i_ B: A includes B; A > B: A dominates B; A >[func="OA"] B: A dominates B and B is an object ● The full list of relations can be found in ANNIS
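
A labelled dominance edge such as `>[func="OA"]` can be modelled as a triple of parent, child and edge label. The following sketch is a simplified toy tree, assuming direct dominance only; the node identifiers, the `SB` label and the `Edge` type are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    parent: str
    child: str
    func: str  # edge annotation, e.g. the syntactic function


# Hypothetical toy tree: node id -> category (node annotation).
categories = {"s1": "S", "np1": "NP", "np2": "NP"}

# Hypothetical edges: np1 is attached as subject, np2 as accusative object.
edges = [Edge("s1", "np1", "SB"), Edge("s1", "np2", "OA")]


def dominated_with_func(cat: str, func: str) -> list[tuple[str, str]]:
    """Toy counterpart of: node & cat=<cat> & #1 >[func=<func>] #2."""
    return [
        (e.parent, e.child)
        for e in edges
        if e.func == func and categories.get(e.child) == cat
    ]


object_nps = dominated_with_func("NP", "OA")
print(object_nps)
```

The point of the model is that the category sits on the node while the function sits on the edge, which is why the query needs two elements and a relation between them.
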
  39. What's new in ANNIS version 3.1.7
  40. What's new in ANNIS ● Simplified syntax (AQL) ● Frequency analysis (visualization) ● Expand match context (visualization) ● Equality and inequality (AQL) ● Variables (AQL) ● Complex OR expression (AQL) ● Document browser (visualization) ● CSV export (visualization) ● Tooltip for corpus names (visualization) ● Report problem (visualization)
  41. Simplified syntax ● Question: "Die" followed by "Jugendlichen", both dominated by a noun phrase which is dominated by a sentence ● So far: cat="S" & cat="NP" & "Die" & "Jugendlichen" & #1 > #2 & #2 > #3 & #2 > #4 & #3 . #4
  42. Simplified syntax ● Question: "Die" followed by "Jugendlichen", both dominated by a noun phrase which is dominated by a sentence ● So far: cat="S" & cat="NP" & "Die" & "Jugendlichen" & #1 > #2 & #2 > #3 & #2 > #4 & #3 . #4 ● Simplified: cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
  43. Frequency analysis ● Questions: – How many words tagged as "NN", "ADJA" or "ADV" does a corpus contain? – What are the most frequent part-of-speech tags followed by a noun? – What are the most frequent part-of-speech tags in a prepositional phrase which is in a sentence? – ...
  44. Frequency analysis
  45. Frequency analysis
  46. Frequency analysis ● Attention: a frequency analysis has to be bound to a query!
  47. Frequency analysis ● What are the most frequent part-of-speech tags followed by a noun? pos . pos="NN" ● What are the most frequent part-of-speech tags in a prepositional phrase which is in a sentence? cat="S" > cat="PP" > pos
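
What a frequency analysis bound to the query `pos . pos="NN"` counts can be sketched with a small tagged sequence. This is an illustration of the idea only, not of how ANNIS computes its table; the sentence and its tags are a hand-made sample, and the tagging is an assumption.

```python
from collections import Counter

# Hand-made sample (token, part-of-speech tag); not taken from pcc2.
tagged = [
    ("Die", "ART"),
    ("Jugendlichen", "NN"),
    ("in", "APPR"),
    ("Zossen", "NE"),
    ("wollen", "VMFIN"),
    ("ein", "ART"),
    ("Musikcafé", "NN"),
]

# Toy counterpart of: pos . pos="NN"
# For every token directly followed by an NN token, count its own tag.
preceding_tags = Counter(
    first_tag
    for (_, first_tag), (_, second_tag) in zip(tagged, tagged[1:])
    if second_tag == "NN"
)

frequency_table = preceding_tags.most_common()
print(frequency_table)
```

The query selects the matches and the frequency analysis then tabulates one annotation of one query node, which is why the analysis cannot exist without a query.
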
  48. Expand match context ● Sometimes the context is too small ● Even more than 25 is possible; it is a free text field
  49. Equality and inequality ● Equality "==" and inequality "!=" for attributes ● Question (inequality): two different part-of-speech tags, one directly following the other: pos . pos & #1 != #2
  50. Equality and inequality ● Equality "==" and inequality "!=" for attributes ● Question (equality): two identical part-of-speech tags, one directly following the other: pos . pos & #1 == #2
  51. Equality and inequality ● Equality "==" and inequality "!=" for attributes ● Question (inequality): two different part-of-speech tags, one directly following the other: pos . pos & #1 != #2
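
The two queries above compare the annotation values of neighbouring tokens. A minimal Python analogy, assuming a made-up tag sequence and treating `.` as "directly followed by":

```python
# Hypothetical part-of-speech sequence for illustration only.
tags = ["ART", "NN", "NN", "APPR", "NE"]

# All directly adjacent pairs, the counterpart of: pos . pos
adjacent = list(zip(tags, tags[1:]))

# Counterpart of: pos . pos & #1 == #2
equal_pairs = [(a, b) for a, b in adjacent if a == b]

# Counterpart of: pos . pos & #1 != #2
unequal_pairs = [(a, b) for a, b in adjacent if a != b]

print(equal_pairs)
print(unequal_pairs)
```

Every adjacent pair lands in exactly one of the two lists, so the hit counts of the equality and the inequality query add up to the hits of plain `pos . pos`.
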
  52. Variables ● Question: "Die" followed by "Jugendlichen", both dominated by a noun phrase which is dominated by a sentence ● Simplified: cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
  53. Variables ● Question: "Die" followed by "Jugendlichen", both dominated by a noun phrase which is dominated by a sentence ● With variables: cat="S" > np#cat="NP" > "Die" . jug#"Jugendlichen" & #np > #jug
  54. Variables ● Question: "Die" followed by "Jugendlichen", both dominated by a noun phrase which is dominated by a sentence ● With variables: cat="S" > np#cat="NP" > "Die" . jug#"Jugendlichen" & #np > #jug ● Variables and numbers can be mixed: cat="S" > np#cat="NP" > "Die" . "Jugendlichen" & #np > #4
  55. Complex OR expression ● Question (simple OR): a part-of-speech tag which is a noun, an attributive adjective or an article: pos=/(NN)|(ADJA)|(ART)/ (in pattern search)
  56. Complex OR expression ● Question (simple OR): a part-of-speech tag which is a noun, an attributive adjective or an article: pos=/(NN)|(ADJA)|(ART)/ (in pattern search) ● OR for expressions: pos="NN" | pos="ADJA" | pos="ART"
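
That the pattern alternative and the OR of three exact searches select the same annotations can be checked with a short Python analogy. It assumes whole-value matching for the pattern (modelled with `re.fullmatch`) and uses a made-up tag list.

```python
import re

# Hypothetical part-of-speech tags for illustration only.
tags = ["ART", "NN", "ADJA", "APPR", "NE", "ADJD"]

# Counterpart of the pattern search: pos=/(NN)|(ADJA)|(ART)/
by_pattern = [t for t in tags if re.fullmatch(r"(NN)|(ADJA)|(ART)", t)]

# Counterpart of the OR of expressions: pos="NN" | pos="ADJA" | pos="ART"
by_or = [t for t in tags if t == "NN" or t == "ADJA" or t == "ART"]

print(by_pattern)
print(by_or)
```

For a single attribute both forms are interchangeable; the OR of expressions becomes necessary once the alternatives are different kinds of expressions, as on the following slides.
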
  57. Complex OR expression ● Question (complex OR): a prepositional phrase which is dominated by a sentence, or just a noun phrase: (cat="S" > cat="PP") | cat="NP"
  58. Complex OR expression ● Question (nested OR): a prepositional phrase which dominates a noun, an attributive adjective or an article: a#cat="PP" & (b#pos="NN" | b#pos="ADJA" | b#pos="ART") & #a > #b
  59. Complex OR expression ● Question (nested OR): a prepositional phrase which dominates a noun, an attributive adjective or an article: a#cat="PP" & (b#pos="NN" | b#pos="ADJA" | b#pos="ART") & #a > #b ● Attention: all expressions in brackets have to use the same variable: … & (b#pos="NN" | b#pos="ADJA" | b#pos="ART") & ...
  60. Document browser ● Displays the entire text of a document
  61. Document browser
  62. CSV export ● Export data for further processing
  63. Tooltips for corpus names ● Sometimes corpus names can get very long
  64. Report problem
  65. Get ANNIS ● ANNIS comes in two flavors – A server version – A desktop version (ANNIS kickstarter) – Both are downloadable at: http://www.sfb632.uni-potsdam.de/annis/ ● ANNIS is open source (Apache License 2.0) and hosted on GitHub – https://github.com/korpling/ANNIS
  66. Thanks for your attention! Any questions? carolin.odebrecht@hu-berlin.de, f.zipser@gmx.de
