ANNIS workshop sfb 2014

These slides present the new features of ANNIS 3.1.6 and 3.1.7, together with a basic introduction to what ANNIS is and how to use it. ANNIS is an open-source, cross-platform (Linux, Mac, Windows), web browser-based search and visualization architecture for complex multilayer linguistic corpora with diverse types of annotation.

  1. ANNIS: Search and Visualization in Multilayer Linguistic Corpora. Carolin Odebrecht & Florian Zipser, Humboldt-Universität zu Berlin. ANNIS workshop, 2014-08-26
  2. A brief introduction
      ● Search and Visualization in Multilayer Linguistic Corpora
        – Imports existing corpora
          ● Corpora must already be annotated; ANNIS only uses what is there
          ● No NLP!
  3. A brief introduction
      ● Search and Visualization in Multilayer Linguistic Corpora
        – Makes corpora searchable
          ● One query language for all corpora (AQL)
          ● Abstraction over linguistic data is necessary
          ● But: corpora have different annotations → the query has to match the annotations
  4. A brief introduction
      ● Search and Visualization in Multilayer Linguistic Corpora
        – Displays corpora
          ● Many visualizations available
          ● Corresponding to the type of annotation (syntactic trees, phrase trees (RST), grids, coreferences ...)
  5-7. A brief introduction
      ● What ANNIS cannot do
        – It does not understand natural language → so you have to learn AQL
        – It does not know any semantics → "NN", "NP", "sentence", "word", "my favorite annotation" ... are just sequences of characters
        – You need to be exact → e.g. "POS" != "pos" and "NN" != "NN " (note the trailing blank)
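The exactness requirement can be illustrated outside ANNIS. The following is a minimal Python sketch, not ANNIS code; the helper `same_label` is a hypothetical name used only here. It shows that a plain character-by-character comparison treats case and a trailing blank as differences.

```python
def same_label(a: str, b: str) -> bool:
    """Compare two annotation strings character by character (toy helper)."""
    return a == b


print(same_label("POS", "pos"))  # False: the case differs
print(same_label("NN", "NN "))   # False: the second string ends in a blank
print(same_label("NN", "NN"))    # True: identical sequences of characters
```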
  8. ANNIS basics
  9. Enter query; Corpus list; Previous queries; Virtual keyboard (e.g. Arabic)
  10. Sample queries (corresponding to the corpus)
  11. Query result; Visualizations
  12. Corpus metadata: corpus metadata window
  13. Document metadata: document metadata window
  14. ANNIS basics
      ● Basic principles of AQL (ANNIS Query Language)
        – Attributes and values
          ● Searching for exact character sequences
          ● Searching for patterns
        – Combinatory search
  15. Demo corpus
      ● Corpus for demonstration: pcc2 (a subcorpus of pcc) https://korpling.german.hu-berlin.de/annis3/#_c=cGNjMg
      ● Potsdam Commentary Corpus
        – German newspaper commentaries from the 'Märkische Allgemeine Zeitung' https://www.ling.uni-potsdam.de/acl-lab/Forsch/pcc/pcc.html
        – Multiple annotations
  16-17. ANNIS basics
      ● Different types of annotations
        – Token annotation
        – Span annotation
        – Pointing relation
        – Hierarchy annotation (trees)
      [Figure: diagram labeled with tokens, spans, a node, an edge, and keys]
  18-19. Exact word forms
      ● Token annotation
        – Searching for the exact character sequence of a word form
          "Jugendlichen"   3 hits
          "jugendlichen"   0 hits
          → equivalent to tok="jugendlichen"
  20-22. Exact token annotation
      ● Token annotation
        – Searching for an exact part-of-speech tag (pos is the attribute, "NN" the value)
          pos="NN"     62 hits
          pos="ADJA"   18 hits
        – Attributes can have more than one value
        – Searching for all values of an attribute
          pos          399 hits
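The difference between searching for one value (pos="NN") and for every value of an attribute (pos) can be sketched in Python. This is a toy model under stated assumptions: the token list is invented for illustration, `search` is a hypothetical helper, and the counts have nothing to do with the pcc2 hit counts quoted on the slides.

```python
from typing import Dict, List, Optional

Token = Dict[str, str]

# Hypothetical mini corpus, invented for this sketch (not pcc2 data).
tokens: List[Token] = [
    {"tok": "Die", "pos": "ART"},
    {"tok": "Jugendlichen", "pos": "NN"},
    {"tok": "in", "pos": "APPR"},
    {"tok": "Zossen", "pos": "NE"},
    {"tok": "wollen", "pos": "VMFIN"},
    {"tok": "ein", "pos": "ART"},
    {"tok": "Musikcafé", "pos": "NN"},
]


def search(attribute: str, value: Optional[str] = None) -> List[Token]:
    """Return tokens that carry `attribute`; if `value` is given it must match exactly."""
    return [
        t for t in tokens
        if attribute in t and (value is None or t[attribute] == value)
    ]


print(len(search("pos", "NN")))            # 2: exact value
print(len(search("pos", "ADJA")))          # 0: value not present in the toy data
print(len(search("pos")))                  # 7: every token that has a pos annotation
print(len(search("tok", "jugendlichen")))  # 0: lower-case form does not occur
```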
  23-24. Exact span annotation
      ● Span annotation
        – Searching for sentences
          Sent="s"   28 hits
  25. Metadata
      ● Sent="s"   28 hits
        – You need to know which annotations a corpus contains
  26-27. Pattern
      ● Token annotation
        – Patterns
          .   matches any single character
          *   matches zero or more of the preceding element
        – Searching for the beginning of a word
          /Jugend.*/   5 hits ("Jugendlichen": 3 hits); matches Jugendlichen, Jugendliche
          /jugend.*/   0 hits ("jugendlichen": 0 hits)
  28-29. Pattern
      ● Token annotation
        – Patterns
          searching for all nouns:        pos=/N./     73 hits, includes NN & NE (pos="NN": 62 hits)
          searching for all adjectives:   pos=/ADJ./   32 hits, includes ADJA & ADJD (pos="ADJA": 18 hits)
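The pattern slides above can be mimicked with Python's `re` module. This is only a sketch: it assumes that a pattern between slashes has to cover the whole value, which is what the slide examples suggest (/N./ finds NN and NE), and it models that with `re.fullmatch`. The helper name `pattern_matches` and the word lists are made up for illustration.

```python
import re


def pattern_matches(pattern: str, value: str) -> bool:
    # Assumption: the pattern must match the entire value, not just a part of it.
    return re.fullmatch(pattern, value) is not None


tags = ["NN", "NE", "N", "ADJA", "ADJD", "ADV", "APPR"]
print([t for t in tags if pattern_matches(r"N.", t)])    # ['NN', 'NE']
print([t for t in tags if pattern_matches(r"ADJ.", t)])  # ['ADJA', 'ADJD']

words = ["Jugendlichen", "Jugendliche", "jugendlich", "Jugend"]
print([w for w in words if pattern_matches(r"Jugend.*", w)])
# ['Jugendlichen', 'Jugendliche', 'Jugend']
print([w for w in words if pattern_matches(r"jugend.*", w)])
# ['jugendlich']
```

Note that `.` requires exactly one character, so a bare "N" is not found by /N./, while `.*` also accepts zero characters, so "Jugend" is found by /Jugend.*/.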
  30. Relations between annotations
      ● Span annotation
        – Searching for all NPs
          cat="NP"   41 hits (pos="NN": 62 hits)
          e.g. Die Jugendlichen in Zossen
  31. Relations between annotations
      ● Relations between attributes
        – Searching for all NPs which contain a preposition
          cat="NP"     41 hits
          pos="APPR"   19 hits
          e.g. Die Jugendlichen in Zossen
          → no relation between the two pieces of information!
  32. Relations between annotations
      ● Relations between attributes
        – Searching for all NPs which contain a preposition
          cat="NP"     #1
          pos="APPR"   #2
          e.g. Die Jugendlichen in Zossen → NP includes APPR
  33. Relations between annotations
      ● Relations between attributes
        – Searching for all NPs which contain a preposition
          cat="NP" & pos="APPR" & #1 _i_ #2
          e.g. Die Jugendlichen in Zossen
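What "A includes B" means can be sketched with token positions. The Python below is a simplified model, not the ANNIS implementation: it assumes every annotation covers a contiguous range of token indices, and the `Span` class, the `includes` function and the index values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str
    start: int  # index of the first covered token
    end: int    # index of the last covered token (inclusive)


def includes(a: Span, b: Span) -> bool:
    """True if every token covered by b is also covered by a."""
    return a.start <= b.start and b.end <= a.end


# Token indices: 0 Die, 1 Jugendlichen, 2 in, 3 Zossen, 4 wollen
np = Span('cat="NP"', 0, 3)
appr = Span('pos="APPR"', 2, 2)
verb = Span('pos="VMFIN"', 4, 4)

print(includes(np, appr))  # True: "in" lies inside "Die Jugendlichen in Zossen"
print(includes(np, verb))  # False: "wollen" lies outside the NP
```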
  34. Hierarchy relations
      ● Relations between attributes
        – Searching for all NPs which are objects
          cat="NP"
          e.g. Die Jugendlichen in Zossen → subject!
  35. Hierarchy relations
      ● Relations between attributes
        – Searching for all NPs which are objects
          NP → node annotation
          OA → edge annotation
      [Figure: diagram labeled with tokens, a span, a node, and an edge]
  36. Hierarchy relations
      ● Relations between attributes
        – Searching for all NPs which are objects
          cat="NP"
          the syntactic function in the tree: func="OA"
          → Note: there are at least two elements, which relate to each other in some way!
  37. Hierarchy relations
      ● Relations between attributes
        – Searching for all NPs which are objects
          node & cat="NP" & #1 >[func="OA"] #2
          e.g. ein Musikcafé → object!
  38. Used relations
      ● Relations we used:
          A _i_ B              A includes B
          A > B                A dominates B
          A >[func="OA"] B     A dominates B and B is an object
      The full list of relations can be found in ANNIS
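The dominance relation with an edge annotation can be pictured as a lookup over labeled parent-child edges. The sketch below is a toy model that treats > as a direct parent-child edge (an assumption); the tree fragment, the node names and the helper `dominated` are invented for illustration and are not taken from pcc2.

```python
from typing import List, NamedTuple, Optional


class Edge(NamedTuple):
    parent: str
    child: str
    func: str  # edge annotation, e.g. the syntactic function


# Hypothetical tree fragment: one sentence with a subject NP and an object NP.
edges = [
    Edge("S1", "NP1", "SB"),
    Edge("S1", "NP2", "OA"),
    Edge("NP2", "ein", "NK"),
    Edge("NP2", "Musikcafé", "NK"),
]
cat = {"S1": "S", "NP1": "NP", "NP2": "NP"}  # node annotation


def dominated(child_cat: str, func: Optional[str] = None) -> List[str]:
    """Nodes with category `child_cat` that some node dominates,
    optionally restricted to edges labeled `func`."""
    return [
        e.child for e in edges
        if cat.get(e.child) == child_cat and (func is None or e.func == func)
    ]


print(dominated("NP"))        # ['NP1', 'NP2']: every dominated NP
print(dominated("NP", "OA"))  # ['NP2']: only the NP reached via an OA edge
```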
  39. What's new in ANNIS version 3.1.7
  40. What's new in ANNIS
      ● Simplified syntax (AQL)
      ● Frequency analysis (visualization)
      ● Expand match context (visualization)
      ● Equality and inequality (AQL)
      ● Variables (AQL)
      ● Complex OR expression (AQL)
      ● Document browser (visualization)
      ● CSV export (visualization)
      ● Tooltip for corpus names (visualization)
      ● Report problem (visualization)
  41-42. Simplified syntax
      ● Question: "Die" followed by "Jugendlichen", both dominated by a noun phrase (cat="NP") which is dominated by a sentence
        So far:       cat="S" & cat="NP" & "Die" & "Jugendlichen" & #1 > #2 & #2 > #3 & #2 > #4 & #3 . #4
        Simplified:   cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
  43. Frequency analysis
      ● Questions:
        – How many words tagged as "NN", "ADJA" or "ADV" does a corpus contain?
        – What are the most frequent part-of-speech tags followed by a noun?
        – What are the most frequent part-of-speech tags in a prepositional phrase which is in a sentence?
        – ...
  44-45. Frequency analysis
  46. Frequency analysis
      Attention: a frequency analysis has to be bound to a query!
  47. Frequency analysis
      ● What are the most frequent part-of-speech tags followed by a noun?
          pos . pos="NN"
      ● What are the most frequent part-of-speech tags in a prepositional phrase which is in a sentence?
          cat="S" > cat="PP" > pos
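Why a frequency analysis is bound to a query can be seen in a small Python sketch: the query first selects the matches, and the frequency table then counts one annotation of those matches. The tag sequence is invented for illustration, and the code only mimics the idea behind pos . pos="NN"; it is not how ANNIS computes the table.

```python
from collections import Counter

# Hypothetical part-of-speech sequence of a short text (not pcc2 data).
pos_sequence = ["ART", "NN", "APPR", "ART", "NN", "ART", "ADJA", "NN"]

# Step 1, the query: pairs of directly adjacent tokens whose second tag is NN.
matches = [
    (first, second)
    for first, second in zip(pos_sequence, pos_sequence[1:])
    if second == "NN"
]

# Step 2, the frequency analysis: count the tag of the first node of each match.
frequency = Counter(first for first, _ in matches)

print(frequency.most_common())  # [('ART', 2), ('ADJA', 1)]
```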
  48. Expand match context
      ● Sometimes the context is too small
      ● Even more than 25 is possible; it is a free text field
  49. Equality and inequality
      ● Equality "==" and inequality "!=" for attributes
      ● Question (inequality): two different part-of-speech tags, one directly following the other
          pos . pos & #1 != #2
  50. Equality and inequality
      ● Equality "==" and inequality "!=" for attributes
      ● Question (equality): two identical part-of-speech tags, one directly following the other
          pos . pos & #1 == #2
  51. Equality and inequality (repeats slide 49)
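The two questions above compare the values of two adjacent annotations. A minimal Python sketch of the same comparison, using an invented tag sequence and assuming that == and != compare the annotation values of the two nodes:

```python
# Hypothetical part-of-speech sequence (not pcc2 data).
pos_sequence = ["ART", "ADJA", "ADJA", "NN", "NN", "VVFIN"]

# pos . pos: every pair of directly adjacent tags.
adjacent = list(zip(pos_sequence, pos_sequence[1:]))

same = [pair for pair in adjacent if pair[0] == pair[1]]       # like #1 == #2
different = [pair for pair in adjacent if pair[0] != pair[1]]  # like #1 != #2

print(same)       # [('ADJA', 'ADJA'), ('NN', 'NN')]
print(different)  # [('ART', 'ADJA'), ('ADJA', 'NN'), ('NN', 'VVFIN')]
```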
  52-54. Variables
      ● Question: "Die" followed by "Jugendlichen", both dominated by a noun phrase (cat="NP") which is dominated by a sentence
        Simplified:       cat="S" > cat="NP" > "Die" . "Jugendlichen" & #2 > #4
        With variables:   cat="S" > np#cat="NP" > "Die" . jug#"Jugendlichen" & #np > #jug
        Variables and numbers can be mixed:
                          cat="S" > np#cat="NP" > "Die" . "Jugendlichen" & #np > #4
  55-56. Complex OR expression
      ● Question (simple OR): a part-of-speech tag which is a noun, an attributive adjective or an article
          pos=/(NN)|(ADJA)|(ART)/   (in pattern search)
      ● OR for expressions
          pos="NN" | pos="ADJA" | pos="ART"
  57. Complex OR expression
      ● Question (complex OR): a prepositional phrase which is dominated by a sentence, or just a nominal phrase
          (cat="S" > cat="PP") | cat="NP"
  58-59. Complex OR expression
      ● Question (nested OR): a prepositional phrase which dominates a noun, an attributive adjective or an article
          a#cat="PP" & (b#pos="NN" | b#pos="ADJA" | b#pos="ART") & #a > #b
      Attention: all expressions in brackets have to use the same variable
          ... & (b#pos="NN" | b#pos="ADJA" | b#pos="ART") & ...
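The simple OR on the slides is written in two ways, once as an alternation inside a pattern and once as an OR over exact values. The Python sketch below checks that both formulations select the same tags on a made-up tag list. It assumes, as the slide examples suggest, that a pattern has to cover the whole value; `by_pattern` and `by_or` are hypothetical helper names.

```python
import re


def by_pattern(pos: str) -> bool:
    # Mirrors pos=/(NN)|(ADJA)|(ART)/, assuming whole-value matching.
    return re.fullmatch(r"(NN)|(ADJA)|(ART)", pos) is not None


def by_or(pos: str) -> bool:
    # Mirrors pos="NN" | pos="ADJA" | pos="ART".
    return pos == "NN" or pos == "ADJA" or pos == "ART"


tags = ["NN", "NE", "ADJA", "ADJD", "ART", "APPR", "VVFIN"]
print([t for t in tags if by_pattern(t)])  # ['NN', 'ADJA', 'ART']
print([t for t in tags if by_or(t)])       # ['NN', 'ADJA', 'ART']
```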
  60-61. Document browser
      ● Displays the entire text of a document
  62. CSV export
      ● Export data for further processing
  63. Tooltips for corpus names
      ● Sometimes corpus names can get very long
  64. Report problem
  65. Get ANNIS
      ● ANNIS comes in two flavors
        – A server version
        – A desktop version (ANNIS Kickstarter)
        – Both can be downloaded at http://www.sfb632.uni-potsdam.de/annis/
      ● ANNIS is open source (Apache License 2.0) and hosted on GitHub
        – https://github.com/korpling/ANNIS
  66. Thanks for your attention! Any questions? carolin.odebrecht@hu-berlin.de, f.zipser@gmx.de
