Poio API and GrAF-XML
A radical stand-off approach in
language documentation and language typology
Jonathan Blumtritt, Cologne Center for eHumanities, University of Cologne
Peter Bouda, Centro Interdisciplinar de Documentação Linguística e Social
Felix Rau, Department of Linguistics, University of Cologne
Overview
● Existing infrastructure and workflows
● CLARIN
● Annotation graphs
● GrAF and Poio API
● Example: Elan EAF to GrAF-XML
● CLASS
Fieldwork
Photos
Existing Infrastructure
LD tools and standards
● Elan: EAF, MPEG, WAV
● Toolbox: TXT, XML, WAV
● Arbil: IMDI/CMDI ("Component MetaData
Infrastructure")
● Praat: XML, WAV
● ...
● No standards for tier hierarchies, tier names or
annotation schemes
● Efforts in ISOcat
● European initiative within the European Research
Infrastructure Consortium: Common Language Resources
and Technology Infrastructure (CLARIN)
● aims at providing easy and sustainable access for scholars in
the humanities and social sciences to digital language data
● Started in 2006, part of a roadmap process, timeline currently
ending 2020
● CLARIN-D: working groups in Germany
● Curation projects for different research areas in linguistics
Annotation Graphs
● the underlying data model for linguistic annotations
● pivot structure for linguistic data
● time vs. byte offsets
● not hierarchical (but trees are also graphs)
● stand-off annotation
● "It is important to recognize that translation into AGs does
not magically create compatibility among systems whose
semantics are different." [Bird & Liberman 2001]
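A minimal sketch of the idea behind the bullets above, not code from the talk: each annotation is a labelled arc anchored by offsets into the primary data, and hierarchy is recovered from offset containment rather than stored as nesting. The class `Arc`, the helper `arcs_within`, and the sample values are hypothetical; only the word "so" at 780 to 1340 comes from the example later in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One labelled arc of an annotation graph, anchored by offsets."""
    start: float  # offset into the primary data (time or character position)
    end: float
    tier: str     # annotation type, e.g. "words"
    label: str    # annotation content


def arcs_within(arcs, start, end, tier):
    """Return the arcs of one tier lying inside [start, end], ordered by start."""
    hits = [a for a in arcs if a.tier == tier and start <= a.start and a.end <= end]
    return sorted(hits, key=lambda a: a.start)


# Hypothetical stand-off annotations; offsets are milliseconds into a recording.
arcs = [
    Arc(780, 2100, "utterance", "so it goes"),
    Arc(1340, 1700, "words", "it"),
    Arc(780, 1340, "words", "so"),
    Arc(1700, 2100, "words", "goes"),
]

# The words "belonging" to the utterance are found by offsets alone.
words = arcs_within(arcs, 780, 2100, "words")
```

The primary data is never touched: removing or adding an arc changes only the annotation layer.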
AGs visualized
GrAF
● GrAF: Graph Annotation Framework
● ISO 24612: Language resource management - Linguistic
annotation framework (LAF)
● Started as stand-off version of XCES
● API and representation as data structures, not a file format
● GrAF/XML as XML representation
● Used for the MASC of the ANC
● Nodes, edges, regions, annotations, feature structures
TEI and GrAF
● Schemata for GrAF created with TEI Roma
● Customized version of TEI P5 schema
● ODD: "One Document Does it all"
● GrAF is not TEI compliant
● Share data types and feature structures of annotations
● TEI has "stand-off" variant, uses XPointer/XLink
– Primary data has to be XML
Why we use GrAF
● Because it's new! :-)
● No inline markup
● Radical stand-off approach
– Easier to share and manage data
– Preferred solution to archive cultural heritage
– Ideal for sparse annotations
● Existing code: Java and Python
● The beauty of annotation graphs
Poio API
● Think of GrAF as an assembly language for linguistic
annotation; then Poio API is a library to map from and to
higher-level languages
● Subset of GrAF to represent tier-based annotation
● Filters and filter chains for search
● Plugin mechanism for file formats
– Mapping semantics: tiers and annotations to nodes and edges
● Meta-data for additional information (tier types etc.)
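The mapping from tiers and annotations to nodes and edges could be sketched roughly as follows. This is an illustration only, not Poio API's implementation: the function `map_annotation` and its dictionary layout are hypothetical, and the ID scheme (`type..tier..n<id>`, `..r<id>`, `e<id>`) is inferred from the GrAF-XML example in these slides.

```python
def map_annotation(tier_type, tier_id, ann_id, anchors, value, parent_node=None):
    """Map one tier annotation to GrAF-like node, region, annotation and edge.

    Hypothetical helper: names and layout are assumptions for illustration.
    """
    prefix = f"{tier_type}..{tier_id}"
    node_id = f"{prefix}..n{ann_id}"
    region_id = f"{prefix}..r{ann_id}"
    mapped = {
        # The node links to the region that anchors it in the primary data.
        "node": {"id": node_id, "link": region_id},
        "region": {"id": region_id, "anchors": anchors},
        # The annotation carries the value as a feature and points at the node.
        "annotation": {
            "id": ann_id,
            "ref": node_id,
            "label": tier_type,
            "features": {"annotation_value": value},
        },
        "edge": None,
    }
    if parent_node is not None:
        # The tier hierarchy becomes an edge from the parent node to this node.
        mapped["edge"] = {"id": f"e{ann_id}", "from": parent_node, "to": node_id}
    return mapped


mapped = map_annotation(
    "words", "W-Words", "a23", (780, 1340), "so",
    parent_node="utterance..W-Spch..n8",
)
```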
Example: Mapping of EAF to GrAF-XML
Elan EAF
<TIER DEFAULT_LOCALE="en" LINGUISTIC_TYPE_REF="words"
PARENT_REF="W-Spch" PARTICIPANT="" TIER_ID="W-Words">
<ANNOTATION>
<ALIGNABLE_ANNOTATION ANNOTATION_ID="a23"
TIME_SLOT_REF1="ts4" TIME_SLOT_REF2="ts6">
<ANNOTATION_VALUE>so</ANNOTATION_VALUE>
</ALIGNABLE_ANNOTATION>
</ANNOTATION>
<ANNOTATION>
[...]
</ANNOTATION>
</TIER>
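As a rough sketch of what a parser has to extract from such a tier, the fragment can be read with Python's standard `xml.etree.ElementTree`. This is not Poio API's parser. The wrapping document and the `TIME_ORDER` block are assumptions added to make the fragment self-contained, with time values chosen to match the region anchors 780 and 1340 on the GrAF-XML slide; `read_tier` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed minimal EAF document around the tier shown above.
EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="780"/>
    <TIME_SLOT TIME_SLOT_ID="ts6" TIME_VALUE="1340"/>
  </TIME_ORDER>
  <TIER DEFAULT_LOCALE="en" LINGUISTIC_TYPE_REF="words"
        PARENT_REF="W-Spch" PARTICIPANT="" TIER_ID="W-Words">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a23"
          TIME_SLOT_REF1="ts4" TIME_SLOT_REF2="ts6">
        <ANNOTATION_VALUE>so</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tier(eaf_xml, tier_id):
    """Return (linguistic type, parent tier, annotations) for one tier."""
    root = ET.fromstring(eaf_xml)
    # Time slots are symbolic in EAF; resolve them to millisecond values.
    slots = {
        s.get("TIME_SLOT_ID"): int(s.get("TIME_VALUE"))
        for s in root.iter("TIME_SLOT")
    }
    tier = next(t for t in root.iter("TIER") if t.get("TIER_ID") == tier_id)
    rows = []
    for ann in tier.iter("ALIGNABLE_ANNOTATION"):
        rows.append((
            ann.get("ANNOTATION_ID"),
            slots[ann.get("TIME_SLOT_REF1")],
            slots[ann.get("TIME_SLOT_REF2")],
            ann.findtext("ANNOTATION_VALUE"),
        ))
    return tier.get("LINGUISTIC_TYPE_REF"), tier.get("PARENT_REF"), rows


tier_info = read_tier(EAF, "W-Words")
```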
GrAF entities
GrAF structure
GrAF-XML
<node xml:id="words..W-Words..na23">
<link targets="words..W-Words..ra23"/>
</node>
<region anchors="780 1340" xml:id="words..W-Words..ra23"/>
<edge from="utterance..W-Spch..n8" to="words..W-Words..na23"
xml:id="ea23"/>
<a as="words" label="words" ref="words..W-Words..na23"
xml:id="a23">
<fs>
<f name="annotation_value">so</f>
</fs>
</a>
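To show how the pieces of this fragment refer to each other, here is a small sketch that resolves annotation `a23` to its node, region anchors, and parent node using only the standard library. The `<graph>` wrapper without a namespace and the helper `resolve` are assumptions; real GrAF files declare a namespace and a `link` may name several regions.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The fragment from the slide, wrapped in an assumed root element.
GRAF = """<graph>
  <node xml:id="words..W-Words..na23">
    <link targets="words..W-Words..ra23"/>
  </node>
  <region anchors="780 1340" xml:id="words..W-Words..ra23"/>
  <edge from="utterance..W-Spch..n8" to="words..W-Words..na23" xml:id="ea23"/>
  <a as="words" label="words" ref="words..W-Words..na23" xml:id="a23">
    <fs>
      <f name="annotation_value">so</f>
    </fs>
  </a>
</graph>"""


def resolve(graf_xml, annotation_id):
    """Follow annotation -> node -> region, and collect incoming edges."""
    root = ET.fromstring(graf_xml)
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    ann = by_id[annotation_id]
    node = by_id[ann.get("ref")]
    # Assumes a single region per link, as in the fragment above.
    region = by_id[node.find("link").get("targets")]
    parents = [
        e.get("from") for e in root.iter("edge")
        if e.get("to") == node.get(XML_ID)
    ]
    return {
        "node": node.get(XML_ID),
        "anchors": tuple(int(x) for x in region.get("anchors").split()),
        "parents": parents,
        "features": {f.get("name"): f.text for f in ann.iter("f")},
    }


info = resolve(GRAF, "a23")
```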
Tier hierarchies
[
['utterance..K-Spch'],
['utterance..W-Spch',
['words..W-Words',
['part_of_speech..W-POS']
],
['phonetic_transcription..W-IPA']
],
['gestures..W-RGU',
['gesture_phases..W-RGph',
['gesture_meaning..W-RGMe']
]
],
['gestures..K-RGU',
['gesture_phases..K-RGph',
['gesture_meaning..K-RGMe']
]
]
]
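A nested list like this can be flattened into parent/child pairs with a short recursive walk. The sketch below assumes the convention visible above: each entry is a list whose first item is the tier name (`type..tier_id`) and whose remaining items are child entries. The function name `parent_child_pairs` is hypothetical.

```python
TIERS = [
    ['utterance..K-Spch'],
    ['utterance..W-Spch',
        ['words..W-Words',
            ['part_of_speech..W-POS']
        ],
        ['phonetic_transcription..W-IPA']
    ],
    ['gestures..W-RGU',
        ['gesture_phases..W-RGph',
            ['gesture_meaning..W-RGMe']
        ]
    ],
    ['gestures..K-RGU',
        ['gesture_phases..K-RGph',
            ['gesture_meaning..K-RGMe']
        ]
    ]
]


def parent_child_pairs(hierarchy, parent=None):
    """Yield (parent, tier) pairs; root tiers have parent None."""
    for entry in hierarchy:
        name, children = entry[0], entry[1:]
        yield parent, name
        yield from parent_child_pairs(children, parent=name)


pairs = list(parent_child_pairs(TIERS))
roots = [tier for parent, tier in pairs if parent is None]
# Split a tier name into its linguistic type and Elan tier ID.
tier_type, tier_id = "words..W-Words".split("..")
```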
The code
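import poioapi.annotationgraph
import poioapi.io
import poioapi.io.graf

ag = poioapi.annotationgraph.AnnotationGraph()
# Read the Elan EAF file and convert it to GrAF
parser = poioapi.io.ElanParser("example.eaf")
writer = poioapi.io.graf.Writer()
converter = poioapi.io.graf.GrAFConverter(parser, writer)
converter.parse()
# Write the GrAF-XML output, starting from the header file
converter.write("example.hdr")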
Analysis workflows
● Graph-based methods
● Pipe to scientific Python libraries
● GrAF connectors for major linguistic workflow
tools (GATE and Apache UIMA)
● Example: Polysemy in dictionaries
● Example: Counting word orders
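As a toy illustration of piping annotations into ordinary Python tooling, word orders can be counted with `collections.Counter` once each clause is available as a sequence of role labels. The clause data, the role labels S/V/O, and the helper `word_order` are hypothetical and are not taken from the project's corpora.

```python
from collections import Counter


def word_order(clause):
    """Return the order of S, V and O in one clause, e.g. 'SOV'."""
    return "".join(role for _, role in clause if role in {"S", "V", "O"})


# Toy clauses: (word ID, role label) pairs in surface order.
clauses = [
    [("w1", "S"), ("w2", "O"), ("w3", "V")],
    [("w4", "S"), ("w5", "V"), ("w6", "O")],
    [("w7", "S"), ("w8", "ADV"), ("w9", "O"), ("w10", "V")],
]

counts = Counter(word_order(c) for c in clauses)
```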
CLASS
Thank you for your attention!
pbouda@cidles.eu
Links
Clarin curation project:
http://de.clarin.eu/en/discipline-specific-working-groups/wg-3-linguistic-fieldwork-anthr
Poio API:
http://media.cidles.eu/poio/poio-api/
GrAF:
http://www.xces.org/ns/GrAF/1.0/
CLASS:
http://class.uni-koeln.de
