Poio API: a CLARIN-D curation project for language documentation and language typology

Poio API is an open source software library written in Python. It is being developed as part of a curation project within the working group “Linguistic Fieldwork, Anthropology, Language Typology” of CLARIN-D. The goal of Poio API is to provide unified access to pivot data structures parsed from the different file formats that researchers use in language documentation projects. As the unified data structure we chose an implementation of the “Graph Annotation Framework” (GrAF), which was standardized as ISO 24612 in 2012. In our presentation, we will discuss the connections between GrAF and TEI, and present two use cases that demonstrate the innovation and advantage of our approach in comparison to existing methods.

Poio API: a CLARIN-D curation project for language documentation and language typology
Peter Bouda
Centro Interdisciplinar de Documentação Linguística e Social
pbouda@cidles.eu
Overview
● Existing infrastructure and workflows
● Poio API and CLASS within CLARIN
● GrAF and TEI
● Poio API
● GrAF as pivot structures (IGT)
● GrAF for retro-digitization (Dictionary)
Fieldwork
Photos
Existing Infrastructure
LD tools and standards
● Elan: EAF, MPEG, WAV
● Toolbox: TXT, XML, WAV
● Arbil: IMDI/CMDI ("Component MetaData Infrastructure")
● Praat: XML, WAV
● ...
● No standards for tier hierarchies, tier names or annotation schemes
● Efforts in ISOcat
Interlinear Glossed Text

GrAF
● GrAF: Graph Annotation Framework
● ISO 24612: Language resource management - Linguistic annotation framework (LAF)
● Started as a stand-off version of XCES
● An API and a representation as data structures, not a file format
● GrAF/XML as the XML representation
● Used for the Manually Annotated Sub-Corpus (MASC) of the American National Corpus (ANC)
● Nodes, edges, regions, annotations, feature structures
GrAF-XML
    <node xml:id="words..W-Words..na23">
        <link targets="words..W-Words..ra23"/>
    </node>
    <region anchors="780 1340" xml:id="words..W-Words..ra23"/>
    <edge from="utterance..W-Spch..n8" to="words..W-Words..na23" xml:id="ea23"/>
    <a as="words" label="words" ref="words..W-Words..na23" xml:id="a23">
        <fs>
            <f name="annotation_value">so</f>
        </fs>
    </a>
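To make the roles of the four element types concrete, the fragment above can be read with Python's standard library alone. This is a minimal sketch, not Poio API code: the `<graph>` wrapper element, the omitted XML namespace and the function name `read_fragment` are assumptions made so that the fragment parses as one document.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide, wrapped in a <graph> root so that it parses
# as one document. The root element and the missing namespace are
# simplifications made for this sketch.
GRAF_XML = """
<graph>
  <node xml:id="words..W-Words..na23">
    <link targets="words..W-Words..ra23"/>
  </node>
  <region anchors="780 1340" xml:id="words..W-Words..ra23"/>
  <edge from="utterance..W-Spch..n8" to="words..W-Words..na23" xml:id="ea23"/>
  <a as="words" label="words" ref="words..W-Words..na23" xml:id="a23">
    <fs>
      <f name="annotation_value">so</f>
    </fs>
  </a>
</graph>
"""

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def read_fragment(xml_text):
    """Collect regions, parent nodes and annotation values keyed by node id."""
    root = ET.fromstring(xml_text)
    # region id -> tuple of integer anchors
    regions = {
        r.get(XML_ID): tuple(int(a) for a in r.get("anchors").split())
        for r in root.iter("region")
    }
    result = {}
    for node in root.iter("node"):
        link = node.find("link")
        targets = link.get("targets").split() if link is not None else []
        result[node.get(XML_ID)] = {
            "regions": [regions[t] for t in targets if t in regions],
            "parents": [],
            "annotations": {},
        }
    # An edge points from a parent node to a child node.
    for edge in root.iter("edge"):
        if edge.get("to") in result:
            result[edge.get("to")]["parents"].append(edge.get("from"))
    # An annotation refers to its node and carries a feature structure.
    for a in root.iter("a"):
        if a.get("ref") in result:
            for f in a.iter("f"):
                result[a.get("ref")]["annotations"][f.get("name")] = f.text
    return result


nodes = read_fragment(GRAF_XML)
word = nodes["words..W-Words..na23"]
print(word["regions"], word["parents"], word["annotations"])
```

The node itself holds no text: the region anchors it in the primary data, the edge attaches it to an utterance node, and the annotation supplies the value, which is what makes the representation stand-off.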
Why we use GrAF
● No inline markup
● Radical stand-off approach
  – Easier to share and manage data
  – Preferred solution for archiving cultural heritage
  – Ideal for sparse annotations
● Existing code: Java and Python
● API vs. XQuery
● The beauty of annotation graphs
Poio API
● Think of GrAF as an assembly language for linguistic annotation; Poio API is then a library to map from and to higher-level languages
● Subset of GrAF to represent tier-based annotation
  – Interlinear glossed text (IGT)
● Filters and filter chains for search
● Plugin mechanism for file formats
  – Mapping semantics: tiers and annotations to nodes and edges
● Meta-data for additional information (tier types etc.)
● Efforts to map between TEI and GrAF
  – Poio API supports IGT; the next step is dictionaries and lexica
  – Retro-digitized dictionary data at the University of Marburg are published as GrAF files
  – We want to publish as TEI
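The mapping of tiers and annotations to nodes and edges can be illustrated in a few lines of plain Python. This is only a sketch of the idea, not the Poio API implementation: the classes `Node` and `Graph`, the method names, the ID scheme and the toy utterance with its glosses are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    node_id: str
    tier: str                                  # tier the annotation belongs to
    value: str                                 # annotation value
    region: Optional[Tuple[int, int]] = None   # anchors into the primary data


@dataclass
class Graph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (parent, child)

    def add_annotation(self, tier, value, parent=None, region=None):
        """Store one annotation as a node and link it to its parent node."""
        node_id = f"{tier}..n{len(self.nodes)}"
        self.nodes[node_id] = Node(node_id, tier, value, region)
        if parent is not None:
            self.edges.append((parent, node_id))
        return node_id

    def children(self, parent, tier):
        """Return the child nodes of `parent` that belong to `tier`."""
        return [
            self.nodes[c]
            for p, c in self.edges
            if p == parent and self.nodes[c].tier == tier
        ]


# Toy interlinear data (invented): utterance -> words -> glosses.
g = Graph()
utt = g.add_annotation("utterance", "so it goes", region=(0, 10))
for word, gloss in [("so", "thus"), ("it", "3SG"), ("goes", "go.3SG")]:
    w = g.add_annotation("word", word, parent=utt)
    g.add_annotation("gloss", gloss, parent=w)

words = [n.value for n in g.children(utt, "word")]
print(words)
```

In this sketch the tier hierarchy is never stored explicitly; it is implied by the edges between nodes of different tiers, so a file format plugin only needs to say which annotation is the parent of which.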
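A basic converter in Poio API
    import poioapi.io.graf
    import poioapi.io.wikipedia_extractor

    # Parse the input file, then write the result as GrAF (header file: Wikipedia.hdr)
    parser = poioapi.io.wikipedia_extractor.Parser("Wikipedia.xml")
    writer = poioapi.io.graf.Writer()
    converter = poioapi.io.graf.GrAFConverter(parser, writer)
    converter.parse()
    converter.write("Wikipedia.hdr")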
A parser for CSV files
    import poioapi.io.graf

    class CsvParser(poioapi.io.graf.BaseParser):
        # Skeleton only: each method has to be filled in for the CSV layout at hand
        def get_root_tiers(self):
            pass
        def get_child_tiers_for_tier(self, tier):
            pass
        def get_annotations_for_tier(self, tier, annotation_parent=None):
            pass
        def tier_has_regions(self, tier):
            pass
        def region_for_annotation(self, annotation):
            pass
        def get_primary_data(self):
            pass
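One way the skeleton might be filled in is sketched below. To stay runnable without Poio API installed, the class does not subclass `poioapi.io.graf.BaseParser`; it only mirrors the method names from the slide. The CSV layout (one column per tier, the first column as root tier), the `Annotation` tuple and the return types are assumptions, and the real base class may expect its own tier and annotation objects.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for an annotation object.
Annotation = namedtuple("Annotation", ["id", "value", "parent"])


class CsvTierParser:
    """Stand-alone sketch; a real parser would subclass the Poio API base parser."""

    def __init__(self, stream):
        reader = csv.reader(stream)
        self.tiers = next(reader)   # header row: one tier name per column
        self.rows = list(reader)    # one row per root annotation

    def get_root_tiers(self):
        return [self.tiers[0]]

    def get_child_tiers_for_tier(self, tier):
        # Flat hierarchy: every other column hangs off the first one.
        return self.tiers[1:] if tier == self.tiers[0] else []

    def get_annotations_for_tier(self, tier, annotation_parent=None):
        col = self.tiers.index(tier)
        result = []
        for row_no, row in enumerate(self.rows):
            root_id = f"{self.tiers[0]}-{row_no}"
            if col == 0:
                result.append(Annotation(root_id, row[0], None))
            elif annotation_parent is None or annotation_parent.id == root_id:
                result.append(Annotation(f"{tier}-{row_no}", row[col], root_id))
        return result

    def tier_has_regions(self, tier):
        return False  # CSV cells carry no offsets into primary data

    def region_for_annotation(self, annotation):
        return None

    def get_primary_data(self):
        return None


# Invented sample data: two utterances with a second column as child tier.
data = io.StringIO(
    "utterance,translation\nso it goes,example one\nand so on,example two\n"
)
parser = CsvTierParser(data)
for tier in parser.get_root_tiers():
    for ann in parser.get_annotations_for_tier(tier):
        for child in parser.get_child_tiers_for_tier(tier):
            print(ann.value, parser.get_annotations_for_tier(child, ann))
```

The converter only ever asks the parser for tiers, their children and their annotations, which is why a new file format needs nothing more than these six methods.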
Example: Analysis of CSV data
http://nbviewer.ipython.org/urls/raw.github.com/pbouda/notebooks/master/Diana%2520Hinuq%25
Retro-digitization of dictionaries
● From scan to .doc to XML to DB to GrAF
● Radical stand-off approach for unsupervised collaboration
● Dictionaries as cultural heritage texts
● GrAF as primary publication format
● Connectors to brat and TEI
Analysis of the data
● Spanish as pivot language, subset of body-part terms
● Converting GrAF to a networkx graph
● Nodes are heads, translations, etc.
● A head and a translation are connected by an edge if they appear in the same entry
● Merge of graphs
● Count of paths of length 2 between Spanish heads
● Python writes a JSON graph, visualized with D3.js
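The counting step can be sketched without any graph library. The example below uses plain dictionaries in place of networkx and represents the merged graph as one list of (head, translation) pairs. The function name and the sample entries are invented, with `t1` to `t3` as placeholders for translations rather than real dictionary data.

```python
from collections import defaultdict
from itertools import combinations


def count_two_step_paths(entries):
    """Count paths head - translation - head for every pair of heads.

    entries: iterable of (head, translation) pairs, one per dictionary entry.
    """
    # Each translation node collects the heads it shares an entry with.
    heads_by_translation = defaultdict(set)
    for head, translation in entries:
        heads_by_translation[translation].add(head)

    # Every pair of heads attached to the same translation is one path of length 2.
    counts = defaultdict(int)
    for heads in heads_by_translation.values():
        for a, b in combinations(sorted(heads), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Invented entries: Spanish heads with placeholder translations.
entries = [
    ("mano", "t1"), ("brazo", "t1"),
    ("mano", "t2"), ("brazo", "t2"),
    ("dedo", "t2"),
    ("pie", "t3"),
]
paths = count_two_step_paths(entries)
print(paths)
```

A higher count for a pair of heads means that more translations link the two terms, which is the kind of weight a visualization could map to edge thickness.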
Thank you for your attention!
pbouda@cidles.eu