SlideShare ist ein Scribd-Unternehmen logo
1 von 1
Downloaden Sie, um offline zu lesen
About OpenSoNaR-CGN
SoNaR-500 and CGN made accessible through a web application,
WhiteLab, which makes it possible to explore and search these collections
with use of information contained in the metadata and linguistic
annotations.
WhiteLab
• Web application for exploring and searching large text collections
• Provides direct access to the texts, audio, transcriptions, and linguistic
annotations
• Uses CQP query language (CQP)
• Offers user interfaces for novice, advanced, and expert users
• Developed by de Taalmonsters in collaboration with Tilburg University
and INL; the current version (2.0) with Radboud University/CLST.
Explore
• View the composition of a collection or corpus through the tree map
view
• Retrieve statistics: frequency lists of (word) tokens, lemmas, parts of
speech, phonetic form
• Retrieve n-grams (max. n=5); combinations of words, lemmas, parts of
speech and/or phonetic forms
• Retrieve specific samples (CGN) or documents (SoNaR)
Search
• Selection of subcorpus by means of metadata filter(s)
• Specification of search pattern or query involving
̶ one or more word(s)
̶ POS tag(s)
̶ lemma(s)
• Queries make use of CQP; however, users can opt to specify their queries
without having to use CQP: search patterns formulated in the simple or
extended version of the interface are interpreted and converted to CQP
automatically.
Presentation of results
• Concordance (KWIC), sorted on the basis of lexical information or
metadata
• Link to larger context in which result was found
• (For CGN data) link to aligned audio file
• Graphical display of frequencies and other statistics
Export of results
Retrieved lists of (meta) data may be exported in tsv format.
SoNaR-500
• Reference corpus of contemporary written Dutch as encountered in
texts originating from the Dutch speaking language area in the
Netherlands and Flanders as well as Dutch translations published in and
targeted at this area.
• Comprises 500+ M words (~ 2 M documents) and includes various
genres and text types, incl. books, magazines, newspapers, discussion
fora, web sites, autocues, and subtitles
• Comes with metadata relating to authors and texts
• Linguistic annotations available: POS tagging, lemmatisation
CGN
• Corpus of contemporary spoken standard Dutch as spoken by adults in
the Netherlands and Flanders
• ~ 9 M words (800+ hours of speech), including various types of speech,
ranging from prepared monologues to spontaneous conversations
• Audio recordings & orthographic transcriptions
• For a subset of the data also phonetic transcriptions are available
• Comes with metadata relating to speakers (e.g. gender, age) and
recordings
• Linguistic annotations include POS tagging and lemmatisation
OpenSoNaR-CGN was developed by de Taalmonsters in collaboration with
Radboud University Nijmegen/CLST, Tilburg University, and INL.
We gratefully acknowledge the feedback we received from our user group
and the funding provided by CLARIN NL under grant number CLARIN-NL-15-
005.
Tree map view of CGN
Metadata filters for specifying subcorpora
Query specification in “extended” mode
Results presented in the form of a concordance
Query in CQP

Weitere ähnliche Inhalte

Was ist angesagt?

Sharing an Open Methodology for Building Domain-specific Corpora for EAP
Sharing an Open Methodology for Building Domain-specific Corpora for EAP Sharing an Open Methodology for Building Domain-specific Corpora for EAP
Sharing an Open Methodology for Building Domain-specific Corpora for EAP Alannah Fitzgerald
 
K2 elhanan adler_israelbibliographicdata
K2 elhanan adler_israelbibliographicdataK2 elhanan adler_israelbibliographicdata
K2 elhanan adler_israelbibliographicdataevaminerva
 
LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...
LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...
LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...locloud
 
IVACS Symposium 2010
IVACS Symposium 2010IVACS Symposium 2010
IVACS Symposium 2010nottyknight
 
The Open-Source FLAX Language System
The Open-Source FLAX Language System The Open-Source FLAX Language System
The Open-Source FLAX Language System Alannah Fitzgerald
 
Investigating the Panama Papers Connections with Neo4j - Stefan Komar, Neo4j
Investigating the Panama Papers Connections with Neo4j - Stefan Komar, Neo4jInvestigating the Panama Papers Connections with Neo4j - Stefan Komar, Neo4j
Investigating the Panama Papers Connections with Neo4j - Stefan Komar, Neo4jNeo4j
 
Neo4j Partner Tag Berlin - Investigating the Panama Papers connections with n...
Neo4j Partner Tag Berlin - Investigating the Panama Papers connections with n...Neo4j Partner Tag Berlin - Investigating the Panama Papers connections with n...
Neo4j Partner Tag Berlin - Investigating the Panama Papers connections with n...Neo4j
 

Was ist angesagt? (7)

Sharing an Open Methodology for Building Domain-specific Corpora for EAP
Sharing an Open Methodology for Building Domain-specific Corpora for EAP Sharing an Open Methodology for Building Domain-specific Corpora for EAP
Sharing an Open Methodology for Building Domain-specific Corpora for EAP
 
K2 elhanan adler_israelbibliographicdata
K2 elhanan adler_israelbibliographicdataK2 elhanan adler_israelbibliographicdata
K2 elhanan adler_israelbibliographicdata
 
LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...
LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...
LoCloud Vocabulary Services: Thesaurus management introduction, Walter Koch a...
 
IVACS Symposium 2010
IVACS Symposium 2010IVACS Symposium 2010
IVACS Symposium 2010
 
The Open-Source FLAX Language System
The Open-Source FLAX Language System The Open-Source FLAX Language System
The Open-Source FLAX Language System
 
Investigating the Panama Papers Connections with Neo4j - Stefan Komar, Neo4j
Investigating the Panama Papers Connections with Neo4j - Stefan Komar, Neo4jInvestigating the Panama Papers Connections with Neo4j - Stefan Komar, Neo4j
Investigating the Panama Papers Connections with Neo4j - Stefan Komar, Neo4j
 
Neo4j Partner Tag Berlin - Investigating the Panama Papers connections with n...
Neo4j Partner Tag Berlin - Investigating the Panama Papers connections with n...Neo4j Partner Tag Berlin - Investigating the Panama Papers connections with n...
Neo4j Partner Tag Berlin - Investigating the Panama Papers connections with n...
 

Ähnlich wie Open sonar martinreynaert

Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Project
 
Named Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana NewspapersNamed Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana Newspaperscneudecker
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speechBilgin Aksoy
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Alia Hamwi
 
Innovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLPInnovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLPariadnenetwork
 
Attia sfcm presentation
Attia sfcm presentationAttia sfcm presentation
Attia sfcm presentationMohammed Attia
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionZachary S. Brown
 
Cork AI Meetup Number 3
Cork AI Meetup Number 3Cork AI Meetup Number 3
Cork AI Meetup Number 3Nick Grattan
 
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...Felipe Albrecht
 
Fsmnlp presentation mohammed_attia
Fsmnlp presentation mohammed_attiaFsmnlp presentation mohammed_attia
Fsmnlp presentation mohammed_attiaMohammed Attia
 
Improving Description through Collaboration: The Ethnomusicological Video for...
Improving Description through Collaboration: The Ethnomusicological Video for...Improving Description through Collaboration: The Ethnomusicological Video for...
Improving Description through Collaboration: The Ethnomusicological Video for...Jenn Riley
 
Information Extraction from EuroParliament and UK Parliament data
Information Extraction from EuroParliament and UK Parliament dataInformation Extraction from EuroParliament and UK Parliament data
Information Extraction from EuroParliament and UK Parliament dataWim Peters
 
Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...roelandordelman.nl
 
Reborn Digital: coding text
Reborn Digital: coding textReborn Digital: coding text
Reborn Digital: coding textPip Willcox
 
Information Extraction in the TalkOfEurope Creative Camp
Information Extraction in the TalkOfEurope Creative CampInformation Extraction in the TalkOfEurope Creative Camp
Information Extraction in the TalkOfEurope Creative CampWim Peters
 
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...Franck Michel
 
Curation Technologies for Multilingual Europe
Curation Technologies for Multilingual EuropeCuration Technologies for Multilingual Europe
Curation Technologies for Multilingual EuropeGeorg Rehm
 

Ähnlich wie Open sonar martinreynaert (20)

Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
 
Corpus linguistics
Corpus linguisticsCorpus linguistics
Corpus linguistics
 
Named Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana NewspapersNamed Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana Newspapers
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speech
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
Session5 03.george rehm
Session5 03.george rehmSession5 03.george rehm
Session5 03.george rehm
 
Innovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLPInnovative methods for data integration: Linked Data and NLP
Innovative methods for data integration: Linked Data and NLP
 
Attia sfcm presentation
Attia sfcm presentationAttia sfcm presentation
Attia sfcm presentation
 
co:op-READ-Convention Marburg - Günter Mühlberger
co:op-READ-Convention Marburg - Günter Mühlbergerco:op-READ-Convention Marburg - Günter Mühlberger
co:op-READ-Convention Marburg - Günter Mühlberger
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
 
Cork AI Meetup Number 3
Cork AI Meetup Number 3Cork AI Meetup Number 3
Cork AI Meetup Number 3
 
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
DeepBlue epigenomic data server: programmatic data retrieval and analysis of ...
 
Fsmnlp presentation mohammed_attia
Fsmnlp presentation mohammed_attiaFsmnlp presentation mohammed_attia
Fsmnlp presentation mohammed_attia
 
Improving Description through Collaboration: The Ethnomusicological Video for...
Improving Description through Collaboration: The Ethnomusicological Video for...Improving Description through Collaboration: The Ethnomusicological Video for...
Improving Description through Collaboration: The Ethnomusicological Video for...
 
Information Extraction from EuroParliament and UK Parliament data
Information Extraction from EuroParliament and UK Parliament dataInformation Extraction from EuroParliament and UK Parliament data
Information Extraction from EuroParliament and UK Parliament data
 
Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...
 
Reborn Digital: coding text
Reborn Digital: coding textReborn Digital: coding text
Reborn Digital: coding text
 
Information Extraction in the TalkOfEurope Creative Camp
Information Extraction in the TalkOfEurope Creative CampInformation Extraction in the TalkOfEurope Creative Camp
Information Extraction in the TalkOfEurope Creative Camp
 
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
 
Curation Technologies for Multilingual Europe
Curation Technologies for Multilingual EuropeCuration Technologies for Multilingual Europe
Curation Technologies for Multilingual Europe
 

Mehr von CLARIAH

ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018
ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018
ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018CLARIAH
 
DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018
DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018
DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018CLARIAH
 
Masterclass innosurance 2018
Masterclass innosurance 2018Masterclass innosurance 2018
Masterclass innosurance 2018CLARIAH
 
Flat TLA
Flat TLAFlat TLA
Flat TLACLARIAH
 
QB'er demonstration
QB'er demonstrationQB'er demonstration
QB'er demonstrationCLARIAH
 
Collection registration for the CLARIAH Media Suite.
Collection registration for the CLARIAH Media Suite.Collection registration for the CLARIAH Media Suite.
Collection registration for the CLARIAH Media Suite.CLARIAH
 
CMDI2RDF
CMDI2RDFCMDI2RDF
CMDI2RDFCLARIAH
 
2016 05-20-clariah-wp4
2016 05-20-clariah-wp42016 05-20-clariah-wp4
2016 05-20-clariah-wp4CLARIAH
 
2016 05-20-clariah-wp3
2016 05-20-clariah-wp32016 05-20-clariah-wp3
2016 05-20-clariah-wp3CLARIAH
 
2016 05-20-clariah-wp2
2016 05-20-clariah-wp22016 05-20-clariah-wp2
2016 05-20-clariah-wp2CLARIAH
 
2016 05-20-clariah-wp5
2016 05-20-clariah-wp52016 05-20-clariah-wp5
2016 05-20-clariah-wp5CLARIAH
 
MTAS Henny Brugman
MTAS Henny BrugmanMTAS Henny Brugman
MTAS Henny BrugmanCLARIAH
 
LREC Ton vd Wouden
LREC Ton vd WoudenLREC Ton vd Wouden
LREC Ton vd WoudenCLARIAH
 
Paqu Gertjan van Noord en Jan Odijk
Paqu Gertjan van Noord en Jan OdijkPaqu Gertjan van Noord en Jan Odijk
Paqu Gertjan van Noord en Jan OdijkCLARIAH
 
Struc data Auke Rijpma
Struc data Auke RijpmaStruc data Auke Rijpma
Struc data Auke RijpmaCLARIAH
 
Diachronous conceptuallexicons Marieke van Erp / Piek Vossen
Diachronous conceptuallexicons Marieke van Erp / Piek VossenDiachronous conceptuallexicons Marieke van Erp / Piek Vossen
Diachronous conceptuallexicons Marieke van Erp / Piek VossenCLARIAH
 
Corpus studio Erwin Komen
Corpus studio Erwin KomenCorpus studio Erwin Komen
Corpus studio Erwin KomenCLARIAH
 
Athena richard zijdeman
Athena richard zijdemanAthena richard zijdeman
Athena richard zijdemanCLARIAH
 
Struc data aukerijpma
Struc data aukerijpmaStruc data aukerijpma
Struc data aukerijpmaCLARIAH
 
Anansi jauco noordzij
Anansi jauco noordzijAnansi jauco noordzij
Anansi jauco noordzijCLARIAH
 

Mehr von CLARIAH (20)

ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018
ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018
ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018
 
DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018
DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018
DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018
 
Masterclass innosurance 2018
Masterclass innosurance 2018Masterclass innosurance 2018
Masterclass innosurance 2018
 
Flat TLA
Flat TLAFlat TLA
Flat TLA
 
QB'er demonstration
QB'er demonstrationQB'er demonstration
QB'er demonstration
 
Collection registration for the CLARIAH Media Suite.
Collection registration for the CLARIAH Media Suite.Collection registration for the CLARIAH Media Suite.
Collection registration for the CLARIAH Media Suite.
 
CMDI2RDF
CMDI2RDFCMDI2RDF
CMDI2RDF
 
2016 05-20-clariah-wp4
2016 05-20-clariah-wp42016 05-20-clariah-wp4
2016 05-20-clariah-wp4
 
2016 05-20-clariah-wp3
2016 05-20-clariah-wp32016 05-20-clariah-wp3
2016 05-20-clariah-wp3
 
2016 05-20-clariah-wp2
2016 05-20-clariah-wp22016 05-20-clariah-wp2
2016 05-20-clariah-wp2
 
2016 05-20-clariah-wp5
2016 05-20-clariah-wp52016 05-20-clariah-wp5
2016 05-20-clariah-wp5
 
MTAS Henny Brugman
MTAS Henny BrugmanMTAS Henny Brugman
MTAS Henny Brugman
 
LREC Ton vd Wouden
LREC Ton vd WoudenLREC Ton vd Wouden
LREC Ton vd Wouden
 
Paqu Gertjan van Noord en Jan Odijk
Paqu Gertjan van Noord en Jan OdijkPaqu Gertjan van Noord en Jan Odijk
Paqu Gertjan van Noord en Jan Odijk
 
Struc data Auke Rijpma
Struc data Auke RijpmaStruc data Auke Rijpma
Struc data Auke Rijpma
 
Diachronous conceptuallexicons Marieke van Erp / Piek Vossen
Diachronous conceptuallexicons Marieke van Erp / Piek VossenDiachronous conceptuallexicons Marieke van Erp / Piek Vossen
Diachronous conceptuallexicons Marieke van Erp / Piek Vossen
 
Corpus studio Erwin Komen
Corpus studio Erwin KomenCorpus studio Erwin Komen
Corpus studio Erwin Komen
 
Athena richard zijdeman
Athena richard zijdemanAthena richard zijdeman
Athena richard zijdeman
 
Struc data aukerijpma
Struc data aukerijpmaStruc data aukerijpma
Struc data aukerijpma
 
Anansi jauco noordzij
Anansi jauco noordzijAnansi jauco noordzij
Anansi jauco noordzij
 

Kürzlich hochgeladen

Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Tamer Koksalan, PhD
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomyDrAnita Sharma
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...navyadasi1992
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuinethapagita
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxnoordubaliya2003
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 

Kürzlich hochgeladen (20)

Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 

Open sonar martinreynaert

  • 1. About OpenSoNaR-CGN SoNaR-500 and CGN made accessible through a web application, WhiteLab, which makes it possible to explore and search these collections with use of information contained in the metadata and linguistic annotations. WhiteLab • Web application for exploring and searching large text collections • Provides direct access to the texts, audio, transcriptions, and linguistic annotations • Uses CQP query language (CQP) • Offers user interfaces for novice, advanced, and expert users • Developed by de Taalmonsters in collaboration with Tilburg University and INL; the current version (2.0) with Radboud University/CLST. Explore • View the composition of a collection or corpus through the tree map view • Retrieve statistics: frequency lists of (word) tokens, lemmas, parts of speech, phonetic form • Retrieve n-grams (max. n=5); combinations of words, lemmas, parts of speech and/or phonetic forms • Retrieve specific samples (CGN) or documents (SoNaR) Search • Selection of subcorpus by means of metadata filter(s) • Specification of search pattern or query involving ̶ one or more word(s) ̶ POS tag(s) ̶ lemma(s) • Queries make use of CQP; however, users can opt to specify their queries without having to use CQP: search patterns formulated in the simple or extended version of the interface are interpreted and converted to CQP automatically. Presentation of results • Concordance (KWIC), sorted on the basis of lexical information or metadata • Link to larger context in which result was found • (For CGN data) link to aligned audio file • Graphical display of frequencies and other statistics Export of results Retrieved lists of (meta) data may be exported in tsv format. SoNaR-500 • Reference corpus of contemporary written Dutch as encountered in texts originating from the Dutch speaking language area in the Netherlands and Flanders as well as Dutch translations published in and targeted at this area. • Comprises 500+ M words (~ 2 M documents) and includes various genres and text types, incl. books, magazines, newspapers, discussion fora, web sites, autocues, and subtitles • Comes with metadata relating to authors and texts • Linguistic annotations available: POS tagging, lemmatisation CGN • Corpus of contemporary spoken standard Dutch as spoken by adults in the Netherlands and Flanders • ~ 9 M words (800+ hours of speech), including various types of speech, ranging from prepared monologues to spontaneous conversations • Audio recordings & orthographic transcriptions • For a subset of the data also phonetic transcriptions are available • Comes with metadata relating to speakers (e.g. gender, age) and recordings • Linguistic annotations include POS tagging and lemmatisation OpenSoNaR-CGN was developed by de Taalmonsters in collaboration with Radboud University Nijmegen/CLST, Tilburg University, and INL. We gratefully acknowledge the feedback we received from our user group and the funding provided by CLARIN NL under grant number CLARIN-NL-15- 005. Tree map view of CGN Metadata filters for specifying subcorpora Query specification in “extended” mode Results presented in the form of a concordance Query in CQP