Technological Approaches to
Linguistic Documentation
and
Metadocumentation
Pankaj Dwivedi
Gulab Chand
Somdev Kar
Indian Institute of Technology Ropar
Rupnagar, Punjab 140001
India
27 March 2014 1
Language Documentation
Principles and methods used for the
recording and analysis of primary
language and cultural materials, and
metadata about them.
With the revolution in information technology,
it is now possible to maintain organized and
long-lasting linguistic and cultural records.
Why is documenting languages
IMPORTANT?
Half of the world’s languages may no
longer exist after a few more generations,
as they are not being learnt by children
as first languages
(Austin & Sallabank, 2011).
Crystal (2002) claims that the rate of
language disappearance is as high as two
languages each month.
How?
• Creating Dictionaries
• Preparing Language Teaching Materials
• Archiving
• Language Corpora (Written & Spoken)
What is needed?
A lot of language data and the latest technology
Language data: Text, Audio and Video
Technology: software and tools that can
handle the language data, and platforms
on which these data can be used
effectively.
What do we need?
• Language data (no problem)
• Platforms (covered later)
• Latest TOOLS and SOFTWARE for:
1. Recording and Capturing
2. Analysis
3. Archiving
4. Mobilization
ONE MOMENT!!!
Is ‘Latest’ the best?
or
Old is gold?
CHOOSE CAREFULLY !!!
Is ‘TECHNOLOGY’ adoption always good?
• Languages may live on without an orthography, but no language will be able to function as an administrative language in a modern society without developed language technology (Trosterud, 2006).
• Technology changes quickly, and uncritical adoption of new tools and technologies might compromise long-term sustainability, portability, usability and compatibility with other platforms (Bird & Simons, 2003).
Striking a balance
• Portability: operating systems, formats, software, encodings
• Sustainability: long-term preservation and usefulness
• Maintenance and Distribution: finances, space, tools and reach
• Access and protocols: paid or free, open or closed, research or business, full or partial
Capturing Audio Media
Why or Why not WAV?
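One way to weigh the WAV question is storage cost. WAV files normally hold uncompressed PCM audio, so file size grows linearly with recording length. The sketch below is only an illustration: the function name is hypothetical, and the defaults (44.1 kHz, 16-bit, stereo) are assumed CD-quality values, not settings taken from the slides.

```python
def wav_size_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=2):
    """Approximate size of uncompressed PCM audio data (file header excluded)."""
    bytes_per_sample = bit_depth // 8
    return int(seconds * sample_rate * bytes_per_sample * channels)


# One hour of recording at the assumed CD-quality settings.
one_hour = wav_size_bytes(3600)
print(f"{one_hour / 1_000_000:.0f} MB per hour")  # prints: 635 MB per hour
```

The trade-off is that nothing is discarded at capture time, but field recordings need correspondingly more storage and bandwidth.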
Capturing Video Media
• CODECS
• CONTAINERS
Capturing Digital Text
• Character Encoding: Unicode, ASCII, Windows/ANSI, Big5, Latin-5 etc.
• Data Encoding: XML, SGML, MS Word etc.
• File Encoding: plain text, PDF, MS Word etc.
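The choice of character encoding matters because the same text becomes different bytes under different encodings, and bytes read with the wrong encoding are silently garbled. A minimal sketch, assuming a hypothetical sample word ("İstanbul") that is not taken from the slides:

```python
# Hypothetical sample word "İstanbul", written with an escape so this
# source file itself stays pure ASCII.
text = "\u0130stanbul"

# Unicode (UTF-8) represents the dotted capital I using two bytes.
utf8_bytes = text.encode("utf-8")

# Latin-5 (ISO 8859-9, Turkish) stores the same letter in a single byte.
latin5_bytes = text.encode("iso-8859-9")

# ASCII has no code for that letter at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Reading Latin-5 bytes as Latin-1 raises no error but yields a wrong letter.
misread = latin5_bytes.decode("iso-8859-1")

print(len(utf8_bytes), len(latin5_bytes), ascii_ok, misread == text)
# prints: 9 8 False False
```

Recording the encoding alongside the file, or using Unicode throughout, avoids this kind of silent corruption.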
Digital text: An overview
Analysis tools
• Transcription
• Annotation
• Translation
• Metadata Management
Popular Tools
Metadata Management
• Cataloguing: title, speakers, collectors, time and place, language name etc.
• Descriptive: information about content, relationship to other content etc.
• Structural: structures and patterns
• Technical: description of formats, encoding, required tools and software
• Administrative: work log, access protocol etc.
(Nathan & Austin, 2004)
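The five metadata categories above could be kept together in one record per recording. The sketch below is only one possible layout. The field names, the helper `missing_categories`, and every value are invented placeholders, not a schema from Nathan & Austin or from any archive.

```python
import json

# The five metadata categories listed above.
CATEGORIES = ("cataloguing", "descriptive", "structural",
              "technical", "administrative")


def missing_categories(record):
    """Return the categories that are absent or empty in a record."""
    return [name for name in CATEGORIES if not record.get(name)]


# Every value below is an invented placeholder, not real archive data.
record = {
    "cataloguing": {
        "title": "Example narrative",
        "speakers": ["Speaker A"],
        "collectors": ["Collector B"],
        "place": "Example village",
        "language": "Example language",
    },
    "descriptive": {"content": "Folk tale told by one speaker",
                    "related_to": ["example-002"]},
    "structural": {"segments": ["introduction", "story", "closing"]},
    "technical": {"format": "WAV", "encoding": "PCM"},
    "administrative": {"work_log": ["recorded", "transcribed"],
                       "access": "open"},
}

print(missing_categories(record))  # prints: []
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Serializing to plain-text JSON keeps the record readable without special software, in line with the portability concerns raised earlier.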
Platforms
1. Online Language Archives:
Examples: OLAC, ANLA, ELAR, CLA, The Language Archive, PARADISEC etc.
2. Social Media:
Facebook, Twitter, Blogs, etc.
Examples: ‘Indigenous Tweets’ and ‘Facebook in your language’ by Prof. Kevin Scannell
Conclusion
In a generation when the rate of language
death is at its peak, if we use moribund
technologies to create and preserve language
data, then when those technologies die, unique
heritage is either lost or encrypted (Bird &
Simons, 2003).
We must keep in mind:
Purpose, Presentation, Portability
and
Preservation
References
• Austin, P., & Sallabank, J. (Eds.) (2011). The Cambridge handbook of endangered languages. Cambridge University Press.
• Bird, S., & Simons, G. (2003). Seven dimensions of portability for language documentation and description. Language, 79(3), 557-582.
• Crystal, D. (2002). Language death. Cambridge University Press.
• Nathan, D., & Austin, P. (2004). Reconceiving metadata: language documentation through thick and thin. Language Documentation and Description, 2, 179-187.
• Trosterud, T. (2006). Grammatically based language technology for minority languages. Trends in Linguistics Studies and Monographs, 175, 293.
Thank You!
Questions and Feedback.
