Role of Language Engineering to Preserve Endangered Language
1. Role of Language Engineering to Preserve Endangered Languages
Amit Kumar Jha
Ph.D. (Informatics and Language Engineering)
School of Language, MGAHV, Wardha
Sumit Kumar Gupta
MILE, School of Language,
MGAHV, Wardha
National Conference on the Approaches & the Methodologies on the Study of Indigenous & Endangered Language
Dr. Piyush Pratap Singh
Asst. Professor
School of Language
MGAHV, Wardha
2. Endangered Language
• An endangered language (EL) is a language whose speech community has only
a small number of speakers left.
• An EL is likely to become extinct in the near future. Many languages are
falling out of use and being replaced by others that are more widely used in
the region or nation.
3. Language Engineering
• Language Engineering (LE) is the subfield of computer science concerned
with developing language-related software and the hardware needed to run it.
5. Goal of Language Engineering
• The ultimate goal of LE is to develop a machine that can understand and
generate natural language.
• If the approaches of LE are applied to ELs, those languages may be preserved.
6. Language Endangerment
• A loss of speakers for one language is a gain of speakers for another
language, except in cases of genocide. Languages are generally replaced
when an entire speech community shifts to another language, and the
replacing languages are very often official state languages.
• The world is experiencing an unprecedented wave of language extinctions.
Between 6,000 and 7,000 languages are currently spoken, and between 50 and
90 per cent of them are projected to be extinct by the year 2100.
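Applying the two percentages to the two ends of this estimate gives a rough sense of scale (simple arithmetic on the figures above, not an independent estimate):

```latex
0.50 \times 6{,}000 = 3{,}000 \qquad\qquad 0.90 \times 7{,}000 = 6{,}300
```

That is, somewhere between roughly 3,000 and 6,300 languages would be lost by 2100.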
7. Consequences of Language Extinction
• Language extinction results in the loss of cultural identities, knowledge
systems, and the variety of data needed to understand how language is
structured in the mind.
• Documenting endangered languages preserves data and stimulates language
maintenance and revitalisation.
8. Language Documentation
• Many of these languages have no written tradition, so written data may be sparse or completely
unavailable; the languages are not used in the media, or their speakers do not use the Internet
(and if they do, they often use another language). In such cases, linguists must start from scratch and
collect as much data as possible by recording speakers of the language.
• Ideally, language documentation contains representative samples from different speakers (different
age groups, different professions, both sexes, and different origins), but for endangered languages this
may not be possible, because the number of speakers is too small and/or only elderly speakers remain.
Apart from the number of speakers and the amount of data, an important issue is communication between
the language community and the linguists or other researchers who want to document its language.
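A documentation team can check how far its recordings meet this ideal by comparing the speaker metadata it has collected with the categories it would like to cover. The Python sketch below is a minimal illustration under that assumption; the function `coverage_report`, the attribute names and the category values are hypothetical and not taken from any documentation standard.

```python
def coverage_report(speakers, wanted):
    """Return, per attribute, the wanted values no recorded speaker covers yet.

    speakers: list of dicts describing recorded speakers (hypothetical schema).
    wanted:   dict mapping an attribute name to the values we hope to cover.
    """
    missing = {}
    for attribute, values in wanted.items():
        seen = {speaker.get(attribute) for speaker in speakers}
        gaps = [value for value in values if value not in seen]
        if gaps:
            missing[attribute] = gaps
    return missing


# Example: so far only elderly speakers have been recorded.
speakers = [
    {"age_group": "60+", "sex": "female", "origin": "village A"},
    {"age_group": "60+", "sex": "male", "origin": "village A"},
]
wanted = {"age_group": ["18-39", "40-59", "60+"], "sex": ["female", "male"]}
print(coverage_report(speakers, wanted))  # {'age_group': ['18-39', '40-59']}
```

For a language with only elderly speakers, a non-empty report is expected; it simply records which groups could not be documented.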
9. Language Documentation...
• In the case of endangered or minority languages, the documenters are often outsiders, not members of
the community. They may not be fluent speakers of the language in question and may have to communicate
with the speakers in a second or third language. This often leads to unnatural use of the language
that is being documented.
10. Digitalization
• Digitalization is the process of storing data in digital form. Digital data
is more durable than other types of data. To preserve an EL through
digitalization, we convert its data (text, sound, images, etc.) into digital
form and store it. Researchers should also create study material for ELs in
digital form.
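As a minimal sketch of this step, the Python code below stores one recording as a 16-bit mono WAV file, its transcription as a UTF-8 text file, and a JSON manifest holding descriptive metadata plus SHA-256 checksums, so that later copies can be checked for corruption. The file layout, the function names `archive_item` and `verify_item`, and the metadata fields are assumptions for illustration; real archives follow their own formats and metadata standards.

```python
import hashlib
import json
import struct
import wave
from pathlib import Path


def sha256_of(path):
    """Checksum used to detect silent corruption of an archived file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def archive_item(folder, item_id, samples, sample_rate, transcript, metadata):
    """Store one recording, its transcription and its metadata as digital files.

    samples: mono audio as floats in [-1.0, 1.0]; values outside are clipped.
    Returns the path of the JSON manifest describing the item.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # Sound: uncompressed 16-bit mono PCM WAV.
    wav_path = folder / f"{item_id}.wav"
    pcm = b"".join(struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
                   for s in samples)
    with wave.open(str(wav_path), "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(sample_rate)
        out.writeframes(pcm)

    # Text: UTF-8, so any script (or IPA) is stored without loss.
    txt_path = folder / f"{item_id}.txt"
    txt_path.write_text(transcript, encoding="utf-8")

    # Manifest: descriptive metadata plus a checksum for every stored file.
    manifest = dict(metadata, id=item_id,
                    files={p.name: sha256_of(p) for p in (wav_path, txt_path)})
    manifest_path = folder / f"{item_id}.json"
    manifest_path.write_text(json.dumps(manifest, ensure_ascii=False, indent=2),
                             encoding="utf-8")
    return manifest_path


def verify_item(manifest_path):
    """Re-hash the files listed in a manifest; False if any file has changed."""
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    return all(sha256_of(manifest_path.parent / name) == digest
               for name, digest in manifest["files"].items())
```

Running `verify_item` periodically over an archive, and on every new copy, is one way to back the durability of digital data with an actual check.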
11. Applications of Language Engineering
• Speech Generation
• Language Translator
• Speech-to-Text
• Text-to-Speech
• Language Teaching
• Transliteration Tool
12. Applications of Language Engineering...
• Speaker Identification and Verification
• Speech Recognition
• Character and Document Image Recognition
• Question-Answering System
• Word Sense Disambiguation
• Information Retrieval and Information Extraction
• Film Production and Dialogue Dubbing
13. Speech Generation
• With the help of language engineering, a machine can generate speech in an
endangered language. If a machine is able to generate speech in an EL, that
language can be preserved.
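One way a machine can produce speech in an EL is concatenative synthesis: short units (for example syllables) are recorded from native speakers and joined in the required order. The Python sketch below assumes such a unit bank already exists as lists of float samples; the function name `synthesize`, the unit labels and the fixed pause between units are illustrative assumptions, not a description of any particular system.

```python
def synthesize(unit_sequence, unit_bank, sample_rate=16000, pause_ms=50):
    """Concatenate recorded units into one waveform.

    unit_sequence: unit labels in speaking order, e.g. syllables.
    unit_bank:     dict mapping each label to a list of float samples
                   recorded from a native speaker (hypothetical data).
    A silence of `pause_ms` milliseconds is placed between units.
    """
    pause = [0.0] * (sample_rate * pause_ms // 1000)
    output = []
    for position, label in enumerate(unit_sequence):
        if label not in unit_bank:
            raise KeyError(f"no recording for unit {label!r}")
        if position > 0:
            output.extend(pause)
        output.extend(unit_bank[label])
    return output
```

Inserting silence keeps the sketch simple; practical concatenative systems choose among several recordings of each unit and smooth the joins so that the result does not sound choppy.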
14. Language Translator
• A language translator, or machine translator, is a machine that translates
one language into another. The first language is called the source language
and the second is called the target language. If the source language or the
target language is an EL, the EL is preserved through this translation system.
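The simplest form of such a system is word-by-word glossing with a bilingual dictionary; real machine translation must also handle word order, morphology and ambiguity. The Python sketch below shows that the same lexicon can be used in both directions, matching the point that the EL may be either the source or the target. The function names are hypothetical, and the romanized Hindi entries are only a stand-in for a lexicon compiled from EL documentation.

```python
def gloss(sentence, lexicon):
    """Translate word by word with a bilingual lexicon.

    Words missing from the lexicon are kept in [brackets] so that a
    speaker or linguist can fill the gap later.
    """
    return " ".join(lexicon.get(word, f"[{word}]") for word in sentence.lower().split())


def invert(lexicon):
    """Swap the direction of a one-to-one lexicon (target -> source)."""
    return {target: source for source, target in lexicon.items()}


# Illustrative English -> romanized Hindi entries.
english_to_hindi = {"big": "bada", "house": "ghar", "water": "paani"}
print(gloss("Big house", english_to_hindi))        # bada ghar
print(gloss("ghar", invert(english_to_hindi)))     # house
print(gloss("big river", english_to_hindi))        # bada [river]
```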
15. Speech-to-Text
• Speech-to-text is the process of converting speech into text, which is a
core documentation task. Converting the speech recordings of an EL into text
files helps preserve that language.
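A full speech recognizer is far beyond a short example, but a common first step with long field recordings is to locate the stretches that contain speech, so that each utterance can then be transcribed separately (by a linguist or by a recognizer). The Python sketch below uses a simple short-time energy detector for this; the function name, frame length and threshold are assumptions for illustration.

```python
def speech_segments(samples, sample_rate, frame_ms=20, threshold=0.01):
    """Find stretches of speech with a simple short-time energy detector.

    The signal is cut into frames of `frame_ms` milliseconds; a frame counts
    as speech when its mean squared amplitude is at least `threshold`.
    Returns a list of (start_seconds, end_seconds) tuples.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    segments = []
    start = None
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        is_speech = sum(s * s for s in frame) / frame_len >= threshold
        if is_speech and start is None:
            start = f  # speech begins
        elif not is_speech and start is not None:
            segments.append((start * frame_len / sample_rate, f * frame_len / sample_rate))
            start = None  # speech ends
    if start is not None:  # the recording ends during speech
        segments.append((start * frame_len / sample_rate, n_frames * frame_len / sample_rate))
    return segments
```

The frame length and threshold would need tuning to the noise level of each recording.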
17. Transliteration Tool
• Transliteration is the process of converting text from one script into
another.
• A transliteration tool is important for a person who does not know a
particular language, its script, or its pronunciation.
• If a transliteration tool is developed for an EL, more people will be able
to read and understand that language.
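Assuming, for this example, that the EL is written in Devanagari, the Python sketch below is a minimal rule-based Devanagari-to-Latin transliterator with IAST-style output. It covers only basic consonants, independent vowels, vowel signs and the virama; the tables and the function name are illustrative, and a real tool would also handle nasal signs, nukta letters and language-specific pronunciation (for example, Hindi usually drops the word-final inherent "a" that this strict scheme keeps).

```python
# Consonants carry an inherent vowel "a" unless a vowel sign or virama follows.
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh",
    "च": "c", "छ": "ch", "ज": "j", "झ": "jh",
    "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh", "ण": "ṇ",
    "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
    "य": "y", "र": "r", "ल": "l", "व": "v",
    "श": "ś", "ष": "ṣ", "स": "s", "ह": "h",
}
# Independent vowels (written at the start of a syllable).
VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
          "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au"}
# Dependent vowel signs (matras), which replace the inherent "a".
MATRAS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū",
          "े": "e", "ै": "ai", "ो": "o", "ौ": "au"}
VIRAMA = "्"  # suppresses the inherent vowel (e.g. in consonant clusters)


def transliterate(text):
    """Convert Devanagari text to IAST-style Latin, character by character."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in MATRAS:
                out.append(MATRAS[nxt])
                i += 1  # the vowel sign has been consumed too
            elif nxt == VIRAMA:
                i += 1  # no vowel after this consonant
            else:
                out.append("a")  # inherent vowel
        else:
            # Independent vowels are mapped; anything else (spaces, digits,
            # signs not in the tables) is copied unchanged.
            out.append(VOWELS.get(ch, ch))
        i += 1
    return "".join(out)


print(transliterate("नमस्ते भारत"))  # namaste bhārata
```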
18. Text-to-Speech
• A text-to-speech (TTS) system takes text as input and returns speech as
output. It plays an important role in human-machine interaction.
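Before any audio can be produced, a TTS system has to turn the input text into units it knows how to pronounce. A minimal sketch of that front-end step, assuming romanized input and a hypothetical inventory of recorded syllables, is greedy longest-match segmentation; the function name and the `<pause>` marker are illustrative choices.

```python
def text_to_units(text, inventory, pause="<pause>"):
    """Split romanized text into the longest units found in `inventory`.

    inventory: set of unit strings (e.g. syllables) that have recordings.
    A `pause` marker is placed between words. Raises ValueError when some
    part of a word is not covered by any unit.
    """
    if not inventory:
        raise ValueError("unit inventory is empty")
    longest = max(len(unit) for unit in inventory)
    units = []
    for n, word in enumerate(text.lower().split()):
        if n > 0:
            units.append(pause)
        i = 0
        while i < len(word):
            # Try the longest candidate first, then shorter ones.
            for size in range(min(longest, len(word) - i), 0, -1):
                if word[i:i + size] in inventory:
                    units.append(word[i:i + size])
                    i += size
                    break
            else:
                raise ValueError(f"no unit covers {word[i:]!r}")
    return units


inventory = {"na", "ma", "ste", "s", "te", "ghar"}
print(text_to_units("Namaste ghar", inventory))
# ['na', 'ma', 'ste', '<pause>', 'ghar']
```

Greedy matching is simple but can fail where a different split would succeed; a fuller front end would search over all splits and also apply pronunciation rules.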
19. Language Teaching
• Language teaching is the process of teaching a language. With the help of
LE, we can create a system for teaching a language, and if a teaching system
for an EL is created, the EL may be preserved. Some languages are spoken only
by elderly speakers and are not being passed on to the next generation; after
some time, such a language dies. A teaching system is important for
preserving these languages.
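One well-known technique such a system could use for vocabulary practice is the Leitner flashcard system (named here because the slide does not specify a method): cards answered correctly move to a box that is reviewed less often, and cards answered wrongly go back to the first box. The class name, the review schedule (box k every 2 to the power k-1 sessions) and the romanized Hindi entries in the example are assumptions for illustration.

```python
class LeitnerDeck:
    """Flashcards kept in numbered boxes (box 1 is reviewed most often)."""

    def __init__(self, cards, n_boxes=3):
        self.answers = dict(cards)                   # prompt -> expected answer
        self.box = {prompt: 1 for prompt in cards}   # every card starts in box 1
        self.n_boxes = n_boxes

    def due(self, session):
        """Prompts to review in a session: box k comes up every 2**(k-1) sessions."""
        return [p for p, b in self.box.items() if session % (2 ** (b - 1)) == 0]

    def answer(self, prompt, response):
        """Check a response; promote the card if correct, send it to box 1 if not."""
        correct = response.strip().lower() == self.answers[prompt].lower()
        self.box[prompt] = min(self.box[prompt] + 1, self.n_boxes) if correct else 1
        return correct


deck = LeitnerDeck({"water": "paani", "house": "ghar"})
print(deck.due(1))               # ['water', 'house']
deck.answer("water", "Paani")    # correct: "water" moves to box 2
print(deck.due(1))               # ['house']
```

Sending a forgotten card all the way back to box 1 is the classic Leitner rule; gentler variants move it down by only one box.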
20. Question-Answering System
• A question-answering system is a Natural Language Processing system: when a
person asks the system a question, the system returns the answer to that
question.
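The slide does not say how the answer is found. The simplest approach, shown here as a minimal Python sketch, is retrieval over stored question-answer pairs (for example, questions about the language that community members have already answered): the stored question sharing the most words with the user's question wins. The function names are hypothetical, and plain word overlap is a deliberately crude stand-in for the word weighting and language understanding a real system would use.

```python
import re


def words(text):
    """Lower-cased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def answer(question, faq):
    """Return the stored answer whose question shares the most words with the query.

    faq maps stored questions to answers (hypothetical collected data).
    Returns None when no stored question shares any word with the query.
    """
    query = words(question)
    best = max(faq, key=lambda stored: len(query & words(stored)))
    return faq[best] if query & words(best) else None
```

Returning `None` instead of a forced guess lets the system admit that it has no stored answer, which matters when the knowledge base for an EL is small.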
21. Extinct Language
• An endangered language is a language that is at risk of falling out of use,
generally because it has few surviving speakers. If it loses all of its native
speakers, it becomes an extinct language.
22. Levels of Endangerment
• UNESCO defines four levels of language endangerment between "safe" (not
endangered) and "extinct":
1. Vulnerable
2. Definitely endangered
3. Severely endangered
4. Critically endangered
23. EL in India
• The Indian Government started a scheme to preserve ELs, the Scheme for
Protection and Preservation of Endangered Languages (SPPEL).
• SPPEL has listed 117 languages to be documented in its current phase.
These are lesser-known Indian languages, each spoken by fewer than 10,000
speakers.