Human Language Technologies for Ethiopian
Languages: Challenges and Future Directions


         Solomon Teferra Abate, Binyam Ephrem,
 Enchalew Yifru, Kassa Tilahun, Lemlem Hagos, Mohammed-
              hussen Abubeker and Taye Girma


           LIG, Université Joseph Fourier (UJF)
         ITPhD Program, Addis Ababa University
              solomon_teferra_7@yahoo.com


                  AGIS'11 Conference, Addis Ababa
Outline


●   Ethiopian Languages
●   Human Language Technology (HLT)
      –   Role in Development
      –   HLT in the World
●   HLT for Ethiopian Languages
      –   Language and Technology Coverage
      –   Challenges and limitations
      –   Future Directions and Strategies

Ethiopian Languages


●   There are about 90 languages
●   Most belong to the Afro-Asiatic language family
●   Amharic, Afan-Oromo and Tigrigna are the 3 most spoken
●   Amharic is the federal working language
      –   Regions have their own working languages
      –   The language policy states that everyone has the right to learn in
           his/her mother tongue
      –   More than 20 languages are used as a medium of instruction (MOI) in primary school (cycles I & II)
Human Language Technology

●   Is an interdisciplinary field that encompasses most sub-
    disciplines of linguistics, Computational Linguistics, Natural
    Language Processing, computer science, Artificial Intelligence,
    psychology, philosophy, mathematics and statistics
●   Covers areas like:
      ✔   ASR,
      ✔   TTS,
      ✔   OCR,
      ✔   MT,
      ✔   Stemming,
      ✔   POS tagging,
      ✔   Parsing,
      ✔   Morphological analysis/synthesis,
      ✔   Information Extraction,
      ✔   Text/document categorization,
      ✔   Spelling and Grammar checking,
      ✔   etc.
Human Language Technology - Role

●   Enables ICT products to have knowledge of human language
      ●   Increases the acceptance of the technology and the
            productivity of its users in the information age
●   Helps people collaborate, conduct business, share knowledge
    and participate in social and political debates regardless of
    language barriers or computer skills
●   Relevant for the disadvantaged to have access to information:
      ✔   the illiterate,
      ✔   the rural poor,
      ✔   the physically impaired population

HLT in the World

●   Well developed for only a few of the world's languages, such as English
●   IBM Watson computer
      ●   Passed its first test by winning a question-answering (QA) competition with a $1 M prize
      ●   Its design goal is an intelligent computer that can
            interact in natural language
             ✔   Understand any question asked in natural speech
             ✔   Answer questions as humans do
      ●   Uses a number of HLT modules such as ASR, QA and TTS
      ✗   Requires a lot of expensive servers (about $1 billion in total)
HLT in the World

●   Siri is a simple iPhone-based system that:
      ●   Receives commands in natural speech, for example to:
             ●   Send messages
             ●   Schedule meetings
             ●   Place phone calls
●   Siri has been claimed to:
      ●   understand what you say
      ●   know what you mean
      ●   speak back in natural speech
HLT in the World: Europe

●   Europe is a continent united into one multilingual
    economic union with 23 official languages
●   To enable the European languages technologically, the European Union:
      ✔   Invested over €130 M to promote language technologies
            and language resource infrastructures in 2009-2011
      ✔   Allocated €35 M for SME action on Digital Content and
           Languages and €50 M for Language Technologies in its
           Work Program 2011-2012
      ✔   Proposed a simple platform that enables availability of any
            online content and services in all European languages
HLT in the World: South Africa

●   The South African government has identified HLT as a priority area
    to enable (technologically) its 11 official languages
➢   Various R&D projects and initiatives have been funded by the
    government through:
      ●   Department of Arts and Culture (DAC),
      ●   Department of Science and Technology (DST), and
      ●   National Research Foundation (NRF)
●   The key challenge is fragmentation of R&D activities in HLT
      ●   Addressed by the South African HLT Audit (SAHLTA)
HLT for Ethiopian Languages


●   Research on HLT for Ethiopian languages started in the 1990s
✔   There are now many (>200) encouraging and valuable works on:
    ➢   ASR,
    ➢   MT,
    ➢   Text-to-speech,
    ➢   OCR,
    ➢   Stemming,
    ➢   Parsing,
    ➢   POS tagging,
    ➢   Spell checking,
    ➢   Thesaurus construction,
    ➢   Text classification,
    ➢   Text categorization,
    ➢   Morphological analysis,
    ➢   Information Extraction
✗   Most of them are based on language resources (LRs) developed for the individual experiment
HLT for Ethiopian Languages

✗   HLT research covers a limited number of Ethiopian languages
[Figure: bar chart "HLT for Ethiopian Languages (Masters theses)". Horizontal axis: Languages (Amharic, Afan Oromo, Tigrigna, Welayta, Ge'ez, Sidama); vertical axis: 0 to 25; axis label: Research Areas; legend: NLP, Speech Processing, OCR, CSE]

Challenges and Limitations

●   Challenges that hinder Ethiopian HLT include:
      –   Lack of language resources: speech and text corpora
      –   Lack of standardized evaluation corpora and platforms
      –   Lack of expertise in both language and technology
      –   Time shortage
           ●   research is done only for academic achievement in the given time
      –   Absence of a national HLT research plan (HLT road-map)
           ●   research is based only on individuals' interests
      –   Lack of sustainable and coordinated research funding
Challenges and Limitations

➔   Existing research works have limitations:
     –   Use of insufficient and low-quality language resources
          ➢   research results are not conclusive
     –   Research results are not well evaluated, analyzed and
           documented
          ➢   their achievements and gaps are vague
     –   Research attempts in HLT are fragmented
          ➢   lack of integration, consolidation and continuity
               ●   Tokenizer → POS → Parser → LA → ASR/MT
Future Directions and Strategies


●   Is there any other way to escape the cost of the language barrier,
    or to cover it without HLT, in the information age? NO!!!
●   Are we rich enough to keep spending on purely academic
    exercises? NO!!!
      –   6 months each of at least 10 research students writing their theses in
            one of the HLT areas every year, plus their supervisors' time
      –   3 years each of at least 6 PhD research students (admitted every year),
            plus their research supervisors' time
      –   The time of academic researchers doing research for publication
           purposes (for academic promotion)
Future Directions and Strategies

●   Give emphasis and recognition to R&D activities in HLT
●   Develop a national HLT road-map (HLT Audit)
      –   Shows research priorities
      –   Avoids duplication (even across languages)
      –   Reduces R&D cost
      –   Provides a means of evaluation/assessment
      –   Enforces consolidation, integration and continuity
      –   Inspires researchers and developers
      –   Shows the benefit areas for the HLT industry
Future Directions and Strategies


●   Establish Institutional/National R&D units
      –   Fund, coordinate and evaluate R&D projects
      –   Store, maintain and distribute language resources and R&D
            outputs
      –   Promote the utilization of R&D outputs
      –   Coordinate and support private industries
      –   Coordinate cooperation between academia and industry
      –   Promote and attract international investment in HLT industries


Conclusion


●   We have 85 living languages
●   All have speakers who need information and have the right
    to get it in a language and in a way they understand
      –   HLT is the way to realize it
●   We need a strategy to put it in place
      –   Cooperation across:
           ●   Time: past -> present -> future
           ●   Language
           ●   Research area
           ●   Sector: academia <-> industry

We can
           make it
             BY




AGIS'11 Conference, Addis Ababa

Weitere ähnliche Inhalte

Ähnlich wie Human Language Technologies for Ethiopian Languages: Challenges and Future Directions

Policies for OER in regional and minority languages: are regional and minorit...
Policies for OER in regional and minority languages: are regional and minorit...Policies for OER in regional and minority languages: are regional and minorit...
Policies for OER in regional and minority languages: are regional and minorit...LangOER
 
Building Capacities in Human Language Technology for African Languages
Building Capacities in Human Language Technology for African LanguagesBuilding Capacities in Human Language Technology for African Languages
Building Capacities in Human Language Technology for African LanguagesGuy De Pauw
 
Role of Language Engineering to Preserve Endangered Language
Role of Language Engineering to Preserve Endangered Language Role of Language Engineering to Preserve Endangered Language
Role of Language Engineering to Preserve Endangered Language Dr. Amit Kumar Jha
 
NTM%20Project%20-Final%20Presentation%20Revised(2)
NTM%20Project%20-Final%20Presentation%20Revised(2)NTM%20Project%20-Final%20Presentation%20Revised(2)
NTM%20Project%20-Final%20Presentation%20Revised(2)finance14
 
LangOER Conference: Welcome message
LangOER Conference: Welcome message LangOER Conference: Welcome message
LangOER Conference: Welcome message LangOER
 
Human Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeHuman Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeGeorg Rehm
 
ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)ICANN
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP) ASWINKP11
 
Celtic language technologies in the digital age
Celtic language technologies in the digital ageCeltic language technologies in the digital age
Celtic language technologies in the digital agetechiaith
 
International Sign as a Conference Language
International Sign as a Conference LanguageInternational Sign as a Conference Language
International Sign as a Conference LanguageMobileDeaf
 
Bridging language acquision and language policy
Bridging language acquision and language policyBridging language acquision and language policy
Bridging language acquision and language policyLangOER
 
Reflections on building a Multi-country AAC Implementation Guide.pptx
Reflections on building a Multi-country AAC Implementation Guide.pptxReflections on building a Multi-country AAC Implementation Guide.pptx
Reflections on building a Multi-country AAC Implementation Guide.pptxE.A. Draffan
 
Sustainability in OER for less used languages
Sustainability in OER for less used languagesSustainability in OER for less used languages
Sustainability in OER for less used languagesLangOER
 
Natural language processing for Albanian: a state-of-the-art survey
Natural language processing for Albanian: a state-of-the-art  surveyNatural language processing for Albanian: a state-of-the-art  survey
Natural language processing for Albanian: a state-of-the-art surveyIJECEIAES
 
OER: insights into a multilingual landscape
OER: insights into a multilingual landscapeOER: insights into a multilingual landscape
OER: insights into a multilingual landscapeLangOER
 
Applied linguístics 1
Applied linguístics 1Applied linguístics 1
Applied linguístics 1Carlos Mayora
 
Huy & robert a call to call
Huy & robert a call to callHuy & robert a call to call
Huy & robert a call to callPhung Huy
 

Ähnlich wie Human Language Technologies for Ethiopian Languages: Challenges and Future Directions (20)

Policies for OER in regional and minority languages: are regional and minorit...
Policies for OER in regional and minority languages: are regional and minorit...Policies for OER in regional and minority languages: are regional and minorit...
Policies for OER in regional and minority languages: are regional and minorit...
 
Building Capacities in Human Language Technology for African Languages
Building Capacities in Human Language Technology for African LanguagesBuilding Capacities in Human Language Technology for African Languages
Building Capacities in Human Language Technology for African Languages
 
Role of Language Engineering to Preserve Endangered Language
Role of Language Engineering to Preserve Endangered Language Role of Language Engineering to Preserve Endangered Language
Role of Language Engineering to Preserve Endangered Language
 
NTM%20Project%20-Final%20Presentation%20Revised(2)
NTM%20Project%20-Final%20Presentation%20Revised(2)NTM%20Project%20-Final%20Presentation%20Revised(2)
NTM%20Project%20-Final%20Presentation%20Revised(2)
 
LangOER Conference: Welcome message
LangOER Conference: Welcome message LangOER Conference: Welcome message
LangOER Conference: Welcome message
 
Human Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeHuman Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual Europe
 
ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)
 
Natural language processing (NLP)
Natural language processing (NLP) Natural language processing (NLP)
Natural language processing (NLP)
 
Celtic language technologies in the digital age
Celtic language technologies in the digital ageCeltic language technologies in the digital age
Celtic language technologies in the digital age
 
Speech-Recognition.pptx
Speech-Recognition.pptxSpeech-Recognition.pptx
Speech-Recognition.pptx
 
Achievement And Lessons Learned By An Loc
Achievement And Lessons Learned By An LocAchievement And Lessons Learned By An Loc
Achievement And Lessons Learned By An Loc
 
International Sign as a Conference Language
International Sign as a Conference LanguageInternational Sign as a Conference Language
International Sign as a Conference Language
 
Bridging language acquision and language policy
Bridging language acquision and language policyBridging language acquision and language policy
Bridging language acquision and language policy
 
Reflections on building a Multi-country AAC Implementation Guide.pptx
Reflections on building a Multi-country AAC Implementation Guide.pptxReflections on building a Multi-country AAC Implementation Guide.pptx
Reflections on building a Multi-country AAC Implementation Guide.pptx
 
Sustainability in OER for less used languages
Sustainability in OER for less used languagesSustainability in OER for less used languages
Sustainability in OER for less used languages
 
How can we profit from multilingualism? Good practices in Europe
How can we profit from multilingualism? Good practices in EuropeHow can we profit from multilingualism? Good practices in Europe
How can we profit from multilingualism? Good practices in Europe
 
Natural language processing for Albanian: a state-of-the-art survey
Natural language processing for Albanian: a state-of-the-art  surveyNatural language processing for Albanian: a state-of-the-art  survey
Natural language processing for Albanian: a state-of-the-art survey
 
OER: insights into a multilingual landscape
OER: insights into a multilingual landscapeOER: insights into a multilingual landscape
OER: insights into a multilingual landscape
 
Applied linguístics 1
Applied linguístics 1Applied linguístics 1
Applied linguístics 1
 
Huy & robert a call to call
Huy & robert a call to callHuy & robert a call to call
Huy & robert a call to call
 

Mehr von Guy De Pauw

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Guy De Pauw
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Guy De Pauw
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingGuy De Pauw
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageGuy De Pauw
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguageGuy De Pauw
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)Guy De Pauw
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Guy De Pauw
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusGuy De Pauw
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of SantomeGuy De Pauw
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Guy De Pauw
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTGuy De Pauw
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionGuy De Pauw
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingGuy De Pauw
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishGuy De Pauw
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsGuy De Pauw
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersGuy De Pauw
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentGuy De Pauw
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersGuy De Pauw
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemGuy De Pauw
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemGuy De Pauw
 

Mehr von Guy De Pauw (20)

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech Tagging
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik Language
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News Corpus
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of Santome
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFST
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic Inflection
 
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken Irish
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 years
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound Analysers
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource Development
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá Characters
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation System
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription System
 

Kürzlich hochgeladen

"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 


Human Language Technologies for Ethiopian Languages: Challenges and Future Directions

  • 1. Human Language Technologies for Ethiopian Languages: Challenges and Future Directions
      Solomon Teferra Abate, Binyam Ephrem, Enchalew Yifru, Kassa Tilahun, Lemlem Hagos, Mohammed-hussen Abubeker and Taye Girma
      LIG, Université Joseph Fourier (UJF)
      ITPhD Program, Addis Ababa University
      solomon_teferra_7@yahoo.com
      AGIS'11 Conference, Addis Ababa
  • 2. Outline
      ● Ethiopian Languages
      ● Human Language Technology (HLT)
          – Role in Development
          – HLT in the World
      ● HLT for Ethiopian Languages
          – Language and Technology Coverage
          – Challenges and limitations
          – Future Directions and Strategies
  • 3. Ethiopian Languages
      ● There are about 90 languages
      ● Most belong to the Afro-Asiatic language family
      ● Amharic, Afan-Oromo and Tigringa are the 3 most spoken
      ● Amharic is the federal working language
          – Regions have their own working languages
          – The language policy states that everyone has the right to in his/her mother tongue
          – More than 20 languages are used as the medium of instruction (MOI) in primary (I&II) school
  • 4. Human Language Technology
      ● Is an interdisciplinary field that encompasses most sub-disciplines of linguistics, Computational Linguistics, Natural Language Processing, computer science, Artificial Intelligence, psychology, philosophy, mathematics and statistics
      ● Covers areas like:
          ✔ ASR
          ✔ TTS
          ✔ OCR
          ✔ Stemming
          ✔ MT
          ✔ POS tagging
          ✔ Parsing
          ✔ Morphological analysis/synthesis
          ✔ Information Extraction
          ✔ Text/document categorization
          ✔ Spelling and Grammar checking
          ✔ etc.
  • 5. Human Language Technology - Role
      ● Enables ICT products to have knowledge of human language
      ● Increases the acceptance of the technology and the productivity of its users in the information age
      ● Helps people collaborate, conduct business, share knowledge and participate in social and political debates regardless of language barriers or computer skills
      ● Relevant for the disadvantaged to have access to information:
          ✔ the illiterate
          ✔ the physically impaired population
          ✔ the rural poor
  • 6. HLT in the World
      ● Well developed for a few languages of the world, like English
      ● IBM Watson Computer
          – Passed its first test by winning a QA competition with a $1 M value
          – The goal of its design is to have an intelligent computer that can interact in a natural language
              ✔ Understand any question asked in natural speech
              ✔ Answer questions as humans do
          – Uses a number of HLT modules such as: ASR, QA, TTS
          ✗ Requires a lot of expensive servers (about a total of $1 billion)
  • 7. HLT in the World
      ● Siri is a simple iPhone-based system that receives commands in natural speech to:
          – Send messages
          – Schedule meetings
          – Place phone calls
      ● Siri has been claimed to:
          – understand what you say
          – know what you mean
          – speak back in natural speech
  • 8. HLT in the World: Europe
      ● Europe is a continent united into one multilingual economic union with 23 official languages
      ● To enable the European languages, the European Union:
          ✔ Invested over €130 M to promote language technologies and language resource infrastructures in 2009-2011
          ✔ Allocated €35 M for SME action on Digital Content and Languages and €50 M for Language Technologies in its Work Program 2011-2012
          ✔ Proposed a simple platform that makes any online content and services available in all European languages
  • 9. HLT in the World: South Africa
      ● The South African government has identified HLT as a priority area to enable (technologically) its 11 official languages
          ➢ Various R&D projects and initiatives have been funded by government through:
              ● Department of Arts and Culture (DAC),
              ● Department of Science and Technology (DST), and
              ● National Research Foundation (NRF)
      ● The key challenge is fragmentation of R&D activities in HLT
      ● Addressed by the South African HLT Audit (SAHLTA)
  • 10. HLT for Ethiopian Languages
      ● Research on HLT for Ethiopian languages started in the 1990s
      ✔ There are now many (>200) encouraging and valuable works on:
          ➢ Thesaurus construction
          ➢ ASR
          ➢ Stemming
          ➢ Text classification
          ➢ MT
          ➢ Parsing
          ➢ Text categorization
          ➢ Text-to-speech
          ➢ POS tagging
          ➢ Morphological analysis
          ➢ OCR
          ➢ Spell checking
          ➢ Information Extraction
      ✗ Most of them are based on language resources (LRs) developed only for the experiment at hand
  • 11. HLT for Ethiopian Languages
      ✗ HLT research covers a limited number of Ethiopian languages
      [Figure: HLT for Ethiopian Languages (Masters theses). Bar chart by research area (NLP, Speech Processing, OCR, CSE) for the languages Amharic, Afan Oromo, Tigringa, Welayta, Ge'ez and Sidama; vertical axis from 0 to 25.]
  • 12. Challenges and Limitations
      ● Challenges that hinder Ethiopian HLT include:
          – lack of language resources: speech and text corpora
          – lack of standardized evaluation corpora and platform
          – lack of expertise on both language and technology
          – time shortage
              ● done only for academic achievement in the given time
          – absence of a national HLT research plan (HLT road-map)
              ● based only on individuals' interest
          – lack of sustainable and coordinated research funding
  • 13. Challenges and Limitations
      ➔ They have limitations:
          – use of insufficient and low-quality language resources
              ➢ research results are not conclusive
          – research results are not well evaluated, analyzed and documented
              ➢ their achievements and gaps are vague
          – research attempts in HLT are fragmented
              ➢ lack of integration, consolidation and continuity
              ● Tokenizer POS Parser LA ASR/MT
  • 14. Future Directions and Strategies
      ● Is there any other way to escape the cost of the language barrier, or to cover it, without HLT in the information age? NO!!!
      ● Are we rich enough to continue spending on academic exercises only? NO!!!
          – 6 months of at least 10 research students doing their theses on any one of the HLT areas every year, and of their supervisors
          – 3 years of at least 6 PhD research students (admitted every year) and of their research supervisors
          – The time of academic researchers doing research for publication purposes (for academic promotion)
  • 15. Future Directions and Strategies
      ● Give emphasis and recognition to R&D activities in HLT
      ● Develop a national HLT road-map (HLT Audit)
          – Shows research priorities
          – Avoids duplication (even across languages)
          – Reduces R&D cost
          – Provides a means of evaluation/assessment
          – Enforces consolidation, integration and continuity
          – Inspires researchers and developers
          – Shows the benefit areas for the HLT industry
  • 16. Future Directions and Strategies
      ● Establish Institutional/National R&D units
          – Fund, coordinate and evaluate R&D projects
          – Store, maintain and distribute language resources and R&D outputs
          – Promote the utility of R&D outputs
          – Coordinate and support private industries
          – Coordinate the cooperation of academia and industry
          – Promote/attract international investment in HLT industries
  • 17. Conclusion
      ● We have 85 living languages
      ● All have speakers who need information and the right to get it in a language and in the way they understand
          – HLT is the way to realize it
      ● We need to have a strategy to put it in place
          – Cooperation across:
              ● Time: past->present->future
              ● Language
              ● Research area
              ● Sector: academic<->industry
  • 18. We can make it BY