IFE-MT: An English-to-Yorùbá
       Machine Translation System

*Eludiora, S. I., +Salawu, S. A., *Odejobi, O. A. and
                 *Agbeyangi, A.O.




      *Department of Computer Science & Engineering

         +Dept. of Linguistics & African Languages

               Obafemi Awolowo University,
                     Ile-Ife, Nigeria



                 AGIS'11 UNECA CONFERENCE 1-2 DEC. 2011
In this Presentation..
1) Introduction
2) Theoretical Issues
   a) Features of English & Yorùbá languages
   b) Machine translation process
3) Practical issues
   a) Data acquisition
   b) system design
   c) software development
   d) system implementation

Introduction


Machine translation (MT) is the application of
computers to the task of translating text or speech
from one natural language to another (Blank, 1998).



An English-to-Yorùbá (E-Y) MT system translates
English text to Yorùbá text.


MT Conceptualisation




MT Paradigm

                         1)Text → Text
                         2)Speech → Speech
                         3)Text → Speech
                         4)Speech → Text




Research Theory
Theories/Assumptions

a) Yorùbá expression moves from concrete to
     abstract, but English expression moves from
     abstract to concrete.

b) Natural language has at most 400 active words.

c) Turing test theory for evaluation (a test of a
   machine's ability to exhibit intelligent behaviour),
   applied using the mean opinion score
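The mean opinion score mentioned in (c) is an average of human ratings. The sketch below is a minimal illustration, not the authors' evaluation code: the 1-to-5 rating scale, the function name `mean_opinion_score` and the sample ratings are all assumptions made for the example.

```python
def mean_opinion_score(ratings):
    """Return the arithmetic mean of the ratings given to one translation."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


# Hypothetical ratings (1 = bad, 5 = excellent) from four evaluators
# for the system's translation of a single English sentence.
ratings = [4, 5, 4, 3]
score = mean_opinion_score(ratings)
print(f"MOS = {score:.2f}")
```

Averaging the per-sentence scores over a test set would give one figure for the whole system.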


Features of English & Yorùbá languages
ENGLISH                                              YORÙBÁ
Stressed                                             Tone language
    record (N) / record (V)                              Agba
    commit (N) / commit (V)                               gba
    read (present) / read (past)                          mọ
Intonation timed                                     Syllable timed
    He found it on the street?                           Baba
    How did you ever escape?

Orthography                                          Orthography
Non-phonetic                                         Almost phonetic
    enough                                                gba
    fish                                                 Ẹdẹ

Large-resource language                              Low-resource language
Inflectional                                         Non-inflectional
    wait | waits | waited | waiting                      o   ro | ti  ro   ro
    go | goes | went | gone | going                      o lọ | ti lọ  lọ

Grammatical structure                                Grammatical structure
Subject Verb Object (SVO)                            Subject Verb Object (SVO)
    The boy                                                  nrin     a
    old man                                                  lagba

English-to-Yorùbá Machine
    Translation System Challenges
1) The translation process
   Both languages are SVO, but translation is not straightforward
   (culturally bound words and concepts)

2) Domain selection problem

3) Lack of language resources

4) Orthography typesetting problem

Language resources challenge
Sources                       Correct orthography                            Parallel corpora/quality                          Digital                     Domain specific                Annotated          Size             Textual
Resources on the Internet     Not fully diacritically marked and punctuated  Available/poor quality, e.g. The Jehovah Witness  Available                   General (not domain specific)  Not annotated      Large enough     Text form
Religious books or documents  Divergent                                      Contextually deficient, e.g. The Jehovah Witness  Mostly hardcopy             Specific (religious)           Not annotated      Large            Mostly text
Nigerian newspapers           Poor                                           Not available                                     Not all are digitized       Not domain specific            Not annotated      Small            All are in text form
The radio & TV (media)        Not in text form                               Speech/poor translation                           Available in magnetic disc  General                        Not applicable     Large enough     Non-textual
Government documents          Mostly English                                 Not available                                     Available in English        Multiple domains               Not annotated      Sizeable volume  Text form
Textbooks/manuals/reports     Mostly English                                 Not available                                     Not all are digitized       Specific                       Not POS annotated  Sizeable         Text form




Database Design Cont.
Data 1: Sentences are systematically collected using
  home environment terminologies (Domain)

Data 2: Lexical items extracted from Data 1

Data 3: POS-tag annotations of Data 1 and Data 2

Data 4: Data 3 represented in the format
  designed for the MT database
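The four data stages above can be sketched as a small pipeline. This is an illustration only: the slides do not show the actual database format, so the `LexiconEntry` record, the hand-written tag and gloss tables and the function names are assumptions. The sample sentence, tags and Yorùbá glosses are taken from the parser slide of this presentation.

```python
from dataclasses import dataclass


@dataclass
class LexiconEntry:
    """Hypothetical Data 4 record: one annotated lexical item."""
    english: str
    yoruba: str
    pos: str


# Data 1: sentences collected for the chosen domain.
sentences = ["Ade sat on the chair"]

# Hand-assigned POS tags and glosses (Data 3), copied from the parser slide.
POS_TAGS = {"Ade": "N", "sat": "V", "on": "P", "the": "Det", "chair": "N"}
GLOSSES = {"Ade": "Ade", "sat": "jokoo", "on": "sori", "the": "naa", "chair": "aga"}


def extract_lexical_items(sentences):
    """Data 2: unique words, in order of first appearance."""
    items = []
    for sentence in sentences:
        for word in sentence.split():
            if word not in items:
                items.append(word)
    return items


def build_lexicon(sentences):
    """Data 4: combine each lexical item with its gloss and POS tag."""
    return [
        LexiconEntry(word, GLOSSES[word], POS_TAGS[word])
        for word in extract_lexical_items(sentences)
    ]


lexicon = build_lexicon(sentences)
for entry in lexicon:
    print(entry)
```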
Lexicon database




Database Design Cont.




Software Development and
 Implementation Process

Software tools:
  a) Python

  b) PyQt

  c) NLTK



Parser
Natural Language Toolkit (NLTK)

English: Ade sat on the chair
(S (NP (N Ade)) (VP (V sat) (NP (P on) (Det the) (N chair))))

Yorùbá: Ade jokoo sori aga naa
(S (NP (N Ade)) (VP (V jokoo) (NP (PP (P sori)) (N aga) (Det naa))))
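The mapping between the two bracketed trees on this slide can be sketched as a tiny rule-based transfer step. The code below is an illustration in plain Python and does not use NLTK or reproduce the IFE-MT parser. The two rules (move a determiner after its noun, wrap a preposition in a PP) are inferred from this single example, and the function names and `LEXICON` table are assumptions.

```python
# Word-for-word glosses taken from the example on this slide.
LEXICON = {"Ade": "Ade", "sat": "jokoo", "on": "sori", "the": "naa", "chair": "aga"}


def parse_tree(text):
    """Parse a bracketed tree string into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        node = [tokens[pos]]          # node label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())
            else:
                node.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                      # skip ")"
        return node

    return read()


def transfer(node):
    """Translate leaves and reorder children for the target language."""
    if isinstance(node, str):
        return LEXICON.get(node, node)
    label = node[0]
    children = [transfer(child) for child in node[1:]]
    # Rule 1 (assumed): a determiner follows the noun it modifies.
    i = 0
    while i < len(children) - 1:
        if children[i][0] == "Det" and children[i + 1][0] == "N":
            children[i], children[i + 1] = children[i + 1], children[i]
            i += 2
        else:
            i += 1
    # Rule 2 (assumed): a bare preposition is wrapped in a PP node.
    if label != "PP":
        children = [["PP", c] if c[0] == "P" else c for c in children]
    return [label] + children


def to_bracketed(node):
    """Render nested lists back into a bracketed tree string."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [to_bracketed(c) for c in node[1:]]) + ")"


def leaves(node):
    """Return the words of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in leaves(child)]


source = parse_tree("(S (NP (N Ade)) (VP (V sat) (NP (P on) (Det the) (N chair))))")
target = transfer(source)
print(to_bracketed(target))
print(" ".join(leaves(target)))
```

For the slide's sentence, the two rules are enough to reproduce the Yorùbá tree shown above; other sentence types would need further rules.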




Program Coding

Software Modules:

 a) Library
 b) Parser
 c) GUI




Software Demonstration

a) basic SVO sentences

b) qualified subject/object SVO sentences

c) modified verb SVO sentences




Software Demonstration
http://www.ifecisrg.org/IfeMT




Conclusion
In this presentation, I have discussed:

Theoretical and practical issues relating to our IFE-MT
  development

Database design, Library design

Software development process, and Program coding

The IFE-MT software was demonstrated

We are now updating the database and evaluating the MT system
  using mean opinion score.


Some Related Work
• Shquier, M. A. and AL-Nabhan, M. (2010), “Rule-based approach to
  tackle agreement and word-ordering in English-Arabic machine
  translation”,
  http://www.iseing.org/emcis/EMCIS2010/Proceedings/Accepted%20Refereed%20Papers/C43.pdf

• Anand, K. M., Dhanalakshmi, V., Soman, K. P. and
  Rajendran, S. (2010), “A Sequence Labeling Approach to
  Morphological Analyzer for Tamil Language”, International Journal
  on Computer Science and Engineering, Vol. 02, No. 06, pp.
  1944-1951

• Barkade, V. M. and Devale, P. R. (2010), “English to Sanskrit machine
  Translation semantic mapper”, International Journal of Engineering
  Science and Technology, Vol. 2(10), pp. 5313-5318



Related Work Cont.
• Batra, K. K. and Lehal, G. S. (2010), “Rule Based Machine Translation of
  Noun Phrases from Punjabi to English”, International Journal of
  Computer Science Issues, Vol. 7, Issue 5, September, ISSN
  (Online): 1694-0814

• Tyers, F. M. and Nordfalk, J. (2009), “Shallow transfer rule-based
  machine translation for Swedish to Danish”, In Proceedings of the First
  International Workshop on Free/Open-Source Rule-Based Machine
  Translation, pages 27–33, Alicante.

• Tyers, F. M. (2010), “Rule-based Breton to French machine
  translation”, European Association for Machine Translation, EAMT May
  2010, St Raphael, France
  (http://www.mt-archive.info/EAMT-2010-Tyers.pdf)



References
Blank, D. (1998), Definition of Machine Translation,
  http://www.macalester.edu/courses/russ65/definiti.htm
  [Accessed 02/10/2010]




Thank you for listening




 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Recently uploaded (20)

Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

IFE-MT: An English-to-Yorùbá Machine Translation System

  • 1. IFE-MT: An English-to-Yorùbá Machine Translation System *Eludiora, S. I., +Salawu, S. A., *Odejobi, O. A. and *Agbeyangi, A.O. *Department of Computer Science & Engineering +Dept. of Linguistics & African Languages Obafemi Awolowo University, Ile-Ife, Nigeria AGIS'11 UNECA CONFERENCE 1-2 DEC. 2011 1
  • 2. In this Presentation
      1) Introduction
      2) Theoretical issues
         a) Features of English & Yorùbá languages
         b) Machine translation process
      3) Practical issues
         a) Data acquisition
         b) System design
         c) Software development
         d) System implementation
  • 3. Introduction: Machine translation (MT) is the application of computers to the task of translating text or speech from one natural language to another (Blank, 1998). An English-to-Yorùbá (E-Y) MT system translates English text into Yorùbá text.
  • 4. MT Conceptualisation
  • 5. MT Paradigm
      1) Text → Text
      2) Speech → Speech
      3) Text → Speech
      4) Speech → Text
  • 6. Research Theory: Theories/Assumptions
      a) Yorùbá expression moves from concrete to abstract, whereas English expression moves from abstract to concrete.
      b) A natural language has at most 400 active words.
      c) Turing test theory for evaluation (a test of a machine's ability to exhibit intelligent behaviour), using mean opinion score.
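As a rough illustration of assumption (c), a mean opinion score is the arithmetic mean of the ratings that human evaluators give a translation. The sketch below assumes a 1-to-5 rating scale and uses made-up ratings; the function name, the scale and the numbers are assumptions for illustration, not details of the IFE-MT evaluation.

```python
from statistics import mean

def mean_opinion_score(ratings, low=1, high=5):
    """Average a list of evaluator ratings after checking they are on the scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low}..{high}")
    return mean(ratings)

# Made-up ratings from three hypothetical evaluators for one translated sentence.
example_ratings = [5, 4, 4]
mos = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f}")
```

Averaging per sentence and then across sentences would give a system-level score; whether IFE-MT aggregates this way is not stated in the slides.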
  • 7. Features of English & Yorùbá languages
      - English: stressed. Record (N) / Record (V); Commit (N) / commit (V); Read (present) / read (past)
        Yorùbá: tone language. Agba; gba mọ
      - English: intonation-timed. "He found it on the street?"; "How did you ever escape?"
        Yorùbá: syllable-timed. Baba
      - English: orthography is non-phonetic. enough; fish
        Yorùbá: orthography is almost phonetic. gba; Ẹdẹ
      - English: large-resource language
        Yorùbá: low-resource language
      - English: inflectional. Wait | Waits | Waited | Waiting; Go | Goes | Went | Gone | Going
        Yorùbá: non-inflectional. ro | ti ro ro; lọ | ti lọ lọ
      - English: grammatical structure is Subject Verb Object (SVO). The boy; old man
        Yorùbá: grammatical structure is Subject Verb Object (SVO). nrin a; lagba
  • 8. English-to-Yorùbá Machine Translation System Challenges
      1) The translation process: both languages are SVO, but translation is not straightforward (culture-bound words and concepts)
      2) Domain selection problem
      3) Lack of language resources
      4) Orthography typesetting problem
  • 9. Language resources challenge
      Sources | Correct orthography | Parallel corpora/quality | Digital resources | Domain specific | Annotated | Size | Textual
      On the Internet | Not fully diacritically marked and punctuated | Available/poor quality, e.g. The Jehovah Witness | Available | General (not domain specific) | Not annotated | Large enough | Text form
      Religious books or documents | Divergent | Contextually deficient, e.g. The Jehovah Witness | Mostly hardcopy | Specific (religious) | Not annotated | Large | Mostly text
      Nigerian newspapers | Poor | Not available | Not all are digitized | Not domain specific | Not annotated | Small | All are in text form
      The radio & TV (media) | Not in text form | Speech/poor translation | Available on magnetic disc | General | Not applicable | Large enough | Non-textual
      Government documents | Mostly English | Not available | Available in English | Multiple domains | Not annotated | Sizeable volume | Text form
      Textbooks/manuals/reports | Mostly English | Not available | Not all are digitized | Specific | Not POS annotated | Sizeable | Text form
  • 10. Database Design Cont.
      Data 1: Sentences systematically collected using home-environment terminology (the domain)
      Data 2: Lexical items extracted from Data 1
      Data 3: Data 1 and Data 2 annotated with POS tags
      Data 4: Data 3 represented in the format designed for the MT translation database
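The four data stages above can be pictured as a small pipeline. The following is a minimal sketch only: the sample sentences, the hand-made tag table and the record layout are all assumptions chosen for illustration, and the actual IFE-MT database format is not given in the slides.

```python
# Data 1: sentences collected from the chosen domain (hypothetical samples).
sentences = ["Ade sat on the chair", "The boy sat on the chair"]

def extract_lexical_items(sents):
    """Data 2: unique lowercased words, in order of first appearance."""
    seen = []
    for sentence in sents:
        for word in sentence.lower().split():
            if word not in seen:
                seen.append(word)
    return seen

# Data 3: POS annotation, here from a tiny hand-made tag table (an assumption).
POS_TAGS = {"ade": "N", "boy": "N", "chair": "N",
            "sat": "V", "on": "P", "the": "Det"}

def annotate(items):
    """Pair each lexical item with its POS tag, or UNK if it is not in the table."""
    return [(word, POS_TAGS.get(word, "UNK")) for word in items]

def to_records(tagged):
    """Data 4: one record per lexical item, in a hypothetical storage layout."""
    return [{"id": i, "word": word, "pos": tag}
            for i, (word, tag) in enumerate(tagged, start=1)]

lexical_items = extract_lexical_items(sentences)
records = to_records(annotate(lexical_items))
for record in records:
    print(record)
```

Lowercasing is a simplification that also lowercases proper nouns such as "Ade"; a real lexicon would likely preserve case.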
  • 11. Lexicon database
  • 12. Database Design Cont.
  • 17. Software Development and Implementation Process
      Software tools:
      a) Python
      b) PyQt
      c) NLTK
  • 18. Parser: Natural Language Toolkit (NLTK)
      English: Ade sat on the chair
        (S (NP (N Ade)) (VP (V sat) (NP (P on) (Det the) (N chair))))
      Yorùbá: Ade jokoo sori aga naa
        (S (NP (N Ade)) (VP (V jokoo) (NP (PP (P sori)) (N aga) (Det naa))))
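The two trees above differ in word choice and in the position of the determiner inside the object NP. The sketch below reproduces that one example with plain Python tuples rather than NLTK, so it runs without extra packages. The lexicon contains only the five words on the slide, and the reordering rule (determiner after the noun, preposition wrapped in a PP) is an assumption inferred from this single sentence pair, not the authors' rule set.

```python
# Toy bilingual lexicon, limited to the words in the slide's example.
LEXICON = {"Ade": "Ade", "sat": "jokoo", "on": "sori", "the": "naa", "chair": "aga"}

# English parse tree as nested tuples: (label, child, ...); a leaf is (tag, word).
ENGLISH_TREE = ("S",
                ("NP", ("N", "Ade")),
                ("VP", ("V", "sat"),
                 ("NP", ("P", "on"), ("Det", "the"), ("N", "chair"))))

def is_leaf(node):
    return len(node) == 2 and isinstance(node[1], str)

def transfer(node):
    """Build the target-side tree: substitute words, then reorder inside NPs."""
    if is_leaf(node):
        return (node[0], LEXICON.get(node[1], node[1]))
    children = [transfer(child) for child in node[1:]]
    if node[0] == "NP":
        # Hypothetical rule: determiners follow the noun; a bare P becomes a PP.
        dets = [c for c in children if c[0] == "Det"]
        rest = [("PP", c) if c[0] == "P" else c for c in children if c[0] != "Det"]
        children = rest + dets
    return (node[0], *children)

def leaves(node):
    """Collect the words of a tree from left to right."""
    if is_leaf(node):
        return [node[1]]
    return [word for child in node[1:] for word in leaves(child)]

def bracketed(node):
    """Render a tree in the bracketed notation used on the slide."""
    if is_leaf(node):
        return f"({node[0]} {node[1]})"
    return "(" + node[0] + " " + " ".join(bracketed(c) for c in node[1:]) + ")"

yoruba_tree = transfer(ENGLISH_TREE)
translation = " ".join(leaves(yoruba_tree))
print(translation)
print(bracketed(yoruba_tree))
```

With NLTK available, the same bracketed strings could be loaded with `nltk.Tree.fromstring` instead of being written as tuples.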
  • 19. Program Coding
      Software modules:
      a) Library
      b) Parser
      c) GUI
  • 20. Software Demonstration
      a) Basic SVO sentences
      b) Qualified subject/object SVO sentences
      c) Modified verb SVO sentences
  • 21. Software Demonstration: http://www.ifecisrg.org/IfeMT
  • 22. Conclusion
      In this presentation, I have discussed:
      - Theoretical and practical issues relating to our IFE-MT development
      - Database design and library design
      - Software development process and program coding
      The IFE-MT software was demonstrated.
      We are now updating the database and evaluating the MT system using mean opinion score.
  • 23. Some Related Work
      - Shquier, M. A. and AL-Nabhan, M. (2010), "Rule-based approach to tackle agreement and word-ordering in English-Arabic machine translation", http://www.iseing.org/emcis/EMCIS2010/Proceedings/Accepted%20Refereed%20Papers/C43.pdf
      - Anand, K. M., Dhanalakshmi, V., Soman, K. P. and Rajendran, S. (2010), "A Sequence Labeling Approach to Morphological Analyzer for Tamil Language", International Journal on Computer Science and Engineering, Vol. 02, No. 06, pp. 1944-1951
      - Barkade, V. M. and Devale, P. R. (2010), "English to Sanskrit machine translation semantic mapper", International Journal of Engineering Science and Technology, Vol. 2(10), pp. 5313-5318
  • 24. Related Work Cont.
      - Batra, K. K. and Lehal, G. S. (2010), "Rule Based Machine Translation of Noun Phrases from Punjabi to English", International Journal of Computer Science Issues, Vol. 7, Issue 5, September, ISSN (Online): 1694-0814
      - Tyers, F. M. and Nordfalk, J. (2009), "Shallow transfer rule-based machine translation for Swedish to Danish", in Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation, pp. 27-33, Alicante
      - Tyers, F. M. (2010), "Rule-based Breton to French machine translation", European Association for Machine Translation, EAMT May 2010, St Raphael, France (http://www.mt-archive.info/EAMT-2010-Tyers.pdf)
  • 25. References
      - Blank, D. (1998), Definition of Machine Translation, http://www.macalester.edu/courses/russ65/definiti.htm [Accessed 02/10/2010]
  • 26. Thank you for listening