IFE-MT: An English-to-Yorùbá Machine Translation System
1. IFE-MT: An English-to-Yorùbá Machine Translation System
*Eludiora, S. I., +Salawu, S. A., *Odejobi, O. A. and
*Agbeyangi, A.O.
*Department of Computer Science & Engineering
+Dept. of Linguistics & African Languages
Obafemi Awolowo University,
Ile-Ife, Nigeria
AGIS'11 UNECA CONFERENCE 1-2 DEC. 2011 1
2. In this Presentation
1) Introduction
2) Theoretical issues
a) Features of English & Yorùbá languages
b) Machine translation process
3) Practical issues
a) Data acquisition
b) System design
c) Software development
d) System implementation
3. Introduction
Machine translation (MT) is the application of
computers to the task of translating text or speech
from one natural language to another (Blank, 1998).
An English-to-Yorùbá (E-Y) MT system translates
English text into Yorùbá text.
6. Research Theory
Theories/Assumptions
a) Yorùbá expression moves from concrete to
abstract, but English expression moves from
abstract to concrete.
b) Natural language has at most 400 active words.
c) Turing test theory for evaluation (a test of a
machine's ability to exhibit intelligent behavior),
applied using mean opinion score
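The mean opinion score mentioned in (c) is an average of human ratings. The following is a minimal sketch, assuming each evaluator rates a translated sentence on a 1-5 scale; the scale, the function name and the sample ratings are illustrative assumptions, not details from the presentation.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Return the arithmetic mean of evaluator ratings (assumed 1-5 scale)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return mean(ratings)


# Hypothetical ratings from five evaluators for one translated sentence.
example_ratings = [4, 5, 3, 4, 4]
print(f"MOS = {mean_opinion_score(example_ratings):.2f}")
```

Averaging per sentence and then across sentences would give one score for the whole system; how the ratings are pooled is a design choice the slides do not specify.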
7. Features of English & Yorùbá languages
ENGLISH                                   YORÙBÁ
Stressed language                         Tone language
  Record (N) / record (V)                   Agba
  Commit (N) / commit (V)                   gba
  Read (present) / read (past)              mọ
Intonation-timed                          Syllable-timed
  He found it on the street?                Baba
  How did you ever escape?
Orthography: non-phonetic                 Orthography: almost phonetic
  enough                                    gba
  Fish                                      Ẹdẹ
Large-resource language                   Low-resource language
Inflectional                              Non-inflectional
  Wait | waits | waited | waiting           o ro | ti ro ro
  Go | goes | went | gone | going           o lọ | ti lọ lọ
Grammatical structure:                    Grammatical structure:
Subject Verb Object (SVO)                 Subject Verb Object (SVO)
  The boy                                   nrin a
  old man                                   lagba
8. English-to-Yorùbá Machine
Translation System Challenges
1) The translation process
Both languages are SVO, but translation is not straightforward
(culture-bound words and concepts)
2) Domain selection problem
3) Lack of language resources
4) Orthography typesetting problem
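One concrete form of the orthography typesetting problem in item 4 is that the same tone-marked Yorùbá letter can be stored as different Unicode sequences, so visually identical words fail to match. The sketch below uses Python's standard `unicodedata` module to normalize text before comparison; the helper name and the choice of NFC are assumptions made for illustration, not a description of how IFE-MT handles it.

```python
import unicodedata


def normalize_yoruba(text):
    """Return text in Unicode NFC so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


# The same word typed two ways: precomposed letters versus
# base letters followed by combining grave/acute accents.
composed = "Yor\u00f9b\u00e1"
decomposed = "Yoru\u0300ba\u0301"

print(composed == decomposed)                                      # False
print(normalize_yoruba(composed) == normalize_yoruba(decomposed))  # True

# A dotted vowel with a tone mark, typed with the two marks in either order.
mark_order_a = "o\u0323\u0300"  # o + dot below + grave
mark_order_b = "o\u0300\u0323"  # o + grave + dot below
print(normalize_yoruba(mark_order_a) == normalize_yoruba(mark_order_b))  # True
```

Normalizing both the lexicon and the input text to one form keeps dictionary lookups from missing entries because of how a keyboard or font produced the diacritics.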
9. Language resources challenge
Sources | Correct orthography | Parallel corpora/quality | Digital | Domain specific | Annotated | Size | Textual
Resources on the Internet | Not fully diacritically marked and punctuated | Available/poor quality, e.g. the Jehovah's Witnesses | Available | General (not domain specific) | Not annotated | Large enough | Text form
Religious books or documents | Divergent | Contextually deficient, e.g. the Jehovah's Witnesses | Mostly hardcopy | Specific (religious) | Not annotated | Large | Mostly text
Nigerian newspapers | Poor | Not available | Not all are digitized | Not domain specific | Not annotated | Small | All are in text form
The radio & TV (media) | Not in text form | Speech/poor translation | Available on magnetic disc | General | Not applicable | Large enough | Non-textual
Government documents | Mostly English | Not available | Available in English | Multiple domains | Not annotated | Sizeable volume | Text form
Textbooks/manuals/reports | Mostly English | Not available | Not all are digitized | Specific | Not POS annotated | Sizeable | Text form
10. Database Design Cont.
Data 1: Sentences systematically collected using
home-environment terminology (the domain)
Data 2: Lexical items extracted from Data 1
Data 3: Data 1 and Data 2 annotated with POS tags
Data 4: Data 3 represented in the format
designed for the MT database
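The four data stages above can be pictured as a small pipeline. The sketch below is an illustration only: the sample sentences, the POS tag set, the record layout and all function names are hypothetical, since the slides do not give the actual database format.

```python
from dataclasses import dataclass


@dataclass
class LexicalEntry:
    word: str
    pos: str


# Data 1: hypothetical home-domain sentences.
sentences = ["the boy ate the rice", "the woman swept the room"]

# Hypothetical hand-assigned POS tags used for the annotation step.
pos_tags = {"the": "DET", "boy": "N", "woman": "N", "rice": "N",
            "room": "N", "ate": "V", "swept": "V"}


def extract_lexical_items(sents):
    """Data 2: unique words in order of first appearance."""
    seen = []
    for sentence in sents:
        for word in sentence.lower().split():
            if word not in seen:
                seen.append(word)
    return seen


def annotate(words, tags):
    """Data 3: attach a POS tag to each word ("UNK" if untagged)."""
    return [LexicalEntry(word, tags.get(word, "UNK")) for word in words]


def to_records(entries):
    """Data 4: flatten annotated entries into database-style rows."""
    return [{"id": i, "word": e.word, "pos": e.pos}
            for i, e in enumerate(entries, start=1)]


records = to_records(annotate(extract_lexical_items(sentences), pos_tags))
for row in records:
    print(row)
```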
20. Software Demonstration
a) Basic SVO sentences
b) Qualified subject/object SVO sentences
c) Modified verb SVO sentences
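The three sentence types in the demonstration could be told apart from POS tags alone. The following is a rough sketch under stated assumptions: the tag names (`N`, `V`, `ADJ`, `ADV`, `DET`), the rule that an adjective signals a qualified subject/object and an adverb signals a modified verb, and the function name are all hypothetical and are not taken from the IFE-MT software.

```python
def classify_svo(tagged_sentence):
    """Classify a POS-tagged sentence into one of the three demo types.

    tagged_sentence is a list of (word, tag) pairs using a hypothetical
    tag set: DET, N, V, ADJ, ADV.
    """
    tags = [tag for _, tag in tagged_sentence]
    if "V" not in tags:
        raise ValueError("sentence has no verb")
    if "ADV" in tags:
        return "modified verb SVO"
    if "ADJ" in tags:
        return "qualified subject/object SVO"
    return "basic SVO"


basic = [("the", "DET"), ("boy", "N"), ("ate", "V"), ("rice", "N")]
qualified = [("the", "DET"), ("old", "ADJ"), ("man", "N"),
             ("bought", "V"), ("a", "DET"), ("book", "N")]
modified = [("the", "DET"), ("boy", "N"), ("ate", "V"),
            ("rice", "N"), ("quickly", "ADV")]

for sentence in (basic, qualified, modified):
    print(classify_svo(sentence))
```

A sentence containing both an adjective and an adverb is reported here as a modified verb sentence; that ordering is arbitrary and a real system might track both properties separately.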
22. Conclusion
In this presentation, I have discussed:
Theoretical and practical issues relating to our IFE-MT
development
Database design, Library design
Software development process, and Program coding
The IFE-MT software was demonstrated
We are now updating the database and evaluating the MT system
using mean opinion score.
23. Some Related Work
Shquier, M. A. and AL-Nabhan, M. (2010), “Rule-based approach to
tackle agreement and word-ordering in English-Arabic machine
translation”,
http://www.iseing.org/emcis/EMCIS2010/Proceedings/Accepted%20Refereed%20Papers/C43.pdf
Anand, K. M., Dhanalakshmi, V., Soman, K. P. and
Rajendran, S. (2010), “A Sequence Labeling Approach to
Morphological Analyzer for Tamil Language”, International Journal
on Computer Science and Engineering, Vol. 02, No. 06, pp.
1944-1951
Barkade, V. M. and Devale, P. R. (2010), “English to Sanskrit machine
translation semantic mapper”, International Journal of Engineering
Science and Technology, Vol. 2(10), pp. 5313-5318
24. Related Work Cont.
Batra, K. K. and Lehal, G. S. (2010), “Rule Based Machine Translation of
Noun Phrases from Punjabi to English”, International Journal of
Computer Science Issues, Vol. 7, Issue 5, September, ISSN
(Online): 1694-0814
Tyers, F. M. and Nordfalk, J. (2009), “Shallow transfer rule-based
machine translation for Swedish to Danish”, in Proceedings of the First
International Workshop on Free/Open-Source Rule-Based Machine
Translation, pp. 27–33, Alicante
Tyers, F. M. (2010), “Rule-based Breton to French machine
translation”, European Association for Machine Translation, EAMT May
2010, St Raphael, France,
http://www.mt-archive.info/EAMT-2010-Tyers.pdf
25. References
Blank, D. (1998), Definition of Machine Translation,
http://www.macalester.edu/courses/russ65/definiti.htm
[Accessed 02/10/2010]
26. Thank you for listening