AGIS’11 Action week for Information sharing


Modeling Improved Syllabification Algorithm for Amharic


         Nirayo Hailu and Sebsibe Hailemariam (PhD)

     nirayo2000@yahoo.com       sebsibe2004@yahoo.com


                   December 01, 2011
                  Addis Ababa, Ethiopia
OUTLINE


  Introduction

  Amharic Syllable Structure & Syllabification

  Design of Syllabification model

  Experimental results & evaluation

  Conclusion & Recommendations




Modeling Improved Amharic Syllabification Algorithm     2
INTRODUCTION

  A syllable is a unit of sound composed of a central peak of
sonority (usually a vowel), & the consonants that cluster around
this central peak.

  Syllabification is the task of segmenting words, whether
spoken or written, into syllables.

  Technically, the basic elements of the syllable are:
     Onset
     Rhyme (Nucleus + Coda)

  A syllable can be described by a series of grammars, for example:
     a consonant-vowel-consonant (CVC) sequence
     onset, nucleus & coda (ONC)
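
The onset / rhyme (nucleus + coda) decomposition can be written down as a small data structure. This is only an illustrative sketch: the class name `Syllable` and its fields are hypothetical, and the example word is the deck's own /bil-hat/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    onset: str    # consonants before the sonority peak (may be empty)
    nucleus: str  # the sonority peak, usually a vowel
    coda: str     # consonants after the peak (may be empty)

    @property
    def rhyme(self) -> str:
        # Rhyme = nucleus + coda
        return self.nucleus + self.coda

    @property
    def text(self) -> str:
        return self.onset + self.rhyme


word = [Syllable("b", "i", "l"), Syllable("h", "a", "t")]
print("-".join(s.text for s in word))  # prints "bil-hat"
```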

INTRODUCTION
Example: syllable (σ) structure for the word /bil-hat/

              σ                          σ
            /   \                      /   \
        Onset   Rhyme              Onset   Rhyme
          |     /    \               |     /    \
          b  Nucleus  Coda           h  Nucleus  Coda
                |      |                   |      |
                i      l                   a      t
Importance of identifying syllable structure:
   Speech synthesis (in the G2P, prosody and synthesis modules)
       Improves the intonation of the synthesized speech
   Speech recognition (pronunciation dictionary)
       To build a recognizer that represents pronunciations in
     terms of syllables/phonemes rather than graphemes.

INTRODUCTION

Amharic Language

  Amharic is a syllabic language in which every grapheme
represents a consonant-vowel combination.

  Not all syllables are uttered as the orthography suggests.

  Amharic orthography does not mark epenthetic vowels or
geminated consonants.

  In this project we developed a syllabification model
for Amharic text.



AUTOMATIC SYLLABIFICATION

Approaches
Rule-based:
     Effectively embodies some theoretical position regarding the
  syllable
     Rules are used as gold standard.
     Requires a linguistic expert
     Implements notions such as the maximal onset principle & the sonority
  hierarchy.
  [Figure: sonority curve for the word /milkit/]
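
A sonority curve like the one for /milkit/ can be approximated in a few lines. The numeric ranks below are an assumed, generic scale (stops < fricatives < nasals < liquids < glides < vowels), not the scale used by the authors, and the one-letter-per-phoneme encoding is a simplification. Each local peak of the curve corresponds to a syllable nucleus.

```python
# Assumed sonority ranks (higher = more sonorous); illustrative only.
SONORITY_CLASSES = [
    ("stops", "pbtdkg", 1),
    ("fricatives", "fszh", 2),
    ("nasals", "mn", 3),
    ("liquids", "lr", 4),
    ("glides", "wy", 5),
    ("vowels", "aeiou", 6),
]
SONORITY = {p: rank for _, members, rank in SONORITY_CLASSES for p in members}


def sonority_curve(phonemes):
    """Map each phoneme to its assumed sonority rank."""
    return [SONORITY[p] for p in phonemes]


def peaks(curve):
    """Indices strictly higher than both neighbours (word edges count as lower)."""
    found = []
    for i, value in enumerate(curve):
        left = curve[i - 1] if i > 0 else -1
        right = curve[i + 1] if i + 1 < len(curve) else -1
        if value > left and value > right:
            found.append(i)
    return found


curve = sonority_curve("milkit")
print(curve)         # [3, 6, 4, 1, 6, 1]
print(peaks(curve))  # [1, 4]
```

The two peaks fall on the two vowels, which matches a two-syllable reading of the word.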




AUTOMATIC SYLLABIFICATION
Approaches

Data-driven:
  Infers new syllabifications from an evidence base of
already-syllabified words (a dictionary or lexicon).

  Requires a large training corpus to attain good performance.

   Examples:

           Look-up procedure
           Syllabification by analogy
           Decision tree-based syllabification




AUTOMATIC SYLLABIFICATION
   Related works
Title                                    Approach used    Dataset used                      Accuracy
Unit Selection Voice for Amharic         Rule-based       -                                 -
  Using FESTVOX
A Rule based Syllabification             Rule-based       30,000 distinct words             99.95%
  Algorithm for Sinhala
Automatic detection of syllable          Rule-based       speaker 1: 653 words              97.77%
  boundaries in spontaneous speech                        speaker 2: 1,238 words
  (for French language)
Automatic Word Stress Marking &          Rule-based       test 1: 1000 words                99.7%
  Syllabification for Catalan TTS                         test 2: 223 words                 99.8%
Automatic Syllabification for Danish     Rule-based (RB)  test 1: 1000 words                RB: 96.9% & 98.7%
  Text-to-Speech Systems                 & ANN              (randomly selected)             ANN: 94.1% & 94.5%
                                                          test 2: 1000 words
                                                            (from newspapers)
A Syllabification Algorithm for          Rule-based       316 words                         98.4%
  Spanish

AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION
Syllable structure of Amharic words
    The main syllable templates: V, VC, VCC, CV, CVC, CVCC.
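
The six templates can be checked mechanically by reducing a syllable to its consonant/vowel pattern. A minimal sketch, assuming one Latin letter per phoneme and a five-letter vowel set (both simplifications: the deck's own transliteration uses multi-letter symbols such as /ax/). The names `cv_pattern` and `is_valid_syllable` are hypothetical.

```python
TEMPLATES = {"V", "VC", "VCC", "CV", "CVC", "CVCC"}
VOWELS = set("aeiou")  # assumed transliteration vowels


def cv_pattern(syllable: str) -> str:
    """Reduce a transliterated syllable to a string of C and V symbols."""
    return "".join("V" if p in VOWELS else "C" for p in syllable)


def is_valid_syllable(syllable: str) -> bool:
    """True if the syllable matches one of the six listed templates."""
    return cv_pattern(syllable) in TEMPLATES


print(cv_pattern("bil"), is_valid_syllable("bil"))    # CVC True
print(cv_pattern("kift"), is_valid_syllable("kift"))  # CVCC True
print(cv_pattern("bli"), is_valid_syllable("bli"))    # CCV False
```

The last call fails because CCV would require an onset cluster, which none of the templates allows.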
 Gemination & syllable structures
   Traditionally, gemination is represented either as /C:/ or /CC/ to
   indicate its length.

   Gemination happens when a spoken consonant is pronounced
   for an audibly longer period of time than a short consonant.

   Gemination occurs frequently in Amharic words, except for
   the phonemes /h/ & /ax/.




AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION
Syllable structure of Amharic words
Gemination & syllable structures
  Example: /kift/ &, with gemination, /kiffitt/

  [Figures: syllable structure trees for /kift/ and /kiffitt/]




AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION
Syllable structure of Amharic words
 Consonant clusters & their syllable structures
   In Amharic, a consonant cluster may contain at most two
   consonants.

    Onset clusters are not allowed in Amharic.

    The sonority hierarchy helps us deal with consonant clusters.

   Example: if a stop & a liquid appear together in a word-final
   cluster, an epenthetic vowel is inserted.

    – The sonority of the final liquid is greater than that of the preceding phoneme.




AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION
Syllable structure of Amharic words
 Consonant clusters & Epenthesis

    Epenthesis: the process of inserting an epenthetic vowel to split
   impermissible consonant clusters.

   General rules regarding epenthesis in Amharic:

       Word-initially, no consonant cluster is allowed
       CCC → CCiC, in a word like /fendto/ → /fendito/
       C:C → C:iC, in a word like /fellgo/ → /felligo/
       CC: → CiC:, in a word like /sebrre/ → /sebirre/
       C:C: → C:iC:
       In final position, the sonority hierarchy principle is applied




DESIGN OF SYLLABIFICATION MODEL


Syllabification
   Given gemination handling rules, syllabification rules,
  epenthesis rules & the syllable templates of the language, it is
  possible to syllabify (mark syllable boundaries in) a given text.




DESIGN OF SYLLABIFICATION MODEL



Design of rule-based syllabification Model
   The overall architecture of automatic syllabification includes five
   modules:

      o Transliteration module

      o Gemination handling module (expert knowledge)

      o Epenthesis module

      o Syllabification module

      o Stress assignment module




GENERAL ARCHITECTURE FOR AUTOMATIC SYLLABIFICATION
  Amharic text
      ↓
  Transliteration
      ↓
  Gemination                 (resource: expert's knowledge)
      ↓
  Epenthesis                 (resources: sonority scale of phonemes & epenthesis rules)
      Consonant cluster identification
      Geminated consonant identification
      Epenthetic vowel insertion
      ↓
  Syllabification            (resources: syllable templates & syllabification rules)
      Consonant-vowel parsing
      Syllable template matching
      Syllable boundary marking
      ↓
  Stress assignment          (resource: syllable weight rules)
      Syllable weight assignment
      Stress marker
      ↓
  Syllable- & stress-marked phoneme sequence


DESIGN OF SYLLABIFICATION MODEL
Proposed Epenthesis procedure

1. Accept the input word & scan it from left to right.
2. If a consonant cluster occurs at word-initial position, insert an
   epenthetic vowel between the two consonants. (Rule #1)
      Exception: the first phoneme is a consonant & the next consonant
   is the glide /w/.
3. If three consonants appear in sequence in word-medial or word-final
   position, insert an epenthetic vowel before the third consonant.
   (Rule #2)
      Exception: if the sonority of the middle consonant is greater than
   that of the others, insert the epenthetic vowel after the first
   consonant in the cluster.
4. If a cluster of consonants contains a geminate followed by a
   singleton, insert an epenthetic vowel after the geminated consonant.
   (Rule #3)

DESIGN OF SYLLABIFICATION MODEL

Proposed Epenthesis procedure

5. If a cluster of consonants contains a singleton followed by a
   geminate, insert an epenthetic vowel after the singleton consonant.
   (Rule #4)
6. If a cluster of consonants contains two different geminates in
   sequence, insert an epenthetic vowel between the two geminate consonants.
   (Rule #5)
7. If the sonority of the final consonant is greater than that of the
   preceding consonant, insert an epenthetic vowel between the two
   consonants of the final cluster. (Rule #6)
8. Repeat steps 2 through 7 until all the phonemes in the phoneme
   list are parsed.
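
The epenthesis procedure can be sketched as a single left-to-right scan. This is a hedged reading of Rules #1 to #6, not the authors' C# implementation. Assumptions: phonemes arrive as a list of tokens, a geminate carries a trailing colon (for example "l:"), the epenthetic vowel is /i/, Rules #3 to #5 are collapsed into one test (a two-consonant cluster with at least one geminate), and the sonority ranks are a generic assumed scale. All function names are hypothetical.

```python
VOWELS = set("aeiou")
EPENTHETIC_VOWEL = "i"

# Assumed sonority ranks (higher = more sonorous); not taken from the slides.
SONORITY = {}
for rank, members in enumerate(["pbtdkg", "fszh", "mn", "lr", "wy"], start=1):
    for phoneme in members:
        SONORITY[phoneme] = rank


def base(phoneme):
    """Strip the gemination mark, e.g. 'l:' -> 'l'."""
    return phoneme.rstrip(":")


def is_consonant(phoneme):
    return base(phoneme) not in VOWELS


def is_geminate(phoneme):
    return phoneme.endswith(":")


def sonority(phoneme):
    # Consonants missing from the table default to the lowest rank (assumption).
    return SONORITY.get(base(phoneme), 1)


def insert_epenthesis(phonemes):
    """Return a new phoneme list with epenthetic vowels inserted."""
    out = list(phonemes)
    i = 0
    while i < len(out) - 1:
        a, b = out[i], out[i + 1]
        c = out[i + 2] if i + 2 < len(out) else None
        position = None  # index at which to insert the vowel, if any
        if is_consonant(a) and is_consonant(b):
            if i == 0:
                # Rule 1: split a word-initial cluster unless C2 is /w/.
                if base(b) != "w":
                    position = 1
            elif is_geminate(a) or is_geminate(b):
                # Rules 3-5: split any cluster that involves a geminate.
                position = i + 1
            elif c is not None and is_consonant(c):
                # Rule 2: CCC -> CCiC, or CiCC if the middle consonant peaks.
                middle_is_peak = sonority(b) > max(sonority(a), sonority(c))
                position = i + 1 if middle_is_peak else i + 2
            elif c is None and sonority(b) > sonority(a):
                # Rule 6: word-final cluster with rising sonority.
                position = i + 1
        if position is not None:
            out.insert(position, EPENTHETIC_VOWEL)
        i += 1
    return out


print("".join(insert_epenthesis(list("fendto"))))               # fendito
print("".join(insert_epenthesis(["f", "e", "l:", "g", "o"])))   # fel:igo
print("".join(insert_epenthesis(["s", "e", "b", "r:", "e"])))   # sebir:e
```

The three calls reproduce the deck's examples /fendto/ → /fendito/, /fellgo/ → /felligo/ and /sebrre/ → /sebirre/; a word such as /kift/ is left unchanged because its final cluster falls in sonority under the assumed scale.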


DESIGN OF SYLLABIFICATION MODEL

Proposed syllabification procedure

1. Accept the input from the epenthesis algorithm & scan it from left to right.
2. If two vowel phonemes (VV) occur in sequence at word-initial
   position, mark a syllable boundary between them.
3. If the initial phoneme is a vowel & the next two phonemes are a consonant
   & a vowel respectively, mark the syllable boundary just at the second
4. If a (VCCV) pattern occurs at any position, mark a syllable boundary
   between the two consonants of the cluster.
5. If a (VCVC) pattern occurs at word-initial position, mark a syllable
   boundary before the second vowel.
6. If a (CVV) sequence occurs at any position, mark a syllable boundary
   between the two vowels.

DESIGN OF SYLLABIFICATION MODEL
Proposed syllabification procedure

7. If a (CVCCV) phoneme sequence occurs at word-initial position, mark a
   syllable boundary between the two middle consonants (CVC-CV).
8. If a (CVCC) pattern occurs at word-final position & a phoneme precedes
   its first consonant, mark a syllable boundary before the initial
   consonant of this pattern.
9. If a (CVCV) pattern occurs at any position, mark a syllable boundary after
   the vowels; if it occurs at word-final position, the result is the
   CV-CV pattern.
10. If a (CVC1C1VC or CVCCVC) pattern occurs in a word, mark a syllable
   boundary between the geminated consonants (CVC1-C1VC).
11. If a (VVCC) syllable pattern occurs at word-final or word-initial position,
   mark a syllable boundary between the two vowels.
12. Repeat steps 2 through 11 until all phonemes are parsed.
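
Most of the patterns above follow from two constraints stated earlier in the deck: onset clusters are not allowed, and every syllable has one vowel nucleus. The sketch below uses only those two constraints. It is a compact approximation, not a line-by-line implementation of the eleven steps: for a word-initial VCVC it yields V-CVC, which may differ from step 5 depending on how that step is read. Assumptions: one letter per phoneme, geminates written as doubled letters (C1C1), and input that has already passed through epenthesis. The name `syllabify` is hypothetical.

```python
VOWELS = set("aeiou")  # assumed transliteration vowels


def syllabify(phonemes):
    """Split a phoneme sequence into syllables.

    A new syllable starts (a) at a consonant that directly precedes a
    vowel, because an onset holds at most one consonant, and (b) at a
    vowel that directly follows another vowel.
    """
    syllables, current = [], []
    for i, p in enumerate(phonemes):
        prev = phonemes[i - 1] if i > 0 else None
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        has_nucleus = any(q in VOWELS for q in current)
        starts_new = False
        if has_nucleus:
            if p in VOWELS and prev in VOWELS:
                starts_new = True   # V-V
            elif p not in VOWELS and nxt in VOWELS:
                starts_new = True   # ...-CV
        if starts_new:
            syllables.append("".join(current))
            current = []
        current.append(p)
    if current:
        syllables.append("".join(current))
    return syllables


print("-".join(syllabify("bilhat")))   # bil-hat
print("-".join(syllabify("felligo")))  # fel-li-go
print("-".join(syllabify("kiffitt")))  # kif-fitt
```

The outputs agree with the deck's /bil-hat/ example and with the CVC1-C1VC split for geminates.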

DESIGN OF SYLLABIFICATION MODEL

The algorithms were implemented in the C# programming language.




EXPERIMENTAL RESULTS & EVALUATION
The test corpus:
   Each word contains three to four syllables on average.
   The corpus contains a total of 3,099 syllables.
   It contains 779 consonant clusters, including geminated consonants.



 Cluster type   Phoneme type   Position in a word   Total number   Percentage (%)
 Consonant      #CC            initial              256            45.47
 Consonant      CC#            final                78             13.85
 Consonant      CCC            medial or final      55             9.77
 Gemination     C:C            medial or final      65             11.54
 Gemination     CC:            medial or final      61             10.83
 Gemination     C:C:           medial or final      48             8.53
 Total                                              563            100



EXPERIMENTAL RESULTS & EVALUATION
Epenthesis performance
Epenthesis insertion performance (rows: expert decision; columns: algorithm prediction)

  Expert decision            Insert epenthesis   Don't insert epenthesis   Total   Rate
  Insert epenthesis          528                 17                        545     69.97%
  Don't insert epenthesis    2                   232                       234     30.03%
  Total                      530                 249                       779     100%

  Accuracy = (# correct insertions + # correct neglections) / (# total consonant clusters) × 100%
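
The accuracy formula can be applied directly to the four cells of the table. A short sketch: the dictionary keys are hypothetical labels, the counts are those on the slide, and the resulting percentage is computed here rather than quoted from the deck.

```python
# Keys are (expert decision, algorithm prediction); counts from the table.
confusion = {
    ("insert", "insert"): 528,  # correct insertions
    ("insert", "skip"): 17,     # missed insertions
    ("skip", "insert"): 2,      # wrong insertions
    ("skip", "skip"): 232,      # correct neglections
}

correct = confusion[("insert", "insert")] + confusion[("skip", "skip")]
total = sum(confusion.values())
accuracy = 100 * correct / total

print(total)               # 779
print(f"{accuracy:.2f}%")  # 97.56%
```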




EXPERIMENTAL RESULTS & EVALUATION
Syllabification
  Distribution of syllable templates over the test result

    Syllable   Frequencies                                        Percentage
    pattern    Word initial   Word medial   Word final   Total    (%)
    V          36             0             2            38       1.27
    CV         352            690           579          1621     53.59
    VC         64             25            5            94       3.10
    VCC        74             0             0            74       2.44
    CVC        450            310           342          1102     36.43
    CVCC       24             0             72           96       3.17
    Total      1000           1025          1000         3025     100



   Syllabification performance: evaluation by an Amharic linguistic
  expert shows an overall syllabifier accuracy of 98.1%.
  Word accuracy is 98.1%, & juncture accuracy tends to the same
  figure.
EXPERIMENTAL RESULTS & EVALUATION


Syllabification errors
    Most of the errors were due to a missed or wrongly inserted
   epenthetic vowel.

   Problem #   Problem description                                       Total # in the test result corpus
   1           Words with wrong epenthetic vowel insertions              2
   2           Words with neglected epenthetic vowel insertions          11
   3           Syllabification problems from neglected epenthesis in     6
               a CC sequence at word-medial position

   Total syllabification errors                                          19
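
The error counts can be cross-checked against the reported 98.1% word accuracy. The sketch assumes a test set of 1,000 words (a figure inferred from the 1,000 word-initial syllables in the template distribution table, not stated on this slide) and that each error falls in a distinct word.

```python
# Error counts per problem type, as listed in the table.
errors = {
    "wrong epenthetic insertion": 2,
    "neglected epenthetic insertion": 11,
    "neglected epenthesis in medial CC": 6,
}

TOTAL_WORDS = 1000  # assumption, see the note above

total_errors = sum(errors.values())
word_accuracy = 100 * (TOTAL_WORDS - total_errors) / TOTAL_WORDS

print(total_errors)             # 19
print(f"{word_accuracy:.1f}%")  # 98.1%
```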




CONCLUSION & RECOMMENDATIONS


Conclusion
    An automatic syllabification algorithm that considers the frequently
    occurring epenthetic vowel /i/ and gemination

    An algorithm for the frequently occurring epenthetic vowel

    Results showed 98.1% word accuracy

    Rule-based syllabification & linguistic syllabification principles
    are important in implementing automatic syllabification &
    epenthesis.



CONCLUSION & RECOMMENDATIONS
Recommendations
  Researchers in the area can use the algorithm in Amharic TTS & in
 Amharic ASR; the source code is available in C++ (GNU C++
 compiler) and C#.
 Future works
    Gemination handling algorithm.

    Study on final consonant cluster to improve the performance
    of the syllabifier.

    Stress assignment algorithm.

    Performance investigation on both Amharic TTS & ASR.

    A comparison study using data-driven approaches
     Thank you!




Modeling Improved Amharic Syllabification Algorithm   27

More Related Content

Similar to Modeling Improved Syllabification Algorithm for Amharic

Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingGuy De Pauw
 
Capturing Word-level Dependencies in Morpheme-based Language Modeling
Capturing Word-level Dependencies in Morpheme-based Language ModelingCapturing Word-level Dependencies in Morpheme-based Language Modeling
Capturing Word-level Dependencies in Morpheme-based Language ModelingGuy De Pauw
 
Chapter4 natural classes of sounds distinctive features
Chapter4 natural classes of sounds distinctive featuresChapter4 natural classes of sounds distinctive features
Chapter4 natural classes of sounds distinctive featuresQamraSaud AlOtaibi
 
Automatic Phonetization-based Statistical Linguistic Study of Standard Arabic
Automatic Phonetization-based Statistical Linguistic Study of Standard ArabicAutomatic Phonetization-based Statistical Linguistic Study of Standard Arabic
Automatic Phonetization-based Statistical Linguistic Study of Standard ArabicCSCJournals
 
Hybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition
Hybrid Phonemic and Graphemic Modeling for Arabic Speech RecognitionHybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition
Hybrid Phonemic and Graphemic Modeling for Arabic Speech RecognitionWaqas Tariq
 
Bangla spell checker & suggestion generator
Bangla spell checker & suggestion generatorBangla spell checker & suggestion generator
Bangla spell checker & suggestion generatorMdAlAmin187
 
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...Guy De Pauw
 
Syntax analyzer
Syntax analyzerSyntax analyzer
Syntax analyzerahmed51236
 
High Quality Arabic Concatenative Speech Synthesis
High Quality Arabic Concatenative Speech SynthesisHigh Quality Arabic Concatenative Speech Synthesis
High Quality Arabic Concatenative Speech Synthesissipij
 
Criterio de evaluacion rubrica y lista de cotejo
Criterio de evaluacion rubrica y lista de cotejoCriterio de evaluacion rubrica y lista de cotejo
Criterio de evaluacion rubrica y lista de cotejodeyamartinez
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice RecognitionAmrita More
 
International Journal of Computational Engineering Research(IJCER)
 International Journal of Computational Engineering Research(IJCER)  International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) ijceronline
 

Similar to Modeling Improved Syllabification Algorithm for Amharic (15)

Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic ProgrammingLearning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
Learning Morphological Rules for Amharic Verbs Using Inductive Logic Programming
 
Capturing Word-level Dependencies in Morpheme-based Language Modeling
Capturing Word-level Dependencies in Morpheme-based Language ModelingCapturing Word-level Dependencies in Morpheme-based Language Modeling
Capturing Word-level Dependencies in Morpheme-based Language Modeling
 
Chapter4 natural classes of sounds distinctive features
Chapter4 natural classes of sounds distinctive featuresChapter4 natural classes of sounds distinctive features
Chapter4 natural classes of sounds distinctive features
 
Automatic Phonetization-based Statistical Linguistic Study of Standard Arabic
Automatic Phonetization-based Statistical Linguistic Study of Standard ArabicAutomatic Phonetization-based Statistical Linguistic Study of Standard Arabic
Automatic Phonetization-based Statistical Linguistic Study of Standard Arabic
 
Hybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition
Hybrid Phonemic and Graphemic Modeling for Arabic Speech RecognitionHybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition
Hybrid Phonemic and Graphemic Modeling for Arabic Speech Recognition
 
Speech processing
Speech processingSpeech processing
Speech processing
 
Bangla spell checker & suggestion generator
Bangla spell checker & suggestion generatorBangla spell checker & suggestion generator
Bangla spell checker & suggestion generator
 
Speech recognition for arabic
Speech recognition for arabicSpeech recognition for arabic
Speech recognition for arabic
 
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
Setswana Tokenisation and Computational Verb Morphology: Facing the Challenge...
 
I3 madankarky2 karthika
I3 madankarky2 karthikaI3 madankarky2 karthika
I3 madankarky2 karthika
 
Syntax analyzer
Syntax analyzerSyntax analyzer
Syntax analyzer
 
High Quality Arabic Concatenative Speech Synthesis
High Quality Arabic Concatenative Speech SynthesisHigh Quality Arabic Concatenative Speech Synthesis
High Quality Arabic Concatenative Speech Synthesis
 
Criterio de evaluacion rubrica y lista de cotejo
Criterio de evaluacion rubrica y lista de cotejoCriterio de evaluacion rubrica y lista de cotejo
Criterio de evaluacion rubrica y lista de cotejo
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
International Journal of Computational Engineering Research(IJCER)
 International Journal of Computational Engineering Research(IJCER)  International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 

More from Guy De Pauw

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Guy De Pauw
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Guy De Pauw
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingGuy De Pauw
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageGuy De Pauw
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguageGuy De Pauw
 
The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)The Tagged Icelandic Corpus (MÍM)
The Tagged Icelandic Corpus (MÍM)Guy De Pauw
 
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...
Describing Morphologically Rich Languages Using Metagrammars a Look at Verbs ...Guy De Pauw
 
Tagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusTagging and Verifying an Amharic News Corpus
Tagging and Verifying an Amharic News CorpusGuy De Pauw
 
A Corpus of Santome
A Corpus of SantomeA Corpus of Santome
A Corpus of SantomeGuy De Pauw
 
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Automatic Structuring and Correction Suggestion System for Hungarian Clinical...
Automatic Structuring and Correction Suggestion System for Hungarian Clinical...Guy De Pauw
 
Compiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTCompiling Apertium Dictionaries with HFST
Compiling Apertium Dictionaries with HFSTGuy De Pauw
 
The Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionThe Database of Modern Icelandic Inflection
The Database of Modern Icelandic InflectionGuy De Pauw
 
Issues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishIssues in Designing a Corpus of Spoken Irish
Issues in Designing a Corpus of Spoken IrishGuy De Pauw
 
How to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsHow to build language technology resources for the next 100 years
How to build language technology resources for the next 100 yearsGuy De Pauw
 
Towards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersTowards Standardizing Evaluation Test Sets for Compound Analysers
Towards Standardizing Evaluation Test Sets for Compound AnalysersGuy De Pauw
 
The PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentThe PALDO Concept - New Paradigms for African Language Resource Development
The PALDO Concept - New Paradigms for African Language Resource DevelopmentGuy De Pauw
 
A System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersA System for the Recognition of Handwritten Yorùbá Characters
A System for the Recognition of Handwritten Yorùbá CharactersGuy De Pauw
 
IFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemIFE-MT: An English-to-Yorùbá Machine Translation System
IFE-MT: An English-to-Yorùbá Machine Translation SystemGuy De Pauw
 
A Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemA Number to Yorùbá Text Transcription System
A Number to Yorùbá Text Transcription SystemGuy De Pauw
 
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Bilingual Data Mining for the English-Amharic Statistical Machine Translation...
Bilingual Data Mining for the English-Amharic Statistical Machine Translation...Guy De Pauw
 

More from Guy De Pauw (20)

Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...Technological Tools for Dictionary and Corpora Building for Minority Language...
Technological Tools for Dictionary and Corpora Building for Minority Language...
 
Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...Semi-automated extraction of morphological grammars for Nguni with special re...
Semi-automated extraction of morphological grammars for Nguni with special re...
 
Resource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech TaggingResource-Light Bantu Part-of-Speech Tagging
Resource-Light Bantu Part-of-Speech Tagging
 
Natural Language Processing for Amazigh Language
Natural Language Processing for Amazigh LanguageNatural Language Processing for Amazigh Language
Natural Language Processing for Amazigh Language
 
POS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik LanguagePOS Annotated 50m Corpus of Tajik Language
POS Annotated 50m Corpus of Tajik Language
 
The Tagged Icelandic Corpus (MÍM)
Modeling Improved Syllabification Algorithm for Amharic

  • 1. AGIS’11 Action week for Information sharing Modeling Improved Syllabification Algorithm for Amharic Nirayo Hailu and Sebsbie Hailemariam (PhD.) nirayo2000@yahoo.com sebsibe2004@yahoo.com December 01, 2011 Addis Ababa, Ethiopia
  • 2. OUTLINE: Introduction; Amharic Syllable Structure & Syllabification; Design of Syllabification Model; Experimental Results & Evaluation; Conclusion & Recommendations.
  • 3. INTRODUCTION: A syllable is a unit of sound composed of a central peak of sonority (usually a vowel) & the consonants that cluster around this central peak. Syllabification is the task of segmenting words, whether spoken or written, into syllables. Technically, the basic elements of the syllable are the onset and the rhyme (nucleus + coda). A syllable can be described by a series of grammars: as a consonant-vowel-consonant (CVC) sequence, or as onset, nucleus & coda (ONC).
  • 4. INTRODUCTION: Example: syllable (σ) structure for the word /bil-hat/.
       First syllable: onset /b/, rhyme = nucleus /i/ + coda /l/.
       Second syllable: onset /h/, rhyme = nucleus /a/ + coda /t/.
       Importance of identifying syllable structures:
       Speech synthesis (in the G2P, prosody and synthesis modules): improves the intonation of the synthesized speech.
       Speech recognition (pronunciation dictionary): allows building a recognizer that represents pronunciations in terms of syllables/phonemes rather than graphemes.
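The onset/rhyme decomposition of /bil-hat/ can be illustrated with a minimal Python sketch. It assumes a one-character-per-phoneme romanization and a hypothetical vowel set `VOWELS`; the function name `split_syllable` is introduced here for illustration and is not from the slides.

```python
# Assumed romanized vowel inventory (hypothetical, one character per phoneme).
VOWELS = set("aeiou")

def split_syllable(syllable):
    """Split one syllable into (onset, nucleus, coda).

    The onset is everything before the first vowel, the nucleus is the
    run of vowels, and the coda is whatever follows the nucleus.
    """
    i = 0
    while i < len(syllable) and syllable[i] not in VOWELS:
        i += 1
    j = i
    while j < len(syllable) and syllable[j] in VOWELS:
        j += 1
    return syllable[:i], syllable[i:j], syllable[j:]

for syl in "bil-hat".split("-"):
    onset, nucleus, coda = split_syllable(syl)
    print(syl, "-> onset:", onset, "| nucleus:", nucleus, "| coda:", coda)
```

The rhyme is simply the nucleus and coda taken together, so it is not stored separately.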
  • 5. INTRODUCTION: Amharic Language. Amharic is a syllabic language in which every grapheme represents a Consonant-Vowel assimilation. Not all syllables are uttered as expected. Amharic orthography does not show epenthetic vowels or geminated consonants. In this project we developed an appropriate syllabification model for Amharic text.
  • 6. AUTOMATIC SYLLABIFICATION: Approaches. Rule-based: effectively embodies some theoretical position regarding the syllable; rules are used as the gold standard; requires a linguistic expert; implements notions such as the maximal onset principle & the sonority hierarchy. Example: sonority curve for the word /milkit/.
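A sonority curve like the one mentioned for /milkit/ can be approximated numerically. The sketch below is only an illustration: the rank values in `SONORITY` are an assumed, generic scale (stops < fricatives < nasals < liquids < glides < vowels), not the scale used by the authors, and the romanized phoneme groups are hypothetical.

```python
# Assumed sonority ranks (higher = more sonorous); the slides give no values.
SONORITY = {}
for group, rank in [("ptkbdgq", 1), ("fszh", 2), ("mn", 3),
                    ("lr", 4), ("wy", 5), ("aeiou", 6)]:
    for phoneme in group:
        SONORITY[phoneme] = rank

def sonority_profile(word):
    """Map each phoneme of a romanized word to its assumed sonority rank."""
    return [SONORITY[p] for p in word]

def sonority_peaks(profile):
    """Indices whose sonority exceeds both neighbours (candidate nuclei)."""
    peaks = []
    for i, value in enumerate(profile):
        left = profile[i - 1] if i > 0 else 0
        right = profile[i + 1] if i + 1 < len(profile) else 0
        if value > left and value > right:
            peaks.append(i)
    return peaks

profile = sonority_profile("milkit")
print(profile)                  # [3, 6, 4, 1, 6, 1]
print(sonority_peaks(profile))  # [1, 4]
```

Under these assumed ranks the two peaks fall on the two vowels, which matches the two syllables of /mil-kit/.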
  • 7. AUTOMATIC SYLLABIFICATION: Approaches. Data-driven: infers new syllabifications from an evidence base of already-syllabified words (a dictionary or lexicon); requires a large training corpus to attain better performance. Examples: look-up procedure, syllabification by analogy, decision tree-based syllabification.
  • 8. AUTOMATIC SYLLABIFICATION: Related works
       Title | Approach used | Dataset used | Accuracy
       Unit Selection Voice For Amharic Using FESTVOX | Rule-based | - | -
       A Rule based Syllabification Algorithm for Sinhala | Rule-based | 30,000 distinct words | 99.95%
       Automatic detection of syllable boundaries in spontaneous speech (for French language) | Rule-based | speaker 1: 653 words; speaker 2: 1,238 words | 97.77%
       Automatic Word Stress Marking & Syllabification for Catalan TTS | Rule-based | test 1: 1000 words; test 2: 223 words | 99.7%; 99.8%
       Automatic Syllabification for Danish Text-to-Speech Systems | Rule-based (RB) & ANN | test 1: 1000 (randomly selected); test 2: 1000 (from newspapers) | RB: 96.9% & 98.7%; ANN: 94.1% & 94.5%
       A Syllabification Algorithm for Spanish | Rule-based | 316 words | 98.4%
  • 9. AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION: Syllable structure of Amharic words. The main syllable templates are V, VC, VCC, CV, CVC and CVCC. Gemination & syllable structures: gemination happens when a spoken consonant is pronounced for an audibly longer period of time than a short consonant. Traditionally, it is represented either as /C:/ or /CC/ to indicate its length. Gemination occurs frequently in Amharic words, except for the phonemes /h/ & /ax/.
  • 10. AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION: Syllable structure of Amharic words; gemination & syllable structures. Example: /kift/ & with gemination /kiffitt/.
  • 11. AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION: Syllable structure of Amharic words; consonant clusters & their syllable structures. In Amharic, the maximum number of consonants allowed in a cluster is two. Onset clusters are not allowed in Amharic. The sonority hierarchy helps us deal with consonant clusters. Example: if a stop & a liquid appear together in a word-final cluster, an epenthetic vowel is inserted, because the sonority of the final liquid is greater than that of the preceding phoneme.
  • 12. AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION: Syllable structure of Amharic words; consonant clusters & epenthesis
       Epenthesis: the process of inserting an epenthetic vowel to split impermissible consonant clusters.
       General rules regarding epenthesis in Amharic:
       Word-initially: no consonant cluster is allowed.
       CCC → CCiC, in a word like /fendto/ → /fendito/
       C:C → C:iC, in a word like /fellgo/ → /felligo/
       CC: → CiC:, in a word like /sebrre/ → /sebirre/
       C:C: → C:iC:
       Word-final position: the sonority hierarchy principle is applied.
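The word-medial rules on this slide (CCC, C:C, CC: and C:C:) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' C# implementation: it assumes a one-character-per-phoneme romanization in which a doubled consonant marks a geminate, and the names `tokenize`, `split_point` and `insert_medial_epenthesis` are hypothetical.

```python
VOWELS = set("aeiou")   # assumed romanized vowel inventory
EPENTHETIC = "i"        # the epenthetic vowel used in the slide examples

def tokenize(word):
    """Split a word into phonemes; a doubled consonant becomes one geminate token."""
    tokens, i = [], 0
    while i < len(word):
        ch = word[i]
        if ch not in VOWELS and i + 1 < len(word) and word[i + 1] == ch:
            tokens.append(ch * 2)   # geminate, e.g. 'll'
            i += 2
        else:
            tokens.append(ch)
            i += 1
    return tokens

def split_point(cluster):
    """Index in the cluster before which the vowel is inserted, or None."""
    geminate = [len(t) == 2 for t in cluster]
    if len(cluster) == 3 and not any(geminate):
        return 2    # CCC -> CCiC
    if len(cluster) == 2 and any(geminate):
        return 1    # C:C -> C:iC, CC: -> CiC:, C:C: -> C:iC:
    return None     # e.g. a plain medial CC cluster is left untouched

def insert_medial_epenthesis(word):
    tokens = tokenize(word)
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] in VOWELS:
            out.append(tokens[i])
            i += 1
            continue
        j = i
        while j < len(tokens) and tokens[j] not in VOWELS:
            j += 1
        cluster = tokens[i:j]
        medial = i > 0 and j < len(tokens)   # a vowel on both sides
        k = split_point(cluster) if medial else None
        out.extend(cluster if k is None else cluster[:k] + [EPENTHETIC] + cluster[k:])
        i = j
    return "".join(out)

for w in ["fendto", "fellgo", "sebrre"]:
    print(w, "->", insert_medial_epenthesis(w))
```

Word-initial and word-final clusters are deliberately left alone here, since the slide treats them with separate rules.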
  • 13. DESIGN OF SYLLABIFICATION MODEL: Syllabification. Given gemination handling rules, syllabification rules, epenthesis rules & the syllable templates of the language, it is possible to syllabify a given text, that is, to mark its syllable boundaries.
  • 14. DESIGN OF SYLLABIFICATION MODEL: Design of the rule-based syllabification model. The overall architecture of automatic syllabification includes five modules:
       o Transliteration module
       o Gemination handling module (expert knowledge)
       o Epenthesis module
       o Syllabification module
       o Stress assignment module
  • 15. GENERAL ARCHITECTURE FOR AUTOMATIC SYLLABIFICATION
       Input: Amharic text
       1. Transliteration
       2. Gemination (uses expert's knowledge)
       3. Epenthesis: consonant cluster identification, geminated consonant identification, epenthetic vowel insertion (uses the sonority scale of phonemes & epenthesis rules)
       4. Syllabification: consonant-vowel parsing, syllable template matching, syllable boundary marking (uses syllable templates & syllabification rules)
       5. Stress assignment: syllable weight assignment, stress marker (uses syllable weight rules)
       Output: syllable & stress marked phoneme sequence
  • 16. DESIGN OF SYLLABIFICATION MODEL: Proposed epenthesis procedure
       1. Accept the input word & scan it from left to right.
       2. If a consonant cluster occurs at word-initial position, insert an epenthetic vowel between the consonants (Rule #1). Exception: the first phoneme is a consonant & the next consonant is the glide /w/.
       3. If three consonants appear in sequence in word-medial or word-final position, insert an epenthetic vowel before the third consonant (Rule #2). Exception: if the sonority of the middle consonant is greater than that of the others, insert the epenthetic vowel after the first consonant in the cluster.
       4. If a cluster of consonants contains a geminate followed by a singleton, insert an epenthetic vowel after the geminated consonant (Rule #3).
  • 17. DESIGN OF SYLLABIFICATION MODEL: Proposed epenthesis procedure (continued)
       5. If a cluster of consonants contains a singleton followed by a geminate, insert an epenthetic vowel after the singleton consonant (Rule #4).
       6. If a cluster of consonants contains two different geminates in sequence, insert an epenthetic vowel between the two geminate consonants (Rule #5).
       7. If the sonority of the final consonant is greater than that of the preceding consonant, insert the epenthetic vowel between the consonants of the final cluster (Rule #6).
       8. Repeat steps 2 through 7 until all the phonemes in the phoneme list are parsed.
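The two edge-of-word rules in the procedure, Rule #1 (word-initial cluster, with the /w/ exception) and Rule #6 (word-final cluster with rising sonority), can be sketched as follows. The sonority ranks are an assumed generic scale, the romanization is one character per phoneme, and the input strings are hypothetical transliterations chosen only to trigger each rule; none of this is taken from the authors' code.

```python
VOWELS = set("aeiou")   # assumed romanized vowel inventory
EPENTHETIC = "i"

# Assumed sonority ranks for consonants (higher = more sonorous).
SONORITY = {}
for group, rank in [("ptkbdgq", 1), ("fszh", 2), ("mn", 3), ("lr", 4), ("wy", 5)]:
    for phoneme in group:
        SONORITY[phoneme] = rank

def is_consonant(ch):
    return ch not in VOWELS

def apply_rule_1(word):
    """Rule #1: split a word-initial CC cluster unless the second consonant is /w/."""
    if (len(word) >= 2 and is_consonant(word[0])
            and is_consonant(word[1]) and word[1] != "w"):
        return word[0] + EPENTHETIC + word[1:]
    return word

def apply_rule_6(word):
    """Rule #6: split a word-final CC cluster when the last consonant is more sonorous."""
    if len(word) >= 2 and is_consonant(word[-2]) and is_consonant(word[-1]):
        if SONORITY[word[-1]] > SONORITY[word[-2]]:
            return word[:-1] + EPENTHETIC + word[-1]
    return word

print(apply_rule_1("sra"))    # initial cluster is split
print(apply_rule_1("gwada"))  # unchanged: second consonant is the glide /w/
print(apply_rule_6("kebr"))   # final stop + liquid is split
print(apply_rule_6("kift"))   # unchanged: sonority falls towards the end
```

Keeping each rule as a separate function mirrors the numbered steps and makes it easy to test one rule at a time.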
  • 18. DESIGN OF SYLLABIFICATION MODEL: Proposed syllabification procedure
       1. Accept the input from the epenthesis algorithm & scan it from left to right.
       2. If two vowel phonemes (VV) occur in sequence at word-initial position, mark a syllable boundary between them.
       3. If the initial phoneme is a vowel & the next two phonemes are a consonant & a vowel respectively, mark the syllable boundary just at the second
       4. If a (VCCV) pattern occurs at any position, mark a syllable boundary between the two consonants of the cluster.
       5. If a (VCVC) pattern occurs at word-initial position, mark a syllable boundary before the second vowel.
       6. If a (CVV) sequence occurs at any position, mark a syllable boundary between the two vowels.
  • 19. DESIGN OF SYLLABIFICATION MODEL: Proposed syllabification procedure (continued)
       7. If a (CVCCV) phoneme sequence occurs at word-initial position, mark a syllable boundary between the consonants of the middle cluster (CVC-CV).
       8. If a (CVCC) pattern occurs at word-final position & there is a phoneme before the first consonant, mark a syllable boundary before the initial consonant of this pattern.
       9. If a (CVCV) pattern occurs at any position, mark a syllable boundary after the vowels; if it occurs at word-final position, the syllable boundary gives a CV-CV pattern.
       10. If a (CVC1C1VC or CVCCVC) pattern occurs in a word, mark a syllable boundary between the geminated consonants (CVC1-C1VC).
       11. If a (VVCC) syllable pattern occurs at word-final or word-initial position, mark a syllable boundary between the two vowels.
       12. Repeat steps 2 through 11 until all phonemes are parsed.
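Several of the patterns in this procedure (VCCV → VC-CV, CVV → CV-V, CVCV → CV-CV, CVCCV → CVC-CV) follow from a single observation: since onset clusters are not allowed, a non-initial syllable starts with at most one consonant before its vowel. The Python sketch below uses that simplification instead of the slide's rule-by-rule, position-specific procedure, so it is an approximation rather than the authors' algorithm. The romanization and the names `syllabify` and `cv_pattern` are assumptions.

```python
VOWELS = set("aeiou")   # assumed romanized vowel inventory
TEMPLATES = {"V", "VC", "VCC", "CV", "CVC", "CVCC"}   # templates listed in the slides

def cv_pattern(segment):
    """Consonant-vowel parsing: map each phoneme to 'C' or 'V'."""
    return "".join("V" if ch in VOWELS else "C" for ch in segment)

def syllabify(word):
    """Mark syllable boundaries with '-' in an (already epenthesized) word."""
    starts = [0]
    seen_vowel = False
    for i, ch in enumerate(word):
        if ch not in VOWELS:
            continue
        if seen_vowel:
            # A new syllable starts at the single consonant before this vowel,
            # or at the vowel itself when it directly follows another vowel.
            starts.append(i - 1 if word[i - 1] not in VOWELS else i)
        seen_vowel = True
    ends = starts[1:] + [len(word)]
    return "-".join(word[a:b] for a, b in zip(starts, ends))

for w in ["bilhat", "milkit", "fendito", "felligo", "kift"]:
    syllables = syllabify(w)
    ok = all(cv_pattern(s) in TEMPLATES for s in syllables.split("-"))
    print(w, "->", syllables, "| templates ok:", ok)
```

The template check at the end corresponds to the "syllable template matching" step: any syllable whose CV pattern falls outside the six templates signals that epenthesis was missed upstream.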
  • 20. DESIGN OF SYLLABIFICATION MODEL: The algorithms were implemented using the C# programming language.
  • 21. EXPERIMENTAL RESULTS & EVALUATION: The test corpus. Each word contains three to four syllables on average. The corpus contains a total of 3,099 syllables and 779 consonant clusters, including geminated consonants.
       Cluster type | Phonemes type | Position in a word | Total number | Percentage (%)
       Consonant | #CC | initial | 256 | 45.47
       Consonant | CC# | final | 78 | 13.85
       Consonant | CCC | medial or final | 55 | 9.77
       Gemination | C:C | medial or final | 65 | 11.54
       Gemination | CC: | medial or final | 61 | 10.83
       Gemination | C:C: | medial or final | 48 | 8.53
       Total | | | 563 | 100
  • 22. EXPERIMENTAL RESULTS & EVALUATION: Epenthesis performance. Epenthesis insertion performance (rows: expert decision; columns: algorithm prediction):
       Expert decision | Insert epenthesis | Don’t insert epenthesis | Total | Rate
       Insert epenthesis | 528 | 17 | 545 | 69.97%
       Don’t insert epenthesis | 2 | 232 | 234 | 30.03%
       Total | 530 | 249 | 779 | 100%
       Accuracy = (# correct insertion + # correct neglection) / (# total consonant clusters) × 100%
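Reading the accuracy as the share of consonant clusters on which the algorithm agrees with the expert (correct insertions plus correct non-insertions, over all clusters), the figure can be worked out from the counts in the table. The short Python sketch below only redoes that arithmetic; the variable names are hypothetical and the resulting percentage is computed here, not quoted from the slides.

```python
# Counts from the epenthesis table: (expert decision, algorithm prediction) -> clusters.
confusion = {
    ("insert", "insert"): 528,   # correct insertion
    ("insert", "skip"): 17,      # missed insertion
    ("skip", "insert"): 2,       # wrong insertion
    ("skip", "skip"): 232,       # correct neglection
}

total = sum(confusion.values())
correct = confusion[("insert", "insert")] + confusion[("skip", "skip")]
accuracy = 100 * correct / total

print(total)               # 779
print(correct)             # 760
print(round(accuracy, 2))  # 97.56
```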
  • 23. EXPERIMENTAL RESULTS & EVALUATION: Syllabification. Distribution of syllable templates over the test result:
       Syllable pattern | Word initial | Word medial | Word final | Total | Percentage (%)
       V | 36 | 0 | 2 | 38 | 1.27
       CV | 352 | 690 | 579 | 1621 | 53.59
       VC | 64 | 25 | 5 | 94 | 3.10
       VCC | 74 | 0 | 0 | 74 | 2.44
       CVC | 450 | 310 | 342 | 1102 | 36.43
       CVCC | 24 | 0 | 72 | 96 | 3.17
       Total | 1000 | 1025 | 1000 | 3025 | 100
       Syllabification performance: evaluation by an Amharic linguist expert shows an overall syllabifier accuracy of 98.1%; word accuracy is 98.1%, & juncture accuracy tends to the same figure.
  • 24. EXPERIMENTAL RESULTS & EVALUATION: Syllabification errors. Most of the errors occurred due to a missed epenthetic vowel or a wrongly inserted epenthetic vowel.
       Problem # | Problem description | Total # in the test result corpus
       1 | Words which have wrong epenthetic insertions | 2
       2 | Words with neglected epenthetic vowel insertions | 11
       3 | Syllabification problem from neglected epenthesis in a CC sequence at word-medial position | 6
       Total syllabification errors | | 19
  • 25. CONCLUSION & RECOMMENDATIONS: Conclusion. An automatic syllabification algorithm that considers the frequently occurring epenthetic vowel /i/ and gemination. An algorithm for the frequently occurring epenthetic vowel. Results showed 98.1% word accuracy. Rule-based syllabification & linguistic syllabification principles are important in implementing automatic syllabification & epenthesis.
  • 26. CONCLUSION & RECOMMENDATIONS: Recommendations. Researchers in the area can use the algorithm in Amharic TTS & in Amharic ASR; the source code can be found in C++ (GNU C++ compiler) and C#. Future work: gemination handling algorithm; study of final consonant clusters to improve the performance of the syllabifier; stress assignment algorithm; performance investigation on both Amharic TTS & ASR; a comparison study using data-driven approaches.
  • 27. Thank you!