3. INTRODUCTION
A syllable is a unit of sound composed of a central peak of
sonority (usually a vowel) & the consonants that cluster around
this central peak.
Syllabification is the task of segmenting words, whether
spoken or written, into syllables.
Technically, the basic elements of the syllable are:
Onset
Rhyme (Nucleus + Coda)
A syllable can be described by a series of grammars:
Consonant-vowel-consonant (CVC) sequence
Onset, nucleus & coda (ONC)
Modeling Improved Amharic Syllabification Algorithm 3
4. INTRODUCTION
Example: Syllable (σ) structure for the word ብልሃት /bil-hat/
σ1: Onset /b/, Rhyme = Nucleus /i/ + Coda /l/
σ2: Onset /h/, Rhyme = Nucleus /a/ + Coda /t/
Importance of identifying syllable structure:
Speech synthesis (in the G2P module, prosody module & synthesis
module)
Improves the intonation of synthesized speech
Speech recognition (pronunciation dictionary)
To build a recognizer that represents pronunciations in
terms of syllables/phonemes rather than graphemes.
5. INTRODUCTION
Amharic Language
Amharic is a syllabic language in which every grapheme
represents a consonant-vowel combination.
Not all syllables are uttered as their graphemes suggest.
Amharic orthography does not mark epenthetic vowels or
geminated consonants.
In this project we developed an appropriate syllabification model
for Amharic text.
6. AUTOMATIC SYLLABIFICATION
Approaches
Rule-based:
Effectively embodies some theoretical position regarding the
syllable
Rules are used as the gold standard.
Requires a linguistic expert.
Implements notions such as the maximal onset principle & the sonority
hierarchy.
Example: sonority curve for the word ምልክት /milkit/ (figure)
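A sonority curve of this kind can be approximated in code. The following is a minimal sketch, assuming a conventional six-level sonority scale (vowels > glides > liquids > nasals > fricatives > stops) and a hypothetical, partial mapping from single-character transliteration symbols to phoneme classes. The numeric values and the names `SONORITY`, `PHONEME_CLASS`, `sonority_profile` and `sonority_peaks` are illustrative and do not come from the original work.

```python
# Illustrative sonority scale (assumed values; higher = more sonorous).
SONORITY = {"vowel": 6, "glide": 5, "liquid": 4,
            "nasal": 3, "fricative": 2, "stop": 1}

# Hypothetical, partial classification of transliterated phonemes.
PHONEME_CLASS = {
    **dict.fromkeys("aeiou", "vowel"),
    **dict.fromkeys("wy", "glide"),
    **dict.fromkeys("lr", "liquid"),
    **dict.fromkeys("mn", "nasal"),
    **dict.fromkeys("fszh", "fricative"),
    **dict.fromkeys("bdgkpqt", "stop"),
}

def sonority_profile(word):
    """Return the sonority value of each phoneme in a transliterated word."""
    return [SONORITY[PHONEME_CLASS[p]] for p in word]

def sonority_peaks(word):
    """Return indices of local sonority maxima (candidate syllable nuclei)."""
    profile = sonority_profile(word)
    peaks = []
    for i, value in enumerate(profile):
        left = profile[i - 1] if i > 0 else 0
        right = profile[i + 1] if i < len(profile) - 1 else 0
        if value > left and value > right:
            peaks.append(i)
    return peaks

print(sonority_profile("milkit"))  # [3, 6, 4, 1, 6, 1]
print(sonority_peaks("milkit"))    # [1, 4]
```

The two peaks fall on the two vowels, which is consistent with /milkit/ having two syllables.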
7. AUTOMATIC SYLLABIFICATION
Approaches
Data-driven:
Infers new syllabifications from an evidence base of already-syllabified
words (a dictionary or lexicon).
Requires a large training corpus to attain better performance.
Examples:
Look-up procedure
Syllabification by analogy
Decision tree-based syllabification
8. AUTOMATIC SYLLABIFICATION
Related works
Title | Approach used | Dataset used | Accuracy
Unit Selection Voice for Amharic Using FESTVOX | Rule-based | - | -
A Rule Based Syllabification Algorithm for Sinhala | Rule-based | 30,000 distinct words | 99.95%
Automatic Detection of Syllable Boundaries in Spontaneous Speech (for French language) | Rule-based | speaker 1: 653 words; speaker 2: 1,238 words | 97.77%
Automatic Word Stress Marking & Syllabification for Catalan TTS | Rule-based | test 1: 1000 words; test 2: 223 words | test 1: 99.7%; test 2: 99.8%
Automatic Syllabification for Danish Text-to-Speech Systems | Rule-based & ANN | test 1: 1000 (randomly selected); test 2: 1000 (from newspapers) | Rule-based: 96.9% & 98.7%; ANN: 94.1% & 94.5%
A Syllabification Algorithm for Spanish | Rule-based | 316 words | 98.4%
9. AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION
Syllable structure of Amharic words
The main syllable templates: V, VC, VCC, CV, CVC, CVCC.
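The six templates can be used as a simple validity check on a candidate syllable. The following is a minimal sketch, assuming a transliteration with one character per phoneme and the vowel symbols a, e, i, o, u. Both are assumptions made for illustration, and the names `cv_pattern` and `is_valid_syllable` are hypothetical.

```python
TEMPLATES = {"V", "VC", "VCC", "CV", "CVC", "CVCC"}
VOWELS = set("aeiou")  # assumed vowel symbols of the transliteration

def cv_pattern(syllable):
    """Map each phoneme to 'C' or 'V', e.g. 'bil' -> 'CVC'."""
    return "".join("V" if p in VOWELS else "C" for p in syllable)

def is_valid_syllable(syllable):
    """True if the syllable matches one of the six main templates."""
    return cv_pattern(syllable) in TEMPLATES

print(cv_pattern("bil"))         # CVC
print(is_valid_syllable("hat"))  # True
print(is_valid_syllable("bra"))  # False (CCV has an onset cluster)
```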
Gemination & syllable structures
Traditionally, gemination is represented either as /C:/ or /CC/ to
indicate the length of the consonant.
Gemination happens when a spoken consonant is pronounced
for an audibly longer period of time than a short consonant.
Gemination occurs frequently in Amharic words, except for
the phonemes /h/ & /ax/.
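The two notations for gemination are interchangeable. As a minimal sketch, assuming one character per phoneme and the vowel symbols a, e, i, o, u, a doubled consonant /CC/ can be rewritten in length notation /C:/ as follows. The function name `to_length_notation` is hypothetical.

```python
VOWELS = set("aeiou")  # assumed vowel symbols of the transliteration

def to_length_notation(word):
    """Rewrite doubled consonants /CC/ as /C:/, e.g. 'll' -> 'l:'."""
    out = []
    for p in word:
        if out and out[-1] == p and p not in VOWELS:
            out[-1] = p + ":"  # merge the repeated consonant into one long one
        else:
            out.append(p)
    return "".join(out)

print(to_length_notation("fellgo"))  # fel:go
print(to_length_notation("sebrre"))  # sebr:e
```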
11. AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION
Syllable structure of Amharic words
Consonant clusters & their syllable structures
In Amharic, the maximum number of consonants allowed
in a cluster is two.
Onset clusters are not allowed in Amharic.
The sonority hierarchy helps us deal with consonant clusters.
Example: if a stop & a liquid appear together in a word-final
cluster, an epenthetic vowel is inserted.
– The sonority of the final liquid is greater than that of the preceding phoneme.
12. AMHARIC SYLLABLE STRUCTURE & SYLLABIFICATION
Syllable structure of Amharic words
Consonant clusters & Epenthesis
Epenthesis: the process of inserting an epenthetic vowel to split
impermissible consonant clusters.
General rules regarding epenthesis in Amharic:
Word-initially, no consonant cluster is allowed
CCC → CCiC, as in /fendto/ → /fendito/
C:C → C:iC, as in /fellgo/ → /felligo/
CC: → CiC:, as in /sebrre/ → /sebirre/
C:C: → C:iC:
In final position, the sonority hierarchy principle is applied
13. DESIGN OF SYLLABIFICATION MODEL
Syllabification
Given gemination handling rules, syllabification rules,
epenthesis rules & the syllable templates of the language, it is
possible to syllabify (mark syllable boundaries in) a given text.
14. DESIGN OF SYLLABIFICATION MODEL
Design of rule-based syllabification model
The overall architecture of automatic syllabification includes five
modules:
o Transliteration module
o Gemination handling module (expert knowledge)
o Epenthesis module
o Syllabification module
o Stress assignment module
16. DESIGN OF SYLLABIFICATION MODEL
Proposed Epenthesis procedure
1. Accept the input word & scan it from left to right.
2. If a consonant cluster occurs at word-initial position, insert an epenthetic
vowel between the consonants. (Rule #1)
Exception: the first phoneme is a consonant & the next consonant
is the glide /w/.
3. If three consonants appear in sequence at word-medial or word-final
position, insert an epenthetic vowel before the third consonant. (Rule #2)
Exception: if the sonority of the middle consonant is greater than that of
the others, insert the epenthetic vowel after the first consonant in the cluster.
4. If a cluster of consonants contains a geminate & a singleton in
sequence, insert an epenthetic vowel after the geminated consonant. (Rule #3)
17. DESIGN OF SYLLABIFICATION MODEL
Proposed Epenthesis procedure
5. If a cluster of consonants contains a singleton & a geminate in
sequence, insert an epenthetic vowel after the singleton consonant. (Rule #4)
6. If a cluster of consonants contains two different geminates in
sequence, insert an epenthetic vowel between the two geminate consonants.
(Rule #5)
7. If the sonority of the final consonant is greater than that of the
preceding consonant, insert an epenthetic vowel between the two consonants
of the final cluster. (Rule #6)
8. Repeat steps 2 through 7 until all the phonemes in the phoneme list have
been parsed.
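The epenthesis procedure can be sketched compactly in code. This is an illustrative approximation under stated assumptions, not the authors' C# implementation: it assumes one character per phoneme, writes geminates as doubled consonants, inserts the vowel /i/, and uses an assumed sonority ranking in which any unlisted consonant counts as a stop. It handles only clusters of two or three consonants, and all function names are hypothetical.

```python
VOWELS = set("aeiou")  # assumed vowel symbols of the transliteration

# Assumed sonority ranks (higher = more sonorous); unlisted consonants rank 1.
SONORITY = {
    **dict.fromkeys("wy", 5),    # glides
    **dict.fromkeys("lr", 4),    # liquids
    **dict.fromkeys("mn", 3),    # nasals
    **dict.fromkeys("fszh", 2),  # fricatives
}

def tokenize(word):
    """Split a word into phoneme tokens, merging a doubled consonant into 'C:'."""
    tokens = []
    for ch in word:
        if tokens and tokens[-1] == ch and ch not in VOWELS:
            tokens[-1] = ch + ":"
        else:
            tokens.append(ch)
    return tokens

def is_consonant(tok):
    return tok[0] not in VOWELS

def is_geminate(tok):
    return tok.endswith(":")

def sonority(tok):
    return SONORITY.get(tok[0], 1)

def split_cluster(cluster, initial, final, vowel):
    """Apply Rules #1-#6 to one maximal run of consonant tokens."""
    if len(cluster) == 2:
        a, b = cluster
        if initial:  # Rule #1, with the /w/ exception
            return [a, b] if b == "w" else [a, vowel, b]
        if is_geminate(a) or is_geminate(b):  # Rules #3, #4, #5
            return [a, vowel, b]
        if final and sonority(b) > sonority(a):  # Rule #6
            return [a, vowel, b]
    elif len(cluster) == 3 and not initial:  # Rule #2
        a, b, c = cluster
        if sonority(b) > sonority(a) and sonority(b) > sonority(c):
            return [a, vowel, b, c]  # exception: middle consonant is the peak
        return [a, b, vowel, c]
    return list(cluster)

def epenthesize(word, vowel="i"):
    """Insert epenthetic vowels into a transliterated word."""
    toks = tokenize(word)
    out, i, n = [], 0, len(toks)
    while i < n:
        if not is_consonant(toks[i]):
            out.append(toks[i])
            i += 1
            continue
        j = i
        while j < n and is_consonant(toks[j]):
            j += 1
        out.extend(split_cluster(toks[i:j], i == 0, j == n, vowel))
        i = j
    # Write geminates back as doubled consonants.
    return "".join(t[0] * 2 if is_geminate(t) else t for t in out)

print(epenthesize("fendto"))  # fendito
print(epenthesize("fellgo"))  # felligo
print(epenthesize("sebrre"))  # sebirre
```

Because each maximal consonant run is handled as it is met, one left-to-right pass plays the role of the "repeat" step. The string "nebr" used in the test is a hypothetical input that exercises Rule #6.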
18. DESIGN OF SYLLABIFICATION MODEL
Proposed syllabification procedure
1. Accept the output of the epenthesis algorithm & scan it from left to right.
2. If two vowel phonemes (VV) occur in sequence at word-initial position,
mark a syllable boundary between them.
3. If the initial phoneme is a vowel & the next two phonemes are a consonant
& a vowel respectively, mark the syllable boundary just at the second
4. If a (VCCV) pattern occurs at any position, mark a syllable boundary
between the two consonants of the cluster.
5. If a (VCVC) pattern occurs at word-initial position, mark a syllable
boundary before the second vowel.
6. If a (CVV) sequence occurs at any position, mark a syllable boundary
between the two vowels.
19. DESIGN OF SYLLABIFICATION MODEL
Proposed syllabification procedure
7. If a (CVCCV) phoneme sequence occurs at word-initial position, mark a
syllable boundary between the two middle consonants (CVC-CV).
8. If a (CVCC) pattern occurs at word-final position & there is a phoneme
before the first consonant, mark a syllable boundary before the initial
consonant of this pattern.
9. If a (CVCV) pattern occurs at any position, mark a syllable boundary after
the first vowel; at word-final position this gives the pattern CV-CV.
10. If a (CVC1C1VC or CVCCVC) pattern occurs in a word, mark a syllable
boundary between the geminated consonants (CVC1-C1VC).
11. If a (VVCC) syllable pattern occurs at word-final or word-initial position,
mark a syllable boundary between the two vowels.
12. Repeat steps 2 through 11 until all phonemes are parsed.
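Most of the patterns above follow from two facts about Amharic syllables: an onset is at most one consonant, and adjacent vowels belong to different syllables. The following is a minimal sketch built on that observation, not a literal transcription of the twelve steps and not the authors' implementation. It places a boundary before every non-initial consonant that is directly followed by a vowel, and between any two adjacent vowels. It assumes one character per phoneme, geminates written as doubled consonants, and input that has already been through epenthesis. It reproduces the VV, VCCV, CVV, CVCCV, CVCV and geminate cases, and it yields V-CVC for a word-initial VCVC. The function name `syllabify` is hypothetical.

```python
VOWELS = set("aeiou")  # assumed vowel symbols of the transliteration

def syllabify(word):
    """Split a transliterated, already-epenthesized word into syllables."""
    boundaries = []
    for i in range(1, len(word)):
        prev_is_vowel = word[i - 1] in VOWELS
        cur_is_vowel = word[i] in VOWELS
        next_is_vowel = i + 1 < len(word) and word[i + 1] in VOWELS
        if cur_is_vowel and prev_is_vowel:
            boundaries.append(i)  # V-V: split between adjacent vowels
        elif not cur_is_vowel and next_is_vowel:
            boundaries.append(i)  # split before a single-consonant onset
    syllables, start = [], 0
    for b in boundaries:
        syllables.append(word[start:b])
        start = b
    syllables.append(word[start:])
    return syllables

print("-".join(syllabify("bilhat")))   # bil-hat
print("-".join(syllabify("fendito")))  # fen-di-to
print("-".join(syllabify("felligo")))  # fel-li-go
```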
20. DESIGN OF SYLLABIFICATION MODEL
The algorithms were implemented in the C# programming language.
21. EXPERIMENTAL RESULTS & EVALUATION
The test corpus:
Each word contains three to four syllables on average.
The corpus contains a total of 3,099 syllables
779 consonant clusters including geminated consonants.
Cluster type | Phoneme type | Position in a word | Total number | Percentage (%)
Consonant | #CC | initial | 256 | 45.47
Consonant | CC# | final | 78 | 13.85
Consonant | CCC | medial or final | 55 | 9.77
Gemination | C:C | medial or final | 65 | 11.54
Gemination | CC: | medial or final | 61 | 10.83
Gemination | C:C: | medial or final | 48 | 8.53
Total | | | 563 | 100
23. EXPERIMENTAL RESULTS & EVALUATION
Syllabification
Distribution of syllable templates over the test result
Syllable pattern | Frequency: word initial | Frequency: word medial | Frequency: word final | Total | Percentage (%)
V | 36 | 0 | 2 | 38 | 1.27
CV | 352 | 690 | 579 | 1621 | 53.59
VC | 64 | 25 | 5 | 94 | 3.10
VCC | 74 | 0 | 0 | 74 | 2.44
CVC | 450 | 310 | 342 | 1102 | 36.43
CVCC | 24 | 0 | 72 | 96 | 3.17
Total | 1000 | 1025 | 1000 | 3025 | 100
Syllabification performance: evaluation by an expert Amharic linguist
shows an overall syllabifier accuracy of 98.1%.
Word accuracy is 98.1% & juncture accuracy is approximately the same.
24. EXPERIMENTAL RESULTS & EVALUATION
Syllabification errors
Most of the errors were due to a missed epenthetic vowel or a wrongly
inserted epenthetic vowel.
Problem # | Problem description | Total # in the test result corpus
1 | Words which have wrong epenthetic insertions | 2
2 | Words with neglected epenthetic vowel insertions | 11
3 | Syllabification problem from neglected epenthesis in CC sequence at word-medial position | 6
Total syllabification errors | | 19
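The reported word accuracy can be cross-checked against these error counts. The following is a minimal sketch, assuming the test corpus contains 1,000 words. That figure is an inference from the 1,000 word-initial syllables in the template distribution, not a number stated explicitly, and the variable names are illustrative.

```python
total_words = 1000  # assumption: one word-initial syllable per test word

# Error counts taken from the syllabification error table.
errors = {
    "wrong epenthetic insertion": 2,
    "neglected epenthetic vowel insertion": 11,
    "neglected epenthesis in medial CC sequence": 6,
}

total_errors = sum(errors.values())
word_accuracy = 100 * (total_words - total_errors) / total_words

print(total_errors)             # 19
print(round(word_accuracy, 1))  # 98.1
```

Under that assumption, the 19 erroneous words correspond exactly to the reported 98.1% word accuracy.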
25. CONCLUSION & RECOMMENDATIONS
Conclusion
An automatic syllabification algorithm that takes into account the
frequently occurring epenthetic vowel /i/ & gemination.
An algorithm for inserting the frequently occurring epenthetic vowel.
Results showed 98.1% word accuracy.
Rule-based syllabification & linguistic syllabification principles
are important in implementing automatic syllabification &
epenthesis.
26. CONCLUSION & RECOMMENDATIONS
Recommendations
Researchers in the area can use the algorithm in Amharic TTS & in
Amharic ASR; the source code is available in C++ (GNU C++
compiler) & C#.
Future works
Gemination handling algorithm.
Study on final consonant cluster to improve the performance
of the syllabifier.
Stress assignment algorithm.
Performance investigation on both Amharic TTS & ASR.
A comparison study using data-driven approaches