Comparative study of Text-to-Speech
Synthesis for Indian Languages by using
Syllable Approach
CLASS:M.E I COMPUTER
GUIDED BY : PROF. ASHISH MANWATKAR PRESENTED BY : RAVI SHARMA
ROLL NO: 15311
CONTENT
• INTRODUCTION
• MOTIVATION
• LITERATURE SURVEY
• DATA TABLE
• SYSTEM ARCHITECTURE
• MATHEMATICAL MODEL
• ALGORITHM
• ADVANTAGES
• DISADVANTAGES
• APPLICATION
• CONCLUSION
INTRODUCTION
• Text to Speech Synthesis-
A system which takes as input a sequence of words and converts
them to speech
•Parts of Speech Synthesizers
Speech Synthesizers usually consist of two parts.
First Part- The first part, the front end, has two major tasks.
• First it takes the raw text and converts things like numbers and
abbreviations into their written-out word equivalents. This process is
often called text normalization.
• Then it assigns phonetic transcriptions to each word, and divides and
marks the text into various linguistic units like phrases, clauses, and
sentences.
• Second Part- The other part, the back end, takes the symbolic
linguistic representation and converts it into actual sound output
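The text normalization step described above can be sketched as follows. This is a minimal illustration in Python, assuming English input, whitespace tokenization, and integers from 0 to 999 only; the `ABBREVIATIONS` table and the function names are hypothetical and not taken from the presentation.

```python
import re

# Hypothetical abbreviation table; a real front end would use a far larger one.
ABBREVIATIONS = {"Dr.": "Doctor", "Prof.": "Professor", "No.": "Number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 (assumed limit of this sketch)."""
    if not 0 <= n <= 999:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + number_to_words(rest)


def normalize(text: str) -> str:
    """Replace known abbreviations and short numbers with written-out words."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]{1,3}", token):
            out.append(number_to_words(int(token)))
        else:
            out.append(token)  # anything else is passed through unchanged
    return " ".join(out)
```

Tokens the sketch does not recognize, such as longer numbers or dates, are left untouched; handling them is where most of the real effort in a front end goes.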
Text-to-phoneme challenges
• Speech synthesis systems use two basic approaches to determine the
pronunciation of a word based on its spelling, a process which is often
called text-to-phoneme conversion.
Dictionary Based approach
• The simplest approach to text-to-phoneme conversion is the
dictionary-based approach, where a large dictionary containing all
the words of a language and their correct pronunciation is stored by
the program.
• Determining the correct pronunciation of each word is a matter of
looking up each word in the dictionary and replacing the spelling with
the pronunciation specified in the dictionary
Rule based approach
• The other approach used for text-to-phoneme conversion is the rule-
based approach, where rules for the pronunciations of words are
applied to words to work out their pronunciations based on their
spellings. This is similar to the "sounding out" approach to learning
reading.
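The two text-to-phoneme approaches above can be combined, with the dictionary consulted first and the rules used as a fallback for unknown words. The sketch below assumes this combination; the toy `LEXICON`, the `RULES` list, and the phone labels are hypothetical illustrations, not a real pronunciation dictionary or rule set.

```python
# Hypothetical toy lexicon: spelling -> list of phone labels.
LEXICON = {
    "speech": ["s", "p", "ii", "ch"],
    "text": ["t", "e", "k", "s", "t"],
}

# Hypothetical letter-to-sound rules: (letter sequence, phone label).
# Multi-letter sequences are listed so that they match before single letters.
RULES = [("sh", "sh"), ("ch", "ch"), ("aa", "aa"), ("ee", "ii")]


def rule_based(word: str) -> list:
    """Scan the spelling left to right and apply the first matching rule."""
    phones = []
    i = 0
    while i < len(word):
        for letters, phone in RULES:
            if word.startswith(letters, i):
                phones.append(phone)
                i += len(letters)
                break
        else:
            # No rule matched: assume the letter maps to a phone of the same name.
            phones.append(word[i])
            i += 1
    return phones


def text_to_phonemes(word: str) -> list:
    """Dictionary lookup first, rule-based conversion as the fallback."""
    word = word.lower()
    if word in LEXICON:
        return list(LEXICON[word])
    return rule_based(word)
```

For example, `text_to_phonemes("Speech")` is answered from the lexicon, while `text_to_phonemes("sheet")` is not in the lexicon and is sounded out by the rules.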
• SYLLABLE RULES-
A syllable is a cluster of consonants and a vowel.
A syllable should contain one vowel and any number of consonants.
1. A single vowel can act as a syllable (i.e. V).
2. Possible patterns: V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C……etc.
3. A consonant before the vowel is called the "Onset", i.e. C*V
4. A consonant after the vowel is called the "Coda", i.e. V*C
Syllable Rules-
1. When nasals such as / ’/, or a half-pronounced / / or / / sound,
immediately follow a vowel, they are treated as part of the vowel and
of the same syllable. For example, / ’/ in sa ’sthaa is part of the
syllable containing /sa/.
2. When there are three or more consonants between two
consecutive vowels, the first consonant is part of the coda
of the previous syllable, while the remaining consonants form the
onset of the next syllable.
Syllable Rules-
3. When there are exactly two consonants between two vowels, the first consonant
would be part of coda of previous syllable and the second would be onset of the
next syllable
4. When the second consonant is a member of the set {/r/ /s/ /sh/ /shh/}, both the
consonants would be a part of onset of the next syllable
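The consonant-cluster rules can be sketched as a small syllabifier. This is a minimal illustration under several assumptions that are not in the slides: the input is already a list of phone labels, the `VOWELS` set is a hypothetical inventory, a single consonant between two vowels goes to the onset of the next syllable, and the {/r/ /s/ /sh/ /shh/} rule is applied only to two-consonant clusters. The nasal rule is left out because its phone symbols are not recoverable here.

```python
# Hypothetical vowel inventory in a romanized phone notation.
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}

# Second consonants that pull a two-consonant cluster into the next onset.
ONSET_PULLERS = {"r", "s", "sh", "shh"}


def syllabify(phones):
    """Split a list of phone labels into syllables, one vowel per syllable."""
    vowel_idx = [i for i, p in enumerate(phones) if p in VOWELS]
    if not vowel_idx:
        # No vowel at all: return the phones as one unit (or nothing).
        return [list(phones)] if phones else []

    starts = [0]  # index at which each syllable begins
    for prev, nxt in zip(vowel_idx, vowel_idx[1:]):
        cluster = nxt - prev - 1  # number of consonants between the two vowels
        if cluster <= 1:
            # Assumption: a lone consonant becomes the onset of the next syllable.
            split = prev + 1
        elif cluster == 2 and phones[prev + 2] in ONSET_PULLERS:
            # Second consonant in {r, s, sh, shh}: both consonants go to the onset.
            split = prev + 1
        else:
            # Two, three or more consonants: the first one is the coda of the
            # previous syllable, the rest form the onset of the next.
            split = prev + 2
        starts.append(split)

    ends = starts[1:] + [len(phones)]
    return [list(phones[s:e]) for s, e in zip(starts, ends)]
```

For instance, `syllabify(["p", "a", "t", "r", "a"])` keeps /t/ and /r/ together in the second syllable, while `syllabify(["s", "a", "t", "y", "a"])` splits the cluster between the two syllables.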
HMM synthesis
• A relatively recent technology is speech synthesis based on hidden
Markov models (HMMs), a statistical modelling technique.
• It is a statistical method: the text-to-speech system is based on
a model that is not known beforehand but is estimated and refined
through training.
• The technique consumes large CPU resources but very little memory.
• This approach seems to give a better prosody, without glitches, and
still produces very natural sounding, human-like speech
MOTIVATION
• India has a very large number of languages (1652 is the figure
usually cited, from the 1961 census count of mother tongues)
• Building a TTS system for each of them is time-consuming and
exhausting. Thus a more generic approach towards system building is
required. A common framework is designed first, and
language-specific systems are then built from it.
LITERATURE SURVEY
1. An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
   Aim of the Paper: Improved naturalness in the synthesized speech.
   Advantages: Reduces the prosody mismatch and spectral discontinuity that occur during syllable concatenation.
   Disadvantages: Large number of concatenation points, which results in glitches at the output; prosody mismatch and spectral discontinuity are hard to eliminate.

2. Design and Development of a Text-To-Speech Synthesizer for Indian Languages
   Aim of the Paper: The design and implementation of a unit selection based text-to-speech synthesizer with syllables and polysyllables as units of concatenation.
   Advantages: Improves synthesis quality and reduces the search space, which improves synthesis timing.
   Disadvantages: It is not clear, at the time of writing, how spectral interpolation will be performed at the boundaries.

3. Development of Speech Database for Hindi Text-To-Speech System Considering Syllable as a Basic Unit
   Aim of the Paper: Convert an orthographic text into intelligible and natural sounding speech.
   Advantages: Provides very high quality speech output which is reasonably natural and equivalent to the voice of the original speaker.
   Disadvantages: Pre-processing of the text is required before synthesis.

4. Text-to-Speech Synthesis using syllable-like units
   Aim of the Paper: The design of a syllable based concatenative waveform synthesizer for Indian languages.
   Advantages: The automatic segmentation algorithm creates a useful speech unit that has low target and concatenation costs.
   Disadvantages: The current work uses a single unique syllable-like unit from the repository for synthesis.

5. Statistical parametric speech synthesis
   Aim of the Paper: Generating acceptable synthesized speech.
   Advantages: A variety of speaking styles or emotional speech can be synthesized using a small amount of speech data.
   Disadvantages: Quality of the synthesized speech; factors which degrade the quality are the vocoder, modeling accuracy, and over-smoothing.

6. Unit selection in a concatenative speech synthesis system using a large speech database
   Aim of the Paper: The generation of natural-sounding synthesized speech waveforms.
   Advantages: Produces more natural speech.
   Disadvantages: There is little difference in the quality of the output between the two training methods.

7. An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
   Aim of the Paper: Improved naturalness in the synthesized speech, with very high quality speech output when compared to other synthesizing techniques.
   Advantages: Reduces the prosody mismatch and spectral discontinuity that occur during syllable concatenation.
   Disadvantages: Large number of concatenation points, which results in glitches at the output; prosody mismatch and spectral discontinuity are hard to eliminate.

8. A Common Attribute based Unified HTS framework for Speech Synthesis in Indian Languages
   Aim of the Paper: High-quality synthetic speech.
   Advantages: Concatenates pre-recorded speech units in the database such that the target and concatenation costs are minimized.
   Disadvantages: To obtain high-quality synthetic speech, the database must be large, to ensure that sufficient examples of each unit in every possible context are available.
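Several of the surveyed systems choose units so that the sum of target and concatenation costs is minimized. A minimal dynamic-programming sketch of that idea follows; the function name `select_units`, the unit representation, and both cost functions are hypothetical placeholders supplied by the caller, not the cost definitions used in any of the papers above.

```python
def select_units(candidates, target_cost, concat_cost):
    """Pick one unit per position so that the total cost is minimal.

    candidates  : list with one non-empty list of candidate units per position
    target_cost : function (position, unit) -> cost of using unit at position
    concat_cost : function (previous_unit, unit) -> cost of joining the two
    Returns (total_cost, chosen_units).
    """
    if not candidates or any(not c for c in candidates):
        raise ValueError("every position needs at least one candidate unit")

    # best[j] = (cost, path) of the cheapest path ending in candidate j
    best = [(target_cost(0, u), [u]) for u in candidates[0]]

    for t in range(1, len(candidates)):
        new_best = []
        for u in candidates[t]:
            # Cheapest way to arrive at u from any candidate at position t - 1.
            cost, path = min(
                ((c + concat_cost(p[-1], u), p) for c, p in best),
                key=lambda cp: cp[0],
            )
            new_best.append((cost + target_cost(t, u), path + [u]))
        best = new_best

    return min(best, key=lambda cp: cp[0])
```

As a usage illustration, units could be `(name, pitch)` tuples, with the target cost measuring distance from a desired pitch and the concatenation cost measuring the pitch jump between neighbouring units. Because every extra candidate per position multiplies the joins to evaluate, reducing the search space directly improves synthesis time.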
DATA TABLE
TABLE I: Degradation MOS (DMOS) and word error rate (WER) scores
Target language                      Marathi  Bengali  Tamil   Tamil   Telugu  Malayalam
Source language                      Hindi    Hindi    Tamil   Hindi   Tamil   Tamil
Number of hours of target language   3        2        3       3       3       3
DMOS                                 2.79     2.50     2.97    2.53    2.63    2.88
WER                                  3.48%    15.06%   6.61%   5.16%   16.14%  3.13%
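For reference, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between a reference sentence and a transcription, divided by the number of reference words. The sketch below shows that standard definition; it is an assumption that the scores in Table I were computed this way, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

A listener who hears "the cat sat on the mat" and writes "the cat sit on mat" has made one substitution and one deletion against six reference words, giving a WER of 2/6.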
SYSTEM ARCHITECTURE
Fig. 2. Training and synthesis phases of HMM-based speech synthesis
MATHEMATICAL MODEL
Let I = Set of Language
I = {T, S}
Where,
T is the text, which is the input, and
S is the sound, which is the output.
$D(I) = \arg\max_{o} \, p(o \mid w, \lambda)$
Where,
$\lambda$ represents the model parameters,
$o$ represents the speech parameters, and
$w$ is the transcription of the test sentence.
Syllable Rules-
A syllable is a cluster of consonants and a vowel.
A syllable should contain one vowel and any number of consonants.
A single vowel can act as a syllable (i.e. V).
Possible patterns: V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C……etc.
A consonant before the vowel is called the "Onset", i.e. (C*V)
A consonant after the vowel is called the "Coda", i.e. (V*C)
Output = Pk
Where D(I) is the dictionary function and
Pk is the phonetic output.
ALGORITHM
• PARAMETER GENERATION ALGORITHM
• DELAY BASED SEGMENTATION ALGORITHM
ADVANTAGES
• For people wanting to learn a new language
• For educational institutions looking to enhance student learning,
recall and comprehension
• For people wanting to learn through multiple mediums to solidify
learning
• For people with physical disabilities
• Difficulty handling a book or paper
• Visual Issue (Difficulty seeing text)
DISADVANTAGES
• Despite large improvements, Speech Synthesis can still sound a little
unnatural.
• The approaches to Speech Synthesis that yield the most natural
speech need considerable resources in terms of data storage and
processing power.
• Pronunciation analysis from written text remains a major problem.
APPLICATION
• Systems that provide voice synthesis output for blind users are
generally referred to as screen readers.
• Applications for the Blind
• Applications for the Deafened and Vocally Handicapped
• Educational Applications
CONCLUSION
This paper explores a syllable approach to building language-independent
text-to-speech systems for Indian languages. Using a common
phone set and a common question set, and borrowing context-independent
monophone models across languages together with the syllable approach,
makes the procedure easier and less time-consuming without
compromising the quality of the synthesized speech. Systems can be built
without even knowing the language. This is especially beneficial
in the Indian scenario.
REFERENCES
• [1] A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a
large speech database," in Acoustics, Speech, and Signal Processing (ICASSP-96), vol. 1,
1996, pp. 373–376.
• [2] H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech
Communication, vol. 51, no. 3, pp. 1039–1064, November 2009.
• [3] A. Beyerlein, W. Byrne, J. M. Huerta, S. Khudanpur, B. Marthi, J. Morgan, N. Peterek, J. Picone,
and W. Wang, "Towards language independent acoustic modeling," in Proceeding on Acoustics,
Speech, and Signal Processing (ICASSP), vol. 2, 2000, pp. 1029–1032.
• [4] R. Bayeh, S. Lin, G. Chollet, and C. Mokbel, "Towards multilingual speech recognition using
data driven source/target acoustical units association," in Acoustics, Speech, and Signal
Processing, 2004. Proceedings (ICASSP), pp. I–521–4.
• [5] V. B. Le and L. Besacier, "First steps in fast acoustic modeling for a new target language:
Application to Vietnamese," in Acoustics, Speech, and Signal Processing, Proceedings (ICASSP),
pp. –824.
• [6] P. Eswar, "A rule based approach for spotting characters from continuous speech in Indian
languages," PhD Dissertation, Indian Institute of Technology, Department of Computer Science
and Engg., Madras, India, 1991.
THANK YOU…!!!

 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 

Comparative study of Text-to-Speech Synthesis for Indian Languages by using Syllable Approach

  • 1. Comparative study of Text-to-Speech Synthesis for Indian Languages by using Syllable Approach CLASS:M.E I COMPUTER GUIDED BY : PROF. ASHISH MANWATKAR PRESENTED BY : RAVI SHARMA ROLL NO: 15311
  • 2. CONTENT • INTRODUCTION • MOTIVATION • LITERATURE SURVEY • DATA TABLE • SYSTEM ARCHITECTURE • MATHEMATICAL MODEL • ALGORITHM • ADVANTAGES • DISADVANTAGES • APPLICATION • CONCLUSION
  • 3. INTRODUCTION • Text to Speech Synthesis- A system which takes as input a sequence of words and converts them to speech
  • 4. • Parts of Speech Synthesizers Speech synthesizers usually consist of two parts. First Part- The first part, the front end, has two major tasks. • First, it takes the raw text and converts things like numbers and abbreviations into their written-out word equivalents. This process is often called text normalization. • Then it assigns phonetic transcriptions to each word, and divides and marks the text into various linguistic units like phrases, clauses, and sentences.
  • 5. • Second Part- The other part, the back end, takes the symbolic linguistic representation and converts it into actual sound output
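The text normalization step of the front end can be illustrated with a minimal Python sketch. The abbreviation table, the supported number range (0 to 9999) and the function names are assumptions made for illustration; a real front end for an Indian language would need a language-specific lexicon and number grammar.

```python
import re

# Hypothetical abbreviation table (illustration only).
ABBREVIATIONS = {"Dr.": "Doctor", "Prof.": "Professor", "No.": "Number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..9999 in English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
        return head if rest == 0 else head + " " + number_to_words(rest)
    thousands, rest = divmod(n, 1000)
    head = ONES[thousands] + " thousand"
    return head if rest == 0 else head + " " + number_to_words(rest)


def normalize(text: str) -> str:
    """Replace known abbreviations and 1-4 digit numbers by written-out words."""
    tokens = []
    for token in text.split():
        if token in ABBREVIATIONS:
            tokens.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,4}", token):
            tokens.append(number_to_words(int(token)))
        else:
            tokens.append(token)
    return " ".join(tokens)
```

For example, `normalize("Prof. Sharma has roll No. 42")` yields `"Professor Sharma has roll Number forty two"`.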
  • 6. Text-to-phoneme challenges • Speech synthesis systems use two basic approaches to determine the pronunciation of a word based on its spelling, a process which is often called text-to-phoneme conversion.
  • 7. Dictionary Based approach • The simplest approach to text-to-phoneme conversion is the dictionary-based approach, where a large dictionary containing all the words of a language and their correct pronunciation is stored by the program. • Determining the correct pronunciation of each word is a matter of looking up each word in the dictionary and replacing the spelling with the pronunciation specified in the dictionary
  • 8. Rule based approach • The other approach used for text-to-phoneme conversion is the rule-based approach, where rules for the pronunciations of words are applied to words to work out their pronunciations based on their spellings. This is similar to the "sounding out" approach to learning reading.
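The two text-to-phoneme approaches are often combined: look the word up first, and fall back to rules for words the dictionary does not contain. The sketch below assumes a toy lexicon and a handful of invented letter-to-sound rules; neither comes from the slides, and the phoneme labels are illustrative only.

```python
# Hypothetical toy lexicon: spelling -> phoneme list (illustration only).
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "text": ["t", "eh", "k", "s", "t"],
}

# Hypothetical letter-to-sound rules; longer spellings are tried first.
RULES = {"sh": "sh", "ch": "ch", "ee": "iy", "oo": "uw",
         "a": "ae", "e": "eh", "i": "ih", "o": "aa", "u": "ah"}
MAX_RULE_LEN = max(len(spelling) for spelling in RULES)


def rule_based(word: str) -> list:
    """Greedy longest-match application of the letter-to-sound rules."""
    phonemes = []
    i = 0
    while i < len(word):
        for length in range(MAX_RULE_LEN, 0, -1):
            chunk = word[i:i + length]
            if chunk in RULES:
                phonemes.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched: pass the letter through unchanged.
            phonemes.append(word[i])
            i += 1
    return phonemes


def to_phonemes(word: str) -> list:
    """Dictionary lookup first, rule-based fallback for unknown words."""
    key = word.lower()
    if key in LEXICON:
        return LEXICON[key]
    return rule_based(key)
```

Here `to_phonemes("Speech")` is answered from the lexicon, while `to_phonemes("sheet")` is not in the lexicon and is sounded out by the rules as `["sh", "iy", "t"]`.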
  • 9. • SYLLABLE RULES- A syllable is a cluster of consonants and a vowel. A syllable should contain one vowel and any number of consonants. 1. A single vowel can act as a syllable (i.e. V). 2. V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C……etc. 3. A consonant before the vowel is called the 'Onset', i.e. C*V. 4. A consonant after the vowel is called the 'Coda', i.e. V*C.
  • 10. Syllable Rules- 1. When nasals such as / ’/, half pronounced / / or / / sound succeed a vowel immediately, they would be treated as a part of the vowel and also the same syllable. For example, / ’/ in sa ’sthaa will be a part of the syllable containing /sa/. 2. When there are three or more consonants between two consecutive vowels, the first consonant would be a part of the coda of the previous syllable while the remaining consonants would be the onset of the next syllable.
  • 11. Syllable Rules- 3. When there are exactly two consonants between two vowels, the first consonant would be part of coda of previous syllable and the second would be onset of the next syllable 4. When the second consonant is a member of the set {/r/ /s/ /sh/ /shh/}, both the consonants would be a part of onset of the next syllable
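The consonant-cluster rules above can be sketched in Python as follows. Several things here are assumptions rather than part of the slides: the vowel inventory, the romanized phone labels, the choice to attach a single consonant between two vowels to the onset of the next syllable, and the handling of word-initial and word-final consonants. The nasal rule is left out of this sketch.

```python
# Assumed vowel inventory in a romanized notation (illustration only).
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ai", "o", "au"}

# If the second of exactly two consonants is in this set, both join the next onset.
ONSET_CLUSTER_SECOND = {"r", "s", "sh", "shh"}


def syllabify(phones):
    """Split a list of phone labels into syllables with one vowel each."""
    vowel_idx = [i for i, p in enumerate(phones) if p in VOWELS]
    if not vowel_idx:
        return [list(phones)] if phones else []

    starts = [0]  # index at which each syllable begins
    for prev, nxt in zip(vowel_idx, vowel_idx[1:]):
        cluster = phones[prev + 1:nxt]
        if len(cluster) <= 1:
            coda_len = 0  # assumption: a lone consonant starts the next syllable
        elif len(cluster) == 2 and cluster[1] in ONSET_CLUSTER_SECOND:
            coda_len = 0  # both consonants form the onset of the next syllable
        else:
            coda_len = 1  # first consonant is coda, the rest is onset
        starts.append(prev + 1 + coda_len)

    ends = starts[1:] + [len(phones)]
    return [list(phones[s:e]) for s, e in zip(starts, ends)]
```

With these assumptions, `syllabify(["n", "a", "m", "a", "s", "t", "e"])` gives `[["n", "a"], ["m", "a", "s"], ["t", "e"]]`, and `syllabify(["p", "a", "t", "r", "a"])` gives `[["p", "a"], ["t", "r", "a"]]` because /r/ pulls the preceding consonant into the onset.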
  • 12. HMM synthesis • A relatively new technology is speech synthesis based on hidden Markov models (HMMs), a statistical modelling concept. • It is a statistical method in which the text-to-speech system is based on a model whose parameters are not known beforehand but are estimated and refined through training. • The technique consumes large CPU resources but very little memory. • This approach seems to give better prosody, without glitches, and still produces very natural-sounding, human-like speech.
  • 14. MOTIVATION • There are 1652 languages in India • Building a TTS system for each of them is time-consuming and exhausting. Thus a more generic approach towards system building is required. A common framework is first designed, using which language-specific systems are then built.
  • 15. LITERATURE SURVEY
       1. Paper title: An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
          Aim of the paper: The quality of this system is the improved naturalness in the synthesized speech.
          Advantages: An important advantage of this approach is that it leads to reduced prosody mismatch and spectral discontinuity that occur during syllable concatenation.
          Disadvantages: Large concatenation points. This large concatenation results in a glitch at the output, which makes it hard to eliminate prosody mismatch and spectral discontinuity.
       2. Paper title: Design and Development of a Text-To-Speech Synthesizer for Indian Languages
          Aim of the paper: The design and implementation of a unit selection based text-to-speech synthesizer with syllables and polysyllables as units of concatenation.
          Advantages: Improves synthesis quality and reduces the search space, improving the synthesis timing.
          Disadvantages: It is not clear at the time of writing how spectral interpolation will be performed at the boundaries.
  • 16. LITERATURE SURVEY (continued)
       3. Paper title: Development of Speech Database for Hindi Text-To-Speech System Considering Syllable as a Basic Unit
          Aim of the paper: Convert an orthographic text into intelligible and natural sounding speech.
          Advantages: This technique provides very high quality speech output which is reasonably natural and equivalent to the voice of the original speaker.
          Disadvantages: Pre-processing of the text is required before synthesizing.
       4. Paper title: Text-to-Speech Synthesis using syllable-like units
          Aim of the paper: The design of a syllable based concatenative waveform synthesizer for Indian languages.
          Advantages: The automatic segmentation algorithm has indeed created a useful speech unit that has low target and concatenation costs.
          Disadvantages: Current work uses a single unique syllable-like unit from the repository for synthesis.
  • 17. LITERATURE SURVEY (continued)
       5. Paper title: Statistical parametric speech synthesis
          Aim of the paper: Generating acceptable speech synthesis.
          Advantages: A variety of speaking styles or emotional speech can be synthesized using a small amount of speech data.
          Disadvantages: Quality of synthesized speech; factors which degrade the quality: vocoder, modeling accuracy, and over-smoothing.
       6. Paper title: Unit selection in a concatenative speech synthesis system using a large speech database
          Aim of the paper: The generation of natural-sounding synthesized speech waveforms.
          Advantages: Produces more natural speech.
          Disadvantages: There is little difference in the quality of output using the two training methods.
  • 18. LITERATURE SURVEY (continued)
       7. Paper title: An Unit Selection based Hindi Text To Speech Synthesis System Using Syllable as a Basic Unit
          Aim of the paper: The quality of this system is the improved naturalness in the synthesized speech; it gives very high quality speech output when compared to other synthesizing techniques.
          Advantages: An important advantage of this approach is that it leads to reduced prosody mismatch and spectral discontinuity that occur during syllable concatenation.
          Disadvantages: Large concatenation points. This large concatenation results in a glitch at the output, which makes it hard to eliminate prosody mismatch and spectral discontinuity.
  • 19. LITERATURE SURVEY (continued)
       8. Paper title: A Common Attribute based Unified HTS framework for Speech Synthesis in Indian Languages
          Aim of the paper: High-quality synthetic speech.
          Advantages: Concatenates pre-recorded speech units in the database such that the target and concatenation costs are minimized.
          Disadvantages: To obtain high-quality synthetic speech, the size of the database required is large, to ensure that sufficient examples for each unit in every possible context are available.
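Several of the surveyed systems select units so that the sum of target and concatenation costs is minimized. A common way to do this is a Viterbi-style dynamic programme over the candidate units; the sketch below is a generic illustration under that assumption, not the method of any particular paper in the survey. The unit representation (a syllable label with a pitch value) and both cost functions are hypothetical.

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position, minimizing target plus concatenation cost.

    candidates[i] is a non-empty list of candidate units for position i.
    target_cost(i, unit) and join_cost(prev_unit, unit) return numbers.
    """
    if not candidates:
        return [], 0.0

    # best[i][j] = (cost of cheapest path ending in candidates[i][j], backpointer)
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            costs = [best[i - 1][k][0] + join_cost(p, u)
                     for k, p in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_min] + target_cost(i, u), k_min))
        best.append(row)

    # Backtrack from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Hypothetical example: units are (syllable, pitch in Hz) pairs.
TARGET_PITCH = [100, 110]
CANDIDATES = [[("na", 100), ("na", 130)], [("ma", 105), ("ma", 140)]]


def pitch_target_cost(i, unit):
    return abs(unit[1] - TARGET_PITCH[i])


def pitch_join_cost(prev_unit, unit):
    return abs(prev_unit[1] - unit[1])


PATH, TOTAL = select_units(CANDIDATES, pitch_target_cost, pitch_join_cost)
```

In this toy example the cheapest path is `[("na", 100), ("ma", 105)]` with a total cost of 10, since it matches the target pitch closely and has a small pitch jump at the join.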
  • 20. DATA TABLE
       TABLE I: Degradation MOS (DMOS) and word error rate (WER) scores
       Target language                  Marathi   Bengali   Tamil    Tamil    Telugu   Malayalam
       Source language                  Hindi     Hindi     Tamil    Hindi    Tamil    Tamil
       Hours of target-language data    3         2         3        3        3        3
       DMOS                             2.79      2.50      2.97     2.53     2.63     2.88
       WER                              3.48%     15.06%    6.61%    5.16%    16.14%   3.13%
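The slides do not say how the WER values in the table were obtained. Under the usual definition, WER is the word-level edit distance (substitutions, deletions and insertions) between what was spoken and what listeners transcribed, divided by the number of reference words. A minimal Python sketch of that standard definition:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For instance, comparing the reference "the cat sat on the mat" with the transcription "the cat sit on mat" gives one substitution and one deletion over six reference words, a WER of 2/6.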
  • 21. SYSTEM ARCHITECTURE Fig. 2. Training and synthesis phases of HMM-based speech synthesis
  • 22. MATHEMATICAL MODEL Let I be the set for a language, I = {T, S}, where T is the input text and S is the output sound. $D(I) = \arg\max\, p(o \mid w, \lambda)$, where $\lambda$ represents the model parameters, $o$ represents the speech parameters, and $w$ is the transcription of the test sentence.
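For context, the expression on this slide matches the synthesis step of the usual statistical parametric formulation. The two-step form below is a sketch under that assumption; the symbols $O$ and $W$ (training speech parameters and their transcriptions) and the hats are not defined on the slide, and the maximization over $o$ is assumed.

```latex
\begin{align}
\hat{\lambda} &= \arg\max_{\lambda} \; p(O \mid W, \lambda)
  && \text{(training: estimate the model parameters)} \\
\hat{o} &= \arg\max_{o} \; p(o \mid w, \hat{\lambda})
  && \text{(synthesis: generate speech parameters for the text } w\text{)}
\end{align}
```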
  • 23. Syllable Rules- A syllable is a cluster of consonants and a vowel. A syllable should contain one vowel and any number of consonants. A single vowel can act as a syllable (i.e. V). V, C*V, V*C, C*V*C, C*C*V, C*C*C*V*C*C*C……etc. A consonant before the vowel is called the 'Onset', i.e. (C*V). A consonant after the vowel is called the 'Coda', i.e. (V*C). Output = Pk, where D(I) is the dictionary function and Pk is the phonetics.
  • 24. ALGORITHM • PARAMETER GENERATION ALGORITHM • DELAY BASED SEGMENTATION ALGORITHM
  • 25. ADVANTAGES • For people wanting to learn a new language • For educational institutions looking to enhance student learning, recall and comprehension • For people wanting to learn through multiple mediums to solidify learning • For people with physical disabilities • Difficulty handling a book or paper • Visual Issue (Difficulty seeing text)
  • 26. DISADVANTAGES • Despite large improvements, speech synthesis can still sound a little unnatural. • The approaches to speech synthesis that yield the most natural speech need considerable resources in terms of data storage and processing power. • Pronunciation analysis from written text is also a major problem.
  • 27. APPLICATION • Systems that provide voice synthesis output for blind users are generally referred to as screen readers. • Applications for the Blind • Applications for the Deafened and Vocally Handicapped • Educational Applications
  • 28. CONCLUSION This paper explores a syllable approach to building language-independent text-to-speech systems for Indian languages. The use of a common phone set, a common question set and borrowed context-independent monophone models, along with the syllable approach across languages, makes the procedure easier and less time-consuming without compromising the synthesized speech quality. Systems can be built without even knowing the language. This is especially beneficial in the Indian scenario.