Progress on
Bangla Text-To-Speech System
Presented By:
Dr. M. Shahidur Rahman
Professor, Dept. of Computer Science & Engg.
Shahjalal University of Science & Technology
rahmanms@sust.edu
Outline
• Introduction to TTS
• How TTS works
• Present Bangla TTS systems
• Problems of the present Bangla TTS
• Directions to improve the performance of
Bangla TTS
• Discussion…
2
What is a TTS?
• The goal of text-to-speech (TTS) synthesis is to convert an
arbitrary input text into intelligible and natural sounding
speech
– TTS is not a “cut-and-paste” approach that strings together
isolated words
– Instead, TTS employs linguistic analysis to infer correct
pronunciation and prosody (i.e., NLP) and acoustic
representations of speech to generate waveforms (i.e.,
DSP)
3
TTS Applications
Applications:
• Services for the visually impaired community
• Services for illiterate people and people with reading difficulties
• Enabling the use of computers and IT services
– Reading email aloud
– Using a word processor
– Using the Internet
Well-known TTS systems:
• Festival
• Bell Labs TTS
4
How TTS Works
5
Different TTS Systems
Phoneme-Based TTS System
• Phonemes are:
– The minimal distinctive phonetic units
– Relatively small in number (about 39 phonemes in English)
• Disadvantage
– Phoneme units do not capture the transitions between adjacent sounds
6
Different TTS Systems (cont’d)
Diphone-Based TTS System:
• Diphones are:
– Units spanning two adjacent phonemes
– Able to capture the transition between sounds
– Better sounding when concatenated
– Ex. কক = ক + কঅ + অক + ক
• Disadvantage
– Over 1500 diphones in English
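The decomposition in the example above can be sketched in a few lines of Python. This is only an illustration of the slide's notation, not the code of any system discussed here: the function names are hypothetical, and the input is assumed to be already split into phonemes. The upper bound on inventory size is simple counting: 39 phonemes give at most 39 × 39 = 1521 ordered pairs, which is consistent with the "over 1500" figure.

```python
def to_diphones(phonemes):
    """Split a phoneme list into diphone-style units.

    Follows the slide's notation: a leading unit for the first phoneme,
    one unit per adjacent pair, and a trailing unit for the last phoneme.
    """
    if not phonemes:
        return []
    pairs = [a + b for a, b in zip(phonemes, phonemes[1:])]
    return [phonemes[0]] + pairs + [phonemes[-1]]


def max_diphones(n_phonemes):
    """Upper bound on the diphone inventory: every ordered phoneme pair."""
    return n_phonemes * n_phonemes


# The slide's example: কক is assumed to consist of the phonemes ক, অ, ক.
example = to_diphones(["ক", "অ", "ক"])  # ["ক", "কঅ", "অক", "ক"]
```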
7
Text Pre-Processing
• Convert raw text, which may include numbers, abbreviations,
etc., into the equivalent of written-out words
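A minimal sketch of this step, assuming a tiny hand-written abbreviation table and a digit-by-digit reading rule. Both tables are hypothetical and in English for readability; a Bangla system would need its own rules for numerals, dates, currency and so on.

```python
import re

# Hypothetical lookup tables, for illustration only.
ABBREVIATIONS = {"Dr.": "Doctor", "Dept.": "Department", "Engg.": "Engineering"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def spell_digits(match):
    """Read a run of digits out digit by digit (the simplest possible rule)."""
    return " ".join(DIGITS[int(ch)] for ch in match.group())


def normalize(text):
    """Expand known abbreviations, then spell out runs of ASCII digits."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return re.sub(r"[0-9]+", spell_digits, text)
```

With these tables, normalize("Dr. Rahman has 30 slides") gives "Doctor Rahman has three zero slides". Reading 30 as "thirty" would need a proper number-to-words rule, which is left out of this sketch.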
8
Word to Diphone Converter (Phonetization)
• Purpose
– Translate words into their diphone representations
(Ex. রাজা -> Diphones: {র + রআ + আজ + জআ})
– Mark the text into prosodic units such as phrases, clauses and sentences
• Resource
– Dictionary of words and their diphones
9
Prosody
[Block diagram with the blocks Diphone Retrieval, Acoustic Manipulation and Concatenation, plus Diphone Database and Prosody Param.]
10
Properties of Speech
[Figure: waveform of cat.wav with its periodic and non-periodic regions labeled]
11
Altering Pitch/Duration/Amplitude
• For smooth concatenation, altering pitch,
duration and amplitude at the concatenation
point is very important.
12
Altering Pitch
[Figure: a pitch period is extracted from the original diphone and multiplied by a Hanning window]
Extracted pitch period × Hanning window = Hanned pitch period
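The windowing step on this slide amounts to a sample-by-sample multiplication. The sketch below uses the standard symmetric Hann (Hanning) window formula; the function names are hypothetical, and the segment is assumed to have been cut out around a pitch mark already (pitch-mark detection is not shown).

```python
import math


def hann(n):
    """Symmetric Hann window of length n: 0 at both ends, 1 in the middle."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def window_segment(samples):
    """Multiply an extracted segment by a Hann window of the same length."""
    return [s * w for s, w in zip(samples, hann(len(samples)))]
```

Because the window tapers to zero at both ends, neighbouring windowed segments can later be overlapped and added without clicks at the segment edges.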
13
PSOLA – Pitch Synchronous Overlap and Add
[Figure: Hanned pitch periods combined by 50% overlap and add]
• Overlap > 50%: pitch goes up
• Overlap < 50%: pitch goes down
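The overlap-and-add step can be sketched as follows, assuming equal-length, already windowed segments (a simplification: real pitch periods vary in length). The names overlap_add and hop_for_overlap are hypothetical. A larger overlap places the segments closer together, so the period shortens and the pitch rises; a smaller overlap does the opposite.

```python
def overlap_add(segments, hop):
    """Sum equal-length windowed segments whose starts are `hop` samples apart."""
    if not segments:
        return []
    seg_len = len(segments[0])
    out = [0.0] * (hop * (len(segments) - 1) + seg_len)
    for k, seg in enumerate(segments):
        start = k * hop
        for i, s in enumerate(seg):
            out[start + i] += s
    return out


def hop_for_overlap(seg_len, overlap):
    """Segment spacing for a given overlap fraction (0 <= overlap < 1)."""
    return max(1, round(seg_len * (1.0 - overlap)))
```

For 100-sample segments, a 50% overlap corresponds to a spacing of 50 samples, 60% to 40 samples (higher pitch) and 40% to 60 samples (lower pitch).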
14
Altering Duration
• Increase number of PSOLA iterations
(overlaps) to increase duration
• Decrease number of PSOLA iterations
(overlaps) to decrease duration
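One simple way to realise this, shown here only as a sketch, is to repeat or skip pitch-period segments before they are overlapped and added. The function name and the nearest-frame mapping are assumptions, not taken from the slides.

```python
def resample_periods(segments, factor):
    """Repeat or drop segments so the output has about factor * len(segments) frames.

    factor > 1 lengthens the sound, factor < 1 shortens it.
    """
    if not segments:
        return []
    n_out = max(1, round(len(segments) * factor))
    last = len(segments) - 1
    # Map each output frame back to an input frame.
    return [segments[min(last, int(j / factor))] for j in range(n_out)]
```

Because whole pitch periods are repeated or dropped, the spacing between periods, and therefore the pitch, stays the same while the duration changes.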
15
Altering Amplitude
• Multiply the signal by a constant
– If the constant > 1, the amplitude increases
– If the constant < 1, the amplitude decreases
16
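As a small illustration of amplitude scaling, assuming samples are floats in the range [-1, 1]. The clipping step is an addition of this sketch, not something stated on the slide; it keeps a gain above 1 from pushing samples outside the valid range.

```python
def scale_amplitude(samples, gain, limit=1.0):
    """Multiply every sample by `gain`, clipping the result to [-limit, limit]."""
    return [max(-limit, min(limit, s * gain)) for s in samples]
```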
Concatenation
Diphones → Word
• Use PSOLA at the joining ends
• Ensures a smooth transition
Words → Sentence
• Straight joining at the end points, since pauses are present there
17
Putting All Together
[Block diagram of the TTS system: Text → Pre-processing → (words) → Prosody → Concatenation]
18
Types of Concatenative Speech Synthesis
• Concatenative synthesis with a fixed inventory
– Contains one sample of each unit and performs prosodic modification to match the required prosody
• Unit-selection-based synthesis
– Stores several instances of each unit, improving the chances of finding a well-matched unit
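The idea behind unit selection can be sketched with a toy target cost. The record fields pitch_hz and duration_ms and the cost formula are assumptions made for illustration; practical systems also use a join cost between neighbouring units, which is omitted here.

```python
def select_unit(candidates, target_pitch, target_duration):
    """Return the stored instance whose prosody is closest to the target."""
    def cost(unit):
        # Relative deviation in pitch plus relative deviation in duration.
        return (abs(unit["pitch_hz"] - target_pitch) / target_pitch
                + abs(unit["duration_ms"] - target_duration) / target_duration)
    return min(candidates, key=cost)


# Two hypothetical recordings of the same unit.
instances = [
    {"id": "a", "pitch_hz": 100.0, "duration_ms": 80.0},
    {"id": "b", "pitch_hz": 120.0, "duration_ms": 90.0},
]
best = select_unit(instances, target_pitch=118.0, target_duration=88.0)  # instance "b"
```

With a fixed inventory there is only one candidate per unit, so the whole mismatch has to be corrected by prosodic modification instead.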
19
Progress of Bangla TTS
• KATHA
– Developed at BRAC University
– Unit-based system using the Festival framework
– 4355 diphones
– Takes 2 s to generate a 10 s utterance
• BANGLA VAANI
– Syllable-based synthesis system
– Developed in Kolkata
• SUBACHAN
– Developed at SUST
– Diphone-based synthesis system
– 527 diphones
– Takes 45 ms to generate a 10 s utterance
20
Speech Signal from Katha and Subachan
• (Voice of Katha) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
[Although he was primarily a poet, he also wrote and published a good number of essays and articles]
• (Voice of Subachan) তিনি প্রধানত কবি হলেও বেশ কিছু প্রবন্ধ-নিবন্ধ রচনা ও প্রকাশ করেছেন
• (Voice of Katha) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি
[Jibanananda Das is one of the foremost modern Bengali poets of the twentieth century]
• (Voice of Subachan) জীবনানন্দ দাশ বিংশ শতাব্দীর অন্যতম প্রধান আধুনিক বাংলা কবি
21
Problems: Homograph Ambiguity
• Homographs are words that share the same spelling
but differ in meaning and pronunciation
22
Solution: Homograph Disambiguation
• Collect all possible homograph words
• Determine the POS tag of each homograph word
Ex. ছেলেরা মাঠে বল (bol) খেলছে।
তুমি যাবে কি না বল (bolo)।
• Bayes' theorem can also be applied to estimate the likelihood of each reading of a word.
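A sketch of how Bayes' theorem could pick a reading from a POS tag. The counts are invented for illustration, and the mapping of "bol" to a noun reading and "bolo" to a verb reading is an assumption based on the example above; a real system would estimate the counts from a tagged corpus.

```python
# Hypothetical counts of (pronunciation, POS tag) pairs for one homograph.
COUNTS = {
    ("bol", "NOUN"): 40, ("bol", "VERB"): 2,
    ("bolo", "NOUN"): 1, ("bolo", "VERB"): 30,
}


def posterior(pos_tag, counts=COUNTS):
    """P(pronunciation | POS tag) by Bayes' theorem with add-one smoothing."""
    prons = sorted({p for p, _ in counts})
    tags = sorted({t for _, t in counts})
    total = sum(counts.values())
    scores = {}
    for p in prons:
        n_p = sum(c for (q, _), c in counts.items() if q == p)
        prior = n_p / total                                   # P(pronunciation)
        likelihood = (counts.get((p, pos_tag), 0) + 1) / (n_p + len(tags))  # P(tag | pronunciation)
        scores[p] = prior * likelihood
    z = sum(scores.values())                                  # normaliser, P(tag)
    return {p: s / z for p, s in scores.items()}


def disambiguate(pos_tag):
    """Return the most probable pronunciation for the given POS tag."""
    probs = posterior(pos_tag)
    return max(probs, key=probs.get)
```

With these counts, disambiguate("NOUN") returns "bol" and disambiguate("VERB") returns "bolo".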
23
Problems: Improper Concatenation
[Figure: signal from the utterance of রাশেদ (Rashed), with a region marked "Not concatenated properly"]
24
Solution: Improper Concatenation
• PSOLA
• Reduce the number of concatenation points
– Ex. 1. Sentence -> কামাল ভাল ছেলে।
Diphones -> কা + আমা + আল, ভা + আলো, ছে + এলে
instead of ক + কআ + আম + মআ + আল + ল …
– Ex. 2. ফলা: পৃথিবী -> পৃ + ইথি + ইবী
• Vowel sounds are periodic, and thus suitable for joining
• Use the 1000 most frequently spoken words
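The segmentation in Ex. 1 can be sketched as cutting a word only at vowels, so that every join falls inside a periodic sound. The function name, the romanised phonemes and the vowel set below are assumptions made for illustration; the slides do not give the actual segmentation procedure.

```python
def vowel_boundary_units(phonemes, vowels):
    """Cut a phoneme list into units that meet only at vowels.

    Each interior vowel ends one unit and starts the next, so neighbouring
    units share that vowel and can be joined inside it.
    """
    if len(phonemes) == 1:
        return list(phonemes)
    units, start = [], 0
    for i, ph in enumerate(phonemes):
        if ph in vowels and i > start:
            units.append("".join(phonemes[start:i + 1]))
            start = i
    if start < len(phonemes) - 1:
        units.append("".join(phonemes[start:]))
    return units


# Romanised version of the slide's first word: k-a-m-a-l.
units = vowel_boundary_units(list("kamal"), {"a"})  # ["ka", "ama", "al"]
```

For this word the result has three units and two joins, compared with six units and five joins in the plain diphone decomposition shown on the slide.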
25
Duration Modeling
26
Duration Modeling
27
Thank you all!
Suggestions??
28
Sound Synthesized by Katha
• Katha
29
Sound Synthesized by Subachan
• Subachan
30
