2015©Shinnosuke TAKAMICHI
09/19/2015
Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input
Yuri Nishigaki, Shinnosuke Takamichi, Tomoki Toda,
Graham Neubig, Sakriani Sakti, Satoshi Nakamura (NAIST)
MLSLP2015 in Aizu Univ.

Speech-based creative activities and HMM-based speech synthesis
[Figure: speech-based creative activities. Labels: Singing voice, Speech, Advertisement, Live concert, Narration, Video avatar, Voice actor, Next?]
Useful method: HMM-based speech synthesis [Tokuda et al., 2013]
[Figure: the text "Synthesize!" is converted to synthetic speech parameters and then to speech.]

Manual control of synthetic speech
[Figure: a user controls synthetic speech either through regression with a Multi-Regression HMM [Nose et al., 2007] (example labels: Laugh, Sad) or by manually manipulating HMM parameters.]
They are very useful, but difficult to control as the user wants.

Motivation of this study
‱ Functions we want
  – Original capability of HMM-based TTS
  – Speech-based control
    ‱ Intuitive to control
    ‱ Make synthetic speech mimic input speech prosody
‱ Our work
  – Speech synthesis having both functions
[Figure: the system synthesizes speech from the text "Synthesize." and from speech input. Labels: Synthesize, System, MR-HMM etc.]
Similar to VOCALISTENER for singing voice control.

Overview of the proposed system (Only text is input.)
[Figure: Input text → Text analysis → Parameter generation (synthesis HMM) → Waveform generation → Synthetic speech. This is the original HMM-based speech synthesis.]

Overview of the proposed system (Text & speech are input.)
[Figure: block diagram. Inputs: input text and input speech. Modules: Speech analysis, Text analysis, Duration extraction (alignment HMM and synthesis HMM), Parameter generation (synthesis HMM), F0 modification, Waveform generation. Output: synthetic speech.]
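
Duration extraction module
[Figure: inputs are the features of the input speech and the context of the input text. HMM alignment with the alignment HMM gives the duration of the input speech; duration generation with the synthesis HMM gives the state duration of the synthetic speech, which is passed to parameter generation (Parm. Gen.).]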

Alignment accuracy & duration unit
‱ How to build alignment HMMs suitable for input speech?
  – → Use pre-recorded speech uttered by users
  – Large amounts → user-dependent HMMs
  – Small amounts → HMMs adapted from original alignment HMMs
‱ How to map the input speech duration to synthetic speech?
  – Alignment and synthesis HMM states represent different speech segments.
  – Which is better: HMM-state-, phone-, or mora-level duration unit?
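
As a rough illustration of a phone-level duration unit, the sketch below rescales the state durations predicted by the synthesis HMM so that each phone lasts as long as in the input speech. This is only one plausible mapping, not necessarily the one used in this work. The function names, the frame-count representation, and the largest-remainder rounding are all assumptions made for this example.

```python
def rescale_state_durations(state_frames, target_frames):
    """Split target_frames across states in proportion to state_frames.

    state_frames: state durations (in frames) predicted by the synthesis HMM.
    target_frames: phone duration (in frames) measured on the input speech.
    Note: a state may receive 0 frames when the target is very short.
    """
    total = sum(state_frames)
    if total <= 0:
        raise ValueError("state durations must sum to a positive frame count")
    # Integer arithmetic keeps the proportional split exact.
    scaled = [target_frames * d for d in state_frames]
    result = [s // total for s in scaled]
    leftover = target_frames - sum(result)
    # Give the remaining frames to the states with the largest remainders.
    order = sorted(range(len(state_frames)),
                   key=lambda i: scaled[i] % total, reverse=True)
    for i in order[:leftover]:
        result[i] += 1
    return result


def map_phone_durations(input_phone_frames, synth_state_frames):
    """Apply the phone-level rescaling to every phone of an utterance."""
    if len(input_phone_frames) != len(synth_state_frames):
        raise ValueError("phone sequences differ in length")
    return [rescale_state_durations(states, target)
            for target, states in zip(input_phone_frames, synth_state_frames)]


if __name__ == "__main__":
    # Two phones, five states each (hypothetical numbers).
    print(map_phone_durations([10, 7], [[2, 4, 6, 4, 4], [1, 1, 2, 1, 1]]))
```

A mora-level variant would apply the same rescaling to the states of all phones in a mora at once, while a state-level variant would copy the aligned state durations directly.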
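
Speech parameter generation module
[Figure: inputs are the context of the input text and the state duration from duration extraction (Dur. ext.). Parameter generation with the synthesis HMM outputs the spectrum of synthetic speech and the F0 generated from HMMs, which go to F0 modification (F0 mod.) and waveform generation (Wav. Gen.).]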
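
F0 modification module
[Figure: inputs are the features of the input speech and the F0 generated from HMMs by parameter generation (Parm. gen.). F0 conversion and U/V region modification give the F0 of synthetic speech, which is passed to waveform generation (Wav. Gen.).]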

F0 conversion & unvoiced/voiced modification
[Figure: F0 contours over time for the reference generated from HMMs, the input speech, the F0-converted contour (linear conversion), and the U/V-modified contour (spline interpolation).]
‱ F0 conversion adjusts the F0 range of the input speech to fit the reference.
‱ U/V modification adjusts the U/V regions of the input speech to fit the reference.
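
The two steps can be sketched as follows. This sketch makes several assumptions that the slides do not state: F0 contours are frame-aligned lists in Hz with 0 marking unvoiced frames, the linear conversion matches the mean and standard deviation of log F0, and gaps are filled by linear interpolation. The slides name spline interpolation; linear interpolation is swapped in here only to keep the example dependency-free. All function names are hypothetical.

```python
import math
from bisect import bisect_left
from statistics import mean, pstdev

UNVOICED = 0.0  # assumption: unvoiced frames are stored as F0 = 0


def convert_f0(input_f0, reference_f0):
    """Linear conversion in the log-F0 domain to fit the reference range."""
    src = [math.log(f) for f in input_f0 if f > 0]
    ref = [math.log(f) for f in reference_f0 if f > 0]
    if not src or not ref:
        return list(input_f0)  # nothing voiced to convert or to fit to
    src_mu, ref_mu = mean(src), mean(ref)
    src_sd, ref_sd = pstdev(src), pstdev(ref)
    scale = ref_sd / src_sd if src_sd > 0 else 1.0
    return [
        math.exp((math.log(f) - src_mu) * scale + ref_mu) if f > 0 else UNVOICED
        for f in input_f0
    ]


def _fill_gap(f0, voiced, i):
    """Interpolate frame i linearly from its nearest voiced neighbours."""
    pos = bisect_left(voiced, i)
    left = voiced[pos - 1] if pos > 0 else None
    right = voiced[pos] if pos < len(voiced) else None
    if left is None and right is None:
        return None
    if left is None:
        return f0[right]
    if right is None:
        return f0[left]
    w = (i - left) / (right - left)
    return (1.0 - w) * f0[left] + w * f0[right]


def modify_uv(f0, reference_f0):
    """Make the voiced/unvoiced pattern of f0 follow the reference."""
    if len(f0) != len(reference_f0):
        raise ValueError("contours must have the same number of frames")
    voiced = [i for i, f in enumerate(f0) if f > 0]
    out = []
    for i, ref in enumerate(reference_f0):
        if ref <= 0:
            out.append(UNVOICED)          # V->U: reference is unvoiced here
        elif f0[i] > 0:
            out.append(f0[i])             # already voiced: keep as is
        else:
            filled = _fill_gap(f0, voiced, i)  # U->V: fill the gap
            out.append(ref if filled is None else filled)
    return out


if __name__ == "__main__":
    input_f0 = [100.0, 0.0, 120.0, 130.0]
    reference = [200.0, 210.0, 0.0, 220.0]
    print(modify_uv(convert_f0(input_f0, reference), reference))
```

Frame alignment between the two contours is assumed to come from the duration extraction module, since the synthetic speech is generated with the durations of the input speech.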
EXPERIMENTAL EVALUATION

Experimental Setup
Content: Value/Setting
User: 4 Japanese speakers (2 male & 2 female)
Target speaker: 1 Japanese female speaker
Training data of synthesis HMMs: 450 phoneme-balanced sentences, 16 kHz-sampled, 5 ms shift, reading style
Evaluation data: 53 phoneme-balanced sentences
Speech features: 25-dim. mel-cepstrum, log F0, 5-band aperiodicity
Speech analyzer: STRAIGHT [Kawahara et al., 1999]
Text analyzer: Open JTalk
Acoustic model: 5-state HSMM [Zen et al., 2007]
Evaluations:
1. Duration unit & alignment HMM adaptation
2. Synthesis HMM adaptation
3. Effect of U/V modification

Evaluation 1: duration unit & alignment HMM adaptation
‱ 3 duration units
  – State-, phoneme-, and mora-level duration
‱ 4 HMMs using different amounts of pre-recorded speech
  – 0 utterances: target-speaker-dependent HMMs (= synthesis HMM)
  – 1 utterance: HMMs adapted using 1 utterance uttered by the user
  – 56 utterances: HMMs adapted using 56 utterances
  – 450 utterances: user-dependent HMMs
‱ Evaluation
  – MOS test on naturalness of synthetic speech
  – DMOS test on prosody mimicking ability of synthetic speech
    ‱ Input speech is presented as reference.
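
For readers unfamiliar with MOS and DMOS tests, the sketch below shows one common way to summarize listener ratings on a 1 to 5 scale: the mean score with a normal-approximation 95% confidence interval. The slides do not say how scores were aggregated or how significance was tested, so the function name, the ratings, and the choice of interval are all assumptions.

```python
import math
from statistics import mean, stdev


def mos_with_ci(scores, z=1.96):
    """Return (mean opinion score, half-width of an approximate 95% CI)."""
    if len(scores) < 2:
        raise ValueError("need at least two ratings")
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return mean(scores), half_width


if __name__ == "__main__":
    ratings = [4, 5, 3, 4, 4]  # hypothetical listener ratings, not from the study
    score, half_width = mos_with_ci(ratings)
    print(f"MOS = {score:.2f} +/- {half_width:.2f}")
```

Two systems whose intervals overlap heavily would typically be reported as showing no significant difference, although a proper statistical test is preferable to comparing intervals by eye.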

Result 1: duration unit & alignment HMM adaptation
[Figure: MOS on naturalness and DMOS on prosody mimicking ability (scale 1 to 5) for state-, phone-, and mora-level duration, with alignment HMMs built from 0, 1, 56, and 450 utterances. Both panels carry the annotation "No significant diff."]
We can confirm that (1) adaptation is effective, and (2) phoneme-level duration is relatively robust.

Experiment 2: Effectiveness of U/V modification in naturalness
[Figure: two panels for Spkr1 to Spkr4. Preference score on naturalness [%] (0 to 100), without vs. with modification. U/V modification ratio [%] (0 to 20), U->V vs. V->U modification.]
U/V modification can improve naturalness, especially when many U frames of the input speech are fixed.

Conclusion
‱ 2 functions to control synthetic speech
  – An original function of HMM-based TTS
    ‱ MR-HMM or manual control
  – Speech-based control
    ‱ Intuitive for users
‱ 2 main modules of our system
  – Mimic duration.
    ‱ Copy duration of input speech to synthetic speech.
  – Mimic F0 patterns.
    ‱ Copy dynamic F0 pattern of input speech to synthetic speech.
‱ Future work
  – HMM selection using text & speech

Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input

  • 1. Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input. Yuri Nishigaki, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura (NAIST). MLSLP2015 at Aizu Univ., 09/19/2015. ©2015 Shinnosuke TAKAMICHI.
  • 2. Speech-based creative activities and HMM-based speech synthesis
      [Figure labels: singing voice, speech; advertisement, live concert, narration, video avatar, voice actor, "Next?"]
      ▪ Useful method: HMM-based speech synthesis [Tokuda et al., 2013], which turns text ("Synthesize!") into synthetic speech parameters and then into speech.
  • 3. Manual control of synthetic speech
      – Multi-Regression HMM [Nose et al., 2007] (figure labels: Laugh, Sad, Regression, User)
      – Manually manipulating HMM parameters (figure label: User)
      ▪ They are very useful, but difficult to control as the user wants.
  • 4. Motivation of this study
      ▪ Functions we want
          – Original capability of HMM-based TTS (MR-HMM etc.)
          – Speech-based control
              • Intuitive to control
              • Make synthetic speech mimic input speech prosody
      ▪ Our work
          – Speech synthesis having both functions
      Note: similar to VOCALISTENER for singing voice control.
  • 5. Overview of the proposed system (only text is input)
      [Diagram: input text → text analysis → parameter generation (synthesis HMM) → waveform generation → synthetic speech. This path is the original HMM-based speech synthesis.]
  • 6. Overview of the proposed system (text and speech are input)
      [Diagram blocks: input text, input speech; speech analysis, text analysis; duration extraction (alignment HMM); parameter generation (synthesis HMM); F0 modification; waveform generation; synthetic speech]
  • 7. Duration extraction module
      [Diagram blocks: feature of input speech, context of input text; HMM alignment (alignment HMM); duration of input speech; duration generation (synthesis HMM); state duration of synthetic speech, passed on to parameter generation]
  • 8. Alignment accuracy & duration unit
      ▪ How to build alignment HMMs suitable for input speech?
          – → The use of pre-recorded speech uttered by users
          – Large amounts → user-dependent HMMs
          – Small amounts → HMMs adapted from original alignment HMMs
      ▪ How to map the input speech duration to synthetic speech?
          – Alignment/synthesis HMM-states represent different speech segments.
          – Which is better, HMM-state, phone, or mora-level duration unit?
  • 9. Speech parameter generation module
      [Diagram blocks: context of input text; state duration (from duration extraction); parameter generation (synthesis HMM); spectrum of synthetic speech (to waveform generation); F0 generated from HMMs (to F0 modification)]
  • 10. F0 modification module
      [Diagram blocks: feature of input speech; F0 generated from HMMs (from parameter generation); F0 conversion; U/V region modification; F0 of synthetic speech (to waveform generation)]
  • 11. F0 conversion & unvoiced/voiced modification
      [Figure: F0 over time for four contours: reference generated from HMMs, input speech, F0-converted, U/V-modified. Annotations: linear conversion, spline interpolation.]
      ▪ F0 conversion adjusts the F0 range of the input speech to fit the reference.
      ▪ U/V modification adjusts the U/V regions of the input speech to fit the reference.
  • 13. Experimental Setup
      Content: Value/Setting
      – User: 4 Japanese speakers (2 male & 2 female)
      – Target speaker: 1 Japanese female speaker
      – Training data of synthesis HMMs: 450 phoneme-balanced sentences, 16 kHz-sampled, 5 ms shift, reading style
      – Evaluation data: 53 phoneme-balanced sentences
      – Speech features: 25-dim. mel-cepstrum, log F0, 5-band aperiodicity
      – Speech analyzer: STRAIGHT [Kawahara et al., 1999]
      – Text analyzer: Open-jtalk
      – Acoustic model: 5-state HSMM [Zen et al., 2007]
      Evaluations:
      ▪ 1. duration unit & alignment HMM adaptation
      ▪ 2. synthesis HMM adaptation
      ▪ 3. effect of U/V modification
  • 14. Evaluation 1: duration unit & alignment HMM adaptation
      ▪ 3 duration units
          – State / phoneme / mora-level duration
      ▪ 4 HMMs using different amounts of pre-recorded speech
          – 0: target-speaker-dependent HMMs (= synthesis HMM)
          – 1: HMMs adapted using 1 utterance uttered by the user
          – 56: HMMs adapted using 56 utterances
          – 450: user-dependent HMMs
      ▪ Evaluation
          – MOS test on naturalness of synthetic speech
          – DMOS test on prosody mimicking ability of synthetic speech
              • Input speech is presented as reference.
  • 15. Result 1: duration unit & alignment HMM adaptation
      [Figure: MOS on naturalness and DMOS on prosody mimicking ability (scale 1 to 5) for state, phone, and mora duration units, with 0, 1, 56, and 450 utterances; two comparisons are marked "No significant diff."]
      We can confirm (1) adaptation is effective, and (2) phoneme-level duration is relatively robust.
  • 16. Experiment 2: Effectiveness of U/V modification in naturalness
      [Figure: preference score on naturalness [%] (0 to 100), without vs. with modification, for Spkr1 to Spkr4; U/V modification ratio [%] (0 to 20), U->V vs. V->U modification, for Spkr1 to Spkr4]
      U/V modification can improve the naturalness, especially when many U frames of input speech are fixed.
  • 17. Conclusion
      ▪ 2 functions to control synthetic speech
          – An original function of HMM-based TTS
              • MR-HMM or manual control
          – Speech-based control
              • Intuitive for users
      ▪ 2 main modules of our system
          – Mimic duration.
              • Copy duration of input speech to synthetic speech.
          – Mimic F0 patterns.
              • Copy dynamic F0 pattern of input speech to synthetic speech.
      ▪ Future work
          – HMM selection using text & speech
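
The phone-level duration unit discussed on slides 8 and 14 can be illustrated with a short sketch. The slides do not say how a phone's duration is distributed over the synthesis HMM states, so the proportional rule below is an assumption, and the function names `rescale_state_durations` and `copy_phone_durations` are hypothetical. Durations are counted in frames (5 ms each in the experimental setup).

```python
def rescale_state_durations(pred_state_durs, target_frames):
    """Rescale one phone's predicted state durations to sum to target_frames.

    Assumption (not from the slides): every state keeps at least one frame,
    and the remaining frames are shared in proportion to the durations the
    synthesis HMM predicted, using largest-remainder rounding.
    """
    n = len(pred_state_durs)
    if target_frames < n:
        raise ValueError("need at least one frame per state")
    extra = target_frames - n                      # frames left after 1 per state
    total = sum(pred_state_durs)
    shares = [d * extra / total for d in pred_state_durs]
    durs = [1 + int(s) for s in shares]            # floor of each share
    leftover = target_frames - sum(durs)
    # Hand the leftover frames to the states with the largest fractional part.
    by_fraction = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                         reverse=True)
    for i in by_fraction[:leftover]:
        durs[i] += 1
    return durs


def copy_phone_durations(aligned, predicted):
    """Phone-level duration copy.

    aligned:   per-phone state durations of the input speech (alignment HMM).
    predicted: per-phone state durations from the synthesis HMM.
    Only the phone totals of the input speech are kept; the split across
    states follows the synthesis HMM.
    """
    return [rescale_state_durations(pred, sum(ali))
            for ali, pred in zip(aligned, predicted)]
```

With this rule, state-level differences between the alignment and synthesis HMMs do not leak into the output; only each phone's total length comes from the input speech.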
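
The F0 modification described on slides 10 and 11 can be sketched as follows. Several details are assumptions rather than the authors' method: the linear conversion is done as mean and variance matching in the log-F0 domain, unvoiced frames are encoded as 0, both contours already have the same number of frames, and linear interpolation stands in for the spline interpolation named on the slide to keep the example dependency-free. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def convert_f0_range(f0_in, f0_ref):
    """Linearly map voiced log-F0 of the input onto the reference range.

    Assumption: "linear conversion" means matching the mean and standard
    deviation of log-F0 over voiced frames. Unvoiced frames (0) stay 0.
    """
    log_in = [math.log(f) for f in f0_in if f > 0]
    log_ref = [math.log(f) for f in f0_ref if f > 0]
    if not log_in or not log_ref:
        raise ValueError("both contours need at least one voiced frame")
    mu_in, sd_in = mean(log_in), pstdev(log_in)
    mu_ref, sd_ref = mean(log_ref), pstdev(log_ref)
    scale = sd_ref / sd_in if sd_in > 0 else 1.0
    return [math.exp((math.log(f) - mu_in) * scale + mu_ref) if f > 0 else 0.0
            for f in f0_in]


def _interp(t, pts):
    """Piecewise-linear interpolation over (frame, log-F0) points, held at the ends."""
    if t <= pts[0][0]:
        return pts[0][1]
    if t >= pts[-1][0]:
        return pts[-1][1]
    for (t0, y0), (t1, y1) in zip(pts, pts[1:]):
        if t0 <= t <= t1:
            return y0 + (y1 - y0) * (t - t0) / (t1 - t0)


def modify_uv(f0_conv, f0_ref):
    """Make the voiced/unvoiced pattern of f0_conv follow the reference."""
    voiced = [(t, math.log(f)) for t, f in enumerate(f0_conv) if f > 0]
    if not voiced:
        raise ValueError("converted contour has no voiced frame")
    out = []
    for t, ref in enumerate(f0_ref):
        if ref <= 0:
            out.append(0.0)                  # V->U: reference is unvoiced
        elif f0_conv[t] > 0:
            out.append(f0_conv[t])           # voiced in both: keep input shape
        else:
            out.append(math.exp(_interp(t, voiced)))  # U->V: fill the gap
    return out


# Usage: input speech has gaps where the HMM-generated reference is voiced.
f0_input = [0.0, 100.0, 0.0, 200.0, 0.0]
f0_reference = [0.0, 220.0, 230.0, 240.0, 250.0]
f0_out = modify_uv(convert_f0_range(f0_input, f0_reference), f0_reference)
```

The output keeps the dynamic shape of the input contour while taking its range and voicing decisions from the reference.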