©Yuki Saito, 07/03/2017
TRAINING ALGORITHM TO DECEIVE
ANTI-SPOOFING VERIFICATION
FOR DNN-BASED SPEECH SYNTHESIS
Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
(The University of Tokyo)
ICASSP 2017 SP-L4.2
Outline of This Talk
 Issue: quality degradation in statistical parametric speech
synthesis due to over-smoothing of the speech params.
 Countermeasures: reproducing natural statistics
– 2nd moment (a.k.a. Global Variance: GV) [Toda et al., 2007]
– Histogram [Ohtani et al., 2012]
 Proposed: a training algorithm to deceive an Anti-Spoofing
Verification (ASV) for DNN-based speech synthesis
– Trains acoustic models to deceive an ASV that distinguishes natural from synthetic speech.
– Compensates the distribution difference between natural and synthetic speech.
 Results:
– Improves the synthetic speech quality.
– Works robustly with respect to its hyper-parameter setting.
Conventional Training Algorithm:
Minimum Generation Error (MGE) Training [Wu et al., 2016]

[Figure: linguistic feats. are fed to the DNN acoustic models, which output static-dynamic mean vectors for frames $t = 1, \ldots, T$; ML-based parameter generation yields the generated speech params. $\hat{\boldsymbol{c}}$, which are compared with the natural speech params. $\boldsymbol{c}$ via the generation error $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$.]

$$L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}}) = \frac{1}{T} (\hat{\boldsymbol{c}} - \boldsymbol{c})^{\top} (\hat{\boldsymbol{c}} - \boldsymbol{c}) \rightarrow \text{Minimize}$$
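The MGE criterion can be sketched numerically. The following is a minimal illustration (not the authors' code), assuming the natural and generated parameter sequences are given as T x D arrays:

```python
import numpy as np

def mge_loss(c, c_hat):
    """Minimum generation error over one utterance.

    c, c_hat: (T, D) arrays of natural / generated speech parameters.
    Implements L_G(c, c_hat) = (1/T) * (c_hat - c)^T (c_hat - c),
    i.e., the squared error summed over frames and dimensions,
    normalized by the number of frames T.
    """
    T = c.shape[0]
    diff = c_hat - c
    return float(np.sum(diff * diff) / T)

# Identical sequences yield zero generation error.
c = np.ones((10, 3))
assert mge_loss(c, c) == 0.0
```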
Issue of MGE Training:
Over-smoothing of Generated Speech Parameters

[Figure: scatter plots of the 21st vs. 23rd mel-cepstral coefficients for natural and MGE-generated speech; the MGE distribution is narrow.]

These distributions are significantly different...
(GV [Toda et al., 2007] explicitly compensates the 2nd moment.)
Proposed algorithm:
Training Algorithm to Deceive
Anti-Spoofing Verification (ASV)
Anti-Spoofing Verification (ASV):
Discriminator to Prevent Spoofing Attacks w/ Speech
[Wu et al., 2016] [Chen et al., 2015]

[Figure: natural speech params. $\boldsymbol{c}$ and generated speech params. $\hat{\boldsymbol{c}}$ pass through a feature function $\boldsymbol{\phi}(\cdot)$ (here, $\boldsymbol{\phi}(\boldsymbol{c}_t) = \boldsymbol{c}_t$) into the ASV $D(\cdot)$, which outputs 1 for natural and 0 for generated speech.]

The ASV is trained by minimizing the cross entropy $L_{\mathrm{D}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$:

$$L_{\mathrm{D}}(\boldsymbol{c}, \hat{\boldsymbol{c}}) = -\frac{1}{T} \sum_{t=1}^{T} \log D(\boldsymbol{c}_t) - \frac{1}{T} \sum_{t=1}^{T} \log\left(1 - D(\hat{\boldsymbol{c}}_t)\right) \rightarrow \text{Minimize}$$

where the first term $L_{\mathrm{D},1}(\boldsymbol{c})$ is the loss to recognize natural speech as natural, and the second term $L_{\mathrm{D},0}(\hat{\boldsymbol{c}})$ is the loss to recognize generated speech as generated.
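Given per-frame ASV outputs, the cross entropy can be sketched as follows. This is a minimal illustration (not the authors' code), assuming the ASV scores have already been computed as arrays of probabilities:

```python
import numpy as np

def asv_cross_entropy(d_natural, d_generated):
    """Cross entropy L_D(c, c_hat) for training the ASV.

    d_natural:   (T,) ASV outputs D(c_t) on natural frames.
    d_generated: (T,) ASV outputs D(c_hat_t) on generated frames.
    """
    # L_D,1: loss to recognize natural speech as natural (label 1)
    l_d1 = -np.mean(np.log(d_natural))
    # L_D,0: loss to recognize generated speech as generated (label 0)
    l_d0 = -np.mean(np.log(1.0 - d_generated))
    return float(l_d1 + l_d0)

# A confident, correct ASV incurs a lower loss than an unsure one.
confident = asv_cross_entropy(np.full(5, 0.9), np.full(5, 0.1))
unsure = asv_cross_entropy(np.full(5, 0.6), np.full(5, 0.4))
assert confident < unsure
```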
Training Algorithm to Deceive ASV

[Figure: linguistic feats. are fed to the acoustic models, which output static-dynamic mean vectors; ML-based parameter generation yields the generated speech params. $\hat{\boldsymbol{c}}$, which are evaluated both against the natural speech params. $\boldsymbol{c}$ via $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and through the feature function $\boldsymbol{\phi}(\cdot)$ by the ASV $D(\cdot)$ via $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$ (label 1: natural).]

$$L(\boldsymbol{c}, \hat{\boldsymbol{c}}) = L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}}) + \omega_{\mathrm{D}} \frac{E_{L_{\mathrm{G}}}}{E_{L_{\mathrm{D}}}} L_{\mathrm{D},1}(\hat{\boldsymbol{c}}) \rightarrow \text{Minimize}$$

where $\omega_{\mathrm{D}}$ is a weight, and $E_{L_{\mathrm{G}}}$, $E_{L_{\mathrm{D}}}$ are the expectation values of $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$, respectively. $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$ is the loss to recognize generated speech as natural.
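Under the same assumptions as the sketches above (precomputed ASV scores, T x D parameter arrays), the combined loss might be written as follows; the scale-balancing ratio follows the slide's definition, and the code is illustrative only:

```python
import numpy as np

def deceive_asv_loss(c, c_hat, d_generated, omega_d, e_lg, e_ld):
    """Combined loss L = L_G + omega_d * (E_LG / E_LD) * L_D,1(c_hat).

    c, c_hat:    (T, D) natural / generated speech parameters.
    d_generated: (T,) ASV outputs D(c_hat_t) on generated frames.
    omega_d:     weight of the adversarial term.
    e_lg, e_ld:  expectation values that balance the two loss scales.
    """
    T = c.shape[0]
    diff = c_hat - c
    l_g = np.sum(diff * diff) / T            # generation error (MGE term)
    l_d1 = -np.mean(np.log(d_generated))     # recognize generated as natural
    return float(l_g + omega_d * (e_lg / e_ld) * l_d1)
```

With `omega_d = 0` the loss reduces to plain MGE training, which matches the subjective evaluation's use of $\omega_{\mathrm{D}} = 0.0$ as the MGE baseline.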
Iterative Optimization of Acoustic Models and ASV
 ① Update the acoustic models (ASV fixed): minimize $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$ (label 1: natural).
 ② Update the ASV (acoustic models fixed): minimize $L_{\mathrm{D}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ (labels 1: natural, 0: generated).
By iterating ① and ②, we construct the final acoustic models!

[Figure: both steps share the same pipeline, natural params. $\boldsymbol{c}$ / ML-based parameter generation → generated params. $\hat{\boldsymbol{c}}$ → feature function $\boldsymbol{\phi}(\cdot)$ → ASV; the component held fixed differs between the two steps.]
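The alternation of steps ① and ② can be sketched as a training loop. The `generate`/`update` interfaces below are hypothetical stand-ins for the DNN components, not the authors' implementation:

```python
class StubModel:
    """Minimal stand-in so the loop below runs; the real components
    are DNN acoustic models and a DNN-based ASV."""
    def __init__(self):
        self.updates = 0

    def generate(self, linguistic_feats):
        return linguistic_feats          # placeholder "generation"

    def update(self, *args):
        self.updates += 1                # placeholder gradient step

def train(acoustic_model, asv, dataset, n_iterations):
    """Alternate step 1 (update acoustic models, ASV fixed) and
    step 2 (update ASV, acoustic models fixed)."""
    for _ in range(n_iterations):
        for c, linguistic_feats in dataset:
            c_hat = acoustic_model.generate(linguistic_feats)
            acoustic_model.update(c, c_hat, asv)  # step 1: deceive the fixed ASV
            asv.update(c, c_hat)                  # step 2: re-train the ASV
    return acoustic_model
```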
Discussions of Proposed Algorithm
 Compensation of speech feats. through the feature function:
– Automatically derived feats. such as auto-encoded feats.
– Conventional analytically derived feats. such as the GV
 Loss function for training the acoustic models:
– Combination of MGE and adversarial training [Goodfellow et al., 2014]
 The effect of the adversarial training:
– Minimizes the Jensen-Shannon divergence between the distributions of
the natural data / generated data.
Distributions of Speech Parameters

[Figure: scatter plots of the 21st vs. 23rd mel-cepstral coefficients for natural, MGE, and proposed; the MGE distribution is narrow, while the proposed one is as wide as that of natural speech.]

Our algorithm alleviates the over-smoothing effect!
Compensation of Global Variance
 Global Variance (GV): [Toda et al., 2007]
– 2nd moment of the parameter distribution

[Figure: GV of each feature index (0-24) on a log scale ($10^{-4}$ to $10^{1}$) for natural, MGE, and proposed.]

The GV is NOT used for training, but is compensated by the ASV!
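For reference, the GV of an utterance is the per-dimension variance of its parameter trajectory over time; a minimal sketch:

```python
import numpy as np

def global_variance(c):
    """Global variance (GV): per-dimension variance of the speech
    parameters over the T frames of one utterance.

    c: (T, D) array of speech parameters; returns a (D,) vector.
    """
    return np.var(c, axis=0)
```

An over-smoothed trajectory varies less over time, so its GV is smaller than that of natural speech, which is what the narrow MGE band in the figure shows.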
Additional Effect:
Alleviation of Unnaturally Strong Correlation
 Maximal Information Coefficient (MIC): [Reshef et al., 2011]
– A value that quantifies the nonlinear correlation between two variables
– Natural speech params. tend to have weak correlations [Ijima et al., 2016]

[Figure: MIC matrices among mel-cepstral coefficients (indices 0-24; scale 0.0 = weak to 1.0 = strong) for natural, MGE, and proposed; MGE shows unnaturally strong correlations, while the proposed results are close to natural.]

The proposed algorithm not only compensates the GV,
but also makes the correlations among speech params. natural!
Experimental Evaluations
Experimental Conditions

Dataset: ATR Japanese speech database (phonetically balanced 503 sentences)
Training / evaluation data: 450 sentences / 53 sentences (16 kHz sampling)
Linguistic feats.: 274-dimensional vector (phoneme, accent type, frame position, etc.)
Speech params.: mel-cepstral coefficients (0th through 24th), $F_0$, 5-band aperiodicity
Predicted params.: mel-cepstral coefficients (the others were NOT predicted)
Optimization algorithm: AdaGrad [Duchi et al., 2011] (learning rate: 0.01)
Acoustic models: Feed-Forward, 274 – 3x400 (ReLU) – 75 (linear)
ASV: Feed-Forward, 25 – 2x200 (ReLU) – 1 (sigmoid)
Initialization, Training, and Objective Evaluation
 Initialization:
– Acoustic models: conventional MGE training
– ASV: trained to distinguish natural / generated speech after the MGE training
 Training:
– Acoustic models: updated with the proposed algorithm
– ASV: trained to distinguish natural / generated speech after updating the acoustic models
 Objective evaluation:
– Generation loss $L_{\mathrm{G}}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and spoofing rate:

$$\text{Spoofing rate} = \frac{\#\ \text{of spoofing synthetic speech params.}}{\text{Total}\ \#\ \text{of synthetic speech params.}}$$

We calculated these values with various $\omega_{\mathrm{D}}$.
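Assuming the spoofing decision is made by thresholding the ASV output at 0.5 (an illustrative assumption, not stated on the slide), the spoofing rate could be computed as:

```python
import numpy as np

def spoofing_rate(d_generated, threshold=0.5):
    """Fraction of synthetic speech params. that the ASV accepts as
    natural, i.e., that 'spoof' it.

    d_generated: ASV outputs D(.) on synthetic speech params.
    threshold:   decision boundary (0.5 here is an assumption).
    """
    d_generated = np.asarray(d_generated)
    return float(np.mean(d_generated > threshold))
```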
Results of Objective Evaluations

[Figure: generation loss and spoofing rate plotted against the weight $\omega_{\mathrm{D}}$ (0.0-1.0). The generation loss got worse as $\omega_{\mathrm{D}}$ increased; the spoofing rate got better, exceeding 99% when $\omega_{\mathrm{D}} > 0.3$.]

Our algorithm makes the generation loss worse,
but can train the acoustic models to deceive the ASV!
Results of Subjective Evaluations
in Terms of Speech Quality

[Figure: preference scores (w/ 8 listeners) for MGE ($\omega_{\mathrm{D}} = 0.0$), Proposed ($\omega_{\mathrm{D}} = 0.3$), and Proposed ($\omega_{\mathrm{D}} = 1.0$); both proposed settings got better scores, with NO significant difference between $\omega_{\mathrm{D}} = 0.3$ and $\omega_{\mathrm{D}} = 1.0$. Error bars denote 95% confidence intervals.]

Our algorithm improves the synthetic speech quality
and works robustly with respect to its hyper-parameter setting!

Speech samples: http://sython.org/demo/icassp2017advtts/demo.html
Conclusion
 Purpose:
– Improving the speech quality of statistical parametric speech synthesis
 Proposed:
– A training algorithm to deceive an ASV
• Compensates the difference between the distributions of natural /
generated speech params. using adversarial training
 Results:
– Improved the speech quality compared to conventional training
– Worked robustly with respect to its hyper-parameter setting
 Future work:
– Devising temporally and linguistically dependent ASVs
– Extending our algorithm to generate $F_0$ and duration
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaKayode Fayemi
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxNikitaBankoti2
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardsticksaastr
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar TrainingKylaCullinane
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024eCommerce Institute
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubssamaasim06
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Vipesco
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxmohammadalnahdi22
 

Recently uploaded (20)

Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animals
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptx
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 

Saito2017icassp

  • 7. /17 Training Algorithm to Deceive ASV
 Combined loss: L(c, ĉ) = L_G(c, ĉ) + ω_D (E_{L_G} / E_{L_D}) L_{D,1}(ĉ) → Minimize
– L_{D,1}(ĉ) = −(1/T) Σ_t log D(ĉ_t): loss to recognize generated speech as natural (label 1)
– ω_D: weight; E_{L_G}, E_{L_D}: expectation values of L_G(c, ĉ) and L_{D,1}(ĉ)
 Pipeline: linguistic feats. → acoustic models (static-dynamic mean vectors) → ML-based parameter generation → generated speech params. ĉ → feature function φ(·) → ASV D(·)
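The combined loss on this slide can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names are ours, `c` and `c_hat` are (frames × dims) matrices of natural and generated parameters, and `d_probs` are the ASV's per-frame posteriors D(ĉ_t).

```python
import numpy as np

def generation_loss(c, c_hat):
    # L_G(c, c_hat) = (1/T) (c_hat - c)^T (c_hat - c), summed over all T frames
    T = c.shape[0]
    diff = (c_hat - c).ravel()
    return float(diff @ diff) / T

def deception_loss(d_probs):
    # L_{D,1}(c_hat) = -(1/T) sum_t log D(c_hat_t):
    # cross entropy with generated frames labeled "1: natural"
    return float(-np.mean(np.log(d_probs)))

def combined_loss(c, c_hat, d_probs, omega_d, e_lg, e_ld):
    # L = L_G + omega_d * (E_{L_G} / E_{L_D}) * L_{D,1};
    # the expectation ratio normalizes the scales of the two terms
    return generation_loss(c, c_hat) + omega_d * (e_lg / e_ld) * deception_loss(d_probs)
```

With ω_D = 0 this reduces to plain MGE training; larger ω_D weights the deception term more heavily.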
  • 8. /17 Iterative Optimization of Acoustic Models and ASV
 ① Update the acoustic models (ASV fixed): minimize L_G(c, ĉ) + weighted L_{D,1}(ĉ), i.e., drive the generated ĉ toward the label "1: natural"
 ② Update the ASV (acoustic models fixed): minimize L_D(c, ĉ), i.e., relearn to label natural c as "1" and generated ĉ as "0"
 By iterating ① and ②, we construct the final acoustic models!
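The two-step iteration can be sketched as a plain alternating loop. This is an illustrative skeleton only: the two callables stand in for gradient updates on the losses above, which the paper performs with AdaGrad.

```python
def alternate_training(update_acoustic, update_asv, n_iterations):
    """Alternate step 1 (update acoustic models, ASV frozen) and
    step 2 (update ASV, acoustic models frozen).
    Returns the order in which updates ran, for inspection."""
    history = []
    for _ in range(n_iterations):
        update_acoustic()   # step 1: minimize L_G + weighted L_{D,1}
        history.append("acoustic")
        update_asv()        # step 2: minimize L_D on natural vs. regenerated params
        history.append("asv")
    return history
```

The same alternating structure appears in adversarial training generally: each side is optimized against a temporarily frozen opponent.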
  • 9. /17 Discussions of Proposed Algorithm
 Compensation of speech feats. through the feature function φ(·):
– Automatically derived feats. such as auto-encoded feats.
– Conventional analytically derived feats. such as GV
 Loss function for training the acoustic models:
– Combination of MGE training and adversarial training [Goodfellow et al., 2014.]
 Effect of the adversarial training:
– Minimizes the Jensen-Shannon divergence betw. the distributions of the natural / generated data.
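The Jensen-Shannon statement follows the standard adversarial-training analysis [Goodfellow et al., 2014.]; restated in our notation as background (p_nat and p_gen denote the natural and generated parameter distributions), for a fixed generator the optimal discriminator and the resulting generator objective are:

```latex
D^{*}(\boldsymbol{c})
  = \frac{p_{\mathrm{nat}}(\boldsymbol{c})}
         {p_{\mathrm{nat}}(\boldsymbol{c}) + p_{\mathrm{gen}}(\boldsymbol{c})},
\qquad
C(G) = \max_{D} V(D, G)
     = -\log 4 + 2\,\mathrm{JSD}\!\left(p_{\mathrm{nat}} \,\middle\|\, p_{\mathrm{gen}}\right),
```

so C(G) is minimized iff p_gen = p_nat, which is why deceiving the ASV pulls the generated distribution toward the natural one.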
  • 10. /17 Distributions of Speech Parameters
 [Scatter plots of the 21st vs. 23rd mel-cepstral coefficients: the MGE distribution is narrow, while the Proposed distribution is wide, as in natural speech]
 Our algorithm alleviates the over-smoothing effect!
  • 11. /17 Compensation of Global Variance
 Global Variance (GV) [Toda et al., 2007.]: the 2nd moment of the parameter distribution
 [Plot of the GV per feature index (0–24, log scale 10^-4 to 10^1): Proposed closely tracks Natural, while MGE falls far below]
 GV is NOT used for training, but is compensated by the ASV!
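The GV being compared can be computed directly. A minimal sketch, with `params` assumed to be a (frames × dims) matrix of mel-cepstral coefficients:

```python
import numpy as np

def global_variance(params):
    # GV: per-dimension variance of the trajectory over all frames,
    # v(d) = (1/T) * sum_t (c_t(d) - mean(d))^2
    return np.var(params, axis=0)
```

Comparing `global_variance` of natural and generated sequences per dimension reproduces the kind of curve shown on this slide.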
  • 12. /17 Additional Effect: Alleviation of Unnaturally Strong Correlation
 Maximal Information Coefficient (MIC) [Reshef et al., 2011.]:
– Quantifies a nonlinear correlation betw. two variables
– Natural speech params. tend to have weak correlation [Ijima et al., 2016.]
 [MIC matrices over feature indices 0–24 (scale 0.0 weak to 1.0 strong): MGE shows unnaturally strong correlations; Proposed is close to Natural]
 The proposed algorithm not only compensates the GV, but also makes the correlations among speech params. natural!
  • 14. /17 Experimental Conditions
 Dataset: ATR Japanese speech database (phonetically balanced 503 sentences)
 Train / evaluation data: 450 sentences / 53 sentences (16 kHz sampling)
 Linguistic feats.: 274-dimensional vector (phoneme, accent type, frame position, etc.)
 Speech params.: mel-cepstral coefficients (0th through 24th), F0, 5-band aperiodicity
 Predicted params.: mel-cepstral coefficients (the others were NOT predicted)
 Optimization algorithm: AdaGrad [Duchi et al., 2011.] (learning rate: 0.01)
 Acoustic models: Feed-Forward, 274 – 3x400 (ReLU) – 75 (linear)
 ASV: Feed-Forward, 25 – 2x200 (ReLU) – 1 (sigmoid)
  • 15. /17 Initialization, Training, and Objective Evaluation
 Initialization:
– Acoustic models: conventional MGE training
– ASV: trained to distinguish natural / generated speech after the MGE training
 Training:
– Acoustic models: updated with the proposed algorithm
– ASV: retrained to distinguish natural / generated speech after updating the acoustic models
 Objective evaluation: generation loss L_G(c, ĉ) and spoofing rate
– Spoofing rate = (# of the spoofing synthetic speech params.) / (total # of the synthetic speech params.)
– We calculated these values w/ various ω_D.
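The spoofing rate above can be computed as follows. A sketch under our assumptions: the ASV emits one posterior per synthetic parameter sequence, and "spoofing" means the posterior exceeds a 0.5 decision threshold.

```python
def spoofing_rate(asv_posteriors, threshold=0.5):
    # Fraction of synthetic sequences the ASV misclassifies as natural
    accepted = sum(1 for p in asv_posteriors if p > threshold)
    return accepted / len(asv_posteriors)
```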
  • 16. /17 Results of Objective Evaluations
 [Plots of generation loss and spoofing rate against the weight ω_D (0.0 to 1.0): the generation loss got worse as ω_D increased, while the spoofing rate got better, exceeding 99% when ω_D > 0.3]
 Our algorithm makes the generation loss worse but can train the acoustic models to deceive the ASV!
  • 17. /17 Results of Subjective Evaluations in Terms of Speech Quality
 Preference scores on speech quality (w/ 8 listeners; error bars denote 95% confidence intervals):
– Proposed (ω_D = 1.0 and ω_D = 0.3) got better scores than MGE (ω_D = 0.0)
– NO significant difference betw. ω_D = 1.0 and ω_D = 0.3
 Our algorithm improves the synthetic speech quality and works comparably robustly against its hyper-parameter setting!
 Speech samples: http://sython.org/demo/icassp2017advtts/demo.html
  • 18. /17 Conclusion
 Purpose:
– Improving the speech quality of statistical parametric speech synthesis
 Proposed:
– Training algorithm to deceive an ASV
– Compensates the difference betw. distributions of natural / generated speech params. using adversarial training
 Results:
– Improved the speech quality compared to conventional training
– Worked comparably robustly against its hyper-parameter setting
 Future work:
– Devising temporally and linguistically dependent ASVs
– Extending our algorithm to generate F0 and duration