©Yuki Saito, 07/03/2017
TRAINING ALGORITHM TO DECEIVE
ANTI-SPOOFING VERIFICATION
FOR DNN-BASED SPEECH SYNTHESIS
Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari
(The University of Tokyo)
ICASSP 2017 SP-L4.2
Outline of This Talk
■ Issue: quality degradation in statistical parametric speech synthesis due to over-smoothing of the speech params.
■ Countermeasures: reproducing natural statistics
  – 2nd moment (a.k.a. Global Variance: GV) [Toda et al., 2007.]
  – Histogram [Ohtani et al., 2012.]
■ Proposed: training algorithm to deceive an Anti-Spoofing Verification (ASV) for DNN-based speech synthesis
  – Tries to deceive the ASV, which distinguishes natural / synthetic speech.
  – Compensates the distribution difference betw. natural / synthetic speech.
■ Results:
  – Improves the synthetic speech quality.
  – Works robustly across hyper-parameter settings.
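Conventional Training Algorithm:
Minimum Generation Error (MGE) Training [Wu et al., 2016.]
$$L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}}) = \frac{1}{T} (\hat{\boldsymbol{c}} - \boldsymbol{c})^\top (\hat{\boldsymbol{c}} - \boldsymbol{c}) \rightarrow \text{Minimize}$$
[Figure: linguistic feats. for frames $t = 1, \ldots, T$ → acoustic models → static-dynamic mean vectors → ML-based parameter generation → generated speech params. $\hat{\boldsymbol{c}}$. The generation error $L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ is computed between $\hat{\boldsymbol{c}}$ and the natural speech params. $\boldsymbol{c}$.]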
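The generation error of MGE training can be sketched in NumPy as follows. This is a minimal illustration, assuming the natural and generated params. are stored as (T, D) arrays with one row per frame; the name `generation_loss` and the toy values are hypothetical, and the ML-based parameter generation step is not reproduced here.

```python
import numpy as np

def generation_loss(c_nat, c_gen):
    """L_G(c, c_hat) = (1/T) (c_hat - c)^T (c_hat - c).

    c_nat, c_gen: arrays of shape (T, D), one row of speech params. per frame.
    """
    T = c_nat.shape[0]
    diff = (c_gen - c_nat).reshape(-1)  # stack all frames into one long vector
    return float(diff @ diff) / T

# Toy example (hypothetical values): T = 2 frames, D = 2 dimensions.
c_nat = np.array([[1.0, 2.0], [3.0, 4.0]])
c_gen = np.array([[1.0, 1.0], [2.0, 4.0]])
loss = generation_loss(c_nat, c_gen)  # squared error 2.0 divided by T = 2
```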
Issue of MGE Training:
Over-smoothing of Generated Speech Parameters
[Figure: scatter plots of speech params. Axes: 21st mel-cepstral coefficient, 23rd mel-cepstral coefficient. Panels: Natural, MGE (narrow).]
These distributions are significantly different...
(GV [Toda et al., 2007.] explicitly compensates the 2nd moment.)
Proposed algorithm:
Training Algorithm to Deceive
Anti-Spoofing Verification (ASV)
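Anti-Spoofing Verification (ASV):
Discriminator to Prevent Spoofing Attacks w/ Speech
[Wu et al., 2016.] [Chen et al., 2015.]
$$L_\mathrm{D}(\boldsymbol{c}, \hat{\boldsymbol{c}}) = L_{\mathrm{D},1}(\boldsymbol{c}) + L_{\mathrm{D},0}(\hat{\boldsymbol{c}}) = -\frac{1}{T} \sum_{t=1}^{T} \log D(\boldsymbol{c}_t) - \frac{1}{T} \sum_{t=1}^{T} \log\left(1 - D(\hat{\boldsymbol{c}}_t)\right) \rightarrow \text{Minimize}$$
– $L_{\mathrm{D},1}(\boldsymbol{c})$: loss to recognize natural speech as natural
– $L_{\mathrm{D},0}(\hat{\boldsymbol{c}})$: loss to recognize generated speech as generated
[Figure: natural speech params. $\boldsymbol{c}$ or generated speech params. $\hat{\boldsymbol{c}}$ → feature function $\boldsymbol{\phi}(\cdot)$ → ASV $D(\cdot)$ → cross entropy $L_\mathrm{D}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ (1: natural, 0: generated). Here, $\boldsymbol{\phi}(\boldsymbol{c}_t) = \boldsymbol{c}_t$.]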
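The ASV cross entropy can be sketched as below. This is an illustration only: it assumes the ASV outputs, for every frame, a posterior in (0, 1) of the frame being natural. The function name `asv_loss` and the small constant `eps` (added to avoid log(0)) are assumptions, not part of the slides.

```python
import numpy as np

def asv_loss(d_nat, d_gen, eps=1e-12):
    """Cross entropy L_D = L_{D,1}(c) + L_{D,0}(c_hat).

    d_nat: D(c_t) for natural frames; d_gen: D(c_hat_t) for generated frames.
    Both hold posteriors of being natural. eps is an assumption for stability.
    """
    d_nat = np.asarray(d_nat, dtype=float)
    d_gen = np.asarray(d_gen, dtype=float)
    loss_nat = -np.mean(np.log(d_nat + eps))        # L_{D,1}(c): natural as natural
    loss_gen = -np.mean(np.log(1.0 - d_gen + eps))  # L_{D,0}(c_hat): generated as generated
    return float(loss_nat + loss_gen)

# An ASV that cannot tell the classes apart outputs 0.5 everywhere.
chance_loss = asv_loss([0.5, 0.5], [0.5, 0.5])
# A better ASV scores natural frames high and generated frames low.
good_loss = asv_loss([0.9, 0.8], [0.1, 0.2])
```

A chance-level ASV gives a loss of about 2 log 2, and the loss shrinks as the ASV separates the two classes.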
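Training Algorithm to Deceive ASV
$$L(\boldsymbol{c}, \hat{\boldsymbol{c}}) = L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}}) + \omega_\mathrm{D} \frac{E_{L_\mathrm{G}}}{E_{L_\mathrm{D}}} L_{\mathrm{D},1}(\hat{\boldsymbol{c}}) \rightarrow \text{Minimize}$$
– $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$: loss to recognize generated speech as natural
– $\omega_\mathrm{D}$: weight; $E_{L_\mathrm{G}}$, $E_{L_\mathrm{D}}$: expectation values of $L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}})$, $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$
[Figure: linguistic feats. → acoustic models → static-dynamic mean vectors → ML-based parameter generation → generated speech params. $\hat{\boldsymbol{c}}$. $L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ is computed against the natural speech params. $\boldsymbol{c}$; $\hat{\boldsymbol{c}}$ also passes through the feature function $\boldsymbol{\phi}(\cdot)$ and the ASV $D(\cdot)$ to give $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$ (1: natural).]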
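How the two terms are combined can be sketched as follows. The slides do not say how the expectation values are estimated, so here they are simply passed in as numbers (e.g. averages over the training data would be one option). The names `deceive_loss`, `proposed_loss`, and `eps` are hypothetical.

```python
import numpy as np

def deceive_loss(d_gen, eps=1e-12):
    """L_{D,1}(c_hat): loss for the ASV to recognize generated speech as natural."""
    d_gen = np.asarray(d_gen, dtype=float)
    return float(-np.mean(np.log(d_gen + eps)))

def proposed_loss(l_g, l_d1, omega_d, e_lg, e_ld):
    """L = L_G + omega_D * (E_{L_G} / E_{L_D}) * L_{D,1}(c_hat).

    e_lg, e_ld: expectation values of L_G and L_{D,1}; the ratio rescales the
    ASV term to the same order of magnitude as the generation error.
    """
    return l_g + omega_d * (e_lg / e_ld) * l_d1

# Toy numbers (assumptions): generation error 0.5, ASV at chance level.
l_g = 0.5
l_d1 = deceive_loss([0.5, 0.5])               # about log 2
total = proposed_loss(l_g, l_d1, omega_d=1.0, e_lg=0.5, e_ld=np.log(2.0))
mge_only = proposed_loss(l_g, l_d1, omega_d=0.0, e_lg=0.5, e_ld=np.log(2.0))
```

With $\omega_\mathrm{D} = 0$ the loss reduces to plain MGE training; with $\omega_\mathrm{D} = 1$ and these toy numbers both terms contribute equally.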
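Iterative Optimization of Acoustic Models and ASV
■ ① Update the acoustic models (ASV fixed): minimize $L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$ (1: natural)
■ ② Update the ASV (acoustic models fixed): minimize $L_\mathrm{D}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ (1: natural, 0: generated)
[Figure: in both steps, the natural params. $\boldsymbol{c}$ or the generated params. $\hat{\boldsymbol{c}}$ (output of the ML-based parameter generation) pass through the feature function $\boldsymbol{\phi}(\cdot)$ into the ASV.]
By iterating ① and ②, we construct the final acoustic models!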
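The alternation of ① and ② can be sketched on a toy problem. Everything here is an assumption made for illustration: the "acoustic model" is a 1-D linear map, the ASV is a logistic regression on the hypothetical feature function [c, c^2], the gradients are written by hand, plain gradient descent replaces AdaGrad, and the scale factor $E_{L_\mathrm{G}} / E_{L_\mathrm{D}}$ is folded into `omega_d`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def phi(c):
    # Toy feature function (assumption): [c, c^2], so a linear ASV can see variance.
    return np.stack([c, c ** 2], axis=1)

def asv_posterior(c, w, b):
    return sigmoid(phi(c) @ w + b)

def asv_loss(c_nat, c_gen, w, b, eps=1e-12):
    return float(-np.mean(np.log(asv_posterior(c_nat, w, b) + eps))
                 - np.mean(np.log(1.0 - asv_posterior(c_gen, w, b) + eps)))

def update_acoustic_model(theta, x, c_nat, w, b, omega_d, lr):
    """Step 1: the ASV (w, b) is fixed; descend L_G + omega_D * L_{D,1}(c_hat)."""
    a, bias = theta
    c_gen = a * x + bias
    T = len(x)
    d = asv_posterior(c_gen, w, b)
    grad_lg = 2.0 * (c_gen - c_nat) / T                      # dL_G / dc_hat
    grad_ld1 = -(1.0 - d) * (w[0] + 2.0 * w[1] * c_gen) / T  # dL_{D,1} / dc_hat
    g = grad_lg + omega_d * grad_ld1
    return np.array([a - lr * np.sum(g * x), bias - lr * np.sum(g)])

def update_asv(w, b, c_nat, c_gen, lr):
    """Step 2: the acoustic model is fixed; descend the cross entropy L_D."""
    err_nat = asv_posterior(c_nat, w, b) - 1.0  # dL/dz for label 1 (natural)
    err_gen = asv_posterior(c_gen, w, b)        # dL/dz for label 0 (generated)
    grad_w = phi(c_nat).T @ err_nat / len(c_nat) + phi(c_gen).T @ err_gen / len(c_gen)
    grad_b = np.mean(err_nat) + np.mean(err_gen)
    return w - lr * grad_w, b - lr * grad_b

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)                # stand-in for linguistic feats.
c_nat = x + 0.5 * rng.standard_normal(1000)  # stand-in for natural speech params.
theta = np.array([1.0, 0.0])                 # stand-in for an MGE-trained model
w, b = np.zeros(2), 0.0

# Initialize the ASV on the output of the initial model (one step, for brevity).
loss_before = asv_loss(c_nat, theta[0] * x + theta[1], w, b)
w, b = update_asv(w, b, c_nat, theta[0] * x + theta[1], lr=0.05)
loss_after = asv_loss(c_nat, theta[0] * x + theta[1], w, b)

for _ in range(100):                         # iterate steps 1 and 2
    theta = update_acoustic_model(theta, x, c_nat, w, b, omega_d=0.3, lr=0.05)
    w, b = update_asv(w, b, c_nat, theta[0] * x + theta[1], lr=0.05)
```

The point of the sketch is the control flow: each step treats the other network's parameters as constants, and the generated params. are recomputed before every ASV update.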
Discussions of Proposed Algorithm
■ Compensation of speech feats. through the feature function:
  – Automatically derived feats. such as auto-encoded feats.
  – Conventional analytically derived feats. such as GV
■ Loss function for training the acoustic models:
  – Combination of MGE and adversarial training [Goodfellow et al., 2014.]
■ The effect of the adversarial training:
  – Minimizes the Jensen-Shannon divergence betw. the distributions of the natural data / generated data.
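The Jensen-Shannon statement follows the standard result of [Goodfellow et al., 2014.]. As a sketch in this talk's notation, assume an ideal ASV and take $L_\mathrm{D}$ as an expectation rather than a frame average. The symbols $p_\mathrm{nat}$ and $p_\mathrm{gen}$ are introduced here for the densities of natural and generated speech params.; they do not appear in the slides.

```latex
D^{*}(\boldsymbol{c}) = \frac{p_\mathrm{nat}(\boldsymbol{c})}{p_\mathrm{nat}(\boldsymbol{c}) + p_\mathrm{gen}(\boldsymbol{c})},
\qquad
\min_{D} L_\mathrm{D} = L_\mathrm{D}\big|_{D = D^{*}} = \log 4 - 2\,\mathrm{JSD}\left(p_\mathrm{nat} \,\|\, p_\mathrm{gen}\right)
```

Under these assumptions, acoustic models that make even the best ASV fail (large $\min_D L_\mathrm{D}$) are those with a small Jensen-Shannon divergence. The term $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$ used for the acoustic models is the commonly used non-saturating variant, so the correspondence should be read as motivation rather than an exact identity.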
Distributions of Speech Parameters
[Figure: scatter plots of speech params. Axes: 21st mel-cepstral coefficient, 23rd mel-cepstral coefficient. Panels: Natural, MGE (narrow), Proposed (as wide as natural speech).]
Our algorithm alleviates the over-smoothing effect!
Compensation of Global Variance
■ Global Variance (GV): [Toda et al., 2007.]
  – 2nd moment of the parameter distribution
[Figure: global variance (log scale, 10^-4 to 10^1) vs. feature index (0 to 20). Curves: Natural, MGE, Proposed.]
GV is NOT used for training, but compensated by the ASV!
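For reference, the GV of one utterance can be computed as the per-dimension variance over time. This is a minimal sketch, assuming the params. are stored as a (T, D) array; the function name `global_variance` and the toy trajectory are hypothetical.

```python
import numpy as np

def global_variance(params):
    """Per-dimension variance over the T frames of one utterance.

    params: array of shape (T, D). Returns an array of shape (D,).
    """
    return np.var(params, axis=0)

# Toy trajectory (hypothetical values): T = 3 frames, D = 2 dimensions.
c = np.array([[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]])
gv = global_variance(c)

# An over-smoothed trajectory (here: shrunk toward zero) has a smaller GV.
gv_smoothed = global_variance(0.5 * c)
```

Halving the dynamic range of a trajectory divides its GV by four, which is the kind of shrinkage the MGE curve in the figure shows relative to natural speech.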
Additional Effect:
Alleviation of Unnaturally Strong Correlation
■ Maximal Information Coefficient (MIC): [Reshef et al., 2011.]
  – Value that quantifies a nonlinear correlation b/w two variables
  – Natural speech params. tend to have weak correlation [Ijima et al., 2016.]
[Figure: MIC matrices over feature indices 0 to 24, color scale from 0.0 (weak) to 1.0 (strong). Panels: Natural, MGE, Proposed.]
The proposed algorithm not only compensates the GV,
but also makes the correlations among speech params. natural!
Experimental Evaluations
Experimental Conditions
Dataset                | ATR Japanese speech database (503 phonetically balanced sentences)
Train / evaluate data  | 450 sentences / 53 sentences (16 kHz sampling)
Linguistic feats.      | 274-dimensional vector (phoneme, accent type, frame position, etc.)
Speech params.         | Mel-cepstral coefficients (0th through 24th), F0, 5-band aperiodicity
Prediction params.     | Mel-cepstral coefficients (the others were NOT predicted)
Optimization algorithm | AdaGrad [Duchi et al., 2011.] (learning rate: 0.01)
Acoustic models        | Feed-Forward 274 – 3x400 (ReLU) – 75 (linear)
ASV                    | Feed-Forward 25 – 2x200 (ReLU) – 1 (sigmoid)
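The two network shapes in the table can be sketched with plain NumPy. This only illustrates the layer sizes and activations: the weights are random, the helper names are hypothetical, and taking the first 25 output dimensions as the static params. is a stand-in for the ML-based parameter generation (the 75 outputs are presumably 25 mel-cepstral coefficients with their dynamic feats.).

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random (weight, bias) pairs for a feed-forward net with the given layer sizes."""
    return [(0.01 * rng.standard_normal((m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x, out="linear"):
    for W, b in layers[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # ReLU hidden layers
    W, b = layers[-1]
    z = x @ W + b
    return 1.0 / (1.0 + np.exp(-z)) if out == "sigmoid" else z

rng = np.random.default_rng(0)
acoustic = init_mlp([274, 400, 400, 400, 75], rng)  # 274 - 3x400 (ReLU) - 75 (linear)
asv = init_mlp([25, 200, 200, 1], rng)              # 25 - 2x200 (ReLU) - 1 (sigmoid)

ling = rng.standard_normal((10, 274))               # 10 frames of dummy linguistic feats.
stat_dyn = forward(acoustic, ling)                  # static-dynamic outputs, shape (10, 75)
static = stat_dyn[:, :25]                           # stand-in for parameter generation
posterior = forward(asv, static, out="sigmoid")     # shape (10, 1), values in (0, 1)
```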
Initialization, Training, and Objective Evaluation
■ Initialization:
  – Acoustic models: conventional MGE training
  – ASV: trained to distinguish natural / generated speech after the MGE training
■ Training:
  – Acoustic models: updated with the proposed algorithm
  – ASV: trained to distinguish natural / generated speech after updating the acoustic models
■ Objective evaluation:
  – Generation loss $L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and spoofing rate
$$\text{Spoofing rate} = \frac{\text{\# of the spoofing synthetic speech params.}}{\text{Total \# of the synthetic speech params.}}$$
We calculated these values w/ various $\omega_\mathrm{D}$.
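The spoofing rate can be sketched as the fraction of generated params. that the ASV accepts as natural. The decision threshold of 0.5 and the function name `spoofing_rate` are assumptions; the slides do not state how a "spoofing" param. is decided.

```python
import numpy as np

def spoofing_rate(d_gen, threshold=0.5):
    """Fraction of generated speech params. whose ASV posterior exceeds the threshold.

    d_gen: ASV posteriors D(c_hat_t) of being natural, one per generated param.
    threshold: assumed decision boundary between "natural" and "generated".
    """
    d_gen = np.asarray(d_gen, dtype=float)
    return float(np.mean(d_gen > threshold))

# Toy posteriors (hypothetical): two of four generated frames fool the ASV.
rate = spoofing_rate([0.9, 0.2, 0.7, 0.4])
```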
Results of Objective Evaluations
[Figure: two plots against the weight $\omega_\mathrm{D}$ (0.0 to 1.0). Generation loss: got worse. Spoofing rate: got better; when $\omega_\mathrm{D} > 0.3$, spoofing rate > 99%.]
Our algorithm makes the generation loss worse
but can train the acoustic models to deceive the ASV!
Results of Subjective Evaluations
in Terms of Speech Quality
[Figure: preference scores (0.0 to 1.0, w/ 8 listeners) for MGE ($\omega_\mathrm{D} = 0.0$), Proposed ($\omega_\mathrm{D} = 0.3$), and Proposed ($\omega_\mathrm{D} = 1.0$). Annotations: got better (Proposed vs. MGE); NO significant difference ($\omega_\mathrm{D} = 0.3$ vs. $1.0$). Error bars denote 95% confidence intervals.]
Our algorithm improves the synthetic speech quality
and works robustly across hyper-parameter settings!
Speech samples: http://sython.org/demo/icassp2017advtts/demo.html
Conclusion
■ Purpose:
  – Improving the speech quality of statistical parametric speech synthesis
■ Proposed:
  – Training algorithm to deceive an ASV
    • Compensates the difference b/w distributions of natural / generated speech params. using adversarial training
■ Results:
  – Improved the speech quality compared to conventional training
  – Worked robustly across hyper-parameter settings
■ Future work:
  – Devising temporal- and linguistic-dependent ASV
  – Extending our algorithm to generate F0 and duration
17

More Related Content

What's hot

One Class SVMを用いた異常値検知
One Class SVMを用いた異常値検知One Class SVMを用いた異常値検知
One Class SVMを用いた異常値検知Yuto Mori
 
敵対的学習に対するラデマッハ複雑度
敵対的学習に対するラデマッハ複雑度敵対的学習に対するラデマッハ複雑度
敵対的学習に対するラデマッハ複雑度Masa Kato
 
統計的音声合成変換と近年の発展
統計的音声合成変換と近年の発展統計的音声合成変換と近年の発展
統計的音声合成変換と近年の発展Shinnosuke Takamichi
 
Reinforcement Learning @ NeurIPS2018
Reinforcement Learning @ NeurIPS2018Reinforcement Learning @ NeurIPS2018
Reinforcement Learning @ NeurIPS2018佑 甲野
 
End-to-End音声認識ためのMulti-Head Decoderネットワーク
End-to-End音声認識ためのMulti-Head DecoderネットワークEnd-to-End音声認識ためのMulti-Head Decoderネットワーク
End-to-End音声認識ためのMulti-Head DecoderネットワークNU_I_TODALAB
 
ウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタToshihisa Tanaka
 
Maximum Entropy IRL(最大エントロピー逆強化学習)とその発展系について
Maximum Entropy IRL(最大エントロピー逆強化学習)とその発展系についてMaximum Entropy IRL(最大エントロピー逆強化学習)とその発展系について
Maximum Entropy IRL(最大エントロピー逆強化学習)とその発展系についてYusuke Nakata
 
今さら聞けないカーネル法とサポートベクターマシン
今さら聞けないカーネル法とサポートベクターマシン今さら聞けないカーネル法とサポートベクターマシン
今さら聞けないカーネル法とサポートベクターマシンShinya Shimizu
 
敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)cvpaper. challenge
 
ICML 2020 最適輸送まとめ
ICML 2020 最適輸送まとめICML 2020 最適輸送まとめ
ICML 2020 最適輸送まとめohken
 
[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANs[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANsDeep Learning JP
 
[DL輪読会]Set Transformer: A Framework for Attention-based Permutation-Invariant...
[DL輪読会]Set Transformer: A Framework for Attention-based Permutation-Invariant...[DL輪読会]Set Transformer: A Framework for Attention-based Permutation-Invariant...
[DL輪読会]Set Transformer: A Framework for Attention-based Permutation-Invariant...Deep Learning JP
 
全体セミナー20170629
全体セミナー20170629全体セミナー20170629
全体セミナー20170629Jiro Nishitoba
 
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)Takuma Yagi
 
SSII2019OS: 深層学習にかかる時間を短くしてみませんか? ~分散学習の勧め~
SSII2019OS: 深層学習にかかる時間を短くしてみませんか? ~分散学習の勧め~SSII2019OS: 深層学習にかかる時間を短くしてみませんか? ~分散学習の勧め~
SSII2019OS: 深層学習にかかる時間を短くしてみませんか? ~分散学習の勧め~SSII
 
[DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient Descent
 [DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient Descent [DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient Descent
[DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient DescentDeep Learning JP
 
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリングmlm_kansai
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)Masahiro Suzuki
 

What's hot (20)

One Class SVMを用いた異常値検知
One Class SVMを用いた異常値検知One Class SVMを用いた異常値検知
One Class SVMを用いた異常値検知
 
敵対的学習に対するラデマッハ複雑度
敵対的学習に対するラデマッハ複雑度敵対的学習に対するラデマッハ複雑度
敵対的学習に対するラデマッハ複雑度
 
統計的音声合成変換と近年の発展
統計的音声合成変換と近年の発展統計的音声合成変換と近年の発展
統計的音声合成変換と近年の発展
 
Reinforcement Learning @ NeurIPS2018
Reinforcement Learning @ NeurIPS2018Reinforcement Learning @ NeurIPS2018
Reinforcement Learning @ NeurIPS2018
 
Hessian free
Hessian freeHessian free
Hessian free
 
End-to-End音声認識ためのMulti-Head Decoderネットワーク
End-to-End音声認識ためのMulti-Head DecoderネットワークEnd-to-End音声認識ためのMulti-Head Decoderネットワーク
End-to-End音声認識ためのMulti-Head Decoderネットワーク
 
ウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタ
 
Maximum Entropy IRL(最大エントロピー逆強化学習)とその発展系について
Maximum Entropy IRL(最大エントロピー逆強化学習)とその発展系についてMaximum Entropy IRL(最大エントロピー逆強化学習)とその発展系について
Maximum Entropy IRL(最大エントロピー逆強化学習)とその発展系について
 
今さら聞けないカーネル法とサポートベクターマシン
今さら聞けないカーネル法とサポートベクターマシン今さら聞けないカーネル法とサポートベクターマシン
今さら聞けないカーネル法とサポートベクターマシン
 
敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)敵対的生成ネットワーク(GAN)
敵対的生成ネットワーク(GAN)
 
ICML 2020 最適輸送まとめ
ICML 2020 最適輸送まとめICML 2020 最適輸送まとめ
ICML 2020 最適輸送まとめ
 
[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANs[DL輪読会]逆強化学習とGANs
[DL輪読会]逆強化学習とGANs
 
[DL輪読会]Set Transformer: A Framework for Attention-based Permutation-Invariant...
[DL輪読会]Set Transformer: A Framework for Attention-based Permutation-Invariant...[DL輪読会]Set Transformer: A Framework for Attention-based Permutation-Invariant...
[DL輪読会]Set Transformer: A Framework for Attention-based Permutation-Invariant...
 
全体セミナー20170629
全体セミナー20170629全体セミナー20170629
全体セミナー20170629
 
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)
 
SSII2019OS: 深層学習にかかる時間を短くしてみませんか? ~分散学習の勧め~
SSII2019OS: 深層学習にかかる時間を短くしてみませんか? ~分散学習の勧め~SSII2019OS: 深層学習にかかる時間を短くしてみませんか? ~分散学習の勧め~
SSII2019OS: 深層学習にかかる時間を短くしてみませんか? ~分散学習の勧め~
 
ILRMA 20170227 danwakai
ILRMA 20170227 danwakaiILRMA 20170227 danwakai
ILRMA 20170227 danwakai
 
[DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient Descent
 [DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient Descent [DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient Descent
[DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient Descent
 
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング
 
GAN(と強化学習との関係)
GAN(と強化学習との関係)GAN(と強化学習との関係)
GAN(と強化学習との関係)
 

Viewers also liked

miyoshi2017asj
miyoshi2017asjmiyoshi2017asj
miyoshi2017asjYuki Saito
 
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"Shinnosuke Takamichi
 
Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input
Prosody-Controllable HMM-Based Speech Synthesis Using Speech InputProsody-Controllable HMM-Based Speech Synthesis Using Speech Input
Prosody-Controllable HMM-Based Speech Synthesis Using Speech InputShinnosuke Takamichi
 
音声の声質を変換する技術とその応用
音声の声質を変換する技術とその応用音声の声質を変換する技術とその応用
音声の声質を変換する技術とその応用NU_I_TODALAB
 
日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”Shinnosuke Takamichi
 
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]Shinnosuke Takamichi
 
DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相Takuya Yoshioka
 
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応Shinnosuke Takamichi
 
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”Shinnosuke Takamichi
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Universitat Politècnica de Catalunya
 
ICASSP2017読み会 (acoustic modeling and adaptation)
ICASSP2017読み会 (acoustic modeling and adaptation)ICASSP2017読み会 (acoustic modeling and adaptation)
ICASSP2017読み会 (acoustic modeling and adaptation)Shinnosuke Takamichi
 
Ph.D defence (Shinnosuke Takamichi)
Ph.D defence (Shinnosuke Takamichi)Ph.D defence (Shinnosuke Takamichi)
Ph.D defence (Shinnosuke Takamichi)Shinnosuke Takamichi
 
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習Shinnosuke Takamichi
 
MIRU2016 チュートリアル
MIRU2016 チュートリアルMIRU2016 チュートリアル
MIRU2016 チュートリアルShunsuke Ono
 
信号処理・画像処理における凸最適化
信号処理・画像処理における凸最適化信号処理・画像処理における凸最適化
信号処理・画像処理における凸最適化Shunsuke Ono
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Shinnosuke Takamichi
 
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)Daichi Kitamura
 
ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例Yahoo!デベロッパーネットワーク
 

Viewers also liked (18)

miyoshi2017asj
miyoshi2017asjmiyoshi2017asj
miyoshi2017asj
 
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
日本音響学会2017秋 ビギナーズセミナー "深層学習を深く学習するための基礎"
 
Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input
Prosody-Controllable HMM-Based Speech Synthesis Using Speech InputProsody-Controllable HMM-Based Speech Synthesis Using Speech Input
Prosody-Controllable HMM-Based Speech Synthesis Using Speech Input
 
音声の声質を変換する技術とその応用
音声の声質を変換する技術とその応用音声の声質を変換する技術とその応用
音声の声質を変換する技術とその応用
 
日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
日本音響学会2017秋 ”Moment-matching networkに基づく一期一会音声合成における発話間変動の評価”
 
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
ICASSP2017読み会 (Deep Learning III) [電通大 中鹿先生]
 
DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相DNN音響モデルにおける特徴量抽出の諸相
DNN音響モデルにおける特徴量抽出の諸相
 
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
GMMに基づく固有声変換のための変調スペクトル制約付きトラジェクトリ学習・適応
 
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
日本音響学会2017秋 ”クラウドソーシングを利用した対訳方言音声コーパスの構築”
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
 
ICASSP2017読み会 (acoustic modeling and adaptation)
ICASSP2017読み会 (acoustic modeling and adaptation)ICASSP2017読み会 (acoustic modeling and adaptation)
ICASSP2017読み会 (acoustic modeling and adaptation)
 
Ph.D defence (Shinnosuke Takamichi)
Ph.D defence (Shinnosuke Takamichi)Ph.D defence (Shinnosuke Takamichi)
Ph.D defence (Shinnosuke Takamichi)
 
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
雑音環境下音声を用いた音声合成のための雑音生成モデルの敵対的学習
 
MIRU2016 チュートリアル
MIRU2016 チュートリアルMIRU2016 チュートリアル
MIRU2016 チュートリアル
 
信号処理・画像処理における凸最適化
信号処理・画像処理における凸最適化信号処理・画像処理における凸最適化
信号処理・画像処理における凸最適化
 
Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討Moment matching networkを用いた音声パラメータのランダム生成の検討
Moment matching networkを用いた音声パラメータのランダム生成の検討
 
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
ICASSP2017読み会(関東編)・AASP_L3(北村担当分)
 
ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例ヤフー音声認識サービスでのディープラーニングとGPU利用事例
ヤフー音声認識サービスでのディープラーニングとGPU利用事例
 

Similar to Saito2017icassp

nakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdfnakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdfYuki Saito
 
Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Priyanka Reddy
 
silent sound technology pdf
silent sound technology pdfsilent sound technology pdf
silent sound technology pdfrahul mishra
 
Towards an Optimal Speaker Modeling in Speaker Verification Systems using Per...
Towards an Optimal Speaker Modeling in Speaker Verification Systems using Per...Towards an Optimal Speaker Modeling in Speaker Verification Systems using Per...
Towards an Optimal Speaker Modeling in Speaker Verification Systems using Per...IJECEIAES
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksSDL
 
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...tsysglobalsolutions
 
Tutorial rpo
Tutorial rpoTutorial rpo
Tutorial rpomosi2005
 
2021 04-04-google nmt
2021 04-04-google nmt2021 04-04-google nmt
2021 04-04-google nmtJAEMINJEONG5
 
[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition
[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition
[slide] Attentive Modality Hopping Mechanism for Speech Emotion RecognitionSeoul National University
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.pptbutest
 
Phonetic distance based accent
Phonetic distance based accentPhonetic distance based accent
Phonetic distance based accentsipij
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Lifeng (Aaron) Han
 
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...Lifeng (Aaron) Han
 
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...ssuser849b73
 
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...IRJET Journal
 
LPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A ReviewLPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A Reviewijiert bestjournal
 
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...sipij
 
M sc thesis_presentation_
M sc thesis_presentation_M sc thesis_presentation_
M sc thesis_presentation_Dia Abdulkerim
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsTae Hwan Jung
 

Similar to Saito2017icassp (20)

nakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdfnakai22apsipa_presentation.pdf
nakai22apsipa_presentation.pdf
 
Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)
 
silent sound technology pdf
silent sound technology pdfsilent sound technology pdf
silent sound technology pdf
 
Towards an Optimal Speaker Modeling in Speaker Verification Systems using Per...
Towards an Optimal Speaker Modeling in Speaker Verification Systems using Per...Towards an Optimal Speaker Modeling in Speaker Verification Systems using Per...
Towards an Optimal Speaker Modeling in Speaker Verification Systems using Per...
 
Une18apsipa
Une18apsipaUne18apsipa
Une18apsipa
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
 
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
 
Tutorial rpo
Tutorial rpoTutorial rpo
Tutorial rpo
 
2021 04-04-google nmt
2021 04-04-google nmt2021 04-04-google nmt
2021 04-04-google nmt
 
[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition
[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition
[slide] Attentive Modality Hopping Mechanism for Speech Emotion Recognition
 
Moore_slides.ppt
Moore_slides.pptMoore_slides.ppt
Moore_slides.ppt
 
Phonetic distance based accent
Phonetic distance based accentPhonetic distance based accent
Phonetic distance based accent
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
 
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
ACL-WMT13 poster.Quality Estimation for Machine Translation Using the Joint M...
 
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech E...
 
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
IRJET- Designing and Creating Punjabi Speech Synthesis System using Hidden Ma...
 
LPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A ReviewLPC Models and Different Speech Enhancement Techniques- A Review
LPC Models and Different Speech Enhancement Techniques- A Review
 
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
 
M sc thesis_presentation_
M sc thesis_presentation_M sc thesis_presentation_
M sc thesis_presentation_
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
 

More from Yuki Saito

hirai23slp03.pdf
hirai23slp03.pdfhirai23slp03.pdf
hirai23slp03.pdfYuki Saito
 
Interspeech2022 参加報告
Interspeech2022 参加報告Interspeech2022 参加報告
Interspeech2022 参加報告Yuki Saito
 
fujii22apsipa_asc
fujii22apsipa_ascfujii22apsipa_asc
fujii22apsipa_ascYuki Saito
 
saito22research_talk_at_NUS
saito22research_talk_at_NUSsaito22research_talk_at_NUS
saito22research_talk_at_NUSYuki Saito
 
Neural text-to-speech and voice conversion
Neural text-to-speech and voice conversionNeural text-to-speech and voice conversion
Neural text-to-speech and voice conversionYuki Saito
 
Nishimura22slp03 presentation
Nishimura22slp03 presentationNishimura22slp03 presentation
Nishimura22slp03 presentationYuki Saito
 
Nakai22sp03 presentation
Nakai22sp03 presentationNakai22sp03 presentation
Nakai22sp03 presentationYuki Saito
 
GAN-based statistical speech synthesis (in Japanese)
GAN-based statistical speech synthesis (in Japanese)GAN-based statistical speech synthesis (in Japanese)
GAN-based statistical speech synthesis (in Japanese)Yuki Saito
 
Saito21asj Autumn Meeting
Saito21asj Autumn MeetingSaito21asj Autumn Meeting
Saito21asj Autumn MeetingYuki Saito
 
Interspeech2020 reading
Interspeech2020 readingInterspeech2020 reading
Interspeech2020 readingYuki Saito
 
Saito20asj_autumn
Saito20asj_autumnSaito20asj_autumn
Saito20asj_autumnYuki Saito
 
ICASSP読み会2020
ICASSP読み会2020ICASSP読み会2020
ICASSP読み会2020Yuki Saito
 
Saito20asj s slide_published
Saito20asj s slide_publishedSaito20asj s slide_published
Saito20asj s slide_publishedYuki Saito
 
Saito19asjAutumn_DeNA
Saito19asjAutumn_DeNASaito19asjAutumn_DeNA
Saito19asjAutumn_DeNAYuki Saito
 
Deep learning for acoustic modeling in parametric speech generation
Deep learning for acoustic modeling in parametric speech generationDeep learning for acoustic modeling in parametric speech generation
Deep learning for acoustic modeling in parametric speech generationYuki Saito
 
釧路高専情報工学科向け進学説明会
釧路高専情報工学科向け進学説明会釧路高専情報工学科向け進学説明会
釧路高専情報工学科向け進学説明会Yuki Saito
 

More from Yuki Saito (20)

hirai23slp03.pdf
hirai23slp03.pdfhirai23slp03.pdf
hirai23slp03.pdf
 
Interspeech2022 参加報告
Interspeech2022 参加報告Interspeech2022 参加報告
Interspeech2022 参加報告
 
fujii22apsipa_asc
fujii22apsipa_ascfujii22apsipa_asc
fujii22apsipa_asc
 
saito22research_talk_at_NUS
saito22research_talk_at_NUSsaito22research_talk_at_NUS
saito22research_talk_at_NUS
 
Neural text-to-speech and voice conversion
Neural text-to-speech and voice conversionNeural text-to-speech and voice conversion
Neural text-to-speech and voice conversion
 
Nishimura22slp03 presentation
Nishimura22slp03 presentationNishimura22slp03 presentation
Nishimura22slp03 presentation
 
Nakai22sp03 presentation
Nakai22sp03 presentationNakai22sp03 presentation
Nakai22sp03 presentation
 
GAN-based statistical speech synthesis (in Japanese)
GAN-based statistical speech synthesis (in Japanese)GAN-based statistical speech synthesis (in Japanese)
GAN-based statistical speech synthesis (in Japanese)
 
Saito21asj Autumn Meeting
Saito21asj Autumn MeetingSaito21asj Autumn Meeting
Saito21asj Autumn Meeting
 
Interspeech2020 reading
Interspeech2020 readingInterspeech2020 reading
Interspeech2020 reading
 
Saito20asj_autumn
Saito20asj_autumnSaito20asj_autumn
Saito20asj_autumn
 
ICASSP読み会2020
ICASSP読み会2020ICASSP読み会2020
ICASSP読み会2020
 
Saito20asj s slide_published
Saito20asj s slide_publishedSaito20asj s slide_published
Saito20asj s slide_published
 
Saito19asjAutumn_DeNA
Saito19asjAutumn_DeNASaito19asjAutumn_DeNA
Saito19asjAutumn_DeNA
 
Deep learning for acoustic modeling in parametric speech generation
Deep learning for acoustic modeling in parametric speech generationDeep learning for acoustic modeling in parametric speech generation
Deep learning for acoustic modeling in parametric speech generation
 
Saito19asj_s
Saito19asj_sSaito19asj_s
Saito19asj_s
 
Saito18sp03
Saito18sp03Saito18sp03
Saito18sp03
 
Saito18asj_s
Saito18asj_sSaito18asj_s
Saito18asj_s
 
Saito17asjA
Saito17asjASaito17asjA
Saito17asjA
 
釧路高専情報工学科向け進学説明会
釧路高専情報工学科向け進学説明会釧路高専情報工学科向け進学説明会
釧路高専情報工学科向け進学説明会
 

Saito2017icassp

  • 1. Training Algorithm to Deceive Anti-Spoofing Verification for DNN-Based Speech Synthesis. Yuki Saito, Shinnosuke Takamichi, and Hiroshi Saruwatari (The University of Tokyo). ICASSP 2017, SP-L4.2. ©Yuki Saito, 07/03/2017.
  • 2. Outline of This Talk
      Issue: quality degradation in statistical parametric speech synthesis due to over-smoothing of the speech parameters.
      Countermeasures: reproducing natural statistics
        – 2nd moment (a.k.a. Global Variance, GV) [Toda et al., 2007]
        – Histogram [Ohtani et al., 2012]
      Proposed: training algorithm to deceive an Anti-Spoofing Verification (ASV) for DNN-based speech synthesis
        – Tries to deceive the ASV, which distinguishes natural from synthetic speech.
        – Compensates for the distribution difference between natural and synthetic speech.
      Results:
        – Improves the synthetic speech quality.
        – Is comparatively robust to its hyper-parameter setting.
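  • 3. Conventional Training Algorithm: Minimum Generation Error (MGE) Training [Wu et al., 2016]
      Diagram: the acoustic models map linguistic features to static-dynamic mean vectors for frames $t = 1, \dots, T$; ML-based parameter generation then produces the generated speech parameters $\hat{\boldsymbol{c}}$, which are compared with the natural speech parameters $\boldsymbol{c}$.
      Generation error: $L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}}) = \frac{1}{T} (\hat{\boldsymbol{c}} - \boldsymbol{c})^\top (\hat{\boldsymbol{c}} - \boldsymbol{c})$ → Minimize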
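The generation error above can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the authors' code: the function name `generation_loss` and the list-of-frames layout are assumptions made here.

```python
def generation_loss(c, c_hat):
    """L_G(c, c_hat) = (1/T) * (c_hat - c)^T (c_hat - c).

    c, c_hat: lists of T frames, each frame a list of floats
    (this data layout is an assumption of the sketch).
    """
    if len(c) != len(c_hat):
        raise ValueError("c and c_hat must have the same number of frames")
    num_frames = len(c)
    # Sum of squared differences over all frames and all dimensions.
    squared_error = sum(
        (y_hat - y) ** 2
        for frame, frame_hat in zip(c, c_hat)
        for y, y_hat in zip(frame, frame_hat)
    )
    return squared_error / num_frames


natural = [[1.0, 2.0], [3.0, 4.0]]
generated = [[1.0, 1.0], [2.0, 4.0]]
print(generation_loss(natural, generated))  # squared error 2.0 over T = 2 frames
```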
  • 4. Issue of MGE Training: Over-smoothing of Generated Speech Parameters
      Figure: scatter plots of the 21st vs. 23rd mel-cepstral coefficients for natural speech and for MGE-generated speech; the MGE distribution is narrow.
      These distributions are significantly different. (GV [Toda et al., 2007] explicitly compensates the 2nd moment.)
  • 5. Proposed algorithm: Training Algorithm to Deceive Anti-Spoofing Verification (ASV)
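  • 6. Anti-Spoofing Verification (ASV): Discriminator to Prevent Spoofing Attacks with Speech [Wu et al., 2016] [Chen et al., 2015]
      Diagram: natural speech parameters $\boldsymbol{c}$ or generated speech parameters $\hat{\boldsymbol{c}}$ pass through the feature function $\boldsymbol{\phi}(\cdot)$ into the ASV $D(\cdot)$, which outputs 1 for natural and 0 for generated. Here, $\boldsymbol{\phi}(\boldsymbol{c}_t) = \boldsymbol{c}_t$.
      Cross entropy: $L_\mathrm{D}(\boldsymbol{c}, \hat{\boldsymbol{c}}) = L_{\mathrm{D},1}(\boldsymbol{c}) + L_{\mathrm{D},0}(\hat{\boldsymbol{c}}) = -\frac{1}{T} \sum_{t=1}^{T} \log D(\boldsymbol{c}_t) - \frac{1}{T} \sum_{t=1}^{T} \log\left(1 - D(\hat{\boldsymbol{c}}_t)\right)$ → Minimize
        – $L_{\mathrm{D},1}(\boldsymbol{c})$: loss to recognize natural speech as natural
        – $L_{\mathrm{D},0}(\hat{\boldsymbol{c}})$: loss to recognize generated speech as generated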
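A minimal sketch of the ASV cross-entropy loss, assuming the ASV outputs have already been computed as per-frame probabilities of being natural. The function name `asv_loss`, its argument names, and the `eps` clamp against log(0) are assumptions of this sketch, not part of the talk.

```python
import math


def asv_loss(d_natural, d_generated, eps=1e-12):
    """L_D = L_D,1(c) + L_D,0(c_hat).

    d_natural:   D(c_t) for each natural frame, values in (0, 1)
    d_generated: D(c_hat_t) for each generated frame, values in (0, 1)
    eps:         clamp to avoid log(0) (an addition of this sketch)
    """
    # Loss to recognize natural speech as natural (target label 1).
    loss_natural = -sum(math.log(max(p, eps)) for p in d_natural) / len(d_natural)
    # Loss to recognize generated speech as generated (target label 0).
    loss_generated = -sum(
        math.log(max(1.0 - p, eps)) for p in d_generated
    ) / len(d_generated)
    return loss_natural + loss_generated


# An undecided ASV (0.5 everywhere) gives 2 * log(2).
print(asv_loss([0.5, 0.5], [0.5, 0.5]))
```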
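  • 7. Training Algorithm to Deceive ASV
      Diagram: as in MGE training, the acoustic models map linguistic features to static-dynamic mean vectors, and ML-based parameter generation yields $\hat{\boldsymbol{c}}$. In addition, $\hat{\boldsymbol{c}}$ passes through the feature function $\boldsymbol{\phi}(\cdot)$ into the ASV $D(\cdot)$ with target label 1 (natural).
      Loss: $L(\boldsymbol{c}, \hat{\boldsymbol{c}}) = L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}}) + \omega_\mathrm{D} \frac{E_{L_\mathrm{G}}}{E_{L_\mathrm{D}}} L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$ → Minimize
        – $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$: loss to recognize generated speech as natural
        – $\omega_\mathrm{D}$: weight
        – $E_{L_\mathrm{G}}$, $E_{L_\mathrm{D}}$: expectation values of $L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$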
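The combined loss can be sketched as follows. The helper names `deceive_loss` and `total_loss` are hypothetical, and how the expectation values are estimated in practice (for example as running averages) is not stated on the slide; here they are simply passed in as numbers.

```python
import math


def deceive_loss(d_generated, eps=1e-12):
    """L_D,1(c_hat): loss to make the ASV recognize generated speech as natural."""
    return -sum(math.log(max(p, eps)) for p in d_generated) / len(d_generated)


def total_loss(loss_g, loss_d1_generated, omega_d, e_loss_g, e_loss_d):
    """L = L_G + omega_D * (E_LG / E_LD) * L_D,1(c_hat).

    The ratio E_LG / E_LD rescales the adversarial term to the
    magnitude of the generation error before omega_D weights it.
    """
    return loss_g + omega_d * (e_loss_g / e_loss_d) * loss_d1_generated


loss_d1 = deceive_loss([0.2, 0.4])
print(total_loss(loss_g=0.5, loss_d1_generated=loss_d1,
                 omega_d=0.3, e_loss_g=0.5, e_loss_d=2.0))
```

With `omega_d = 0.0` the expression reduces to plain MGE training, which matches the "MGE, ω_D = 0.0" condition used later in the evaluation.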
  • 8. Iterative Optimization of Acoustic Models and ASV
      ① Update the acoustic models (ASV fixed): minimize the loss built from $L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$, with target label 1 (natural) for the generated parameters.
      ② Update the ASV (acoustic models fixed): minimize $L_\mathrm{D}(\boldsymbol{c}, \hat{\boldsymbol{c}})$, with label 1 for natural and 0 for generated parameters.
      By iterating ① and ②, we construct the final acoustic models.
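The alternation of steps ① and ② can be sketched as a plain loop. This shows only the control flow: `update_acoustic_models` and `update_asv` are hypothetical callables that each perform one update with the other model held fixed and return the resulting loss.

```python
def train_alternating(update_acoustic_models, update_asv, n_iterations):
    """Alternate step 1 (acoustic models) and step 2 (ASV).

    Both arguments are placeholder callables supplied by the caller;
    each is assumed to run one update and return a loss value.
    """
    history = []
    for _ in range(n_iterations):
        # Step 1: ASV fixed, acoustic models updated.
        history.append(("acoustic", update_acoustic_models()))
        # Step 2: acoustic models fixed, ASV updated.
        history.append(("asv", update_asv()))
    return history


# Dummy updates that just return constant losses.
log = train_alternating(lambda: 0.6, lambda: 1.2, n_iterations=2)
print(log)
```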
  • 9. Discussions of Proposed Algorithm
      Compensation of speech features through the feature function:
        – Automatically derived features, such as auto-encoded features
        – Conventional analytically derived features, such as GV
      Loss function for training the acoustic models:
        – Combination of MGE and adversarial training [Goodfellow et al., 2014]
      Effect of the adversarial training:
        – Minimizes the Jensen-Shannon divergence between the distributions of the natural data and the generated data.
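The Jensen-Shannon connection follows from the standard result of [Goodfellow et al., 2014] for an optimal discriminator. The derivation below is a sketch in this talk's notation, assuming densities $p_\mathrm{nat}$ and $p_\mathrm{gen}$ (names introduced here) for natural and generated parameters and taking the ASV loss in expectation. It describes the idealized minimax case; the acoustic-model loss in the talk uses $L_{\mathrm{D},1}(\hat{\boldsymbol{c}})$, so the statement applies to it only approximately.

```latex
% Expected ASV loss for fixed acoustic models:
L_\mathrm{D} = -\mathbb{E}_{\boldsymbol{x} \sim p_\mathrm{nat}}\left[\log D(\boldsymbol{x})\right]
               -\mathbb{E}_{\boldsymbol{x} \sim p_\mathrm{gen}}\left[\log\left(1 - D(\boldsymbol{x})\right)\right]

% Pointwise minimization over D gives the optimal ASV:
D^{*}(\boldsymbol{x}) = \frac{p_\mathrm{nat}(\boldsymbol{x})}{p_\mathrm{nat}(\boldsymbol{x}) + p_\mathrm{gen}(\boldsymbol{x})}

% Substituting D^{*}, with m = (p_\mathrm{nat} + p_\mathrm{gen}) / 2:
\min_{D} L_\mathrm{D}
  = 2\log 2 - \mathrm{KL}(p_\mathrm{nat} \,\|\, m) - \mathrm{KL}(p_\mathrm{gen} \,\|\, m)
  = 2\log 2 - 2\,\mathrm{JSD}(p_\mathrm{nat} \,\|\, p_\mathrm{gen})
```

Making the optimal ASV's loss as large as possible is therefore equivalent to making the Jensen-Shannon divergence as small as possible, and the bound $2\log 2$ is reached exactly when $p_\mathrm{gen} = p_\mathrm{nat}$.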
  • 10. Distributions of Speech Parameters
      Figure: scatter plots of the 21st vs. 23rd mel-cepstral coefficients for natural speech, MGE, and the proposed algorithm. The MGE distribution is narrow; the proposed distribution is as wide as that of natural speech.
      Our algorithm alleviates the over-smoothing effect!
  • 11. Compensation of Global Variance
      Global Variance (GV) [Toda et al., 2007]:
        – 2nd moment of the parameter distribution
      Figure: global variance (log scale) against feature index (0 to 20) for natural speech, MGE, and the proposed algorithm.
      GV is NOT used for training, but is compensated by the ASV!
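As a reminder of what the plotted quantity is, GV can be sketched as the per-dimension variance over the frames of an utterance. The function name `global_variance`, the list-of-frames layout, and the 1/T normalization are assumptions of this sketch.

```python
def global_variance(c):
    """Per-dimension variance over the T frames of one utterance.

    c: list of T frames, each a list of floats (assumed layout).
    Returns one variance value per feature dimension.
    """
    num_frames = len(c)
    num_dims = len(c[0])
    means = [sum(frame[d] for frame in c) / num_frames for d in range(num_dims)]
    return [
        sum((frame[d] - means[d]) ** 2 for frame in c) / num_frames
        for d in range(num_dims)
    ]


# An over-smoothed trajectory (second dimension) has a smaller GV.
frames = [[0.0, 1.0], [2.0, 1.0]]
print(global_variance(frames))
```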
  • 12. Additional Effect: Alleviation of Unnaturally Strong Correlation
      Maximal Information Coefficient (MIC) [Reshef et al., 2011]:
        – Quantifies the nonlinear correlation between two variables
        – Natural speech parameters tend to have weak correlation [Ijima et al., 2016]
      Figure: MIC matrices over feature indices 0 to 24 for natural speech, MGE, and the proposed algorithm (color scale 0.0 = weak to 1.0 = strong).
      The proposed algorithm not only compensates the GV, but also makes the correlations among speech parameters natural!
  • 14. Experimental Conditions
      Dataset: ATR Japanese speech database (503 phonetically balanced sentences)
      Training / evaluation data: 450 sentences / 53 sentences (16 kHz sampling)
      Linguistic features: 274-dimensional vector (phoneme, accent type, frame position, etc.)
      Speech parameters: mel-cepstral coefficients (0th through 24th), $F_0$, 5-band aperiodicity
      Predicted parameters: mel-cepstral coefficients (the others were NOT predicted)
      Optimization algorithm: AdaGrad [Duchi et al., 2011] (learning rate: 0.01)
      Acoustic models: feed-forward, 274 – 3x400 (ReLU) – 75 (linear)
      ASV: feed-forward, 25 – 2x200 (ReLU) – 1 (sigmoid)
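To make the two network sizes concrete, the sketch below counts weights and biases, reading "3x400" as three hidden layers of 400 units and "2x200" as two hidden layers of 200 units. That reading, the assumption of fully connected layers with biases, and the name `count_parameters` are assumptions of this sketch, not facts stated in the table.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumed layer layouts derived from the table above.
acoustic_layers = [274, 400, 400, 400, 75]
asv_layers = [25, 200, 200, 1]

print(count_parameters(acoustic_layers))  # 460875
print(count_parameters(asv_layers))       # 45601
```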
  • 15. Initialization, Training, and Objective Evaluation
      Initialization:
        – Acoustic models: conventional MGE training
        – ASV: distinguish natural / generated speech after the MGE training
      Training:
        – Acoustic models: update with the proposed algorithm
        – ASV: distinguish natural / generated speech after updating the acoustic models
      Objective evaluation:
        – Generation loss $L_\mathrm{G}(\boldsymbol{c}, \hat{\boldsymbol{c}})$ and spoofing rate
        – Spoofing rate = (number of spoofing synthetic speech parameters) / (total number of synthetic speech parameters)
      We calculated these values with various $\omega_\mathrm{D}$.
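The spoofing rate can be sketched as the fraction of generated frames that the ASV accepts as natural. The decision threshold of 0.5 and the name `spoofing_rate` are assumptions of this sketch; the slide does not state the threshold.

```python
def spoofing_rate(d_generated, threshold=0.5):
    """Fraction of generated frames that the ASV classifies as natural.

    d_generated: ASV outputs D(c_hat_t) for the generated frames.
    threshold:   decision boundary (0.5 is an assumption here).
    """
    spoofed = sum(1 for p in d_generated if p > threshold)
    return spoofed / len(d_generated)


print(spoofing_rate([0.9, 0.2, 0.7, 0.4]))  # 2 of 4 frames fool the ASV
```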
  • 16. Results of Objective Evaluations
      Figure: generation loss and spoofing rate plotted against the weight $\omega_\mathrm{D}$ (0.0 to 1.0). The generation loss got worse; the spoofing rate got better and exceeded 99% when $\omega_\mathrm{D} > 0.3$.
      Our algorithm makes the generation loss worse but can train the acoustic models to deceive the ASV!
  • 17. Results of Subjective Evaluations in Terms of Speech Quality
      Figure: preference scores (8 listeners, scale 0.0 to 1.0) for MGE ($\omega_\mathrm{D} = 0.0$), Proposed ($\omega_\mathrm{D} = 0.3$), and Proposed ($\omega_\mathrm{D} = 1.0$). Plot annotations: "Got better" and "NO significant difference". Error bars denote 95% confidence intervals.
      Our algorithm improves the synthetic speech quality and is comparatively robust to its hyper-parameter setting!
      Speech samples: http://sython.org/demo/icassp2017advtts/demo.html
  • 18. Conclusion
      Purpose:
        – Improving the speech quality of statistical parametric speech synthesis
      Proposed:
        – Training algorithm to deceive an ASV
          • Compensates for the difference between the distributions of natural and generated speech parameters using adversarial training
      Results:
        – Improved the speech quality compared to conventional training
        – Was comparatively robust to its hyper-parameter setting
      Future work:
        – Devising a temporally and linguistically dependent ASV
        – Extending our algorithm to generate $F_0$ and duration