Introduction to Tacotron
何云超
yunchaohe@gmail.com
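Speech synthesis
• Goal: convert text to speech
• Classical approach:
• Pipeline: a front end that extracts text features, a duration model, an acoustic model, and a vocoder
• Every module requires domain expertise, and errors compound across modules
• End-to-end approach:
• Input: <text, audio> pairs
• Less feature extraction
• Custom conditioning is easy to add, e.g. different speakers or different emotions
• Adapts well to new data
• A single model is more robust than a multi-stage model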
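Challenges of end-to-end TTS
• TTS is a decompression process, whereas ASR is a compression process
• The same text is pronounced differently under different speaking styles
• TTS output is continuous, whereas MT and ASR outputs are discrete
• The output is usually much longer than the input, e.g. 5 input characters can map to 1 s of audio, which is 16K samples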
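The length mismatch in the last bullet can be made concrete with a little arithmetic. The sketch below takes the 16K samples per second from the slide; the 12.5 ms frame shift is an assumption added for illustration, to show how predicting spectrogram frames instead of raw samples shortens the target sequence.

```python
SAMPLE_RATE_HZ = 16_000   # from the slide: 1 s of audio is 16K samples
FRAME_SHIFT_MS = 12.5     # assumption for illustration, not stated on the slide


def output_lengths(duration_s, sample_rate_hz=SAMPLE_RATE_HZ,
                   frame_shift_ms=FRAME_SHIFT_MS):
    """Return (waveform samples, spectrogram frames) for a given duration."""
    n_samples = int(round(duration_s * sample_rate_hz))
    n_frames = int(round(duration_s * 1000 / frame_shift_ms))
    return n_samples, n_frames


n_chars = 5                                 # input length from the slide's example
n_samples, n_frames = output_lengths(1.0)   # 1 s of audio
samples_per_char = n_samples / n_chars
frames_per_char = n_frames / n_chars
print(n_samples, n_frames, samples_per_char, frames_per_char)
```

Under these assumptions each input character has to account for thousands of waveform samples but only a few tens of frames, which is one reason to predict frame-level features.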
End-to-end
• Raw input -> desired output
• E.g. flour, water, sugar -> bread
• Text -> speech (TTS), source language -> target language (MT), speech -> text (ASR), QA
• Method: sequence-to-sequence model
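Main components of seq2seq
• How the input is encoded
• How the output is decoded
• How the encoded information is used
Red: input
Blue: output
Green: hidden state
Many: text sequence, speech sequence
One: class label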
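Encoding stage
• Purpose: build a representation of the input
• How is the input represented?
• CNN -> image captioning
• RNN -> machine translation
• DNN
• Hand-crafted features, etc.
• What does the representation look like?
• A single vector: keeps only an overall memory of the input
• Multiple vectors: keep the intermediate memory as well
Single vector
Multiple vectors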
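The difference between the two representations can be sketched with a toy recurrent encoder. This is a minimal illustration in plain Python, assuming a simple tanh RNN with random weights; the sizes and the names `rnn_encode`, `w_in`, and `w_rec` are hypothetical and not taken from the slides.

```python
import math
import random


def rnn_encode(inputs, w_in, w_rec):
    """Run a tanh RNN over `inputs` and return the hidden state of every step."""
    hidden_size = len(w_rec)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * h[k] for k in range(hidden_size))
            )
            for i in range(hidden_size)
        ]
        states.append(h)
    return states


random.seed(0)
input_size, hidden_size, seq_len = 3, 4, 5   # illustrative sizes
w_in = [[random.uniform(-0.5, 0.5) for _ in range(input_size)]
        for _ in range(hidden_size)]
w_rec = [[random.uniform(-0.5, 0.5) for _ in range(hidden_size)]
         for _ in range(hidden_size)]
inputs = [[random.uniform(-1, 1) for _ in range(input_size)]
          for _ in range(seq_len)]

all_states = rnn_encode(inputs, w_in, w_rec)  # multiple vectors: one per input step
single_vector = all_states[-1]                # single vector: only the final state
```

Keeping `all_states` preserves one vector per input position, while `single_vector` compresses the whole sequence into the last hidden state.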
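Decoding stage
• Purpose: produce the output from the vector or vectors generated by the encoding stage
• Questions:
• How to use the encoded vectors
• How to produce the output
• Static or dynamic use of the encoded vectors
• Static: the same encoded vector is used at every step
• Dynamic: the vector changes at every step (attention mechanism)
• The output is produced by an RNN, CNN, or NN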
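The static and dynamic options can be contrasted in a few lines. In this sketch the per-step weights are hand-picked (hypothetical) so that only the mixing step is shown; in a real model they would be produced by an attention mechanism.

```python
def static_contexts(encoder_states, n_steps):
    """Static use: every decoder step sees the same vector (the last encoder state)."""
    return [list(encoder_states[-1]) for _ in range(n_steps)]


def dynamic_contexts(encoder_states, step_weights):
    """Dynamic use: each decoder step mixes the encoder states with its own weights."""
    dim = len(encoder_states[0])
    return [
        [sum(w * s[d] for w, s in zip(weights, encoder_states)) for d in range(dim)]
        for weights in step_weights
    ]


encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical weights: step t attends only to encoder state t.
step_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

static = static_contexts(encoder_states, n_steps=3)
dynamic = dynamic_contexts(encoder_states, step_weights)
```

With static use all three decoder steps receive `[0.5, 0.5]`; with dynamic use each step receives a different vector.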
Examples
seq2seq for MT
Encoder Decoder Model
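Encoding: the input is represented as a single vector
Decoding: static use
Usage: the vector is used once (possible in theory)
Drawbacks: 1) a single vector may not be enough to describe the complete original input; 2) information from the original input is lost
Encoding: the input is represented as a single vector
Decoding: static use
Usage: the vector is used multiple times (reused at every step)
Drawback: not every output depends on this global vector; some outputs may depend only on local information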
Examples
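Encoding: the input is represented as multiple vectors
Decoding: dynamic use
Usage: the vectors are used multiple times
Advantage: by adjusting the attention weights, the model can capture which input the current output is most related to
Encoding: the input is represented as a single vector
Decoding: static use
Usage: the vector is used multiple times (reused), same as before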
More Detail
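Encoding: the input is represented as multiple vectors
Decoding: dynamic use
Usage: the vectors are used multiple times
Advantage: by adjusting the attention weights, the model can capture which input the current output is most related to
How are the attention weights produced?
attention model
C: context (query) vector
Yi: i.e. ht, the input representations produced by the encoder
Z: attention-weighted output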
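One common way to produce the weights is to score each Yi against C and normalize the scores with a softmax. The sketch below uses dot-product scoring for brevity, which is a simplification chosen here and not necessarily the scoring function of the model on the slide (attention models often use a small learned network instead). The names `c`, `ys`, and `z` mirror C, Yi, and Z.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(c, ys):
    """Score every y_i against c, normalize, and return (weights, weighted sum z)."""
    scores = [sum(ci * yi for ci, yi in zip(c, y)) for y in ys]  # dot products
    weights = softmax(scores)
    dim = len(ys[0])
    z = [sum(w * y[d] for w, y in zip(weights, ys)) for d in range(dim)]
    return weights, z


ys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # encoder outputs Y_1..Y_3
c = [2.0, 0.0]                              # context (query) vector C
weights, z = attend(c, ys)
```

Here the first and third encoder outputs score equally against `c` and receive equal weight, while the second receives much less.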
Encoder-Decoder
RNN Encoder-Decoder With attention
• Input:
• RNN cell:
• Encoded vectors:
• Typically:
• Decoder:
• General form:
• RNN:
Tacotron Model
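• Input and output: character index -> 80-band mel-scale spectrogram
• Encoder
• Goal: turn the text into a matrix
• Model: Pre-net (NN), CBHG (CNN)
• Decoder
• Goal: produce the output
• Model: Pre-net, RNN
• Attention
• Goal: turn the text matrix into a vector (conditioned on the context)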
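The input side, text as a sequence of character indices, can be sketched as follows. The vocabulary construction, the `<pad>` and `<unk>` symbols, and the fixed `max_len` are hypothetical choices for illustration; the slides only state that the input is a character index.

```python
PAD = "<pad>"   # hypothetical padding symbol, index 0
UNK = "<unk>"   # hypothetical symbol for unseen characters, index 1


def build_vocab(texts):
    """Map every character seen in `texts` to an integer index."""
    chars = sorted({ch for text in texts for ch in text})
    return {sym: idx for idx, sym in enumerate([PAD, UNK] + chars)}


def text_to_indices(text, vocab, max_len):
    """Convert `text` to a fixed-length list of character indices."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in text[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(["hello", "world"])
ids = text_to_indices("hold!", vocab, max_len=8)
print(ids)  # [4, 6, 5, 2, 1, 0, 0, 0]
```

The resulting index sequence is what an embedding layer would look up before the encoder runs.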
Tacotron Model (simplified)
[Figure: block diagram with the labels Encoder, Attention Models, Decoder, Character Embeddings, Context Vector from hidden state, New Context Vector, Previous Output, Output]
Encoder
• Pre-net + CBHG
• Pre-net: two fully connected layers, [N, T, C] -> [N, T, C] -> [N, T, C/2]
• CBHG: CNN bank + max pooling + CNN + Highway Nets + Bi-GRU
• Highway Nets:
• $y = g * \mathrm{ReLU}(Wx + b) + (1 - g) * x$
• $g = \sigma(wx + b)$
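The highway equations can be sketched for a single vector in plain Python. The names `w_h`, `b_h`, `w_g`, and `b_g` are hypothetical; the transform and the gate are given separate bias vectors here, which is an assumption since the slide writes `b` for both. Input and output sizes must match so that the carry term `(1 - g) * x` is well defined.

```python
import math


def matvec(w, x):
    """Multiply matrix `w` (list of rows) by vector `x`."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def highway(x, w_h, b_h, w_g, b_g):
    """y = g * ReLU(W x + b) + (1 - g) * x, with g = sigmoid(w x + b)."""
    h = [max(0.0, v + b) for v, b in zip(matvec(w_h, x), b_h)]
    g = [1.0 / (1.0 + math.exp(-(v + b))) for v, b in zip(matvec(w_g, x), b_g)]
    return [gi * hi + (1.0 - gi) * xi for gi, hi, xi in zip(g, h, x)]


x = [1.0, -2.0, 0.5]
w_h = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
b_h = [0.0, 0.0, 0.0]
w_g = [[0.0] * 3 for _ in range(3)]

# Very negative gate bias: gate closed, the layer passes x through.
carried = highway(x, w_h, b_h, w_g, [-50.0] * 3)
# Very positive gate bias: gate open, the layer outputs ReLU(W x + b).
transformed = highway(x, w_h, b_h, w_g, [50.0] * 3)
```

The two extreme gate settings show the layer interpolating between copying its input and applying the nonlinear transform.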
Decoder
[Figure: decoder block diagram with the labels Attention Models, Decoder, Context Vector from hidden state, New Context Vector, Previous Output, Output]
Attention RNN
2 GRU layers with residual connections
Dropout in Pre-net
Target: 80-band mel-scale spectrogram
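A Pre-net with dropout can be sketched as two fully connected ReLU layers, each followed by dropout, using the C -> C -> C/2 shape given on the encoder slide. The sketch processes a single frame vector instead of an [N, T, C] batch; the size C = 8, the 0.5 dropout rate, and the random weights are illustrative assumptions.

```python
import random


def dense_relu(x, weights, bias):
    """Fully connected layer followed by ReLU."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def dropout(x, rate, rng, training):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


def prenet(x, layers, rate, rng, training):
    """Apply each (weights, bias) layer with ReLU, then dropout."""
    for weights, bias in layers:
        x = dropout(dense_relu(x, weights, bias), rate, rng, training)
    return x


rng = random.Random(0)
C = 8  # illustrative feature size


def init_layer(n_out, n_in):
    """Random weights and zero bias for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


layers = [init_layer(C, C), init_layer(C // 2, C)]   # C -> C -> C/2
frame = [rng.uniform(-1, 1) for _ in range(C)]

train_out = prenet(frame, layers, rate=0.5, rng=rng, training=True)
eval_out = prenet(frame, layers, rate=0.5, rng=rng, training=False)
```

With `training=False` the dropout step is skipped, so repeated calls on the same frame return the same output.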
Hyperparameters from the paper
Experimental comparison from the paper
Our results
• http://git.n.xiaomi.com/heyunchao/tacotron/tree/4e16daac48988bf7bf349cdb4b653e6032edd935/samples
