SlideShare ist ein Scribd-Unternehmen logo
1 von 34
Downloaden Sie, um offline zu lesen
End-To-End Speech Recognition
with Recurrent Neural Networks
José A. R. Fonollosa
Universitat Politècnica de Catalunya
Barcelona, January 26, 2017
Deep Learning for Speech and Language
From speech processing to
machine learning
2
Towards end-to-end
RNN Speech Recognition?
• Architectures
– GMM-HMM: 30 years of feature engineering
– DNN-GMM-HMM: Trained features
– DNN-HMM: TDNN, LSTM, RNN, MS
– DNN for language modeling (RNN)
– End-to-end DNN?
• Examples
– Alex Graves (Google)
– Deep Speech (Baidu)
3
Recognition system
FEATURES RECOGNIZ
ER
DECISIO
N
Lexicon
Acoustic
Models
Languag
e
models
TASK
INFO
Featur
e
vector
N-bes
t
Hip.
RECOGNIZE
D
SENTENCE
microphon
e
x = x1
... x|x|
w = w1
... w|w|
GMM-HMM
5
N-GRAM
HMM
GMM (Training: ML, MMI, MPE, MWE, SAT, ..)
Feature Transformation (Derivative, LDA, MLLT, fMLLR, ..)
Perceptual Feature Extraction (MFCC, PLP, FF, VTLN,GammaTone, ..)
Acoustic
Model
Phonetic
inventory
Pronunciation
Lexicon
Language
Model
DNN-GMM-HMM
6
N-GRAM
HMM
GMM
Feature Transformation (LDA, MLLT, fMLLR)
DNN (Tandem, Bottleneck, DNN-Derived)
Feature Extraction (MFCC, PLP, FF)
Acoustic
Model
Phonetic
inventory
Pronunciation
Lexicon
Language
Model
Tandem
• MLP outputs as input to GMM
7
Input layer Output
layer
Hidden layers
Bottleneck Features
• Use one narrow hidden layer. Supervised or
unsupervised training (autoencoder)
8
Input
layer
Output
layer
Hidden layers
DNN-Derived Features
• Zhi-Jie Yan, Qiang Huo, Jian Xu: A scalable approach to using DNN-derived features in
GMM-HMM based acoustic modeling for LVCSR. INTERSPEECH 2013: 104-108
9
Input layer Output layer
Hidden layers
DNN-GMM-HMM
1
0
N-GRAM
HMM
GMM
Feature Transformation (LDA, MLLT, fMLLR, …)
DNN (Tandem, Bottleneck, DNN-Derived)
Language
Model
Acoustic
Model
Phonetic
inventory
Pronunciation
Lexicon
DNN-HMM
1
1
N-GRAM
HMM
DNN
Language
Model
Acoustic
Model
Phonetic
inventory
Pronunciation
Lexicon
DNN-HMM+RNNLM
1
2
(N-GRAM +) RNN
HMM
DNN
Language
Model
Acoustic
Model
Phonetic
inventory
Pronunciation
Lexicon
RNN-RNNLM
1
3
(N-GRAM +) RNN
RNN
Language
Model
Acoustic
Model
End-to-End RNN
• Alex Graves et al. (2006) “Connectionist temporal classification:
Labelling unsegmented sequence data with recurrent neural
networks” Proceedings of the International Conference on
Machine Learning, ICML
• Florian Eyben, Martin Wöllmer, Bjö̈rn Schuller, Alex Graves (2009)
“From Speech to Letters - Using a Novel Neural Network
Architecture for Grapheme Based ASR”, ASRU
• Alex Graves, Navdeep Jaitly, (Jun 2014) “Towards End-To-End
Speech Recognition with Recurrent Neural Networks”,
International Conference on Machine Learning, ICML
• Jan Chorowski et al. (Dec 2014) “End-to-end Continuous Speech
Recognition using Attention-based Recurrent NN: First Results”,
Deep Learning and Representation Learning Workshop: NIPS
14
End-to-End RNN
• Awni Hannun et al (Dec 2014), “Deep Speech: Scaling
up end-to-end speech recognition”, arXiv:1412.5567
[cs.CL]
• D. Bahdanau et al. (Dec 2014) “End-to-End
Attention-based Large Vocabulary Speech
Recognition”, arXiv:1508.04395 [cs.CL]
• Miao, Y., Gowayyed, M., and Metze, F. (2015). EESEN:
End-to-end speech recognition using deep RNN
models and WFST-based decoding. arXiv:1507.08240
[cs]
• Baidu Research – 34 authors- (Dec 2015), “Deep
Speech 2: End-to-end Speech Recognition in English
and Mandarin”, arXiv:1512.02595 [cs.CL]
15
End-to-End RNN
16
• No perceptual features (MFCC). No feature
transformation. No phonetic inventory. No transcription
dictionary. No HMM.
• The output of the RNN are characters including space,
apostrophe, (not CD phones)
• Connectionist Temporal Classification (No fixed aligment
speech/character)
• Data augmentation. 5,000 hours (9600 speakers) + noise =
100,000 hours. Optimizations: data parallelism, model
parallelism
• Good results in noisy conditions
Adam Coates
Bidirectional Recursive DNN
17
Unrolled
RNN
T H _ _ E … D _ O _ _G
Spectogram
Clipped ReLu
Accelerated
gradient method
GPU friendly
18
Deep Speech II (Baidu)
19
Language Model
English: Kneser-Ney smoothed 5-gram model with pruning.
Vocabulary: 400,000 words from 250 million lines of text
Language model with 850 million n-grams.
Mandarin: Kneser-Ney smoothed character level 5-gram model with pruning
Training data: 8 billion lines of text.
Language model with about 2 billion n-grams.
Maximize Q(y) = log(pctc(y|x)) + α log(plm(y)) + β word_count(y)
Connectionist Temporal Classification
20
Connectionist Temporal Classification
21
- The framewise network receives an error for misalignment
- The CTC network predicts the sequence of phonemes / characters (as
spikes separated by ‘blanks’)
- No force alignment (initial model) required for training.
Alex Graves 2006
GMM-HMM / DNN-HMM / RNN
22
Data Augmentation
23
This approach needs bigger models and bigger datasets.
Synthesis by superposition: reverberations, echoes, a large number of short
noise recordings from public video sources, jitter
Lombart Effect.
Results
2000 HUB5 (LDC2002S09)
24
System AM training data SWB CH
Vesely et al. (2013) SWB 12.6 24.1
Seide et al. (2014) SWB+Fisher+other 13.1 –
Hannun et al. (2014) SWB+Fisher 12.6 19.3
Zhou et al. (2014) SWB 14.2 –
Maas et al. (2014) SWB 14.3 26.0
Maas et al. (2014) SWB+Fisher 15.0 23.0
Soltau et al. (2014) SWB 10.4 19.1
Saon et al (2015) SWB+Fisher+CH 8.0 14.1
Hybrid(IBM) versus DS1(Baidu)
IBM 2015 Baidu 2014
Features VTL-PLP, MVN, LDA, STC, fMMLR,
i-Vector
80 log filter banks
Alignment GMM-HMM 300K Gaussians -
DNN DNN(5x2048) +
CNN(128x9x9+5x2048) + +RNN
32000 outputs
4RNN (5 x 2304)
28 outputs
DNN Training CE + MBR Discriminative Training (ST) CTC
HMM 32K states (DNN outputs)
pentaphone acoustic context
-
Language Model 37M 4-gram + model M (class based
exponential model) + 2 NNLM
4-gram (Transcripts)
25
DS1 versus DS2 (Baidu)
Deep Speech 1 (Baidu 2014) DS2 (Baidu 2015)
Features 80 log filter banks ?
Alignment - -
DNN 4RNN (5 x 2304)
28 outputs
9-layer, 7RNN, BatchNorm,
Conv. Layers. (Time/Freq)
DNN Training CTC CTC
HMM - -
Language Model 4-gram 5-gram
26
Results II (DS1)
27
Training: Baidu database + data augmentation
Test: new dataset of 200 recording in both clean and noisy settings (no details)
Comparison against commercial systems in production
Deep Speech 2 Versus DS1
28
Deep Speech 2 Versus DS1
29
DS2 Training data
30
DS2 Training data
31
System optimization
32
- Scalability and data-parallelism
- GPU implementation of CTC loss function
- Memory allocation
Deep Speech Demo
• http://www.ustream.tv/recorded/60113824/
highlight/631666
33
References
• Alex Graves et al. (2006) “Connectionist temporal classification: Labelling
unsegmented sequence data with recurrent neural networks” Proceedings
of the International Conference on Machine Learning, ICML
• Florian Eyben, Martin Wöllmer, Bjö̈rn Schuller, Alex Graves (2009) “From
Speech to Letters - Using a Novel Neural Network Architecture for
Grapheme Based ASR”, ASRU
• Alex Graves, Navdeep Jaitly, (Jun 2014) “Towards End-To-End Speech
Recognition with Recurrent Neural Networks”, International Conference
on Machine Learning, ICML
• Jan Chorowski et al. (Dec 2014) “End-to-end Continuous Speech
Recognition using Attention-based Recurrent NN: First Results”, Deep
Learning and Representation Learning Workshop: NIPS
• Awni Hannun et al (Dec 2014), “Deep Speech: Scaling up end-to-end
speech recognition”, arXiv:1412.5567 [cs.CL]
• George Saon et al. “The IBM 2015 English Conversational Telephone
Speech Recognition System”, arXiv:1505.05899 [cs.CL]
34

Weitere ähnliche Inhalte

Was ist angesagt?

An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep LearningJulien SIMON
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentationSurya Sg
 
Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Jeong-Gwan Lee
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Deep learning for real life applications
Deep learning for real life applicationsDeep learning for real life applications
Deep learning for real life applicationsAnas Arram, Ph.D
 
Attention in Deep Learning
Attention in Deep LearningAttention in Deep Learning
Attention in Deep Learning健程 杨
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You NeedDaiki Tanaka
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaAlexey Grigorev
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
 
CNN Attention Networks
CNN Attention NetworksCNN Attention Networks
CNN Attention NetworksTaeoh Kim
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and TransformerArvind Devaraj
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...Simplilearn
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTSuman Debnath
 

Was ist angesagt? (20)

An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep Learning
 
Nlp research presentation
Nlp research presentationNlp research presentation
Nlp research presentation
 
Attention is All You Need (Transformer)
Attention is All You Need (Transformer)Attention is All You Need (Transformer)
Attention is All You Need (Transformer)
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Deep learning for real life applications
Deep learning for real life applicationsDeep learning for real life applications
Deep learning for real life applications
 
Attention in Deep Learning
Attention in Deep LearningAttention in Deep Learning
Attention in Deep Learning
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
CNN Attention Networks
CNN Attention NetworksCNN Attention Networks
CNN Attention Networks
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and Transformer
 
Deep learning
Deep learningDeep learning
Deep learning
 
Tutorial on word2vec
Tutorial on word2vecTutorial on word2vec
Tutorial on word2vec
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
 
An introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERTAn introduction to the Transformers architecture and BERT
An introduction to the Transformers architecture and BERT
 

Ähnlich wie End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learning for Speech and Language UPC 2017)

Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementNAVER Engineering
 
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue ManagerEnd-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue ManagerYun-Nung (Vivian) Chen
 
Deep convolutional neural networks-based features for Indonesian large vocabu...
Deep convolutional neural networks-based features for Indonesian large vocabu...Deep convolutional neural networks-based features for Indonesian large vocabu...
Deep convolutional neural networks-based features for Indonesian large vocabu...IAESIJAI
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson StudioSasha Lazarevic
 
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxLiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxVishnuRajuV
 
Deep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker RecognitionDeep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker RecognitionSai Kiran Kadam
 
Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...karthik annam
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitShubham Verma
 
Towards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentTowards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentNVIDIA Taiwan
 
MediaEval 2016 - ININ Submission to Zero Cost ASR Task
MediaEval 2016 - ININ Submission to Zero Cost ASR TaskMediaEval 2016 - ININ Submission to Zero Cost ASR Task
MediaEval 2016 - ININ Submission to Zero Cost ASR Taskmultimediaeval
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022Kwanghee Choi
 
neural based_context_representation_learning_for_dialog_act_classification
neural based_context_representation_learning_for_dialog_act_classificationneural based_context_representation_learning_for_dialog_act_classification
neural based_context_representation_learning_for_dialog_act_classificationJEE HYUN PARK
 
Deep Learning - Speaker Recognition
Deep Learning - Speaker Recognition Deep Learning - Speaker Recognition
Deep Learning - Speaker Recognition Sai Kiran Kadam
 
[Chung il kim] 0829 thesis
[Chung il kim] 0829 thesis[Chung il kim] 0829 thesis
[Chung il kim] 0829 thesisChung-Il Kim
 
Wreck a nice beach: adventures in speech recognition
Wreck a nice beach: adventures in speech recognitionWreck a nice beach: adventures in speech recognition
Wreck a nice beach: adventures in speech recognitionStephen Marquard
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
Toward wave net speech synthesis
Toward wave net speech synthesisToward wave net speech synthesis
Toward wave net speech synthesisNAVER Engineering
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupLINAGORA
 

Ähnlich wie End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learning for Speech and Language UPC 2017) (20)

Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
 
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue ManagerEnd-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
 
Deep convolutional neural networks-based features for Indonesian large vocabu...
Deep convolutional neural networks-based features for Indonesian large vocabu...Deep convolutional neural networks-based features for Indonesian large vocabu...
Deep convolutional neural networks-based features for Indonesian large vocabu...
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson Studio
 
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxLiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
 
Deep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker RecognitionDeep Learning for Automatic Speaker Recognition
Deep Learning for Automatic Speaker Recognition
 
Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...
 
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR ToolkitImplemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
Implemetation of parallelism in HMM DNN based state of the art kaldi ASR Toolkit
 
Towards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentTowards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken Content
 
MediaEval 2016 - ININ Submission to Zero Cost ASR Task
MediaEval 2016 - ININ Submission to Zero Cost ASR TaskMediaEval 2016 - ININ Submission to Zero Cost ASR Task
MediaEval 2016 - ININ Submission to Zero Cost ASR Task
 
Trends of ICASSP 2022
Trends of ICASSP 2022Trends of ICASSP 2022
Trends of ICASSP 2022
 
neural based_context_representation_learning_for_dialog_act_classification
neural based_context_representation_learning_for_dialog_act_classificationneural based_context_representation_learning_for_dialog_act_classification
neural based_context_representation_learning_for_dialog_act_classification
 
Deep Learning - Speaker Recognition
Deep Learning - Speaker Recognition Deep Learning - Speaker Recognition
Deep Learning - Speaker Recognition
 
[Chung il kim] 0829 thesis
[Chung il kim] 0829 thesis[Chung il kim] 0829 thesis
[Chung il kim] 0829 thesis
 
Wreck a nice beach: adventures in speech recognition
Wreck a nice beach: adventures in speech recognitionWreck a nice beach: adventures in speech recognition
Wreck a nice beach: adventures in speech recognition
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
Toward wave net speech synthesis
Toward wave net speech synthesisToward wave net speech synthesis
Toward wave net speech synthesis
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - Meetup
 

Mehr von Universitat Politècnica de Catalunya

The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 

Mehr von Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 

Kürzlich hochgeladen

Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themeitharjee
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...HyderabadDolls
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 

Kürzlich hochgeladen (20)

Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 

End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learning for Speech and Language UPC 2017)

  • 1. End-To-End Speech Recognition with Recurrent Neural Networks José A. R. Fonollosa Universitat Politècnica de Catalunya Barcelona, January 26, 2017 Deep Learning for Speech and Language
  • 2. From speech processing to machine learning 2
  • 3. Towards end-to-end RNN Speech Recognition? • Architectures – GMM-HMM: 30 years of feature engineering – DNN-GMM-HMM: Trained features – DNN-HMM: TDNN, LSTM, RNN, MS – DNN for language modeling (RNN) – End-to-end DNN? • Examples – Alex Graves (Google) – Deep Speech (Baidu) 3
  • 5. GMM-HMM 5 N-GRAM HMM GMM (Training: ML, MMI, MPE, MWE, SAT, ..) Feature Transformation (Derivative, LDA, MLLT, fMLLR, ..) Perceptual Feature Extraction (MFCC, PLP, FF, VTLN,GammaTone, ..) Acoustic Model Phonetic inventory Pronunciation Lexicon Language Model
  • 6. DNN-GMM-HMM 6 N-GRAM HMM GMM Feature Transformation (LDA, MLLT, fMLLR) DNN (Tandem, Bottleneck, DNN-Derived) Feature Extraction (MFCC, PLP, FF) Acoustic Model Phonetic inventory Pronunciation Lexicon Language Model
  • 7. Tandem • MLP outputs as input to GMM 7 Input layer Output layer Hidden layers
  • 8. Bottleneck Features • Use one narrow hidden layer. Supervised or unsupervised training (autoencoder) 8 Input layer Output layer Hidden layers
  • 9. DNN-Derived Features • Zhi-Jie Yan, Qiang Huo, Jian Xu: A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR. INTERSPEECH 2013: 104-108 9 Input layer Output layer Hidden layers
  • 10. DNN-GMM-HMM 1 0 N-GRAM HMM GMM Feature Transformation (LDA, MLLT, fMLLR, …) DNN (Tandem, Bottleneck, DNN-Derived) Language Model Acoustic Model Phonetic inventory Pronunciation Lexicon
  • 14. End-to-End RNN • Alex Graves et al. (2006) “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks” Proceedings of the International Conference on Machine Learning, ICML • Florian Eyben, Martin Wöllmer, Bjö̈rn Schuller, Alex Graves (2009) “From Speech to Letters - Using a Novel Neural Network Architecture for Grapheme Based ASR”, ASRU • Alex Graves, Navdeep Jaitly, (Jun 2014) “Towards End-To-End Speech Recognition with Recurrent Neural Networks”, International Conference on Machine Learning, ICML • Jan Chorowski et al. (Dec 2014) “End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results”, Deep Learning and Representation Learning Workshop: NIPS 14
  • 15. End-to-End RNN • Awni Hannun et al (Dec 2014), “Deep Speech: Scaling up end-to-end speech recognition”, arXiv:1412.5567 [cs.CL] • D. Bahdanau et al. (Dec 2014) “End-to-End Attention-based Large Vocabulary Speech Recognition”, arXiv:1508.04395 [cs.CL] • Miao, Y., Gowayyed, M., and Metze, F. (2015). EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. arXiv:1507.08240 [cs] • Baidu Research – 34 authors- (Dec 2015), “Deep Speech 2: End-to-end Speech Recognition in English and Mandarin”, arXiv:1512.02595 [cs.CL] 15
  • 16. End-to-End RNN 16 • No perceptual features (MFCC). No feature transformation. No phonetic inventory. No transcription dictionary. No HMM. • The output of the RNN are characters including space, apostrophe, (not CD phones) • Connectionist Temporal Classification (No fixed aligment speech/character) • Data augmentation. 5,000 hours (9600 speakers) + noise = 100,000 hours. Optimizations: data parallelism, model parallelism • Good results in noisy conditions Adam Coates
  • 17. Bidirectional Recursive DNN 17 Unrolled RNN T H _ _ E … D _ O _ _G Spectogram Clipped ReLu Accelerated gradient method GPU friendly
  • 18. 18 Deep Speech II (Baidu)
  • 19. 19 Language Model English: Kneser-Ney smoothed 5-gram model with pruning. Vocabulary: 400,000 words from 250 million lines of text Language model with 850 million n-grams. Mandarin: Kneser-Ney smoothed character level 5-gram model with pruning Training data: 8 billion lines of text. Language model with about 2 billion n-grams. Maximize Q(y) = log(pctc(y|x)) + α log(plm(y)) + β word_count(y)
  • 21. Connectionist Temporal Classification 21 - The framewise network receives an error for misalignment - The CTC network predicts the sequence of phonemes / characters (as spikes separated by ‘blanks’) - No force alignment (initial model) required for training. Alex Graves 2006
  • 22. GMM-HMM / DNN-HMM / RNN 22
  • 23. Data Augmentation 23 This approach needs bigger models and bigger datasets. Synthesis by superposition: reverberations, echoes, a large number of short noise recordings from public video sources, jitter Lombart Effect.
  • 24. Results 2000 HUB5 (LDC2002S09) 24 System AM training data SWB CH Vesely et al. (2013) SWB 12.6 24.1 Seide et al. (2014) SWB+Fisher+other 13.1 – Hannun et al. (2014) SWB+Fisher 12.6 19.3 Zhou et al. (2014) SWB 14.2 – Maas et al. (2014) SWB 14.3 26.0 Maas et al. (2014) SWB+Fisher 15.0 23.0 Soltau et al. (2014) SWB 10.4 19.1 Saon et al (2015) SWB+Fisher+CH 8.0 14.1
  • 25. Hybrid(IBM) versus DS1(Baidu) IBM 2015 Baidu 2014 Features VTL-PLP, MVN, LDA, STC, fMMLR, i-Vector 80 log filter banks Alignment GMM-HMM 300K Gaussians - DNN DNN(5x2048) + CNN(128x9x9+5x2048) + +RNN 32000 outputs 4RNN (5 x 2304) 28 outputs DNN Training CE + MBR Discriminative Training (ST) CTC HMM 32K states (DNN outputs) pentaphone acoustic context - Language Model 37M 4-gram + model M (class based exponential model) + 2 NNLM 4-gram (Transcripts) 25
  • 26. DS1 versus DS2 (Baidu) Deep Speech 1 (Baidu 2014) DS2 (Baidu 2015) Features 80 log filter banks ? Alignment - - DNN 4RNN (5 x 2304) 28 outputs 9-layer, 7RNN, BatchNorm, Conv. Layers. (Time/Freq) DNN Training CTC CTC HMM - - Language Model 4-gram 5-gram 26
  • 27. Results II (DS1) 27 Training: Baidu database + data augmentation Test: new dataset of 200 recording in both clean and noisy settings (no details) Comparison against commercial systems in production
  • 28. Deep Speech 2 Versus DS1 28
  • 29. Deep Speech 2 Versus DS1 29
  • 32. System optimization 32 - Scalability and data-parallelism - GPU implementation of CTC loss function - Memory allocation
  • 33. Deep Speech Demo • http://www.ustream.tv/recorded/60113824/ highlight/631666 33
  • 34. References • Alex Graves et al. (2006) “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks” Proceedings of the International Conference on Machine Learning, ICML • Florian Eyben, Martin Wöllmer, Bjö̈rn Schuller, Alex Graves (2009) “From Speech to Letters - Using a Novel Neural Network Architecture for Grapheme Based ASR”, ASRU • Alex Graves, Navdeep Jaitly, (Jun 2014) “Towards End-To-End Speech Recognition with Recurrent Neural Networks”, International Conference on Machine Learning, ICML • Jan Chorowski et al. (Dec 2014) “End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results”, Deep Learning and Representation Learning Workshop: NIPS • Awni Hannun et al (Dec 2014), “Deep Speech: Scaling up end-to-end speech recognition”, arXiv:1412.5567 [cs.CL] • George Saon et al. “The IBM 2015 English Conversational Telephone Speech Recognition System”, arXiv:1505.05899 [cs.CL] 34