ININ submission to MediaEval Zero Cost ASR task
Tejas Godambe, Naresh Kumar, Pavan Kumar,
Veera Raghavendra and Aravind Ganpathiraju
October 21, 2016
MediaEval Workshop, October 20-21, Hilversum, Netherlands
Introduction to Zero Cost ASR task
Motivation: To bridge the gap between “top speech labs and
companies”, which can afford to buy and collect data for
development and research, and “other small players”.
Task: To build the best possible ASR system for the Vietnamese
language using limited public-domain data that covers diverse
acoustic conditions and has imperfect transcripts.
More details of the task are in [Szoke and Anguera, 2016]
Data description
Official data from organizers
ELSA: Proprietary recordings of sentences read from a book
of Vietnamese quotes.
Forvo.com: Collection of short recordings downloaded from
forvo.com.
Rhinospike.com: Collection of both short and long
recordings downloaded from rhinospike.com.
“Surprise” test data: Download of 35 YouTube videos
(broadcast news, presentations, talks, etc.).
Data from participants
Not used for training.
System Description
Kaldi toolkit [Povey et al., 2011] was used for system building.
Steps followed for building the final system:
1 Audio pre-processing: Long silences in training data were
truncated to 0.3 seconds.
2 Audio augmentation: Data was augmented with 0.9x and
1.1x speed-perturbed versions of itself [Ko et al., 2015].
3 Use of pitch information: Pitch information was extracted
along with conventional MFCCs [Ghahremani et al., 2014].
4 Estimation of robust parameters with less data: An SGMM
acoustic model was used [Povey et al., 2010].
5 Use of more history: A 5-gram language model (LM) was used.
6 Use of test data for training: Test data was decoded and
approximate transcripts were added to training data.
7 Hypothesis re-ranking with a different LM: Lattices were
generated and rescored using RNN LM [Mikolov et al., 2011].
8 Final decoding.
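Steps 1 and 2 above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' Kaldi pipeline: the function names `truncate_silences` and `speed_perturb`, the amplitude threshold, and the 10 ms frame size are hypothetical choices made here, and the resampling uses simple linear interpolation, which may differ from the tool the authors used.

```python
def truncate_silences(samples, sample_rate, max_silence_s=0.3,
                      threshold=0.01, frame_ms=10):
    """Shorten every silent run longer than max_silence_s to max_silence_s.

    A frame counts as silent when its peak amplitude is below `threshold`
    (an assumed, very simple silence detector).
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    max_frames = int(round(max_silence_s * 1000 / frame_ms))
    out = []
    silent_run = 0  # consecutive silent frames seen so far
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        is_silent = max(abs(x) for x in frame) < threshold
        silent_run = silent_run + 1 if is_silent else 0
        # Speech frames are always kept; silent frames only up to the cap.
        if not is_silent or silent_run <= max_frames:
            out.extend(frame)
    return out


def speed_perturb(samples, factor):
    """Resample by linear interpolation so the audio plays `factor` times faster.

    factor=1.1 gives a shorter signal, factor=0.9 a longer one. Plain
    resampling like this changes tempo and pitch together.
    """
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor              # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


if __name__ == "__main__":
    rate = 16000
    speech = [0.5] * 1600                      # 0.1 s of "speech"
    signal = speech + [0.0] * rate + speech    # 1 s of silence in the middle
    trimmed = truncate_silences(signal, rate)
    augmented = [trimmed, speed_perturb(trimmed, 0.9), speed_perturb(trimmed, 1.1)]
    print([len(x) for x in augmented])
```

With this kind of augmentation the training set grows to three copies of each utterance: the original plus the 0.9x and 1.1x versions.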
Results on dev-local data
Row  Experiment                            WER (%)  WERR (%)
1    Training the triphone model           37.0
2    Truncating silence in training data   27.4     37.0-27.4=9.6
3    Truncating silence in test data       50.3     27.4-50.3=-22.9
4    Using SGMM model                      18.1     27.4-18.1=9.3
5    Using DNN model                       23.5     18.1-23.5=-5.4
6    Using position independent phones     19.1     18.1-19.1=-1.0
7    Unsupervised adaptation               16.1     18.1-16.1=2.0
8    Audio augmentation-1                  17.0     18.1-17.0=1.1
9    Audio augmentation-2                  17.3     18.1-17.3=0.8
10   Using pitch information               16.9     18.1-16.9=1.2
11   Using 5 gram LM                       16.1     18.1-16.1=2.0
12   Using 7 gram LM                       16.6     18.1-16.6=1.5
13   Combined system                       13.8
14   Rescoring lattices using RNN LM       13.5     13.8-13.5=0.3
15   ROVER [Fiscus, 1997]                  13.5     13.5-13.5=0.0
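The WERR column above is a plain difference in WER points between a row and its baseline. As a rough sketch, the same differences can be recomputed alongside the relative reduction, which is sometimes reported instead. The helper name `wer_gain` is an assumption for illustration, and the relative figures are derived here from the table values; they do not appear on the slides.

```python
def wer_gain(baseline_wer, new_wer):
    """Return (absolute, relative) WER reduction, both in percent.

    absolute = baseline - new              (WER points, as in the table)
    relative = 100 * absolute / baseline   (share of the baseline error removed)
    """
    absolute = baseline_wer - new_wer
    relative = 100.0 * absolute / baseline_wer
    return round(absolute, 1), round(relative, 1)


# (experiment, baseline WER, new WER), copied from the table above
rows = [
    ("Truncating silence in training data", 37.0, 27.4),
    ("Using SGMM model", 27.4, 18.1),
    ("Using DNN model", 18.1, 23.5),
    ("Using 7 gram LM", 18.1, 16.6),
    ("Rescoring lattices using RNN LM", 13.8, 13.5),
]

if __name__ == "__main__":
    for name, base, new in rows:
        absolute, relative = wer_gain(base, new)
        print(f"{name}: {absolute} points absolute, {relative}% relative")
```

A negative value means the change made recognition worse relative to its baseline, as with the DNN row.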
Final results and discussion
[Results table with columns Dev-local, Dev and Test; values not recoverable]
Our system performed reasonably well on data from ELSA and
rhinospike.com, but relatively poorly on data from forvo.com
and YouTube. This warrants further investigation.
Immediate and complementary areas to explore include ways to
artificially increase the amount of data to train better ANNs,
and training robust ANN models with less data.
References I
Fiscus, J. G. (1997).
A post-processing system to yield reduced word error rates: Recognizer output
voting error reduction (ROVER).
In Automatic Speech Recognition and Understanding, 1997. Proceedings., 1997
IEEE Workshop on, pages 347–354. IEEE.
Ghahremani, P., BabaAli, B., Povey, D., Riedhammer, K., Trmal, J., and
Khudanpur, S. (2014).
A pitch extraction algorithm tuned for automatic speech recognition.
In 2014 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP), pages 2494–2498. IEEE.
Ko, T., Peddinti, V., Povey, D., and Khudanpur, S. (2015).
Audio augmentation for speech recognition.
In Proceedings of INTERSPEECH.
Mikolov, T., Kombrink, S., Deoras, A., Burget, L., and Cernocky, J. (2011).
RNNLM - recurrent neural network language modeling toolkit.
In Proc. of the 2011 ASRU Workshop, pages 196–201.
References II
Povey, D., Burget, L., Agarwal, M., Akyazi, P., Feng, K., Ghoshal, A., Goel,
N. K., Karafiát, M., Rastrow, A., Rose, R. C., et al. (2010).
Subspace Gaussian mixture models for speech recognition.
In 2010 IEEE International Conference on Acoustics, Speech and Signal
Processing, pages 4330–4333. IEEE.
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N.,
Hannemann, M., Motlíček, P., Qian, Y., Schwarz, P., et al. (2011).
The Kaldi speech recognition toolkit.
Szoke, I. and Anguera, X. (2016).
Zero cost speech recognition task at MediaEval 2016.
In Proc. of the 2016 MediaEval Workshop.
Thank you.
9 / 9

Weitere ähnliche Inhalte

Was ist angesagt?

ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016InVID Project
 
Context-based modeling of audio signals toward information retrieval
Context-based modeling of audio signals toward information retrievalContext-based modeling of audio signals toward information retrieval
Context-based modeling of audio signals toward information retrievalSamuel Kim
 
Sim-to-Real Transfer in Deep Reinforcement Learning
Sim-to-Real Transfer in Deep Reinforcement LearningSim-to-Real Transfer in Deep Reinforcement Learning
Sim-to-Real Transfer in Deep Reinforcement Learningatulshah16
 
Review: Incremental Few-shot Instance Segmentation [CDM]
Review: Incremental Few-shot Instance Segmentation [CDM]Review: Incremental Few-shot Instance Segmentation [CDM]
Review: Incremental Few-shot Instance Segmentation [CDM]Dongmin Choi
 
Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Imag...
Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Imag...Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Imag...
Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Imag...SOYEON KIM
 
How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...Dongmin Choi
 

Was ist angesagt? (20)

ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016
 
Thesis Presentation
Thesis PresentationThesis Presentation
Thesis Presentation
 
Context-based modeling of audio signals toward information retrieval
Context-based modeling of audio signals toward information retrievalContext-based modeling of audio signals toward information retrieval
Context-based modeling of audio signals toward information retrieval
 
Computer network lab 7
Computer network lab 7Computer network lab 7
Computer network lab 7
 
Computer network lab 2
Computer network lab 2Computer network lab 2
Computer network lab 2
 
Computer network lab 9
Computer network lab 9Computer network lab 9
Computer network lab 9
 
Computer Networks Lab
Computer Networks LabComputer Networks Lab
Computer Networks Lab
 
Computer network lab 3
Computer network lab 3Computer network lab 3
Computer network lab 3
 
Computer network lab 1
Computer network lab 1Computer network lab 1
Computer network lab 1
 
Sim-to-Real Transfer in Deep Reinforcement Learning
Sim-to-Real Transfer in Deep Reinforcement LearningSim-to-Real Transfer in Deep Reinforcement Learning
Sim-to-Real Transfer in Deep Reinforcement Learning
 
Review: Incremental Few-shot Instance Segmentation [CDM]
Review: Incremental Few-shot Instance Segmentation [CDM]Review: Incremental Few-shot Instance Segmentation [CDM]
Review: Incremental Few-shot Instance Segmentation [CDM]
 
Comuter network lab 6
Comuter network lab 6Comuter network lab 6
Comuter network lab 6
 
Computer network lab
Computer network labComputer network lab
Computer network lab
 
Computer network lab 5
Computer network lab 5Computer network lab 5
Computer network lab 5
 
Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Imag...
Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Imag...Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Imag...
Investigating the Effectiveness of E-mail Spam Image Data for Phone Spam Imag...
 
Computer network lab 4
Computer network lab 4Computer network lab 4
Computer network lab 4
 
Computer network lab 8
Computer network lab 8Computer network lab 8
Computer network lab 8
 
Conputer network lab 10
Conputer network lab 10Conputer network lab 10
Conputer network lab 10
 
How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...
 
Computer Network Lab
Computer Network LabComputer Network Lab
Computer Network Lab
 

Andere mochten auch

MediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience TaskMediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience Taskmultimediaeval
 
MediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons LearnedMediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons Learnedmultimediaeval
 
MediaEval 2016 - TUD-MMC Predicting media Interestingness Task
MediaEval 2016 - TUD-MMC Predicting media Interestingness TaskMediaEval 2016 - TUD-MMC Predicting media Interestingness Task
MediaEval 2016 - TUD-MMC Predicting media Interestingness Taskmultimediaeval
 
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...multimediaeval
 
MediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images Task
MediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images TaskMediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images Task
MediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images Taskmultimediaeval
 
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...multimediaeval
 
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Task
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images TaskMediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Task
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Taskmultimediaeval
 
MediaEval 2016 - ETH-CVL: Textual-Visual Embeddings and Video2GIF for Video I...
MediaEval 2016 - ETH-CVL: Textual-Visual Embeddings and Video2GIF for Video I...MediaEval 2016 - ETH-CVL: Textual-Visual Embeddings and Video2GIF for Video I...
MediaEval 2016 - ETH-CVL: Textual-Visual Embeddings and Video2GIF for Video I...multimediaeval
 
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challenge
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery ChallengeMediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challenge
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challengemultimediaeval
 
MediaEval 2016 - Tag Propagation in Talking Face Graphs
MediaEval 2016 - Tag Propagation in Talking Face GraphsMediaEval 2016 - Tag Propagation in Talking Face Graphs
MediaEval 2016 - Tag Propagation in Talking Face Graphsmultimediaeval
 
MediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing TaskMediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing Taskmultimediaeval
 
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...multimediaeval
 
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...multimediaeval
 
MediaEval 2016 - Multimodal Person Discovery in TV Broadcast
MediaEval 2016 - Multimodal Person Discovery in TV BroadcastMediaEval 2016 - Multimodal Person Discovery in TV Broadcast
MediaEval 2016 - Multimodal Person Discovery in TV Broadcastmultimediaeval
 
MediaEval 2016 - Emotional Impact of Movies Task
MediaEval 2016 - Emotional Impact of Movies Task MediaEval 2016 - Emotional Impact of Movies Task
MediaEval 2016 - Emotional Impact of Movies Task multimediaeval
 

Andere mochten auch (15)

MediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience TaskMediaEval 2016 - Simula Team @ Context of Experience Task
MediaEval 2016 - Simula Team @ Context of Experience Task
 
MediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons LearnedMediaEval 2016 - Emotion in Music Task: Lessons Learned
MediaEval 2016 - Emotion in Music Task: Lessons Learned
 
MediaEval 2016 - TUD-MMC Predicting media Interestingness Task
MediaEval 2016 - TUD-MMC Predicting media Interestingness TaskMediaEval 2016 - TUD-MMC Predicting media Interestingness Task
MediaEval 2016 - TUD-MMC Predicting media Interestingness Task
 
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
MediaEval 2016 - LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-...
 
MediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images Task
MediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images TaskMediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images Task
MediaEval 2016 - UNED-UV @ Retrieving Diverse Social Images Task
 
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
MediaEval 2016 - Placing Images with Refined Language Models and Similarity S...
 
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Task
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images TaskMediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Task
MediaEval 2016 - UPMC at MediaEval2016 Retrieving Diverse Social Images Task
 
MediaEval 2016 - ETH-CVL: Textual-Visual Embeddings and Video2GIF for Video I...
MediaEval 2016 - ETH-CVL: Textual-Visual Embeddings and Video2GIF for Video I...MediaEval 2016 - ETH-CVL: Textual-Visual Embeddings and Video2GIF for Video I...
MediaEval 2016 - ETH-CVL: Textual-Visual Embeddings and Video2GIF for Video I...
 
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challenge
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery ChallengeMediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challenge
MediaEval 2016 - EUMSSI Team at the MediaEval Person Discovery Challenge
 
MediaEval 2016 - Tag Propagation in Talking Face Graphs
MediaEval 2016 - Tag Propagation in Talking Face GraphsMediaEval 2016 - Tag Propagation in Talking Face Graphs
MediaEval 2016 - Tag Propagation in Talking Face Graphs
 
MediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing TaskMediaEval 2016 - RECOD at Placing Task
MediaEval 2016 - RECOD at Placing Task
 
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...
MediaEval 2016 - UVigo System for Multimodal Person Discovery in Broadcast TV...
 
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...
MediaEval 2016 - Approaches to and Issues Arising from Answering Natural Lang...
 
MediaEval 2016 - Multimodal Person Discovery in TV Broadcast
MediaEval 2016 - Multimodal Person Discovery in TV BroadcastMediaEval 2016 - Multimodal Person Discovery in TV Broadcast
MediaEval 2016 - Multimodal Person Discovery in TV Broadcast
 
MediaEval 2016 - Emotional Impact of Movies Task
MediaEval 2016 - Emotional Impact of Movies Task MediaEval 2016 - Emotional Impact of Movies Task
MediaEval 2016 - Emotional Impact of Movies Task
 

Ähnlich wie MediaEval 2016 - ININ Submission to Zero Cost ASR Task

Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...karthik annam
 
SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK Kamonasish Hore
 
Towards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentTowards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentNVIDIA Taiwan
 
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORKijitcs
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...Universitat Politècnica de Catalunya
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
 
Adaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancementAdaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancementHarshal Ladhe
 
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxLiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxVishnuRajuV
 
12EEE032- text 2 voice
12EEE032-  text 2 voice12EEE032-  text 2 voice
12EEE032- text 2 voiceNsaroj kumar
 
Improving Indonesian multietnics speaker recognition using pitch shifting dat...
Improving Indonesian multietnics speaker recognition using pitch shifting dat...Improving Indonesian multietnics speaker recognition using pitch shifting dat...
Improving Indonesian multietnics speaker recognition using pitch shifting dat...IAESIJAI
 
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...IRJET Journal
 
Robust Speech Recognition Technique using Mat lab
Robust Speech Recognition Technique using Mat labRobust Speech Recognition Technique using Mat lab
Robust Speech Recognition Technique using Mat labIRJET Journal
 
Spoken language identification using i-vectors, x-vectors, PLDA and logistic ...
Spoken language identification using i-vectors, x-vectors, PLDA and logistic ...Spoken language identification using i-vectors, x-vectors, PLDA and logistic ...
Spoken language identification using i-vectors, x-vectors, PLDA and logistic ...journalBEEI
 
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...IRJET Journal
 
A survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech RecognitionA survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech RecognitionIRJET Journal
 
AMHARIC TEXT TO SPEECH SYNTHESIS FOR SYSTEM DEVELOPMENT
AMHARIC TEXT TO SPEECH SYNTHESIS FOR SYSTEM DEVELOPMENTAMHARIC TEXT TO SPEECH SYNTHESIS FOR SYSTEM DEVELOPMENT
AMHARIC TEXT TO SPEECH SYNTHESIS FOR SYSTEM DEVELOPMENTNathan Mathis
 
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READINGLIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READINGIRJET Journal
 
Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Priyanka Reddy
 
Autotuned voice cloning enabling multilingualism
Autotuned voice cloning enabling multilingualismAutotuned voice cloning enabling multilingualism
Autotuned voice cloning enabling multilingualismIRJET Journal
 

Ähnlich wie MediaEval 2016 - ININ Submission to Zero Cost ASR Task (20)

Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...
 
SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK
 
Towards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentTowards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken Content
 
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
MULTILINGUAL SPEECH IDENTIFICATION USING ARTIFICIAL NEURAL NETWORK
 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector Quantization
 
Adaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancementAdaptive noise estimation algorithm for speech enhancement
Adaptive noise estimation algorithm for speech enhancement
 
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxLiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
 
12EEE032- text 2 voice
12EEE032-  text 2 voice12EEE032-  text 2 voice
12EEE032- text 2 voice
 
Improving Indonesian multietnics speaker recognition using pitch shifting dat...
Improving Indonesian multietnics speaker recognition using pitch shifting dat...Improving Indonesian multietnics speaker recognition using pitch shifting dat...
Improving Indonesian multietnics speaker recognition using pitch shifting dat...
 
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...
 
Robust Speech Recognition Technique using Mat lab
Robust Speech Recognition Technique using Mat labRobust Speech Recognition Technique using Mat lab
Robust Speech Recognition Technique using Mat lab
 
SPEAKER VERIFICATION
SPEAKER VERIFICATIONSPEAKER VERIFICATION
SPEAKER VERIFICATION
 
Spoken language identification using i-vectors, x-vectors, PLDA and logistic ...
Spoken language identification using i-vectors, x-vectors, PLDA and logistic ...Spoken language identification using i-vectors, x-vectors, PLDA and logistic ...
Spoken language identification using i-vectors, x-vectors, PLDA and logistic ...
 
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION...
 
A survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech RecognitionA survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech Recognition
 
AMHARIC TEXT TO SPEECH SYNTHESIS FOR SYSTEM DEVELOPMENT
AMHARIC TEXT TO SPEECH SYNTHESIS FOR SYSTEM DEVELOPMENTAMHARIC TEXT TO SPEECH SYNTHESIS FOR SYSTEM DEVELOPMENT
AMHARIC TEXT TO SPEECH SYNTHESIS FOR SYSTEM DEVELOPMENT
 
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READINGLIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
LIP READING: VISUAL SPEECH RECOGNITION USING LIP READING
 
Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)Subjective comparison of_speech_enhancement_algori (1)
Subjective comparison of_speech_enhancement_algori (1)
 
Autotuned voice cloning enabling multilingualism
Autotuned voice cloning enabling multilingualismAutotuned voice cloning enabling multilingualism
Autotuned voice cloning enabling multilingualism
 

Mehr von multimediaeval

Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...
Classification of Strokes in Table Tennis with a Three Stream Spatio-Temporal...multimediaeval
 
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...
HCMUS at MediaEval 2020: Ensembles of Temporal Deep Neural Networks for Table...multimediaeval
 
Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...Sports Video Classification: Classification of Strokes in Table Tennis for Me...
Sports Video Classification: Classification of Strokes in Table Tennis for Me...multimediaeval
 
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...
Predicting Media Memorability from a Multimodal Late Fusion of Self-Attention...multimediaeval
 
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 TaskEssex-NLIP at MediaEval Predicting Media Memorability 2020 Task
Essex-NLIP at MediaEval Predicting Media Memorability 2020 Taskmultimediaeval
 
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...
Overview of MediaEval 2020 Predicting Media Memorability task: What Makes a V...multimediaeval
 
Fooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality EstimatorFooling an Automatic Image Quality Estimator
Fooling an Automatic Image Quality Estimatormultimediaeval
 
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...
Fooling Blind Image Quality Assessment by Optimizing a Human-Understandable C...multimediaeval
 
Pixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social ImagesPixel Privacy: Quality Camouflage for Social Images
Pixel Privacy: Quality Camouflage for Social Imagesmultimediaeval
 
HCMUS at MediaEval 2020:Image-Text Fusion for Automatic News-Images Re-Matching
MediaEval 2016 - ININ Submission to Zero Cost ASR Task

  • 1. ININ submission to MediaEval Zero Cost ASR task
       Tejas Godambe, Naresh Kumar, Pavan Kumar, Veera Raghavendra and Aravind Ganpathiraju
       October 21, 2016
       MediaEval Workshop, October 20-21, Hilversum, Netherlands
  • 2. Introduction to Zero Cost ASR task
       Motivation: To bridge the gap between “top speech labs and companies”, which can afford to buy and collect data for development and research, and “other small players”.
       Task: To build the best possible ASR system for Vietnamese using limited public-domain data that covers diverse acoustic conditions and has imperfect transcripts.
       More details of the task are in [Szoke and Anguera, 2016].
  • 3. Data description
       Official data from organizers
         ELSA: Proprietary recordings of sentences read from a book of Vietnamese quotes.
         Forvo.com: Collection of short recordings downloaded from forvo.com.
         Rhinospike.com: Collection of both short and long recordings downloaded from rhinospike.com.
         “Surprise” test data: Download of 35 YouTube videos (broadcast news, presentations, talks, etc.).
       Data from participants
         Not used for training.
  • 4. System Description
       The Kaldi toolkit [Povey et al., 2011] was used for system building.
       Steps followed for building the final system:
         1 Audio pre-processing: Long silences in the training data were truncated to 0.3 seconds.
         2 Audio augmentation: Data was augmented with 0.9x and 1.1x speed-perturbed versions of itself [Ko et al., 2015].
         3 Use of pitch information: Pitch information was extracted along with conventional MFCCs [Ghahremani et al., 2014].
         4 Estimation of robust parameters with less data: An SGMM acoustic model was used [Povey et al., 2010].
         5 Use of more history: A 5-gram language model (LM) was used.
         6 Use of test data for training: Test data was decoded and the approximate transcripts were added to the training data.
         7 Hypothesis re-ranking with a different LM: Lattices were generated and rescored using an RNN LM [Mikolov et al., 2011].
         8 Final decoding.
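The first two steps of the system description (silence truncation and speed perturbation) can be illustrated with a minimal NumPy sketch. This is not the authors' Kaldi pipeline: the function names `truncate_silences` and `speed_perturb`, the 10 ms frame length, and the energy threshold are all assumptions made for illustration. Kaldi recipes typically perform speed perturbation by resampling with an external audio tool, for which linear interpolation is only a simple stand-in.

```python
import numpy as np


def truncate_silences(samples, sample_rate, max_silence_s=0.3,
                      frame_s=0.01, energy_threshold=1e-4):
    """Cap every run of low-energy frames at max_silence_s seconds.

    frame_s and energy_threshold are illustrative assumptions, not values
    taken from the slides; only the 0.3 s cap comes from the source.
    """
    frame_len = int(round(sample_rate * frame_s))
    max_silent_frames = int(round(max_silence_s / frame_s))
    n_frames = len(samples) // frame_len
    kept = []
    silent_run = 0  # number of consecutive silent frames seen so far
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = np.mean(frame.astype(np.float64) ** 2)
        silent_run = silent_run + 1 if energy < energy_threshold else 0
        # Keep speech frames and the first max_silent_frames of each pause.
        if silent_run <= max_silent_frames:
            kept.append(frame)
    tail = samples[n_frames * frame_len:]  # trailing partial frame, if any
    if len(tail) > 0:
        kept.append(tail)
    return np.concatenate(kept) if kept else samples[:0]


def speed_perturb(samples, factor):
    """Resample by linear interpolation so playback is `factor` times faster.

    factor=0.9 yields a longer (slower) signal, factor=1.1 a shorter one.
    """
    new_len = int(len(samples) / factor)
    positions = np.arange(new_len) * factor
    return np.interp(positions, np.arange(len(samples)), samples)


def augment(samples):
    """Return the original plus 0.9x and 1.1x speed-perturbed copies."""
    return [samples, speed_perturb(samples, 0.9), speed_perturb(samples, 1.1)]
```

With these assumed settings, a 2-second clip at 16 kHz that contains a 1-second pause comes out 1.3 seconds long, and `augment` triples the number of training utterances.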
  • 5. Results on dev-local data
       Row  Experiment                            WER (%)  WERR (%)
       1    Training the triphone model           37.0
       2    Truncating silence in training data   27.4     37.0-27.4=9.6
       3    Truncating silence in test data       50.3     27.4-50.3=-22.9
       4    Using SGMM model                      18.1     27.4-18.1=9.3
       5    Using DNN model                       23.5     18.1-23.5=-5.4
       6    Using position independent phones     19.1     18.1-19.1=-1.0
       7    Unsupervised adaptation               16.1     18.1-16.1=2.0
       8    Audio augmentation-1                  17.0     18.1-17.0=1.1
       9    Audio augmentation-2                  17.3     18.1-17.3=0.8
       10   Using pitch information               16.9     18.1-16.9=1.2
       11   Using 5-gram LM                       16.1     18.1-16.1=2.0
       12   Using 7-gram LM                       16.6     18.1-16.6=1.5
       13   Combined system                       13.8
       14   Rescoring lattices using RNN LM       13.5     13.8-13.5=0.3
       15   ROVER [Fiscus, 1997]                  13.5     13.5-13.5=0.0
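The WERR column in the results is the absolute difference between a baseline WER and the WER of the new experiment. A minimal sketch of both quantities follows, assuming the standard word-level edit-distance definition of WER; the function names are hypothetical and the slides do not state which scoring tool was used.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def werr(baseline_wer, new_wer):
    """Absolute WER reduction as used in the table: baseline minus new WER."""
    return round(baseline_wer - new_wer, 1)
```

For example, `werr(18.1, 16.6)` gives 1.5 for the 7-gram LM row, and a negative value such as `werr(27.4, 50.3)`, which gives -22.9, means the change made recognition worse.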
  • 6. Final results and discussion
       [Results for the Dev-local, Dev and Test sets]
       Our system performed reasonably well on data from ELSA and rhinospike.com, but relatively poorly on data from forvo.com and YouTube. This warrants further investigation.
       Immediate and complementary areas to explore include ways to artificially increase the amount of data for training better ANNs, and training robust ANN models with less data.
  • 7. References I
       Fiscus, J. G. (1997). A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER). In Proceedings of the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding, pages 347–354. IEEE.
       Ghahremani, P., BabaAli, B., Povey, D., Riedhammer, K., Trmal, J., and Khudanpur, S. (2014). A pitch extraction algorithm tuned for automatic speech recognition. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2494–2498. IEEE.
       Ko, T., Peddinti, V., Povey, D., and Khudanpur, S. (2015). Audio augmentation for speech recognition. In Proceedings of INTERSPEECH.
       Mikolov, T., Kombrink, S., Deoras, A., Burget, L., and Cernocky, J. (2011). RNNLM - recurrent neural network language modeling toolkit. In Proc. of the 2011 ASRU Workshop, pages 196–201.
  • 8. References II
       Povey, D., Burget, L., Agarwal, M., Akyazi, P., Feng, K., Ghoshal, A., Goel, N. K., Karafiát, M., Rastrow, A., Rose, R. C., et al. (2010). Subspace Gaussian mixture models for speech recognition. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4330–4333. IEEE.
       Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlíček, P., Qian, Y., Schwarz, P., et al. (2011). The Kaldi speech recognition toolkit.
       Szoke, I. and Anguera, X. (2016). Zero cost speech recognition task at MediaEval 2016. In Proc. of the 2016 MediaEval Workshop.