Wreck a nice beach: adventures in speech recognition. Stephen Marquard, Centre for Educational Technology, University of Cape Town, stephen.marquard@uct.ac.za. Department of Computer Science Seminar, April 2011
Overview Project goals Speech recognition Acoustic modelling Language modelling Integration into a lecture capture system
Project goals Integrate speech recognition into a lecture capture system: Opencast Matterhorn CMU Sphinx ASR engine Generate automatic transcripts of recorded lectures Allow users to correct and improve the transcripts (crowdsourcing) Use feedback to improve recognition accuracy (of the same, similar or subsequent recordings) Experiment and implement at UCT
Why is it important? Video and audio are more useful if you can: Navigate them easily Locate relevant recordings from a large set Use by students: Catch up on missed lectures (continuous play or read the transcript) Revision: jump to a particular point or find the lectures which cover topic X On the public web: Discoverability (search indexing)
Easy or hard? Easiest: small, fixed vocabulary, prescriptive grammar, discrete words, known audio conditions (command-and-control systems) Dictation applications in a specific domain, e.g. Dragon Naturally Speaking Hardest: speaker-independent, large vocabulary continuous speech recognition, adverse or unknown audio conditions
Why is it hard? People have huge amounts of prior experience and a rich (complex) understanding of context Modelling of context in ASR engines is currently very limited Even people misrecognize speech (e.g. new / foreign accents, specialized terminology, background noise)
Speech recognition Wreck a nice beach … you sing calm incense Reckon eyes peach Recognize speech … using common sense
Early history First known device 1952 (digits) Above: IBM Shoebox, 1961 http://www-03.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html
Linguistics vs statistics 	Early approaches tried to recognize individual phonemes (phonetic units) and hence the words they formed. 	But not very successfully.
Airplanes don’t flap their wings 	“Every time I fire a linguist, my system improves” 	Fred Jelinek 	1985/1988
Speech recognition pipeline Audio (signal processing, extract features) Acoustic model (features to phonemes) Pronunciation dictionary (lexicon) Language model (likelihood of words) Confusion lattice (possible options) Results > confidence score
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-345-automatic-speech-recognition-spring-2003/lecture-notes/lecture1.pdf
Hidden Markov Models HMMs model transition probabilities: Alice talks to Bob three days in a row and discovers that on the first day he went for a walk, on the second day he went shopping, and on the third day he cleaned his apartment. Alice has a question: what is the most likely sequence of rainy/sunny days that would explain these observations? http://en.wikipedia.org/wiki/Viterbi_algorithm
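The Alice and Bob question can be answered with the Viterbi algorithm. The sketch below is a minimal illustration, not the decoder used in any ASR engine. The slide gives no numbers, so the start, transition and emission probabilities are assumed values chosen for illustration (they follow the commonly used version of this textbook example).

```python
# Assumed, illustrative probabilities; the slide itself gives none.
STATES = ("Rainy", "Sunny")
OBS = ("walk", "shop", "clean")
START_P = {"Rainy": 0.6, "Sunny": 0.4}
TRANS_P = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
EMIT_P = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Choose the predecessor that maximises the path probability into s.
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, path


prob, path = viterbi(OBS, STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

With these assumed numbers the most likely weather sequence is Sunny, Rainy, Rainy. In a speech recognizer the hidden states would be phoneme (sub-)states and the observations acoustic feature vectors, but the search idea is the same.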
Training in action “training 3 (decision) trees to depth 20 from 1 million images takes about a day on a 1000 core cluster” http://research.microsoft.com/pubs/145347/BodyPartRecognition.pdf
Characteristics of the field “the standard approach in our field [is] state-of-the-art system A is gently perturbed to create system B, resulting in a relative decrease in error rate of from 1 to 10%” Bourlard, Hermansky and Morgan. Towards increasing speech recognition error rates, 1996. Algorithmic, drawing on many disciplines (especially signal processing, statistics, linguistics, natural language processing) Empirical: lots of different algorithms and optimizations Almost no theory to describe why particular approaches work better than others, or how to find optimal solutions Massive infrastructure is a big advantage: large and varied data sets, significant computing resources.
Audio issues Bandwidth Recording noise Ambient noise Reverberation Microphones Microphone arrays
Acoustic models Generated from a corpus of recorded, transcribed audio Both artificial and natural corpora (TIMIT, Broadcast News, Meetings) Audio needs to match the application Audio bandwidth = ½ sampling rate Phone speech (sampled at 8 kHz, bandwidth 4 kHz) Microphone speech (sampled at 16 kHz, bandwidth 8 kHz, typical analysis on 130 Hz – 6800 Hz) There is a South African corpus of phone speech But no South African corpus of microphone speech
The TIMIT audio corpus 	0 47719 She had your dark suit in greasy wash water all year 2214 4428 she 4428 8316 had 7308 9691 your 9691 15331 dark 15331 19634 suit 20929 22453 in 22453 27697 greasy 27697 32326 wash 33120 36575 water 37597 39644 all 39644 43982 year 0 2214 h# 2214 3744 sh 3744 4428 ax-h 4428 5229 hv 5229 6927 ae 6927 7308 dcl 7308 8316 jh 8316 9691 axr 9691 11697 dcl 11697 12114 d 12114 13075 aa … Word and phoneme alignment by timecode. 630 speakers from 8 US dialect regions, speaking 10 sentences each.
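The numbers in the alignment above are sample offsets, not seconds. A small sketch of how such "start end label" lines could be turned into time-stamped segments, assuming the 16 kHz sampling rate of TIMIT audio; the function name `parse_alignment` is hypothetical and not part of any TIMIT tooling.

```python
SAMPLE_RATE_HZ = 16_000  # TIMIT audio is sampled at 16 kHz


def parse_alignment(text, sample_rate=SAMPLE_RATE_HZ):
    """Parse 'start end label' lines into (start_s, end_s, label) tuples."""
    segments = []
    for line in text.strip().splitlines():
        start, end, label = line.split(maxsplit=2)
        segments.append((int(start) / sample_rate, int(end) / sample_rate, label))
    return segments


# First five word-level entries from the slide.
WORDS = """\
2214 4428 she
4428 8316 had
7308 9691 your
9691 15331 dark
15331 19634 suit
"""

for start_s, end_s, word in parse_alignment(WORDS):
    print(f"{start_s:6.3f} - {end_s:6.3f} s  {word}")
```

The same function works for the phoneme-level lines, since they share the three-column layout.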
Dialect regions The Nationwide Speech Project: A new corpus of American English dialects http://web.mit.edu/~nancyc/Public/Papers/Clopper_Pisoni_06_SC.pdf
VoxForge: crowdsourcing the creation of a GPL speech corpus and open source acoustic models (Sphinx, ISIP, Julius, HTK). An important effort, but still small (84 hours at Dec 2010). www.voxforge.org
Language modelling Pronunciation dictionary (lexicon) 	TOMATO  T AH0 M EY1 T OW2 		TOMATO(1)  T AH0 M AA1 T OW2 Language model: a statistical sequence model of words. Trigram models (3 words) are common: 	-2.0998 YORK MONEY FUND  	-0.0798 YORK HEDGE FUND  	-0.1392 YORK MUTUAL FUND
Statistical sequence models Truly Madly _____ Widely used Applications Auto-suggest Spell-checkers Lossless compression Machine translation Language models for speech recognition Probability of token w in context of preceding tokens c, e.g. P(deeply), given “truly madly”
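The probability P(w | c) can be estimated from counts. The sketch below uses plain maximum-likelihood estimates over a made-up toy corpus; production language models add smoothing and back-off, which are omitted here, and the function names are illustrative only.

```python
from collections import Counter


def train_trigrams(tokens):
    """Count trigrams and the two-word contexts that precede each third word."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    contexts = Counter()
    for (first, second, _third), count in trigrams.items():
        contexts[(first, second)] += count
    return trigrams, contexts


def trigram_prob(word, context, trigrams, contexts):
    """Unsmoothed estimate of P(word | context) = count(context + word) / count(context)."""
    total = contexts[context]
    return trigrams[context + (word,)] / total if total else 0.0


# Toy corpus, invented for illustration.
tokens = "truly madly deeply truly madly deeply truly madly happy".split()
trigrams, contexts = train_trigrams(tokens)

for candidate in ("deeply", "happy", "badly"):
    p = trigram_prob(candidate, ("truly", "madly"), trigrams, contexts)
    print(f"P({candidate} | truly madly) = {p:.3f}")
```

The unseen word gets probability zero here, which is exactly the problem smoothing is meant to address.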
Context is king Micro-context (e.g. bi- and trigrams) 	United Kingdom 	United Airlines 	United Arab Emirates Long-range context 	“Cricket and rugby are amongst the most popular sports in the United _________” (example from The Sequence Memoizer, Wood et al, 2011).
Characteristics of language Power law frequency / rank distribution. Zipf’s law: 	“given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table” http://en.wikipedia.org/wiki/Zipf’s_law Also more frequent words are shorter.
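A rank/frequency table is easy to compute for any tokenised text. The sketch below is only an illustration of the bookkeeping: under Zipf's law the product rank × count stays roughly constant on a large corpus, which a tiny sample like the invented one here cannot demonstrate.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) rows, most frequent word first."""
    counts = Counter(tokens)
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Tiny invented sample; use a large corpus to see the power-law shape.
sample = "the cat saw the dog and the dog saw a bird".split()
for rank, word, count in rank_frequency(sample):
    print(f"{rank:>2}  {word:<5} count={count}  rank*count={rank * count}")
```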
How to get large language data sets Linguistic Data Consortium (by subscription, restricted) Some other more specialized corpora Microsoft (free, restricted) Google (Creative Commons license) Wikipedia (CC / GFDL license)
Using Wikipedia as a language resource Download a snapshot (6G compressed) Convert from XML and markup to plain text Create dictionaries of target size (by word frequency) Create language models of target size Approximately equal in size to English Gigaword Corpus
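The "dictionary of target size" step can be sketched as keeping the N most frequent words and checking how much of the text they cover. This is an assumed, simplified version of that step: it takes already cleaned, tokenised text and does no markup stripping, and `build_vocabulary` is a hypothetical helper name.

```python
from collections import Counter


def build_vocabulary(tokens, size):
    """Keep the `size` most frequent words; return (vocabulary, token coverage)."""
    counts = Counter(tokens)
    vocab = [word for word, _ in counts.most_common(size)]
    covered = sum(counts[word] for word in vocab)
    coverage = covered / len(tokens) if tokens else 0.0
    return vocab, coverage


# Invented sample text standing in for the plain-text Wikipedia dump.
tokens = "the cat the dog the cat".split()
vocab, coverage = build_vocabulary(tokens, size=2)
print(vocab, f"{coverage:.1%} of tokens covered")
```

Because word frequencies follow a power law, a fairly small vocabulary covers most running text, while the remaining out-of-vocabulary words cannot be recognized at all.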
Grid computing for language modelling For when you need lots of RAM and/or lots of CPU www.sagrid.ac.za ICTS at UCT: Tim Carr, Andrew Lewis
Accounting for context: LM adaptation Adapt a language model to more closely resemble the target speech Using related text for Topic modelling (vocabulary, concepts) Style-of-speech modelling 	“ok and um it's quite useful to have a very good diagnostic test of of acute hepatitis um you know to prevent kind of unnecessary um surgery um so hepatitis is really one um example of a cause of acute abdominal pain that doesn't need surgery”
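One common way to adapt a language model is linear interpolation: mix a small topic-specific model with the large background model. The slides do not say which adaptation method was used, so the sketch below is an assumption for illustration only, shown on unigram distributions with invented probabilities.

```python
def interpolate(background, topic, weight):
    """Mix two unigram distributions: weight * topic + (1 - weight) * background."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    vocab = set(background) | set(topic)
    return {
        word: weight * topic.get(word, 0.0) + (1.0 - weight) * background.get(word, 0.0)
        for word in vocab
    }


# Invented toy distributions (each sums to 1).
background = {"the": 0.6, "surgery": 0.1, "money": 0.3}
topic = {"the": 0.5, "surgery": 0.3, "hepatitis": 0.2}

adapted = interpolate(background, topic, weight=0.5)
for word, p in sorted(adapted.items(), key=lambda item: -item[1]):
    print(f"{word:<10} {p:.2f}")
```

The adapted model gives weight to topic words such as "hepatitis" that the background model had never seen, while still summing to 1.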
What’s special about lectures? Possibly helpful assumptions: Coherent topic(s) within a course One lecturer presents many lectures Specialized vocabulary Spoken speech different to written speech
Using Wikipedia for LM adaptation Goal is to adapt a “standard” LM to be specific to the topic of the audio Start somewhere: title, keywords, text from slides Select a set of documents, adapt the LM Using wikipedia, select by similarity: identify the set of documents most closely related to the starting point or keywords
Vector space modelling Represents documents as n-dimensional vectors (n terms) Document similarity established by comparing vectors, producing a similarity score. Gensim VSM toolkit: independent of corpus size (so good for wikipedia) LSI, LDA, TF-IDF measures.  Create a “similarity crawler” to build a corpus of documents related to the topic
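The similarity comparison can be sketched in plain Python with TF-IDF weights and cosine similarity. This is not Gensim's API, just a minimal stand-in to show the idea; the three "documents" are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenised document to a {term: tf-idf weight} dictionary."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented documents: the first plays the role of the starting point.
docs = [
    "hepatitis liver infection surgery".split(),
    "liver hepatitis diagnosis".split(),
    "rugby cricket sport".split(),
]
vectors = tfidf_vectors(docs)
sims = [cosine(vectors[0], v) for v in vectors]
print([round(s, 3) for s in sims])
```

A "similarity crawler" could rank candidate articles by this score against the seed text and keep those above a threshold.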
Metrics Perplexity (average number of guesses required) Word Error Rate (edit distance: insertions, deletions, substitutions) Information Retrieval: precision and recall What’s sufficient? Need to close an accuracy gap of  Munteanu research: %WER for a transcript
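The first two metrics can be sketched in a few lines. Word Error Rate is the word-level edit distance divided by the number of reference words; perplexity is computed here from the probability a model assigned to each word of a test sequence. Both functions are minimal illustrations rather than the scoring tools used in the project.

```python
import math


def word_error_rate(reference, hypothesis):
    """Word-level edit distance (ins + del + sub) divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,                           # deletion
                    curr[j - 1] + 1,                       # insertion
                    prev[j - 1] + (ref_word != hyp_word),  # substitution or match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


def perplexity(word_probs):
    """2 ** (average negative log2 probability) over the words of a test sequence."""
    log_sum = sum(math.log2(p) for p in word_probs)
    return 2 ** (-log_sum / len(word_probs))


print(word_error_rate("recognize speech using common sense",
                      "wreck a nice beach you sing calm incense"))
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

The title example scores a WER of 160%: none of the eight hypothesis words matches the five reference words, so WER can exceed 100%. A model that assigns probability 0.25 to every word has perplexity 4, matching the "average number of guesses" reading.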
What is lecture capture? Largely automated: Processing, Output. Recreates the lecture experience by recording: video, screen output (VGA). www.opencastproject.org
Licensing constraints Opencast Matterhorn is licensed under the ECL open source license (similar to Apache 2.0 license) Allows closed commercial derivatives Therefore cannot use software or datasets which are non-commercial or research-only. Can use Apache, BSD, LGPL, maybe GPL code and data.
Speech recognition software ecosystem Licensing and patents Closed Proprietary FOSS Open
Opencast in action

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 

Wreck a nice beach: adventures in speech recognition

  • 1. Wreck a nice beach: adventures in speech recognition. Stephen Marquard, Centre for Educational Technology, University of Cape Town (stephen.marquard@uct.ac.za). Department of Computer Science Seminar, April 2011
  • 2. Overview Project goals Speech recognition Acoustic modelling Language modelling Integration into a lecture capture system
  • 3. Project goals Integrate speech recognition into a lecture capture system: Opencast Matterhorn CMU Sphinx ASR engine Generate automatic transcripts of recorded lectures Allow users to correct and improve the transcripts (crowdsourcing) Use feedback to improve recognition accuracy (of the same, similar or subsequent recordings) Experiment and implement at UCT
  • 4. Why is it important? Video and audio is more useful if you can: Navigate it easily Locate relevant recordings from a large set Use by students: Catch up on missed lectures (continuous play or read the transcript) Revision: jump to a particular point or find the lectures which cover topic X On the public web: Discoverability (search indexing)
  • 5. Easy or hard? Easiest: small, fixed vocabulary, prescriptive grammar, discrete words, known audio conditions (command-and-control systems) Dictation applications in a specific domain, e.g. Dragon Naturally Speaking Hardest: speaker-independent, large vocabulary continuous speech recognition, adverse or unknown audio conditions
  • 6. Why is it hard? People have huge amounts of prior experience and a rich (complex) understanding of context Modelling of context in ASR engines is currently very limited Even people misrecognize speech (e.g. new / foreign accents, specialized terminology, background noise)
  • 7. Speech recognition Wreck a nice beach … you sing calm incense Reckon eyes peach Recognize speech … using common sense
  • 8. Early history First known device 1952 (digits) Above: IBM Shoebox, 1961 http://www-03.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html
  • 9. Linguistics vs statistics Early approaches tried to recognize individual phonemes (phonetic units) and hence the words they formed. But not very successfully.
  • 10. Airplanes don’t flap their wings “Every time I fire a linguist, my system improves” Fred Jelinek 1985/1988
  • 11. Speech recognition pipeline Audio (signal processing, extract features) Acoustic model (features to phonemes) Pronunciation dictionary (lexicon) Language model (likelihood of words) Confusion lattice (possible options) Results > confidence score
  • 13. Hidden Markov Models HMMs model transition probabilities: Alice talks to Bob three days in a row and discovers that on the first day he went for a walk, on the second day he went shopping, and on the third day he cleaned his apartment. Alice has a question: what is the most likely sequence of rainy/sunny days that would explain these observations? http://en.wikipedia.org/wiki/Viterbi_algorithm
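The Alice-and-Bob question on slide 13 is answered by the Viterbi algorithm. The following is a minimal sketch, not the decoder used in any particular ASR engine. The start, transition and emission probabilities are assumptions taken from the Wikipedia article's version of the example; the slide itself gives no numbers.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    probability = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return probability, path


# Assumed parameters (from the Wikipedia example, not from the slide).
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

probability, path = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
print(path, probability)
```

With these assumed numbers the most likely weather sequence is Sunny, Rainy, Rainy, with probability of about 0.0134. In a speech recognizer the hidden states would be sub-phoneme units and the observations would be acoustic feature vectors, and implementations typically work with log probabilities to avoid underflow.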
  • 14. Training in action “training 3 (decision) trees to depth 20 from 1 million images takes about a day on a 1000 core cluster” http://research.microsoft.com/pubs/145347/BodyPartRecognition.pdf
  • 15. Characteristics of the field “the standard approach in our field [is] state-of-the-art system A is gently perturbed to create system B, resulting in a relative decrease in error rate of from 1 to 10%” Bourlard, Hermansky and Morgan. Towards increasing speech recognition error rates, 1996. Algorithmic, drawing on many disciplines (especially signal processing, statistics, linguistics, natural language processing) Empirical: lots of different algorithms and optimizations Almost no theory to describe why particular approaches work better than others, or how to find optimal solutions Massive infrastructure is a big advantage: large and varied data sets, significant computing resources.
  • 16. Audio issues Bandwidth Recording noise Ambient noise Reverberation Microphones Microphone arrays
  • 18. Acoustic models Generated from a corpus of recorded, transcribed audio Both artificial and natural corpora (TIMIT, Broadcast News, Meetings) Audio needs to match the application Audio bandwidth = ½ sampling rate Phone speech (sampled 8 kHz, bandwidth 4 kHz) Microphone speech (sampled 16 kHz, bandwidth 8 kHz, typical analysis on 130 Hz – 6800 Hz) There is a South African corpus of phone speech But no South African corpus of microphone speech
  • 19. The TIMIT audio corpus. Sentence: 0 47719 She had your dark suit in greasy wash water all year. Words: 2214 4428 she; 4428 8316 had; 7308 9691 your; 9691 15331 dark; 15331 19634 suit; 20929 22453 in; 22453 27697 greasy; 27697 32326 wash; 33120 36575 water; 37597 39644 all; 39644 43982 year. Phonemes: 0 2214 h#; 2214 3744 sh; 3744 4428 ax-h; 4428 5229 hv; 5229 6927 ae; 6927 7308 dcl; 7308 8316 jh; 8316 9691 axr; 9691 11697 dcl; 11697 12114 d; 12114 13075 aa; … Word and phoneme alignment by timecode. 630 speakers from 8 US dialect regions, speaking 10 sentences each.
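The alignment entries on slide 19 are start and end offsets in audio samples followed by a label. The sketch below converts them to seconds, assuming the 16 kHz sampling rate of the TIMIT recordings. The function name `parse_alignment` is hypothetical and not part of any TIMIT tooling.

```python
SAMPLE_RATE_HZ = 16000  # TIMIT audio is sampled at 16 kHz


def parse_alignment(lines, sample_rate=SAMPLE_RATE_HZ):
    """Turn 'start end label' lines into (start_sec, end_sec, label) tuples."""
    segments = []
    for line in lines:
        # maxsplit=2 keeps a multi-word label (the full sentence) intact.
        start, end, label = line.split(maxsplit=2)
        segments.append((int(start) / sample_rate, int(end) / sample_rate, label))
    return segments


# Three of the word-level entries shown on the slide.
words = parse_alignment(["2214 4428 she", "4428 8316 had", "9691 15331 dark"])
for start_sec, end_sec, label in words:
    print(f"{label}: {start_sec:.3f}s to {end_sec:.3f}s")
```

On these figures the word "she" occupies roughly 0.14 s to 0.28 s of the recording, which gives a sense of the time resolution an acoustic model is trained against.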
  • 20. Dialect regions The Nationwide Speech Project: A new corpus of American English dialects http://web.mit.edu/~nancyc/Public/Papers/Clopper_Pisoni_06_SC.pdf
  • 21. Crowdsourcing the creation of a GPL speech corpus and open source acoustic models (Sphinx, ISIP, Julius, HTK). An important effort, but still small (84 hours at Dec 2010) www.voxforge.org
  • 22. Language modelling. Pronunciation dictionary (lexicon): TOMATO T AH0 M EY1 T OW2; TOMATO(1) T AH0 M AA1 T OW2. Language model: a statistical sequence model of words. Trigram models (3 words) are common: -2.0998 YORK MONEY FUND; -0.0798 YORK HEDGE FUND; -0.1392 YORK MUTUAL FUND
  • 23. Statistical sequence models Truly Madly _____ Widely used Applications Auto-suggest Spell-checkers Lossless compression Machine translation Language models for speech recognition Probability of token w in context of preceding tokens c, e.g. P(deeply), given “truly madly”
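The probability on slide 23, P(w | c), can be estimated in its simplest form by counting trigrams. This is a minimal sketch using unsmoothed maximum-likelihood estimates on a made-up toy corpus; practical language models add smoothing and back-off so that unseen word sequences do not get probability zero.

```python
from collections import Counter


def train_trigrams(tokens):
    """Count trigrams and the two-word contexts that precede a third word."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    contexts = Counter((a, b) for a, b, _ in zip(tokens, tokens[1:], tokens[2:]))
    return trigrams, contexts


def trigram_probability(word, context, trigrams, contexts):
    """Maximum-likelihood estimate of P(word | context), context = (w1, w2)."""
    seen = contexts[tuple(context)]
    if seen == 0:
        return 0.0  # context never observed; a real model would back off
    return trigrams[(*context, word)] / seen


# Toy corpus (an assumption for illustration only).
corpus = "truly madly deeply truly madly deeply truly madly wordly".split()
trigrams, contexts = train_trigrams(corpus)

p_deeply = trigram_probability("deeply", ("truly", "madly"), trigrams, contexts)
p_wordly = trigram_probability("wordly", ("truly", "madly"), trigrams, contexts)
print(p_deeply, p_wordly)
```

In this toy corpus "truly madly" is followed by "deeply" twice and "wordly" once, so the estimates are 2/3 and 1/3.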
  • 24. Context is king Micro-context (e.g. bi- and trigrams) United Kingdom United Airlines United Arab Emirates Long-range context “Cricket and rugby are amongst the most popular sports in the United _________” (example from The Sequence Memoizer, Wood et al, 2011).
  • 26. Characteristics of language Power law frequency / rank distribution. Zipf’s law: “given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table” http://en.wikipedia.org/wiki/Zipf's_law Also more frequent words are shorter.
  • 27. How to get large language data sets Linguistic Data Consortium (by subscription, restricted) Some other more specialized corpora Microsoft (free, restricted) Google (Creative Commons license) Wikipedia (CC / GFDL license)
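The rank and frequency table that Zipf's law refers to on slide 26 can be built with a word count. This is a small sketch on a made-up token list; on a real corpus, the product of rank and frequency would be expected to stay roughly constant across the table.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, frequency) rows, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending frequency, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ranked, start=1)]


# Toy input (an assumption for illustration only).
table = rank_frequency("the the the the of of a".split())
for rank, word, freq in table:
    print(rank, word, freq, rank * freq)
```

The same table, truncated at a chosen rank, is one way to pick a fixed-size recognition vocabulary by word frequency.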
  • 28. Using Wikipedia as a language resource Download a snapshot (6G compressed) Convert from XML and markup to plain text Create dictionaries of target size (by word frequency) Create language models of target size Approximately equal in size to English Gigaword Corpus
  • 29. Grid computing for language modelling For when you need lots of RAM and/or lots of CPU www.sagrid.ac.za ICTS at UCT: Tim Carr, Andrew Lewis
  • 30. Accounting for context: LM adaptation Adapt a language model to more closely resemble the target speech Using related text for Topic modelling (vocabulary, concepts) Style-of-speech modelling “ok and um it's quite useful to have a very good diagnostic test of of acute hepatitis um you know to prevent kind of unnecessary um surgery um so hepatitis is really one um example of a cause of acute abdominal pain that doesn't need surgery”
  • 31. What’s special about lectures? Possibly helpful assumptions: Coherent topic(s) within a course One lecturer presents many lectures Specialized vocabulary Spoken speech different to written speech
  • 33. Using Wikipedia for LM adaptation Goal is to adapt a “standard” LM to be specific to the topic of the audio Start somewhere: title, keywords, text from slides Select a set of documents, adapt the LM Using wikipedia, select by similarity: identify the set of documents most closely related to the starting point or keywords
  • 34. Vector space modelling Represents documents as n-dimensional vectors (n terms) Document similarity established by comparing vectors, producing a similarity score. Gensim VSM toolkit: independent of corpus size (so good for wikipedia) LSI, LDA, TF-IDF measures. Create a “similarity crawler” to build a corpus of documents related to the topic
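The document comparison described on slide 34 can be illustrated with TF-IDF weights and cosine similarity. This is a self-contained sketch in plain Python and deliberately does not use the Gensim API. The three short documents are made up for illustration, and the weighting shown (raw term count times log inverse document frequency) is one common variant among several.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Represent each document as a sparse {term: tf-idf weight} vector."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical documents: a seed text and two candidate articles.
docs = [
    "hepatitis causes acute abdominal pain",
    "acute hepatitis diagnostic test",
    "cricket and rugby are popular sports",
]
seed, medical, sport = tfidf_vectors(docs)
print(cosine(seed, medical), cosine(seed, sport))
```

A similarity crawler in this spirit would keep the candidates that score highest against the seed text (here the medical document rather than the sports one) and add them to the adaptation corpus.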
  • 35. Metrics Perplexity (average number of guesses required) Word Error Rate (edit distance: insertions, deletions, substitutions) Information Retrieval: precision and recall What’s sufficient? Need to close an accuracy gap of Munteanu research: %WER for a transcript
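Word Error Rate, as listed on slide 35, is the word-level edit distance between the recognizer output and a reference transcript, divided by the number of reference words. The following is a minimal sketch that splits on whitespace and applies no text normalization; the function name is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("recognize speech", "wreck a nice beach"))
```

For the talk's title example, the two reference words need two substitutions and two insertions to become "wreck a nice beach", so the WER is 4 / 2 = 2.0. The metric can therefore exceed 100% when the hypothesis contains many inserted words.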
  • 40. screen output (VGA) www.opencastproject.org
  • 41. Licensing constraints Opencast Matterhorn is licensed under the ECL open source license (similar to Apache 2.0 license) Allows closed commercial derivatives Therefore cannot use software or datasets which are non-commercial or research-only. Can use Apache, BSD, LGPL, maybe GPL code and data.
  • 42. Speech recognition software ecosystem Licensing and patents Closed Proprietary FOSS Open
  • 44. Prior work in ASR for lectures MIT Lecture Browser (SUMMIT recognizer) U. Toronto / ePresence PhD prototype by Cosmin Munteanu (SONIC recognizer) ETH Zurich Integration of CMU Sphinx with REPLAY
  • 45. Work in progress Get consistently good quality audio recordings Implement dynamic language model adaptation Integrate into Opencast Matterhorn workflow Show transcript to users in UI, enable search Allow users to edit / improve transcript Use edits to improve recognition
  • 46. Speech recognition in the cloud Google Android: 70 CPU-years to build models Nexiwave: cloud service using GPUs Advantages: potentially massive computing resources Disadvantages: generic issues and risks with cloud services Bandwidth, lock-in, terms of service, data ownership and retention, etc.
  • 47. Find out more Truly Madly Wordly: my blog on open source language modelling and speech recognition: http://trulymadlywordly.blogspot.com CMU Sphinx http://cmusphinx.sourceforge.net/ Opencast http://www.opencastproject.org