Automatic transcription of video files
Carlos Turró
Universitat Politècnica de València
Agenda
• Why automatic transcription
• State of the art: The transLectures project
• Automatic transcription of Lecture Recordings: The Opencast Project
• Notes & the near future
Why automatic transcription of video files?
• Accessibility
• Searching within a video file
• Searching across a video repository
• Topic identification
• …and much more
The transLectures project
• Development of an Automatic Speech Recognition (ASR) engine for lectures & educational content
• Development of translation tools for that content
• Implementation
• Case studies: Videolectures.NET & Polimedia (UPV video repository)
• Real-life evaluation
• Integration into Opencast
http://www.translectures.eu
transLectures partners (12 Nov 2013)
No.  Name                                          Country
1    Universitat Politècnica de València (MLLP)    Spain
2    Xerox SAS                                     France
3    Institut Jožef Stefan                         Slovenia
3+   Knowledge for All Foundation                  UK
4    RWTH Aachen University                        Germany
5    EML – European Media Laboratory               Germany
6    DDS – Deluxe Digital Studios                  UK
36 Months
November 2014
Statistical transcription (and translation)
[Diagram: Sound -> ASR Engine, which uses an Acoustic Model and a Language Model]
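The two model boxes in the diagram correspond to the two factors of the standard statistical (noisy-channel) formulation of ASR, given here as general background rather than as the exact objective of the transLectures engine. With X the acoustic features extracted from the sound and W a candidate word sequence, the engine searches for:

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{P(X \mid W)\,P(W)}{P(X)}
        = \arg\max_{W} \underbrace{P(X \mid W)}_{\text{acoustic model}}\;
                       \underbrace{P(W)}_{\text{language model}}
```

P(X) can be dropped because it does not depend on W, so the decoder only needs the acoustic model and the language model.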
[Diagram: Manually transcribed voice -> Modeling Engine -> Acoustic Model and Language Model]
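The second diagram shows the models being estimated from manually transcribed speech. As a toy illustration of the language-model half only, the sketch below estimates an add-one (Laplace) smoothed bigram model from a handful of transcribed sentences. The function names and the tiny corpus are hypothetical; real systems train on far larger corpora with better smoothing, and nothing here reflects the transLectures modeling engine itself.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Estimate an add-one smoothed bigram model P(word | previous word)
    from manually transcribed sentences."""
    histories = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        histories.update(tokens[:-1])                 # counts of each history word
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # counts of each word pair
    # Possible successors: every word seen in training plus the end marker.
    vocab = {w for s in sentences for w in s.lower().split()} | {"</s>"}
    v = len(vocab)

    def prob(prev, word):
        # Add-one smoothing keeps unseen pairs at a small non-zero probability.
        return (bigrams[(prev, word)] + 1) / (histories[prev] + v)

    return prob


def sentence_prob(prob, sentence):
    """Probability of a whole sentence under the bigram model."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p *= prob(prev, word)
    return p


corpus = ["the lecture starts now", "the lecture ends now"]  # hypothetical transcripts
lm = train_bigram_lm(corpus)
print(sentence_prob(lm, "the lecture starts now") > sentence_prob(lm, "now starts lecture the"))
```

A well-formed word order scores higher than a scrambled one, which is exactly the preference the ASR engine exploits when choosing between acoustically similar hypotheses.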
Architecture of transLectures
[Diagram components: Lecture, Slides, Extra content, Language Model, Result (Transcription, Translation), Intelligent interaction]
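The architecture lists slides and extra content next to the language model, but the slide does not say how they are combined. One common technique, shown here purely as an assumed illustration and not as the transLectures implementation, is to train a small lecture-specific model on the slide text and linearly interpolate it with a general model. The unigram models, the vocabulary, the texts and the weight `lam` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def unigram_model(text: str, vocab: Set[str]) -> Dict[str, float]:
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(w for w in text.lower().split() if w in vocab)
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}


def interpolate(general: Dict[str, float], lecture: Dict[str, float],
                lam: float) -> Dict[str, float]:
    """Linear interpolation P(w) = lam * P_general(w) + (1 - lam) * P_lecture(w).
    Both models must share the same vocabulary."""
    return {w: lam * general[w] + (1 - lam) * lecture[w] for w in general}


vocab = {"the", "weather", "model", "hidden", "markov"}
general = unigram_model("the weather the model the weather", vocab)  # hypothetical general text
slides = unigram_model("hidden markov model markov", vocab)          # hypothetical slide text
adapted = interpolate(general, slides, lam=0.7)
print(round(general["markov"], 3), round(adapted["markov"], 3))  # 0.091 0.164
```

Terms that appear on the slides ("markov") become more probable after adaptation, which helps the recognizer with lecture-specific vocabulary that a general model rarely sees.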
Languages
• Transcription (ASR)
  • English (EN)
  • Slovenian (SL)
  • Spanish (ES)
• Translation (MT)
  • EN>SL, SL>EN
  • EN>ES, ES>EN
  • EN>FR
  • EN>DE
Transcription and Translation Platform
• API
• Post-editing web interface (in HTML5)
Example video
• https://media.upv.es/?id=b444d12e-db23-9a4f-9b3b-d1d9275d4cb4
Scientific evaluations
• WER = Word Error Rate
• The lower, the better
• A human transcriber usually has a WER of around 12%
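WER is the word-level edit distance between the system output and a reference transcript, divided by the number of reference words: WER = (S + D + I) / N, with S substitutions, D deletions and I insertions. The sketch below computes it with dynamic programming. The function name is illustrative, and tokenization is plain whitespace splitting; real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% for a very noisy hypothesis; it is a rate relative to the reference length, not a fraction of words that are right or wrong.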
Beyond transLectures
WER (%) by language
Language     M10    M17
Dutch        25.7   24.5
Italian      21.2   17.7
Portuguese   45.9   43.0
Spanish      15.9   14.4
Estonian     N/A    27.1
French       N/A    22.7
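When reading the WER table, it helps to separate absolute improvement (percentage points) from relative improvement. The short script below uses only the figures in the table and assumes nothing about M10 and M17 beyond their being two successive measurement points; the dictionary layout and function name are illustrative.

```python
# WER (%) per language at M10 and M17, taken from the table; None marks N/A.
wer = {
    "Dutch": (25.7, 24.5),
    "Italian": (21.2, 17.7),
    "Portuguese": (45.9, 43.0),
    "Spanish": (15.9, 14.4),
    "Estonian": (None, 27.1),
    "French": (None, 22.7),
}


def relative_reduction(before: float, after: float) -> float:
    """Relative WER reduction in percent: 100 * (before - after) / before."""
    return 100.0 * (before - after) / before


for language, (m10, m17) in wer.items():
    if m10 is None:
        print(f"{language}: no M10 figure")
    else:
        print(f"{language}: {m10 - m17:.1f} points absolute, "
              f"{relative_reduction(m10, m17):.1f}% relative")
```

Italian shows the largest relative gain (about 16.5%), while Portuguese improves by 2.9 points but remains the hardest language in the table.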
The Opencast Community is…
Universities, companies and people:
• concerned with academic video
• attracted to the Opencast values of openly exchanging ideas,
experience, knowledge and code
• committed to building and maintaining a robust, flexible, high-quality
open source lecture capture and academic video management
solution.
Now also part of the Apereo Foundation.
Full-featured Lecture Recording ecosystem
Who uses Opencast?
Used around the world, with especially strong adoption in Europe.
43 adopters with public information (May 2014)
30+ commercial partner clients
http://opencast.org/matterhorn-adopters
Indexing in Opencast
• Opencast has built-in OCR indexing capabilities:
  Video (slides) -> OCR (hunspell) -> Word list filter -> Apache Lucene search server
• New operations can be added:
  Video (slides) -> transcription (transLectures) -> Apache Lucene search server
  or
  Video (slides) -> OCR (hunspell) -> transcription (transLectures) -> Word list filter -> Apache Lucene search server
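The indexing pipelines are chains of stages, each turning text segments into cleaner text before indexing. The sketch below models one reading of the combined variant (OCR and transcription output merged, filtered against a word list, then indexed) with plain Python functions. It rests on stated assumptions: OCR and transcription output are taken as already-extracted (start time, text) pairs, a plain set stands in for the hunspell-based word list, and a small dictionary-based inverted index stands in for Lucene. None of these function names are Opencast APIs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

# (start time in seconds, text): assumed output format of the OCR and ASR stages
Segment = Tuple[float, str]


def merge_sources(*sources: List[Segment]) -> List[Segment]:
    """Combine OCR and transcription segments into one time-ordered list."""
    return sorted(seg for source in sources for seg in source)


def word_list_filter(dictionary: Set[str]) -> Callable[[List[Segment]], List[Segment]]:
    """Build a stage that lowercases text and keeps only dictionary words,
    dropping misrecognized tokens."""
    def stage(segments: List[Segment]) -> List[Segment]:
        return [(start, " ".join(w for w in text.lower().split() if w in dictionary))
                for start, text in segments]
    return stage


def build_index(segments: Iterable[Segment]) -> Dict[str, List[float]]:
    """Tiny inverted index: word -> start times of the segments containing it."""
    index: Dict[str, List[float]] = defaultdict(list)
    for start, text in segments:
        for word in set(text.split()):
            index[word].append(start)
    return dict(index)


# Hypothetical stage outputs for a two-slide lecture (note the OCR typo "algoritm").
ocr = [(0.0, "Hidden Markov Models"), (60.0, "Viterbi algoritm")]
asr = [(5.0, "today we talk about hidden markov models"),
       (65.0, "the viterbi algorithm finds the best path")]
dictionary = {"hidden", "markov", "models", "viterbi", "algorithm", "today",
              "we", "talk", "about", "the", "finds", "best", "path"}

index = build_index(word_list_filter(dictionary)(merge_sources(ocr, asr)))
print(index["viterbi"])  # [60.0, 65.0]
```

Keeping each stage a separate function mirrors the idea that new operations can be slotted into the chain: a transcription stage adds spoken words the slides never show, and the word list filter removes OCR noise before it reaches the index.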
Why do I need an indexing server?
• Powerful, Accurate and Efficient Search Algorithms
• ranked searching -- best results returned first
• many powerful query types: phrase queries, wildcard queries, proximity
queries, range queries and more
• fielded searching (e.g. title, author, contents)
• sorting by any field
• multiple-index searching with merged results
• allows simultaneous update and searching
• flexible faceting, highlighting, joins and result grouping
• fast, memory-efficient and typo-tolerant suggesters
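To make "ranked searching" concrete for transcripts, here is a deliberately small sketch that scores timestamped transcript segments against a query with a basic TF-IDF weighting and returns the best matches first, so a player could jump to that point in the video. Lucene's actual scoring (BM25 by default in recent versions) is more elaborate; this is an assumed simplification, and `rank_segments` and the sample segments are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Segment = Tuple[float, str]  # (start time in seconds, transcript text)


def rank_segments(query: str, segments: List[Segment]) -> List[Tuple[float, float]]:
    """Score segments with a TF-IDF sum over query terms and return
    (score, start_time) pairs, best first. Segments matching no term are dropped."""
    docs = [Counter(text.lower().split()) for _, text in segments]
    n = len(docs)
    terms = query.lower().split()
    # Smoothed inverse document frequency, log(1 + n / df), stays positive
    # even for terms that occur in every segment.
    idf = {}
    for term in set(terms):
        df = sum(1 for d in docs if term in d)
        if df:
            idf[term] = math.log(1 + n / df)
    results = []
    for (start, _), tf in zip(segments, docs):
        score = sum(tf[t] * idf[t] for t in terms if t in idf)
        if score > 0:
            results.append((score, start))
    return sorted(results, reverse=True)


segments = [(0.0, "introduction to the course"),
            (120.0, "the viterbi algorithm uses dynamic programming"),
            (300.0, "viterbi decoding worked example viterbi")]
for score, start in rank_segments("viterbi decoding", segments):
    print(f"{start:>6.1f}s  score={score:.2f}")
```

The segment at 300 s ranks first because it mentions "viterbi" twice and also contains the rarer term "decoding"; the introduction never matches and is not returned.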
Demo on searching
• https://media.upv.es
Notes & the near future
• ASR technology is good enough for automatic transcription of videos
  … provided the sound quality is good enough
• Some lecture recording systems let you plug in transcriptions for searching
  …like Opencast
• There are still things to solve:
  • Transcription speed (good progress being made)
  • Topic identification
  • Adding more languages
Thanks!
Questions?
Learning more
transLectures
http://translectures.eu
Video in a multilingual context (EMMA)
http://association.media-and-learning.eu/portal/resource/ml-webinar-video-multilingual-context
Opencast State of the Project
http://lanyrd.com/2015/apereo/sdmpry/
