2. Agenda
• Why automatic transcription
• State of the art: The transLectures project
• Automatic transcription of Lecture Recordings: The Opencast Project
• Notes & the near future
4. Why automatic transcription of video files?
• Accessibility
• Searching within a video file
• Searching across a video repository
• Topic identification
• …and much more
5. The transLectures project
• Development of an engine for Automated Speech Recognition (ASR)
for lectures & educational content
• Development of translation tools for that content
• Implementation
• Case studies: Videolectures.NET & Polimedia (UPV video repository)
• Real-life evaluation
• Integration into Opencast
http://www.translectures.eu
6. transLectures partners
12 Nov 2013
No.  Name                                         Country
1    Universitat Politècnica de València (MLLP)   Spain
2    Xerox SAS                                    France
3    Institut Jožef Stefan                        Slovenia
3+   Knowledge for All Foundation                 UK
4    RWTH Aachen University                       Germany
5    EML – European Media Laboratory              Germany
6    DDS – Deluxe Digital Studios                 UK
36 Months
November 2014
17. Beyond transLectures
WER (word error rate, %)
Language     M10    M17
Dutch        25.7   24.5
Italian      21.2   17.7
Portuguese   45.9   43.0
Spanish      15.9   14.4
Estonian     N/A    27.1
French       N/A    22.7
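For readers unfamiliar with the metric in the table above: WER is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used by the project; the function name and the sample sentences are illustrative assumptions.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, using word-level edit distance.

    Illustrative helper (not from transLectures): words are split on
    whitespace and compared exactly, with no text normalization.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions are needed to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of five reference words.
    print(word_error_rate("the lecture starts at nine",
                          "the lecture start at nine"))
```

Because insertions count as errors, WER computed this way can exceed 100 when the hypothesis is much longer than the reference.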
19. The Opencast Community is…
Universities, companies and people:
• concerned with academic video
• attracted to the Opencast values of openly exchanging ideas,
experience, knowledge and code
• committed to building and maintaining a robust, flexible, high-quality
open source lecture capture and academic video management
solution.
Now also part of
21. Who uses Opencast?
Around the world, with strong adoption in Europe especially.
43 adopters with public information (May 2014)
30+ commercial partner clients
http://opencast.org/matterhorn-adopters
23. Indexing in Opencast
• Opencast has built-in OCR indexing capabilities
Video (slides) -> OCR (hunspell) -> Word list filter -> Apache Lucene search server
• New operations can be added
Video (slides) -> transcription (tL) -> Apache Lucene search server
or
Video (slides) -> OCR (hunspell) -> transcription (tL) -> Word list filter -> Apache Lucene search server
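The chains above can be read as a sequence of small steps, each consuming the output of the previous one. The sketch below mimics the "transcription -> word list filter -> search index" chain in plain Python, only to make the data flow concrete. It does not use Opencast, transLectures or Lucene code: every function name, the stop-word list and the hard-coded transcript are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative stop words; a real word list filter would be configurable.
STOP_WORDS = {"the", "a", "to", "on", "into", "for", "that", "can"}


def fake_transcription(video_id: str) -> List[Tuple[float, str]]:
    """Stand-in for an ASR step: returns (start time in seconds, text)."""
    return [
        (0.0, "Welcome to the lecture on speech recognition."),
        (42.5, "Speech recognition turns audio into text."),
        (90.0, "Opencast can index that text for search."),
    ]


def word_list_filter(
    segments: List[Tuple[float, str]]
) -> List[Tuple[float, List[str]]]:
    """Lowercase, strip punctuation and drop stop words per segment."""
    filtered = []
    for start, text in segments:
        words = [w.strip(".,:;!?").lower() for w in text.split()]
        filtered.append((start, [w for w in words if w and w not in STOP_WORDS]))
    return filtered


def build_index(
    segments: List[Tuple[float, List[str]]]
) -> Dict[str, List[float]]:
    """Toy inverted index: word -> start times of segments containing it."""
    index: Dict[str, List[float]] = defaultdict(list)
    for start, words in segments:
        for word in words:
            if start not in index[word]:
                index[word].append(start)
    return dict(index)


def search(index: Dict[str, List[float]], term: str) -> List[float]:
    """Return the time offsets at which a term is spoken."""
    return index.get(term.lower(), [])


if __name__ == "__main__":
    index = build_index(word_list_filter(fake_transcription("demo")))
    print(search(index, "recognition"))
```

Keeping the start time with each segment is what lets a search hit point to a position inside the video rather than to the video as a whole.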
24. Why do I need an indexing server?
• Powerful, Accurate and Efficient Search Algorithms
• ranked searching -- best results returned first
• many powerful query types: phrase queries, wildcard queries, proximity
queries, range queries and more
• fielded searching (e.g. title, author, contents)
• sorting by any field
• multiple-index searching with merged results
• allows simultaneous update and searching
• flexible faceting, highlighting, joins and result grouping
• fast, memory-efficient and typo-tolerant suggesters
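To make the query types listed above more concrete, the following sketch builds query strings in the syntax of Lucene's classic query parser (fielded, phrase, proximity, wildcard and range queries). The helper functions and the field names `title`, `text` and `created` are assumptions made for illustration; the fields actually available depend on how a given index is configured, and no escaping of special characters is attempted here.

```python
def fielded(field: str, term: str) -> str:
    """Restrict a term to one field, e.g. title:opencast."""
    return f"{field}:{term}"


def phrase(text: str) -> str:
    """Match the words as an exact phrase."""
    return f'"{text}"'


def proximity(text: str, distance: int) -> str:
    """Match the words within `distance` positions of each other."""
    return f'"{text}"~{distance}'


def wildcard(prefix: str) -> str:
    """Match any term starting with the given prefix."""
    return f"{prefix}*"


def closed_range(field: str, start: str, end: str) -> str:
    """Inclusive range query on a field."""
    return f"{field}:[{start} TO {end}]"


if __name__ == "__main__":
    # Hypothetical combined query over assumed field names.
    query = " AND ".join([
        fielded("title", "opencast"),
        fielded("text", proximity("speech recognition", 5)),
        closed_range("created", "20130101", "20141231"),
    ])
    print(query)
```

Strings like these would be handed to the search server's query parser; ranking, sorting and faceting are then applied on the server side.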
26. Notes & the near future
• ASR technology is good enough for automated transcription of videos
… given good enough sound quality
• There are lecture recording systems that allow transcriptions to be
plugged in for searching
… like Opencast
• There are still things to solve
• Transcription speed (progressing well)
• Topic identification
• Adding more languages
28. Learning more…
transLectures
http://translectures.eu
Video in a multilingual context (EMMA)
http://association.media-and-learning.eu/portal/resource/ml-webinar-video-multilingual-context
Opencast State of the Project
http://lanyrd.com/2015/apereo/sdmpry/