Hidden treasures lost forever? Speech technology for the disclosure of Dutch audiovisual archives
Mies Langelaar and Willemijn Heeren
Contents
Introduction
The approach
The test case
Searching the RR archives I: Minimal content descriptions per hour of data
Searching the RR archives II ? ? ? ? ?
Main problems
Towards solutions …
AV Collection of Rotterdam Municipal Archives
Work in progress
Work in progress (2)
How to ensure long term sustainability
Trusted Digital Repository (diagram)
Components: Ingest Toolkit, Feeder System, Workflow, Job Queue, File Storage, Characterisation, Preservation Planning, Migration, Technical Registry, Active Preservation, Passive Preservation, Data Management, Access, Reporting, Storage Adaptor, Preservation Controller, Workflow Controller, Metadata Store
Roles: User, Administrator, Archivist
How to ensure long term access to data?
Disclosure through speech technology
AV archiving workflow (diagram): Content production, Indexing, End user; CHoral: ASR, IR, UI
Research
Alignment
Speech signal + Typed transcript ("Landgenooten waar ik enkele …")

Begin frame #   End frame #   Word
00000           54400         -silence-
54400           65280         Landgenooten
65280           69120         Waar
69120           73600         Ik
73600           79520         Enkele
…               …             …
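The alignment table above is in effect a small word-level index. As a minimal sketch of how such an index could be queried, the Python below copies the five rows from the slide and supports two lookups: where a word was spoken, and which word covers a given frame. The names `AlignedWord`, `find_word` and `word_at` are hypothetical, and treating the end frame as exclusive is an assumption; the slide does not say what unit a frame is, so no conversion to seconds is attempted.

```python
from bisect import bisect_right
from typing import NamedTuple


class AlignedWord(NamedTuple):
    begin: int  # begin frame number, as on the slide
    end: int    # end frame number (assumed exclusive here)
    word: str


# Rows copied from the alignment table on the slide.
INDEX = [
    AlignedWord(0, 54400, "-silence-"),
    AlignedWord(54400, 65280, "Landgenooten"),
    AlignedWord(65280, 69120, "Waar"),
    AlignedWord(69120, 73600, "Ik"),
    AlignedWord(73600, 79520, "Enkele"),
]


def find_word(index, query):
    """Return the (begin, end) span of every entry matching the query, ignoring case."""
    q = query.lower()
    return [(e.begin, e.end) for e in index if e.word.lower() == q]


def word_at(index, frame):
    """Return the entry covering the given frame, or None if no entry covers it."""
    begins = [e.begin for e in index]      # entries are sorted by begin frame
    i = bisect_right(begins, frame) - 1    # last entry starting at or before frame
    if i >= 0 and frame < index[i].end:
        return index[i]
    return None
```

A search for "waar" would return the single span from the table, which is what lets a retrieval system play back a fragment rather than the whole recording.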
Automatic speech recognition (diagram)
Pre-processing: classification speech/nonspeech; segmentation of speakers
Speech recognition: acoustic model (50+ hours of audio), language model (250-500 M words), pronunciation dictionary
2nd recognition with adapted models
Output: word level index
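Types of word level indexes
ASR: Er is een bekend beeld voor veel ouders de grote show in onveilige situatie voor de school [It is a familiar sight for many parents the big show in unsafe situation in front of the school]
TXT: ‘t is een bekend beeld voor veel ouders. De chaotische en onveilige situatie voor de school [It is a familiar sight for many parents. The chaotic and unsafe situation in front of the school]
“D’66 is z’n ene zetel kwijt” [“D’66 has lost its one seat”]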
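The ASR and TXT lines on this slide can be compared with the word error rate that the editor's notes use as a yardstick (30 to 40%). The sketch below is one common way to compute it, as word-level edit distance divided by the number of reference words; it is not taken from the presentation. The function name `word_error_rate` is hypothetical, and the two token lists were lowercased and stripped of punctuation by hand, which is an assumption about normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word deleted
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


# Slide example, normalized by hand (lowercase, punctuation removed).
TXT = ("'t is een bekend beeld voor veel ouders de chaotische en "
       "onveilige situatie voor de school").split()
ASR = ("er is een bekend beeld voor veel ouders de grote show in "
       "onveilige situatie voor de school").split()

print(f"WER = {word_error_rate(TXT, ASR):.2f}")
```

With this normalization the example has 4 errors against 16 reference words, so the script prints `WER = 0.25`. The content words "onveilige", "situatie" and "school" all survive, which illustrates why such a transcript can still serve as a search index.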
Discussion ASR
User interface development
CHoral speech technology for GAR
Discussion

Iasa Presentatie


Editor's Notes

  1. The dense descriptions, generally per hour of audio, lead to large chunks for user exploration when results are found for a given query.
  2. The undisclosed part of the collection cannot be accessed, and its content is largely unknown.
  3. For ‘disclosure’ the speech technology researchers want “to automatically generate a time-stamped content description”. Automation reduces the human annotation effort, and because the annotations are time-stamped, words are linked to locations in the audio recording, so that fragments can be retrieved in addition to entire audiovisual documents. The technology used for disclosure depends on (1) the available metadata, and (2) the availability of context documents, i.e. documents that are related either directly to the recording or to the topic of the recording. When a transcript of the recording is available, the words in the transcript can be aligned to the audio. During this process the locations of the known words are determined in the audio signal. The result is a fairly accurate index of which word was said where in the audio. When it is unknown exactly what was said in the audio recording, automatic speech recognition (ASR) can be used to generate hypotheses of what was said where in the recording. Context documents can be valuable here to improve the models used for speech recognition. Speech recognizers generate output that is generally not without errors, but up to word error rates of 30 to 40% (that is, 3 or 4 out of every ten words were recognized incorrectly) the automatically generated content descriptions may successfully be used as search indexes. There are two reasons for this. First, speech is redundant: when something is on-topic it will be referred to more than once. Second, many of the words that have a high risk of being misrecognized contribute relatively little to the information content, e.g. prepositions (in, at) and determiners (a, the).
  4. How does CHoral technology fit into the archiving workflow? This of course is a simplified representation, but it gives a general idea. After content has been produced, it is transferred to the archives for preservation. The data are being stored, archivists index the collection, and users may search the index for recordings of possible interest. <start animation> CHoral uses the recordings and the existing metadata to give the user a new kind of access. In addition to searching the catalogue for recordings that can be listened to at the archive’s listening room, search results come with audio fragments that can be listened to online, e.g., from the searcher’s home or work location. <animation 2> The technology consists of automatic speech recognition for index generation, information retrieval technology for finding relevant audio fragments in the collection, and of new user interface components that support interaction with the audio fragments.
  5. During alignment the locations of known words are determined in the speech signal. By matching the acoustics in the speech signal to the expected acoustics of individual words, each word in the transcript is matched to the location in the audio where it is most likely to occur. This results in an index that gives a word position for each word in the transcript. The accuracy of the resulting index is very high.
  6. The following type of speech recognition system is used. Before the actual recognition process starts, some pre-processing is done. This consists of (1) classifying the audio document into speech and non-speech segments, so that the parts of the recordings that do not contain speech (e.g., music, street noise) are not fed to the recognition system, and (2) optionally segmenting the speech into coherent chunks per speaker, so that models may be adapted to individual speakers. The speech recognition system itself consists of three components: (1) an acoustic model that models the different speech sounds of a language, (2) a language model that models which sequences of words are likely, and (3) a dictionary that prescribes which speech sounds a word is made up of. To develop an acoustic model, over 50 hours of annotated speech materials are needed. To develop a language model, texts of hundreds of millions of words are used. The output of the ASR system is a word level index, i.e. a hypothesis of which words were spoken where in the audio document. Instead of running the recognition process just once, the output of the first round may be used to better choose the models used during recognition. Therefore, a so-called second pass is often run with adapted models to arrive at a more accurate index.
  7. The output of an ASR system can take several forms. The best-known form of output is sentences, reflecting the most likely word sequence that was recognized by the system. For indexing purposes, however, other output types should be considered. One candidate is the lattice structure, which stores not only the most likely word sequence for a certain fragment of audio, but also alternative words that are likely at certain positions. In this way, alternatives are kept available.
  8. For successful uptake of the technology some investments are needed. Thanks to the ongoing digitization process and the standardization of formats, audio documents should increasingly be fit for automatic processing without further adaptations. The quality of automatic annotations depends on the quality of the ASR models, and those can be tuned to different domains using accurate transcriptions of representative samples and/or (large amounts of) text data on the same or a strongly related topic. But when an ASR system is used to automatically generate time-stamped content descriptions, should those descriptions be validated by archivists? And if so, how?
  9. A surrogate is a textual or visual representation of the content of a spoken word document that searchers can use to assess the document’s contents before they decide to listen to the audio.
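Note 7 above contrasts sentence output with structures that keep alternative words available. The Python below is a simplified sketch of that idea, not the CHoral implementation: instead of a full lattice (a graph of word arcs) it uses a flat list of time slots, each holding alternative words with a score, and builds an inverted index over all alternatives. The names `SLOTS`, `best_path`, `build_index` and `search` are hypothetical, and every time and score in the data is invented for illustration.

```python
from collections import defaultdict

# Each slot: (begin seconds, end seconds, {alternative word: score}).
# All times and scores are invented for illustration only.
SLOTS = [
    (0.0, 0.4, {"de": 0.9, "die": 0.1}),
    (0.4, 1.1, {"grote": 0.5, "chaotische": 0.4, "grootste": 0.1}),
    (1.1, 1.6, {"onveilige": 0.8, "veilige": 0.2}),
    (1.6, 2.2, {"situatie": 1.0}),
]


def best_path(slots):
    """Sentence-style output: the highest-scoring word of every slot."""
    return [max(alts, key=alts.get) for _, _, alts in slots]


def build_index(slots):
    """Inverted index over every alternative: word -> [(begin, end, score), ...]."""
    index = defaultdict(list)
    for begin, end, alts in slots:
        for word, score in alts.items():
            index[word].append((begin, end, score))
    return index


def search(index, word, min_score=0.0):
    """Return the hits for a word whose score is at least min_score."""
    return [hit for hit in index.get(word, []) if hit[2] >= min_score]
```

In this toy data the sentence output contains "grote" and not "chaotische", so a search over the sentence alone would miss the word that was actually spoken, while the index over alternatives still finds it. A score threshold such as `min_score` is one way to trade recall against false hits.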