1. Audio Information for Hyperlinking of TV Content
Petra Galuščáková and Pavel Pecina
galuscakova@ufal.mff.cuni.cz
Faculty of Mathematics and Physics
Charles University in Prague
SLAM Workshop, 30 October 2015
2. Hyperlinking TV Content
● Our main objective: create hyperlinks
● Retrieve segments similar to a given query segment from the collection of television programmes.
● Benefits:
● Recommendation – bring additional entertainment value
● Exploratory search – explore the topic and enable users to find unexplored connections
3. BBC Broadcast Data
● Subtitles
● Three ASR transcripts
● LIMSI
– word variants occurring at the same time
– confidence of each word variant
● TED-LIUM
– confidence of each word
● NST-Sheffield
● Metadata
● Prosodic features
4. System Description
● Retrieve relevant segments
● Divide documents into 60-second long segments
● A new segment starts every 10 seconds
● Index textual segments
● Post-filter retrieved segments
● A query segment is transformed into a textual query
● Terrier IR Framework
● Speech retrieval
● Suffers from problems associated with ASR systems
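The sliding-window segmentation described above (60-second segments, a new one starting every 10 seconds) can be sketched as follows. This is a minimal illustration, not the system's code; the function name and the handling of the final window are assumptions.

```python
def sliding_segments(duration, seg_len=60, shift=10):
    """Return (start, end) boundaries of overlapping segments:
    seg_len-second windows, with a new window starting every shift seconds.
    End times past the programme duration would be clipped in practice."""
    last_start = max(0, int(duration) - seg_len)
    starts = range(0, last_start + 1, shift)
    return [(s, s + seg_len) for s in starts]

# A 100-second programme yields windows starting at 0, 10, 20, 30 and 40:
print(sliding_segments(100))
```

Each window's text would then be indexed as a separate document.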
5. Speech Retrieval Problems
1. Restricted vocabulary
● Data and query segment expansion
● Combination of transcripts
2. Lack of reliability
● Utilizing only the most confident words of the transcripts
● Using confidence score
3. Lack of content
● Audio music information
● Acoustic similarity
6. 1. Restricted Vocabulary
● The number of unique words in the transcripts is almost three times smaller than in the subtitles.
● Low-frequency words are expected to be the most informative for information retrieval.
● Expand data and query segments
● Metadata
● Content surrounding the query segment
● Combine different transcripts
7. Data and Query Segment Expansion
● Metadata
● Concatenate each data and query segment with the metadata of the corresponding file.
● Title, episode title, description, short episode synopsis, service name, and program variant
● Content surrounding the query segment
● Use 200 seconds before and after the query segment.
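The expansion steps above amount to concatenating the segment text with metadata fields and the surrounding context. A minimal sketch; the field names and the helper are illustrative, not the original implementation:

```python
META_FIELDS = ("title", "episode_title", "description",
               "synopsis", "service_name", "variant")

def expand_segment(segment_text, metadata, context_before="", context_after=""):
    """Expand a data or query segment with programme metadata and the
    text of the surrounding context (e.g. 200 s before and after)."""
    meta_text = " ".join(metadata[k] for k in META_FIELDS if metadata.get(k))
    parts = [context_before, segment_text, context_after, meta_text]
    return " ".join(p for p in parts if p)
```

The expanded text replaces the original segment text both at indexing time (data expansion) and at query time (query expansion).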
10. Data and Query Segment Expansion Results
● The improvement is significant in terms of both measures.
● Expansion using metadata and context may substantially mitigate the restricted-vocabulary problem.
● The highest MAP-tol score was achieved on the LIUM transcript
● Even though these transcripts have a relatively high WER
● Metadata and context produce a much higher relative improvement for the automatic transcripts than for the subtitles.
● The MAP-bin score corresponds with the WER
12. Transcripts Combination
● The combination is generally helpful.
● Even despite the high score achieved by the LIUM transcripts
● The overall highest MAP-bin score was achieved using the union of the LIMSI and NST transcripts.
● It outperforms the results achieved with the subtitles
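One way to realise a union combination at indexing time is to merge the term sets of the individual transcripts for each segment. A minimal sketch under assumed whitespace tokenisation; the function name is illustrative, not the authors' implementation:

```python
def combine_transcripts_union(*transcripts):
    """Combine several ASR transcripts of the same segment by taking
    the union of their (lowercased) terms. Concatenating the raw
    transcripts would be an alternative combination strategy."""
    terms = set()
    for transcript in transcripts:
        terms.update(transcript.lower().split())
    return " ".join(sorted(terms))

print(combine_transcripts_union("the cat sat", "the cat SAT down"))
# → "cat down sat the"
```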
13. 2. Transcript Reliability
● WER
● LIMSI: 57.5%
● TED-LIUM: 65.1%
● NST-Sheffield: 58.6%
● Word variants
● Word confidence
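WER figures like those above are the word-level edit distance between the ASR hypothesis and the reference transcript, divided by the reference length. A minimal reference implementation of the standard measure:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[-1][-1] / len(ref)

print(word_error_rate("the cat sat down", "the cat sit"))  # → 0.5
```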
14. Word Variants
● Compare utilization of only the first, most reliable word variant versus all word variants in the LIMSI transcripts.
15. Word Confidence
● Only use words with high confidence scores
● Only keep words from the LIMSI and LIUM transcripts with a confidence score higher than a given threshold
● This increased both scores on the development set
● But did not outperform the fully transcribed test data
● We also experimented with voting
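The confidence-based filtering above can be sketched as a simple threshold over (word, confidence) pairs. The threshold value here is illustrative, not the one tuned in the experiments:

```python
def filter_by_confidence(words, threshold=0.7):
    """Keep only the words whose ASR confidence exceeds the threshold.
    `words` is a list of (word, confidence) pairs."""
    return [w for w, conf in words if conf > threshold]

print(filter_by_confidence([("hello", 0.9), ("world", 0.4), ("again", 0.8)]))
# → ['hello', 'again']
```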
16. 3. Lack of Content
● We only use the textual content of the subtitles/transcripts
● A wide range of acoustic attributes could also be utilized: applause, music, shouts, explosions, whispers, background noise, …
● Acoustic fingerprinting
● Acoustic similarity
18. Acoustic Fingerprinting
● 1) Minimize noise in each query segment
● Query segments were divided into 10-second long passages; a new passage was created each second
● 2) Submit sub-segments to Doreso API service
● 3) Retrieve song title, artist and album
● Development set: 4 queries out of 30
● Test set: 10 queries out of 30
● 4) Concatenate the title, artist, and album name with the text of the query segment
● Both retrieval scores drop
19. Acoustic Similarity – Motivation
● Retrieve identical acoustic segments
● E.g. signature tunes and jingles
● Detect semantically related segments
● E.g. segments containing action scenes and music
20. Acoustic Similarity
● Calculate similarity between data and query vector sequences of prosodic features
● Find the most similar sequences near the beginning
● Linearly combine the highest acoustic similarity with the text-based similarity score
● MAP-bin: 0.2689 → 0.2687
● MAP-tol: 0.2465 → 0.2473
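The linear combination of the two similarity scores can be sketched as follows. The interpolation weight and the toy candidate scores are illustrative, not the tuned values from the experiments:

```python
def fused_score(text_score, acoustic_score, alpha=0.9):
    """Linearly interpolate the text-based and the acoustic similarity.
    alpha close to 1.0 means the text-based score dominates."""
    return alpha * text_score + (1.0 - alpha) * acoustic_score

# Re-rank candidate segments by the fused score (toy data):
candidates = {"seg1": (0.8, 0.1), "seg2": (0.7, 0.9)}  # (text, acoustic)
ranked = sorted(candidates,
                key=lambda s: fused_score(*candidates[s]),
                reverse=True)
print(ranked)
```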
22. Overview
Restricted vocabulary
● Data expansion: +
● Transcripts combination: +
Transcript reliability
● Word variants: +
● Word confidence: -
Lack of content
● Acoustic fingerprinting: -
● Acoustic similarity: +
23. Thank you
This research has been supported by the project AMALACH (grant no. DF12P01OVV022 of the NAKI programme of the Ministry of Culture of the Czech Republic), the Czech Science Foundation (grant no. P103/12/G084), and the Charles University Grant Agency (grant no. 920913).