Audio Information for Hyperlinking of TV Content
Petra Galuščáková and Pavel Pecina
galuscakova@ufal.mff.cuni.cz
Faculty of Mathematics and Physics
Charles University in Prague
SLAM Workshop, 30. 10. 2015
2
Hyperlinking TV Content
● Our main objective: create hyperlinks
● Retrieve segments similar to a given query segment from
the collection of television programmes.
● Benefits:
● Recommendation – bring additional entertainment value
● Exploratory search – explore the topic and enable users to
find unexplored connections
3
BBC Broadcast Data
● Subtitles
● Three ASR transcripts
● LIMSI
– word variants occurring at the same time
– confidence of each word variant
● TED-LIUM
– confidence of each word
● NST-Sheffield
● Metadata
● Prosodic features
4
System Description
● Retrieve relevant segments
● Divide documents into 60-second segments
● A new segment starts every 10 seconds, so consecutive segments overlap
● Index textual segments
● Post-filter retrieved segments
● A query segment is transformed into a textual query
● Terrier IR Framework
● Speech retrieval
● Suffers from the problems associated with ASR systems
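The overlapping segmentation above can be sketched as follows. This is a minimal illustration assuming each programme's length is known in seconds and its words carry timestamps; the function names and the choice to stop at the first window that reaches the end of the recording are ours, not taken from the authors' Terrier-based system.

```python
def sliding_segments(duration, length=60.0, step=10.0):
    """Overlapping (start, end) windows in seconds.

    Each window is `length` seconds long and a new one starts every `step`
    seconds. The last window is the first one that reaches `duration`
    (an assumption of this sketch; it may be shorter than `length`).
    """
    segments = []
    start = 0.0
    while True:
        end = min(start + length, duration)
        segments.append((start, end))
        if end >= duration:
            break
        start += step
    return segments


def segment_text(words, start, end):
    """Join the tokens whose timestamp t satisfies start <= t < end.

    `words` is a hypothetical list of (timestamp_in_seconds, token) pairs,
    e.g. taken from an ASR transcript or from time-aligned subtitles.
    """
    return " ".join(token for t, token in words if start <= t < end)
```

For a one-hour programme this yields 355 windows, each of which would be indexed as a separate textual document.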
5
Speech Retrieval Problems
1. Restricted vocabulary
● Data and query segment expansion
● Combination of transcripts
2. Lack of reliability
● Using only the most confident words of the transcripts
● Using confidence scores
3. Lack of content
● Audio music information
● Acoustic similarity
6
1. Restricted Vocabulary
● The transcripts contain almost three times fewer unique words than the subtitles.
● Low-frequency words are expected to be the most informative for information retrieval.
● Expand data and query segments
● Metadata
● Content surrounding the query segment
● Combine different transcripts
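The intuition that rare words matter most can be illustrated with the classic inverse document frequency weight, log(N / df). This is only a sketch of the idea: Terrier's weighting models differ in detail, and the toy segments below are invented.

```python
import math


def idf(term, documents):
    """Inverse document frequency log(N / df); 0.0 for terms never seen.

    `documents` is a list of sets of tokens, one set per indexed segment.
    """
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df) if df else 0.0


# Invented toy collection of four segments.
docs = [
    {"the", "match", "goal"},
    {"the", "weather", "forecast"},
    {"the", "goal", "penalty"},
    {"the", "recipe"},
]

# "the" occurs in every segment and gets weight 0; "penalty" occurs once
# and gets the largest weight of the three terms.
for term in ("the", "goal", "penalty"):
    print(term, round(idf(term, docs), 3))
```

A term that an ASR system never outputs (for example, an out-of-vocabulary name) cannot be matched at all, and such terms tend to be exactly the ones with the highest weights.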
7
Data and Query Segment Expansion
● Metadata
● Concatenate each data and query segment with
metadata of the corresponding file.
● Title, episode title, description, short episode synopsis,
service name, and program variant
● Content surrounding the query segment
● Use 200 seconds before and after the query segment.
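One way to build the expanded segment text is sketched below. The metadata key names in `METADATA_FIELDS` are hypothetical stand-ins for the fields listed above, and words are assumed to be (timestamp, token) pairs. Following the slide, the 200-second context applies to query segments, while data segments are expanded with metadata only.

```python
# Hypothetical metadata keys standing in for the fields listed on the slide.
METADATA_FIELDS = (
    "title", "episode_title", "description",
    "short_synopsis", "service_name", "program_variant",
)


def expand_segment(words, start, end, metadata, context=0.0):
    """Text of the segment [start, end) widened by `context` seconds on each
    side, followed by the non-empty metadata fields of its programme.

    `words` is a list of (timestamp_in_seconds, token) pairs. Use
    context=0 for data segments and context=200 for query segments.
    """
    lo, hi = start - context, end + context
    spoken = [token for t, token in words if lo <= t < hi]
    meta = [metadata[f] for f in METADATA_FIELDS if metadata.get(f)]
    return " ".join(spoken + meta)
```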
8
Data and Query Segment Expansion Results
9
MAP-bin vs. WER
[Figure: MAP-bin vs. WER]
10
Data and Query Segment Expansion Results
● The improvement is significant in terms of both measures.
● Expansion using metadata and context may substantially reduce the restricted vocabulary problem.
● The highest MAP-tol score was achieved on the LIUM
transcript.
● Even though the transcripts have a relatively high WER.
● The metadata and context yield a much higher relative improvement on the automatic transcripts than on the subtitles.
● MAP-bin score corresponds with the WER
11
Transcripts Combination
[Figures: MAP-bin and MAP-tol for transcript combinations]
12
Transcripts Combination
● The combination is generally helpful.
● Despite the high score achieved by the LIUM transcript alone, the overall highest MAP-bin score was achieved using the union of the LIMSI and NST transcripts.
● This combination outperforms the results achieved with the subtitles.
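Combining transcripts can be implemented in more than one way. The sketch below assumes that the "union" pools the tokens each ASR system produced for the same segment and sums their counts, as if the transcripts were concatenated. The function name and example sentences are hypothetical.

```python
from collections import Counter


def combine_transcripts(*transcripts):
    """Pool the tokens several ASR systems produced for the same segment.

    Counts are summed (as if the transcripts were concatenated), so a word
    recognised by only one system still reaches the index.
    """
    pooled = Counter()
    for tokens in transcripts:
        pooled.update(tokens)
    return pooled


# Invented example: the two systems disagree on one word.
limsi = "the goal was scored late".split()
nst = "the gold was scored late".split()
combined = combine_transcripts(limsi, nst)
```

If duplicate recognitions should not boost a term, `Counter(a) | Counter(b)` keeps the maximum count per word instead of the sum.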
13
2. Transcript Reliability
● WER
● LIMSI: 57.5%
● TED-LIUM: 65.1%
● NST-Sheffield: 58.6%
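WER follows the standard definition WER = (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the reference into the hypothesis, and N is the number of reference words. A minimal word-level Levenshtein sketch, with no text normalization (which in practice noticeably affects the figure):

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N computed with word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

Because insertions are counted, WER can exceed 100%.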
● Word variants
● Word confidence
14
Word Variants
● Compare using only the first (most reliable) word variant with using all word variants in the LIMSI transcripts.
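A sketch of the two settings, assuming each time slot of a LIMSI-style transcript is available as a list of (word, confidence) variants. This representation and the function names are hypothetical, not the LIMSI file format.

```python
from typing import List, Tuple

# Hypothetical representation: the word variants recognised at one time.
Slot = List[Tuple[str, float]]


def first_variants(slots: List[Slot]) -> List[str]:
    """Keep only the most confident (most reliable) variant of each slot."""
    return [max(slot, key=lambda v: v[1])[0] for slot in slots if slot]


def all_variants(slots: List[Slot]) -> List[str]:
    """Keep every variant, so alternative hypotheses can still match a query."""
    return [word for slot in slots for word, _ in slot]
```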
15
Word Confidence
● Only use words with high confidence scores
● Only the words from the LIMSI and LIUM transcripts with a confidence score higher than a given threshold are used
● Increased both scores on the development set
● Did not outperform the full (unfiltered) transcripts on the test set
● We also experimented with voting
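The threshold filter and its tuning on the development set could look like this. `dev_score` is a hypothetical caller-supplied function that would filter the transcripts, run retrieval and return a score such as MAP-bin; as the results above show, a threshold tuned on development data need not carry over to the test set.

```python
def confident_words(words, threshold):
    """Keep tokens whose confidence is strictly higher than `threshold`.

    `words` is a list of (token, confidence) pairs for one segment.
    """
    return [token for token, conf in words if conf > threshold]


def tune_threshold(candidates, dev_score):
    """Return the candidate threshold with the best development-set score.

    `dev_score` is hypothetical: it maps a threshold to a retrieval score.
    """
    return max(candidates, key=dev_score)
```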
16
3. Lack of Content
● We use only the content of the subtitles/transcripts
● A wide range of acoustic attributes could also be
utilized: applause, music, shouts, explosions, whispers,
background noise, …
● Acoustic fingerprinting
● Acoustic similarity
17
Acoustic Fingerprinting
Motivation
● Obtain additional information from the music
contained within the query segment
● Especially helpful for hyperlinking music programmes
18
Acoustic Fingerprinting
● 1) Minimize noise in each query segment
● Query segments were divided into 10-second passages; a new passage started every second
● 2) Submit the passages to the Doreso API service
● 3) Retrieve the song title, artist, and album
● Development set: a song was identified for 4 of the 30 queries
● Test set: a song was identified for 10 of the 30 queries
● 4) Concatenate the song title, artist, and album name with the text of the query segment
● Both retrieval scores dropped
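Steps 1 and 4 are sketched below. The call to the recognition service itself is not shown, since its interface is not described here; each passage is simply assumed to yield either None or a (title, artist, album) tuple. The deduplication step is our own addition, because overlapping passages will usually recognize the same song many times.

```python
def passages(start, end, length=10.0, step=1.0):
    """10-second sub-passages of a query segment, a new one every second.

    Segments shorter than `length` yield no passage.
    """
    result = []
    t = start
    while t + length <= end:
        result.append((t, t + length))
        t += step
    return result


def expand_with_music(query_text, matches):
    """Append each distinct (title, artist, album) match to the query text.

    `matches` holds one entry per passage: None when no song was found,
    otherwise a tuple of strings (empty fields are skipped).
    """
    distinct = []
    for match in matches:
        if match is not None and match not in distinct:
            distinct.append(match)
    extra = [field for match in distinct for field in match if field]
    return " ".join([query_text] + extra)
```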
19
Acoustic Similarity
Motivation
● Retrieve identical acoustic segments
● E.g. signature tunes and jingles
● Detect semantically related segments
● E.g. segments containing action
scenes and music
20
Acoustic Similarity
● Calculate the similarity between the data and query segments, represented as sequences of prosodic feature vectors
● Find the most similar sequences near the beginning
● Linearly combine the highest acoustic similarity with the text-based similarity score
● MAP-bin: 0.2689 0.2687
● MAP-tol: 0.2465 0.2473
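A minimal sketch of the acoustic score and its combination with the text score. It assumes each segment is a list of per-frame prosodic feature vectors (e.g. pitch and energy), reads "near the beginning" as the first `max_offset` frame offsets of the data segment, and uses the mean frame-wise cosine similarity. These choices and the default `weight` are hypothetical, not the authors' settings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_acoustic_match(query, data, max_offset):
    """Slide the query frames over offsets 0..max_offset of the data segment
    and return the highest mean frame-wise cosine similarity (0.0 if the
    query is empty or longer than the data)."""
    n = len(query)
    if n == 0:
        return 0.0
    last = min(max_offset, len(data) - n)
    scores = (
        sum(cosine(q, d) for q, d in zip(query, data[o:o + n])) / n
        for o in range(last + 1)
    )
    return max(scores, default=0.0)


def combined_score(text_score, acoustic_score, weight=0.1):
    """Convex linear combination; `weight` would be tuned on development data."""
    return (1 - weight) * text_score + weight * acoustic_score
```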
21
Conclusion
22
Overview
● Restricted vocabulary
– Data expansion: +
– Transcripts combination: +
● Transcript reliability
– Word variants: +
– Word confidence: -
● Lack of content
– Acoustic fingerprinting: -
– Acoustic similarity: +
23
Thank you
This research has been supported by the project AMALACH (grant no. DF12P01OVV022 of the NAKI programme of the Ministry of Culture of the Czech Republic), the Czech Science Foundation (grant no. P103/12/G084), and the Charles University Grant Agency (grant no. 920913).

Weitere ähnliche Inhalte

Andere mochten auch

What is the difference between ‘P2P’ and ‘A2P’ messaging? What can we expect ...
What is the difference between ‘P2P’ and ‘A2P’ messaging? What can we expect ...What is the difference between ‘P2P’ and ‘A2P’ messaging? What can we expect ...
What is the difference between ‘P2P’ and ‘A2P’ messaging? What can we expect ...Twixor - Conversational Messaging Platform
 
Сущность нормирования труда
Сущность нормирования трудаСущность нормирования труда
Сущность нормирования трудаKsenBogdan
 
Lighting in music videos
Lighting in music videosLighting in music videos
Lighting in music videosMegan English
 
Global low -e glass market by type by application - opportunities and forecas...
Global low -e glass market by type by application - opportunities and forecas...Global low -e glass market by type by application - opportunities and forecas...
Global low -e glass market by type by application - opportunities and forecas...AzothAnalytics
 
Sound in TV & Film
Sound in TV & FilmSound in TV & Film
Sound in TV & Filmasdfghjkl33
 
Types of audio used in tv
Types of audio used in tvTypes of audio used in tv
Types of audio used in tvDestiny Tambwe
 
365squared Presentation - FASG
365squared Presentation - FASG365squared Presentation - FASG
365squared Presentation - FASGRoneel Prasad
 
Audio Essentials for Broadcast and Multiscreen
Audio Essentials for Broadcast and MultiscreenAudio Essentials for Broadcast and Multiscreen
Audio Essentials for Broadcast and MultiscreenEllis Reid
 
Maintaining Audio Quality In The Broadcast Facility 2011
Maintaining Audio Quality In The Broadcast Facility 2011Maintaining Audio Quality In The Broadcast Facility 2011
Maintaining Audio Quality In The Broadcast Facility 2011Radikal Ltd.
 
Principles of broadcasting
Principles of broadcastingPrinciples of broadcasting
Principles of broadcastingJun Tariman
 
Digital Video Broadcasting (DVB)
Digital Video Broadcasting (DVB)Digital Video Broadcasting (DVB)
Digital Video Broadcasting (DVB)Anees Akhtar
 

Andere mochten auch (14)

What is the difference between ‘P2P’ and ‘A2P’ messaging? What can we expect ...
What is the difference between ‘P2P’ and ‘A2P’ messaging? What can we expect ...What is the difference between ‘P2P’ and ‘A2P’ messaging? What can we expect ...
What is the difference between ‘P2P’ and ‘A2P’ messaging? What can we expect ...
 
BOLETÍN INFORMATIVO
BOLETÍN INFORMATIVOBOLETÍN INFORMATIVO
BOLETÍN INFORMATIVO
 
Сущность нормирования труда
Сущность нормирования трудаСущность нормирования труда
Сущность нормирования труда
 
Lighting in music videos
Lighting in music videosLighting in music videos
Lighting in music videos
 
Global low -e glass market by type by application - opportunities and forecas...
Global low -e glass market by type by application - opportunities and forecas...Global low -e glass market by type by application - opportunities and forecas...
Global low -e glass market by type by application - opportunities and forecas...
 
Sound in TV & Film
Sound in TV & FilmSound in TV & Film
Sound in TV & Film
 
Types of audio used in tv
Types of audio used in tvTypes of audio used in tv
Types of audio used in tv
 
365squared Presentation - FASG
365squared Presentation - FASG365squared Presentation - FASG
365squared Presentation - FASG
 
Audio Essentials for Broadcast and Multiscreen
Audio Essentials for Broadcast and MultiscreenAudio Essentials for Broadcast and Multiscreen
Audio Essentials for Broadcast and Multiscreen
 
Maintaining Audio Quality In The Broadcast Facility 2011
Maintaining Audio Quality In The Broadcast Facility 2011Maintaining Audio Quality In The Broadcast Facility 2011
Maintaining Audio Quality In The Broadcast Facility 2011
 
Tv broadcasting
Tv broadcasting Tv broadcasting
Tv broadcasting
 
Digital audio
Digital audioDigital audio
Digital audio
 
Principles of broadcasting
Principles of broadcastingPrinciples of broadcasting
Principles of broadcasting
 
Digital Video Broadcasting (DVB)
Digital Video Broadcasting (DVB)Digital Video Broadcasting (DVB)
Digital Video Broadcasting (DVB)
 

Ähnlich wie Audio Information for Hyperlinking of TV Content

Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Petra Galuscakova
 
An Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingAn Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingTyrone Systems
 
Mediaeval 2013 Spoken Web Search results slides
Mediaeval 2013 Spoken Web Search results slidesMediaeval 2013 Spoken Web Search results slides
Mediaeval 2013 Spoken Web Search results slidesXavier Anguera
 
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...Sebastian Ruder
 
final ppt BATCH 3.pptx
final ppt BATCH 3.pptxfinal ppt BATCH 3.pptx
final ppt BATCH 3.pptxMounika715343
 
application_layer (1).pdf
application_layer (1).pdfapplication_layer (1).pdf
application_layer (1).pdflathass5
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia VoulibasiISSEL
 
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLEMULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLEIRJET Journal
 
A Personalized Audio Server using MPEG-7 and MPEG-21 standards (presentation)
A Personalized Audio Server using MPEG-7 and MPEG-21 standards (presentation)A Personalized Audio Server using MPEG-7 and MPEG-21 standards (presentation)
A Personalized Audio Server using MPEG-7 and MPEG-21 standards (presentation)University of Piraeus
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010ivan provalov
 
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?BIOVIA
 
Lenar Gabdrakhmanov (Provectus): Speech synthesis
Lenar Gabdrakhmanov (Provectus): Speech synthesisLenar Gabdrakhmanov (Provectus): Speech synthesis
Lenar Gabdrakhmanov (Provectus): Speech synthesisProvectus
 
IRJET- Audio Data Summarization System using Natural Language Processing
IRJET- Audio Data Summarization System using Natural Language ProcessingIRJET- Audio Data Summarization System using Natural Language Processing
IRJET- Audio Data Summarization System using Natural Language ProcessingIRJET Journal
 
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsHybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsMatīss ‎‎‎‎‎‎‎  
 
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Project
 
Thesis presentation on Music Information Retrieval
Thesis presentation on Music Information RetrievalThesis presentation on Music Information Retrieval
Thesis presentation on Music Information RetrievalGanesh Harugeri
 
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...TELKOMNIKA JOURNAL
 
EXTRA Open Source Rules Classification for News
EXTRA Open Source Rules Classification for NewsEXTRA Open Source Rules Classification for News
EXTRA Open Source Rules Classification for NewsStuart Myles
 

Ähnlich wie Audio Information for Hyperlinking of TV Content (20)

Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
 
An Introduction to Natural Language Processing
An Introduction to Natural Language ProcessingAn Introduction to Natural Language Processing
An Introduction to Natural Language Processing
 
Mediaeval 2013 Spoken Web Search results slides
Mediaeval 2013 Spoken Web Search results slidesMediaeval 2013 Spoken Web Search results slides
Mediaeval 2013 Spoken Web Search results slides
 
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...
Topic Listener - Observing Key Topics from Multi-Channel Speech Audio Streams...
 
final ppt BATCH 3.pptx
final ppt BATCH 3.pptxfinal ppt BATCH 3.pptx
final ppt BATCH 3.pptx
 
application_layer (1).pdf
application_layer (1).pdfapplication_layer (1).pdf
application_layer (1).pdf
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia Voulibasi
 
team10.ppt.pptx
team10.ppt.pptxteam10.ppt.pptx
team10.ppt.pptx
 
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLEMULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
 
Searching for the best translation combination
Searching for the best translation combinationSearching for the best translation combination
Searching for the best translation combination
 
A Personalized Audio Server using MPEG-7 and MPEG-21 standards (presentation)
A Personalized Audio Server using MPEG-7 and MPEG-21 standards (presentation)A Personalized Audio Server using MPEG-7 and MPEG-21 standards (presentation)
A Personalized Audio Server using MPEG-7 and MPEG-21 standards (presentation)
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
 
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
 
Lenar Gabdrakhmanov (Provectus): Speech synthesis
Lenar Gabdrakhmanov (Provectus): Speech synthesisLenar Gabdrakhmanov (Provectus): Speech synthesis
Lenar Gabdrakhmanov (Provectus): Speech synthesis
 
IRJET- Audio Data Summarization System using Natural Language Processing
IRJET- Audio Data Summarization System using Natural Language ProcessingIRJET- Audio Data Summarization System using Natural Language Processing
IRJET- Audio Data Summarization System using Natural Language Processing
 
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsHybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
 
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
 
Thesis presentation on Music Information Retrieval
Thesis presentation on Music Information RetrievalThesis presentation on Music Information Retrieval
Thesis presentation on Music Information Retrieval
 
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
Streaming Audio Using MPEG–7 Audio Spectrum Envelope to Enable Self-similarit...
 
EXTRA Open Source Rules Classification for News
EXTRA Open Source Rules Classification for NewsEXTRA Open Source Rules Classification for News
EXTRA Open Source Rules Classification for News
 

Mehr von Petra Galuscakova

Combining Evidence for Cross-language Information Retrieval
Combining Evidence for Cross-language Information RetrievalCombining Evidence for Cross-language Information Retrieval
Combining Evidence for Cross-language Information RetrievalPetra Galuscakova
 
Czech Malach Cross-lingual Speech Retrieval Test Collection
Czech Malach Cross-lingual Speech Retrieval Test CollectionCzech Malach Cross-lingual Speech Retrieval Test Collection
Czech Malach Cross-lingual Speech Retrieval Test CollectionPetra Galuscakova
 
Evaluácia tematického vyhľadávania v audiovizuálnych nahrávkach
Evaluácia tematického vyhľadávania v audiovizuálnych nahrávkachEvaluácia tematického vyhľadávania v audiovizuálnych nahrávkach
Evaluácia tematického vyhľadávania v audiovizuálnych nahrávkachPetra Galuscakova
 
CUNI at MediaEval 2013 Similar Segments in Social Speech Task
CUNI at MediaEval 2013 Similar Segments in Social Speech TaskCUNI at MediaEval 2013 Similar Segments in Social Speech Task
CUNI at MediaEval 2013 Similar Segments in Social Speech TaskPetra Galuscakova
 
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmiČesko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmiPetra Galuscakova
 
Application of Topic Segmentation in Audiovisual Information Retrieval
Application of Topic Segmentation in Audiovisual Information RetrievalApplication of Topic Segmentation in Audiovisual Information Retrieval
Application of Topic Segmentation in Audiovisual Information RetrievalPetra Galuscakova
 
Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
Penalty Functions for Evaluation Measures of Unsegmented Speech RetrievalPenalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
Penalty Functions for Evaluation Measures of Unsegmented Speech RetrievalPetra Galuscakova
 

Mehr von Petra Galuscakova (7)

Combining Evidence for Cross-language Information Retrieval
Combining Evidence for Cross-language Information RetrievalCombining Evidence for Cross-language Information Retrieval
Combining Evidence for Cross-language Information Retrieval
 
Czech Malach Cross-lingual Speech Retrieval Test Collection
Czech Malach Cross-lingual Speech Retrieval Test CollectionCzech Malach Cross-lingual Speech Retrieval Test Collection
Czech Malach Cross-lingual Speech Retrieval Test Collection
 
Evaluácia tematického vyhľadávania v audiovizuálnych nahrávkach
Evaluácia tematického vyhľadávania v audiovizuálnych nahrávkachEvaluácia tematického vyhľadávania v audiovizuálnych nahrávkach
Evaluácia tematického vyhľadávania v audiovizuálnych nahrávkach
 
CUNI at MediaEval 2013 Similar Segments in Social Speech Task
CUNI at MediaEval 2013 Similar Segments in Social Speech TaskCUNI at MediaEval 2013 Similar Segments in Social Speech Task
CUNI at MediaEval 2013 Similar Segments in Social Speech Task
 
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmiČesko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
Česko-slovenský paralelný korpus určený pre preklad medzi blízkymi jazykmi
 
Application of Topic Segmentation in Audiovisual Information Retrieval
Application of Topic Segmentation in Audiovisual Information RetrievalApplication of Topic Segmentation in Audiovisual Information Retrieval
Application of Topic Segmentation in Audiovisual Information Retrieval
 
Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
Penalty Functions for Evaluation Measures of Unsegmented Speech RetrievalPenalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval
 

Kürzlich hochgeladen

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 

Kürzlich hochgeladen (20)

FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Audio Information for Hyperlinking of TV Content

  • 1. Audio Information for Hyperlinking of TV Content Petra Galuščáková and Pavel Pecina galuscakova@ufal.mff.cuni.cz Faculty of Mathematics and Physics Charles University in Prague SLAM Workshop, 30. 10. 2015
  • 2. Hyperlinking TV Content ● Our main objective: create hyperlinks ● Retrieve segments similar to a given query segment from the collection of television programmes. ● Benefits: ● Recommendation – bring additional entertainment value ● Exploratory search – explore the topic and enable users to find unexplored connections
  • 3. BBC Broadcast Data ● Subtitles ● Three ASR transcripts ● LIMSI – word variants occurring at the same time – confidence of each word variant ● TED-LIUM – confidence of each word ● NST-Sheffield ● Metadata ● Prosodic features
  • 4. System Description ● Retrieve relevant segments ● Divide documents into 60-second segments ● A new segment starts every 10 seconds ● Index the textual segments ● Post-filter retrieved segments ● Transform the query segment into a textual query ● Terrier IR Framework ● Speech retrieval ● Suffers from the problems associated with ASR systems
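The sliding-window segmentation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Word` type, the function name `sliding_segments`, and the rule that a word belongs to a segment when its start time falls inside the window are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    start: float  # start time of the word in seconds
    text: str


def sliding_segments(words, duration, window=60.0, step=10.0):
    """Split a timed transcript into overlapping fixed-length segments.

    A new segment of `window` seconds starts every `step` seconds; a word is
    assigned to every segment whose time span contains its start time.
    """
    segments = []
    start = 0.0
    while start < duration:
        end = start + window
        text = " ".join(w.text for w in words if start <= w.start < end)
        segments.append((start, end, text))
        start += step
    return segments


words = [Word(0.5, "hello"), Word(15.0, "world"), Word(65.0, "again")]
for seg in sliding_segments(words, duration=70.0):
    print(seg)
```

With the defaults, every word except those near the beginning of a programme falls into six overlapping segments, so a relevant passage is likely to be covered by a segment that starts close to it.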
  • 5. Speech Retrieval Problems 1. Restricted vocabulary ● Data and query segment expansion ● Combination of transcripts 2. Lack of reliability ● Utilizing only the most confident words of the transcripts ● Using confidence score 3. Lack of content ● Audio music information ● Acoustic similarity
  • 6. 1. Restricted Vocabulary ● The transcripts contain almost three times fewer unique words than the subtitles. ● Low-frequency words are expected to be the most informative for information retrieval. ● Expand data and query segments ● Metadata ● Content surrounding the query segment ● Combine different transcripts
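The intuition that rare words carry the most information is what inverse document frequency (IDF) captures, and IDF-like components appear in the term-weighting models of most IR frameworks, including Terrier. Below is a minimal sketch of plain IDF, log(N / df), over a toy set of segments; which weighting model the authors actually configured in Terrier is not stated here.

```python
import math
from collections import Counter


def idf_weights(segments):
    """Inverse document frequency log(N / df) for every term in the segments."""
    n = len(segments)
    df = Counter()
    for segment in segments:
        df.update(set(segment.lower().split()))  # count each term once per segment
    return {term: math.log(n / count) for term, count in df.items()}


segments = ["the match in Cardiff", "the weather in London", "the news"]
idf = idf_weights(segments)
print(sorted(idf.items(), key=lambda item: item[1]))
```

A word such as "the" that occurs in every segment gets weight 0, while a word found in a single segment gets the maximum weight log N. A transcript that never contains such rare words therefore loses exactly the terms that would weigh most in retrieval.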
  • 7. Data and Query Segment Expansion ● Metadata ● Concatenate each data and query segment with the metadata of the corresponding file. ● Title, episode title, description, short episode synopsis, service name, and program variant ● Content surrounding the query segment ● Use 200 seconds before and after the query segment.
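A sketch of this expansion, assuming a transcript is a list of timed words and the file metadata is a dictionary. The keys in `METADATA_FIELDS` are hypothetical stand-ins for the six metadata items listed; the 200-second context comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Word:
    start: float  # start time in seconds
    text: str


# Hypothetical keys for: title, episode title, description, short episode
# synopsis, service name, and program variant.
METADATA_FIELDS = ("title", "episode_title", "description",
                   "short_synopsis", "service_name", "variant")


def expand_segment(words, seg_start, seg_end, metadata, context=200.0):
    """Return the segment text extended by `context` seconds on both sides,
    followed by the metadata of the file the segment comes from."""
    spoken = [w.text for w in words
              if seg_start - context <= w.start < seg_end + context]
    meta = [metadata[f] for f in METADATA_FIELDS if metadata.get(f)]
    return " ".join(spoken + meta)


words = [Word(50.0, "far"), Word(350.0, "before"), Word(520.0, "query"),
         Word(700.0, "after"), Word(800.0, "late")]
metadata = {"title": "Example Show", "service_name": "BBC Two"}
print(expand_segment(words, 500.0, 560.0, metadata))
```

Calling the same function with `context=0` on a data segment appends only the metadata, which corresponds to the data-segment expansion.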
  • 8. Data and Query Segment Expansion Results
  • 10. Data and Query Segment Expansion Results ● The improvement is significant in terms of both measures (MAP-bin and MAP-tol). ● Expansion using metadata and context may substantially reduce the restricted vocabulary problem. ● The highest MAP-tol score was achieved on the LIUM transcript ● even though this transcript has the highest WER of the three. ● Metadata and context yield a much higher relative improvement for the automatic transcripts than for the subtitles. ● The MAP-bin scores correspond to the WER of the transcripts.
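Both measures are variants of mean average precision (MAP). For reference, here is a sketch of plain MAP. It assumes retrieved and relevant segments are already identified by matching ids; MAP-bin and MAP-tol differ in how a retrieved time segment is judged to match a relevant one, which this sketch does not model.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list given the set of relevant ids."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(queries):
    """MAP over a list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


queries = [(["a", "b", "c"], {"a", "c"}), (["x", "y"], {"y"})]
print(mean_average_precision(queries))
```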
  • 12. Transcripts Combination ● The combination is generally helpful. ● This holds even though the LIUM transcript alone already achieves a high score. ● The overall highest MAP-bin score was achieved using the union of the LIMSI and NST transcripts. ● This union outperforms the results achieved with the subtitles.
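A sketch of one way to form such a union, assuming that it simply concatenates the words that each transcript places inside the same segment's time window. Each transcript is represented as a list of `(time, word)` pairs, which is an assumed format.

```python
def combine_transcripts(*transcripts, start, end):
    """Concatenate, transcript by transcript, all words whose start time
    lies in [start, end).  Each transcript is a list of (time, word) pairs."""
    words = []
    for transcript in transcripts:
        words.extend(word for time, word in transcript if start <= time < end)
    return " ".join(words)


limsi = [(1.0, "prime"), (5.0, "minister")]
nst = [(1.2, "crime"), (5.1, "minister"), (70.0, "later")]
print(combine_transcripts(limsi, nst, start=0.0, end=60.0))
```

One plausible reason the union helps is that different ASR systems make different errors: a term missed by one transcript may still be recognized by another, and words recognized by both receive a higher term frequency.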
  • 13. 2. Transcript Reliability ● WER ● LIMSI: 57.5% ● TED-LIUM: 65.1% ● NST-Sheffield: 58.6% ● Word variants ● Word confidence
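The WER figures above follow the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between the reference and the ASR output, divided by the number of reference words. Here is a minimal sketch, assuming whitespace tokenization and a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 100%.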
  • 14. Word Variants ● Compare using only the first (most reliable) word variant with using all word variants in the LIMSI transcripts.
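The two settings being compared can be sketched as follows. The slot representation, a list of `(word, confidence)` variants recognized at the same time with the most reliable variant first, is an assumption about how the LIMSI output is read in, not its actual file format.

```python
def variant_text(slots, mode="first"):
    """Turn word-variant slots into indexable text.

    Each slot is a list of (word, confidence) variants recognized at the same
    time, ordered so that the first variant is the most reliable one.
    mode="first" keeps only that variant; mode="all" keeps every variant.
    """
    if mode == "first":
        words = [slot[0][0] for slot in slots if slot]
    elif mode == "all":
        words = [word for slot in slots for word, _ in slot]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return " ".join(words)


slots = [[("whales", 0.7), ("wales", 0.3)], [("coast", 0.9)]]
print(variant_text(slots, "first"), "|", variant_text(slots, "all"))
```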
  • 15. Word Confidence ● Only use words with high confidence scores ● Keep only the words from the LIMSI and LIUM transcripts whose confidence score exceeds a given threshold ● This increased both scores on the development set ● On the test data, it did not outperform the full transcripts ● We also experimented with voting
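Confidence filtering reduces to a threshold test per word. Here is a minimal sketch, assuming each transcript has been read into `(word, confidence)` pairs. The threshold value is not given on the slide and is left as a parameter.

```python
def confident_text(words, threshold):
    """Index text built only from words whose confidence exceeds `threshold`.

    `words` is a list of (word, confidence) pairs (an assumed representation
    of the per-word confidence output of the LIMSI and TED-LIUM transcripts).
    """
    return " ".join(word for word, confidence in words if confidence > threshold)


words = [("the", 0.95), ("prime", 0.4), ("minister", 0.8)]
print(confident_text(words, 0.5))
```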
  • 16. 3. Lack of Content ● We only use the content of the subtitles/transcripts ● A wide range of acoustic attributes could also be used: applause, music, shouts, explosions, whispers, background noise, … ● Acoustic fingerprinting ● Acoustic similarity
  • 17. Acoustic Fingerprinting Motivation ● Obtain additional information from the music contained in the query segment ● Expected to be especially helpful for hyperlinking music programmes
  • 18. Acoustic Fingerprinting ● 1) Minimize noise in each query segment ● Query segments were divided into 10-second passages; a new passage started every second ● 2) Submit the passages to the Doreso API service ● 3) Retrieve the song title, artist, and album ● Queries with a retrieved song: development set 4 out of 30, test set 10 out of 30 ● 4) Concatenate the title, artist, and album name with the text of the query segment ● Both retrieval scores dropped
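Steps 1, 2, and 4 can be sketched as below. The Doreso API itself is not modelled: `identify` is a hypothetical callable standing in for the service, taking a `(start, end)` passage and returning a `(title, artist, album)` tuple or `None`. Removing duplicate matches from overlapping passages is an added assumption.

```python
from typing import Callable, List, Optional, Tuple

Song = Tuple[str, str, str]    # (title, artist, album)
Passage = Tuple[float, float]  # (start, end) in seconds


def passages(start: float, end: float, length: float = 10.0,
             step: float = 1.0) -> List[Passage]:
    """Passages of `length` seconds inside [start, end), one every `step` s."""
    result = []
    t = start
    while t + length <= end:
        result.append((t, t + length))
        t += step
    return result


def add_music_info(query_text: str, start: float, end: float,
                   identify: Callable[[Passage], Optional[Song]]) -> str:
    """Append title, artist and album of every distinct song recognized in
    the query segment to the query text."""
    songs: List[Song] = []
    for passage in passages(start, end):
        song = identify(passage)  # hypothetical fingerprinting call
        if song is not None and song not in songs:
            songs.append(song)
    extra = " ".join(" ".join(song) for song in songs)
    return f"{query_text} {extra}" if extra else query_text


def fake_identify(passage):
    # Pretend a song is recognized in every passage starting at or after 1 s.
    return ("Title", "Artist", "Album") if passage[0] >= 1.0 else None


print(add_music_info("spoken query words", 0.0, 12.0, fake_identify))
```

Because consecutive passages overlap by nine seconds, the same song is usually recognized many times. Keeping each match once in this sketch avoids over-weighting the music terms relative to the spoken content.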
  • 19. Acoustic Similarity Motivation ● Retrieve identical acoustic segments ● E.g. signature tunes and jingles ● Detect semantically related segments ● E.g. segments containing action scenes and music
  • 20. Acoustic Similarity ● Calculate the similarity between the sequences of prosodic feature vectors of the data and query segments ● Find the most similar sequences near the beginning ● Linearly combine the highest acoustic similarity with the text-based similarity score ● MAP-bin: 0.2689 0.2687 ● MAP-tol: 0.2465 0.2473
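Here is a sketch of the acoustic-similarity step under several assumptions not stated on the slide. Each segment is a sequence of prosodic feature vectors (one per frame), and frames are compared with cosine similarity. "Near the beginning" is taken to mean alignments starting within the first `max_offset` frames of the data segment. The interpolation weight `weight` is a free, hypothetical parameter.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def acoustic_similarity(query, data, max_offset=5):
    """Best mean frame-wise cosine similarity between the (non-empty) query
    sequence and a window of the data sequence starting within the first
    `max_offset` frames.  Returns 0.0 if the data is shorter than the query."""
    m = len(query)
    scores = [
        sum(cosine(q, d) for q, d in zip(query, data[k:k + m])) / m
        for k in range(min(max_offset, len(data) - m) + 1)
    ]
    return max(scores, default=0.0)


def combined_score(text_score, acoustic_score, weight=0.1):
    """Linear combination of the text-based and the acoustic similarity."""
    return (1 - weight) * text_score + weight * acoustic_score


query = [[1.0, 0.0], [0.0, 1.0]]
data = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
acoustic = acoustic_similarity(query, data)
print(acoustic, combined_score(0.5, acoustic))
```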
  • 22. Overview ● Restricted vocabulary – Data expansion: + – Transcripts combination: + ● Transcript reliability – Word variants: + – Word confidence: - ● Lack of content – Acoustic fingerprinting: - – Acoustic similarity: +
  • 23. Thank you ● This research has been supported by the project AMALACH (grant no. DF12P01OVV022 of the program NAKI of the Ministry of Culture of the Czech Republic), the Czech Science Foundation (grant no. P103/12/G084), and the Charles University Grant Agency (grant no. 920913).