SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Downloaden Sie, um offline zu lesen
Penalty Functions for Evaluation Measures
of Unsegmented Speech Retrieval
Petra Galuščáková, Pavel Pecina, Jan Hajič
Institute of Formal and Applied Linguistics
Charles University in Prague
{galuscakova,pecina,hajic}@ufal.mff.cuni.cz
Motivation
● Speech Retrieval
● Retrieving information from a collection of audio data in
response to a given query
– modality of the query could be arbitrary, either text or speech
● Usually solved as text retrieval on transcriptions of the audio
obtained by ASR
● But: speech transcriptions are not 100% accurate, vocabulary is
different, speech contains additional elements speech is usually
not segmented into topically coherent passages
→ special evaluation methods for speech retrieval
are needed
Evaluation of speech retrieval I
● Known Segments Boundaries
● Speech collection is segmented to passages which can play the
role of documents
● Precision/Recall
● Average Precision
– arithmetic mean of the values of precision for the set of first
most relevant retrieved documents
● Mean Average Precision
– arithmetic mean of the AP values for the set of the queries
Evaluation of speech retrieval II
● Unknown Boudaries
● No topical segmentation, the system is expected to retrieve
exact starting points for each query
● Mean Average Segment Precision
– recently introduced, used in MediaEval
– designed for evaluation of retrieval of relevant document
parts
● Mean Generalized Average Precision
– designed to allow certain tolerance in matching search
results against a gold standard relevance assessment
– tolerance is determined by the Penalty Function
Evaluation of speech retrieval
- mGAP score
N = number of assessed starting points
Rk= reward calculated according to the Penalty Function
pk is the value of Precision for the position k calculated as:
mGAP = arithmetic mean of the GAP values for the set of the queries
GAP=
∑
Rk ≠0
pk
N
pk =
∑
i=1
k
Ri
k
Evaluation of speech retrieval
- mGAP score
● Time difference between the starting point of the topic
determined by the system and the true starting point of this
topic obtained during relevance assessment
● The actual shape of the function can be chosen arbitrarily
● The Penalty Function used in the mGAP measure in the Cross-
Language Speech Retrieval Track of CLEF 2006 and 2007
Evaluation of speech retrieval
- mGAP score
● Has been widely used, however, the measure (and the Penalty
Function itself) have not been adequately studied
● Questions:
● the Penalty Function is symmetrical and starting points
retrieved by a system in the same distance before and after
a true starting point are treated as equally good (or bad)
– “shape” of the function itself
● “width” of the Penalty Function, i.e. the maximum distance
for which the reward is non-zero
Penalty Function Proposal
Methodology
● Lab test to study the behaviour of users
● IR system simulation
● Users were presented the topics from the test collection and
playback points randomly generated in a vicinity of a starting point
of a relevant segment
● Users should have navigated in the recording and indicate when the
speaker started to talk about the given topic
● After they found the relevant segment, the participants were asked
to indicate their satisfaction with the playback point
Number of participants 24
Number of processed starting points 263
Data
● Test collection used for Cross-Language Speech-Retrieval track of
CLEF 2007
● Manually processed by human assessors – relevant passages for
given topics were identified
● Part of oral history archive from the Malach Project (Holocaust
testimonies)
● Recorded in Czech
Recordings in Malach Project 52 000
Czech recordigs in Malach Project 700
Assessed Czech recordings 357
Average length of the recording 95 min
Processed topics 116
Results
Time analysis
● We measure the elapsed time between the beginning of playback
and the moment when the participant presses the button indicating
that the relevant passage was found
● Respondents generally need less time to complete the task when
the playback point is located before the true starting point
Users’ satisfaction
● Participants were requested to indicate to what extent they were
happy with the location of the playback points in the scale of: very
good, good, bad or very bad
● Trend not clear - most satisfied when the playback reference point
lies shortly before the true starting point but function value
decreases more slowly for positive time
Proposed mGAP
Modifications
1) Users prefer playback points appearing before the beginning of
a true relevant passages to those appearing after, i.e. more
reward should be given to playback points appearing before the
true starting point of a relevant segment
2) Users are tolerant to playback points appearing within a 1-
minute distance from the true starting points. i.e. equal
(maximum) reward should be given to all playback points which
are closer than one minute to the true starting point.
3) Users are still satisfied when playback points appear in two- or
three- minute distance from the true starting point. i.e. function
should be “wider”.
Proposed mGAP
Modifications
Comparison with the Original
Measure
● Outputs of CLEF 2007 Cross-Language Speech Retrieval Track
● 15 retrieval system scored with the original and proposed
Penalty functions
● High correlation
Conclusion
Conclusion
● We described evaluation of speech retrieval (segmented/not
segmented)
● Described mGAP, penalty function drawbacks
● We organized human-based lab test
● Based on lab test results we modified Penalty Function
● Finally compared modified Penalty Function with the original
function
Thank you

Weitere ähnliche Inhalte

Andere mochten auch

Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Petra Galuscakova
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information RetrievalRoi Blanco
 
speech production in psycholinguistics
speech production in psycholinguistics speech production in psycholinguistics
speech production in psycholinguistics Aseel K. Mahmood
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologySeminar Links
 
Czech Malach Cross-lingual Speech Retrieval Test Collection
Czech Malach Cross-lingual Speech Retrieval Test CollectionCzech Malach Cross-lingual Speech Retrieval Test Collection
Czech Malach Cross-lingual Speech Retrieval Test CollectionPetra Galuscakova
 

Andere mochten auch (7)

CV-Mohsin
CV-MohsinCV-Mohsin
CV-Mohsin
 
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
Experiments with Segmentation Strategies for Passage Retrieval in Audio-Visua...
 
Unit01 bac
Unit01 bacUnit01 bac
Unit01 bac
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
speech production in psycholinguistics
speech production in psycholinguistics speech production in psycholinguistics
speech production in psycholinguistics
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Czech Malach Cross-lingual Speech Retrieval Test Collection
Czech Malach Cross-lingual Speech Retrieval Test CollectionCzech Malach Cross-lingual Speech Retrieval Test Collection
Czech Malach Cross-lingual Speech Retrieval Test Collection
 

Ähnlich wie Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval

Using EEG when usability testing
Using EEG when usability testingUsing EEG when usability testing
Using EEG when usability testingCaroline Jarrett
 
evaluation technique uni 2
evaluation technique uni 2evaluation technique uni 2
evaluation technique uni 2vrgokila
 
saito22research_talk_at_NUS
saito22research_talk_at_NUSsaito22research_talk_at_NUS
saito22research_talk_at_NUSYuki Saito
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalJulián Urbano
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT Lifeng (Aaron) Han
 
Lepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLifeng (Aaron) Han
 
e3-chap-09.ppt
e3-chap-09.ppte3-chap-09.ppt
e3-chap-09.pptKingSh2
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
ECET350 Final Exam Study GuideYOU MAY WANT TO PRINT THIS GUIDE.1.docx
ECET350 Final Exam Study GuideYOU MAY WANT TO PRINT THIS GUIDE.1.docxECET350 Final Exam Study GuideYOU MAY WANT TO PRINT THIS GUIDE.1.docx
ECET350 Final Exam Study GuideYOU MAY WANT TO PRINT THIS GUIDE.1.docxjenkinsmandie
 
Tech capabilities with_sa
Tech capabilities with_saTech capabilities with_sa
Tech capabilities with_saRobert Martin
 
ETA Prediction with Graph Neural Networks in Google Maps
ETA Prediction with Graph Neural Networks in Google MapsETA Prediction with Graph Neural Networks in Google Maps
ETA Prediction with Graph Neural Networks in Google Mapsivaderivader
 
CS571: Introduction
CS571: IntroductionCS571: Introduction
CS571: IntroductionJinho Choi
 
HCI 3e - Ch 9: Evaluation techniques
HCI 3e - Ch 9:  Evaluation techniquesHCI 3e - Ch 9:  Evaluation techniques
HCI 3e - Ch 9: Evaluation techniquesAlan Dix
 

Ähnlich wie Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval (20)

Using EEG when usability testing
Using EEG when usability testingUsing EEG when usability testing
Using EEG when usability testing
 
Human Computer Interaction Evaluation
Human Computer Interaction EvaluationHuman Computer Interaction Evaluation
Human Computer Interaction Evaluation
 
evaluation technique uni 2
evaluation technique uni 2evaluation technique uni 2
evaluation technique uni 2
 
E3 chap-09
E3 chap-09E3 chap-09
E3 chap-09
 
saito22research_talk_at_NUS
saito22research_talk_at_NUSsaito22research_talk_at_NUS
saito22research_talk_at_NUS
 
E3 chap-09
E3 chap-09E3 chap-09
E3 chap-09
 
5. bleu
5. bleu5. bleu
5. bleu
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
 
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT LEPOR: an augmented machine translation evaluation metric - Thesis PPT
LEPOR: an augmented machine translation evaluation metric - Thesis PPT
 
Lepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metricLepor: augmented automatic MT evaluation metric
Lepor: augmented automatic MT evaluation metric
 
Evaluation techniques
Evaluation techniquesEvaluation techniques
Evaluation techniques
 
OSCE.pptx
OSCE.pptxOSCE.pptx
OSCE.pptx
 
e3-chap-09.ppt
e3-chap-09.ppte3-chap-09.ppt
e3-chap-09.ppt
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
ECET350 Final Exam Study GuideYOU MAY WANT TO PRINT THIS GUIDE.1.docx
ECET350 Final Exam Study GuideYOU MAY WANT TO PRINT THIS GUIDE.1.docxECET350 Final Exam Study GuideYOU MAY WANT TO PRINT THIS GUIDE.1.docx
ECET350 Final Exam Study GuideYOU MAY WANT TO PRINT THIS GUIDE.1.docx
 
Tech capabilities with_sa
Tech capabilities with_saTech capabilities with_sa
Tech capabilities with_sa
 
ETA Prediction with Graph Neural Networks in Google Maps
ETA Prediction with Graph Neural Networks in Google MapsETA Prediction with Graph Neural Networks in Google Maps
ETA Prediction with Graph Neural Networks in Google Maps
 
CS571: Introduction
CS571: IntroductionCS571: Introduction
CS571: Introduction
 
Learning
LearningLearning
Learning
 
HCI 3e - Ch 9: Evaluation techniques
HCI 3e - Ch 9:  Evaluation techniquesHCI 3e - Ch 9:  Evaluation techniques
HCI 3e - Ch 9: Evaluation techniques
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval

  • 1. Penalty Functions for Evaluation Measures of Unsegmented Speech Retrieval Petra Galuščáková, Pavel Pecina, Jan Hajič Institute of Formal and Applied Linguistics Charles University in Prague {galuscakova,pecina,hajic}@ufal.mff.cuni.cz
  • 2. Motivation ● Speech Retrieval ● Retrieving information from a collection of audio data in response to a given query – modality of the query could be arbitrary, either text or speech ● Usually solved as text retrieval on transcriptions of the audio obtained by ASR ● But: speech transcriptions are not 100% accurate, vocabulary is different, speech contains additional elements speech is usually not segmented into topically coherent passages → special evaluation methods for speech retrieval are needed
  • 3. Evaluation of speech retrieval I ● Known Segments Boundaries ● Speech collection is segmented to passages which can play the role of documents ● Precision/Recall ● Average Precision – arithmetic mean of the values of precision for the set of first most relevant retrieved documents ● Mean Average Precision – arithmetic mean of the AP values for the set of the queries
  • 4. Evaluation of speech retrieval II ● Unknown Boudaries ● No topical segmentation, the system is expected to retrieve exact starting points for each query ● Mean Average Segment Precision – recently introduced, used in MediaEval – designed for evaluation of retrieval of relevant document parts ● Mean Generalized Average Precision – designed to allow certain tolerance in matching search results against a gold standard relevance assessment – tolerance is determined by the Penalty Function
  • 5. Evaluation of speech retrieval - mGAP score N = number of assessed starting points Rk= reward calculated according to the Penalty Function pk is the value of Precision for the position k calculated as: mGAP = arithmetic mean of the GAP values for the set of the queries GAP= ∑ Rk ≠0 pk N pk = ∑ i=1 k Ri k
  • 6. Evaluation of speech retrieval - mGAP score ● Time difference between the starting point of the topic determined by the system and the true starting point of this topic obtained during relevance assessment ● The actual shape of the function can be chosen arbitrarily ● The Penalty Function used in the mGAP measure in the Cross- Language Speech Retrieval Track of CLEF 2006 and 2007
  • 7. Evaluation of speech retrieval - mGAP score ● Has been widely used, however, the measure (and the Penalty Function itself) have not been adequately studied ● Questions: ● the Penalty Function is symmetrical and starting points retrieved by a system in the same distance before and after a true starting point are treated as equally good (or bad) – “shape” of the function itself ● “width” of the Penalty Function, i.e. the maximum distance for which the reward is non-zero
  • 9. Methodology ● Lab test to study the behaviour of users ● IR system simulation ● Users were presented the topics from the test collection and playback points randomly generated in a vicinity of a starting point of a relevant segment ● Users should have navigated in the recording and indicate when the speaker started to talk about the given topic ● After they found the relevant segment, the participants were asked to indicate their satisfaction with the playback point Number of participants 24 Number of processed starting points 263
  • 10.
  • 11. Data ● Test collection used for Cross-Language Speech-Retrieval track of CLEF 2007 ● Manually processed by human assessors – relevant passages for given topics were identified ● Part of oral history archive from the Malach Project (Holocaust testimonies) ● Recorded in Czech Recordings in Malach Project 52 000 Czech recordigs in Malach Project 700 Assessed Czech recordings 357 Average length of the recording 95 min Processed topics 116
  • 13. Time analysis ● We measure the elapsed time between the beginning of playback and the moment when the participant presses the button indicating that the relevant passage was found ● Respondents generally need less time to complete the task when the playback point is located before the true starting point
  • 14. Users’ satisfaction ● Participants were requested to indicate to what extent they were happy with the location of the playback points in the scale of: very good, good, bad or very bad ● Trend not clear - most satisfied when the playback reference point lies shortly before the true starting point but function value decreases more slowly for positive time
  • 15. Proposed mGAP Modifications 1) Users prefer playback points appearing before the beginning of a true relevant passages to those appearing after, i.e. more reward should be given to playback points appearing before the true starting point of a relevant segment 2) Users are tolerant to playback points appearing within a 1- minute distance from the true starting points. i.e. equal (maximum) reward should be given to all playback points which are closer than one minute to the true starting point. 3) Users are still satisfied when playback points appear in two- or three- minute distance from the true starting point. i.e. function should be “wider”.
  • 17. Comparison with the Original Measure ● Outputs of CLEF 2007 Cross-Language Speech Retrieval Track ● 15 retrieval system scored with the original and proposed Penalty functions ● High correlation
  • 19. Conclusion ● We described evaluation of speech retrieval (segmented/not segmented) ● Described mGAP, penalty function drawbacks ● We organized human-based lab test ● Based on lab test results we modified Penalty Function ● Finally compared modified Penalty Function with the original function