The SPL-IT Query by Example Search on Speech system for MediaEval 2014
The 2014 Query by Example Search on Speech (QUESST)
Jorge Proença 1,2
Arlindo Veiga 1,2
Fernando Perdigão 1,2
1 Instituto de Telecomunicações, Coimbra, Portugal
2 Electrical and Computer Eng. Department, University of Coimbra, Portugal

SPL-IT system
MediaEval 2014 | October 16-17 2014, Barcelona, Spain
Overview of the system:
• Fuses the results of several Dynamic Time Warping (DTW) modifications
• Fuses results from systems with phonetic recognizers for 3 languages

Phonetic Recognizer
• It was hard to extract good posteriorgrams with an HMM system (our in-house system).
• Used 3 systems/languages (for 8 kHz) based on long temporal context and neural networks from Brno University of Technology (BUT):
  • Czech
  • Hungarian
  • Russian
• Output: posteriorgrams (3 states per phoneme).
• Leading and trailing silence/noise removed.
[Figure: state posteriorgram example for one query; axes: frame, phoneme state]

Dynamic Time Warping
• Local distance matrix:
  • Dot product of the query and audio posterior probability vectors;
  • Back-off with $\lambda = 10^{-4}$

  $D(q, x) = -\log(q \cdot x)$

[Figure: distance matrix of query vs. audio]
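
As a rough illustration of the local distance above, the following minimal Python sketch computes the negative log of the dot product for every query/audio frame pair. The slide does not say how the back-off is applied; this sketch assumes it acts as a floor on the dot product. The function names are hypothetical.

```python
import math

BACKOFF = 1e-4  # value from the slide; using it as a floor is an assumption


def local_distance(q, x, backoff=BACKOFF):
    """Negative log of the dot product of two posterior vectors."""
    dot = sum(qi * xi for qi, xi in zip(q, x))
    # Assumed back-off: floor the dot product so the log stays finite.
    return -math.log(max(dot, backoff))


def distance_matrix(query, audio, backoff=BACKOFF):
    """Rows index query frames, columns index audio frames."""
    return [[local_distance(q, x, backoff) for x in audio] for q in query]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    audio = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    for row in distance_matrix(query, audio):
        print([round(d, 3) for d in row])
```

With this floor, two frames that share no posterior mass get the maximum distance -log(10^-4), about 9.21, instead of infinity.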

Dynamic Time Warping
• Basic DTW strategy (A1):
  • The path with the smallest accumulated distance, built from identically weighted unit jumps.
[Figure: distance matrix (top) and accumulated distance matrix (bottom) of query vs. audio]
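
The basic strategy can be sketched as a subsequence DTW over the local distance matrix. This is a minimal sketch under assumptions the slide does not confirm: the unit jumps are taken to be one step in the query, one step in the audio, or one step in both, all with weight 1, and a match may start and end at any audio frame. The function name is hypothetical.

```python
def subsequence_dtw(dist):
    """Accumulate distances for a query (rows) matched inside audio (columns).

    Returns the accumulated matrix and the audio frame where the best match ends.
    """
    n, m = len(dist), len(dist[0])
    acc = [[0.0] * m for _ in range(n)]
    # Assumed free start: the first query frame can align to any audio frame.
    acc[0] = list(dist[0])
    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + dist[i][0]
        for j in range(1, m):
            # Assumed unit jumps with identical weights.
            best_prev = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = dist[i][j] + best_prev
    # Assumed free end: pick the cheapest cell in the last query row.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc, end


if __name__ == "__main__":
    dist = [[5, 0, 5, 5],
            [5, 5, 0, 5]]
    acc, end = subsequence_dtw(dist)
    print(acc[-1], end)
```

Path-length normalization of the final score is left out here because the slides do not describe it.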

DTW Modifications
• 4 additional approaches:
  (A2) Cutting up to 250 ms at the end of the query, keeping the remaining segment above 500 ms
  (A3) Cutting up to 250 ms at the beginning of the query, keeping the remaining segment above 500 ms
[Figure: query vs. audio posterior distance matrix (top) and the best path from A2 (bottom); axes: query, audio]
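
The cutting limits in (A2) and (A3) amount to a small calculation on the query length. The sketch below shows it in Python, assuming a 10 ms frame shift (not stated on the slides) and treating 500 ms as the minimum length to keep. The function name is hypothetical.

```python
FRAME_MS = 10  # assumed frame shift; the slides do not state it


def max_cut_frames(n_frames, max_cut_ms=250, min_keep_ms=500, frame_ms=FRAME_MS):
    """Largest number of frames that may be cut from one end of the query."""
    max_cut = max_cut_ms // frame_ms
    min_keep = min_keep_ms // frame_ms
    # Never cut more than 250 ms, and never leave less than 500 ms.
    return max(0, min(max_cut, n_frames - min_keep))


if __name__ == "__main__":
    for n in (100, 60, 40):
        print(n, "frames ->", max_cut_frames(n), "frames may be cut")
```

Under these assumptions a 1 s query can lose the full 250 ms, a 600 ms query only 100 ms, and a 400 ms query nothing.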

DTW Modifications
(A4) Allowing one jump in the path of up to ½ of the query's length:
  • the jump cannot occur in the initial or final 250 ms of the query;
  • the jump is not allowed for queries shorter than 800 ms.
[Figure: query vs. audio posterior distance matrix (top) and the best path from A4 (bottom); axes: query, audio]
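
The restrictions on the (A4) jump can be written as a simple eligibility check. This sketch covers only the three constraints listed above, not the search for the jump itself, which the slides do not detail. Treating the 250 ms boundaries as inclusive is an assumption, and the function and parameter names are hypothetical.

```python
def jump_allowed(query_ms, jump_at_ms, jump_len_ms, edge_ms=250, min_query_ms=800):
    """Check whether a single jump satisfies the (A4) restrictions.

    query_ms: query duration; jump_at_ms: position of the jump within the query;
    jump_len_ms: length of the jump.
    """
    if query_ms < min_query_ms:
        return False  # no jumps for queries shorter than 800 ms
    if jump_at_ms < edge_ms or jump_at_ms > query_ms - edge_ms:
        return False  # no jumps in the initial or final 250 ms
    return jump_len_ms <= query_ms / 2  # jump of at most half the query length


if __name__ == "__main__":
    print(jump_allowed(1000, 500, 400))
    print(jump_allowed(700, 350, 100))
```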

DTW Modifications
(A5) Swaps: accounting for re-ordering of words.
  • Backtrack the 5 best candidates from (A1), starting from the end.
  • Find the best path for the beginning of the query, located ahead of the end of the first path, with restrictions similar to (A4).
[Figure: query vs. audio posterior distance matrix (top) and the best path from A5 (bottom); axes: query, audio]

Fusing systems
• Different approaches:
  • Minimum of the approaches: not the best.
  • Harmonic mean found to be a good compromise.
• Per-query normalization (standard score): $z = (x - \mu) / \sigma$
• Different languages:
  • Arithmetic mean of the 3 scores.
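
The three fusion steps can be sketched with Python's standard `statistics` module. The order in which the steps are applied, the use of the population standard deviation, and the function names are assumptions; the slides only name the operations.

```python
from statistics import harmonic_mean, mean, pstdev


def fuse_approaches(distances):
    """Harmonic mean of the (positive) distances from several DTW approaches."""
    return harmonic_mean(distances)


def z_normalize(scores):
    """Per-query standard score: subtract the mean, divide by the std."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumed: population standard deviation
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_languages(per_language_scores):
    """Arithmetic mean across languages, one value per candidate."""
    return [mean(column) for column in zip(*per_language_scores)]


if __name__ == "__main__":
    print(fuse_approaches([1.0, 3.0]))
    print(z_normalize([1.0, 2.0, 3.0]))
    print(fuse_languages([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

The harmonic mean sits between the minimum and the arithmetic mean, which fits the slide's description of it as a compromise.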

Submissions and Results
• Primary: fusing (A1) and (A2) (basic and cutting the end)
• Late: fusing the 5 approaches.
  • Late provided worse overall results.

Metric                  primary           late
Cnxe, MinCnxe (Dev)     0.6797, 0.5438    0.7106, 0.5881
Cnxe, MinCnxe (Eval)    0.6588, 0.5080    0.6708, 0.5240
ATWV, MTWV (Dev)        0.4494, 0.4494    0.4051, 0.4052
ATWV, MTWV (Eval)       0.4399, 0.4423    0.3918, 0.4218

Submissions and Results (cont.)
• Primary: fusing (A1) and (A2) (basic and cutting the end)
• Late: fusing the 5 approaches.
Cnxe for isolated approaches on Eval:
• A1: 0.6823, A2: 0.6721, A3: 0.6947, A4: 0.6957, A5: 0.6999
• For Type 3 queries, the late system was better:
  • Cnxe of 0.8049 on primary versus 0.7865 on late

Metric                  primary           late
Cnxe, MinCnxe (Eval)    0.6588, 0.5080    0.6708, 0.5240

Conclusions
• Although this year's task has added difficulty, a simple DTW still works well for most cases.
• Cutting queries at the end proved to be the best single strategy, and fusing it with A1 was even better.
• Including the possibility of jumps and re-orderings increased false positives overall, since these special cases are a small part of the database.
• We lacked an optimization method for Cnxe, which we expect would greatly improve the results.
END – Thank You
Processing Speed:
• Hardware: CRAY CX1 cluster running Windows Server 2008 HPC, using 16 of 56 cores (7 nodes, each with dual quad-core Intel Xeon 5520 2.27 GHz processors and 24 GB RAM).
• Indexing Speed Factor: 1.4
• Searching Speed Factor: 0.0029 per sec and per language
• Peak Memory: 0.098 GB
