Overview        System Description             System performance           Conclusion   Acknowledgement




           The CUHK Systems for Spoken Web Search task at
                         MediaEval 2012

                                     Haipeng Wang and Tan Lee

                                     Department of Electronic Engineering
                                     The Chinese University of Hong Kong


                                          September 30, 2012



Outline


      1    Overview

      2    System Description
             PTDTW framework
             Tokenizers
             DTW detection
             Pseudo-relevance Feedback and Score Normalization

      3    System configuration and performance

      4    Conclusion

      5    Acknowledgement



Overview


           2012 Spoken Web Search task [Metze et al., 2012]
               QbyE STD:     Query-by-example spoken term detection, i.e. audio search using audio queries.
               Multilingual: Four South African languages.
               Low-resource: Less than 4 hours of DEV audio data in total.
               Extreme case: One example for each query term.
           Overview of our systems
               Aim: a language-independent QbyE STD system.
               Multiple resources:
               1) the DEV audio data; 2) rich-resource languages.
               Combining different resources: the PTDTW framework.
               Pseudo-relevance feedback (PRF).
               Score normalization.



Posteriorgram-based template matching

           [Figure: Posteriorgram-based template matching [Hazen et al., 2009].
           Training resources are used to build the tokenizer. The tokenizer converts
           the query example and the test utterance into query and test posteriorgrams,
           which are matched by DTW detection to produce a detection score.]
           Training resources: audio data with or without transcriptions.
           Tokenizer: if trained without transcriptions, unsupervised;
           otherwise, supervised.
           Posteriorgrams: more robust than spectral features.
           How to effectively combine different resources?



PTDTW framework


           [Figure: PTDTW framework. The query example and the test utterance are passed
           through N parallel tokenizers (Tokenizer 1 ... Tokenizer N). Tokenizer n yields
           query and test posteriorgrams n, from which the DTW distance matrix Dn is computed.
           The matrices D1 ... DN are combined into a single DTW distance matrix D, on
           which DTW detection produces the raw detection score.]
                Parallel tokenizers followed by DTW detection (PTDTW).
                Modified from the posteriorgram-based template matching
                approach.
                Key idea: Combining DTW distance matrices.
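
The key idea above can be sketched in a few lines. The slides do not state how the matrices D1 ... DN are combined, so the weighted average used here is an assumption, as are the function name `combine_distance_matrices` and the equal default weights. The sketch only relies on every tokenizer producing a matrix of the same shape (query frames by test frames).

```python
def combine_distance_matrices(matrices, weights=None):
    """Fuse N same-shaped distance matrices into one by a weighted average.

    The averaging rule and the equal default weights are assumptions made
    for illustration; the slides only say that the matrices are combined.
    """
    n = len(matrices)
    if weights is None:
        weights = [1.0 / n] * n
    rows, cols = len(matrices[0]), len(matrices[0][0])
    for m in matrices:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("all distance matrices must share one shape")
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(cols)]
        for i in range(rows)
    ]


# Toy example: two 2x2 matrices from two hypothetical tokenizers.
d1 = [[0.2, 0.8], [0.6, 0.4]]
d2 = [[0.4, 0.6], [1.0, 0.0]]
combined = combine_distance_matrices([d1, d2])
```

Combining at the distance-matrix level means a single DTW pass is run on the fused matrix, rather than one pass per tokenizer.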



Unsupervised tokenizers



           MFCC-GMM tokenizer [Zhang and Glass, 2009]
               Unsupervised training from the DEV data without transcription.
               1024 Gaussian components.
               39-dim MFCC + MVN + VTLN
           MFCC-ASM tokenizer [Lee et al., 1988, Wang et al., 2012]
               Acoustic segment model, also known as self-organized unit
               (SOU) [Siu et al., 2010].
               Unsupervised training from the DEV data without transcription.
               256 ASM units. Each unit has 3 states, with 16 Gaussian
               components for each state.
               39-dim MFCC + MVN + VTLN
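
To make the notion of a Gaussian posteriorgram concrete, the following is a minimal sketch of how a trained GMM turns feature frames into posterior vectors. It assumes a diagonal-covariance GMM whose parameters are already available; the function name `gaussian_posteriorgram` and the toy parameters are illustrative and not taken from the slides.

```python
import math


def gaussian_posteriorgram(frames, weights, means, variances):
    """Return, for each frame, the posterior probability of every GMM component.

    frames:    list of T feature vectors (each a list of D floats)
    weights:   list of M mixture weights
    means:     list of M mean vectors
    variances: list of M variance vectors (diagonal covariance is an assumption)
    """
    posteriorgram = []
    for x in frames:
        log_joint = []
        for w, mu, var in zip(weights, means, variances):
            # log of w * N(x; mu, diag(var))
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            log_joint.append(ll)
        # Normalise in the log domain for numerical stability.
        peak = max(log_joint)
        unnorm = [math.exp(v - peak) for v in log_joint]
        total = sum(unnorm)
        posteriorgram.append([u / total for u in unnorm])
    return posteriorgram


# Toy example: 2 components in 2 dimensions
# (the MFCC-GMM tokenizer uses 1024 components on 39-dim features).
toy = gaussian_posteriorgram(
    frames=[[0.0, 0.0], [5.0, 5.0]],
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [5.0, 5.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

Each row of the result sums to one, so a frame is represented by how strongly it belongs to each component rather than by its raw spectral values.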



Phoneme recognizers


           Czech, Hungarian, Russian phoneme recognizers
               developed by BUT [Schwarz, 2009].
               trained from SpeechDat-E corpora.
           Mandarin phoneme recognizer
               179 tonal phonemes.
               About 15-hour training data from CallHome corpus and
               CallFriend corpus.
           English phoneme recognizer
               40 phonemes.
               About 15-hour training data from Fisher corpus and Switchboard
               Cellular corpus.



Phoneme recognizers



           [Figure: Tandem structure. Input data -> phoneme recognizers -> taking
           logarithm -> PCA transform -> GMM -> Gaussian posteriorgrams.]



              256 Gaussian components trained on the DEV data.
              Using tandem structure, we have 5 tokenizers:
              CZ-GMM, HU-GMM, RU-GMM, MA-GMM and EN-GMM.
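
The front half of the tandem structure (logarithm followed by a PCA transform) can be sketched as below. This is an illustration under stated assumptions: the PCA mean and basis are taken as given (they would be estimated beforehand, for example on the DEV data), the posterior floor is a hypothetical safeguard against log(0), and the name `tandem_features` is not from the slides. The output features would then be modeled by the 256-component GMM.

```python
import math


def tandem_features(phone_posteriors, mean, basis, floor=1e-10):
    """Log-compress phoneme posteriors, then project them onto a PCA basis.

    phone_posteriors: list of T posterior vectors from a phoneme recognizer
    mean:             mean of the log-posteriors (length P), assumed given
    basis:            list of principal directions (each of length P), assumed given
    floor:            hypothetical lower bound that avoids log(0)
    """
    features = []
    for frame in phone_posteriors:
        log_frame = [math.log(max(p, floor)) for p in frame]
        centered = [v - m for v, m in zip(log_frame, mean)]
        features.append(
            [sum(c * b for c, b in zip(centered, direction)) for direction in basis]
        )
    return features


# Toy example: 2 phoneme classes projected onto a single direction.
toy_features = tandem_features(
    phone_posteriors=[[0.5, 0.5], [0.9, 0.1]],
    mean=[0.0, 0.0],
    basis=[[1.0, 0.0]],
)
```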



DTW detection


           DTW detection is performed with a sliding window.
           Find the path minimizing the normalized distance:
           $$\hat{d} = \min_{K,\, i(k),\, j(k)} \frac{\sum_{k=1}^{K} d(i(k), j(k))\, w_k}{Z(w)}$$
           where $d(i(k), j(k))$ is set to the inner-product distance, $w_k = 1$,
           and $Z(w) = K$.
           Additional constraint: |i(k) − j(k)| ≤ R.
           Due to the large variation in query length, R is not set to a
           fixed number but is proportional to the query length I:
           $R = \alpha \times I$ ($\alpha = \frac{1}{3}$ in our systems).
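
A minimal sketch of the detection step follows. Several details are assumptions rather than facts from the slides: the inner-product distance is taken to be the negative log of the inner product of two posterior vectors, the allowed steps are (1,0), (0,1) and (1,1), the sliding window has the same length as the query, and the band width is rounded to at least one frame. The dynamic program keeps the predecessor with the lowest running average, which is a heuristic approximation of the length-normalized minimum and not an exact solver. All function names are hypothetical.

```python
import math


def inner_product_distance(p, q, floor=1e-10):
    """-log of the inner product of two posterior vectors (assumed definition)."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), floor))


def dtw_normalized_distance(query, segment, alpha=1.0 / 3.0):
    """Approximate the minimum over warping paths of (sum of distances) / K.

    The path runs from (0, 0) to (I-1, J-1) and must satisfy |i - j| <= R,
    with R = alpha * I (rounded, at least 1: an assumption).
    """
    I, J = len(query), len(segment)
    R = max(1, int(round(alpha * I)))
    if abs(I - J) > R:
        return math.inf  # the end point lies outside the band
    # best[i][j] = (cumulative distance, path length K) of the kept path
    best = [[None] * J for _ in range(I)]
    for i in range(I):
        for j in range(max(0, i - R), min(J, i + R + 1)):
            d = inner_product_distance(query[i], segment[j])
            if i == 0 and j == 0:
                best[i][j] = (d, 1)
                continue
            candidates = []
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi >= 0 and pj >= 0 and best[pi][pj] is not None:
                    cost, steps = best[pi][pj]
                    candidates.append((cost + d, steps + 1))
            # Keep the predecessor with the lowest running average (heuristic).
            best[i][j] = min(candidates, key=lambda c: c[0] / c[1])
    cost, steps = best[I - 1][J - 1]
    return cost / steps


def detect(query, utterance, shift=1):
    """Slide a query-length window over the utterance; return (distance, start)."""
    I = len(query)
    results = [
        (dtw_normalized_distance(query, utterance[s:s + I]), s)
        for s in range(0, len(utterance) - I + 1, shift)
    ]
    return min(results) if results else (math.inf, None)


# Toy posteriorgrams over 2 classes; the query occurs at frame 2 of the utterance.
query = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]
utterance = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]
best_dist, best_start = detect(query, utterance)
```

With a query of I = 3 frames and alpha = 1/3, the band is R = 1, so the warping path may deviate from the diagonal by at most one frame.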



Pseudo-relevance Feedback and Score Normalization


           Pseudo-relevance feedback for each query:
               1) The top H hits from all the test utterances were selected as the
               relevance examples. Selection criteria: a) H ≤ 3; b) the
               raw detection score must be larger than a pre-set threshold.
               2) The relevance examples were used to score the top $\hat{H}$ ($\hat{H} = 2$
               for this task) hits from each test utterance.
               3) The scores obtained by the relevance examples were linearly
               fused with the scores of the original query examples.
           Score normalization for each query:
               $\hat{s}_{q,t} = (s_{q,t} - \mu_q)/\delta_q$
               $s_{q,t}$ is the score of the $q$th query on the $t$th hit region.
               $\mu_q$ and $\delta_q^2$ are the mean and variance of the scores for the $q$th
               query estimated from the development data.
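
The two post-processing steps can be sketched as follows. The fusion weight, the use of the mean over relevance-example scores, and the population (divide-by-N) variance are assumptions made for illustration; the slides only state that the fusion is linear and that the mean and variance come from the development data. The function names are hypothetical.

```python
import math


def fuse_prf(query_score, feedback_scores, weight=0.5):
    """Linearly fuse the original query score with relevance-example scores.

    weight and the averaging of feedback_scores are assumptions; the slides
    do not give the fusion weights.
    """
    feedback = sum(feedback_scores) / len(feedback_scores)
    return (1.0 - weight) * query_score + weight * feedback


def normalize_scores(scores, dev_scores):
    """Apply s_hat = (s - mu_q) / delta_q for one query.

    mu_q and delta_q are the mean and standard deviation of that query's
    scores on the development data.
    """
    mu = sum(dev_scores) / len(dev_scores)
    var = sum((s - mu) ** 2 for s in dev_scores) / len(dev_scores)
    delta = math.sqrt(var)
    if delta == 0.0:
        raise ValueError("development scores have zero variance")
    return [(s - mu) / delta for s in scores]


# Toy example for a single query.
fused = fuse_prf(0.6, [0.8, 1.0], weight=0.5)
normalized = normalize_scores([2.0, 3.0], dev_scores=[1.0, 2.0, 3.0])
```

Normalizing per query puts the scores of different queries on a comparable scale, which matters when a single decision threshold is applied across all queries.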



System Configuration and Performance
           Table: System configurations and ATWV performance.

           System No.               1       2       3       4       5
           MFCC-GMM                 √       √               √       √
           MFCC-ASM                 √       √               √       √
           PHNREC-GMM (1)                           √       √       √
           PRF                      √                       √
           Score normalization      √       √       √       √       √
           devQ - devC            0.68    0.63    0.73    0.78    0.74
           devQ - evlC            0.60    0.55    0.70    0.75    0.70
           evlQ - devC            0.68    0.65    0.73    0.77    0.75
           evlQ - evlC            0.64    0.59    0.72    0.74    0.74

           Systems 1 and 2 belong to the required run condition.
           Systems 3, 4 and 5 belong to the general run condition.
           The best performance (system 4) is achieved when all the tokenizers, PRF and
           score normalization are used.

           (1) PHNREC-GMM denotes the combination of the five tandem tokenizers: CZ-GMM,
           HU-GMM, RU-GMM, MA-GMM, and EN-GMM.
           Supervised tokenizers perform better than the unsupervised tokenizers.
           Training resources for unsupervised tokenizers are limited in this task, but not
           limited for supervised tokenizers.
           The PTDTW framework provides a flexible way to combine all these resources.
           Combination of supervised tokenizers and unsupervised tokenizers leads to
           consistent improvement.
           Pseudo-relevance Feedback provides consistent improvement.



Conclusion




           A PTDTW framework was proposed for the query-by-example
           STD task in this evaluation.
           Supervised tokenizers performed better than unsupervised
           tokenizers for this task. The combination of supervised and
           unsupervised tokenizers provided consistent gain.
           Pseudo-relevance feedback and score normalization were used.



Acknowledgement




           Thanks to Cheung-Chi Leung from IIR for helpful discussions.
           Thanks to the organizers for organizing this evaluation.
           Thanks to BUT for sharing the phoneme recognizers and scripts.
           This research is partially supported by the General Research
           Funds (Ref: 414010 and 413811) from the Hong Kong Research
           Grants Council.




                                Thank you!



References

           Hazen, T., Shen, W., and White, C. (2009).
           Query-by-example spoken term detection using phonetic posteriorgram templates.
           In ASRU.
           Lee, C., Soong, F., and Juang, B. (1988).
           A segment model based approach to speech recognition.
           In ICASSP.
           Metze, F., Barnard, E., Davel, M., van Heerden, C., Anguera, X., Gravier, G., and Rajput, N. (2012).
           The spoken web search task.
           In MediaEval 2012 Workshop.
           Schwarz, P. (2009).
           Phoneme recognition based on long temporal context.
           PhD thesis.
           Siu, M., Gish, H., Chan, A., and Belfield, W. (2010).
           Improved topic classification and keyword discovery using an HMM-based speech recognizer trained without supervision.
           In INTERSPEECH.
           Wang, H., Leung, C., Lee, T., Li, H., and Ma, B. (2012).
           An acoustic segment modeling approach to query-by-example spoken term detection.
           In ICASSP.
           Zhang, Y. and Glass, J. (2009).
           Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams.
           In ASRU.

Weitere ähnliche Inhalte

Was ist angesagt?

A Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
A Novel Method for Speaker Independent Recognition Based on Hidden Markov ModelA Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
A Novel Method for Speaker Independent Recognition Based on Hidden Markov ModelIDES Editor
 
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...ijceronline
 
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...gt_ebuddy
 
Advance Digital Video Watermarking based on DWT-PCA for Copyright protection
Advance Digital Video Watermarking based on DWT-PCA for Copyright protectionAdvance Digital Video Watermarking based on DWT-PCA for Copyright protection
Advance Digital Video Watermarking based on DWT-PCA for Copyright protectionIJERA Editor
 
International Journal of Computational Engineering Research(IJCER)
 International Journal of Computational Engineering Research(IJCER)  International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) ijceronline
 
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...Wesley De Neve
 
Text independent speaker recognition system
Text independent speaker recognition systemText independent speaker recognition system
Text independent speaker recognition systemDeepesh Lekhak
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
 
The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012MediaEval2012
 
Design of Optimal Linear Phase FIR High Pass Filter using Improved Particle S...
Design of Optimal Linear Phase FIR High Pass Filter using Improved Particle S...Design of Optimal Linear Phase FIR High Pass Filter using Improved Particle S...
Design of Optimal Linear Phase FIR High Pass Filter using Improved Particle S...IDES Editor
 
SPEKER RECOGNITION UNDER LIMITED DATA CODITION
SPEKER RECOGNITION UNDER LIMITED DATA CODITIONSPEKER RECOGNITION UNDER LIMITED DATA CODITION
SPEKER RECOGNITION UNDER LIMITED DATA CODITIONniranjan kumar
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognitionphyuhsan
 
Speech based password authentication system on FPGA
Speech based password authentication system on FPGASpeech based password authentication system on FPGA
Speech based password authentication system on FPGARajesh Roshan
 
Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...csandit
 

Was ist angesagt? (16)

A Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
A Novel Method for Speaker Independent Recognition Based on Hidden Markov ModelA Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
A Novel Method for Speaker Independent Recognition Based on Hidden Markov Model
 
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
 
Siguccs20101026
Siguccs20101026Siguccs20101026
Siguccs20101026
 
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
Text Prompted Remote Speaker Authentication : Joint Speech and Speaker Recogn...
 
Advance Digital Video Watermarking based on DWT-PCA for Copyright protection
Advance Digital Video Watermarking based on DWT-PCA for Copyright protectionAdvance Digital Video Watermarking based on DWT-PCA for Copyright protection
Advance Digital Video Watermarking based on DWT-PCA for Copyright protection
 
SPEAKER VERIFICATION
SPEAKER VERIFICATIONSPEAKER VERIFICATION
SPEAKER VERIFICATION
 
International Journal of Computational Engineering Research(IJCER)
 International Journal of Computational Engineering Research(IJCER)  International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...
Contribution of Non-Scrambled Chroma Information in Privacy-Protected Face Im...
 
Text independent speaker recognition system
Text independent speaker recognition systemText independent speaker recognition system
Text independent speaker recognition system
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approach
 
The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012
 
Design of Optimal Linear Phase FIR High Pass Filter using Improved Particle S...
Design of Optimal Linear Phase FIR High Pass Filter using Improved Particle S...Design of Optimal Linear Phase FIR High Pass Filter using Improved Particle S...
Design of Optimal Linear Phase FIR High Pass Filter using Improved Particle S...
 
SPEKER RECOGNITION UNDER LIMITED DATA CODITION
SPEKER RECOGNITION UNDER LIMITED DATA CODITIONSPEKER RECOGNITION UNDER LIMITED DATA CODITION
SPEKER RECOGNITION UNDER LIMITED DATA CODITION
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
 
Speech based password authentication system on FPGA
Speech based password authentication system on FPGASpeech based password authentication system on FPGA
Speech based password authentication system on FPGA
 
Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...
 

Ähnlich wie CUHK System for the Spoken Web Search task at Mediaeval 2012

On the Optimization and Comparative Evaluation of a Reliable and Efficient Ca...
On the Optimization and Comparative Evaluation of a Reliable and Efficient Ca...On the Optimization and Comparative Evaluation of a Reliable and Efficient Ca...
On the Optimization and Comparative Evaluation of a Reliable and Efficient Ca...Nestor Michael Tiglao
 
⭐⭐⭐⭐⭐ Localización en ambiente de interiores basado en Machine Learning con r...
⭐⭐⭐⭐⭐ Localización en ambiente de interiores basado en Machine Learning con r...⭐⭐⭐⭐⭐ Localización en ambiente de interiores basado en Machine Learning con r...
⭐⭐⭐⭐⭐ Localización en ambiente de interiores basado en Machine Learning con r...Victor Asanza
 
Summarizing Software API Usage Examples Using Clustering Techniques
Summarizing Software API Usage Examples Using Clustering TechniquesSummarizing Software API Usage Examples Using Clustering Techniques
Summarizing Software API Usage Examples Using Clustering TechniquesNikos Katirtzis
 
Black-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software SystemsBlack-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software SystemsMohammad Jafar Mashhadi
 
[2017-05-29] DNASmartTagger
[2017-05-29] DNASmartTagger [2017-05-29] DNASmartTagger
[2017-05-29] DNASmartTagger Eli Kaminuma
 
Advances in Bayesian Learning
Advances in Bayesian LearningAdvances in Bayesian Learning
Advances in Bayesian Learningbutest
 
Remote authentication via biometrics1
Remote authentication via biometrics1Remote authentication via biometrics1
Remote authentication via biometrics1Omkar Salunke
 
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic ProgrammingRealtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic Programmingadil raja
 
Multisensor Data Fusion : Techno Briefing
Multisensor Data Fusion : Techno BriefingMultisensor Data Fusion : Techno Briefing
Multisensor Data Fusion : Techno BriefingPaveen Juntama
 
Pin pointpresentation
Pin pointpresentationPin pointpresentation
Pin pointpresentationLevan Huan
 
Android Malware
Android Malware Android Malware
Android Malware Nambiraju
 
Uvm presentation dac2011_final
Uvm presentation dac2011_finalUvm presentation dac2011_final
Uvm presentation dac2011_finalsean chen
 
Prelim Slides
Prelim SlidesPrelim Slides
Prelim Slidessmpant
 
1st review android malware.pptx
1st review  android malware.pptx1st review  android malware.pptx
1st review android malware.pptxNambiraju
 

Ähnlich wie CUHK System for the Spoken Web Search task at Mediaeval 2012 (20)

Cuhk system 14oct_2
Cuhk system 14oct_2Cuhk system 14oct_2
Cuhk system 14oct_2
 
Cuhk system 14oct
Cuhk system 14octCuhk system 14oct
Cuhk system 14oct
 
On the Optimization and Comparative Evaluation of a Reliable and Efficient Ca...
On the Optimization and Comparative Evaluation of a Reliable and Efficient Ca...On the Optimization and Comparative Evaluation of a Reliable and Efficient Ca...
On the Optimization and Comparative Evaluation of a Reliable and Efficient Ca...
 
⭐⭐⭐⭐⭐ Localización en ambiente de interiores basado en Machine Learning con r...
⭐⭐⭐⭐⭐ Localización en ambiente de interiores basado en Machine Learning con r...⭐⭐⭐⭐⭐ Localización en ambiente de interiores basado en Machine Learning con r...
⭐⭐⭐⭐⭐ Localización en ambiente de interiores basado en Machine Learning con r...
 
Summarizing Software API Usage Examples Using Clustering Techniques
Summarizing Software API Usage Examples Using Clustering TechniquesSummarizing Software API Usage Examples Using Clustering Techniques
Summarizing Software API Usage Examples Using Clustering Techniques
 
Black-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software SystemsBlack-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software Systems
 
[2017-05-29] DNASmartTagger
[2017-05-29] DNASmartTagger [2017-05-29] DNASmartTagger
[2017-05-29] DNASmartTagger
 
Advances in Bayesian Learning
Advances in Bayesian LearningAdvances in Bayesian Learning
Advances in Bayesian Learning
 
Remote authentication via biometrics1
Remote authentication via biometrics1Remote authentication via biometrics1
Remote authentication via biometrics1
 
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic ProgrammingRealtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming
 
Multisensor Data Fusion : Techno Briefing
Multisensor Data Fusion : Techno BriefingMultisensor Data Fusion : Techno Briefing
Multisensor Data Fusion : Techno Briefing
 
Pin pointpresentation
Pin pointpresentationPin pointpresentation
Pin pointpresentation
 
annInstance28Nov6pm
annInstance28Nov6pmannInstance28Nov6pm
annInstance28Nov6pm
 
Tridiagonal solver in gpu
Tridiagonal solver in gpuTridiagonal solver in gpu
Tridiagonal solver in gpu
 
Android Malware
Android Malware Android Malware
Android Malware
 
Uvm presentation dac2011_final
Uvm presentation dac2011_finalUvm presentation dac2011_final
Uvm presentation dac2011_final
 
Prelim Slides
Prelim SlidesPrelim Slides
Prelim Slides
 
4g lte matlab
4g lte matlab4g lte matlab
4g lte matlab
 
Presentation, navid khoob
Presentation, navid khoobPresentation, navid khoob
Presentation, navid khoob
 
1st review android malware.pptx
1st review  android malware.pptx1st review  android malware.pptx
1st review android malware.pptx
 

Mehr von MediaEval2012

MediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval2012
 
A Multimodal Approach for Video Geocoding
A Multimodal Approach for   Video Geocoding A Multimodal Approach for   Video Geocoding
A Multimodal Approach for Video Geocoding MediaEval2012
 
Brave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingBrave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingMediaEval2012
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012MediaEval2012
 
CUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskCUNI at MediaEval 2012: Search and Hyperlinking Task
CUHK System for the Spoken Web Search task at MediaEval 2012

  • 1. Overview System Description System performance Conclusion Acknowledgement
       The CUHK Systems for Spoken Web Search task at MediaEval 2012
       Haipeng Wang and Tan Lee
       Department of Electronic Engineering, The Chinese University of Hong Kong
       September 30, 2012
  • 2. Outline
       1 Overview
       2 System Description
           PTDTW framework
           Tokenizers
           DTW detection
           Pseudo-relevance Feedback and Score Normalization
       3 System configuration and performance
       4 Conclusion
       5 Acknowledgement
  • 3. Overview
       2012 Spoken Web Search task [Metze et al., 2012]
           QbyE STD: Audio search using audio queries.
           Multilingual: Four South African languages.
           Low-resource: Less than 4 hours of DEV audio data in total.
           Extreme case: One example for each query term.
       Overview of our systems
           Aim: a language-independent QbyE STD system.
           Multiple resources: 1) the DEV audio data; 2) rich-resource languages.
           Combining different resources: the PTDTW framework.
           Pseudo-relevance feedback (PRF).
           Score normalization.
  • 4. Posteriorgram-based template matching
       Figure: Posteriorgram-based template matching [Hazen et al., 2009]. Blocks: Training Resources, Tokenizer, Query Example, Query Posteriorgrams, Test Utterance, Test Posteriorgrams, DETECT by DTW, Detection Score.
       Training resources: audio data with or without transcriptions.
       Tokenizer: unsupervised if trained without transcriptions; otherwise, supervised.
       Posteriorgrams: more robust than spectral features.
       How to effectively combine different resources?
  • 5. PTDTW framework
       Figure: PTDTW Framework. Blocks: Query Example, Test Utterance, Tokenizers 1 to N, Query Posteriorgrams 1 to N, Test Posteriorgrams 1 to N, DTW distance matrices D1 to DN, DTW raw distance matrix D, DETECT by DTW, Detection Score.
       Parallel tokenizers followed by DTW detection (PTDTW).
       Modified from the posteriorgram-based template matching approach.
       Key idea: Combining DTW distance matrices.
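The key idea on this slide, combining the per-tokenizer distance matrices D1 to DN into one matrix D, can be sketched as follows. The slides do not state the combination rule, so the weighted average used here is an assumption, as are the function names and the choice of the negative log inner product as the local distance between two posterior vectors.

```python
import math

def local_distance(p, q, eps=1e-10):
    """Assumed local distance: negative log of the inner product of two posterior vectors."""
    inner = sum(a * b for a, b in zip(p, q))
    return -math.log(max(inner, eps))

def distance_matrix(query_pg, test_pg):
    """Distance matrix with one row per query frame and one column per test frame."""
    return [[local_distance(q, t) for t in test_pg] for q in query_pg]

def combine_distance_matrices(matrices, weights=None):
    """Weighted average of same-shaped matrices D1..DN (equal weights by default; an assumption)."""
    n = len(matrices)
    if weights is None:
        weights = [1.0 / n] * n
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(cols)]
        for i in range(rows)
    ]
```

Because every tokenizer processes the same query and test audio at the same frame rate, all matrices share one shape, so the combined matrix D can be passed to a single DTW search.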
  • 6. Unsupervised tokenizers
       MFCC-GMM tokenizer [Zhang and Glass, 2009]
           Unsupervised training from the DEV data without transcription.
           1024 Gaussian components.
           39-dim MFCC + MVN + VTLN
       MFCC-ASM tokenizer [Lee et al., 1988, Wang et al., 2012]
           Acoustic segment model, also called self-organized unit (SOU) [Siu et al., 2010].
           Unsupervised training from the DEV data without transcription.
           256 ASM units. Each unit has 3 states, with 16 Gaussian components per state.
           39-dim MFCC + MVN + VTLN
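A GMM tokenizer turns each feature frame into a vector of component posteriors (a Gaussian posteriorgram). The sketch below shows that computation for an already trained GMM; the diagonal covariances, the function names and the argument layout are assumptions, and GMM training itself is not shown.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gaussian_posteriorgram(frames, weights, means, variances):
    """Return one posterior vector per frame: P(component | frame)."""
    posteriorgram = []
    for x in frames:
        log_joint = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(log_joint)
        unnorm = [math.exp(l - peak) for l in log_joint]
        total = sum(unnorm)
        posteriorgram.append([u / total for u in unnorm])
    return posteriorgram
```

With 1024 components, as on the slide, each frame would map to a 1024-dimensional posterior vector.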
  • 7. Phoneme recognizers
       Czech, Hungarian, Russian phoneme recognizers
           Developed by BUT [Schwarz, 2009].
           Trained on the SpeechDat-E corpora.
       Mandarin phoneme recognizer
           179 tonal phonemes.
           About 15 hours of training data from the CallHome and CallFriend corpora.
       English phoneme recognizer
           40 phonemes.
           About 15 hours of training data from the Fisher and Switchboard Cellular corpora.
  • 8. Phoneme recognizers
       Figure: Tandem Structure. Blocks, in order: Input Data, Phoneme Recognizers, Taking Logarithm, PCA Transform, GMM, Gaussian Posteriorgrams.
       256 Gaussian components trained on the DEV data.
       Using the tandem structure, we have 5 tokenizers: CZ-GMM, HU-GMM, RU-GMM, MA-GMM and EN-GMM.
  • 9. DTW detection
       DTW detection is performed with a sliding window.
       Find the path minimizing the normalized distance:
           $\hat{d} = \min_{K,\, i(k),\, j(k)} \frac{1}{Z(w)} \sum_{k=1}^{K} d(i(k), j(k))\, w_k$
       where $d(i(k), j(k))$ is set to the inner-product distance, $w_k = 1$, and $Z(w) = K$.
       Additional constraint: $|i(k) - j(k)| \le R$.
       Because the query length varies widely, $R$ is not set to a fixed number but in proportion to the query length $I$:
           $R = \frac{1}{\alpha} \times I$ ($\alpha = 3$ in our systems).
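The detection step can be sketched as a banded DTW applied to successive windows of a precomputed query-by-test distance matrix. This is a minimal illustration, not the authors' implementation: the function names, the window width of I + R frames, the one-frame window shift and the step pattern are assumptions. The dynamic program keeps the accumulated cost and the path length in each cell and compares averages, a common heuristic that is not guaranteed to find the global minimum of the length-normalized distance.

```python
import math

def banded_dtw(dist, radius):
    """Length-normalized DTW distance on an I x J matrix with |i - j| <= radius.

    The path starts at (0, 0) and may end at any column of the last query row.
    """
    n_rows, n_cols = len(dist), len(dist[0])
    cost = [[math.inf] * n_cols for _ in range(n_rows)]
    steps = [[0] * n_cols for _ in range(n_rows)]
    cost[0][0], steps[0][0] = dist[0][0], 1
    for i in range(n_rows):
        for j in range(n_cols):
            if (i == 0 and j == 0) or abs(i - j) > radius:
                continue
            best = None
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                if pi < 0 or pj < 0 or cost[pi][pj] == math.inf:
                    continue
                c, k = cost[pi][pj] + dist[i][j], steps[pi][pj] + 1
                # Keep the predecessor giving the smallest average distance so far.
                if best is None or c / k < best[0] / best[1]:
                    best = (c, k)
            if best is not None:
                cost[i][j], steps[i][j] = best
    last = n_rows - 1
    ends = [cost[last][j] / steps[last][j] for j in range(n_cols) if cost[last][j] < math.inf]
    return min(ends) if ends else math.inf

def sliding_window_detect(dist, alpha=3):
    """Slide over the test axis; return (best normalized distance, start frame)."""
    query_len, test_len = len(dist), len(dist[0])
    radius = max(1, round(query_len / alpha))  # R = I / alpha, at least one frame
    width = query_len + radius                 # assumed window width
    best_score, best_start = math.inf, None
    for start in range(max(1, test_len - query_len + 1)):
        window = [row[start:start + width] for row in dist]
        score = banded_dtw(window, radius)
        if score < best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

A lower returned distance indicates a better match; how the distance is mapped to a detection score is not specified on the slide and is left out here.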
  • 10. Pseudo-relevance Feedback and Score Normalization
       Pseudo-relevance feedback for each query:
           1) The top $H$ hits from all the test utterances were selected as the relevance examples. Selection criteria: a) $H \le 3$; b) the raw detection score must be larger than a pre-set threshold.
           2) The relevance examples were used to score the top $\hat{H}$ ($\hat{H} = 2$ for this task) hits from each test utterance.
           3) The scores obtained by the relevance examples were linearly fused with the scores of the original query examples.
       Score normalization for each query:
           $\hat{s}_{q,t} = (s_{q,t} - \mu_q) / \delta_q$
           $s_{q,t}$ is the score of the $q$th query on the $t$th hit region.
           $\mu_q$ and $\delta_q^2$ are the mean and variance of the scores for the $q$th query, estimated from the development data.
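The three pieces of this slide (selecting relevance examples, linear fusion, and per-query normalization) can be sketched as below. The function names, the fusion weight of 0.5, averaging over the relevance examples before fusion, and the use of the population variance are all assumptions; the slide only states that fusion is linear and that the mean and variance come from the development data.

```python
import math

def select_relevance_examples(hits, threshold, max_examples=3):
    """Pick at most max_examples top-scoring hits whose raw score exceeds threshold.

    hits is a list of (score, hit_id) pairs pooled over all test utterances.
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    return [h for h in ranked[:max_examples] if h[0] > threshold]

def fuse_scores(query_score, relevance_scores, weight=0.5):
    """Linear fusion of the original query score with the mean relevance-example score."""
    prf_score = sum(relevance_scores) / len(relevance_scores)
    return (1.0 - weight) * query_score + weight * prf_score

def normalize_scores(scores, dev_scores):
    """Per-query normalization: subtract the mean and divide by the standard deviation."""
    mu = sum(dev_scores) / len(dev_scores)
    variance = sum((s - mu) ** 2 for s in dev_scores) / len(dev_scores)
    delta = math.sqrt(variance)
    if delta == 0.0:
        raise ValueError("development scores have zero variance")
    return [(s - mu) / delta for s in scores]
```

Normalizing per query puts the scores of different queries on a comparable scale, so a single decision threshold can be applied across queries.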
  • 11. System Configuration and Performance
       Table: System Configurations and ATWV performances.
           System No.       1      2      3      4      5
           devQ - devC     0.68   0.63   0.73   0.78   0.74
           devQ - evlC     0.60   0.55   0.70   0.75   0.70
           evlQ - devC     0.68   0.65   0.73   0.77   0.75
           evlQ - evlC     0.64   0.59   0.72   0.74   0.74
       Number of the five systems using each component: MFCC-GMM, 4; MFCC-ASM, 4; PHNREC-GMM (1), 3; PRF, 2; Score Normalization, 5.
       Systems 1 and 2 belong to the required run condition. Systems 3, 4 and 5 belong to the general run condition.
       The best performance (system 4) is achieved when all the tokenizers, PRF and score normalization are used.
       (1) PHNREC-GMM denotes the combination of the five tandem tokenizers used: CZ-GMM, HU-GMM, RU-GMM, MA-GMM, and EN-GMM.
  • 12. System Configuration and Performance
       Supervised tokenizers perform better than the unsupervised tokenizers.
       Training resources for unsupervised tokenizers are limited in this task, but not limited for supervised tokenizers.
       The PTDTW framework provides a flexible way to combine all these resources.
  • 13. System Configuration and Performance
       Combination of supervised tokenizers and unsupervised tokenizers leads to consistent improvement.
       Pseudo-relevance Feedback provides consistent improvement.
  • 14. Conclusion
       A PTDTW framework was proposed for the query-by-example STD task in this evaluation.
       Supervised tokenizers performed better than unsupervised tokenizers for this task.
       The combination of supervised and unsupervised tokenizers provided consistent gain.
       Pseudo-relevance feedback and score normalization were used.
  • 15. Acknowledgement
       We thank Cheung-Chi Leung from IIR for helpful discussions.
       We thank the organizers for organizing this evaluation.
       We thank BUT for sharing the phoneme recognizers and scripts.
       This research is partially supported by the General Research Funds (Ref: 414010 and 413811) from the Hong Kong Research Grants Council.
  • 16. Thank you!
  • 17. Reference
       Hazen, T., Shen, W., and White, C. (2009). Query-by-example spoken term detection using phonetic posteriorgram templates. In ASRU.
       Lee, C., Soong, F., and Juang, B. (1988). A segment model based approach to speech recognition. In ICASSP.
       Metze, F., Barnard, E., Davel, M., van Heerden, C., Anguera, X., Gravier, G., and Rajput, N. (2012). The spoken web search task. In MediaEval 2012 Workshop.
       Schwarz, P. (2009). Phoneme recognition based on long temporal context. PhD thesis.
       Siu, M., Gish, H., Chan, A., and Belfield, W. (2010). Improved topic classification and keyword discovery using an HMM-based speech recognizer trained without supervision. In INTERSPEECH.
       Wang, H., Leung, C., Lee, T., Li, H., and Ma, B. (2012). An acoustic segment modeling approach to query-by-example spoken term detection. In ICASSP.
       Zhang, Y. and Glass, J. (2009). Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams. In ASRU.