The SPL-IT Query by Example Search on Speech system for MediaEval 2014

Jorge Proença 1,2
Arlindo Veiga 1,2
Fernando Perdigão 1,2
The 2014 Query by Example Search on Speech (QUESST)
1 Instituto de Telecomunicações, Coimbra, Portugal
2 Electrical and Computer Eng. Department, University of Coimbra, Portugal

SPL-IT system
MediaEval 2014 | October 16-17 2014, Barcelona, Spain
Overview of the system:
 Fuses several modified Dynamic Time Warping (DTW) approaches
 Fuses results from systems with phonetic recognizers for 3 languages

Phonetic Recognizer
 Hard to extract good posteriorgrams with an HMM system (our in-house system).
 Used 3 systems/languages (for 8 kHz) based on long temporal context and neural networks from Brno University of Technology (BUT):
 Czech
 Hungarian
 Russian
 Output: posteriorgrams (3 states per phoneme).
 Leading and trailing silence/noise removed.
[Figure: state posteriorgram example for one query (axes: frame, phoneme state)]

Dynamic Time Warping
 Local distance matrix:
 Dot product of query and audio posterior probability vectors;
 Back-off with $\lambda = 10^{-4}$
$D(\mathbf{q}, \mathbf{x}) = -\log(\mathbf{q} \cdot \mathbf{x})$
[Figure: distance matrix of query vs. audio]
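The local distance above can be sketched in a few lines of NumPy. This is only an illustration: the function name `local_distance` is hypothetical, and the way the back-off is applied (mixing each posterior vector with a uniform distribution) is an assumption, since the slide gives only its value.

```python
import numpy as np

def local_distance(query_post, audio_post, backoff=1e-4):
    """Return the (query frames x audio frames) matrix of -log dot products."""
    def smooth(p):
        # Assumption: the back-off mixes each posterior vector with a uniform
        # distribution, so that no dot product can be exactly zero.
        return (1.0 - backoff) * p + backoff / p.shape[1]

    q = smooth(np.asarray(query_post, dtype=float))
    x = smooth(np.asarray(audio_post, dtype=float))
    # Each entry is -log(q_i . x_j) for query frame i and audio frame j.
    return -np.log(q @ x.T)

# Toy posteriorgrams with 2 classes: 2 query frames, 1 audio frame.
toy_query = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_audio = np.array([[1.0, 0.0]])
toy_dist = local_distance(toy_query, toy_audio)
```

In the toy example the first query frame agrees with the audio frame and gets a distance close to zero, while the second one disagrees and gets a large but finite distance.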

Dynamic Time Warping
 Basic DTW strategy (A1):
 Smallest accumulated distance using identically weighted unit jumps.
[Figure: distance matrix (top) and accumulated distance matrix (bottom) of query vs. audio]
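A minimal sketch of this kind of accumulation follows, assuming the three unit steps (vertical, horizontal, diagonal) carry the same weight and that a match may start and end at any audio frame. The function name `subsequence_dtw` is hypothetical, and any path-length normalization the actual system may use is left out.

```python
import numpy as np

def subsequence_dtw(dist):
    """Accumulate a (query x audio) distance matrix with equally weighted unit steps.

    Returns the accumulated matrix and the audio frame where the best path ends.
    """
    dist = np.asarray(dist, dtype=float)
    n_q, n_a = dist.shape
    acc = np.empty_like(dist)
    acc[0, :] = dist[0, :]  # assumption: a match may start at any audio frame
    for i in range(1, n_q):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, n_a):
            best_prev = min(acc[i - 1, j],      # vertical step
                            acc[i, j - 1],      # horizontal step
                            acc[i - 1, j - 1])  # diagonal step
            acc[i, j] = dist[i, j] + best_prev
    end = int(np.argmin(acc[-1, :]))  # assumption: a match may end at any audio frame
    return acc, end

# Toy distance matrix: 2 query frames, 4 audio frames, cheap diagonal at columns 1-2.
toy = np.array([[5.0, 0.0, 5.0, 5.0],
                [5.0, 5.0, 0.0, 5.0]])
toy_acc, toy_end = subsequence_dtw(toy)
```

The last row of the accumulated matrix gives one candidate score per audio frame; its minimum marks the end of the best match.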

DTW Modifications
 4 additional approaches:
(A2) – Cutting up to 250 ms at the end of the query, keeping the segment above 500 ms
(A3) – Cutting up to 250 ms at the beginning of the query, keeping the segment above 500 ms
[Figure: query vs. audio posterior distance matrix (top) and the best path from A2 (bottom)]
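The cutting limits of (A2) and (A3) reduce to simple frame arithmetic. The sketch below assumes a 10 ms frame step (not stated on the slides) and reads "keeping the segment above 500 ms" as "at least 500 ms must remain"; the function name `max_cut_frames` is hypothetical.

```python
def max_cut_frames(n_query_frames, frame_ms=10, max_cut_ms=250, min_keep_ms=500):
    """Largest number of frames that may be cut from one end of the query."""
    max_cut = max_cut_ms // frame_ms    # 25 frames under the assumed 10 ms step
    min_keep = min_keep_ms // frame_ms  # 50 frames under the assumed 10 ms step
    # Never cut more than the limit, and never go below the minimum length.
    return max(0, min(max_cut, n_query_frames - min_keep))

long_query_cut = max_cut_frames(100)   # 1 s query: the full 250 ms may be cut
short_query_cut = max_cut_frames(60)   # 600 ms query: only 100 ms may be cut
tiny_query_cut = max_cut_frames(40)    # 400 ms query: no cut possible
```

Under these assumptions, (A2) would drop up to that many rows from the end of the query side of the distance matrix and (A3) from its beginning, before running the same DTW search.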

DTW Modifications
(A4) – Allowing one jump in the path of up to ½ the query's length.
The jump cannot occur in the initial and final 250 ms of the query.
No jump is allowed for queries shorter than 800 ms.
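The restrictions on the (A4) jump can be written down as a small helper. This is a sketch under assumptions: a 10 ms frame step, a jump length measured in frames, and the hypothetical name `jump_constraints`. It only computes where a jump would be permitted, not the modified DTW itself.

```python
def jump_constraints(n_query_frames, frame_ms=10, guard_ms=250, min_query_ms=800):
    """Return (first_frame, last_frame, max_jump) or None if no jump is allowed."""
    if n_query_frames * frame_ms < min_query_ms:
        return None  # queries shorter than 800 ms get no jump
    guard = guard_ms // frame_ms
    first = guard                        # first query frame where a jump may occur
    last = n_query_frames - guard - 1    # last query frame where a jump may occur
    max_jump = n_query_frames // 2       # jump of up to half the query's length
    return first, last, max_jump

one_second_query = jump_constraints(100)
too_short_query = jump_constraints(79)
```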

DTW Modifications
(A5) – Swaps: accounting for re-ordering of words.
Backtrack the best 5 candidates from (A1), starting from the end.
Find the best path for the beginning of the query, ahead of the end of the first path, with restrictions similar to (A4).

Fusing systems
 Different approaches:
 Minimum of the approaches – not the best.
 Harmonic mean found to be a good compromise.
 Per-query normalization (standard score): subtract the per-query mean and divide by the per-query standard deviation.
 Different languages:
 Arithmetic mean of the 3 scores.
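The fusion steps listed above can be sketched for a single query as follows. The order of operations (harmonic mean over approaches, then standard score, then mean over languages), the sign flip, and the name `fuse_scores` are assumptions made for illustration; the slides do not spell them out.

```python
import numpy as np

def fuse_scores(distances):
    """Fuse DTW distances of shape (languages, approaches, audio files) for one query."""
    d = np.asarray(distances, dtype=float)
    # Harmonic mean over the approaches (requires strictly positive distances).
    fused = d.shape[1] / np.sum(1.0 / d, axis=1)
    # Per-query standard score over all audio files, separately for each language.
    z = (fused - fused.mean(axis=1, keepdims=True)) / fused.std(axis=1, keepdims=True)
    # Assumption: negate so that a higher score means a better match,
    # then take the arithmetic mean over the languages.
    return (-z).mean(axis=0)

# Toy input: 1 language, 2 approaches, 3 audio files.
toy_distances = [[[1.0, 2.0, 4.0],
                  [3.0, 2.0, 4.0]]]
toy_scores = fuse_scores(toy_distances)
```

In the toy input the harmonic means are 1.5, 2 and 4, so the first audio file receives the highest fused score.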

Submissions and Results
 Primary: fusing (A1) and (A2) (basic and cutting the end)
 Late: fusing the 5 approaches.
 Late provided worse overall results
                        primary           late
Cnxe, MinCnxe - Dev     0.6797, 0.5438    0.7106, 0.5881
Cnxe, MinCnxe - Eval    0.6588, 0.5080    0.6708, 0.5240
ATWV, MTWV - Dev        0.4494, 0.4494    0.4051, 0.4052
ATWV, MTWV - Eval       0.4399, 0.4423    0.3918, 0.4218

Submissions and Results (cont.)
Cnxe for isolated approaches on Eval:
 A1: 0.6823, A2: 0.6721, A3: 0.6947, A4: 0.6957, A5: 0.6999
 For Type 3 queries, the late system was better:
 Cnxe of 0.8049 on primary vs. 0.7865 on late
                        primary           late
Cnxe, MinCnxe - Eval    0.6588, 0.5080    0.6708, 0.5240

Conclusions
 Although this year's task has an added difficulty, a simple DTW still works well for most cases.
 Cutting queries at the end (A2) proved to be the best single strategy, and fusing it with A1 was even better.
 Including the possibility of jumps and re-orderings increased false positives overall, since these special cases are a small part of the database.
 We lacked an optimization method for Cnxe, which would greatly improve the results.

END – Thank You
Processing Speed:
 Hardware – CRAY CX1 Cluster running Windows Server 2008 HPC, using 16 of 56 cores (7 nodes, each with dual Intel Xeon 5520 2.27 GHz quad-core processors and 24 GB RAM).
 Indexing Speed Factor – 1.4
 Searching Speed Factor – 0.0029 per sec and per language
 Peak Memory – 0.098 GB
