Spoken Web Search at MediaEval 2013
Xavier Anguera, Florian Metze, Andi Buzo, Igor Szoke and Luis Javier Rodriguez-Fuentes
Spoken Audio Search (or Query-by-Example Spoken-Term Detection)
Given a spoken query, we search for instances at the lexical level within spoken documents
It is similar to Spoken Term Detection (NIST STD2006, OpenKWS 2013) but…
• Queries are spoken
• Different speakers
• Different acoustic conditions
• No prior knowledge of the language(s) might be available
SWS history in MediaEval
• SWS 2011 had 5 finishing participants and focused on 4 Indian languages
• SWS 2012 had 9 finishing participants and focused on 4 African languages
• SWS 2013 has 13 finishing (18 registered) participants and contains 9 languages
[Chart: number of teams (#teams, axis 0 to 18) and database size (axis 0 to 1400) for 2011, 2012 and 2013]
SWS 2013 evaluation setup
• A single search corpus with ~20 hours of data, collected from contributions in 9 languages
– No transcription or language information is given to participants

• 500 queries for dev and 500 queries for eval
– For each query, participants need to return all instances of that query in the search corpus
MediaEval SWS 2013
• 9 languages in different acoustic contexts: 4 African languages (isiXhosa, isiZulu, Sepedi, Setswana), Albanian, Basque, Czech, non-native English, Romanian

Set | #utts | Time (h:mm:ss) | Avg. length/utt.
Search corpus | 10762 | 19:57:55 | 6.67 s
Dev queries | 505 | 0:11:26 | 1.35 s
Extended dev* | 1046 | 0:08:42 | 0.49 s
Eval queries | 503 | 0:11:37 | 1.38 s
Extended eval* | 1037 | 0:08:57 | 0.51 s
Total | 13853 | 20:38:37 |

*Only Basque (3x) and Czech (10x) queries have extended versions
Database distribution per language

Language | Utterances / total duration | Queries (dev / eval) | Speech quality (original sampling rate) | Recording environment
African - isiXhosa | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - isiZulu | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - Sepedi | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - Setswana | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
Albanian | 968 / 127 min. | 50 / 50 | PC microphone, 16 kHz | Lab environment, read speech
Basque | 1841 / 192 min. | 100 / 100 (recorded by mobile phone) | TV broadcast news, 16 kHz | Studio, read speech
Czech | 3667 / 252 min. | 94 / 93 | Telephone speech, 8 kHz | Telephone calls into radio broadcasts, spontaneous speech
Non-native English | 434 / 141 min. | 61 / 60 | High quality mic, 44 kHz | Conference lectures, spontaneous speech
Romanian | 2272 / 244 min. | 100 / 100 | PC microphone, 16 kHz | Lab environment, read speech
SWS 2013 participants

Team name | Country
Dto. Electricidad y electrónica, Universidad del País Vasco (organizers) | Spain
Speech@FIT, Brno University of Technology (organizers) | Czech Republic
Telefonica Research (organizers) | Spain
University Politehnica of Bucharest (organizers) | Romania
School of Electrical and Computer Engineering, Georgia Institute of Technology | USA
L2F - INESC-ID | Portugal
Departament de sistemes informàtics i Computació, Universitat Politècnica de València | Spain
Audiolab, University of Zilina | Slovakia
LIA, University of Avignon | France
Technical University of Kosice | Slovakia
Universitat Pompeu Fabra | Spain
DSP-STL, Dept. of EE, The Chinese University of Hong Kong | Hong Kong
International Institute of Information Technology - Hyderabad | India

Non-finishing:
IAIS, Fraunhofer Institute | Germany
TATA Consultancy Services Ltd. | India
Indian Statistical Institute | India
Northwestern Polytechnical University of Xi’an | China
Toyota Technological Institute at Chicago | USA
Possible approaches to QbE-STD
[Diagram; label: "Language spoken"]
• Pattern based
• Acoustic models + lattice based
• Language models + word-based
Followed approaches
Approach columns on the slide: DTW-like, AKWS
Team name
Dto. Electricidad y electrónica, Universidad del País Vasco
Speech@FIT, Brno University of Technology
Telefonica Research
University Politehnica of Bucharest
School of Electrical and Computer Engineering, Georgia Institute of Technology
L2F - INESC-ID
Dept. de sistemes informàtics i Computació, Universitat Politècnica de València
Audiolab, University of Zilina
LIA, University of Avignon
Technical University of Kosice
Universitat Pompeu Fabra
DSP-STL, Dept. of EE, The Chinese University of Hong Kong
International Institute of Information Technology - Hyderabad
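
To make the "DTW-like" family concrete, here is a minimal sketch of subsequence dynamic time warping, which matches a whole query against any stretch of a longer document. It is not any participant's system: the cosine local distance, the three-step path rule, the normalization by query length and the name `subsequence_dtw` are all assumptions chosen for illustration.

```python
import numpy as np


def subsequence_dtw(query, doc):
    """Find the best match of `query` inside `doc`.

    query: (n, d) array of feature frames; doc: (m, d) array.
    The query must be matched in full, but the match may start and end
    at any document frame. Returns (cost, start_frame, end_frame).
    """
    query = np.asarray(query, dtype=float)
    doc = np.asarray(doc, dtype=float)
    n, m = len(query), len(doc)

    # Local distance (assumed): 1 - cosine similarity for every frame pair.
    q = query / np.maximum(np.linalg.norm(query, axis=1, keepdims=True), 1e-12)
    x = doc / np.maximum(np.linalg.norm(doc, axis=1, keepdims=True), 1e-12)
    dist = 1.0 - q @ x.T

    acc = np.full((n, m), np.inf)      # accumulated cost
    start = np.zeros((n, m), dtype=int)  # document frame where each path began
    acc[0, :] = dist[0, :]             # a path may start at any document frame
    start[0, :] = np.arange(m)

    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        start[i, 0] = 0
        for j in range(1, m):
            steps = ((i - 1, j - 1), (i - 1, j), (i, j - 1))
            costs = [acc[s] for s in steps]
            k = int(np.argmin(costs))
            acc[i, j] = costs[k] + dist[i, j]
            start[i, j] = start[steps[k]]

    end = int(np.argmin(acc[n - 1]))   # a path may end at any document frame
    return float(acc[n - 1, end] / n), int(start[n - 1, end]), end
```

A real system would typically keep several low-cost end points per document rather than only the best one, and turn the costs into detection scores.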
Scoring metrics
• PRIMARY: Actual Term Weighted Value (ATWV) /
Maximum Term Weighted Value (MTWV)
• Actual/minimum Cnxe
• Real-time factor
• Memory usage
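
As a rough illustration of the primary metric, the sketch below computes a Term Weighted Value in the style of the NIST STD definition: one minus the query-averaged sum of miss probability and weighted false-alarm probability. ATWV is this value at the threshold the system chose, and MTWV is its maximum over all thresholds. The slides do not give the weight or the number of non-target trials, so `beta` and `n_nontarget` are left as parameters, and the names `Detection`, `twv` and `mtwv` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    query: str
    score: float
    is_hit: bool  # True if the detection matches a reference occurrence


def twv(detections, n_true, n_nontarget, threshold, beta):
    """Term Weighted Value at one decision threshold.

    n_true: dict query -> number of reference occurrences
    n_nontarget: dict query -> number of non-target trials (assumed given)
    beta: weight of false alarms relative to misses (assumed given)
    """
    # Queries without any reference occurrence are left out of the average.
    queries = [q for q, n in n_true.items() if n > 0]
    total = 0.0
    for q in queries:
        kept = [d for d in detections if d.query == q and d.score >= threshold]
        hits = sum(d.is_hit for d in kept)
        false_alarms = len(kept) - hits
        p_miss = 1.0 - hits / n_true[q]
        p_fa = false_alarms / n_nontarget[q]
        total += p_miss + beta * p_fa
    return 1.0 - total / len(queries)


def mtwv(detections, n_true, n_nontarget, beta):
    """Maximum TWV over all thresholds; returns (value, threshold)."""
    # An infinite threshold (no detections kept) always gives TWV = 0.
    thresholds = sorted({d.score for d in detections}) + [float("inf")]
    return max((twv(detections, n_true, n_nontarget, t, beta), t)
               for t in thresholds)
```

The gap between ATWV and MTWV shows how well a system's decision threshold was calibrated.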
Primary metric (dev)
Primary metric (eval)
Per language results
Average for the 10-best systems
Per-language results: African (eval)
Per-language results: Albanian (eval)
Per-language results: Basque (eval)
Per-language results: Czech (eval)
Per-language results: Non-native English (eval)
Per-language results: Romanian (eval)
DET dev
[DET plot: primary systems (development). Axes: miss probability (in %) versus false alarm probability (in %). A random-performance curve is shown for reference.]
Legend:
GTTS (MTWV=0.417, Thr=5.204)
L2F (MTWV=0.390, Thr=3.428)
CUHK (MTWV=0.368, Thr=0.530)
BUT (MTWV=0.371, Thr=0.930)
CMTECHETAL (MTWV=0.264, Thr=16.535)
IIITH (MTWV=0.253, Thr=2.130)
ELIRF (MTWV=0.170, Thr=2.697)
TID (MTWV=0.116, Thr=4.085)
GTC (MTWV=0.116, Thr=3.248)
SPEED (MTWV=0.083, Thr=0.960)
LIA-Late (MTWV=0.005, Thr=13.065)
UNIZA-Late (MTWV=0.000, Thr=1.000)
TUKE-Late (MTWV=0.000, Thr=3.000)
DET eval
[DET plot: primary systems (evaluation). Axes: miss probability (in %) versus false alarm probability (in %). A random-performance curve is shown for reference.]
Legend:
GTTS (MTWV=0.399, Thr=5.243)
L2F (MTWV=0.342, Thr=3.551)
CUHK (MTWV=0.306, Thr=0.618)
BUT (MTWV=0.297, Thr=0.914)
CMTECHETAL (MTWV=0.257, Thr=18.153)
IIITH (MTWV=0.224, Thr=2.721)
ELIRF (MTWV=0.159, Thr=2.759)
TID (MTWV=0.093, Thr=5.051)
GTC (MTWV=0.084, Thr=3.341)
SPEED (MTWV=0.059, Thr=0.923)
LIA-Late (MTWV=0.000, Thr=1079.003)
UNIZA-Late (MTWV=0.001, Thr=1.000)
TUKE-Late (MTWV=0.000, Thr=3.000)
Cnxe metric
[Chart: Cnxe for primary systems, Cnxe axis from 0 to 3. Series: Min Cnxe (development), Act Cnxe (development), Min Cnxe (evaluation), Act Cnxe (evaluation). Systems: GTTS, L2F, CUHK, BUT, CMTECHETAL, IIITH, ELIRF, TID, GTC, SpeeD, LIA, UNIZA, TUKE]
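
The slides do not define Cnxe, so the following is a sketch under an assumed, commonly used definition: the prior-weighted cross-entropy of the posteriors implied by the system's log-likelihood-ratio scores, divided by the entropy of the prior alone. Under that assumption, a system whose scores carry no information (all LLRs equal to 0) gets exactly 1, and lower is better. The function name `cnxe` and the `p_target` parameter are hypothetical; "Min Cnxe" would additionally require optimally recalibrating the scores, which is not shown.

```python
import numpy as np


def cnxe(llr_target, llr_nontarget, p_target):
    """Actual normalized cross-entropy (assumed definition).

    llr_target / llr_nontarget: log-likelihood-ratio scores (natural log)
    of the target and non-target trials; p_target: assumed target prior.
    """
    llr_t = np.asarray(llr_target, dtype=float)
    llr_n = np.asarray(llr_nontarget, dtype=float)
    prior_logit = np.log(p_target / (1.0 - p_target))

    # -log2(posterior) for targets and -log2(1 - posterior) for non-targets.
    # logaddexp(0, x) = log(1 + exp(x)) keeps both numerically stable.
    cost_t = np.logaddexp(0.0, -(llr_t + prior_logit)) / np.log(2.0)
    cost_n = np.logaddexp(0.0, llr_n + prior_logit) / np.log(2.0)

    cross_entropy = p_target * cost_t.mean() + (1.0 - p_target) * cost_n.mean()
    prior_entropy = -(p_target * np.log2(p_target)
                      + (1.0 - p_target) * np.log2(1.0 - p_target))
    return float(cross_entropy / prior_entropy)
```

Values above 1, as some systems show in the chart, would mean the scores are so badly calibrated that they do worse than ignoring them.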
Extended Queries
• 4 teams submitted 4 extended systems, making use of 3
repetitions of Basque queries and 10 repetitions of Czech
queries available
– TID: computes each query individually and then puts together all
results
– GTTS: DTW-aligns all queries above a minimum duration and searches
with the resulting query
– GeorgiaTech: builds a graphical keyword model using more than one
instance
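
The slides do not say how TID "puts together all results", so the sketch below shows one plausible way to pool detections from several spoken repetitions of the same query: search with each repetition separately, then collapse overlapping detections in the same utterance and keep the best score. The `Hit` type, the `merge_hits` name and the max-score rule are assumptions for illustration only.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    utt: str      # utterance identifier
    start: float  # start time in seconds
    end: float    # end time in seconds
    score: float


def merge_hits(per_repetition_hits):
    """Pool hits from several repetitions of one query.

    per_repetition_hits: one list of Hit per query repetition.
    Overlapping hits in the same utterance collapse into a single hit
    that spans both and keeps the higher score.
    """
    pooled = sorted(
        (h for hits in per_repetition_hits for h in hits),
        key=lambda h: (h.utt, h.start),
    )
    merged = []
    for h in pooled:
        last = merged[-1] if merged else None
        if last is not None and last.utt == h.utt and h.start < last.end:
            merged[-1] = Hit(h.utt, last.start, max(last.end, h.end),
                             max(last.score, h.score))
        else:
            merged.append(h)
    return merged
```

Taking the maximum favors recall; averaging the scores of overlapping hits would be a more conservative alternative.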
Extended systems
Real-Time Factor versus Memory usage
Real-Time Factor versus Memory usage (partial)
Take home messages
• The task was more complicated than in 2012
– GTTS got MTWV-13 = 0.39, MTWV-12 = 0.51 (on 2013 data)
– CUHK MTWV-12 = 0.74 (on 2012 data)

• It is possible to do QbE-STD on unknown/low-resource data
New things to watch out for in the posters session
• BUT:
– Fusion of 26 systems (13 AKWS + 13 DTW)
– M-norm normalization

• IIIT:
– Articulatory Bottleneck features

• CUHK:
– Tokenizer construction using Gaussian Component clustering
– Query expansion using PSOLA

• L2F:
– DTW candidate pre-selection

• GTTS:
– Distance matrix normalization in DTW

• GeorgiaTech:
– Low-resource speech modeling using EHMM Models

• LIA:
– Use of I-vectors in SWS

• ARF:
– DTW string matching algorithm with a novel scoring
System presentations
• 16:30-16:45 "GTTS Systems for the SWS Task at
MediaEval 2013", Luis Javier Rodriguez-Fuentes, DEE,
Universidad del País Vasco
• 16:45-17:00 "The L2F Spoken Web Search system for
MediaEval 2013", Alberto Abad, L2F, INESC-ID
• 17:00-17:15 "BUT SWS 2013 - MASSIVE PARALLEL
APPROACH", Lucas Ondel, Speech@BUT, Brno
University of Technology
• 17:15-17:30 "The CMTECH Spoken Web Search System
for MediaEval 2013", Ciro Gracia, UPF
• 17:30-17:45 Discussion and SWS 2014 teaser, Xavier
Anguera


Editor's notes

  1. AKWS means they use some sort of Viterbi algorithm. DTW-like means they use DTW algorithms to match different sorts of features.
  2. UPF has very good regularization for finding the optimal score across all queries. TID and IIIT have a poor match between ATWV and MTWV. Only the positive scores were plotted.