Multimodal
Person Discovery
in Broadcast TV
Johann Poignant / Hervé Bredin / Claude Barras
2015
bredin@limsi.fr

herve.niderb.fr

@hbredin
Outline
• Motivations
• Definition of the task
• Datasets
• Baseline & Metadata
• Evaluation protocol
• Organization
• Results
• Conclusion
2
Motivations
3
"the usual suspects"
• Huge TV archives are useless if not searchable
• People ❤ people
• Need for person-based indexes
4
REPERE
• Evaluation campaigns in 2012, 2013 and 2014

Three French consortia funded by ANR
• Multimodal people recognition in TV documents

"who speaks when?" and "who appears when?"
• Led to significant progress in both supervised and
unsupervised multimodal person recognition
5
From REPERE to Person Discovery
• Speaking faces

Focus on "person of interest"
• Unsupervised approaches

People may not be "famous" at indexing time
• Evidence

Archivist/journalist use case
6
Definition of the task
7
Input
8
Broadcast TV pre-segmented into shots.
Speaking face
9
Tag each shot with the names of people
both speaking and appearing at the same time
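The tagging rule above can be sketched as an interval intersection. This is only an illustration: it assumes names have already been attached to speech turns and face tracks (in the task itself they must first be discovered), and the `Segment` layout and `speaking_faces` helper are hypothetical.

```python
from typing import List, Set, Tuple

# (start time, end time, person name), times in seconds. Hypothetical layout.
Segment = Tuple[float, float, str]


def speaking_faces(shot: Tuple[float, float],
                   speech_turns: List[Segment],
                   face_tracks: List[Segment]) -> Set[str]:
    """Names of people who speak and appear at the same time within the shot."""
    shot_start, shot_end = shot
    names: Set[str] = set()
    for s_start, s_end, speaker in speech_turns:
        for f_start, f_end, face in face_tracks:
            if speaker != face:
                continue
            # Intersection of the shot, the speech turn and the face track.
            start = max(shot_start, s_start, f_start)
            end = min(shot_end, s_end, f_end)
            if start < end:
                names.add(speaker)
    return names


shot = (10.0, 20.0)
speech_turns = [(8.0, 15.0, "a"), (15.0, 19.0, "b")]
face_tracks = [(12.0, 20.0, "a"), (0.0, 14.0, "b")]
tags = speaking_faces(shot, speech_turns, face_tracks)
```

Here "b" is not tagged: they speak and appear in the shot, but never at the same time.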
Person discovery
• Prior biometric models are not allowed.
• Person names must be discovered automatically

in text overlay or speech utterances.
unsupervised approaches only
10
Evidence
11
Associate each name with a unique shot

proving that the person actually holds this name
Evidence (cont.)
12
an image evidence is a shot during which a person is visible

and their name is written on screen

an audio evidence is a shot during which a person is visible
and their name is pronounced at least once
during a [shot start time - 5s, shot end time + 5s] neighborhood
shot #3
for B
shot #1
for A
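A minimal sketch of the two evidence definitions, assuming visibility, on-screen names and the times at which a name is pronounced are already known. The `Shot` class and both helper functions are hypothetical names; only the 5-second margin comes from the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    start: float  # seconds
    end: float    # seconds


def is_image_evidence(person_visible: bool, name_on_screen: bool) -> bool:
    """Image evidence: the person is visible and their name is written on screen."""
    return person_visible and name_on_screen


def is_audio_evidence(shot: Shot, person_visible: bool,
                      name_times: List[float], margin: float = 5.0) -> bool:
    """Audio evidence: the person is visible and their name is pronounced
    at least once within [shot.start - margin, shot.end + margin]."""
    low, high = shot.start - margin, shot.end + margin
    return person_visible and any(low <= t <= high for t in name_times)


shot = Shot(start=100.0, end=110.0)
inside = is_audio_evidence(shot, True, [96.0])   # 96.0 lies within [95, 115]
outside = is_audio_evidence(shot, True, [94.0])  # 94.0 lies before 95
```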
Datasets
13
Datasets
DEV | REPERE TEST | INA
14
137 hours
two French TV channels
eight different types
106 hours (172 videos)
only one French TV channel
only one type (news)
dense audio annotations
speaker diarization
speech transcription
sparse video annotations
face detection & recognition
optical character recognition
no prior annotation
a posteriori
collaborative
annotation
Datasets
TEST | INA
15
106 hours (172 videos)
only one French TV channel
only one type (news)
no prior annotation
a posteriori
collaborative
annotation
http://dataset.ina.fr
Evaluation protocol
16
Information retrieval task
• Queries formatted as firstname_lastname

e.g. francois_hollande



"return all shots where François Hollande is
speaking and visible at the same time"
• Approximate search among submitted names 

e.g. francois_holande
• Select shots tagged with the most similar name
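The approximate search can be sketched as follows. The string similarity used by the official scoring tool is not given on the slide; `difflib.SequenceMatcher` from the Python standard library stands in for it here, and `closest_name` and `select_shots` are hypothetical helpers.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def closest_name(query: str, submitted: List[str]) -> Tuple[str, float]:
    """Return the submitted name most similar to the query, with its similarity."""
    def similarity(name: str) -> float:
        return SequenceMatcher(None, query, name).ratio()
    best = max(submitted, key=similarity)
    return best, similarity(best)


def select_shots(query: str, tags: Dict[str, List[str]]) -> Tuple[str, List[str]]:
    """Select the shots tagged with the submitted name closest to the query.

    `tags` maps a shot identifier to the names submitted for that shot.
    """
    names = sorted({name for shot_names in tags.values() for name in shot_names})
    best, _ = closest_name(query, names)
    return best, [shot for shot, shot_names in tags.items() if best in shot_names]


tags = {
    "shot1": ["francois_holande"],
    "shot2": ["nicolas_sarkozy"],
    "shot3": ["francois_holande", "nicolas_sarkozy"],
}
name, shots = select_shots("francois_hollande", tags)
```

With this stand-in similarity, the misspelled `francois_holande` is still the closest match for the query `francois_hollande`, so shots 1 and 3 are returned.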
17
Evidence-weighted MAP
18
To ensure participants do provide correct evidence for every hypothesized name n ∈ N, standard MAP is altered into EwMAP (Evidence-weighted Mean Average Precision), the official metric for the task:

\[ \mathrm{MAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}(q) \]

\[ \mathrm{EwMAP} = \frac{1}{|Q|} \sum_{q \in Q} C(q) \cdot \mathrm{AP}(q) \]

\[ C(q) = \begin{cases} 1 & \text{if } \rho_q > 0.95 \text{ and } q \in E(n_q) \\ 0 & \text{otherwise} \end{cases} \]

C(q) measures the correctness of the provided evidence for query q.

Acknowledgment. This work was supported by the [...] National Agency for Research under grant ANR-12[...]0006-01. The open source CAMOMILE collaborative annotation platform was used extensively throughout the [...] of the task: from the run submission script to the automated leaderboard, including a posteriori collaborative annotation.
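The metric on this slide can be sketched in a few lines. This is an illustration rather than the official scoring code: the evidence correctness C(q) is passed in as a precomputed 0/1 value, ranked lists are assumed to contain no duplicate shots, and all function names are hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """AP(q): mean of precision@k over the ranks k where a relevant shot appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, shot in enumerate(ranked, start=1):
        if shot in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ewmap(runs: Dict[str, List[str]], truth: Dict[str, Set[str]],
          correctness: Dict[str, int]) -> float:
    """EwMAP: mean over queries of C(q) * AP(q)."""
    return sum(
        correctness.get(q, 0) * average_precision(runs.get(q, []), truth[q])
        for q in truth
    ) / len(truth)


def mean_average_precision(runs: Dict[str, List[str]],
                           truth: Dict[str, Set[str]]) -> float:
    """Standard MAP, i.e. EwMAP with C(q) = 1 for every query."""
    return ewmap(runs, truth, {q: 1 for q in truth})


truth = {"a": {"s1", "s2"}, "b": {"s3"}}
runs = {"a": ["s1", "x", "s2"], "b": ["s3"]}
map_score = mean_average_precision(runs, truth)          # (5/6 + 1) / 2
ewmap_score = ewmap(runs, truth, {"a": 1, "b": 0})       # (5/6 + 0) / 2
```

In the example, query "b" is answered perfectly but its evidence is judged incorrect, so it contributes nothing to EwMAP.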
Baseline & Metadata
19
the "multi" in multimedia
Task necessitates expertise

in various domains:



• multimedia

• computer vision

• speech processing

• natural language processing
20
Technological barriers to entry are lowered

by the provision of a baseline system.
github.com/MediaevalPersonDiscoveryTask
Baseline fusion
21
22
face tracking • •
face clustering • • • •
speaker diarization • • • • •
optical character recognition • • •
automatic speech recognition
speaking face detection • • • •
fusion • • • • • •
Was the baseline useful?
• team relied on the baseline module
• team developed their own module
• team tried both
9 participants
23
focus on
monomodal
components
baseline
face tracking • •
face clustering • • • •
speaker diarization • • • • •
optical character recognition • • •
automatic speech recognition
speaking face detection • • • •
fusion • • • • • •
focus on
multimodal
fusion
Organization
24
Schedule
01.05 development set release

01.06 test set release



01.07 "out-domain" submission deadline



01.07 — 08.07 leaderboard updated every 6 hours

08.07 "in-domain" submission deadline



09.07 — 28.07 collaborative annotation

28.07 test set annotation release

28.07 — 28.08 adjudication
25
Leaderboard
26
computed on a secret

subset of the test set
updated every 6 hours
private leaderboard


participants know how they rank 



participants know their score
but do not know the score of others
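The private leaderboard rule can be sketched as a per-team view that exposes the team's own rank and score only. The `private_view` helper and its output layout are hypothetical.

```python
from typing import Dict, Union


def private_view(scores: Dict[str, float], team: str) -> Dict[str, Union[int, float]]:
    """What one team is allowed to see: its rank and its own score,
    but none of the other teams' scores."""
    ranking = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {
        "rank": ranking.index(team) + 1,
        "teams": len(ranking),
        "score": scores[team],
    }


scores = {"team1": 0.5, "team2": 0.7, "team3": 0.6}
view = private_view(scores, "team3")
```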


Collaborative annotation
27
Thanks!
28
around 50 hours in total
Thanks!
29
half of the test set ✔
collaborative annotation platform
github.com/camomile-project
objectives
open source
server
client
client
client
bring
your
own
client
JSON
data model
corpus
medium layer
annotation
medium fragment attached metadata
time
multimedia
document
REST API
GET /corpus/:idcorpus (obtain information about a corpus)
POST /corpus/:idcorpus/medium (add a medium to a corpus)
PUT /layer/:idlayer/annotation (update an annotation)
DELETE /annotation/:idannotation (delete an annotation)
and more...
permissions
annotation history
annotation queue
user authentication
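A "bring your own client" sketch for the four routes listed on this slide, using only the Python standard library. The requests are prepared but not sent. The server address, identifiers and JSON payload fields are placeholders chosen for illustration, not the documented CAMOMILE schema, and authentication is left out.

```python
import json
from typing import Any, Optional
from urllib.request import Request

BASE_URL = "http://localhost:3000"  # hypothetical server address


def build_request(method: str, path: str, payload: Optional[Any] = None) -> Request:
    """Prepare (but do not send) a JSON request for the given route."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {} if data is None else {"Content-Type": "application/json"}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


# Placeholder identifiers standing in for :idcorpus, :idlayer, :idannotation.
corpus_id, layer_id, annotation_id = "CORPUS", "LAYER", "ANNOTATION"

requests_to_send = [
    build_request("GET", f"/corpus/{corpus_id}"),
    build_request("POST", f"/corpus/{corpus_id}/medium", {"name": "example"}),
    build_request("PUT", f"/layer/{layer_id}/annotation", {"data": "francois_hollande"}),
    build_request("DELETE", f"/annotation/{annotation_id}"),
]

for request in requests_to_send:
    print(request.get_method(), request.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would actually send it to a running server.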
provide an architecture for effective creation and sharing of
annotations of multimedia, multimodal and multilingual data
homogeneous
collection of
multimedia
documents
homogeneous
collection of
annotations
audio
video
image
text
categorical value
numerical value
free text
raw content
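One way to picture the data model is an annotation that ties a medium fragment to its attached metadata and serializes to JSON. The field names below are hypothetical and chosen for readability; they are not taken from the CAMOMILE documentation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class Annotation:
    layer: str                # layer (collection of annotations) it belongs to
    medium: str               # medium (multimedia document) being annotated
    fragment: Dict[str, Any]  # medium fragment, here a time interval in seconds
    data: Any                 # attached metadata (categorical, numerical, free text...)


annotation = Annotation(
    layer="speaking_face",
    medium="news_video",
    fragment={"start": 12.0, "end": 17.5},
    data="francois_hollande",
)

# Clients and server exchange JSON, so the annotation is encoded as such.
encoded = json.dumps(asdict(annotation))
decoded = json.loads(encoded)
```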
enable novel collaborative and interactive annotation tools
be generic enough to support
as many use cases as possible
online documentation
http://camomile-project.github.io/camomile-server
JSON
JSON
Results
31
Stats
• 14 (+ organizers) registered participants

received dev and test data
• 8 (+ organizers) participants submitted 70 runs
• 7 (+ organizers) submitted a working note paper
• 7 (+ organizers) attend the workshop
32
out-domain results
28k shots
2070 queries
1642 # vs. 428 $
[Figure: EwMAP (%) for PRIMARY runs]
in-domain results
[Figure: EwMAP (%) for PRIMARY runs]
out- vs. in-domain
# people that no other team discovered
[Figure: per-team counts alongside EwMAP (%) for PRIMARY runs, annotated "speech transcripts?" and "anchors"]
Conclusion
• Had a great (and exhausting) time organizing this task
• The winning submission DID NOT make any use of
the face modality.
• (Almost) nobody used ASR

Most people are introduced by overlaid names in the test set
• No cross-show approaches
• Poster session later today at 16:00

Technical retreat tomorrow morning at 9:15
39
Next (oral)
• Mateusz Budnik / LIG at MediaEval 2015
Multimodal Person Discovery in Broadcast TV Task
• Meriem Bendris / PERCOLATTE: a multimodal
person discovery system in TV broadcast for the
Medieval 2015 evaluation campaign
• Rosalia Barros / GTM-UVigo Systems for Person
Discovery Task at MediaEval 2015
40
Next (poster)
• Claude Barras / Multimodal Person Discovery in Broadcast TV at
MediaEval 2015
• Johann Poignant / LIMSI at MediaEval 2015: Person Discovery in
Broadcast TV Task
• Paula Lopez Otero / GTM-UVigo Systems for Person Discovery Task
at MediaEval 2015
• Javier Hernando / UPC System for the 2015 MediaEval Multimodal
Person Discovery in Broadcast TV task
• Guillaume Gravier / SSIG and IRISA at Multimodal Person Discovery
• Nam Le / EUMSSI team at the MediaEval Person Discovery Challenge
41

MediaEval 2015 - Multimodal Person Discovery in Broadcast TV
