LIG MediaEval 2012 Affect Task Detection

LIG Quaero consortium at MediaEval 2012
Affect task: Violent Scenes Detection Task

Nadia Derbas, Franck Thollard, Bahjat Safadi and
Georges Quénot
UJF-LIG

4 October 2012

Outline
• Global system architecture
• Descriptors with optimization
• Classification
• Hierarchical fusion
• Conceptual feedback
• Re-ranking
• Submitted runs
• Conclusion


The classical classification pipeline

[Figure: raw signals (text, audio, image, each shown as bits "0101") are
mapped to semantics, e.g. "Discourse of President Bill Clinton" or
"President Clinton is basking in some good news". The distance between
signal and semantics is the semantic gap.]

The LIG classification pipeline

[Figure: pipeline stages: Descriptor extraction; Descriptor
transformation; Classification; Descriptors and classifier variants
fusion; Higher level hierarchical fusion; Conceptual feedback;
Re-ranking (re-scoring). Output: Classification score.]

Descriptors and variants

Descriptor extraction:
● color: 4 x 4 x 4 RGB histogram;
● texture: 8 orientations x 5 scales Gabor transform;
● points of interest: bags of SIFTs: Harris-Laplace and dense
  sampling, hard and fuzzy clustering, use of color opponent SIFTs
  (van de Sande);
● audio: bag of MFCCs, MFCCs only and MFCCs plus their first and
  second derivatives;
● motion.

Descriptor optimization:
● power normalization: x ← x^α, α ≈ 0.4: good for sparse descriptors;
● principal component analysis: dimensionality reduction and noise
  removal.
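The power normalization step can be illustrated with a minimal sketch. It assumes the descriptor is a plain list of non-negative values (for instance histogram bins). The function name `power_normalize` and the sample histogram are hypothetical, and the PCA step is not shown.

```python
def power_normalize(descriptor, alpha=0.4):
    """Apply x <- x**alpha to every component of a non-negative descriptor."""
    if any(x < 0 for x in descriptor):
        # Assumption of this sketch: components are non-negative (e.g. histogram bins).
        raise ValueError("this sketch assumes non-negative components")
    return [x ** alpha for x in descriptor]


# Hypothetical sparse histogram: a few large bins, several empty ones.
histogram = [0.0, 0.81, 0.0, 0.01, 0.18]
normalized = power_normalize(histogram, alpha=0.5)
```

With an exponent below 1, empty bins stay at zero while the ratio between large and small non-empty bins shrinks, which is one reason the transformation suits sparse descriptors.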


Use of multiple classifiers
• Two different classification methods:
  • kNN
  • MSVM
    • Uses multiple SVMs to address the unbalanced data problem
    • Improves over a regular SVM on highly imbalanced datasets

• MSVM is generally better than kNN but not always
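The slide does not say how the multiple SVMs are built. One common way to handle imbalance with several learners, assumed here purely for illustration, is to give each learner all positive samples plus a different chunk of the negatives, then average the learners' scores. The sketch below shows only the data split and the score averaging; training the SVMs themselves is left out, and the names `balanced_subsets` and `average_scores` are hypothetical.

```python
import random


def balanced_subsets(positives, negatives, n_learners, seed=0):
    """Pair all positives with one disjoint chunk of the shuffled negatives per learner."""
    rng = random.Random(seed)
    shuffled = list(negatives)
    rng.shuffle(shuffled)
    subsets = []
    for i in range(n_learners):
        chunk = shuffled[i::n_learners]  # every n_learners-th negative, starting at i
        subsets.append((list(positives), chunk))
    return subsets


def average_scores(per_learner_scores):
    """Average the scores that the individual learners give to each test sample."""
    n = len(per_learner_scores)
    return [sum(column) / n for column in zip(*per_learner_scores)]
```

Each learner then sees a far less skewed class ratio than the full training set, while every negative sample is still used by one of them.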


Hierarchical fusion
• Late fusion of descriptor and classifier variants: get the
  maximum from each descriptor type:
  • fuse spatial variants
  • then fuse other variants
  • finally fuse classification results from different classifiers
• Further hierarchical late fusion: fuse across different
  descriptors of similar types:
  • all color together, all texture together ...
  • then all visual together, all audio together ...
  • finally everything together

A linear combination of the scores is used, with weights
optimized on the MediaEval development set.
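The hierarchical scheme can be sketched as a recursive weighted average over a tree of score lists. This is only an illustration: the tree layout (a dict with `weights` and `children`), the normalization by the sum of the weights, and the example weights are assumptions, and the optimization of the weights on the development set is not shown.

```python
def fuse(score_lists, weights):
    """Weighted linear combination of several score lists covering the same shots."""
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, shot_scores)) / total
        for shot_scores in zip(*score_lists)
    ]


def fuse_hierarchy(node):
    """Fuse a tree bottom-up: a leaf is a list of scores, an inner node is a dict."""
    if isinstance(node, dict):
        children = [fuse_hierarchy(child) for child in node["children"]]
        return fuse(children, node["weights"])
    return list(node)


# Hypothetical example with two shots: two color variants, then color with audio.
color = {"weights": [1, 1], "children": [[0.2, 0.6], [0.4, 0.8]]}
audio = [0.9, 0.1]
everything = {"weights": [3, 1], "children": [color, audio]}
final_scores = fuse_hierarchy(everything)
```

Fusing similar descriptors first means each level only has a handful of weights to tune, rather than one weight per descriptor/classifier combination.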


Conceptual feedback
● Idea: use the probability(-like) scores predicted on the 11
  concepts to build a new descriptor
● 11-component vector
● Used to train classifiers in the same way as the signal-based
  descriptors

Late fusion between the original scores and the scores
computed from classification on these original scores yields
a small improvement in MAP@100.
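Building the feedback descriptor amounts to stacking the per-concept scores of each shot into one vector. The sketch below assumes the scores are available as one list per concept, aligned on the same shots. The function name and the concept names are hypothetical, and three concepts are used instead of 11 to keep the example short.

```python
def concept_feedback_descriptors(concept_scores, concept_order):
    """Return one vector per shot, with one component per concept in a fixed order."""
    columns = [concept_scores[name] for name in concept_order]
    n_shots = len(columns[0])
    if any(len(column) != n_shots for column in columns):
        raise ValueError("all concepts must be scored on the same shots")
    return [list(row) for row in zip(*columns)]


# Hypothetical scores for two shots and three placeholder concepts.
scores = {
    "concept_a": [0.1, 0.9],
    "concept_b": [0.2, 0.8],
    "concept_c": [0.3, 0.7],
}
vectors = concept_feedback_descriptors(scores, ["concept_a", "concept_b", "concept_c"])
```

The resulting vectors can then be passed to the same classifiers as any other descriptor, and their output fused with the original scores.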


Temporal re-ranking
● Fact: shots within a video are semantically related, especially if
  they are close to each other within the same video
● Idea: update shot scores according to neighbors' scores
● May be done globally (whole video) (Mérialdo 2009) or locally
  (window of a few shots) (Safadi 2010).

● Case of the full video:
  • Compute a global score for the whole video from the scores of all
    shots it contains (typically the average or a variant)
  • Update the score of each shot using the global video score
    (typically a linear combination or a variant)
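Both variants can be sketched in a few lines. The sketch assumes the plain average as the global or neighborhood score and a linear combination with a mixing weight `gamma`. The function names, `gamma`, and the window size are illustrative choices, not values taken from the slides.

```python
def rerank_global(shot_scores, gamma=0.5):
    """Mix each shot score with the average score of the whole video."""
    video_score = sum(shot_scores) / len(shot_scores)
    return [(1 - gamma) * s + gamma * video_score for s in shot_scores]


def rerank_local(shot_scores, window=2, gamma=0.5):
    """Mix each shot score with the average of its neighbors within a window."""
    n = len(shot_scores)
    updated = []
    for i, s in enumerate(shot_scores):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbors = shot_scores[lo:i] + shot_scores[i + 1:hi]
        # A shot with no neighbors keeps its own score as context.
        context = sum(neighbors) / len(neighbors) if neighbors else s
        updated.append((1 - gamma) * s + gamma * context)
    return updated
```

With `gamma = 0` both functions return the original scores, and with `gamma = 1` each shot is replaced entirely by its context score.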


Submitted runs
● LIG-1: 0.3138 (MAP@100)
  • Hierarchical fusion of all available descriptor/classifier
    combinations, including the concept score feedback descriptor,
    including temporal re-ranking
● LIG-2: 0.3122
  • including temporal re-ranking
● LIG-3: 0.3138
  • including the concept score feedback descriptor
● LIG-4: 0.3122


Submitted runs

Run      MAP@100   MAP      P@100
Best     0.6506    0.3183   0.4833
LIG-1    0.3138    0.1723   0.3167
LIG-2    0.3122    0.1731   0.3034
LIG-3    0.3138    0.1307   0.3166
LIG-4    0.3122    0.1259   0.3033
Median   0.3122    0.1249   0.2600
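For readers unfamiliar with the metrics in the table, the sketch below computes precision and average precision at a cutoff from a ranked list of 0/1 relevance labels. The normalization of average precision by min(k, number of relevant items) is one common convention and is an assumption here; the official evaluation tool may differ. All function names are hypothetical.

```python
def precision_at_k(relevance, k=100):
    """Fraction of the top-k ranked items that are relevant (labels are 0 or 1)."""
    return sum(relevance[:k]) / k


def average_precision_at_k(relevance, k=100):
    """Average of the precision values at each relevant rank within the top k."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            hits += 1
            total += hits / rank
    # Assumed normalization: min(k, number of relevant items in the full list).
    denom = min(k, sum(relevance))
    return total / denom if denom else 0.0


def mean_average_precision(relevance_lists, k=100):
    """Mean of average_precision_at_k over several ranked lists (e.g. one per film)."""
    return sum(average_precision_at_k(r, k) for r in relevance_lists) / len(relevance_lists)
```

Metrics with a cutoff of 100 reward putting relevant shots at the head of the list, whereas MAP over the full list also reflects how the remaining relevant shots are ranked.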


Conclusion

● Temporal re-ranking always improves the result or has no significant
  effect

● Conceptual feedback improves the precision at the head of the
  returned list (MAP@100, P@100)

● Motion descriptors

● Audio was used (small contribution) but not ASR

● Improvements still possible


Thank you for your attention!

Questions?

