1. LIG Quaero consortium at MediaEval 2012
Affect task: Violent Scenes Detection Task
Nadia Derbas, Franck Thollard, Bahjat Safadi and
Georges Quénot
UJF-LIG
4 October 2012
2. Outline
• Global system architecture
• Descriptors with optimization
• Classification
• Hierarchical fusion
• Conceptual feedback
• Re-ranking
• Submitted runs
• Conclusion
04/10/12 LIG - Nadia Derbas 2
3. The classical classification pipeline
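[Figure: a raw signal (binary data, "0101") on one side and its semantics on the other (e.g. "Discourse of President Bill Clinton", "President Clinton is basking in some good news"); the distance between signal and semantics is the semantic gap.]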
4. The LIG classification pipeline
[Figure: pipeline diagram; boxes recovered from the slide:]
• Input: text, audio, image
• Descriptor extraction
• Descriptor transformation
• Classification
• Descriptors and classifier variants fusion
• Higher level hierarchical fusion
• Conceptual feedback
• Re-ranking (re-scoring)
• Output: classification score
5. Descriptors and variants
Descriptor extraction:
● color: 4 x 4 x 4 RGB histogram;
● texture: 8 orientations x 5 scales Gabor transform;
● points of interest: bags of SIFTs: Harris-Laplace and dense sampling, hard and fuzzy clustering, use of color opponent SIFTs (van de Sande);
● audio: bag of MFCCs, MFCCs only and MFCCs plus their first and second derivatives;
● motion.
Descriptor optimization:
● power normalization: x ← x^α, α ≈ 0.4: good for sparse descriptors;
● principal component analysis: dimensionality reduction and noise removal.
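The power normalization step x ← x^α can be illustrated with a minimal sketch. The function name `power_normalize` and the sample histogram are hypothetical, and the sign handling is an assumption that only matters for descriptors with negative components (the slide does not say how those are treated). PCA is not shown here.

```python
import math


def power_normalize(descriptor, alpha=0.4):
    """Apply x <- sign(x) * |x|**alpha to every component.

    The sign handling is an assumption for descriptors that may hold
    negative values; for histograms (all values >= 0) it is just x**alpha.
    """
    return [math.copysign(abs(x) ** alpha, x) for x in descriptor]


# Hypothetical sparse histogram: a few large bins, many empty ones.
histogram = [0.0, 1.0, 16.0, 0.0, 81.0]

# alpha=0.5 is used here only to keep the arithmetic easy to follow;
# the slide suggests a value around 0.4.
normalized = power_normalize(histogram, alpha=0.5)
print(normalized)
```

With α = 0.5 the large bins are compressed (16 becomes 4, 81 becomes 9) while empty bins stay at 0, which reduces the dominance of a few large values in sparse descriptors.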
6. Use of multiple classifiers
• Two different classification methods:
  • kNN
  • MSVM
    • Uses multiple SVMs to address the unbalanced data problem
    • Improves over a regular SVM on highly imbalanced datasets
• MSVM is generally better than kNN, but not always
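One common way to use several classifiers on unbalanced data is sketched below: each model sees all positive samples plus a different slice of the negatives, and the model scores are averaged. This scheme is an assumption, since the slide does not describe how the multiple SVMs are trained or combined. To keep the block dependency-free, a nearest-centroid scorer stands in for the SVM; all function names and data are hypothetical.

```python
def split_negatives(negatives, n_models):
    """Deal the negative samples into n_models roughly equal slices."""
    return [negatives[i::n_models] for i in range(n_models)]


def train_centroid_scorer(positives, negatives):
    """Stand-in base learner (NOT an SVM): score by distance to class centroids."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos_center, neg_center = centroid(positives), centroid(negatives)

    def score(x):
        dist_pos = sum((a - b) ** 2 for a, b in zip(x, pos_center))
        dist_neg = sum((a - b) ** 2 for a, b in zip(x, neg_center))
        return dist_neg - dist_pos  # higher means closer to the positive class

    return score


def train_ensemble(positives, negatives, n_models, train_fn=train_centroid_scorer):
    """Train one model per slice of negatives; every model sees all positives."""
    slices = split_negatives(negatives, n_models)
    return [train_fn(positives, chunk) for chunk in slices if chunk]


def ensemble_score(models, x):
    """Average the scores of all models for one sample."""
    return sum(model(x) for model in models) / len(models)


# Made-up data: 2 positives against 6 negatives.
positives = [[1.0, 1.0], [0.9, 1.1]]
negatives = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1],
             [0.2, 0.0], [0.0, 0.2], [-0.2, -0.1]]
models = train_ensemble(positives, negatives, n_models=3)
print(ensemble_score(models, [1.0, 1.0]), ensemble_score(models, [0.0, 0.0]))
```

Each model is trained on a 2-versus-2 problem instead of 2-versus-6, so no single model is dominated by the negative class. A real SVM training function could be passed as `train_fn` in place of the stand-in.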
7. Hierarchical fusion
• Late fusion of descriptor and classifier variants, to get the maximum from each descriptor type:
  • fuse spatial variants
  • then fuse other variants
  • finally fuse classification results from different classifiers
• Further hierarchical late fusion across different descriptors of similar types:
  • all color together, all texture together ...
  • then all visual together, all audio together ...
  • finally everything together
A linear combination of the scores is used, with weights optimized on the MediaEval development set.
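The hierarchical scheme above can be sketched as repeated weighted linear combinations of per-shot scores. The function name `fuse`, the scores and the weights are all hypothetical. In the described system the weights are optimized on the development set, which is not shown here, and dividing by the sum of the weights is an assumption.

```python
def fuse(score_lists, weights):
    """Weighted linear combination of per-shot scores from several sources.

    Dividing by the sum of the weights (an assumption) keeps the fused
    scores in the same range as the inputs.
    """
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per score list")
    n_shots = len(score_lists[0])
    if any(len(scores) != n_shots for scores in score_lists):
        raise ValueError("all score lists must cover the same shots")
    total = sum(weights)
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists)) / total
        for i in range(n_shots)
    ]


# Made-up scores for two shots from four hypothetical sources.
color_a = [0.8, 0.2]
color_b = [0.6, 0.4]
texture = [0.5, 0.1]
audio = [0.3, 0.9]

# Level 1: fuse descriptors of the same type (all color together).
color = fuse([color_a, color_b], [1.0, 1.0])
# Level 2: fuse all visual descriptors together.
visual = fuse([color, texture], [2.0, 1.0])
# Level 3: fuse everything together.
final = fuse([visual, audio], [3.0, 1.0])
print(final)
```

Fusing level by level means each stage only has a handful of weights to tune, instead of one weight per descriptor/classifier combination in a single flat fusion.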
8. Conceptual feedback
● Idea: use the probability(-like) scores predicted for the 11 concepts to build a new descriptor
● 11-component vector
● Classifiers are trained on it in the same way as on the signal-based descriptors
Late fusion between the original scores and the scores computed from classification on these original scores yields a small improvement on the MAP@100.
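Building the feedback descriptor amounts to stacking, for each shot, the scores predicted for every concept into one vector. The sketch below shows only that step. The function name, the placeholder concept names and the scores are hypothetical; training a classifier on the resulting vectors and fusing its output with the original scores are not shown.

```python
def build_feedback_descriptors(concept_scores, concept_names):
    """Stack per-concept shot scores into one vector per shot.

    concept_scores maps a concept name to the list of scores predicted
    for each shot; the output has one vector per shot, with one
    component per concept, in the order given by concept_names.
    """
    n_shots = len(concept_scores[concept_names[0]])
    for name in concept_names:
        if len(concept_scores[name]) != n_shots:
            raise ValueError(f"score list for {name!r} has the wrong length")
    return [
        [concept_scores[name][i] for name in concept_names]
        for i in range(n_shots)
    ]


# Placeholder names for the 11 concepts; the scores are made up.
CONCEPTS = [f"concept_{k:02d}" for k in range(11)]
scores = {name: [0.1 * (k % 3), 0.05 * k] for k, name in enumerate(CONCEPTS)}

descriptors = build_feedback_descriptors(scores, CONCEPTS)
print(len(descriptors), len(descriptors[0]))
```

Fixing the concept order once (here via `CONCEPTS`) matters because the same component must always refer to the same concept at training and prediction time.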
9. Temporal re-ranking
● Fact: shots within a video are semantically related, especially if they are close to each other within the same video
● Idea: update shot scores according to neighbors’ scores
● May be done globally (whole video) (Mérialdo 2009) or locally (window of a few shots) (Safadi 2010).
● Case of the full video:
  • Compute a global score for the whole video from the scores of all the shots it contains (typically an average or a variant)
  • Update the score of each shot using the global video score (typically a linear combination or a variant)
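Both variants can be sketched in a few lines. The global case follows the two steps above, using a plain average and a linear combination. The local case is an assumed analogue that averages over a window of neighboring shots. Function names, the `weight` and `window` parameters and their defaults are hypothetical, not values from the system.

```python
def rerank_global(shot_scores, weight=0.5):
    """Mix each shot score with the average score of the whole video."""
    if not shot_scores:
        return []
    video_score = sum(shot_scores) / len(shot_scores)
    return [(1 - weight) * s + weight * video_score for s in shot_scores]


def rerank_local(shot_scores, window=2, weight=0.5):
    """Mix each shot score with the average over nearby shots.

    The neighborhood covers up to `window` shots on each side and
    includes the shot itself (an assumed definition of "local").
    """
    reranked = []
    for i, s in enumerate(shot_scores):
        lo = max(0, i - window)
        hi = min(len(shot_scores), i + window + 1)
        neighbors = shot_scores[lo:hi]
        local_score = sum(neighbors) / len(neighbors)
        reranked.append((1 - weight) * s + weight * local_score)
    return reranked


# Made-up scores for the consecutive shots of one video.
scores = [0.1, 0.9, 0.8, 0.2, 0.1]
print(rerank_global(scores))
print(rerank_local(scores, window=1))
```

With `weight=0` both functions return the original scores, and with `weight=1` every shot takes the video (or window) average, so the weight controls how strongly the context overrides the individual shot score.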
10. Submitted runs
● LIG-1 (MAP@100: 0.3138): hierarchical fusion of all available descriptor/classifier combinations, including the concept score feedback descriptor, with temporal re-ranking
● LIG-2 (MAP@100: 0.3122): hierarchical fusion of all available descriptor/classifier combinations, with temporal re-ranking
● LIG-3 (MAP@100: 0.3138): hierarchical fusion of all available descriptor/classifier combinations, including the concept score feedback descriptor
● LIG-4 (MAP@100: 0.3122): hierarchical fusion of all available descriptor/classifier combinations
11. Submitted runs
Run      MAP@100   MAP      P@100
Best     0.6506    0.3183   0.4833
LIG-1    0.3138    0.1723   0.3167
LIG-2    0.3122    0.1731   0.3034
LIG-3    0.3138    0.1307   0.3166
LIG-4    0.3122    0.1259   0.3033
Median   0.3122    0.1249   0.2600
12. Conclusion
● Temporal re-ranking always improves the result or has no significant effect
● Conceptual feedback improves the precision at the head of the returned list (MAP@100, P@100)
● Motion descriptors
● Audio was used (small contribution) but not ASR
● Improvements still possible
13. Thank you for your attention!
Questions?