In this paper, we present the systems developed by the GTM-UVigo team for the Multimedia Person Discovery in Broadcast TV task at MediaEval 2015. The systems propose two different strategies for person discovery in audio through speaker diarization (one based on online clustering with error correction using OCR information, the other based on agglomerative hierarchical clustering), as well as intra-shot and inter-shot strategies for face clustering.
http://ceur-ws.org/Vol-1436/
http://www.multimediaeval.org
MediaEval 2015 - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015
1. GTM-UVigo Systems for Person Discovery Task
at MediaEval 2015
Paula López Otero, Rosalía Barros, Laura Docío Fernández,
Elisardo González Agulla, José Luis Alba Castro, Carmen García Mateo
López Otero, Barros et al. - GTM-UVigo Systems for Person Discovery Task at MediaEval 2015 1/6
2. Main contributions
Error correction in speaker diarization using written names
Face tracking correction using quality scores
Visual voice activity detection
3. Speaker diarization + written names
Speech activity detection
6. Speaker diarization + written names
Speaker segmentation
7. Speaker diarization + written names
Speaker clustering
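The abstract names agglomerative hierarchical clustering as one of the two speaker diarization strategies, but the slides do not give its details. The following is only a minimal, generic sketch of the technique, assuming each speech segment is already represented by a fixed-length feature vector. The average-linkage criterion, the Euclidean distance, the `stop_distance` threshold and the toy values are illustrative assumptions, not the authors' configuration.

```python
import math
from itertools import combinations


def average_linkage(cluster_a, cluster_b, points):
    """Mean Euclidean distance over all cross-cluster pairs of points."""
    total = sum(math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_clustering(points, stop_distance):
    """Merge the two closest clusters until the closest pair exceeds stop_distance."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per segment
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        if average_linkage(clusters[a], clusters[b], points) > stop_distance:
            break  # remaining clusters are treated as different speakers
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Toy 2-D "segment embeddings" (hypothetical values, not from the task data).
segments = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 4.9), (0.05, 0.05)]
speaker_clusters = agglomerative_clustering(segments, stop_distance=1.0)
print(sorted(sorted(c) for c in speaker_clusters))
```

In this toy example, segments 0, 1 and 4 end up in one cluster and segments 2 and 3 in another. The stopping threshold plays the role of deciding how many speakers are hypothesized.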
20. Face diarization + shot segmentation
Face detection and tracking
22. Face diarization + shot segmentation
Quality filter
29. Face diarization + shot segmentation
Visual voice activity detection
32. Face diarization + shot segmentation
Face recognition
33. Results
System | REPERE EwMAP | REPERE MAP | REPERE C | INA EwMAP | INA MAP | INA C
fusion | 75.76 % | 77.10 % | 78.03 % | 80.34 % | 80.61 % | 92.42 %
audio | 69.37 % | 70.90 % | 78.48 % | 89.38 % | 89.76 % | 97.34 %
video | 73.94 % | 75.29 % | 78.03 % | 80.66 % | 80.94 % | 92.46 %
baseline | 63.58 % | 63.93 % | 71.75 % | 78.35 % | 78.64 % | 92.71 %
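To make the comparison in the results table easier to read, the short sketch below computes each system's EwMAP gain over the baseline, in percentage points, and picks the best system per dataset. The EwMAP values are copied from the table; the dictionary layout and the helper names `gains_over_baseline` and `best_system` are illustrative and not part of the original work.

```python
# EwMAP values (%) copied from the results table.
ewmap = {
    "REPERE": {"fusion": 75.76, "audio": 69.37, "video": 73.94, "baseline": 63.58},
    "INA": {"fusion": 80.34, "audio": 89.38, "video": 80.66, "baseline": 78.35},
}


def gains_over_baseline(scores):
    """Absolute EwMAP gain (percentage points) of each system over the baseline."""
    base = scores["baseline"]
    return {
        name: round(value - base, 2)
        for name, value in scores.items()
        if name != "baseline"
    }


def best_system(scores):
    """Name of the system with the highest EwMAP."""
    return max(scores, key=scores.get)


for dataset, scores in ewmap.items():
    print(dataset, gains_over_baseline(scores), "best:", best_system(scores))
```

On these numbers, fusion has the highest EwMAP on REPERE, while the audio system alone is the best on INA.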
34. Conclusions
Difficult scenarios:
Audio: background music, noise.
Video: face pose and distance to the camera, video quality.
The face approach works better on REPERE, but the speech approach works better on INA.
Future work: finding a smarter way to combine speech and video.