The objective of this paper is to provide an overview of the Synchronization of Multi-User Event Media (SEM) task, which is part of the MediaEval Benchmark for Multimedia Evaluation. The SEM task was first presented at MediaEval 2014 as a challenge in aligning the photo galleries of multiple users that relate to the same event but carry unreliable timestamps. Besides aligning the pictures on a common timeline, participants were also required to detect the sub-events and cluster the pictures accordingly. For 2015, we extended the task to other types of media, including audio and video, for a more complete and diversified representation of the analyzed event.
http://ceur-ws.org/Vol-1436/
http://www.multimediaeval.org
Synchronization of Multi-User Event Media at MediaEval 2015: Task Description, Datasets and Evaluation
1. Synchronization of Multi-User Event Media (SEM) Task 2015
Nicola Conci (Univ. of Trento)
Francesco G.B. De Natale (Univ. of Trento)
Vasileios Mezaris (ITI – CERTH)
Mike Matton (VRT)
2. Motivation
• People collect and share dozens of media items through social
networks, cloud services and the Internet.
• Having access to all this data, users can create their own
version of the event:
– Summaries.
– Stories, presenting the media on a single timeline.
– Personalized albums, allowing the selection of media that concern a
specific user.
– Contextualized albums, containing information about the event
captured by different users.
3. Motivation
• Such a large amount of data is often unstructured and
heterogeneous.
• It is desirable to find a consistent way of presenting the media
galleries captured during an event.
• This task is not trivial, since the timing and location information
attached to the captured media (mostly timestamps and GPS
coordinates) may be inaccurate or missing.
4. Aims and Objectives
• Assuming a multi-user scenario (10+ users), each collecting a
certain number of media items (photos, videos, audio files), the
goal is to align them along a common timeline (time
synchronization).
• Detect the main sub-events in the entire gallery (sub-event
clustering).
Given N image collections (galleries) taken by different users/devices at the
same event, find the best (relative) time alignment among them and detect
the significant sub-events over the whole gallery.
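To make the second objective concrete, consider a naive baseline. The sketch below assumes the galleries are already aligned, so every media item has a timestamp in seconds on the common timeline, and it starts a new sub-event wherever two consecutive items are more than a chosen gap apart. The function name `cluster_by_gap` and the `max_gap` threshold are hypothetical; this is not the method used by the organizers or the participants, and purely temporal gaps cannot separate sub-events that overlap in time.

```python
from typing import List


def cluster_by_gap(timestamps: List[float], max_gap: float) -> List[int]:
    """Hypothetical baseline: label each item with a sub-event index.

    `timestamps` are seconds on the common (already synchronized) timeline.
    Items are visited in time order; a new sub-event starts whenever the gap
    to the previous item exceeds `max_gap`. Labels follow the input order.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    labels = [0] * len(timestamps)  # the earliest item belongs to sub-event 0
    current = 0
    for prev, idx in zip(order, order[1:]):
        if timestamps[idx] - timestamps[prev] > max_gap:
            current += 1  # large temporal gap: open a new sub-event
        labels[idx] = current
    return labels
```

For example, `cluster_by_gap([0, 30, 5000, 5100, 60], max_gap=600)` returns `[0, 0, 1, 1, 0]`: the three items within the first minute form one sub-event and the two items around 5000 s form another.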
5. Datasets
• The working assumptions are as follows:
– The media of each dataset are split into galleries (each gallery holds
the media of a single user).
– Each gallery may be composed of photos and video clips (or audio
files) taken from the same device.
– Within each gallery, time and location information, when available,
is internally consistent.
– Teams can use any kind of available information related to the media
items: tags, annotations, timestamps, GPS, content, as well as any
related information available on the Internet.
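Because each gallery is assumed to be internally consistent in time, aligning it reduces to estimating a single constant offset with respect to a reference gallery. A minimal sketch, assuming some matching step (not shown, e.g. based on visual or audio content) has already produced pairs of items from the gallery and the reference that capture the same moment; the names `estimate_offset` and `align` are hypothetical, and the median is one robust choice among several.

```python
from statistics import median
from typing import List, Tuple


def estimate_offset(matches: List[Tuple[float, float]]) -> float:
    """Estimate a gallery's clock offset relative to the reference gallery.

    `matches` holds (t_gallery, t_reference) pairs, in seconds, for items
    assumed to show the same moment. The median of the differences limits
    the influence of a few wrong matches.
    """
    if not matches:
        raise ValueError("need at least one matched pair")
    return median(t_ref - t_gal for t_gal, t_ref in matches)


def align(timestamps: List[float], offset: float) -> List[float]:
    """Shift every timestamp of one gallery by the same constant offset."""
    return [t + offset for t in timestamps]
```

With `matches = [(100, 3700), (200, 3800), (300, 9999)]`, the third pair is an outlier, and `estimate_offset` still returns 3600 (a one-hour clock difference), which `align` then applies to the whole gallery.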
6. Datasets
• We have provided 4 different datasets:
– Tour De France 2014 (TDF14): Photos taken during an annual multiple
stage bicycle race and collected from Flickr.
– NAMM Show 2015 (NAMM15): One of the world's largest trade-only
events for the music products industry, with several booths and live
shows.
– Salford Test Shoot (SAL): A series of musical performances by an
ensemble of ten musicians from the BBC Philharmonic Orchestra,
captured using both professional- and consumer-grade equipment.
– Spring Parti Salesiani 2015 (SPS15): A dataset recorded during a local
music and food event in Trento, Italy, composed of videos and photos
captured by the attendees during the event.
7. Tour De France 2014 Dataset
Example photos from several stages (among others):
– Leeds – Harrogate (United Kingdom), July 5th, 2014
– Gérardmer – Mulhouse, July 13th, 2014
– Saint-Gaudens – Saint-Lary-Soulan Pla d'Adet, July 24th, 2014
– Évry – Paris Champs-Élysées, July 27th, 2014
8. NAMM Show 2015 Dataset
Example photos from several performances (among others):
– Josh Damigo performing on the Marriott Stage, January 23rd, 2015
– The Bangles performing at the WiMN "She Rocks" Awards, January 23rd, 2015
– Dilana performing on the GoPro stage, January 24th, 2015
– Deer Park Avenue performing in the Gibson Guitars showroom, January 25th, 2015
10. Spring Parti Salesiani Dataset
Example media from the four sub-events: Preparation, Bands live show,
Testimonies & Speeches, DJ Party.
11. Datasets
Number of photos, videos, audio files, galleries and sub-events per dataset:

Dataset   Photos   Videos   Audio files   Galleries   Sub-events
TDF14     2471     -        -             33          89
NAMM15    420      32       -             19          97
SAL       -        129      894           34          10
SPS15     189      101      -             11          4
12. Datasets
• The datasets consist of various media types (photos, videos and
audio files). The videos also include an audio track.
• The ground truth for the datasets was built from the acquisition
times of the media and manually verified for consistency with
the captured event.
13. Datasets
• SEM 2015 datasets are publicly available for download and
use by the research community at:
http://mmlab.disi.unitn.it/MediaEvalSEM2015/
except the SAL dataset, which is available at:
https://icosole.lab.vrt.be/viewer/home
(dataset + ground truth + evaluation script)
• SEM 2014 datasets (Vancouver and London Olympic Games)
are also available at:
http://mmlab.disi.unitn.it/MediaEvalSEM2014/
14. Metrics for evaluation
• For synchronization, the goal is to maximize the number of
galleries whose synchronization error (with respect to a reference
gallery) is below a predefined threshold ∆T_max.
– Precision is the number of synchronized galleries (M) divided by the
total number of galleries (N−1, excluding the reference):
$$P = \frac{M}{N-1}$$
– Accuracy is derived from the average temporal offset calculated over
the M synchronized galleries, normalized with respect to ∆T_max:
$$A = 1 - \frac{1}{M}\sum_{i=1}^{M}\frac{|\Delta t_i|}{\Delta T_{max}}$$
where ∆t_i is the synchronization error of the i-th synchronized gallery.
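A minimal sketch of the two synchronization metrics, assuming the synchronization error of every non-reference gallery (estimated offset minus ground-truth offset, in seconds) is already known. The function name `sync_scores` and the choice to report an accuracy of 0 when no gallery is synchronized are assumptions of this sketch, not part of the official evaluation script.

```python
from typing import Dict, Tuple


def sync_scores(errors: Dict[str, float], dt_max: float) -> Tuple[float, float]:
    """Return (precision, accuracy) for one synchronization run.

    `errors` maps each of the N-1 non-reference galleries to its
    synchronization error in seconds (must be non-empty). A gallery counts
    as synchronized when its absolute error is below `dt_max`.
    """
    synced = [abs(e) for e in errors.values() if abs(e) < dt_max]
    m = len(synced)
    precision = m / len(errors)  # M / (N - 1)
    # Accuracy: 1 minus the mean error of the M synchronized galleries,
    # normalized by dt_max. Returning 0.0 when M == 0 is an assumption here.
    accuracy = 1 - sum(synced) / (m * dt_max) if m else 0.0
    return precision, accuracy
```

For instance, with errors of 0, 150, 2000 and 30 seconds for four galleries and `dt_max=300`, three galleries are synchronized, so precision is 0.75 and accuracy is 1 − 180/900, i.e. about 0.8.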
15. Metrics for evaluation
• For the sub-event clustering evaluation we use the F1 score:
$$F_1 = \frac{2PR}{P+R}, \quad P = \frac{TP}{TP+FP}, \quad R = \frac{TP}{TP+FN}$$
In the formulation above, we declare a true positive (TP) when two photos related
to the same sub-event are put in the same cluster. A false positive (FP) occurs when
two photos are assigned to the same cluster although they belong to different
sub-events, and a false negative (FN) when two photos belonging to the same
sub-event are assigned to different clusters.
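The pair-counting definition of the F1 score can be sketched as follows, using the equivalent form F1 = 2TP / (2TP + FP + FN). The function name `pairwise_f1` is hypothetical, the loop over all pairs is quadratic and meant only for illustration (it is not the official evaluation script), and it assumes each media item carries exactly one ground-truth sub-event label and one predicted cluster label.

```python
from itertools import combinations
from typing import Sequence


def pairwise_f1(truth: Sequence[int], pred: Sequence[int]) -> float:
    """Pair-counting F1 for sub-event clustering.

    truth[i] is the ground-truth sub-event of item i, pred[i] its cluster.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_event = truth[i] == truth[j]
        same_cluster = pred[i] == pred[j]
        if same_event and same_cluster:
            tp += 1  # same sub-event, same cluster
        elif same_cluster:
            fp += 1  # different sub-events, same cluster
        elif same_event:
            fn += 1  # same sub-event, different clusters
    # Degenerate case with no true-positive pairs: report 0 (assumption).
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

With ground truth `[0, 0, 1, 1]` and clusters `[0, 0, 0, 1]` there is 1 TP, 2 FP and 1 FN pair, giving F1 = 2/5 = 0.4.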
17. Team scores
Sub-event clustering (F1 score):

Team                            TDF14    NAMM15   SAL      SPS15
JRS                             0.2538   0.1454   -        -
CERTH-ITI-MM (task organizer)   0.1134   0.3658   0.1640   -
18. Conclusions
• This year's datasets contain a mix of different file types (still
photos, video files in various formats, audio files).
• Due to the considerable diversity of the datasets, we conclude
that it is very challenging for a single approach to handle this
data effectively.
• We note that, depending on the dataset, teams achieved either
good precision (most galleries synchronized) or good accuracy
(small synchronization errors for the synchronized galleries).
• With only 2 participating teams and a total of 6 runs, it is
difficult to draw more detailed conclusions.
19. Thank you for your attention!
Questions?
More information and contact:
Dr. Vasileios Mezaris
bmezaris@iti.gr
http://www.iti.gr/~bmezaris