This document provides an overview of the Retrieving Diverse Social Images Task that took place at MediaEval 2017. The task involved refining image search results by providing, for each query, a ranked list of up to 50 photos that are both relevant and diverse representations of that query. Six teams participated, submitting a total of 29 runs that were evaluated on precision, cluster recall, and F1-measure at different cut-off levels. The best-performing run used neural network-based clustering and text relevance feedback to improve over the initial Flickr search results. The document also covers the provided dataset, participant results, and brief conclusions about successful methods and opportunities for future improvement.
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
1. Retrieving Diverse Social Images Task
- task overview -
2017

Bogdan Ionescu (University Politehnica of Bucharest, Romania)
Maia Zaharieva (TUW, Austria)
Alexandru Lucian Gînscǎ (CEA LIST, France)
Rodrygo L.T. Santos (Universidade Federal de Minas Gerais, Brazil)
Henning Müller (HES-SO, Sierre, Switzerland)
Bogdan Boteanu (University Politehnica of Bucharest, Romania)

September 13-15, Dublin, Ireland
2. Outline

The Retrieving Diverse Social Images Task
Dataset and Evaluation
Participants
Results
Discussion and Perspectives
3. Diversity Task: Objective & Motivation
Objective: image search result diversification in the context of
social photo retrieval.
Why diversify search results?
- to respond to the needs of different users;
- as a method of tackling queries with unclear information needs;
- to widen the pool of possible results (increase performance);
- to reduce the number/redundancy of the returned items;
…
7. Diversity Task: Definition
For each query, participants receive a ranked list of photos retrieved
from Flickr using its default “relevance” algorithm.
Query = general-purpose, multi-topic term
e.g.: autumn colors, bee on a flower, home office, snow in
the city, holding hands, ...
Goal of the task: refine the results by providing a ranked list of up
to 50 photos (summary) that are considered to be both relevant and
diverse representations of the query.
relevant: a common photo representation of the query topics (all at once);
bad quality photos (e.g., severely blurred, out of focus) are not considered
relevant in this scenario
diverse: depicting different visual characteristics of the query topics and
subtopics with a certain degree of complementarity, i.e., most of the
perceived visual information is different from one photo to another.
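The relevance-plus-diversity goal above is typically addressed with a greedy re-ranking that trades relevance against similarity to already-selected photos. As a minimal illustrative sketch (not any participant's actual method), assuming per-photo descriptor vectors and relevance scores are available:

```python
import numpy as np

def greedy_diversify(features, relevance, k=50, lam=0.7):
    """MMR-style greedy re-ranking: at each step pick the photo with the
    best trade-off between its relevance score and its cosine similarity
    to the closest already-selected photo.
    `features` is an (n, d) array of per-photo descriptors, `relevance`
    an (n,) array of relevance scores (higher = better), `lam` the
    relevance/diversity trade-off weight."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    selected, remaining = [], list(range(len(relevance)))
    while remaining and len(selected) < k:
        if not selected:
            best = max(remaining, key=lambda i: relevance[i])
        else:
            sel = feats[selected]
            def score(i):
                max_sim = float(np.max(sel @ feats[i]))  # closest selected photo
                return lam * relevance[i] - (1 - lam) * max_sim
            best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam` near 1 the output follows the initial relevance ranking; lower values push near-duplicate photos down the list.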
8. Dataset: General Information & Resources
Provided information:
query text formulation;
ranked list of Creative Commons photos from Flickr
(up to 300 photos per query);
metadata from Flickr (e.g., tags, description, views,
comments, date-time the photo was taken, username, userid, etc.);
visual, text & user annotation credibility descriptors;
semantic vectors for general English terms computed on top of
the English Wikipedia (wikiset);
relevance and diversity ground truth.
Photos:
Development (devset): 110 queries, 32,340 photos
Test (testset): 84 queries, 24,986 photos
9. Dataset: Provided Descriptors
General purpose visual descriptors:
e.g., Auto Color Correlogram, Color and Edge Directivity
Descriptor, Pyramid of Histograms of Orientation Gradients, etc;
Convolutional Neural Network based descriptors:
Caffe framework based;
General purpose text descriptors:
e.g., term frequency information, document frequency
information and their ratio, i.e., TF-IDF;
User annotation credibility descriptors (give an automatic
estimation of the quality of users' tag-image content relationships):
e.g., measure of user image relevance, total number of images a
user shared, the percentage of images with faces.
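Among the text descriptors above, TF-IDF weights each term by how often it occurs in a photo's text relative to how many photos mention it at all. A minimal sketch of the standard formulation (the organizers' exact pipeline may differ):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document TF-IDF weights; `docs` is a list of token lists.
    TF = term count / document length, IDF = log(N / document frequency)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))           # each doc counts a term at most once
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights
```

Terms appearing in every document (here, every photo's metadata) get weight 0, so only discriminative tags contribute.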
11. Dataset: Ground Truth - Annotations
Relevance and diversity annotations were carried out by
expert annotators:
devset:
relevance: 8 annotators + 1 master (3 annotations/query)
diversity: 1 annotation/query
testset:
relevance: 8 annotators + 1 master (3 annotations/query)
diversity: 12 annotators (3 annotations/query)
Lenient majority voting for relevance
12. Evaluation: Run Specification
Participants are required to submit up to 5 runs:
required runs:
run 1: automated using visual information only;
run 2: automated using textual information only;
run 3: automated using textual-visual fused without other
resources than provided by the organizers;
general runs:
run 4: everything allowed, e.g., human-based or hybrid human-
machine approaches, including data from external sources
(e.g., the Internet) or pre-trained models obtained from
external datasets related to this task;
run 5: everything allowed.
13. Evaluation: Official Metrics
Cluster Recall* @ X = Nc/N (CR@X)
where X is the cutoff point, N is the total number of clusters for the
current query (from the ground truth, N <= 25) and Nc is the number of
different clusters represented among the top X ranked images;
*cluster recall is computed only for the relevant images.
Precision @ X = R/X (P@X)
where R is the number of relevant images among the top X;
F1-measure @ X = harmonic mean of CR and P (F1@X)
Metrics are reported for different values of X (5, 10, 20, 30, 40 & 50),
per topic as well as overall (average).
Official ranking: F1@20
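The three metrics above can be computed per query directly from the definitions. A small sketch (variable names are illustrative, not the official evaluation script):

```python
def metrics_at(ranked, relevant, cluster_of, x, n_clusters):
    """P@X, CR@X, F1@X for one query.
    `ranked`: submitted photo list; `relevant`: set of relevant photo ids;
    `cluster_of`: ground-truth cluster of each relevant photo;
    `n_clusters`: N, the number of ground-truth clusters for this query."""
    top = ranked[:x]
    rel = [p for p in top if p in relevant]        # relevant images in top X
    precision = len(rel) / x                       # P@X = R / X
    cr = len({cluster_of[p] for p in rel}) / n_clusters  # CR@X = Nc / N
    f1 = 2 * precision * cr / (precision + cr) if precision + cr else 0.0
    return precision, cr, f1
```

Note that cluster recall, per the footnote, only counts clusters reached through *relevant* images, so irrelevant photos hurt both P@X and, indirectly, CR@X by occupying rank positions.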
14. Participants: Basic Statistics
Survey:
- 22 respondents were interested in the task;
Registration:
- 14 teams registered (1 team is organizer related);
Run submission:
- 6 teams finished the task, including 1 organizer related team;
- 29 runs were submitted;
Workshop participation:
- 5 teams are represented at the workshop.
25. Brief Discussion
Methods:
this year mainly classification/clustering (& fusion), re-ranking,
relevance feedback, & neural network-based approaches;
best run by F1@20: improving relevance (text) + neural network-based
clustering, using visual-text information (team NLE).
Dataset:
getting very complex (read: diverse);
still low resources for Creative Commons on Flickr;
descriptors were very well received (employed by all of the
participants as provided).
26. Acknowledgements
Task auxiliaries:
Bogdan Boteanu, UPB, Romania & Mihai Lupu, Vienna University of
Technology, Austria
Task supporters:
Alberto Ueda, Bruno Laporais, Felipe Moraes, Lucas Chaves, Jordan
Silva, Marlon Dias, Rafael Glater
Catalin Mitrea, Mihai Dogariu, Liviu Stefan, Gabriel Petrescu, Alexandru
Toma, Alina Banica, Andreea Roxana, Mihaela Radu, Bogdan Guliman,
Sebastian Moraru