1. In-depth Exploration of Geotagging Performance using sampling strategies on YFCC100M
George Kordopatis-Zilos, Symeon Papadopoulos, Yiannis Kompatsiaris
Information Technologies Institute, Thessaloniki, Greece
MMCommons Workshop, October 16, 2016 @ Amsterdam, NL
2. Where is it?
Depicted landmark
Eiffel Tower
Location
Paris, Tennessee
The keyword “Tennessee” is crucial for
correctly placing the photo.
Source (Wikipedia):
http://en.wikipedia.org/wiki/Eiffel_Tower_(Paris,_Tennessee)
3. Motivation
Evaluating multimedia retrieval systems
• What do we evaluate?
• How?
• What decisions do we make based on it?
[Diagram: MM system (black box) → Test Collection → Comparison to ground truth → Evaluation measure → Decision]
4. Problem Formulation
• Test collection creation → Evaluation bias
• Performance reduced to a single measure → misses a lot of the
nuances of performance
• Test problem: Geotagging = predicting the
geographic location of a multimedia item
based on its content
5. Example: Evaluating geotagging
• Test collection #1: 1M images, 700K located in US
• Assume we use P@1km as an evaluation measure
• System 1: almost perfect precision in the US (100%), very poor for the
rest of the world (10%) → P@1km = 0.7*100% + 0.3*10% = 73%
• System 2: approximately the same precision all over the world
(65%) → P@1km = 65%
• Test collection #2: 1M images, 500K depicting cats
and puppies on white background
• Then, for 50% of the collection any prediction is
essentially random.
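A tiny Python sketch of the weighted-precision arithmetic behind this hypothetical comparison; the per-region precisions and the 70/30 US split are the numbers assumed on this slide, not measured results.

```python
# Hypothetical composition and per-region precisions from the slide above.
us_share, rest_share = 0.7, 0.3

system1 = us_share * 1.00 + rest_share * 0.10  # near-perfect in the US only
system2 = us_share * 0.65 + rest_share * 0.65  # uniform precision everywhere

print(f"System 1: P@1km = {system1:.0%}")  # 73%
print(f"System 2: P@1km = {system2:.0%}")  # 65%
```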
6. Multimedia Geotagging
• Problem of estimating the geographic location of a
multimedia item (e.g. Flickr image + metadata)
• Variety of approaches:
• Text-based: use the text metadata (tags)
• Gazetteer-based
• Statistical methods (associations between tags & locations)
• Visual
• Similarity-based (find most similar and use their location)
• Model-based (learn visual model of an area)
• Hybrid
• Combine text and visual
7. Language Model
• Most likely cell: $c_j = \arg\max_i \sum_{k=1}^{N} p(t_k \mid c_i)$
• Tag-cell probability: $p(t \mid c) = N_u / N_t$
We will refer to this as:
Base LM (or Basic)
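A minimal Python sketch of how the Base LM can be read, assuming that the tag-cell probability p(t|c) = N_u / N_t is the ratio of users who used tag t inside cell c (N_u) to all users who used tag t (N_t), and that cell scores are aggregated by summing tag-cell probabilities; function names and the toy data are illustrative, not the authors' implementation.

```python
from collections import defaultdict

def train_base_lm(training_items):
    """training_items: iterable of (user_id, tags, cell_id) tuples.
    Returns tag -> {cell -> p(t|c)} using user counts (assumed N_u / N_t)."""
    users_per_tag_cell = defaultdict(set)   # (tag, cell) -> users who used tag there
    users_per_tag = defaultdict(set)        # tag -> all users who used the tag
    for user, tags, cell in training_items:
        for tag in set(tags):
            users_per_tag_cell[(tag, cell)].add(user)
            users_per_tag[tag].add(user)
    model = defaultdict(dict)
    for (tag, cell), users in users_per_tag_cell.items():
        model[tag][cell] = len(users) / len(users_per_tag[tag])
    return model

def predict_cell(model, tags):
    """Most likely cell: argmax over cells of the summed tag-cell probabilities."""
    scores = defaultdict(float)
    for tag in tags:
        for cell, p in model.get(tag, {}).items():
            scores[cell] += p
    return max(scores, key=scores.get) if scores else None

# Toy usage: two training items placed in different cells.
model = train_base_lm([
    ("u1", ["eiffel", "paris", "tennessee"], "cell_us_36.3_-88.3"),
    ("u2", ["eiffel", "paris", "france"], "cell_fr_48.8_2.3"),
])
print(predict_cell(model, ["eiffel", "tennessee"]))  # -> cell_us_36.3_-88.3
```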
8. Language Model Extensions
• Feature selection
• Discard tags that do not provide any geographical cues
• Selection criterion: locality > 0
• Feature weighting
• More importance to tags with geographic information
• Linear combination of locality and spatial entropy
• Multiple grids
• Consider two grids, fine and coarse: if the estimate from the fine
grid falls within the cell estimated on the coarse grid, use the
fine-grid estimate (see the sketch after this list)
• Similarity Search
• Within the selected cell, use the lat/lon of the most similar
training item to refine the location estimate
We will refer to this as:
Full LM (or Full)
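A hedged sketch of the multiple-grid rule from the list above, assuming cells are indexed on fixed-degree grids (the 1° coarse cell below is a placeholder, not the task's actual granularity): the fine-grid estimate is kept when it falls within the estimated coarse-grid cell, otherwise the coarse estimate is used.

```python
def cell_of(lat, lon, cell_deg):
    """Index of the fixed-degree grid cell that contains a point."""
    return (int(lat // cell_deg), int(lon // cell_deg))

def combine_grids(fine_center, coarse_cell, coarse_deg=1.0):
    """Multiple-grid rule: keep the fine-grid estimate when its centre falls
    inside the cell estimated on the coarse grid; otherwise fall back to the
    centre of the coarse cell."""
    if cell_of(fine_center[0], fine_center[1], coarse_deg) == coarse_cell:
        return fine_center
    lat_idx, lon_idx = coarse_cell
    return ((lat_idx + 0.5) * coarse_deg, (lon_idx + 0.5) * coarse_deg)

# Toy usage: a fine estimate near the Eiffel Tower lies inside the coarse
# 1x1 degree cell estimated for the same photo, so the fine estimate is kept.
fine_center = (48.858, 2.294)
coarse_cell = cell_of(48.5, 2.5, 1.0)           # -> (48, 2)
print(combine_grids(fine_center, coarse_cell))  # -> (48.858, 2.294)
```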
9. MediaEval Placing Task
• Benchmarking activity in the context of MediaEval
• Dataset:
• Flickr images and videos (different each year)
• Training and test set
• Also possible to test systems that use external data
Edition | Training Set | Test Set
2015 | 4,695,149 | 949,889
2014 | 5,025,000 | 510,000
2013 | 8,539,050 | 262,000
10. Proposed Evaluation Framework
• Initial (reference) test collection Dref
• Sampling function f: Dref → Dtest
• Performance volatility
• p(D): performance score achieved in collection D
• In our case, we consider two such measures:
• P@1km
• Median distance error
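A small Python sketch of the two performance measures, assuming predictions and ground truth are (lat, lon) pairs in degrees and using the haversine formula for the great-circle distance; function names and the toy points are illustrative.

```python
import math
from statistics import median

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def evaluate(predictions, ground_truth, threshold_km=1.0):
    """Returns (P@1km, median distance error in km) over a test collection."""
    errors = [haversine_km(p, t) for p, t in zip(predictions, ground_truth)]
    p_at_1km = sum(e <= threshold_km for e in errors) / len(errors)
    return p_at_1km, median(errors)

# Toy usage with two items: one placed within 1 km, one placed on another continent.
preds = [(48.858, 2.294), (36.30, -88.30)]
truth = [(48.858, 2.295), (48.858, 2.294)]
print(evaluate(preds, truth))
```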
11. Sampling Strategies
A variety of sampling strategies applied to the Placing Task collection:
• Geographical Uniform Sampling
• User Uniform Sampling
• Text-based Sampling
• Text Diversity Sampling
• Geographically Focused Sampling
• Ambiguity-based Sampling
• Visual Sampling
12. Uniform Sampling
• Geographic Uniform Sampling
• Divide the Earth's surface into square areas of approximately
the same size (~10×10 km)
• Select N items from each area (N = median of items per area)
• User Uniform Sampling
• Select only one item per user
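An illustrative sketch of both uniform strategies, assuming each item carries lat/lon and a user id; the ~10×10 km areas are approximated here with a fixed-degree grid, which is only a rough stand-in for the equal-area partition described above, and the 0.09° cell size is an assumption.

```python
import random
from collections import defaultdict
from statistics import median

def geo_uniform_sample(items, cell_deg=0.09, seed=0):
    """Bucket items into roughly 10x10 km cells (approximated by a fixed-degree
    grid) and keep at most N items per cell, N = median items per non-empty cell."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for item in items:
        cell = (int(item["lat"] // cell_deg), int(item["lon"] // cell_deg))
        buckets[cell].append(item)
    n = int(median(len(b) for b in buckets.values()))
    sample = []
    for bucket in buckets.values():
        sample.extend(rng.sample(bucket, min(n, len(bucket))))
    return sample

def user_uniform_sample(items):
    """Keep only the first item encountered for each user."""
    seen, sample = set(), []
    for item in items:
        if item["user"] not in seen:
            seen.add(item["user"])
            sample.append(item)
    return sample
```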
13. Text Sampling
• Text-based Sampling
• Select only items with more than M terms (M: median
of terms/item)
• Text Diversity Sampling
• Represent items using bag-of-words
• Use MinHash to generate a binary code per BoW vector
• Select one item per code (bucket); a sketch follows below
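A minimal sketch of the two text sampling ideas, using 1-bit MinHash values from a handful of salted hash functions to form a short binary code per bag of words and keeping one item per code; the number of bits, the hashing scheme, and the data layout are illustrative assumptions, not the paper's exact setup.

```python
import hashlib
from statistics import median

def text_based_sample(items):
    """Keep only items with more terms than the median number of terms per item."""
    m = median(len(item["terms"]) for item in items)
    return [item for item in items if len(item["terms"]) > m]

def minhash_code(terms, num_bits=16):
    """Binary code for a bag of words: for each of num_bits salted hash functions,
    take the minimum hash over the terms and keep its lowest bit (1-bit MinHash)."""
    bits = []
    for i in range(num_bits):
        min_h = min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16) for t in terms)
        bits.append(str(min_h & 1))
    return "".join(bits)

def text_diversity_sample(items):
    """Keep one item per MinHash code bucket."""
    buckets = {}
    for item in items:
        if item["terms"]:
            buckets.setdefault(minhash_code(item["terms"]), item)
    return list(buckets.values())
```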
14. Other Sampling Strategies
• Geographically Focused Sampling
• Pick items from a selected place (continent/country)
• Ambiguity-based Sampling
• Select the set of items that are associated with
ambiguous place names (or the complementary set)
• Ambiguity defined with the help of entropy
• Visual Sampling
• Select only items associated with a given visual concept
• Here: items associated with building-related concepts
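One possible way to quantify place-name ambiguity with entropy, as hinted above: a tag whose occurrences spread over many distinct cells has high entropy and is treated as ambiguous. The entropy threshold and the cell representation below are assumptions for illustration only.

```python
import math
from collections import Counter

def tag_entropy(cells):
    """Shannon entropy (bits) of a tag's distribution over grid cells."""
    counts = Counter(cells)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def ambiguity_sample(items, tag_to_cells, threshold=1.0, keep_ambiguous=True):
    """Keep items whose tags include at least one ambiguous place name, i.e. a tag
    whose cell-distribution entropy exceeds the threshold (or the complementary
    set when keep_ambiguous is False)."""
    ambiguous = {t for t, cells in tag_to_cells.items() if tag_entropy(cells) > threshold}
    selected = [item for item in items if set(item["tags"]) & ambiguous]
    if keep_ambiguous:
        return selected
    return [item for item in items if item not in selected]

# "paris" spread evenly over two distant cells is more ambiguous than a tag seen in one cell.
print(tag_entropy(["cell_fr", "cell_us", "cell_fr", "cell_us"]))  # 1.0 bit
print(tag_entropy(["cell_fr", "cell_fr", "cell_fr"]))             # 0.0 bits
```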
15. Experiments - Setup
• Placing Task 2015 dataset: 949,889 images (subset
of YFCC100M)
• Test four variants of Language Model method:
• Basic-PT: Base LM method trained on PT dataset (≈4.7M
geotagged images released by the task organizers)
• Full-PT: Full LM method trained on PT dataset
• Basic-Y: Base LM method trained on YFCC dataset
(≈40M geotagged images of YFCC100M)
• Full-Y: Full LM method trained on YFCC dataset