In-depth Exploration of Geotagging Performance

In-depth Exploration of Geotagging Performance using sampling strategies on YFCC100M
George Kordopatis-Zilos, Symeon Papadopoulos, Yiannis Kompatsiaris
Information Technologies Institute, Thessaloniki, Greece
MMCommons Workshop, October 16, 2016 @ Amsterdam, NL
Where is it?
Depicted landmark
Eiffel Tower
Location
Paris, Tennessee
Keyword “Tennessee” is very important
to correctly place the photo.
Source (Wikipedia):
http://en.wikipedia.org/wiki/Eiffel_Tower_(Paris,_Tennessee)
Motivation
Evaluating multimedia retrieval systems
• What do we evaluate?
• How?
• What decisions do we make based on it?
MM system
(black box) Test Collection
Comparison to ground truth
Evaluation measure
Decision
Problem Formulation
• Test collection creation → Evaluation bias
• Performance reduced to a single measure → miss a lot of nuances of performance
• Test problem: Geotagging = predicting the
geographic location of a multimedia item
based on its content
Example: Evaluating geotagging
• Test collection #1: 1M images, 700K located in US
• Assume we use P@1km as an evaluation measure
• System 1: almost perfect precision in US (100%), very poor for
rest of the world (10%) → P@1km = 0.7*100 + 0.3*10 = 73%
• System 2: approximately the same precision all over the world
(65%) → P@1km = 65%
• Test collection #2: 1M images, 500K depicting cats
and puppies on white background
• Then, for 50% of the collection any prediction is
essentially random.
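The arithmetic for test collection #1 can be reproduced with a short sketch. The function name `pooled_precision` and the region labels are illustrative and not from the slides; the 50/50 collection at the end is a hypothetical variation added only to show how the ranking of the two systems can depend on the geographic mix of the test set.

```python
def pooled_precision(shares, precisions):
    """Collection-level P@1km as the share-weighted mean of per-region precision."""
    return sum(shares[region] * precisions[region] for region in shares)

# Test collection #1 from the slide: 70% of the images are located in the US.
shares = {"US": 0.7, "rest": 0.3}
system_1 = pooled_precision(shares, {"US": 1.00, "rest": 0.10})
system_2 = pooled_precision(shares, {"US": 0.65, "rest": 0.65})

# Hypothetical, geographically balanced collection (not in the slides).
balanced = {"US": 0.5, "rest": 0.5}
system_1_balanced = pooled_precision(balanced, {"US": 1.00, "rest": 0.10})
system_2_balanced = pooled_precision(balanced, {"US": 0.65, "rest": 0.65})

print(f"US-heavy collection: S1={system_1:.2f}, S2={system_2:.2f}")
print(f"Balanced collection: S1={system_1_balanced:.2f}, S2={system_2_balanced:.2f}")
```

On the US-heavy collection System 1 scores higher, while on the balanced one System 2 does, although neither system changed.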
Multimedia Geotagging
• Problem of estimating the geographic location of a
multimedia item (e.g. Flickr image + metadata)
• Variety of approaches:
• Text-based: use the text metadata (tags)
• Gazetteer-based
• Statistical methods (associations between tags & locations)
• Visual
• Similarity-based (find most similar and use their location)
• Model-based (learn visual model of an area)
• Hybrid
• Combine text and visual
Language Model
• Most likely cell: $c_j = \arg\max_i \sum_{k=1}^{N} p(t_k \mid c_i)$
• Tag-cell probability: $p(t \mid c) = \frac{N_u}{N_t}$
We will refer to this as:
Base LM (or Basic)
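A minimal sketch of the Base LM idea follows. It rests on assumptions that the slide does not spell out: N_u is taken to be the number of distinct users who used tag t inside cell c, N_t the number of distinct users who used tag t anywhere, and the per-tag probabilities are combined by summation. The function names and the toy cell identifiers are hypothetical.

```python
from collections import defaultdict

def train_tag_cell_probs(items):
    """items: iterable of (user_id, cell_id, tags).

    Returns {tag: {cell: p(t|c)}} with p(t|c) = N_u / N_t, where (assumption)
    N_u counts distinct users of the tag in the cell and N_t counts distinct
    users of the tag overall.
    """
    users_in_cell = defaultdict(lambda: defaultdict(set))  # tag -> cell -> users
    users_total = defaultdict(set)                          # tag -> users
    for user, cell, tags in items:
        for tag in set(tags):
            users_in_cell[tag][cell].add(user)
            users_total[tag].add(user)
    return {
        tag: {cell: len(users) / len(users_total[tag]) for cell, users in cells.items()}
        for tag, cells in users_in_cell.items()
    }

def most_likely_cell(probs, tags):
    """Cell with the highest summed tag-cell probability, or None if no tag is known."""
    scores = defaultdict(float)
    for tag in set(tags):
        for cell, p in probs.get(tag, {}).items():
            scores[cell] += p
    return max(scores, key=scores.get) if scores else None

# Toy training data echoing the Eiffel Tower example.
training = [
    ("u1", "paris_fr", ["paris", "eiffel"]),
    ("u2", "paris_fr", ["paris", "eiffel", "france"]),
    ("u3", "paris_tn", ["paris", "eiffel", "tennessee"]),
]
probs = train_tag_cell_probs(training)
print(most_likely_cell(probs, ["paris", "eiffel"]))
print(most_likely_cell(probs, ["paris", "eiffel", "tennessee"]))
```

In this toy setup the single tag "tennessee" is enough to move the estimate from the French cell to the Tennessee cell.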
Language Model Extensions
• Feature selection
• Discard tags that do not provide any geographical cues
• Selection criterion: locality > 0
• Feature weighting
• More importance to tags with geographic information
• Linear combination of locality and spatial entropy
• Multiple grids
• Consider two grids: fine and coarse – if the estimate from the
fine grid falls within that of the coarse, then use that one
• Similarity Search
• Out of the selected cell, use lat/lon of most similar item to
refine location estimation
We will refer to this as:
Full LM (or Full)
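The multiple-grid rule can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: cells are indexed by flooring latitude and longitude to a side length given in degrees, the 0.1 degree coarse side is arbitrary, and when the fine estimate falls outside the coarse cell the sketch simply returns a coarse-grid fallback point, which the slide does not specify.

```python
import math

def cell_of(lat, lon, side_deg):
    """Index of the grid cell (side length in degrees) that contains a point."""
    return (math.floor(lat / side_deg), math.floor(lon / side_deg))

def combine_grids(fine_estimate, coarse_cell, coarse_side_deg, coarse_fallback):
    """Keep the fine-grid estimate only if it lies inside the coarse-grid cell.

    fine_estimate and coarse_fallback are (lat, lon) pairs; coarse_cell is the
    cell index chosen on the coarse grid.
    """
    if cell_of(*fine_estimate, coarse_side_deg) == coarse_cell:
        return fine_estimate
    return coarse_fallback  # assumption: fall back to the coarse-grid estimate

coarse_cell = cell_of(36.3021, -88.3267, 0.1)   # coarse cell chosen by the model
fallback = (36.35, -88.35)                       # hypothetical centre of that cell

# Fine estimate inside the coarse cell: it is kept.
print(combine_grids((36.3021, -88.3267), coarse_cell, 0.1, fallback))
# Fine estimate far outside the coarse cell: the fallback is used.
print(combine_grids((48.8584, 2.2945), coarse_cell, 0.1, fallback))
```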
MediaEval Placing Task
• Benchmarking activity in the context of MediaEval
• Dataset:
• Flickr images and videos (different each year)
• Training and test set
• Also possible to test systems that use external data
Edition   Training Set   Test Set
2015      4,695,149      949,889
2014      5,025,000      510,000
2013      8,539,050      262,000
Proposed Evaluation Framework
• Initial (reference) test collection Dref
• Sampling function f: Dref → Dtest
• Performance volatility
• p(D): performance score achieved in collection D
• In our case, we consider two such measures:
• P@1km
• Median distance error
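The two measures can be computed as in the sketch below. It assumes predictions and ground truth are dictionaries mapping an item id to a (lat, lon) pair and uses the haversine great-circle distance with a mean Earth radius of 6371 km; the names `haversine_km` and `evaluate` are illustrative.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def evaluate(predictions, ground_truth, radius_km=1.0):
    """Return (P@radius, median distance error in km) over the test collection."""
    errors = [haversine_km(predictions[i], ground_truth[i]) for i in ground_truth]
    p_at_r = sum(e <= radius_km for e in errors) / len(errors)
    return p_at_r, median(errors)

ground_truth = {"a": (48.8584, 2.2945), "b": (36.3021, -88.3267), "c": (40.0, 20.0)}
predictions = {"a": (48.8584, 2.2945), "b": (36.3071, -88.3267), "c": (41.0, 20.0)}
p_at_1km, median_error = evaluate(predictions, ground_truth)
print(f"P@1km={p_at_1km:.2f}, median error={median_error:.2f} km")
```

Applying a sampling function then amounts to restricting `ground_truth` to the sampled item ids before calling `evaluate`.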
Sampling Strategies
A variety of approaches for Placing Task collection:
• Geographical Uniform Sampling
• User Uniform Sampling
• Text-based Sampling
• Text Diversity Sampling
• Geographically Focused Sampling
• Ambiguity-based Sampling
• Visual Sampling
Uniform Sampling
• Geographic Uniform Sampling
• Divide earth surface into square areas of approximately
the same size (~10x10km)
• Select N items from each area (N=median of items/area)
• User Uniform Sampling
• Select only one item per user
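Both uniform strategies can be sketched in a few lines. The assumptions here are mine: items are dictionaries with `lat`, `lon` and `user` keys, a 0.1 degree latitude/longitude grid stands in for the roughly 10x10 km areas (such cells are not truly equal in area), and cells holding fewer than N items contribute all of their items.

```python
import math
import random
from collections import defaultdict
from statistics import median

def geo_uniform_sample(items, cell_deg=0.1, seed=0):
    """Keep at most N items per grid cell, with N = median number of items per cell."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for item in items:
        key = (math.floor(item["lat"] / cell_deg), math.floor(item["lon"] / cell_deg))
        cells[key].append(item)
    n = max(1, int(median(len(members) for members in cells.values())))
    sample = []
    for members in cells.values():
        sample.extend(rng.sample(members, min(n, len(members))))
    return sample

def user_uniform_sample(items, seed=0):
    """Keep exactly one randomly chosen item per user."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for item in items:
        by_user[item["user"]].append(item)
    return [rng.choice(members) for members in by_user.values()]
```

Fixing the seed keeps the sampled test collection reproducible across runs.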
Text Sampling
• Text-based Sampling
• Select only items with more than M terms (M: median
of terms/item)
• Text Diversity Sampling
• Represent items using bag-of-words
• Use MinHash to generate a binary code per BoW vector
• Select one item per code (bucket) B
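The two text strategies might look like the sketch below. Several details are assumptions rather than facts from the slides: terms are counted as distinct tags, the binary code is built from the lowest bit of each of 16 min-hash values, MD5 serves only as a deterministic hash, the first item seen in each bucket is the one kept, and items without tags are skipped.

```python
import hashlib
from statistics import median

def text_based_sample(items):
    """Keep items with more distinct terms than the median number of terms per item."""
    m = median(len(set(it["tags"])) for it in items)
    return [it for it in items if len(set(it["tags"])) > m]

def _hash(seed, term):
    """Deterministic 64-bit hash of a term under a given seed."""
    digest = hashlib.md5(f"{seed}:{term}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_code(tags, num_hashes=16):
    """Binary code: the lowest bit of each min-hash value of the tag set."""
    terms = set(tags)
    return tuple(min(_hash(seed, t) for t in terms) & 1 for seed in range(num_hashes))

def text_diversity_sample(items, num_hashes=16):
    """Keep the first item that falls into each MinHash bucket."""
    buckets = {}
    for it in items:
        if not it["tags"]:
            continue  # assumption: items without tags cannot be hashed and are skipped
        buckets.setdefault(minhash_code(it["tags"], num_hashes), it)
    return list(buckets.values())
```

Items with identical tag sets always share a bucket, and items with highly overlapping tag sets are likely to, so the sample keeps one representative per group of textually similar items.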
Other Sampling Strategies
• Geographically Focused Sampling
• Pick items from a selected place (continent/country)
• Ambiguity-based Sampling
• Select the set of items that are associated with
ambiguous place names (or the complementary set)
• Ambiguity defined with the help of entropy
• Visual Sampling
• Select only items associated with a given visual concept
• Select only items associated with concepts related to
buildings
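One way to make the entropy-based ambiguity criterion concrete is sketched below. The slides do not give the exact definition, so everything here is an assumption: a place name is treated as ambiguous when the Shannon entropy of its distribution over grid cells exceeds a threshold, `place_names` is a hypothetical set of known place-name tags, and the 0.5 bit threshold is arbitrary.

```python
import math
from collections import Counter, defaultdict

def tag_entropy(cell_counts):
    """Shannon entropy (bits) of a tag's distribution over cells."""
    total = sum(cell_counts.values())
    return -sum((n / total) * math.log2(n / total) for n in cell_counts.values())

def split_by_ambiguity(items, place_names, threshold=0.5):
    """Split items into (ambiguous, unambiguous) by the entropy of their place-name tags."""
    counts = defaultdict(Counter)
    for it in items:
        for tag in set(it["tags"]) & place_names:
            counts[tag][it["cell"]] += 1
    ambiguous_tags = {t for t, c in counts.items() if tag_entropy(c) > threshold}
    ambiguous = [it for it in items if ambiguous_tags & set(it["tags"])]
    unambiguous = [it for it in items if not ambiguous_tags & set(it["tags"])]
    return ambiguous, unambiguous

items = (
    [{"cell": "fr", "tags": ["paris", "eiffel"]}] * 2
    + [{"cell": "tn", "tags": ["paris", "eiffel"]}] * 2
    + [{"cell": "gr", "tags": ["thessaloniki"]}] * 3
)
ambiguous, unambiguous = split_by_ambiguity(items, {"paris", "thessaloniki"})
print(len(ambiguous), len(unambiguous))
```

The second list returned is the complementary set mentioned above, so both variants of the sample come from a single pass.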
Experiments - Setup
• Placing Task 2015 dataset: 949,889 images (subset
of YFCC100M)
• Test four variants of Language Model method:
• Basic-PT: Base LM method trained on PT dataset (=4.7M
geotagged images released by the task organizers)
• Full-PT: Full LM method trained on PT dataset
• Basic-Y: Base LM method trained on YFCC dataset
(=40M geotagged images of YFCC100M)
• Full-Y: Full LM method trained on YFCC dataset
Reference Results
Geographical Uniform Sampling
• Initial distribution 
• Uniform distribution:
• select three items/cell
User Uniform Sampling
Text-based Sampling
Select only images
with >7 tags/item
Text Diversity Sampling
• After MinHash, 478,817 buckets were created.
Geographically Focused Sampling
Results of Full-Y
Ambiguity-based Sampling
Visual Sampling
Summary of Results
Thank you!
Data/Code:
• https://github.com/MKLab-ITI/multimedia-geotagging/
Get in touch:
• George Kordopatis-Zilos: georgekordopatis@iti.gr
• Symeon Papadopoulos: papadop@iti.gr / @sympap
With the support of:
