Media REVEALr: A social multimedia
monitoring and intelligence system for Web
multimedia verification
Katerina Andreadou¹, Symeon Papadopoulos¹, Lazaros Apostolidis¹,
Anastasia Krithara² and Yiannis Kompatsiaris¹
¹ Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
² National Centre for Scientific Research 'Demokritos' (NCSR 'D')
PAISI 2015, May 19, 2015, Ho Chi Minh City, Vietnam
Can multimedia on the Web be trusted?
#2
Real photo, captured in April 2011 by WSJ, but heavily tweeted during Hurricane Sandy (29 Oct 2012)
Tweeted by multiple sources and retweeted multiple times
Original online at:
http://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/
The Problem
• Everyone can easily publish content on the Web
• Content can be easily repurposed and manipulated
• News outlets are competing for views and clicks →
Pressure to air stories very quickly leaves very
little room for verification. → Very often, even
well-reputed news providers fall for fake news content.
• Multiple tools and services are available for individual
tasks → complex verification process
It is very hard and time-consuming to check the veracity
of Web multimedia
#3
Media REVEALr
• Developed within the REVEAL project:
http://revealproject.eu/
• Framework for collecting, indexing and browsing
multimedia content from the Web and social media
• Support for verification:
– Near-duplicate detection against an indexed collection
– Clustering of social media posts by visual similarity →
comparative view of the same incident
– Aggregation and visualization of Named Entities around an
incident
#4
Related Work
• Most prior work has focused on the problem of topic
detection and summarization:
– TwitInfo (Marcus et al., 2011)
– Twittermonitor (Mathioudakis & Koudas, 2010)
– Meme detection & prediction (Weng et al., 2014)
• Visual memes and clustering
– Visual meme tracking (Xie et al., 2011)
– Supervised multimodal clustering (Petkos et al., 2012)
• Image manipulation tracking
– Internet image archaeology (Kennedy & Chang, 2008)
#5
Overview of Media REVEALr
#6
Architecture layers (from diagram):
Media collection
Media pre-processing & feature extraction
Media analysis, mining & indexing
Persistence
Access (API)
Visualization, front-end
Modalities: TEXT, VISUAL
Named Entity Detection
• The brevity and noisy nature of social media text pose
a serious challenge
• Employed solution:
– Pre-processing: tokenization, user mention resolution, text
cleaning
– Stanford NER + user mention resolution
– Regular expressions to remove special characters and
symbols (e.g., #, @, URLs, etc.)
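The pre-processing step above could look roughly like the sketch below. It is only an illustration under stated assumptions: the regular expressions, the `clean_tweet` function and the `known_users` lookup table are hypothetical and are not taken from the Media REVEALr code. The cleaned text would then be passed to an NER tool such as Stanford NER.

```python
import re

# Hypothetical patterns; the exact expressions used by the authors are not given.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
SYMBOL_RE = re.compile(r"[^\w\s.,!?'-]")  # anything that is not text or basic punctuation


def clean_tweet(text, known_users=None):
    """Strip URLs and symbols, and resolve @mentions to display names when known."""
    known_users = known_users or {}
    text = URL_RE.sub(" ", text)
    # User mention resolution: "@handle" becomes the display name if we know it,
    # otherwise the bare handle without the "@".
    text = MENTION_RE.sub(
        lambda m: known_users.get(m.group(1).lower(), m.group(1)), text
    )
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = SYMBOL_RE.sub(" ", text)    # drop remaining special characters
    return " ".join(text.split())      # collapse whitespace


if __name__ == "__main__":
    users = {"barackobama": "Barack Obama"}
    print(clean_tweet("RT @BarackObama: Stay safe #Sandy http://t.co/abc!", users))
```

Resolving mentions before NER matters because a handle such as `@BarackObama` is a single opaque token, whereas the display name is something a person-name recognizer can tag.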
#7
Visual Indexing
• Content-based image retrieval to solve the
Near-Duplicate Search (NDS) problem
• Based on local descriptors (SURF), aggregation
(VLAD), dimensionality reduction (PCA), quantization
(PQ) and indexing (IVFADC)
• State-of-the-art visual similarity search
– High precision/recall
– Very efficient and scalable implementation (search many
millions of images in a few msec, maintain full index in
memory using ~1GB/10M images)
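The quantization (PQ) and asymmetric distance computation (the "ADC" in IVFADC) named above can be illustrated with a toy sketch. This is not the project's implementation: the codebooks here are hand-written rather than learned with k-means, the coarse inverted-file stage is omitted, and all function names are hypothetical. The point is that each image is stored as a few small integers instead of a full float vector, which is consistent with the figure of ~1GB per 10M images (about 100 bytes per image).

```python
def split(vec, m):
    """Split a vector into m equal-length sub-vectors (len(vec) must be divisible by m)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest centroid in that sub-codebook."""
    subs = split(vec, len(codebooks))
    return [
        min(range(len(cb)), key=lambda c: sqdist(sub, cb[c]))
        for sub, cb in zip(subs, codebooks)
    ]


def adc_distance(query, code, codebooks):
    """Asymmetric distance: the query stays uncompressed, the database item is a PQ code."""
    subs = split(query, len(codebooks))
    return sum(sqdist(sub, cb[c]) for sub, cb, c in zip(subs, codebooks, code))


if __name__ == "__main__":
    # Two sub-quantizers with two centroids each (toy values).
    codebooks = [[[0, 0], [1, 1]], [[0, 0], [5, 5]]]
    code = pq_encode([0.9, 1.2, 4.8, 5.1], codebooks)
    print(code, adc_distance([1, 1, 5, 5], code, codebooks))
```

In a real system the per-sub-vector distances to all centroids are precomputed once per query, so scoring each database item reduces to a handful of table lookups.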
#8
Improving NDS Resilience (NDS+)
• NDS performance often suffers from overlay
graphics and fonts
• To address this issue, we integrate a descriptor-level
classifier that tries to filter out the font/graphic
descriptors before they are aggregated into the VLAD vector
#9
Example: Filtering Out Font Descriptors
• Assuming that in most cases the classifier is correct,
the resulting VLAD vector is of much higher quality
compared to the one without filtering
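A minimal sketch of the idea, assuming the filtering happens before aggregation: local descriptors flagged as overlay are skipped, and the remaining ones are aggregated into a VLAD vector. The `is_overlay` callable stands in for the trained descriptor-level classifier and is hypothetical, as are the toy 2-D descriptors (real SURF descriptors have 64 dimensions).

```python
import math


def vlad(descriptors, centroids, is_overlay=None):
    """Aggregate local descriptors into an L2-normalised VLAD vector.

    Descriptors for which is_overlay(x) is true are dropped first.
    """
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        if is_overlay is not None and is_overlay(x):
            continue  # predicted to come from an overlaid font or graphic
        # Assign the descriptor to its nearest visual word (centroid).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual between the descriptor and that centroid.
        for j in range(d):
            acc[nearest][j] += x[j] - centroids[nearest][j]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


if __name__ == "__main__":
    centroids = [[0.0, 0.0], [10.0, 10.0]]
    descriptors = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]
    print(vlad(descriptors, centroids))
    # Pretend the third descriptor was classified as overlay text.
    print(vlad(descriptors, centroids, is_overlay=lambda x: x[1] > 11))
```

Because VLAD sums residuals, even a few overlay descriptors shift the vector, which is why removing them can change the ranking of near-duplicates.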
#10
Classifier Details
• Random Forest used as base classifier
• Cost Sensitive meta-classifier to penalize
misclassification of True Positives
• Challenge due to Class Imbalance (overlay
descriptors << useful image content descriptors)
– Cost Sensitive meta-classifier performs over-sampling of
minority class to balance the training set
• Training set created by collecting images with
overlays (e.g. memes) from the Web and manually
annotating them (selecting areas with fonts/overlays)
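The class-balancing step can be sketched as plain random over-sampling with replacement. The slides do not say how the meta-classifier over-samples, so this is an assumption, and the `oversample` function and label names are hypothetical.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for y, group in by_label.items():
        # Draw extra copies (with replacement) to reach the majority-class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels


if __name__ == "__main__":
    samples = [f"d{i}" for i in range(10)]
    labels = ["content"] * 8 + ["overlay"] * 2
    _, balanced = oversample(samples, labels)
    print(Counter(balanced))
```

The balanced set would then be used to train the base classifier (a Random Forest in the slides).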
#11
Mining: Clustering and Aggregation
• Visual aggregation
– DBSCAN on the visual feature representation
(PCA-reduced VLAD vectors)
– Representative element (tweet) of each cluster selected as the one with
the largest number of keywords (expected to carry more information)
• Entity aggregation
– NER on individual items
– Entity categorization (Persons, Locations, Organizations)
– Entity ranking based on frequency of occurrence
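The two aggregation steps above can be sketched as follows. This is a compact, unoptimised DBSCAN over small toy vectors plus a frequency ranking of (entity, type) pairs; the parameter values, function names and sample data are hypothetical, and a production system would use an indexed neighbour search rather than the quadratic scan shown here.

```python
import math
from collections import Counter


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reach = neighbours(j)
            if len(reach) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reach)
        cluster += 1
    return labels


def rank_entities(entities):
    """Rank (name, type) pairs by how often they occur across items."""
    return Counter(entities).most_common()


if __name__ == "__main__":
    vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(vectors, eps=1.5, min_pts=3))
    print(rank_entities([("Boston", "LOCATION"), ("FBI", "ORGANIZATION"),
                         ("Boston", "LOCATION")]))
```

DBSCAN suits this setting because the number of distinct images in a stream is not known in advance, and one-off images are simply left as noise.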
#12
User Interface: Collections View
#13
User Interface: Items View & Search
#14
User Interface: Clusters View
#15
User Interface: Entities View
#16
Evaluation: NER
• Manual annotation of 400 tweets from the SNOW
Data Challenge dataset (Papadopoulos et al., 2014)
• Measure: Accuracy → an instance is considered correct
when both entity and type are correctly identified
• Three competing solutions:
– Base Stanford NER (S-NER)
– S-NER + Extensions/Post-processing (S-NER+)
– Ellogon library (http://www.ellogon.org)
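One possible reading of the accuracy measure is sketched below: a gold annotation counts as a hit only if the system produced the same entity string with the same type for that tweet, and accuracy is hits divided by gold annotations. The slides do not spell out the denominator, so that part is an assumption, and the sample data is invented for illustration.

```python
def ner_accuracy(gold, predicted):
    """Fraction of gold (entity, type) pairs reproduced exactly, tweet by tweet."""
    hits = total = 0
    for gold_pairs, pred_pairs in zip(gold, predicted):
        pred_set = set(pred_pairs)
        for pair in gold_pairs:
            total += 1
            if pair in pred_set:  # both the entity and its type must match
                hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    gold = [[("Boston", "LOCATION"), ("Obama", "PERSON")],
            [("FBI", "ORGANIZATION")]]
    predicted = [[("Boston", "LOCATION"), ("Obama", "LOCATION")],  # wrong type
                 [("FBI", "ORGANIZATION")]]
    print(ner_accuracy(gold, predicted))
```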
#17
Evaluation: NDS
• Benchmark Datasets
– Holidays: 1,491 images, 500 queries (Jegou et al., 2008)
– Oxford: 5,063 images, 55 queries (Philbin et al., 2008)
– Paris: 6,412 images, 55 queries (Philbin et al., 2008)
• Accuracy: mean Average Precision (mAP)
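For reference, mean Average Precision can be computed as in the sketch below. It uses the common definition (precision averaged over the ranks of the relevant results, then averaged over queries); the official evaluation scripts of the benchmarks may differ in details such as how junk images are handled, and the query data here is invented.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@rank taken at each relevant result."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """mAP over a list of (ranked_results, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [(["a", "x", "b"], {"a", "b"}),  # relevant at ranks 1 and 3
            (["x", "a"], {"a"})]            # relevant at rank 2
    print(mean_average_precision(runs))
```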
#18
[Table: mAP on the clean dataset and the noisy dataset]
Evaluation: NDS
• Execution Time (msec)
• Example
#19
[Figure: indexed image and query image; rank of the indexed image with NDS: #27, with NDS+: #1]
Use Cases: Real-world Datasets
#20
Datasets: sandy, boston, malaysia, ferry
NDS Use Case (boston)
#21
Clustering Use Case (boston)
• Visual clustering enables a comparative view and analysis over
time (in this case showing increasing confidence in the picture).
• When journalists see many similar photos of the same scene,
they have more confidence that it is real and not fabricated.
#22
Entity Aggregation Use Case (snow)
#23
[Figure: entity lists for LOCATIONS, PERSONS, ORGANIZATIONS]
Conclusion
• Key contributions
– Framework and web application offering valuable
verification support for Web multimedia
– High-quality individual components for NER, NDS,
clustering and aggregation
• Future Work
– Incremental image clustering
– Temporal views to explore evolution of a story
– Multimedia forensics toolbox (splice, copy-move
detection)
#24
Future Work: Web Multimedia Forensics
• Possibility to offer image manipulation detection as a
service for arbitrary Web images
– challenges: social media platforms incur additional
transformations (scaling, JPEG recompression, etc.) making
the problem much more complex
#25
References (1/2)
• A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller.
Twitinfo: Aggregating and visualizing microblogs for event exploration. In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
CHI '11, pages 227-236, New York, NY, USA, 2011. ACM
• M. Mathioudakis and N. Koudas. Twittermonitor: Trend detection over the twitter
stream. In Proceedings of the 2010 ACM SIGMOD International Conference on
Management of Data, SIGMOD '10, pages 1155-1158, New York, NY, USA, 2010.
ACM
• G. Petkos, S. Papadopoulos, and Y. Kompatsiaris. Social event detection using
multimodal clustering and integrating supervisory signals. In Proceedings of the
2nd ACM International Conference on Multimedia Retrieval, ICMR '12, pages
23:1-23:8, New York, NY, USA, 2012. ACM
• L. Weng, F. Menczer, and Y. Ahn. Predicting successful memes using network and
community structure. CoRR, abs/1403.6199, 2014
• L. Xie, A. Natsev, J. R. Kender, M. Hill, and J. R. Smith. Visual memes in social
media: Tracking real-world news in YouTube videos. In Proceedings of the 19th
ACM International Conference on Multimedia, MM '11, pages 53-62, New York,
NY, USA, 2011. ACM
#26
References (2/2)
• L. Kennedy and S.-F. Chang. Internet image archaeology: Automatically
tracing the manipulation history of photographs on the web. In
Proceedings of the 16th ACM International Conference on Multimedia,
MM '08, pages 349-358, New York, NY, USA, 2008. ACM
• H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak
geometric consistency for large scale image search. In Proceedings of the
10th European Conference on Computer Vision: Part I, ECCV '08, pages
304-317, Berlin, Heidelberg, 2008. Springer-Verlag
• S. Papadopoulos, D. Corney, and L. M. Aiello. SNOW 2014 Data Challenge:
Assessing the performance of news topic detection methods in social
media. In Proceedings of the SNOW 2014 Data Challenge Workshop co-
located with 23rd International World Wide Web Conference (WWW
2014), Seoul, Korea, April 8, 2014, pages 1-8, 2014.
• J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in
quantization: Improving particular object retrieval in large scale image
databases. In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2008), pages 1-8, June 2008.
#27
Thank you!
• Resources:
Slides: http://www.slideshare.net/sympapadopoulos/mediarevealr
Code: https://github.com/MKLab-ITI/reveal-media-crawler
https://github.com/MKLab-ITI/multimedia-indexing
Data: https://github.com/MKLab-ITI/image-verification-corpus
• Get in touch:
@sympapadopoulos / papadop@iti.gr
@kandreads / kandreadou@iti.gr
#28

Software Security in the Real World w/Kelsey HightowerAnchore
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Automation Ops Series: Session 3 - Solutions management
Automation Ops Series: Session 3 - Solutions managementAutomation Ops Series: Session 3 - Solutions management
Automation Ops Series: Session 3 - Solutions managementDianaGray10
 
A PowerPoint Presentation on Vikram Lander pptx
A PowerPoint Presentation on Vikram Lander pptxA PowerPoint Presentation on Vikram Lander pptx
A PowerPoint Presentation on Vikram Lander pptxatharvdev2010
 


Media REVEALr: A social multimedia monitoring and intelligence system for Web multimedia verification

  • 1. Media REVEALr: A social multimedia monitoring and intelligence system for Web multimedia verification • Katerina Andreadou (1), Symeon Papadopoulos (1), Lazaros Apostolidis (1), Anastasia Krithara (2) and Yiannis Kompatsiaris (1) • (1) Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI) • (2) National Centre for Scientific Research ‘Demokritos’ (NCSR ’D’) • PAISI 2015, May 19, 2015, Ho Chi Minh City, Vietnam
  • 2. Can multimedia on the Web be trusted? • Real photo captured in April 2011 by WSJ, but heavily tweeted during Hurricane Sandy (29 Oct 2012) • Tweeted by multiple sources and retweeted multiple times • Original online at: http://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/
  • 3. The Problem • Everyone can easily publish content on the Web • Content can be easily repurposed and manipulated • News outlets are competing for views and clicks → Pressure to air stories very quickly leaves very little room for verification → Very often, even well-reputed news providers fall for fake news content • Multiple tools and services available for individual tasks → complex verification process • Very hard and time-consuming to check the veracity of Web multimedia
  • 4. Media REVEALr • Developed within the REVEAL project: http://revealproject.eu/ • Framework for collecting, indexing and browsing multimedia content from the Web and social media • Support for verification: – Near-duplicate detection against an indexed collection – Clustering of social media posts by visual similarity → comparative view of the same incident – Aggregation and visualization of Named Entities around an incident
  • 5. Related Work • Most work has focused on the problem of topic detection and summarization: – TwitInfo (Marcus et al., 2011) – Twittermonitor (Mathioudakis & Koudas, 2010) – Meme detection & prediction (Weng et al., 2014) • Visual memes and clustering: – Visual meme tracking (Xie et al., 2011) – Supervised multimodal clustering (Petkos et al., 2012) • Image manipulation tracking: – Internet image archaeology (Kennedy & Chang, 2008)
  • 6. Overview of Media REVEALr • Pipeline stages (diagram): Media collection → Media pre-processing & feature extraction → Media analysis, mining & indexing → Persistence → Access (API) → Visualization, front-end • Diagram labels: TEXT, VISUAL
  • 7. Named Entity Detection • The brevity and noisy nature of text in social media pose a serious challenge • Employed solution: – Pre-processing: tokenization, user mention resolution, text cleaning – Stanford NER + user mention resolution – Regular expressions to remove special characters and symbols (e.g., #, @, URLs, etc.)
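The text-cleaning step described on this slide could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `clean_tweet`, the regular expressions, and the `known_users` lookup table (handle to display name) are hypothetical and are not taken from the Media REVEALr code. The NER step itself (Stanford NER) is not shown.

```python
import re

# Hypothetical patterns for the cleaning step; the real system may use different ones.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
SYMBOL_RE = re.compile(r"[#@]")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text, known_users=None):
    """Strip URLs and special symbols, and resolve @mentions where possible.

    known_users is an assumed mapping from lower-cased handles to display
    names; unknown handles are kept as plain words without the '@'.
    """
    known_users = known_users or {}

    # 1. Drop URLs entirely, since they carry no entity text.
    text = URL_RE.sub(" ", text)

    # 2. Replace each @handle with a display name if one is known.
    def resolve(match):
        handle = match.group(1)
        return known_users.get(handle.lower(), handle)

    text = MENTION_RE.sub(resolve, text)

    # 3. Remove leftover '#' and '@' symbols, keeping the word itself.
    text = SYMBOL_RE.sub("", text)

    # 4. Collapse runs of whitespace introduced by the steps above.
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    users = {"barackobama": "Barack Obama"}
    print(clean_tweet("RT @BarackObama: Stay safe #Sandy http://t.co/abc", users))
```

Resolving mentions before running NER gives the tagger a proper name ("Barack Obama") instead of a handle, which is presumably why the slide lists mention resolution alongside text cleaning.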
  • 8. Visual Indexing • Content-based image retrieval to solve the Near-Duplicate Search (NDS) problem • Based on local descriptors (SURF), aggregation (VLAD), dimensionality reduction (PCA), quantization (PQ) and indexing (IVFADC) • State-of-the-art visual similarity search: – High precision/recall – Very efficient and scalable implementation (search many millions of images in a few msec, maintain full index in memory using ~1 GB per 10M images)
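To make the aggregation step concrete, the following is a minimal pure-Python sketch of VLAD: each local descriptor is assigned to its nearest codebook centroid, the residuals are summed per centroid, and the concatenated result is L2-normalised. The function name `vlad` and the toy 2-D data are assumptions for illustration; in practice the descriptors would be SURF vectors and the centroids would be learned with k-means. PCA, PQ and IVFADC are not shown.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalised VLAD vector.

    descriptors: list of d-dimensional vectors (lists of floats).
    centroids:   list of k d-dimensional codebook vectors (assumed to be
                 learned beforehand, e.g. with k-means).
    Returns a flat list of length k * d.
    """
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]

    for x in descriptors:
        # Index of the nearest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual (descriptor minus centroid).
        for j in range(d):
            acc[nearest][j] += x[j] - centroids[nearest][j]

    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


if __name__ == "__main__":
    toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
    toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [10.0, 13.0]]
    print(vlad(toy_descriptors, toy_centroids))
```

Because the output has a fixed length regardless of how many descriptors an image has, it can be compressed and indexed as a single vector per image.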
  • 9. Improving NDS Resilience (NDS+) • Often, NDS performance suffers from overlay graphics and fonts • To address this issue, we integrate a descriptor-level classifier that tries to remove the font/graphic descriptors from the VLAD vector
  • 10. Example: Filtering Out Font Descriptors • Assuming that in most cases the classifier is correct, the resulting VLAD vector is of much higher quality compared to the one without filtering
  • 11. Classifier Details • Random Forest used as base classifier • Cost Sensitive meta-classifier to penalize misclassification of True Positives • Challenge due to Class Imbalance (overlay descriptors << useful image content descriptors): – Cost Sensitive meta-classifier performs over-sampling of minority class to balance the training set • Training set created by collecting images with overlays (e.g. memes) from the Web and manually annotating them (selecting areas with fonts/overlays)
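The class-balancing idea can be sketched as follows. This is plain random over-sampling with replacement, which is one common way to balance a training set; the slide does not say exactly how the meta-classifier over-samples, so the strategy, the function name `oversample_minority` and the label names are assumptions. The Random Forest training itself is not shown.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest one.

    Returns new (samples, labels) lists; the inputs are left untouched.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)

    target = max(len(group) for group in by_label.values())

    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw the missing number of samples with replacement.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


if __name__ == "__main__":
    # Toy data: 8 "content" descriptors and 2 "overlay" descriptors.
    xs = [[float(i)] for i in range(10)]
    ys = ["content"] * 8 + ["overlay"] * 2
    new_xs, new_ys = oversample_minority(xs, ys)
    print(new_ys.count("content"), new_ys.count("overlay"))
```

Balancing matters here because, as the slide notes, overlay descriptors are far rarer than image-content descriptors, so an unbalanced classifier could simply label everything as content.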
  • 12. Mining: Clustering and Aggregation • Visual aggregation: – DBSCAN on the visual feature representation (PCA-reduced VLAD vectors) – Element (tweet) selected based on the largest number of keywords (expected to result in more information) • Entity aggregation: – NER on individual items – Entity categorization (Persons, Locations, Organizations) – Entity ranking based on frequency of occurrence
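The two aggregation rules on this slide can be illustrated with a short sketch: ranking entities per category by how often they occur, and picking the cluster member with the most keywords as its representative. The function names, the `(entity, type)` tuple format and the `keywords_of` callback are hypothetical; how the real system counts repeated mentions or extracts keywords is not stated in the slides. The DBSCAN clustering step is not shown.

```python
from collections import Counter


def rank_entities(items):
    """Rank entities per category by frequency of occurrence.

    items: list of per-post entity lists, each entry an (entity, type) tuple.
    Returns {type: [(entity, count), ...]} sorted by descending count.
    """
    counts = Counter()
    for entities in items:
        counts.update(entities)

    ranked = {}
    # most_common() yields pairs in descending order of count.
    for (name, etype), n in counts.most_common():
        ranked.setdefault(etype, []).append((name, n))
    return ranked


def pick_representative(cluster, keywords_of):
    """Return the cluster member with the largest number of keywords.

    keywords_of is an assumed callback mapping a post to its keywords.
    """
    return max(cluster, key=lambda item: len(keywords_of(item)))


if __name__ == "__main__":
    posts = [
        [("Boston", "LOCATION"), ("FBI", "ORGANIZATION")],
        [("Boston", "LOCATION")],
        [("Watertown", "LOCATION"), ("FBI", "ORGANIZATION")],
    ]
    print(rank_entities(posts))

    tweets = ["fire", "big fire downtown", "fire now"]
    print(pick_representative(tweets, lambda t: set(t.lower().split())))
```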
  • 14. User Interface: Items View & Search
  • 17. Evaluation: NER • Manual annotation of 400 tweets from the SNOW Data Challenge dataset (Papadopoulos et al., 2014) • Measure: Accuracy → an instance is considered correct when both entity and type are correctly identified • Three competing solutions: – Base Stanford NER (S-NER) – S-NER + Extensions/Post-processing (S-NER+) – Ellogon library (http://www.ellogon.org)
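One possible reading of this accuracy measure is sketched below: each manually annotated (entity, type) pair counts as an instance, and it is correct only if the system output for the same tweet contains exactly that pair. The slides do not spell out the exact protocol (for example, how spurious predictions are treated), so the function `ner_accuracy` and its input format are assumptions.

```python
def ner_accuracy(gold, predicted):
    """Fraction of annotated instances whose entity AND type were both found.

    gold, predicted: parallel lists (one entry per tweet) of
    (entity, type) tuples. Spurious predictions are ignored in this sketch.
    """
    total = 0
    correct = 0
    for gold_entities, predicted_entities in zip(gold, predicted):
        predicted_set = set(predicted_entities)
        for instance in gold_entities:
            total += 1
            # A right entity with a wrong type does not count as correct.
            if instance in predicted_set:
                correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    gold = [[("Boston", "LOCATION"), ("Obama", "PERSON")], [("FBI", "ORGANIZATION")]]
    pred = [[("Boston", "LOCATION"), ("Obama", "LOCATION")], [("FBI", "ORGANIZATION")]]
    print(ner_accuracy(gold, pred))
```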
  • 18. Evaluation: NDS • Benchmark Datasets: – Holidays: 1,491 images, 500 queries (Jegou et al., 2008) – Oxford: 5,063 images, 55 queries (Philbin et al., 2008) – Paris: 6,412 images, 55 queries (Philbin et al., 2008) • Accuracy: mean Average Precision (mAP) • Result charts: clean dataset, noisy dataset
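For readers unfamiliar with the metric, here is a minimal sketch of mean Average Precision over ranked retrieval results. It uses the plain (non-interpolated) definition of average precision; the official evaluation scripts of the benchmarks above may differ in details such as interpolation or the handling of ignored images. The function names and toy identifiers are illustrative only.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list.

    ranked_ids:   retrieved image ids, best match first.
    relevant_ids: ids of all images that are true matches for the query.
    """
    relevant = set(relevant_ids)
    hits = 0
    precision_sum = 0.0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            # Precision at the rank where this relevant image appears.
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(results):
    """results: list of (ranked_ids, relevant_ids) pairs, one per query."""
    scores = [average_precision(ranked, rel) for ranked, rel in results]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    queries = [
        (["a", "x", "b"], ["a", "b"]),  # relevant images at ranks 1 and 3
        (["x", "a"], ["a"]),            # relevant image at rank 2
    ]
    print(mean_average_precision(queries))
```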
  • 19. Evaluation: NDS • Execution Time (msec) • Example (indexed image, query image): NDS: #27, NDS+: #1
  • 20. Use Cases: Real-world Datasets • Datasets: sandy, boston, malaysia, ferry
  • 21. NDS Use Case (boston)
  • 22. Clustering Use Case (boston) • Visual clustering enables comparative view and analysis over time (in this case showing increasing confidence in the picture) • When journalists see many similar photos of the same scene, they have more confidence that it is real and not fabricated
  • 23. Entity Aggregation Use Case (snow) • Panels: Locations, Persons, Organizations
  • 24. Conclusion • Key contributions: – Framework and web application offering valuable verification support for Web multimedia – High-quality individual components for NER, NDS, clustering and aggregation • Future Work: – Incremental image clustering – Temporal views to explore evolution of a story – Multimedia forensics toolbox (splice, copy-move detection)
  • 25. Future Work: Web Multimedia Forensics • Possibility to offer image manipulation detection as a service for arbitrary Web images – Challenges: social media platforms apply additional transformations (scaling, JPEG recompression, etc.), making the problem much more complex
  • 26. References (1/2) • A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller. Twitinfo: Aggregating and visualizing microblogs for event exploration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '11, pages 227-236, New York, NY, USA, 2011. ACM • M. Mathioudakis and N. Koudas. Twittermonitor: Trend detection over the twitter stream. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 1155-1158, New York, NY, USA, 2010. ACM • G. Petkos, S. Papadopoulos, and Y. Kompatsiaris. Social event detection using multimodal clustering and integrating supervisory signals. In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR '12, pages 23:1-23:8, New York, NY, USA, 2012. ACM • L. Weng, F. Menczer, and Y. Ahn. Predicting successful memes using network and community structure. CoRR, abs/1403.6199, 2014 • L. Xie, A. Natsev, J. R. Kender, M. Hill, and J. R. Smith. Visual memes in social media: Tracking real-world news in youtube videos. In Proceedings of the 19th ACM International Conference on Multimedia, MM '11, pages 53-62, New York, NY, USA, 2011. ACM
  • 27. References (2/2) • L. Kennedy and S.-F. Chang. Internet image archaeology: Automatically tracing the manipulation history of photographs on the web. In Proceedings of the 16th ACM International Conference on Multimedia, MM '08, pages 349-358, New York, NY, USA, 2008. ACM • H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In Proceedings of the 10th European Conference on Computer Vision: Part I, ECCV '08, pages 304-317, Berlin, Heidelberg, 2008. Springer-Verlag • S. Papadopoulos, D. Corney, and L. M. Aiello. SNOW 2014 Data Challenge: Assessing the performance of news topic detection methods in social media. In Proceedings of the SNOW 2014 Data Challenge Workshop co-located with 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea, April 8, 2014, pages 1-8, 2014 • J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pages 1-8, June 2008
  • 28. Thank you! • Resources: – Slides: http://www.slideshare.net/sympapadopoulos/mediarevealr – Code: https://github.com/MKLab-ITI/reveal-media-crawler and https://github.com/MKLab-ITI/multimedia-indexing – Data: https://github.com/MKLab-ITI/image-verification-corpus • Get in touch: @sympapadopoulos / papadop@iti.gr, @kandreads / kandreadou@iti.gr

Editor's notes

  1. http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/