Challenges of
Computational Verification in Social Media
Christina Boididou¹, Symeon Papadopoulos¹, Yiannis Kompatsiaris¹,
Steve Schifferes², Nic Newman²
¹ Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
² City University London – Journalism Department
WWW’14, April 8, Seoul, Korea
How trustworthy is Web multimedia?
Real photo
captured April 2011 by WSJ
but
heavily tweeted during Hurricane Sandy
(29 Oct 2012)
Tweeted by multiple sources &
retweeted multiple times
Original online at:
http://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/
Disseminating (real?) content on Twitter
• Twitter is the platform for sharing newsworthy
content in real-time.
• Pressure for airing stories very quickly leaves very
little room for verification.
• Very often, even well-reputed news providers fall for
fake news content.
• Here, we examine the feasibility and challenges of
conducting verification of shared media content with
the help of a machine learning framework.
Related: Web & OSN Spam
• Web spam is a relatively old problem, wherein the spammer
tries to “trick” search engines into thinking that a webpage is
high-quality, while it’s not (Gyongyi & Garcia-Molina, 2005).
• Spam revived in the age of social media. For instance,
spammers try to promote irrelevant links using popular
hashtags (Benevenuto et al., 2010; Stringhini et al., 2010).
Prior work mainly focuses on characterizing/detecting sources of spam
(websites, Twitter accounts) rather than spam content.
Z. Gyongyi and H. Garcia-Molina. Web spam taxonomy. In First international workshop on
adversarial information retrieval on the web (AIRWeb), 2005
F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida. Detecting spammers on twitter. In
Collaboration, Electronic messaging, Anti-abuse and Spam conference (CEAS), volume 6, 2010
G. Stringhini, C. Kruegel, and G. Vigna. Detecting spammers on social networks. In Proceedings of
the 26th Annual Computer Security Applications Conference, pages 1–9. ACM, 2010.
Related: Diffusion of Spam
• In many cases, the propagation patterns between
real and fake content are different, e.g. in the case of
the large Chile earthquakes (Mendoza et al., 2010)
• Using a few nodes of the network as “monitors”, one
could try to identify sources of fake rumours (Seo
et al., 2012).
Still, such methods are very hard to use in real-time
settings or very soon after an event starts.
M. Mendoza, B. Poblete, and C. Castillo. Twitter under crisis: Can we trust what we rt? In
Proceedings of the first Workshop on Social Media Analytics, pages 71–79. ACM, 2010
E. Seo, P. Mohapatra, and T. Abdelzaher. Identifying rumors and their sources in social networks.
In SPIE Defense, Security, and Sensing, 2012
Related: Assessing Content Credibility
• Four types of features are considered: message,
user, topic and propagation (Castillo et al., 2011).
• Classify tweets with images as fake or not using a
machine learning approach (Gupta et al., 2013).
Reports an accuracy of ~97%, which is a gross
overestimation of expected real-world accuracy.
C. Castillo, M. Mendoza, and B. Poblete. Information credibility on twitter. In Proceedings of the
20th international conference on World Wide Web, pages 675–684. ACM, 2011.
A. Gupta, H. Lamba, P. Kumaraguru, and A. Joshi. Faking sandy: characterizing and identifying
fake images on twitter during hurricane sandy. In Proceedings of the 22nd international
conference on World Wide Web companion, pages 729–736, 2013
Goals/Contributions
• Distinguish between fake and real content shared on
Twitter using a supervised approach
• Provide closer-to-reality estimates of automatic
verification performance
• Explore methodological issues with respect to
evaluating classifier performance
• Create reusable resources
– Fake (and real) tweets (incl. images) corpus
– Open-source implementation
Methodology
• Corpus Creation
– Topsy API
– Near-duplicate image detection
• Feature Extraction
– Content-based features
– User-based features
• Classifier Building & Evaluation
– Cross-validation
– Independent photo sets
– Cross-dataset training
Corpus Creation
• Define a set of keywords K around an event of interest.
• Use Topsy API (keyword-based search) and keep only the
tweets that contain images; call this set T.
• Using independent online sources, define a set of fake
images I_F and a set of real ones I_R.
• Select the subset T_C ⊂ T of tweets that contain any of the images in
I_F or I_R.
• Use near-duplicate visual search (VLAD+SURF) to extend
T_C with tweets that contain near-duplicate images.
• Manually check that the returned near-duplicates indeed
correspond to the images of I_F or I_R.
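The selection and labelling steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dictionary layout of a tweet, the function name `build_corpus`, and the `near_duplicate_of` mapping are all hypothetical. The mapping simply stands in for the manually checked output of the near-duplicate visual search, which is not reproduced here.

```python
def build_corpus(tweets, fake_images, real_images, near_duplicate_of=None):
    """Return the labelled subset of image tweets (hypothetical data layout).

    tweets            -- list of dicts with at least an "image" key
    fake_images       -- set of reference fake image identifiers
    real_images       -- set of reference real image identifiers
    near_duplicate_of -- assumed mapping from a near-duplicate image to the
                         reference image it was manually confirmed to match
    """
    near_duplicate_of = near_duplicate_of or {}
    corpus = []
    for tweet in tweets:
        # Map a confirmed near-duplicate back to its reference image.
        image = near_duplicate_of.get(tweet["image"], tweet["image"])
        if image in fake_images:
            corpus.append({**tweet, "reference_image": image, "label": "fake"})
        elif image in real_images:
            corpus.append({**tweet, "reference_image": image, "label": "real"})
        # Tweets whose image matches no reference image are left out.
    return corpus


# Toy example with made-up identifiers.
tweets = [
    {"id": 1, "image": "shark.jpg"},
    {"id": 2, "image": "shark_cropped.jpg"},
    {"id": 3, "image": "flooded_street.jpg"},
    {"id": 4, "image": "cat.jpg"},
]
corpus = build_corpus(
    tweets,
    fake_images={"shark.jpg"},
    real_images={"flooded_street.jpg"},
    near_duplicate_of={"shark_cropped.jpg": "shark.jpg"},
)
for tweet in corpus:
    print(tweet["id"], tweet["reference_image"], tweet["label"])
```

Keeping the reference image on every labelled tweet is a design choice of this sketch; it makes it possible to group tweets by image later, when building training and test sets.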
Features
# User Feature
1 Username
2 Number of friends
3 Number of followers
4 Number of followers/number of friends
5 Number of times the user was listed
6 If the user’s status contains URL
7 If the user is verified or not
# Content Feature
1 Length of the tweet
2 Number of words
3 Number of exclamation marks
4 Number of quotation marks
5 Contains emoticon (happy/sad)
6 Number of uppercase characters
7 Number of hashtags
8 Number of mentions
9 Number of pronouns
10 Number of URLs
11 Number of sentiment words
12 Number of retweets
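As a rough illustration of how some of the listed features could be computed, the sketch below derives a subset of the content features from the raw tweet text, plus the followers/friends ratio. The regular expressions, the feature names, and the handling of a zero friend count are assumptions made for this example; the slides do not specify them.

```python
import re


def content_features(text):
    """Compute a few of the content features from raw tweet text."""
    return {
        "length": len(text),
        "num_words": len(text.split()),
        "num_exclamation_marks": text.count("!"),
        "num_quotation_marks": text.count('"'),
        "num_uppercase_chars": sum(1 for ch in text if ch.isupper()),
        # Simple patterns; real tweets may need more careful tokenisation.
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "num_urls": len(re.findall(r"https?://\S+", text)),
    }


def follower_friend_ratio(num_followers, num_friends):
    """User feature 4; returning 0.0 for zero friends is an assumption."""
    return num_followers / num_friends if num_friends else 0.0


sample = ("Sharks in people's front yard #hurricane #sandy #bringing "
          "#sharks #newyork #crazy http://t.co/PVewUIE1")
features = content_features(sample)
print(features)
print(follower_friend_ratio(200, 50))
```

Features such as sentiment words, pronouns, and emoticons need word lists or lexicons and are left out of this sketch.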
Training and Testing the Classifier
• Care should be taken to make sure that no
knowledge from the training set enters the
test set.
• This is NOT the case when using standard
cross-validation.
The Problem with Cross-Validation
Training/Test tweets are randomly selected.
Figure: one of the reference fake images. There are multiple tweets per reference image,
so tweets sharing the same image can end up in both the training and the test set.
Independence of Training-Test Set
Training/Test tweets are constrained to correspond to
different reference images.
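One way to enforce this constraint is to split on reference images rather than on tweets. The sketch below is an assumption about how such a split could look, not the procedure used in the study; the `reference_image` key, the function name, and the default test fraction are hypothetical.

```python
import random


def split_by_reference_image(tweets, test_fraction=0.3, seed=0):
    """Split tweets so that no reference image occurs in both sets."""
    images = sorted({t["reference_image"] for t in tweets})
    rng = random.Random(seed)
    rng.shuffle(images)
    # Hold out whole images, never individual tweets.
    n_test = max(1, round(len(images) * test_fraction))
    test_images = set(images[:n_test])
    train = [t for t in tweets if t["reference_image"] not in test_images]
    test = [t for t in tweets if t["reference_image"] in test_images]
    return train, test


# Toy data: 20 tweets spread over 4 reference images.
tweets = [{"id": i, "reference_image": f"img{i % 4}"} for i in range(20)]
train, test = split_by_reference_image(tweets)

train_images = {t["reference_image"] for t in train}
test_images = {t["reference_image"] for t in test}
print(len(train), len(test), train_images & test_images)
```

If scikit-learn is available, its group-aware splitters such as `GroupKFold` or `GroupShuffleSplit` serve the same purpose when the reference image is passed as the group label.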
Cross-dataset Training-Testing
• In the most unfavourable case, the dataset used for
training should refer to a different event than the
one used for testing.
• Simulates real-world scenario of a breaking story,
where no prior information is available to news
professionals.
• Variants:
– Different event, same domain
– Different event, different domain (very challenging!)
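The cross-dataset setting can be sketched as a split that holds out one whole event at a time. This is only an illustration under assumed names: the `event` key and the function `cross_event_splits` are hypothetical, and the toy records carry no real data.

```python
def cross_event_splits(tweets):
    """Yield (held_out_event, train, test) with one whole event held out."""
    events = sorted({t["event"] for t in tweets})
    for held_out in events:
        # Train only on other events, test only on the held-out one.
        train = [t for t in tweets if t["event"] != held_out]
        test = [t for t in tweets if t["event"] == held_out]
        yield held_out, train, test


# Toy records with made-up labels.
tweets = [
    {"id": 1, "event": "sandy", "label": "fake"},
    {"id": 2, "event": "sandy", "label": "real"},
    {"id": 3, "event": "boston", "label": "fake"},
    {"id": 4, "event": "boston", "label": "real"},
]
splits = list(cross_event_splits(tweets))
for held_out, train, test in splits:
    print(held_out, [t["id"] for t in train], [t["id"] for t in test])
```

With only two events this reduces to training on one and testing on the other, which mirrors the breaking-story scenario described above.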
Evaluation
• Datasets
– Hurricane Sandy
– Boston Marathon bombings
• Evaluation of two sets of features
(content/user)
• Evaluation of different classifier settings
Dataset – Hurricane Sandy
Natural disaster that took place around the USA from October 22nd to 31st, 2012. Fake
images and content, such as sharks inside New York and a flooded Statue of
Liberty, went viral.
Keywords          Hashtags
Hurricane Sandy   #hurricaneSandy
Hurricane         #hurricane
Sandy             #Sandy
Dataset – Boston Marathon Bombings
The bombings occurred on 15 April, 2013 during the Boston Marathon
when two pressure cooker bombs exploded at 2:49 pm EDT, killing three
people and injuring an estimated 264 others.
Keywords          Hashtags
Boston Marathon   #bostonMarathon
Boston bombings   #bostonbombings
Boston suspect    #bostonSuspect
manhunt           #manhunt
watertown         #watertown
Tsarnaev          #Tsarnaev
4chan             #4chan
Sunil Tripathi    #prayForBoston
Dataset Statistics
                               Hurricane Sandy   Boston Marathon
Tweets with other image URLs   343939            112449
Tweets with fake images        10758             281
Tweets with real images        3540              460
Prediction accuracy (1)
• 10-fold cross validation results using different classifiers: ~80%
Prediction accuracy (2)
• Results using different training and testing set from the
Hurricane Sandy dataset: ~75%
• Results using Hurricane Sandy for training and Boston
Marathon for testing: ~58%
Sample Results
• Real tweet
My friend's sister's Trampolene in Long Island.
#HurricaneSandy
Classified as real
• Real tweet
23rd street repost from @wendybarton
#hurricanesandy #nyc
Classified as fake
• Fake tweet
Sharks in people's front yard #hurricane #sandy #bringing
#sharks #newyork #crazy http://t.co/PVewUIE1
Classified as fake
• Fake tweet
Statue of Liberty + crushing waves. http://t.co/7F93HuHV
#hurricaneparty #sandy
Classified as real
Conclusion
• Challenges
– Data Collection: (a) Fake content is often removed (either
by user or by OSN admin), (b) API limitations make
collection very difficult after an event has taken place
– Classifier accuracy: Purely content-based classification can
only be of limited use, especially when used in a context of
a different event. However, one could imagine that
separate classifiers might be built for certain types of
incidents, cf. AIDR use for the recent Chile Earthquake
• Future Work
– Extend features: (a) geographic location of user (wrt.
location of incident), (b) time the tweet was posted
– Extend dataset: More events, more fake examples
Thank you!
• Resources:
Slides: http://www.slideshare.net/sympapadopoulos/computational-verification-challenges-in-social-media
Code: https://github.com/socialsensor/computational-verification
Dataset: https://github.com/MKLab-ITI/image-verification-corpus
Help us make it bigger!
• Get in touch:
@sympapadopoulos / papadop@iti.gr
@CMpoi / boididou@iti.gr
Sample fake and real images in Sandy
• Fake pictures shared on social media
• Real pictures shared on social media

Weitere ähnliche Inhalte

Was ist angesagt?

Data Science Popup Austin: The Science of Sharing
Data Science Popup Austin: The Science of Sharing Data Science Popup Austin: The Science of Sharing
Data Science Popup Austin: The Science of Sharing Domino Data Lab
 
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
Twitter Is the Megaphone of Cross-platform Messaging on the White HelmetsSameera Horawalavithana
 
Picturing the Social: Talk for Transforming Digital Methods Winter School
Picturing the Social: Talk for Transforming Digital Methods Winter SchoolPicturing the Social: Talk for Transforming Digital Methods Winter School
Picturing the Social: Talk for Transforming Digital Methods Winter SchoolFarida Vis
 
Weller social media as research data_psm15
Weller social media as research data_psm15Weller social media as research data_psm15
Weller social media as research data_psm15Katrin Weller
 
Jan 2012 Threats Trend Report
Jan 2012 Threats Trend ReportJan 2012 Threats Trend Report
Jan 2012 Threats Trend ReportCyren, Inc
 
How information spreads on social networks when unexpected events occur
How information spreads on social networks when unexpected events occurHow information spreads on social networks when unexpected events occur
How information spreads on social networks when unexpected events occurFarida Vis
 
Jacque lewis - Senior Project -w/o script
Jacque lewis - Senior Project -w/o scriptJacque lewis - Senior Project -w/o script
Jacque lewis - Senior Project -w/o scriptJacque Lewis
 
Twitter Data Analytics
Twitter Data AnalyticsTwitter Data Analytics
Twitter Data Analyticsrupika08
 
Big data in social sciences and IT developments (ethics considerations)
Big data in social sciences and IT developments (ethics considerations)Big data in social sciences and IT developments (ethics considerations)
Big data in social sciences and IT developments (ethics considerations)Efthimios Tambouris
 
Weller pleasures+perils social media
Weller pleasures+perils social mediaWeller pleasures+perils social media
Weller pleasures+perils social mediaKatrin Weller
 
A research of software vulnerabilities
A research of software vulnerabilitiesA research of software vulnerabilities
A research of software vulnerabilitiesAlireza Aghamohammadi
 
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)Prashant Khare
 
Crisis Information Processing - with the power of A.I.
Crisis Information Processing - with the power of A.I.Crisis Information Processing - with the power of A.I.
Crisis Information Processing - with the power of A.I.The Open University
 
Fake news and trust and distrust in fact checking sites
Fake news and trust and distrust in fact checking sitesFake news and trust and distrust in fact checking sites
Fake news and trust and distrust in fact checking sitesPetter Bae Brandtzæg
 
HiveSight - How It Works
HiveSight - How It WorksHiveSight - How It Works
HiveSight - How It Workshivesight
 
The Role of Social Influence In Security Feature Adoption, at CSCW 2015
The Role of Social Influence In Security Feature Adoption, at CSCW 2015The Role of Social Influence In Security Feature Adoption, at CSCW 2015
The Role of Social Influence In Security Feature Adoption, at CSCW 2015Jason Hong
 
Keeping up: strategic use of online social networks for librarian current awa...
Keeping up: strategic use of online social networks for librarian current awa...Keeping up: strategic use of online social networks for librarian current awa...
Keeping up: strategic use of online social networks for librarian current awa...suelibrarian
 
Leslie townsend communities - 2013
Leslie townsend   communities - 2013Leslie townsend   communities - 2013
Leslie townsend communities - 2013Ray Poynter
 
Community-based Crowdsourcing
Community-based CrowdsourcingCommunity-based Crowdsourcing
Community-based CrowdsourcingAndrea Mauri
 
Detailed Research on Fake News: Opportunities, Challenges and Methods
Detailed Research on Fake News: Opportunities, Challenges and MethodsDetailed Research on Fake News: Opportunities, Challenges and Methods
Detailed Research on Fake News: Opportunities, Challenges and MethodsMilap Bhanderi
 

Was ist angesagt? (20)

Data Science Popup Austin: The Science of Sharing
Data Science Popup Austin: The Science of Sharing Data Science Popup Austin: The Science of Sharing
Data Science Popup Austin: The Science of Sharing
 
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 
Picturing the Social: Talk for Transforming Digital Methods Winter School
Picturing the Social: Talk for Transforming Digital Methods Winter SchoolPicturing the Social: Talk for Transforming Digital Methods Winter School
Picturing the Social: Talk for Transforming Digital Methods Winter School
 
Weller social media as research data_psm15
Weller social media as research data_psm15Weller social media as research data_psm15
Weller social media as research data_psm15
 
Jan 2012 Threats Trend Report
Jan 2012 Threats Trend ReportJan 2012 Threats Trend Report
Jan 2012 Threats Trend Report
 
How information spreads on social networks when unexpected events occur
How information spreads on social networks when unexpected events occurHow information spreads on social networks when unexpected events occur
How information spreads on social networks when unexpected events occur
 
Jacque lewis - Senior Project -w/o script
Jacque lewis - Senior Project -w/o scriptJacque lewis - Senior Project -w/o script
Jacque lewis - Senior Project -w/o script
 
Twitter Data Analytics
Twitter Data AnalyticsTwitter Data Analytics
Twitter Data Analytics
 
Big data in social sciences and IT developments (ethics considerations)
Big data in social sciences and IT developments (ethics considerations)Big data in social sciences and IT developments (ethics considerations)
Big data in social sciences and IT developments (ethics considerations)
 
Weller pleasures+perils social media
Weller pleasures+perils social mediaWeller pleasures+perils social media
Weller pleasures+perils social media
 
A research of software vulnerabilities
A research of software vulnerabilitiesA research of software vulnerabilities
A research of software vulnerabilities
 
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
Classifying Crisis Information Relevancy with Semantics (ESWC 2018)
 
Crisis Information Processing - with the power of A.I.
Crisis Information Processing - with the power of A.I.Crisis Information Processing - with the power of A.I.
Crisis Information Processing - with the power of A.I.
 
Fake news and trust and distrust in fact checking sites
Fake news and trust and distrust in fact checking sitesFake news and trust and distrust in fact checking sites
Fake news and trust and distrust in fact checking sites
 
HiveSight - How It Works
HiveSight - How It WorksHiveSight - How It Works
HiveSight - How It Works
 
The Role of Social Influence In Security Feature Adoption, at CSCW 2015
The Role of Social Influence In Security Feature Adoption, at CSCW 2015The Role of Social Influence In Security Feature Adoption, at CSCW 2015
The Role of Social Influence In Security Feature Adoption, at CSCW 2015
 
Keeping up: strategic use of online social networks for librarian current awa...
Keeping up: strategic use of online social networks for librarian current awa...Keeping up: strategic use of online social networks for librarian current awa...
Keeping up: strategic use of online social networks for librarian current awa...
 
Leslie townsend communities - 2013
Leslie townsend   communities - 2013Leslie townsend   communities - 2013
Leslie townsend communities - 2013
 
Community-based Crowdsourcing
Community-based CrowdsourcingCommunity-based Crowdsourcing
Community-based Crowdsourcing
 
Detailed Research on Fake News: Opportunities, Challenges and Methods
Detailed Research on Fake News: Opportunities, Challenges and MethodsDetailed Research on Fake News: Opportunities, Challenges and Methods
Detailed Research on Fake News: Opportunities, Challenges and Methods
 

Ähnlich wie Computational Verification Challenges in Social Media

From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?Yiannis Kompatsiaris
 
Presentation of the InVID tool for social media verification
Presentation of the InVID tool for social media verificationPresentation of the InVID tool for social media verification
Presentation of the InVID tool for social media verificationInVID Project
 
APEC TEL 58: Social Media Security
APEC TEL 58: Social Media SecurityAPEC TEL 58: Social Media Security
APEC TEL 58: Social Media SecurityAPNIC
 
Risks and Security of Internet and System
Risks and Security of Internet and SystemRisks and Security of Internet and System
Risks and Security of Internet and SystemParam Nanavati
 
Social Media and the News: Approaches to the Spread of (Mis)information
Social Media and the News: Approaches to the Spread of (Mis)informationSocial Media and the News: Approaches to the Spread of (Mis)information
Social Media and the News: Approaches to the Spread of (Mis)informationAxel Bruns
 
Verification of UGC/Eyewitness Media: Challenges and Approaches
Verification of UGC/Eyewitness Media: Challenges and Approaches Verification of UGC/Eyewitness Media: Challenges and Approaches
Verification of UGC/Eyewitness Media: Challenges and Approaches REVEAL - Social Media Verification
 
Jochen Spangenberg - Sourcing stories on social media
Jochen Spangenberg - Sourcing stories on social mediaJochen Spangenberg - Sourcing stories on social media
Jochen Spangenberg - Sourcing stories on social mediaJournalism.co.uk
 
Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...Yiannis Kompatsiaris
 
Becoming a verification ninja - Sona Patel - Fresno NewsTrain 4.22-23.22
Becoming a verification ninja - Sona Patel - Fresno NewsTrain 4.22-23.22Becoming a verification ninja - Sona Patel - Fresno NewsTrain 4.22-23.22
Becoming a verification ninja - Sona Patel - Fresno NewsTrain 4.22-23.22News Leaders Association's NewsTrain
 
Semantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsJie Bao
 
Presentation / invited talk by Kalina Bontcheva at Digilience 2019, Oct 2019
Presentation / invited talk by Kalina Bontcheva at Digilience 2019, Oct 2019Presentation / invited talk by Kalina Bontcheva at Digilience 2019, Oct 2019
Presentation / invited talk by Kalina Bontcheva at Digilience 2019, Oct 2019Weverify
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentSymeon Papadopoulos
 
Social Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events ApplicationsSocial Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events ApplicationsYiannis Kompatsiaris
 
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...REVEAL - Social Media Verification
 
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...Symeon Papadopoulos
 
When Online Computational Data Meets Offline Real World Events
When Online Computational Data Meets Offline Real World EventsWhen Online Computational Data Meets Offline Real World Events
When Online Computational Data Meets Offline Real World EventsTunghai University
 
Lecture 5: Mining, Analysis and Visualisation
Lecture 5: Mining, Analysis and VisualisationLecture 5: Mining, Analysis and Visualisation
Lecture 5: Mining, Analysis and VisualisationMarieke van Erp
 
Designing and Evaluating Techniques to
 Mitigate Misinformation Spread on 
Mi...
Designing and Evaluating Techniques to
 Mitigate Misinformation Spread on 
Mi...Designing and Evaluating Techniques to
 Mitigate Misinformation Spread on 
Mi...
Designing and Evaluating Techniques to
 Mitigate Misinformation Spread on 
Mi...IIIT Hyderabad
 

Ähnlich wie Computational Verification Challenges in Social Media (20)

From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?From Research to Applications: What Can We Extract with Social Media Sensing?
From Research to Applications: What Can We Extract with Social Media Sensing?
 
Presentation of the InVID tool for social media verification
Presentation of the InVID tool for social media verificationPresentation of the InVID tool for social media verification
Presentation of the InVID tool for social media verification
 
APEC TEL 58: Social Media Security
APEC TEL 58: Social Media SecurityAPEC TEL 58: Social Media Security
APEC TEL 58: Social Media Security
 
Risks and Security of Internet and System
Risks and Security of Internet and SystemRisks and Security of Internet and System
Risks and Security of Internet and System
 
Social Media and the News: Approaches to the Spread of (Mis)information
Social Media and the News: Approaches to the Spread of (Mis)informationSocial Media and the News: Approaches to the Spread of (Mis)information
Social Media and the News: Approaches to the Spread of (Mis)information
 
Identification and Analysis of Malicious Content on Facebook: A Survey
Identification and Analysis of Malicious Content on Facebook: A SurveyIdentification and Analysis of Malicious Content on Facebook: A Survey
Identification and Analysis of Malicious Content on Facebook: A Survey
 
Verification of UGC/Eyewitness Media: Challenges and Approaches
Verification of UGC/Eyewitness Media: Challenges and Approaches Verification of UGC/Eyewitness Media: Challenges and Approaches
Verification of UGC/Eyewitness Media: Challenges and Approaches
 
Jochen Spangenberg - Sourcing stories on social media
Jochen Spangenberg - Sourcing stories on social mediaJochen Spangenberg - Sourcing stories on social media
Jochen Spangenberg - Sourcing stories on social media
 
Lecture4 Social Web
Lecture4 Social Web Lecture4 Social Web
Lecture4 Social Web
 
Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...Visual Information Analysis for Crisis and Natural Disasters Management and R...
Visual Information Analysis for Crisis and Natural Disasters Management and R...
 
Becoming a verification ninja - Sona Patel - Fresno NewsTrain 4.22-23.22
Becoming a verification ninja - Sona Patel - Fresno NewsTrain 4.22-23.22Becoming a verification ninja - Sona Patel - Fresno NewsTrain 4.22-23.22
Becoming a verification ninja - Sona Patel - Fresno NewsTrain 4.22-23.22
 
Semantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer AppsSemantic Web: In Quest for the Next Generation Killer Apps
Semantic Web: In Quest for the Next Generation Killer Apps
 
Presentation / invited talk by Kalina Bontcheva at Digilience 2019, Oct 2019
Presentation / invited talk by Kalina Bontcheva at Digilience 2019, Oct 2019Presentation / invited talk by Kalina Bontcheva at Digilience 2019, Oct 2019
Presentation / invited talk by Kalina Bontcheva at Digilience 2019, Oct 2019
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
 
Social Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events ApplicationsSocial Data and Multimedia Analytics for News and Events Applications
Social Data and Multimedia Analytics for News and Events Applications
 
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
 
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...Media REVEALr: A social multimedia monitoring and intelligence system for Web...
Media REVEALr: A social multimedia monitoring and intelligence system for Web...
 
When Online Computational Data Meets Offline Real World Events
When Online Computational Data Meets Offline Real World EventsWhen Online Computational Data Meets Offline Real World Events
When Online Computational Data Meets Offline Real World Events
 
Lecture 5: Mining, Analysis and Visualisation
Lecture 5: Mining, Analysis and VisualisationLecture 5: Mining, Analysis and Visualisation
Lecture 5: Mining, Analysis and Visualisation
 
Designing and Evaluating Techniques to
 Mitigate Misinformation Spread on 
Mi...
Designing and Evaluating Techniques to
 Mitigate Misinformation Spread on 
Mi...Designing and Evaluating Techniques to
 Mitigate Misinformation Spread on 
Mi...
Designing and Evaluating Techniques to
 Mitigate Misinformation Spread on 
Mi...
 

Mehr von Symeon Papadopoulos

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...Symeon Papadopoulos
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionSymeon Papadopoulos
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationSymeon Papadopoulos
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Symeon Papadopoulos
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingSymeon Papadopoulos
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSymeon Papadopoulos
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualitySymeon Papadopoulos
 
Computational Verification Challenges in Social Media

  • 1. Challenges of Computational Verification in Social Media. Christina Boididou (1), Symeon Papadopoulos (1), Yiannis Kompatsiaris (1), Steve Schifferes (2), Nic Newman (2). (1) Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI); (2) City University London – Journalism Department. WWW’14, April 8, Seoul, Korea.
  • 2. How trustworthy is Web multimedia? A real photo, captured in April 2011 by the WSJ, but heavily tweeted during Hurricane Sandy (29 Oct 2012). Tweeted by multiple sources and retweeted multiple times. Original online at: http://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/
  • 3. Disseminating (real?) content on Twitter
      • Twitter is the platform for sharing newsworthy content in real time.
      • Pressure to air stories very quickly leaves very little room for verification.
      • Very often, even well-reputed news providers fall for fake news content.
      • Here, we examine the feasibility and challenges of verifying shared media content with the help of a machine learning framework.
  • 4. Related: Web & OSN Spam
      • Web spam is a relatively old problem, wherein the spammer tries to “trick” search engines into thinking that a webpage is high-quality when it is not (Gyongyi & Garcia-Molina, 2005).
      • Spam revived in the age of social media. For instance, spammers try to promote irrelevant links using popular hashtags (Benevenuto et al., 2010; Stringhini et al., 2010).
      • This work mainly focused on characterizing/detecting sources of spam (websites, Twitter accounts) rather than spam content.
      • References:
          – Z. Gyongyi and H. Garcia-Molina. Web spam taxonomy. In First international workshop on adversarial information retrieval on the web (AIRWeb), 2005.
          – F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida. Detecting spammers on twitter. In Collaboration, Electronic messaging, Anti-abuse and Spam conference (CEAS), volume 6, 2010.
          – G. Stringhini, C. Kruegel, and G. Vigna. Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference, pages 1–9. ACM, 2010.
  • 5. Related: Diffusion of Spam
      • In many cases, real and fake content propagate in different patterns, e.g. in the case of the large Chile earthquakes (Mendoza et al., 2010).
      • Using a few nodes of the network as “monitors”, one could try to identify sources of fake rumours (Seo and Mohapatra, 2012).
      • Still, such methods are very hard to use in real-time settings or very soon after an event starts.
      • References:
          – M. Mendoza, B. Poblete, and C. Castillo. Twitter under crisis: Can we trust what we rt? In Proceedings of the first Workshop on Social Media Analytics, pages 71–79. ACM, 2010.
          – E. Seo, P. Mohapatra, and T. Abdelzaher. Identifying rumors and their sources in social networks. In SPIE Defense, Security, and Sensing, 2012.
  • 6. Related: Assessing Content Credibility
      • Four types of features are considered: message, user, topic and propagation (Castillo et al., 2011).
      • Tweets with images are classified as fake or not using a machine learning approach (Gupta et al., 2013). That work reports an accuracy of ~97%, which is a gross over-estimation of expected real-world accuracy.
      • References:
          – C. Castillo, M. Mendoza, and B. Poblete. Information credibility on twitter. In Proceedings of the 20th international conference on World Wide Web, pages 675–684. ACM, 2011.
          – A. Gupta, H. Lamba, P. Kumaraguru, and A. Joshi. Faking sandy: characterizing and identifying fake images on twitter during hurricane sandy. In Proceedings of the 22nd international conference on World Wide Web companion, pages 729–736, 2013.
  • 7. Goals/Contributions
      • Distinguish between fake and real content shared on Twitter using a supervised approach.
      • Provide closer-to-reality estimates of automatic verification performance.
      • Explore methodological issues with respect to evaluating classifier performance.
      • Create reusable resources:
          – Corpus of fake (and real) tweets, including images
          – Open-source implementation
  • 8. Methodology
      • Corpus Creation
          – Topsy API
          – Near-duplicate image detection
      • Feature Extraction
          – Content-based features
          – User-based features
      • Classifier Building & Evaluation
          – Cross-validation
          – Independent photo sets
          – Cross-dataset training
  • 9. Corpus Creation
      • Define a set of keywords K around an event of interest.
      • Use the Topsy API (keyword-based search) and keep only the tweets that contain images; call this set T.
      • Using independent online sources, define a set of fake images I_F and a set of real images I_R.
      • Select the subset T_C ⊂ T of tweets that contain any of the images in I_F or I_R.
      • Use near-duplicate visual search (VLAD+SURF) to extend T_C with tweets that contain near-duplicates of those images.
      • Manually check that the returned near-duplicates indeed correspond to the images of I_F or I_R.
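The selection and labelling steps of slide 9 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Tweet` class, the `image_id` field, and the `near_duplicates` mapping (standing in for the output of the VLAD+SURF search after manual checking) are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: str
    text: str
    image_id: str  # hypothetical identifier of the attached image


def build_corpus(tweets, fake_images, real_images, near_duplicates=None):
    """Return (tweet, label, reference_image) triples for the tweets in T_C.

    near_duplicates maps an image id to the reference image it duplicates;
    it is assumed to come from a separate, manually checked visual search.
    """
    near_duplicates = near_duplicates or {}
    corpus = []
    for tweet in tweets:
        # Resolve a near-duplicate to its reference image, if one is known.
        ref = near_duplicates.get(tweet.image_id, tweet.image_id)
        if ref in fake_images:
            corpus.append((tweet, "fake", ref))
        elif ref in real_images:
            corpus.append((tweet, "real", ref))
        # Tweets whose image matches no reference image are left out of T_C.
    return corpus
```

Keeping the reference image next to each label is a deliberate choice in this sketch: it is the information needed later to keep training and test sets independent.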
  • 10. Features
      • User features:
          – 1. Username
          – 2. Number of friends
          – 3. Number of followers
          – 4. Number of followers / number of friends
          – 5. Number of times the user was listed
          – 6. Whether the user’s status contains a URL
          – 7. Whether the user is verified
      • Content features:
          – 1. Length of the tweet
          – 2. Number of words
          – 3. Number of exclamation marks
          – 4. Number of quotation marks
          – 5. Contains emoticon (happy/sad)
          – 6. Number of uppercase characters
          – 7. Number of hashtags
          – 8. Number of mentions
          – 9. Number of pronouns
          – 10. Number of URLs
          – 11. Number of sentiment words
          – 12. Number of retweets
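A few of the content features listed on slide 10 can be computed directly from the tweet text. The sketch below covers only the purely textual counts; the regular expressions and the dictionary keys are assumptions made for illustration, and features that need extra resources (emoticon, pronoun and sentiment lexicons, retweet counts) are left out.

```python
import re

# Simple patterns chosen for illustration; real tweets may need stricter rules.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")


def content_features(text):
    """Compute a subset of the content-based features for one tweet."""
    # Remove URLs first so that a '#' or '@' inside a link is not counted.
    without_urls = URL_RE.sub(" ", text)
    return {
        "length": len(text),
        "num_words": len(text.split()),
        "num_exclamation_marks": text.count("!"),
        "num_quotation_marks": text.count('"'),
        # Counts uppercase characters in the whole text, links included.
        "num_uppercase_chars": sum(ch.isupper() for ch in text),
        "num_hashtags": len(HASHTAG_RE.findall(without_urls)),
        "num_mentions": len(MENTION_RE.findall(without_urls)),
        "num_urls": len(URL_RE.findall(text)),
    }
```

Calling `content_features` on each tweet in the corpus yields one feature dictionary per tweet, which can then be turned into a numeric vector for a classifier.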
  • 11. Training and Testing the Classifier
      • Care should be taken to make sure that no knowledge from the training set enters the test set.
      • This is NOT the case when using standard cross-validation.
  • 12. The Problem with Cross-Validation: training and test tweets are randomly selected, but there are multiple tweets per reference image (the slide shows one of the reference fake images), so tweets sharing the same image can end up in both the training and the test set.
  • 13. Independence of Training-Test Set: training and test tweets are constrained to correspond to different reference images.
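The constraint on slide 13 amounts to splitting by reference image rather than by tweet. A minimal sketch, assuming each sample is a `(tweet_id, reference_image)` pair and that at least two reference images exist; the function name and parameters are hypothetical:

```python
import random


def split_by_reference_image(samples, test_fraction=0.3, seed=0):
    """Split samples so that no reference image occurs in both sets.

    samples: list of (tweet_id, reference_image) pairs.
    Whole images, not individual tweets, are assigned to the test set.
    """
    images = sorted({ref for _, ref in samples})
    rng = random.Random(seed)
    rng.shuffle(images)
    # Hold out at least one image, even for very small corpora.
    n_test = max(1, round(len(images) * test_fraction))
    test_images = set(images[:n_test])
    train = [s for s in samples if s[1] not in test_images]
    test = [s for s in samples if s[1] in test_images]
    return train, test
```

Because whole images are held out, the resulting test set can be larger or smaller than `test_fraction` of the tweets when some images are tweeted far more often than others.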
  • 14. Cross-dataset Training-Testing
      • In the most unfavourable case, the dataset used for training should refer to a different event than the one used for testing.
      • This simulates the real-world scenario of a breaking story, where no prior information is available to news professionals.
      • Variants:
          – Different event, same domain
          – Different event, different domain (very challenging!)
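The cross-dataset protocol of slide 14 can be expressed as a small evaluation harness: train on all samples of one event, then measure accuracy on a different event. In this sketch the samples are `(features, label)` pairs and `train_majority` is only a stand-in classifier used to make the example runnable; it is not the classifier used in the study.

```python
def train_majority(train):
    """Stand-in classifier: always predicts the most frequent training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority


def accuracy(predict, test):
    """Fraction of test samples whose predicted label matches the true one."""
    correct = sum(predict(features) == label for features, label in test)
    return correct / len(test)


def cross_dataset_accuracy(train_event, test_event, train_fn=train_majority):
    """Train on one event's samples and evaluate on another event's samples."""
    predict = train_fn(train_event)
    return accuracy(predict, test_event)
```

Any function that takes a training set and returns a prediction function can be passed as `train_fn`, so the same harness serves both the same-domain and the different-domain variants.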
  • 15. Evaluation
      • Datasets:
          – Hurricane Sandy
          – Boston Marathon bombings
      • Evaluation of two sets of features (content/user)
      • Evaluation of different classifier settings
  • 16. Dataset – Hurricane Sandy
      • Natural disaster that hit the USA between October 22 and 31, 2012.
      • Fake images and content, such as sharks inside New York and a flooded Statue of Liberty, went viral.
      • Keywords: Hurricane Sandy, Hurricane, Sandy
      • Hashtags: #hurricaneSandy, #hurricane, #Sandy
  • 17. Dataset – Boston Marathon Bombings
      • The bombings occurred on 15 April 2013 during the Boston Marathon, when two pressure cooker bombs exploded at 2:49 pm EDT, killing three people and injuring an estimated 264 others.
      • Keywords: Boston Marathon, Boston bombings, Boston suspect, manhunt, watertown, Tsarnaev, 4chan, Sunil Tripathi
      • Hashtags: #bostonMarathon, #bostonbombings, #bostonSuspect, #manhunt, #watertown, #Tsarnaev, #4chan, #prayForBoston
  • 18. Dataset Statistics
      • Hurricane Sandy: 343,939 tweets with other image URLs; 10,758 tweets with fake images; 3,540 tweets with real images
      • Boston Marathon: 112,449 tweets with other image URLs; 281 tweets with fake images; 460 tweets with real images
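The counts on slide 18 show that the two datasets are imbalanced in opposite directions: roughly three quarters of the labelled Sandy tweets are fake, whereas fake tweets are the minority in the Boston set. The short calculation below makes this explicit and also derives the accuracy of a trivial majority-class predictor, which may serve as a reference point when reading accuracy figures. The function names are introduced here for illustration only.

```python
# Labelled tweet counts taken from the dataset statistics slide.
COUNTS = {
    "Hurricane Sandy": {"fake": 10758, "real": 3540},
    "Boston Marathon": {"fake": 281, "real": 460},
}


def fake_share(counts):
    """Fraction of labelled tweets that carry a fake image."""
    return counts["fake"] / (counts["fake"] + counts["real"])


def majority_baseline(counts):
    """Accuracy of always predicting the more frequent class."""
    total = counts["fake"] + counts["real"]
    return max(counts["fake"], counts["real"]) / total


for event, counts in COUNTS.items():
    print(
        f"{event}: fake share {fake_share(counts):.3f}, "
        f"majority baseline {majority_baseline(counts):.3f}"
    )
```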
  • 19. Prediction accuracy (1)
      • 10-fold cross-validation results using different classifiers: ~80%
  • 20. Prediction accuracy (2)
      • Results using different training and testing sets from the Hurricane Sandy dataset: ~75%
      • Results using Hurricane Sandy for training and Boston Marathon for testing: ~58%
  • 21. Sample Results
      • Real tweet, classified as real: “My friend's sister's Trampolene in Long Island. #HurricaneSandy”
      • Real tweet, classified as fake: “23rd street repost from @wendybarton #hurricanesandy #nyc”
      • Fake tweet, classified as fake: “Sharks in people's front yard #hurricane #sandy #bringing #sharks #newyork #crazy http://t.co/PVewUIE1”
      • Fake tweet, classified as real: “Statue of Liberty + crushing waves. http://t.co/7F93HuHV #hurricaneparty #sandy”
  • 22. Conclusion
      • Challenges
          – Data collection: (a) fake content is often removed (either by the user or by the OSN admin); (b) API limitations make collection very difficult after an event has taken place.
          – Classifier accuracy: purely content-based classification can only be of limited use, especially when applied in the context of a different event. However, one could imagine building separate classifiers for certain types of incidents, cf. the use of AIDR for the recent Chile earthquake.
      • Future Work
          – Extend features: (a) geographic location of the user (with respect to the location of the incident); (b) time the tweet was posted.
          – Extend dataset: more events, more fake examples.
  • 23. Thank you!
      • Resources:
          – Slides: http://www.slideshare.net/sympapadopoulos/computational-verification-challenges-in-social-media
          – Code: https://github.com/socialsensor/computational-verification
          – Dataset: https://github.com/MKLab-ITI/image-verification-corpus (Help us make it bigger!)
      • Get in touch: @sympapadopoulos / papadop@iti.gr, @CMpoi / boididou@iti.gr
  • 24. Sample fake and real images in Sandy
      • Fake pictures shared on social media
      • Real pictures shared on social media