Learning to detect Misleading Content on Twitter

Christina Boididou, Symeon Papadopoulos,
Lazaros Apostolidis, Yiannis Kompatsiaris
Information Technologies Institute, CERTH, Thessaloniki, Greece
ACM International Conference on Multimedia Retrieval
June 6-9, Bucharest, Romania
REAL OR FAKE: THE VERIFICATION PROBLEM
FAKE PHOTO
Photoshopped!
REAL OR FAKE: THE VERIFICATION PROBLEM
REAL PHOTO
Captured in Dublin’s Olympia Theatre
BUT
Mislabeled on social media as showing
the crowd at the Bataclan theatre just
before gunmen began firing.
TYPES OF FAKE
• Reposting of real multimedia content
• Reposting of synthetic content
• Digital tampering
• Speculations
A fake is any post (tweet) that shares multimedia content that does not faithfully represent the event that it refers to.
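FRAMEWORK OVERVIEW
[Diagram. Training: tweet-based and user-based features are extracted from the Verification Corpus; for each feature set, n classifiers (CL11 ... CL1n and CL21 ... CL2n) form a predictive model whose prediction is a majority vote. Testing: the same two feature sets are extracted from the incoming tweet, each predictive model outputs a prediction, and a fusion step combines the two predictions into the label, which is passed to the visualization.]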
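The speaker notes describe the bagging step as n class-balanced subsets of tweets, one classifier per subset, and a majority vote over the n predictions. A minimal sketch of the two helper steps follows, in plain Python. The function names, the fixed seed, and the choice to size each class sample by the smallest class are assumptions for illustration, not details from the slides.

```python
import random
from collections import Counter


def balanced_subsets(labels, n_subsets, seed=0):
    """Return n_subsets lists of sample indices with equal counts per class.

    Assumption: each class contributes as many samples as the smallest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    size = min(len(indices) for indices in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        subset = []
        for indices in by_class.values():
            # Sample without replacement inside one subset.
            subset.extend(rng.sample(indices, size))
        subsets.append(subset)
    return subsets


def majority_vote(predictions):
    """Return the most frequent label among the n classifier predictions."""
    return Counter(predictions).most_common(1)[0][0]


labels = ["fake"] * 6 + ["real"] * 3
subsets = balanced_subsets(labels, n_subsets=5)
vote = majority_vote(["fake", "real", "fake"])
```

Each subset would train one classifier; an odd number of classifiers avoids ties in the vote for a two-class problem.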
FEATURE EXTRACTION
TWEET-BASED
Features related to tweets
• Text-based
• Language-specific
• Twitter-specific
• Link-based
USER-BASED
Features related to users
• User-specific
• Link-based
Example tweet-based feature values:
• num of uppercase characters: 13
• num of words: 24
• num of slang words: 1
• contains first-person pronoun
• num of retweets: 3
• num of favorites: 13
• num of mentions: 2
• text readability: 73
Example user-based feature:
• Verified?
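A few of the text-based features named on this slide can be computed directly from the tweet text. The sketch below is illustrative only: the function name, the feature keys, and the pronoun list are assumptions, and the slides do not say how the original features were tokenized or counted.

```python
import re

# Hypothetical pronoun list; the slides do not specify one.
FIRST_PERSON_PRONOUNS = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def extract_tweet_features(text):
    """Compute a handful of simple text-based features for one tweet."""
    words = text.split()
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "num_uppercase_chars": sum(1 for ch in text if ch.isupper()),
        "num_words": len(words),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "contains_first_person_pronoun": any(
            token in FIRST_PERSON_PRONOUNS for token in tokens
        ),
    }


features = extract_tweet_features(
    "WOW look at this shark @cnn @bbc I saw it myself"
)
```

Features such as retweets, favorites, or the verified flag come from tweet and account metadata rather than from the text, so they are left out here.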
AGREEMENT-BASED RETRAINING
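[Diagram. Training: tweet-based and user-based features from the Verification Corpus each train a predictive model. Testing: both models predict a label for every sample in the Testing Set. Predictions agreed? If yes, the sample joins the agreed samples and keeps the agreed label (predictions for agreed). If no, the sample joins the disagreed samples, which are labeled by a predictive model retrained with the agreed samples (predictions for disagreed).]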
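The agreement-based retraining flow can be sketched as follows. This is a simplified illustration, not the authors' implementation: `fit` is a hypothetical callable standing in for whichever of the two classifiers is selected for retraining, and the 1-D nearest-mean classifier is a toy placeholder for the real model.

```python
def agreement_based_retraining(train_X, train_y, test_X, preds_a, preds_b, fit):
    """Label test samples: keep agreed labels, retrain for disagreed ones."""
    indices = range(len(test_X))
    agreed = [i for i in indices if preds_a[i] == preds_b[i]]
    disagreed = [i for i in indices if preds_a[i] != preds_b[i]]

    # Agreed samples keep the label both models assigned.
    final = {i: preds_a[i] for i in agreed}

    if disagreed:
        # Retrain on the original training data plus the agreed test samples.
        augmented_X = list(train_X) + [test_X[i] for i in agreed]
        augmented_y = list(train_y) + [preds_a[i] for i in agreed]
        predict = fit(augmented_X, augmented_y)
        for i in disagreed:
            final[i] = predict(test_X[i])

    return [final[i] for i in indices], agreed, disagreed


def fit_nearest_mean(X, y):
    """Toy stand-in classifier: assign the class whose mean is closest."""
    means = {}
    for label in set(y):
        values = [x for x, lab in zip(X, y) if lab == label]
        means[label] = sum(values) / len(values)
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


train_X, train_y = [0.0, 1.0, 9.0, 10.0], ["real", "real", "fake", "fake"]
test_X = [0.5, 9.5, 6.0]
preds_a = ["real", "fake", "real"]  # e.g. tweet-based model
preds_b = ["real", "fake", "fake"]  # e.g. user-based model
labels, agreed, disagreed = agreement_based_retraining(
    train_X, train_y, test_X, preds_a, preds_b, fit_nearest_mean
)
```

The point of the design is that agreed test samples carry information about the new event, so the retrained model is adapted to it before labeling the harder, disagreed samples.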
VERIFICATION CORPUS
COLLECTION
Set of tweets T collected with a set of keywords K
Tweets contain multimedia content (Image or Video)
GROUND TRUTH
Reputable online resources which debunk
images/videos
Publicly available corpus here:
https://github.com/MKLab-ITI/image-verification-corpus
• 193 real Images & Videos
• 6,225 real Tweets
• 220 fake Images & Videos
• 9,596 fake Tweets
• 17 events
EXPERIMENTAL STUDY
AIM
Evaluate the fake detection accuracy on samples from new events
Accuracy: $a = \frac{N_c}{N}$
EXPERIMENTS
Kind of event-based cross-validation
For each event Ei -> training: 16 remaining events, testing: Ei
Additional split proposed on MediaEval task [1]
Random Forest of 100 trees
[1] Christina Boididou, Katerina Andreadou, Symeon Papadopoulos, Duc-Tien Dang-Nguyen, Giulia Boato, Michael Riegler, and Yiannis
Kompatsiaris. 2015. Verifying Multimedia Use at MediaEval 2015. In MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany.
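The event-based cross-validation above holds out one event at a time and trains on the rest. A minimal sketch of the split and of the accuracy measure follows; it assumes that N_c counts correctly classified test samples and N all test samples (the slide does not define the symbols), and the event names are made up.

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def leave_one_event_out(events):
    """Yield (held_out_event, train_indices, test_indices) per distinct event."""
    for held_out in sorted(set(events)):
        train = [i for i, e in enumerate(events) if e != held_out]
        test = [i for i, e in enumerate(events) if e == held_out]
        yield held_out, train, test


# One event label per tweet (hypothetical toy data).
events = ["E1", "E1", "E2", "E3", "E3", "E3"]
splits = list(leave_one_event_out(events))
example_accuracy = accuracy(
    ["fake", "real", "fake", "fake"],
    ["fake", "real", "real", "fake"],
)
```

Because no tweet from the held-out event appears in training, the score reflects performance on a previously unseen event, which is the stated aim of the experiments.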
EXPERIMENTAL STUDY
[Bar chart: Effect of bagging across the models and the feature groups. Vertical axis: 0 to 100. Groups: Baseline Features, Total Features. Series: Tweet-based model, Tweet-based model (bagging), User-based model, User-based model (bagging).]
Baseline Features: proposed in our previous work
Total Features: Baseline Features + newly proposed ones
EXPERIMENTAL STUDY
[Chart: Agreement levels and agreed accuracy across the trials. Vertical axis: 0 to 100. Horizontal axis: trials T1 to T18 and Average. Series: Agreement percentage, Agreed accuracy.]
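The two quantities plotted here can be computed per trial from the two models' predictions and the ground truth. The sketch below assumes that the agreement percentage is the share of test samples on which both models predict the same label, and that agreed accuracy is the accuracy restricted to those samples; the function name and toy data are illustrative.

```python
def agreement_stats(preds_a, preds_b, truth):
    """Return (agreement percentage, accuracy on agreed samples in percent)."""
    agreed = [i for i in range(len(truth)) if preds_a[i] == preds_b[i]]
    agreement_pct = 100.0 * len(agreed) / len(truth)
    if not agreed:
        # No agreed samples: agreed accuracy is undefined.
        return agreement_pct, None
    correct = sum(1 for i in agreed if preds_a[i] == truth[i])
    return agreement_pct, 100.0 * correct / len(agreed)


preds_a = ["fake", "fake", "real", "real", "fake"]
preds_b = ["fake", "real", "real", "real", "fake"]
truth = ["fake", "fake", "real", "fake", "fake"]
agreement_pct, agreed_accuracy = agreement_stats(preds_a, preds_b, truth)
```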
EXPERIMENTAL STUDY
[Charts: Agreement levels and agreed, disagreed, overall accuracy across the trials. Vertical axis: 50 to 100. One chart per trial (T1 to T18) and one chart of average values. Series: Agreed accuracy, Disagreed accuracy, Overall accuracy.]
EXPERIMENTAL STUDY
[Charts: Accuracy for most frequent languages (vertical axis: 0 to 100) and Samples distribution per language. Languages: English, Spanish, No language, Dutch, French.]
COMPARISON WITH OTHER METHODS
METHOD            F1-SCORE
MEDIAEVAL 2015
  UoS-ITI         0.830
  MCG-ICT         0.942
  CERTH-UNITN     0.911
MEDIAEVAL 2016
  Linkmedia       0.8246
  MMLAB@DISI      0.8283
  MCG-ICT         0.6761
  VMU             0.9116
Proposed          0.934
[Bar chart of the F1-scores above. Axis: 0 to 1.]
MCG-ICT (2015) method:
• Approach tailored to the given MediaEval dataset
• Preprocessing step that first groups tweets by their multimedia content
• Difficult to apply in realistic setting
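For reference, the F1-score used in this comparison is the harmonic mean of precision and recall. The sketch below treats "fake" as the positive class, which is an assumption here since the slide does not state it, and uses toy labels rather than any benchmark data.

```python
def f1_score(predicted, actual, positive="fake"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


predicted = ["fake", "fake", "real", "fake", "real"]
actual = ["fake", "real", "real", "fake", "fake"]
score = f1_score(predicted, actual)  # precision = recall = 2/3
```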
TWEET VERIFICATION ASSISTANT
ABOUT
Visualize the verification result
Present list of extracted features
and their values
Compare the values with those
from the verification corpus
HOW TO USE
Provide URL or tweet ID
Inspect the features and the
verification result (fake/real)
Find the Tweet Verification Assistant here: http://reveal-mklab.iti.gr/reveal/fake/
TWEET VERIFICATION ASSISTANT: EXAMPLE
CHALLENGES AND FUTURE WORK
CHALLENGES
Making the tool usable and easy to understand by non-computer scientists
• Interpretation of Machine Learning outputs is challenging
• Difficult to create an application that journalists could rely on and trust
FUTURE WORK
Test the Verification Assistant usefulness when used by journalists/news editors
Extend the framework to other social media
Leverage method output for other verification problems [1]
[1] Olga Papadopoulou, Markos Zampoglou, Symeon Papadopoulos and Yiannis Kompatsiaris. Web Video
Verification using Contextual Cues
Thank you!
Get in touch:
• Christina Boididou: christina.mpoid@gmail.com / @CMpoi
• Symeon Papadopoulos: papadop@iti.gr / @sympap
• Lazaros Apostolidis: laaposto@iti.gr
• Verification Corpus: https://github.com/MKLab-ITI/image-verification-corpus
• Tweet Verification Assistant: http://reveal-mklab.iti.gr/reveal/fake/

Editor's notes

  1. In recent years, we have seen a tremendous increase in the use of social media platforms as a means of sharing content. The simplicity of sharing has led to large volumes of news content reaching huge numbers of readers in a short time. Multimedia content in particular can easily go viral, as it is easily consumed and carries entertainment value. Given the speed of the news and the competition among journalists to publish first, verification of the content is neglected or carried out in a superficial manner. This leads to the online appearance of misleading multimedia content or, for the sake of brevity, fake content. For example, let’s look at this picture. Can you make a guess? Is it real or fake? Even though Sharuman could well attend this meeting, this image was ultimately found to be photoshopped.
  2. Now, let’s have a look at this image. What is your guess now? Here we deal with another type of fake photo. It is a real photo, but it was mislabeled on social media as showing the crowd at the Bataclan theatre just before gunmen started firing.
  3. So, we consider as misleading, or fake, any Twitter post that shares multimedia content that does not faithfully represent the event that it refers to. This could include reposting of real multimedia content, reposting of synthetic content or artworks, digital tampering (photoshop), or speculations.
  4. To deal with the verification problem, we present a robust approach for detecting in real time whether a tweet that shares a multimedia item is fake or real. The proposed framework relies on two independent classification models built on the training data (the verification corpus) using different sets of features: tweet-based and user-based. A bagging technique is used when building the models. We use n subsets of tweets, each including an equal number of samples per class, which leads to the creation of n classifiers. The final prediction is the majority vote among the n predictions. At prediction time, an agreement-based retraining technique is employed, which combines the outputs of the two models. The outcome is then visualized for the users, using information from the labelled verification corpus.
  5. The selection of our features was carried out following a thorough study of the way journalists verify content on the web. We have defined two sets of features: the tweet-based ones, extracted from the tweet itself, and the user-based ones, related to the user. The link-based features assess the trust of the linked website.
  6. A key novelty in our approach is the agreement-based retraining (ABR) technique (the fusion block). We combine the outputs as follows: for each sample, we compare the two predictions and, depending on whether they agree, we divide the test set into agreed and disagreed samples. The agreed samples are assigned the agreed label (fake or real), on the assumption that it is correct with high likelihood; these labels constitute the predictions for the agreed samples. Then, we use a retraining technique. First, we select the more effective of the two independent classifiers based on its cross-validation performance on the training set. Then we retrain it on the agreed samples together with the initial training samples of the verification corpus (VC) and use it to predict labels for the disagreed samples. The goal is to adapt the initial model to the characteristics of the new, unseen event.
  7. Our VC is a publicly available dataset of fake and real tweets. It consists of tweets related to 17 events, comprising in total 220 cases of fake and 193 cases of real images and videos, shared by 9,596 fake and 6,225 real tweets. The tweets were collected using a set of keywords, and the ground truth was established using reputable online resources that debunk images and videos. Only tweets containing one of these multimedia items were included in the dataset, and several manual steps were necessary to arrive at the final set.
  8. The aim of the conducted experiments was to evaluate the fake detection accuracy on samples from new events. We consider this a very important aspect of a verification framework, as the nature of fake tweets may vary across different events. The employed scheme can be thought of as an event-based cross-validation.
  9. We first assess the contribution of the features to the method’s accuracy. We compare the performance using the baseline and the full set of features. The baseline features are just a subset of the features that we used in our previous work. Then, we assess the bagging we applied in our method. We can see that the full set of features and the bagging, in both the tweet-based and the user-based model, led to considerably improved accuracy.
  10. In this graph, we present the agreement level and the accuracy of the classifiers on the agreed set. We note that the higher the agreement level, the higher the achieved accuracy. The last column shows the averages across the different trials.
  11. This bar chart shows the agreed accuracy, the disagreed accuracy, and the overall accuracy across the trials. On the right chart, we can see their average levels in green, orange, and grey respectively. The last columns, in blue, show the performance of each of the models when tested individually on the test set. Comparing these with the overall accuracy, one can see a clear improvement (about 5%).
  12. We also assessed the model on tweets written in different languages, looking at the five most used languages in the corpus. "No language" means that the language was not detected or that the tweet had too little text. Accuracy is stable regardless of the language.
  13. We also compare our model with the methods submitted to the MediaEval 2015 verification task, against their best run. Our proposed method achieves the second-best performance, almost equal to that of the best run.
  14. One of the biggest challenges we are facing is making the tool usable and easy to understand by non-computer scientists. Our experience with media experts from Deutsche Welle & AFP (Agence France-Presse) shows that the …