Social media mining for sensing and responding to
real-world trends and events
Dr. Yiannis Kompatsiaris, ikom@iti.gr
Multimedia, Knowledge and Social Media Analytics Lab, Head
CERTH-ITI
CLEF 2020
Thessaloniki, Greece
September 2020
CLEF 2020 Social Media Mining
Overview
• Introduction
– Motivation – Challenges
– Conceptual architecture
• Fighting Disinformation
– Tweet Credibility Classification
– Image Verification Assistant
• Crisis Management
– Location estimation and classification
• Fighting abuse
– Multiple identities detection
• Contributions – Support - Conclusions
Pope Francis
Pope Benedict
2007: iPhone release
2008: Android release
2010: iPad release
http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
Hillary Clinton's Epic Group Selfie
User
Profile
Tags
Social Media aspects
Multi-modal graphs
[Figure residue: "rise of the networks (hubs, communiti…, centrality, etc)" and "Intelligence Processi… Unit (IPU) – graphco…"]
Social Networks as Graphs
Social Networks as Real-Life Sensors
• Social networks are a data source with an
extremely dynamic nature that reflects
events and the evolution of community
focus (users' interests)
• The huge penetration of smartphones and
mobile devices provides real-time,
location-based user feedback
• Transform individually rare but
collectively frequent media to meaningful
topics, events, points of interest, emotional
states and social connections
• Present in an efficient way for a variety of
applications (news, security (cyber and
physical), marketing, science, health)
Real-life Social Networks
• Social networks have emergent
properties. Emergent properties
are new attributes of a whole
that arise from the interaction
and interconnection of the parts
• Emotions, Health, Sexual
relationships depend on our
connections (e.g. number of
them) and on our position -
structure in the social graph
• Central – Hub
• Outlier
• Transitivity (connections between
friends)
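The structural notions on this slide (central hub, outlier, transitivity) can be made concrete on a toy graph. The sketch below is only an illustration: the graph and its node names are made up, and degree centrality is used as one simple stand-in for "being central".

```python
from itertools import combinations

# Toy undirected friendship graph (hypothetical data): node -> set of neighbours.
graph = {
    "ann": {"bob", "cat", "dan", "eve"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
    "eve": {"ann"},
    "zed": set(),  # an outlier with no connections
}


def degree_centrality(g):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}


def transitivity(g):
    """Share of connected triples (a - node - b) whose ends a, b are also friends."""
    closed = 0
    triples = 0
    for nbrs in g.values():
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in g[a]:
                closed += 1
    return closed / triples if triples else 0.0


centrality = degree_centrality(graph)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub], transitivity(graph))
```

Here "ann" plays the hub role, "zed" the outlier, and the transitivity value reflects that only one of ann's friend pairs (bob and cat) know each other.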
Example – twitter and earthquakes
Conceptual Architecture
[Diagram] Layers: ANALYSIS, PRESENTATION
• CRAWLING: API Wrapper, Website Wrapper, Scheduler
• INDEXING: Visual Indexing, Near-duplicates, Text Indexing
• MINING: SNA, Sentiment - Influence, Trends - Topics
• RANKING: Relevance, Diversity, Popularity
• VISUALIZATION: Interaction, Responsiveness, Aggregation
• Other components shown: Media Fetcher, Model Building, Concepts, Veracity, Crawling Specs, Sources, Aesthetics
Challenges – Content (Indexing - Mining)
•Multi-modality: e.g. image + tags, video, audio
•Rich social context: spatio-temporal, social connections,
relations and social graph
•Specific messages: short, conversations, errors, no context
•Inconsistent quality: noise, spam, fake, propaganda
•Huge volume: Massively produced and disseminated
•Multi-source: may be generated by different applications and
user communities
•Dynamic: Fast updates, real-time
Policy – Licensing – Legal challenges
• Fragmented access to data
– Separate wrappers/APIs for each source (Twitter, Facebook, etc.)
– Different data collection/crawling policies
• Limitations imposed by API providers (“Walled Gardens”)
• Full access to data impossible or extremely expensive (e.g. see data
licensing plans for GNIP and DataSift)
• Non-transparent data access practices (e.g. access is provided to an
organization/person if they have a contact in Twitter)
• Constant change of model and ToS of social APIs
– No backwards compatibility, additional development costs
• Ephemeral nature of content
• Social search results often lead to removed content → inconsistent
and unreliable referencing
• User Privacy & Purpose of use
• Fuzzy regulatory framework regarding mining user-contributed data
Fighting Disinformation in Social
Media
The Rise of Fake News
https://trends.google.com/trends/explore?date=all&geo=US&q=fake%20news
US Elections 2016
Volume for query “fake news” over time: A key milestone has
been the US Elections in 2016, which marked the beginning of
large-scale coordinated disinformation campaigns.
Key Concepts
• Fake news: popular term for the phenomenon of
disinformation, but currently avoided by academics and the
EC because it is often misused by Trump and the
alt-right
• Disinformation: general term that typically refers to
intentional (and often coordinated) efforts to spread
misleading information to the public
• Misinformation: refers to misleading content and information
but not necessarily intentional
• Propaganda: refers to coordinated campaigns aiming to
spread a particular ideology or belief
• Manipulated content: Also known as tampered or doctored.
Refers to multimedia content that has been digitally altered
typically for malicious purposes.
The Diffusion of Fake News
Example cascade
Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news
online. Science, 359(6380), 1146-1151.
Number of cascades
Topic frequency
Misleading posts tend to
spread faster and wider
compared to accurate ones.
The Famous Shark
https://www.snopes.com/photos/animals/puertorico.asp
2005
A bit of “historical” background
EU funded projects
• 2011-2014: Trend detection; Social media search
• 2013-2016: Quality & veracity of social media; Media forensics
• 2016-2018: Social media video verification; Reverse video search
• 2018-2021: Deepfake detection; Deep learning-assisted forensics and analysis
Overview of Media Verification Resources
Tools/Approaches
• Social media verification
– Tweet Credibility Classification
– Context Analysis and Aggregation
• Multimedia forensics
– Image Verification Assistant
– Video forensics
• Reverse-image and video search
Datasets
• Tweet verification Corpus
• Fake Video Corpus
• FIVR-200K
Tweet Credibility Classification - Features
Credibility cues (aka features)
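To give a feel for what a "credibility cue" can look like, here is a minimal sketch that turns the text of one tweet into a few numeric features. The specific cues chosen here are illustrative assumptions, not necessarily the feature set of the system presented in the talk, and the function name is hypothetical.

```python
import re


def credibility_cues(text):
    """Return a few simple text-based cues for one tweet (illustrative only)."""
    words = text.split()
    return {
        "num_words": len(words),
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        "num_uppercase_chars": sum(ch.isupper() for ch in text),
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "has_url": bool(re.search(r"https?://\S+", text)),
    }


cues = credibility_cues("SHARK on the highway!!! #flood http://example.com")
print(cues)
```

A classifier would consume such a feature dictionary (usually together with user-level cues) rather than the raw text.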
Tweet Credibility Classification - Model
Tweet Credibility Classification - Evaluation
92.5% accuracy in identifying misleading posts
88-98% accuracy depending on language
(major languages tested: en, fr, es, nl)
New features and agreement-based retraining led to
significant improvements! One of the top performing
methods in the MediaEval VMU 2015 & 2016 tasks!
Boididou, C., Papadopoulos, S., Zampoglou, M., Apostolidis, L., Papadopoulou, O.,
& Kompatsiaris, Y. (2018). Detection and visualization of misleading content on
Twitter. International Journal of Multimedia Information Retrieval, 7(1), 71-86.
Image Verification Assistant – Intro (1/2)
Types of Multimedia Manipulation: copy-move, splicing, in-painting, retouching
Image Verification Assistant – Intro (2/2)
Image Capturing & Tampering Process
Real-world scene → DIGITAL CAMERA [Lens → Optical filter → CFA pattern (R, G, G, B) → Imaging sensor (e.g. CCD) → CFA interpolation → In-camera SW processing → In-camera JPEG compression] → Digital image → Out of camera SW processing
Piva, A. (2013). An overview on image forensics. ISRN Signal Processing, 2013.
Image Verification Assistant – Forensics
Assume that when a “foreign” object is inserted into an
image, some traces of it will be possible to detect.
• Noise-based methods try to locate areas where the
noise patterns are different compared to the rest.
• JPEG compression analysis methods try to locate
areas where some JPEG-specific property is different,
e.g. 8x8 grid, DCT quantization, etc.
• Machine learning-based methods try to locate areas
that look like areas of tampered images that were
used to “train” them.
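The noise-based idea in the first bullet can be sketched in a few lines. This is a deliberately crude illustration under stated assumptions: the image is a plain list of grayscale rows, the "noise" is approximated by the difference between horizontally adjacent pixels (real methods use proper denoising or wavelet filters), and the block size and threshold factor are arbitrary.

```python
from statistics import median, pvariance


def block_noise_map(image, block=4):
    """Variance of a simple high-pass residual for each block x block tile.

    `image` is a list of rows of grayscale values. The residual is the
    difference between a pixel and its left neighbour, a crude stand-in
    for the noise estimate used by real noise-based methods.
    """
    h, w = len(image), len(image[0])
    residual = [[row[x] - row[x - 1] for x in range(1, w)] for row in image]
    noise = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, (w - 1) - block + 1, block):
            tile = [residual[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block)]
            noise[(by, bx)] = pvariance(tile)
    return noise


def suspicious_blocks(noise, factor=5.0):
    """Blocks whose noise variance is far above the image-wide median."""
    med = median(noise.values())
    return [pos for pos, v in noise.items() if v > factor * max(med, 1e-9)]
```

On a flat synthetic image with one small textured patch pasted in, only the block containing the patch is flagged; on real photographs the map is far noisier, which is part of why such traces are hard to detect on web images.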
MeVer – Media Verification (mever.iti.gr)
Image Verification Assistant – Forensics
Zampoglou, M., Papadopoulos, S., & Kompatsiaris, Y. (2015). Detecting image splicing in the wild (web).
In International Conference on Multimedia & Expo Workshops (ICMEW), 2015 (pp. 1-6). IEEE
The Challenge of Image
Forensics on the (Wild) Web!
Image Verification Assistant - UI
http://reveal-mklab.iti.gr/
Zampoglou, M., Papadopoulos, S., Kompatsiaris, Y., Bouwmeester, R., & Spangenberg, J. (2016, April). Web
and Social Media Image Forensics for News Professionals. In SMN@ ICWSM.
Image Verification Assistant - Comparison
FotoForensics1 Forensically2 Ghiro3 Ours
ELA X X X X
Ghost X
DW Noise X
Median Noise X X
Block Artifact X
Double Quantization X
Deep Learning-based X
Copy-move X* X
Thumbnail X X
Metadata X X X X
Geotagging X X X X
Reverse search X
*Forensically implements a very simple block-matching algorithm with low robustness
1 http://fotoforensics.com
2 http://29a.ch/photo-forensics/
3 http://www.imageforensic.org/
The Fake Video Corpus - Overview
• 200 fake and 188 real newsworthy videos
• 2206 fake and 1209 real near-duplicates
• 388 cascades of near-duplicate videos
https://mklab.iti.gr/results/fake-video-corpus/
The Fake Video Corpus - Analysis
• Fake videos keep
reappearing years
later
• Real videos tend
to be reproduced
mostly during the
first month
Papadopoulou, O., Zampoglou, M., Papadopoulos, S., & Kompatsiaris, I. (2019). A corpus
of debunked and verified user-generated videos. Online information review.
The Rise of Deepfakes
• Synthetic media are becoming increasingly realistic, mainly
through the use of Generative Adversarial Networks
• We seem to be entering an arms race on disinformation!
• Novel solutions beyond supervised learning models
will be needed!
Social Data mining in Crisis
Management
Approach
• Thousands of tweets are generated during a crisis
event in a specific location
256% rise in Italian tweets about floods
on Thu, 01 November 2018 16:12 in Veneto
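An alert such as "256% rise" can be read as the relative increase of the current tweet count over some baseline. The sketch below shows that arithmetic; the choice of baseline (the mean of earlier time windows) and the alert threshold are assumptions made for illustration, since the slide does not state how the figure was computed.

```python
def percent_rise(current_count, baseline_counts):
    """Percentage increase of the current window over the baseline mean."""
    baseline = sum(baseline_counts) / len(baseline_counts)
    if baseline == 0:
        return float("inf") if current_count > 0 else 0.0
    return 100.0 * (current_count - baseline) / baseline


def is_burst(current_count, baseline_counts, threshold_pct=200.0):
    """Flag a window whose rise meets a (hypothetical) alert threshold."""
    return percent_rise(current_count, baseline_counts) >= threshold_pct


# 356 tweets in the current window against a baseline of 100 per window
print(percent_rise(356, [100, 100, 100]))  # 256.0
```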
Posts during emergencies
Problem – Challenges – Existing Limitations
• Civil protection agencies and local authorities require
timely access to citizen observations during a crisis event
to estimate the
– Location of a crisis event (e.g. floods, fires, etc.)
– Relevance of each tweet
– Concepts of the image (e.g. people in danger)
• Challenges and existing limitations include:
– Management of large streams of data for event
detection
– Disambiguation from multimodal content (text/image)
– Limited location information (only as mention in text)
Social Media Data Mining
• Focusing on Twitter posts, collected with Twitter Streaming API
https://developer.twitter.com/en/docs/tweets/filter-realtime/overview
• Various analysis techniques to obtain further knowledge on the tweets
• The complete flow:
Inputs:
– Search terms: Keywords, Accounts, Bounding Boxes
– Keys & Tokens
Flow:
– Twitter Streaming API → Client receives tweets → new tweet
– Get tweet in JSON format & find matching use case
– Fake tweets detection, Text classification, Tweets localisation
– If the tweet has an image: Image classification, Nudity detection, Concept extraction
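The "find matching use case" step can be sketched as a simple filter over the tweet JSON. This is an assumption-laden illustration: the use-case specification format and function names are hypothetical, and the tweet fields used (`text`, `user.screen_name`, and a GeoJSON-style `coordinates` point in longitude, latitude order) follow the classic tweet object layout and may differ across API versions.

```python
def in_bounding_box(lon, lat, box):
    """box = (west_lon, south_lat, east_lon, north_lat)."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def matching_use_cases(tweet, use_cases):
    """Names of the use cases whose keywords, accounts or boxes match."""
    text = tweet.get("text", "").lower()
    user = tweet.get("user", {}).get("screen_name", "").lower()
    coords = (tweet.get("coordinates") or {}).get("coordinates")  # [lon, lat]
    matched = []
    for name, spec in use_cases.items():
        by_keyword = any(k.lower() in text for k in spec.get("keywords", []))
        by_account = user in {a.lower() for a in spec.get("accounts", [])}
        by_box = coords is not None and any(
            in_bounding_box(coords[0], coords[1], b)
            for b in spec.get("boxes", []))
        if by_keyword or by_account or by_box:
            matched.append(name)
    return matched


# Hypothetical use-case definitions
use_cases = {
    "floods_italy": {"keywords": ["alluvione"],
                     "boxes": [(10.6, 44.8, 13.1, 46.7)]},
    "fires_spain": {"keywords": ["incendio"]},
}
```

Each matched tweet would then be passed on to the analysis modules listed in the flow.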
Datasets
• Benchmark datasets (e.g. MediaEval tasks)
• Collected datasets about crisis events
– 10 m. about fires in Spain
– 75 k. about floods in Italy
– 74 k. about heatwave in Greece
– 42 k. about snow in Finland
Estimation of the location mentioned in a tweet
• Localisation steps after Named Entity Recognition (NER) has been performed on available tweets
• Results of the NER task for English

Dataset (CoNLL2003)                                    Precision        Recall           F1-score
Our system (ELMo embeddings)                           91.63            93.01            92.32
Best-scoring CoNLL2003 system: Florian et al., 2003    88.99            88.54            88.76
Baevski, A. et al. 2019                                (not reported)   (not reported)   93.5

• Results of the NER task for Italian

Dataset (EVALITA2009)                                  Precision        Recall           F1-score
Our system (GloVe embeddings)                          75.49            75.60            75.37
Best-scoring shared task system: FBK_ZanoliPianta      84.07            80.02            82.00
Nguyen and Moschitti, 2012                             85.99            82.73            84.33
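For readers less familiar with the metrics reported for the NER task, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall values; the function name is ours, and small deviations from the reported figures are expected because the inputs are already rounded (or because a score was averaged differently).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# English (CoNLL2003) rows: precision, recall, reported F1
for p, r, reported in [(91.63, 93.01, 92.32), (88.99, 88.54, 88.76)]:
    print(round(f1_score(p, r), 2), reported)
```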
Concept Detection in Social Media Images
• Extracts high-level concepts from visual low-level information
• Fine-tunes a pre-trained 22-layer GoogLeNet DCNN to recognize the 345
TRECVID INS concepts, then applies thresholding to keep the concepts with higher probability
• Concept examples: animal, boat_ship, clouds, waterscape_waterfront
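The thresholding step can be illustrated independently of the network itself. In the sketch below the per-concept probabilities are made-up numbers standing in for the DCNN output, and the threshold value is an assumption; only the concept names come from the slide.

```python
def top_concepts(probabilities, threshold=0.5):
    """Keep concepts whose probability reaches the threshold, best first."""
    kept = [(c, p) for c, p in probabilities.items() if p >= threshold]
    return sorted(kept, key=lambda cp: cp[1], reverse=True)


# Hypothetical network output for one image
scores = {
    "animal": 0.05,
    "boat_ship": 0.81,
    "clouds": 0.40,
    "waterscape_waterfront": 0.93,
}
print(top_concepts(scores))
```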
CERTH-ITI participation in MediaEval 2018
First in the social media image classification (Average F1-score)
https://www.youtube.com/watch?v=yq1nIPc6dWw&list=PLOPRp1vNOG9ahE5viJmF6Gx8XDk8hG9MP&index=2&t=0s
Demo
• Social media dashboard in EOPEN project:
– https://eopen.spaceapplications.com/dashboard/
– Dashboards → Social Media
Fighting abuse, extremism, and
terrorism in Social Media
Multiple identities detection in social media:
sockpuppets, doppelgängers, and more
• Users often hold several accounts in their effort to multiply the
spread of their thoughts, ideas, and viewpoints
• Illegal & abusive activities: creation of multiple accounts to bypass
the combating measures enforced by social media platforms
Figure: Kumar et al. “An Army of Me: Sockpuppets in Online Discussion Communities” WWW 2017
User Identity Linkage
Detect accounts likely to belong to
the same natural person
(“linked accounts”)
Approach
Data Collection → Feature extraction → User Modeling → Classification → Linked accounts detection
• Feature extraction: Profile (P), Activity (A), Linguistic (L), Network (N)
• User Modeling: Individual representation, Joint representation
• Classification: Probabilistic, Tree-based, Ensemble, Neural networks
Feature Extraction
• Profile: e.g., demographic information, biography, avatar
• Activity: e.g., number of posts, lists, shares, favorited tweets, mentions,
hashtags, posts’ inter-arrival time
• Linguistic: i.e., character-based, word-based, sentence-based, dictionary-
based, syntactic-based
• Network: e.g., # followers, # friends, authority, hub, # triangles, eigenvector,
PageRank, clustering coefficient
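As a rough illustration of the Activity and Linguistic categories, the sketch below derives a handful of features from one account's posts. The input format (timestamps in seconds, posts as strings) and the particular features are simplifying assumptions; the full feature set described above is much richer.

```python
from statistics import mean


def activity_features(timestamps):
    """Posting volume and mean inter-arrival time (timestamps in seconds)."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return {
        "num_posts": len(ts),
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
    }


def linguistic_features(posts):
    """Simple character-based and word-based statistics over all posts."""
    words = [w for post in posts for w in post.split()]
    return {
        "num_chars": sum(len(post) for post in posts),
        "num_words": len(words),
        "mean_word_length": mean(len(w) for w in words) if words else 0.0,
    }


print(activity_features([0, 60, 180]))
print(linguistic_features(["hello world", "hi"]))
```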
User Modeling: Individual representation
• $u_i$: $V_S^{u_i} = \langle f_S^{i1}, f_S^{i2}, \dots, f_S^{ij}, \dots, f_S^{in} \rangle$
– Feature sets: $S = \{P, A, L, N\}$
– $f_S^{ij}$: $j$-th feature of category $S$ for $u_i$
– $n$: total number of features for category $S$
• Example: $V_N^{u_i} = \langle authority_i, hub_i, \dots, PageRank_i \rangle$
User Modeling: Joint representation
1. abs: absolute difference of feature vectors of 𝑢𝑖, 𝑢𝑗
2. sim: similarity of the per-category feature vector (Cosine similarity,
Euclidean distance, Manhattan distance)
3. Similarity of the content posted by users 𝑢𝑖, 𝑢𝑗
• edits: edit distance - Levenshtein distance
• sem: semantic similarity - vector space model approach (word
embeddings)
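The joint-representation options listed above are standard vector and string measures. A minimal sketch of them follows, assuming each user is already represented by a numeric feature vector of equal length; the semantic (word-embedding) similarity is left out because it needs a pre-trained embedding model.

```python
import math


def abs_difference(u, v):
    """abs: element-wise absolute difference of two feature vectors."""
    return [abs(a - b) for a, b in zip(u, v)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def levenshtein(a, b):
    """edits: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]
```

The pair (u_i, u_j) is then described by concatenating whichever of these values the experiment calls for, and that concatenated vector is what the classifier sees.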
Classification
• Probabilistic: Naïve Bayes, BayesNet
• Tree-based: J48, LADTree, LMT
• Ensemble: Random Forest (RF), AdaBoost and voting
ensembles
• Deep Neural Network
• Recurrent Neural Network (RNN)
• Combined Network: Text classification network +
Metadata network
Comparison to other approaches
[1] Fredrik Johansson, Lisa Kaati, and Amendra Shrestha (2013). Detecting multiple aliases in social media. In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.
[2] Thamar Solorio, Ragib Hasan, and Mainul Mizan (2013). A case study of sockpuppet detection in Wikipedia. In Proceedings of the Workshop on Language Analysis in Social Media. ACL.
[3] Michail Tsikerdekis and Sherali Zeadally (2014). Multiple account identity deception detection in social media using nonverbal behavior. IEEE Transactions on Information Forensics and Security 9, 8.
[4] Fredrik Johansson, Lisa Kaati, and Amendra Shrestha (2015). Timeprints for identifying social media users with multiple aliases. Security Informatics 4, 1.
[5] Srijan Kumar, Justin Cheng, Jure Leskovec, and VS Subrahmanian (2017). An army of me: Sockpuppets in online discussion communities. In Proceedings of the 26th International Conference on World Wide Web.
[6] Jan Pennekamp, Martin Henze, Oliver Hohlfeld, and Andriy Panchenko (2019). Hi Doppelgänger: Towards Detecting Manipulation in News Comments. In Companion Proceedings of The 2019 World Wide Web Conference.
[7] Despoina Chatzakou, Juan Soler-Company, Theodora Tsikrika, Leo Wanner, Stefanos Vrochidis, and Ioannis Kompatsiaris (2020). User Identity Linkage in Social Media Using Linguistic and Social Interaction Features. In Proceedings of the 2020 ACM on Web Science Conference.
Features Classifier
Activity Linguistic Network Traditional
ML
NN
Character Word Sentence Dictionary Syntactic Distribution Segmentation Connection
Johansson et al. [1] X X X X
Solorio et al. [2] X X X X X
Tsikerdekis et al. [3] X X
Johansson et al. [4] X X X X X
Kumar et al. [5] X X X X X X X X X
Pennekamp et al. [6] X X X X X
Ours [7] X X X X X X X X X X X
Datasets and Ground Truth
The ground truth was created manually, since no existing ground truth
indicates which user accounts belong to the same person
• Split each account 𝑢𝑖 (its posts) into two distinct accounts: 𝑢𝑖𝑎 and 𝑢𝑖𝑏
• linked accounts: (𝑢𝑖𝑎, 𝑢𝑖𝑏)
• non-linked accounts: (𝑢𝑖𝑎, 𝑢𝑗𝑏), where 𝑖 ≠ 𝑗
• 10% linked and 90% non-linked accounts
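The construction of linked and non-linked pairs can be sketched as follows. How the posts of an account were actually divided is not stated on the slide, so the alternating split used here is an assumption, as are the function names and the random sampling of pairs.

```python
import random


def split_account(posts):
    """Alternate an account's posts between two pseudo-accounts a and b."""
    return posts[0::2], posts[1::2]


def build_pairs(accounts, n_linked, n_non_linked, seed=0):
    """Return (linked, non_linked) lists of ((user, 'a'), (user, 'b')) pairs."""
    users = sorted(accounts)
    if n_non_linked > len(users) * (len(users) - 1):
        raise ValueError("not enough distinct user pairs")
    rng = random.Random(seed)
    # Linked: both halves come from the same original account.
    linked = [((u, "a"), (u, "b")) for u in rng.sample(users, n_linked)]
    # Non-linked: halves come from two different original accounts.
    non_linked = set()
    while len(non_linked) < n_non_linked:
        u, v = rng.sample(users, 2)
        non_linked.add(((u, "a"), (v, "b")))
    return linked, sorted(non_linked)


# Hypothetical toy data: 20 accounts with 4 posts each
accounts = {f"u{i}": [f"post {i}-{k}" for k in range(4)] for i in range(20)}
linked, non_linked = build_pairs(accounts, n_linked=2, n_non_linked=18)
print(len(linked), len(non_linked))  # 2 18, i.e. a 10% / 90% split
```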
Abusive Dataset
• June to August 2016
• Relevant to Gamergate
controversy
• Abusive-related English
hashtags
• 650K tweets and 312K users
Terrorism Dataset
• February 2017 to June 2018
• Relevant to Jihadist terrorism
• Terrorism-related Arabic
keywords
• 65K tweets and 35K users
Experimental Methodology: Features Combination
Baseline:
– Activity_abs
– Linguistic_abs
– Network_abs
– All_abs
Combinations:
– Activity_abs + edits + sem
– Linguistic_abs + edits + sem
– Network_abs + edits + sem
– All_abs + edits + sem
– All_sim
– All_sim + All_abs
– All_sim + All_abs + edits + sem
Legend:
– abs: absolute difference
– sim: similarity of feature vectors
– edits: edit distance (Levenshtein)
– sem: semantic similarity
Experimental Phases
• Phase 1: 10% linked & 90% non-linked
• # linked accounts: 200
• # non-linked accounts: 1,800
• Phase 2: Varying number of linked accounts
• # linked accounts: 200 to 500 with step 100
• # non-linked accounts: 1,800
• Phase 3: Varying number of non-linked accounts
• # linked accounts: 200
• # non-linked accounts: 1,800 to 39,800 with step 1,800
Results: Abusive dataset (Phase 1)
Features
• Traditional classifiers: Network features perform better (whether or not combined with the edit & sim
features)
• Neural Network: Linguistic features result in better performance (whether or not combined with the edit &
sim features)
Classifiers
• Random Forest achieves the best performance (AUC: 99.50%)
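Because the data are imbalanced (10% linked, 90% non-linked), plain accuracy can look good even for a trivial classifier, which is one reason AUC is a useful headline metric. The sketch below computes ROC AUC directly from its pairwise definition; it is a generic illustration with made-up scores, not the evaluation code behind the reported numbers.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Labels are 1 (linked) or 0 (non-linked).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for two linked and two non-linked pairs
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

A classifier that assigns the same score to every pair gets an AUC of 0.5 here, even though always answering "non-linked" would be 90% accurate on a 10/90 split.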
Results: Abusive dataset (Phases 2 & 3)
Varied linked accounts
Varied non-linked accounts
• From 200 to 300: slight increase (precision, recall, accuracy)
• Stable performance: 99% AUC
• Even with the highest number of non-linked user accounts,
AUC remains at around 87.30%
• Increase of precision & recall when more data are available
• At ~24k non-linked accounts, precision & recall converge
Results obtained by using Random Forest as classifier
Results: Terrorism dataset (Phase 1)
Features
• J48: Network features perform better than the Activity and Linguistic features
• Random Forest, BayesNet, Neural Network: Linguistic features result in better performance than the
Activity and Network features
• In most cases, all feature categories (using abs) combined with the similarity feature vectors give the best
performance
Classifiers
• Random Forest achieves the best performance (AUC: 99.50%)
Results: Terrorism dataset (Phases 2 & 3)
• Higher number of linked user accounts
=> higher precision, recall & accuracy
• Stable performance: 99% AUC
• AUC fluctuates from 94% to 99.50%
• Precision & recall fluctuate from 97.1% to 99%
• Stable model even with a quite unbalanced dataset
Varied linked accounts
Varied non-linked accounts
Results obtained by using Random Forest as classifier
Conclusions
• Social media data useful in many applications
– From confirming existing and known correlations to prediction and decision-
making
• Many challenges exist
– Data availability and representativeness (of society, real-event)
– Coverage, robustness and reproducibility
– Authenticity (threat to democratic society)
– Real-time and scalable approaches
– Fusion of various modalities (Content, social, temporal, location)
• Required contribution from various disciplines
– Content Analytics
– Machine Learning
– Network Analysis
– Psychology – Social Sciences (patterns of presentation, sharing)
– Visualization
• Currently mostly an auxiliary means for real-events assessment and decision-
making, which can generate additional insights
With Contributions from
• Dr. Symeon Papadopoulos
– Social network analysis, social media content mining and multimedia indexing and
retrieval
– http://mklab.iti.gr/people/papadop
– Twitter: @sympap
• Dr. Ilias Gialampoukidis
– Social media mining and classification, topic detection, community and key-player
identification, multimodal fusion and multimedia retrieval
– http://www.researchgate.net/profile/Ilias_Gialampoukidis
• Dr. Theodora Tsikrika
– Web and social media search and mining, multimedia indexing and retrieval, AI-
based multimodal analytics, evaluation
– https://www.iti.gr/iti/people/Theodora_Tsikrika.html
• Dr. Stefanos Vrochidis
– Multimodal data fusion, web and social media mining, multimedia analysis and
retrieval, multimodal analytics
– https://sites.google.com/site/stevrochidis/
Support
Tools and services for Social
Media verification from a
journalistic and enterprise
perspective.
Video verification platform
including video forensics, reverse
video search and context
analysis and aggregation.
Social media verification
platform including deepfake
detection and a database of
known fakes.
InterCONnected NEXt-
Generation Immersive IoT
Platform of Crime and
Terrorism DetectiON,
PredictiON, InvestigatiON, and
PreventiON Services
EU funded projects
opEn interOperable
Platform for unified access
and analysis of Earth
observatioN data
Enhancing decision
support and
management services
in extreme weather
climate events
Thank you for your attention!
ikom@iti.gr
http://mklab.iti.gr

More Related Content

What's hot

Big Tech & Disinformation: What are the main threats and how can journalists ...
Big Tech & Disinformation: What are the main threats and how can journalists ...Big Tech & Disinformation: What are the main threats and how can journalists ...
Big Tech & Disinformation: What are the main threats and how can journalists ...Scott A. Hale
 
Making the invisible visible. Managing the digital footprint of development p...
Making the invisible visible. Managing the digital footprint of development p...Making the invisible visible. Managing the digital footprint of development p...
Making the invisible visible. Managing the digital footprint of development p...UNDP Eurasia
 
B Rampf Session 3
B Rampf Session 3B Rampf Session 3
B Rampf Session 3guesta3ce6f
 
Social media Marketing Presentation by vaibhavjain
Social media Marketing Presentation by vaibhavjainSocial media Marketing Presentation by vaibhavjain
Social media Marketing Presentation by vaibhavjainVaibhav Jain
 
Social Media and the News: Approaches to the Spread of (Mis)information
Social Media and the News: Approaches to the Spread of (Mis)informationSocial Media and the News: Approaches to the Spread of (Mis)information
Social Media and the News: Approaches to the Spread of (Mis)informationAxel Bruns
 
Social media, journalism & climate change in Africa: presentation
Social media, journalism & climate change in Africa: presentationSocial media, journalism & climate change in Africa: presentation
Social media, journalism & climate change in Africa: presentationAgnes Lesage-Possolo
 
A multifaceted study of online news diversity: issues and methods
A multifaceted study of online news diversity: issues and methodsA multifaceted study of online news diversity: issues and methods
A multifaceted study of online news diversity: issues and methodssmyrnaios
 
The evolution of research on social media
The evolution of research on social mediaThe evolution of research on social media
The evolution of research on social mediaFarida Vis
 
Existence of Social Media in Pandemic Boon or Bane
Existence of Social Media in Pandemic Boon or BaneExistence of Social Media in Pandemic Boon or Bane
Existence of Social Media in Pandemic Boon or Baneijtsrd
 
Social media, Group 1, Chapter 2
Social media, Group 1, Chapter 2Social media, Group 1, Chapter 2
Social media, Group 1, Chapter 2adrianaemoran
 
What is social media
What is social mediaWhat is social media
What is social mediaAili Kerova
 
Twitter turns ten: its use to date in disaster management
Twitter turns ten: its use to date in disaster managementTwitter turns ten: its use to date in disaster management
Twitter turns ten: its use to date in disaster managementNeil Dufty
 
Chapter 3 presentation
Chapter 3 presentation Chapter 3 presentation
Chapter 3 presentation voelkeld
 
Template Twitter Strategy for Government Departments
Template Twitter Strategy for Government DepartmentsTemplate Twitter Strategy for Government Departments
Template Twitter Strategy for Government DepartmentsBreaking news
 
New Media And Usaf
New Media And UsafNew Media And Usaf
New Media And Usafjtmcdc
 
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...Liliana Bounegru
 
Social media for government
Social media for governmentSocial media for government
Social media for governmentGohar Khan
 
The spread of misinformation in social media
The spread of misinformation in social mediaThe spread of misinformation in social media
The spread of misinformation in social mediaFilippo Menczer
 

What's hot (20)

Big Tech & Disinformation: What are the main threats and how can journalists ...
Big Tech & Disinformation: What are the main threats and how can journalists ...Big Tech & Disinformation: What are the main threats and how can journalists ...
Big Tech & Disinformation: What are the main threats and how can journalists ...
 
Making the invisible visible. Managing the digital footprint of development p...
Making the invisible visible. Managing the digital footprint of development p...Making the invisible visible. Managing the digital footprint of development p...
Making the invisible visible. Managing the digital footprint of development p...
 
B Rampf Session 3
B Rampf Session 3B Rampf Session 3
B Rampf Session 3
 
Processing Large Complex Data
Processing Large Complex DataProcessing Large Complex Data
Processing Large Complex Data
 
Social media Marketing Presentation by vaibhavjain
Social media Marketing Presentation by vaibhavjainSocial media Marketing Presentation by vaibhavjain
Social media Marketing Presentation by vaibhavjain
 
Social Media and the News: Approaches to the Spread of (Mis)information
Social Media and the News: Approaches to the Spread of (Mis)informationSocial Media and the News: Approaches to the Spread of (Mis)information
Social Media and the News: Approaches to the Spread of (Mis)information
 
Social media, journalism & climate change in Africa: presentation
Social media, journalism & climate change in Africa: presentationSocial media, journalism & climate change in Africa: presentation
Social media, journalism & climate change in Africa: presentation
 
A multifaceted study of online news diversity: issues and methods
A multifaceted study of online news diversity: issues and methodsA multifaceted study of online news diversity: issues and methods
A multifaceted study of online news diversity: issues and methods
 
The evolution of research on social media
The evolution of research on social mediaThe evolution of research on social media
The evolution of research on social media
 
Existence of Social Media in Pandemic Boon or Bane
Existence of Social Media in Pandemic Boon or BaneExistence of Social Media in Pandemic Boon or Bane
Existence of Social Media in Pandemic Boon or Bane
 
Social media, Group 1, Chapter 2
Social media, Group 1, Chapter 2Social media, Group 1, Chapter 2
Social media, Group 1, Chapter 2
 
What is social media
What is social mediaWhat is social media
What is social media
 
Twitter turns ten: its use to date in disaster management
Twitter turns ten: its use to date in disaster managementTwitter turns ten: its use to date in disaster management
Twitter turns ten: its use to date in disaster management
 
Chapter 3 presentation
Chapter 3 presentation Chapter 3 presentation
Chapter 3 presentation
 
Journalism 2.0
Journalism 2.0Journalism 2.0
Journalism 2.0
 
Template Twitter Strategy for Government Departments
Template Twitter Strategy for Government DepartmentsTemplate Twitter Strategy for Government Departments
Template Twitter Strategy for Government Departments
 
New Media And Usaf
New Media And UsafNew Media And Usaf
New Media And Usaf
 
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...
Fake News, Algorithmic Accountability and the Role of Data Journalism in the ...
 
Social media for government
Social media for governmentSocial media for government
Social media for government
 
The spread of misinformation in social media
The spread of misinformation in social mediaThe spread of misinformation in social media
The spread of misinformation in social media
 

Social media mining for sensing and responding to real-world trends and events

  • 1. Social media mining for sensing and responding to real-world trends and events. Dr. Yiannis Kompatsiaris (ikom@iti.gr), Head of the Multimedia, Knowledge and Social Media Analytics Lab, CERTH-ITI. CLEF 2020, Thessaloniki, Greece, September 2020
  • 2. CLEF 2020 Social Media Mining: Overview
      • Introduction
          – Motivation, challenges
          – Conceptual architecture
      • Fighting Disinformation
          – Tweet Credibility Classification
          – Image Verification Assistant
      • Crisis Management
          – Location estimation and classification
      • Fighting abuse
          – Multiple identities detection
      • Contributions, support, conclusions
  • 3. Pope Francis, Pope Benedict. 2007: iPhone release; 2008: Android release; 2010: iPad release. http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
  • 4. Hillary Clinton's Epic Group Selfie
  • 5. Social Media aspects: User, Profile, Tags
  • 6. Multi-modal graphs
  • 7. rise of the networks (hubs, communiti centrality, etc) Intelligence Processi Unit (IPU) – graphco
  • 8. Social Networks as Graphs
  • 9. Social Networks as Real-Life Sensors
      • Social networks are a data source with an extremely dynamic nature that reflects events and the evolution of community focus (users' interests)
      • The huge penetration of smartphones and mobile devices provides real-time and location-based user feedback
      • Transform individually rare but collectively frequent media into meaningful topics, events, points of interest, emotional states and social connections
      • Present the results efficiently for a variety of applications (news, cyber and physical security, marketing, science, health)
  • 10. Real-life Social Networks
      • Social networks have emergent properties. Emergent properties are new attributes of a whole that arise from the interaction and interconnection of the parts
      • Emotions, health and sexual relationships depend on our connections (e.g. the number of them) and on our position and structure in the social graph
          – Central (hub)
          – Outlier
          – Transitivity (connections between friends)
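The position measures named on this slide (hub, outlier, transitivity) can be illustrated with a minimal sketch. The toy friendship graph and the function names below are hypothetical and not taken from the talk. Degree centrality stands in for "central (hub)", and the local clustering coefficient stands in for "transitivity".

```python
from itertools import combinations

# Hypothetical toy friendship graph as an undirected adjacency map.
graph = {
    "ana": {"bob", "cat", "dan", "eve"},
    "bob": {"ana", "cat"},
    "cat": {"ana", "bob"},
    "dan": {"ana"},
    "eve": {"ana", "fay"},
    "fay": {"eve"},
}

def degree_centrality(g):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}

def local_clustering(g, node):
    """Share of a node's neighbour pairs that are themselves connected."""
    nbrs = g[node]
    if len(nbrs) < 2:
        return 0.0  # fewer than two friends: no pair can be connected
    pairs = list(combinations(sorted(nbrs), 2))
    linked = sum(1 for a, b in pairs if b in g[a])
    return linked / len(pairs)

centrality = degree_centrality(graph)
hub = max(centrality, key=centrality.get)          # most connected node
outliers = [n for n, nbrs in graph.items() if len(nbrs) == 1]
```

Here `hub` picks the node with the most direct connections, while `outliers` lists the nodes attached to the graph by a single edge. Other centrality notions (betweenness, eigenvector) would rank nodes differently.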
  • 12. Example: Twitter and earthquakes
  • 13. Conceptual Architecture
      • CRAWLING: API Wrapper, Website Wrapper, Scheduler, Crawling Specs, Sources
      • INDEXING: Visual Indexing, Near-duplicates, Text Indexing, Media Fetcher
      • MINING: SNA, Sentiment - Influence, Trends - Topics, Model Building, Concepts
      • RANKING: Relevance, Diversity, Popularity, Veracity
      • VISUALIZATION: Interaction, Responsiveness, Aggregation, Aesthetics
      • Layer labels: ANALYSIS, PRESENTATION
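The stages of the conceptual architecture (crawling, indexing, mining, ranking, presentation) could be chained roughly as in the sketch below. This is only an illustration under stated assumptions: the in-memory `POSTS` list, the hashtag-based index and the share-count ranking are hypothetical stand-ins, not the components of the actual system.

```python
from collections import Counter

# Hypothetical posts standing in for what a crawler would return.
POSTS = [
    {"id": 1, "text": "Earthquake felt in Thessaloniki #earthquake", "shares": 40},
    {"id": 2, "text": "Strong #earthquake near the city centre", "shares": 15},
    {"id": 3, "text": "Great concert tonight #music", "shares": 5},
]

def crawl():
    """Crawling stage: a real system would call API or website wrappers."""
    return list(POSTS)

def index(posts):
    """Indexing stage: map each hashtag to the ids of the posts containing it."""
    idx = {}
    for post in posts:
        for token in post["text"].split():
            if token.startswith("#"):
                idx.setdefault(token.lower(), []).append(post["id"])
    return idx

def mine(idx):
    """Mining stage: count posts per hashtag as a crude topic signal."""
    return Counter({tag: len(ids) for tag, ids in idx.items()})

def rank(posts, idx, tag):
    """Ranking stage: order a topic's posts by popularity (share count)."""
    ids = set(idx.get(tag, []))
    matching = (p for p in posts if p["id"] in ids)
    return sorted(matching, key=lambda p: p["shares"], reverse=True)

def present(topic, ranked):
    """Presentation stage: aggregate the results into a short text summary."""
    lines = [f"Topic {topic}: {len(ranked)} posts"]
    lines += [f"  [{p['shares']} shares] {p['text']}" for p in ranked]
    return "\n".join(lines)

posts = crawl()
idx = index(posts)
topics = mine(idx)
top_topic = topics.most_common(1)[0][0]
ranked = rank(posts, idx, top_topic)
summary = present(top_topic, ranked)
```

Each stage only consumes the output of the previous one, so any stage can be swapped out (for example, ranking by veracity or diversity instead of popularity) without touching the others.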
  • 14. Challenges: Content (Indexing, Mining)
      • Multi-modality: e.g. image + tags, video, audio
      • Rich social context: spatio-temporal, social connections, relations and social graph
      • Specific messages: short, conversations, errors, no context
      • Inconsistent quality: noise, spam, fake, propaganda
      • Huge volume: massively produced and disseminated
      • Multi-source: may be generated by different applications and user communities
      • Dynamic: fast updates, real-time
  • 15. Policy, Licensing and Legal challenges
      • Fragmented access to data
          – Separate wrappers/APIs for each source (Twitter, Facebook, etc.)
          – Different data collection/crawling policies
      • Limitations imposed by API providers (“Walled Gardens”)
      • Full access to data impossible or extremely expensive (e.g. see data licensing plans for GNIP and DataSift)
      • Non-transparent data access practices (e.g. access is provided to an organization/person if they have a contact in Twitter)
      • Constant change of model and ToS of social APIs
          – No backwards compatibility, additional development costs
      • Ephemeral nature of content
      • Social search results often lead to removed content → inconsistent and unreliable referencing
      • User Privacy & Purpose of use
      • Fuzzy regulatory framework regarding mining user-contributed data
  • 17. The Rise of Fake News • Volume for query “fake news” over time (https://trends.google.com/trends/explore?date=all&geo=US&q=fake%20news); chart annotation: US Elections 2016 • A key milestone has been the US Elections in 2016, which marked the beginning of large-scale coordinated disinformation campaigns.
  • 18. Key Concepts • Fake news: popular term for the phenomenon of disinformation, but currently avoided by academics and the EC because it is often misused by Trump and the alt-right • Disinformation: general term that typically refers to intentional (and often coordinated) efforts to spread misleading information to the public • Misinformation: refers to misleading content and information that is not necessarily intentional • Propaganda: refers to coordinated campaigns aiming to spread a particular ideology or belief • Manipulated content: also known as tampered or doctored. Refers to multimedia content that has been digitally altered, typically for malicious purposes.
  • 19. The Diffusion of Fake News • Misleading posts tend to spread faster and wider compared to accurate ones. • [Figure panels: example cascade; number of cascades; topic frequency] • Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146-1151.
  • 20. CLEF 2020 Social Media Mining The Famous Shark https://www.snopes.com/photos/animals/puertorico.asp 2005
  • 21. CLEF 2020 Social Media Mining
  • 22. CLEF 2020 Social Media Mining
  • 23. A bit of “historical” background (EU funded projects) • 2011-2014: Trend detection, Social media search • 2013-2016: Quality & veracity of social media, Media forensics • 2016-2018: Social media video verification, Reverse video search • 2018-2021: Deepfake detection, Deep learning-assisted forensics and analysis
  • 24. CLEF 2020 Social Media Mining Overview of Media Verification Resources Tools/Approaches • Social media verification – Tweet Credibility Classification – Context Analysis and Aggregation • Multimedia forensics – Image Verification Assistant – Video forensics • Reverse-image and video search Datasets • Tweet verification Corpus • Fake Video Corpus • FIVR-200K 24
  • 25. CLEF 2020 Social Media Mining Tweet Credibility Classification - Features Credibility cues (aka features)
  • 26. CLEF 2020 Social Media Mining Tweet Credibility Classification - Model
  • 27. CLEF 2020 Social Media Mining Tweet Credibility Classification - Evaluation 92.5% accuracy in identifying misleading posts 88-98% accuracy depending on language (major languages tested: en, fr, es, nl) New features and agreement-based retraining led to significant improvements! One of the top performing methods in the MediaEval VMU 2015 & 2016 tasks! Boididou, C., Papadopoulos, S., Zampoglou, M., Apostolidis, L., Papadopoulou, O., & Kompatsiaris, Y. (2018). Detection and visualization of misleading content on Twitter. International Journal of Multimedia Information Retrieval, 7(1), 71-86.
  • 28. Image Verification Assistant – Intro (1/2) • Types of Multimedia Manipulation: copy-move, splicing, in-painting, retouching
  • 29. Image Verification Assistant – Intro (2/2) • Image Capturing & Tampering Process: Real-world scene -> Lens -> Optical filter -> CFA pattern (R G G B) -> Imaging sensor (e.g. CCD) -> CFA interpolation -> In-camera SW processing -> In-camera JPEG compression -> Digital image -> Out-of-camera SW processing • The steps from the lens to the in-camera JPEG compression take place inside the digital camera • Piva, A. (2013). An overview on image forensics. ISRN Signal Processing, 2013.
  • 30. Image Verification Assistant – Forensics • Assumption: when a “foreign” object is inserted into an image, some detectable traces of the insertion remain. • Noise-based methods try to locate areas where the noise patterns differ from the rest of the image. • JPEG compression analysis methods try to locate areas where some JPEG-specific property is different, e.g. 8x8 grid, DCT quantization, etc. • Machine learning-based methods try to locate areas that look like areas of the tampered images that were used to “train” them. • MeVer – Media Verification (mever.iti.gr)
  • 31. CLEF 2020 Social Media Mining Image Verification Assistant – Forensics Zampoglou, M., Papadopoulos, S., & Kompatsiaris, Y. (2015). Detecting image splicing in the wild (web). In International Conference on Multimedia & Expo Workshops (ICMEW), 2015 (pp. 1-6). IEEE The Challenge of Image Forensics on the (Wild) Web!
  • 32. CLEF 2020 Social Media Mining Image Verification Assistant - UI http://reveal-mklab.iti.gr/ Zampoglou, M., Papadopoulos, S., Kompatsiaris, Y., Bouwmeester, R., & Spangenberg, J. (2016, April). Web and Social Media Image Forensics for News Professionals. In SMN@ ICWSM.
  • 33. Image Verification Assistant - Comparison • Tools compared: FotoForensics (1), Forensically (2), Ghiro (3), Ours • Features marked per tool: ELA X X X X; Ghost X; DW Noise X; Median Noise X X; Block Artifact X; Double Quantization X; Deep Learning-based X; Copy-move X* X; Thumbnail X X; Metadata X X X X; Geotagging X X X X; Reverse search X • *Forensically implements a very simple block-matching algorithm with low robustness • (1) http://fotoforensics.com (2) http://29a.ch/photo-forensics/ (3) http://www.imageforensic.org/ • MeVer – Media Verification (mever.iti.gr)
  • 34. CLEF 2020 Social Media Mining The Fake Video Corpus - Overview • 200 fake and 188 real newsworthy videos • 2206 fake and 1209 real near-duplicates • 388 cascades of near-duplicate videos https://mklab.iti.gr/results/fake-video-corpus/
  • 35. CLEF 2020 Social Media Mining The Fake Video Corpus - Analysis • Fake videos keep reappearing years later • Real videos tend to be reproduced mostly during the first month Papadopoulou, O., Zampoglou, M., Papadopoulos, S., & Kompatsiaris, I. (2019). A corpus of debunked and verified user-generated videos. Online information review.
  • 36. CLEF 2020 Social Media Mining The Rise of Deepfakes • Synthetic media become increasingly realistic mainly using Generative Adversarial Networks • We seem to get into an arms race on disinformation! • Novel solutions beyond supervised learning models will be needed!
  • 37. Social Data mining in Crisis Management
  • 38. Approach • Thousands of tweets are generated during a crisis event in a specific location • Example: 256% rise in Italian tweets about floods on Thu, 01 November 2018 16:12 in Veneto
  • 39. CLEF 2020 Social Media Mining Posts during emergencies 39
  • 40. CLEF 2020 Social Media Mining Problem – Challenges – Existing Limitations • Civil protection agencies and local authorities require timely access to citizen observations during a crisis event to estimate the – Location of a crisis event (e.g. floods, fires, etc.) – Relevance of each tweet – Concepts of the image (e.g. people in danger) • Challenges and existing limitations include: – Management of large streams of data for event detection – Disambiguation from multimodal content (text/image) – Limited location information (only as mention in text) 40
  • 41. Social Media Data Mining • Focusing on Twitter posts, collected with the Twitter Streaming API (https://developer.twitter.com/en/docs/tweets/filter-realtime/overview) • Various analysis techniques to obtain further knowledge on the tweets • The complete flow: • Inputs: search terms (keywords, accounts, bounding boxes) and keys & tokens • Twitter Streaming API -> client receives each new tweet -> get tweet in JSON format & find matching use case • Analysis modules: fake tweets detection, text classification, tweets localisation and, if the tweet has an image, nudity detection, image classification and concept extraction
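The flow on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the use-case names, the keywords, the module names and the order of the modules are all assumptions, and the tweet is assumed to be a JSON object with a `text` field and an optional `entities.media` list.

```python
from typing import Dict, List, Optional

# Hypothetical use cases and keywords, for illustration only.
USE_CASES: Dict[str, List[str]] = {
    "floods": ["flood", "alluvione"],
    "fires": ["fire", "incendio"],
}


def find_use_case(tweet: dict) -> Optional[str]:
    """Return the first use case whose keyword occurs in the tweet text."""
    text = tweet.get("text", "").lower()
    for name, keywords in USE_CASES.items():
        if any(keyword in text for keyword in keywords):
            return name
    return None


def has_image(tweet: dict) -> bool:
    """Assume attached media are listed under entities.media with type 'photo'."""
    media = tweet.get("entities", {}).get("media", [])
    return any(item.get("type") == "photo" for item in media)


def plan_analysis(tweet: dict) -> List[str]:
    """List the analysis modules to run on one tweet (names are placeholders)."""
    if find_use_case(tweet) is None:
        return []  # the tweet matches no configured use case
    steps = ["fake_tweet_detection", "text_classification", "localisation"]
    if has_image(tweet):
        # Image-related modules only run when the tweet carries an image.
        steps += ["nudity_detection", "image_classification", "concept_extraction"]
    return steps
```

Calling `plan_analysis` on a text-only tweet yields only the three text modules, which mirrors the "tweet has image" branch of the diagram.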
  • 42. Datasets • Benchmark datasets (e.g. MediaEval tasks) • Collected datasets about crisis events: 10 m. about fires in Spain; 75 k. about floods in Italy; 74 k. about heatwave in Greece; 42 k. about snow in Finland
  • 43. Estimation of the location mentioned in a tweet • Localisation steps after Named Entity Recognition (NER) has been performed on available tweets • Results of the NER task for English, dataset CoNLL2003 (Precision / Recall / F1-score): Our system (ELMo embeddings) 91.63 / 93.01 / 92.32; Best-scoring CoNLL2003 system, Florian et al., 2003: 88.99 / 88.54 / 88.76; Baevski, A. et al. 2019: (not reported) / (not reported) / 93.5 • Results of the NER task for Italian, dataset EVALITA2009 (Precision / Recall / F1-score): Our system (GloVe embeddings) 75.49 / 75.60 / 75.37; Best-scoring shared task system, FBK_ZanoliPianta: 84.07 / 80.02 / 82.00; Nguyen and Moschitti, 2012: 85.99 / 82.73 / 84.33
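The editor's notes for this slide describe the post-NER localisation step: each recognised location is mapped to a bounding box, and bigger areas are excluded when a smaller, more precise one is also available. A minimal sketch of that pruning is shown below. It assumes that "bigger" means "fully contains the smaller box", uses a hypothetical `(min_lat, min_lon, max_lat, max_lon)` tuple layout, and leaves out the OpenStreetMap lookup itself; the coordinates in the usage note are rough illustrative values.

```python
from typing import List, Tuple

# Hypothetical layout: (min_lat, min_lon, max_lat, max_lon)
Box = Tuple[float, float, float, float]


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies completely inside `outer` (borders included)."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and outer[2] >= inner[2]
        and outer[3] >= inner[3]
    )


def prune_boxes(boxes: List[Box]) -> List[Box]:
    """Drop every box that contains a different, more precise box."""
    kept = []
    for i, box in enumerate(boxes):
        covers_smaller = any(
            j != i and other != box and contains(box, other)
            for j, other in enumerate(boxes)
        )
        if not covers_smaller:
            kept.append(box)
    return kept
```

For example, with approximate boxes for a region `(44.8, 10.6, 46.7, 13.1)`, a city inside it `(45.3, 12.2, 45.6, 12.6)` and an unrelated city `(60.1, 24.8, 60.3, 25.3)`, the region is dropped and the two city boxes are returned.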
  • 44. Concept Detection in Social Media Images • Extracts high-level concepts from low-level visual information • Fine-tunes a pre-trained 22-layer GoogLeNet DCNN to recognize the 345 TRECVID INS concepts, then applies thresholding to keep the concepts with the highest probabilities • Concept examples: animal, boat_ship, clouds, waterscape_waterfront
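The thresholding step mentioned on this slide can be illustrated with a few lines of code. This is only a sketch: the network itself is not shown, the per-concept probabilities are assumed to arrive as a dictionary, and the threshold value of 0.5 is an assumption since the slide does not state one.

```python
from typing import Dict, List, Tuple


def keep_confident_concepts(
    scores: Dict[str, float], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep concepts whose probability reaches the threshold, best first."""
    kept = [(concept, p) for concept, p in scores.items() if p >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up probabilities for the concept examples listed on the slide.
example_scores = {
    "animal": 0.08,
    "boat_ship": 0.62,
    "clouds": 0.40,
    "waterscape_waterfront": 0.91,
}
confident = keep_confident_concepts(example_scores)
```

With these made-up scores only `waterscape_waterfront` and `boat_ship` survive the threshold.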
  • 45. CERTH-ITI participation in MediaEval 2018 • First in the social media image classification (Average F1-score) • https://www.youtube.com/watch?v=yq1nIPc6dWw&list=PLOPRp1vNOG9ahE5viJmF6Gx8XDk8hG9MP&index=2&t=0s
  • 46. Demo • Social media dashboard in EOPEN project: – https://eopen.spaceapplications.com/dashboard/ – Dashboards -> Social Media
  • 47. Fighting abuse, extremism, and terrorism in Social Media
  • 48. Multiple identities detection in social media: sockpuppets, doppelgängers, and more • Users often hold several accounts in their effort to multiply the spread of their thoughts, ideas, and viewpoints • Illegal & abusive activities: creation of multiple accounts to bypass the combating measures enforced by social media platforms • User Identity Linkage: detect accounts likely to belong to the same natural person (“linked accounts”) • Figure: Kumar et al. “An Army of Me: Sockpuppets in Online Discussion Communities” WWW 2017
  • 49. Approach • Pipeline: Data Collection -> Feature extraction -> User Modeling -> Classification -> Linked accounts detection • Feature extraction: Profile (P), Activity (A), Linguistic (L), Network (N) • User Modeling: Individual representation, Joint representation • Classification: Probabilistic, Tree-based, Ensemble, Neural networks
  • 50. Feature Extraction • Profile: e.g., demographic information, biography, avatar • Activity: e.g., number of posts, lists, shares, favorited tweets, mentions, hashtags, posts’ inter-arrival time • Linguistic: i.e., character-based, word-based, sentence-based, dictionary-based, syntactic-based • Network: e.g., # followers, # friends, authority, hub, # triangles, eigenvector, PageRank, clustering coefficient
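Two of the network features listed here, the number of triangles and the clustering coefficient, can be computed directly from an adjacency structure. The sketch below is illustrative only: it assumes an undirected graph stored as a dictionary of neighbour sets, whereas a follower graph is directed and the authors' actual feature extraction is not described on the slide.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # node -> set of neighbours (undirected)


def triangles(graph: Graph, node: str) -> int:
    """Count pairs of neighbours of `node` that are connected to each other."""
    neighbours = sorted(graph[node])
    return sum(
        1
        for i, a in enumerate(neighbours)
        for b in neighbours[i + 1:]
        if b in graph[a]
    )


def clustering_coefficient(graph: Graph, node: str) -> float:
    """Fraction of possible neighbour pairs that are actually connected."""
    k = len(graph[node])
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to connect
    return 2 * triangles(graph, node) / (k * (k - 1))


# Toy graph: a, b, c form a triangle; d is attached to a only.
toy_graph: Graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

In the toy graph, node `a` takes part in one triangle out of three possible neighbour pairs, so its clustering coefficient is 1/3.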
  • 51. User Modeling: Individual representation • Feature sets: $S = \{P, A, L, N\}$ • Each user $u_i$ is represented, per feature category $S$, by the vector $V_S^{u_i} = \langle f_S^{i1}, f_S^{i2}, \ldots, f_S^{ij}, \ldots, f_S^{in} \rangle$, where $f_S^{ij}$ is the $j$th feature of category $S$ for $u_i$ and $n$ is the total number of features for category $S$ • Example: $V_N^{u_i} = \langle \mathit{authority}_i, \mathit{hub}_i, \ldots, \mathit{PageRank}_i \rangle$
  • 52. User Modeling: Joint representation 1. abs: absolute difference of the feature vectors of $u_i$, $u_j$ 2. sim: similarity of the per-category feature vectors (cosine similarity, Euclidean distance, Manhattan distance) 3. Similarity of the content posted by users $u_i$, $u_j$ • edits: edit distance (Levenshtein distance) • sem: semantic similarity (vector space model approach, word embeddings)
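The joint representation of a pair of users can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: feature vectors are plain lists of numbers of equal length, the content of each user is a single string, and the semantic similarity (sem) is left out because it depends on a word-embedding model that the slide does not specify. The helper names are hypothetical.

```python
import math
from typing import List


def abs_diff(u: List[float], v: List[float]) -> List[float]:
    """abs: element-wise absolute difference of two feature vectors."""
    return [abs(a - b) for a, b in zip(u, v)]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # zero vector: define similarity as 0


def euclidean_distance(u: List[float], v: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u: List[float], v: List[float]) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def levenshtein(s: str, t: str) -> int:
    """edits: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def joint_representation(
    u: List[float], v: List[float], text_u: str, text_v: str
) -> List[float]:
    """Concatenate abs, the three sim measures and the edit distance."""
    return abs_diff(u, v) + [
        cosine_similarity(u, v),
        euclidean_distance(u, v),
        manhattan_distance(u, v),
        float(levenshtein(text_u, text_v)),
    ]
```

The resulting vector for a pair of accounts is what a classifier would then label as linked or non-linked.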
  • 53. Classification • Probabilistic: Naïve Bayes, BayesNet • Tree-based: J48, LADTree, LMT • Ensemble: Random Forest (RF), AdaBoost and voting ensembles • Deep Neural Network • Recurrent Neural Network (RNN) • Combined Network: Text classification network + Metadata network
  • 54. Comparison to other approaches • [1] Fredrik Johansson, Lisa Kaati, and Amendra Shrestha (2013) Detecting multiple aliases in social media. In Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. [2] Thamar Solorio, Ragib Hasan, and Mainul Mizan (2013) A case study of sockpuppet detection in Wikipedia. In Proceedings of the Workshop on Language Analysis in Social Media. ACL. [3] Michail Tsikerdekis and Sherali Zeadally (2014) Multiple account identity deception detection in social media using nonverbal behavior. IEEE Transactions on Information Forensics and Security 9, 8. [4] Fredrik Johansson, Lisa Kaati, and Amendra Shrestha (2015) Timeprints for identifying social media users with multiple aliases. Security Informatics 4, 1. [5] Srijan Kumar, Justin Cheng, Jure Leskovec, and VS Subrahmanian (2017) An army of me: Sockpuppets in online discussion communities. In Proceedings of the 26th International Conference on World Wide Web. [6] Jan Pennekamp, Martin Henze, Oliver Hohlfeld, and Andriy Panchenko (2019) Hi Doppelgänger: Towards Detecting Manipulation in News Comments. In Companion Proceedings of The 2019 World Wide Web Conference. [7] Despoina Chatzakou, Juan Soler-Company, Theodora Tsikrika, Leo Wanner, Stefanos Vrochidis, Ioannis Kompatsiaris (2020) User Identity Linkage in Social Media Using Linguistic and Social Interaction Features. In Proceedings of the 2020 ACM on Web Science Conference • Features Classifier Activity Linguistic Network Traditional ML NN Character Word Sentence Dictionary Syntactic Distribution Segmentation Connection Johansson et al. [1] X X X X Solorio et al. [2] X X X X X Tsikerdekis et al. [3] X X Johansson et al. [4] X X X X X Kumar et al. [5] X X X X X X X X X Pennekamp et al. [6] X X X X X Ours [7] X X X X X X X X X X X
  • 55. Datasets and Ground Truth • Manual creation of the ground truth, because no existing ground truth indicates which user accounts belong to the same person • Split each account $u_i$ (its posts) into two distinct accounts: $u_{ia}$ and $u_{ib}$ • Linked accounts: $(u_{ia}, u_{ib})$ • Non-linked accounts: $(u_{ia}, u_{jb})$, where $i \neq j$ • 10% linked and 90% non-linked accounts • Abusive Dataset: June to August 2016; relevant to the Gamergate controversy; abuse-related English hashtags; 650K tweets and 312K users • Terrorism Dataset: February 2017 to June 2018; relevant to Jihadist terrorism; terrorism-related Arabic keywords; 65K tweets and 35K users
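The ground-truth construction described on this slide can be sketched as follows. The sketch makes several assumptions that the slide does not confirm: posts are split by alternating between the two halves (the actual splitting rule is not stated), pairs are sampled uniformly at random, and the half-accounts are named with hypothetical `_a` and `_b` suffixes.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str, int]  # (first half-account, second half-account, label)


def split_account(posts: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split one account's posts into two artificial accounts (alternating)."""
    return list(posts[0::2]), list(posts[1::2])


def build_pairs(
    user_ids: Sequence[str], n_linked: int, n_non_linked: int, seed: int = 0
) -> List[Pair]:
    """Create labelled pairs: 1 for (u_ia, u_ib), 0 for (u_ia, u_jb), i != j."""
    ids = sorted(user_ids)
    if n_linked > len(ids) or n_non_linked > len(ids) * (len(ids) - 1):
        raise ValueError("not enough accounts for the requested pairs")
    rng = random.Random(seed)
    linked = [(f"{u}_a", f"{u}_b", 1) for u in rng.sample(ids, n_linked)]
    chosen = set()
    while len(chosen) < n_non_linked:
        i, j = rng.sample(ids, 2)  # two distinct users, so i != j
        chosen.add((i, j))
    non_linked = [(f"{i}_a", f"{j}_b", 0) for i, j in sorted(chosen)]
    return linked + non_linked
```

Calling `build_pairs(ids, 200, 1800)` on a sufficiently large list of user ids would reproduce the 10% linked and 90% non-linked ratio stated above.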
  • 56. Experimental Methodology: Features Combination • Legend: abs: absolute difference; sim: similarity of feature vectors; edits: edit distance (Levenshtein); sem: semantic similarity • Baseline: Activity_abs, Linguistic_abs, Network_abs, All_abs • Activity_abs + edits + sem • Linguistic_abs + edits + sem • Network_abs + edits + sem • All_abs + edits + sem • All_sim • All_sim + All_abs • All_sim + All_abs + edits + sem
  • 57. Experimental Phases • Phase 1: 10% linked & 90% non-linked • # linked accounts: 200 • # non-linked accounts: 1,800 • Phase 2: Varying number of linked accounts • # linked accounts: 200 to 500 with step 100 • # non-linked accounts: 1,800 • Phase 3: Varying number of non-linked accounts • # linked accounts: 200 • # non-linked accounts: 1,800 to 39,800 with step 1,800
  • 58. Results: Abusive dataset (Phase 1) • Features • Traditional classifiers: Network features perform better (combined or not with the edit & sim features) • Neural Network: Linguistic features result in better performance (combined or not with the edit & sim features) • Classifiers • Random Forest achieves the best performance (AUC: 99.50%)
  • 59. Results: Abusive dataset (Phases 2 & 3) • Varied linked accounts: from 200 to 300, slight increase (precision, recall, accuracy); stable performance: 99% AUC • Varied non-linked accounts: even with the highest number of non-linked user accounts, AUC remains at around 87.30%; increase of precision & recall when more data are available; at ~24k non-linked accounts, precision & recall converge • Results obtained by using Random Forest as classifier
  • 60. Results: Terrorism dataset (Phase 1) • Features • J48: Network features perform better than the Activity and Linguistic features • Random Forest, BayesNet, Neural Network: Linguistic features result in better performance than the Activity and Network features • In most cases, all feature categories (using abs) combined with the similarity feature vectors give the best performance • Classifiers • Random Forest achieves the best performance (AUC: 99.50%)
  • 61. Results: Terrorism dataset (Phases 2 & 3) • Varied linked accounts: higher number of linked user accounts => higher precision, recall & accuracy; stable performance: 99% AUC • Varied non-linked accounts: AUC fluctuates from 94% to 99.50%; precision & recall fluctuate from 97.1% to 99%; stable model even with a quite unbalanced dataset • Results obtained by using Random Forest as classifier
  • 62. Conclusions • Social media data are useful in many applications – From confirming existing and known correlations to prediction and decision-making • Many challenges exist – Data availability and representativeness (of society, real events) – Coverage, robustness and reproducibility – Authenticity (threat to democratic society) – Real-time and scalable approaches – Fusion of various modalities (content, social, temporal, location) • Contributions are required from various disciplines – Content Analytics – Machine Learning – Network Analysis – Psychology – Social Sciences (patterns of presentation, sharing) – Visualization • Currently mostly an auxiliary means for real-event assessment and decision-making, which can generate additional insights
  • 63. With Contributions from • Dr. Symeon Papadopoulos – Social network analysis, social media content mining and multimedia indexing and retrieval – http://mklab.iti.gr/people/papadop – Twitter: @sympap • Dr. Ilias Gialampoukidis – Social media mining and classification, topic detection, community and key-player identification, multimodal fusion and multimedia retrieval – http://www.researchgate.net/profile/Ilias_Gialampoukidis • Dr. Theodora Tsikrika – Web and social media search and mining, multimedia indexing and retrieval, AI-based multimodal analytics, evaluation – https://www.iti.gr/iti/people/Theodora_Tsikrika.html • Dr. Stefanos Vrochidis – Multimodal data fusion, web and social media mining, multimedia analysis and retrieval, multimodal analytics – https://sites.google.com/site/stevrochidis/
  • 64. Support: EU funded projects • Tools and services for Social Media verification from a journalistic and enterprise perspective. • Video verification platform including video forensics, reverse video search and context analysis and aggregation. • Social media verification platform including deepfake detection and a database of known fakes. • InterCONnected NEXt-Generation Immersive IoT Platform of Crime and Terrorism DetectiON, PredictiON, InvestigatiON, and PreventiON Services • opEn interOperable Platform for unified access and analysis of Earth observatioN data • Enhancing decision support and management services in extreme weather climate events
  • 65. Thank you for your attention! ikom@iti.gr http://mklab.iti.gr

Editor's Notes

  1. http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/
  2. http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/
  3. The main point here is that the original tweet which was misleading was retweeted much more than the tweet that made the correction. https://twitter.com/Thomas_Binder/status/984934979451879424 https://twitter.com/Thomas_Binder/status/985665154695262211 https://www.dailysabah.com/syrian-crisis/2018/04/18/cardiologist-apologizes-after-falsely-accusing-white-helmets-of-staging-syria-chemical-attack
  4. Of the many tools that our team develops, we will briefly focus on a model for tweet credibility classification and a tool for image verification based on image forensics. Also, the Fake Video Corpus will be presented.
  5. - Automatic resaving and exif removal
  6. Features: tampering localization heat maps (six state-of-the-art algorithms and one newly proposed, CAGI); zoom-in and overlay of heat map over image. Auxiliary features: metadata (full listing, GPS geolocation, Exif thumbnail extraction); reverse image search (auto-generation of link to perform search on Google Images). Quantitative evaluation: six reference datasets (images + binary masks of tampering = “ground truth”); measures capturing the matching between ground truth mask and algorithm output; comparison of 14 algorithms, of which the “best” six plus a newly proposed one ended up in the tool. Qualitative evaluation: informal feedback received by end users; usability and quality of results.
  7. http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/
  8. http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/
  9. http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/
  10. Matching use case -> first classification. Tweets localisation: named entity recognition for location keyword extraction (LSTM-CRF) -> OpenStreetMap
  11. http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/
  12. In the case where location entities are recognised, the bounding box of each location is retrieved via the OpenStreetMap API. In case no location entities are recognised, the organisation entities are considered. Finally, an analysis of the bounding boxes returned follows. Specifically, in case of one entity, a single bounding box is returned. However, in case of multiple entities, the bounding boxes are compared with each other in order to exclude bigger areas when a smaller - more precise one is also available and all the remaining are returned as output. - English language results are already satisfactory for the purposes of PUC 1. Although our scores are lower than the current state-of-the-art (Baevski, A. et al. 2019), they are not far off, while the model still outperforms the baseline method (Florian et al., 2003). - The Italian dataset results are for the purposes of PUC 1. The model is still being worked on and fine-tuning is under process. As can be seen the model’s experimental character is apparent since it is outperformed by the baseline method. Accuracy is expected to increase considerably when we manually enhance the dataset with our own annotations, update the annotation format and decide on final parameters. - To the best of our knowledge there is only one freely available Finnish dataset which was extracted from the archives of Digitoday, an online technology news source. It consists of 953 annotated with six named entity classes (organisation, location, person, product (PRO), event (EVENT), and date (DATE)). The dataset in its current state is too small to be used with a DNN, so no remarkable results are expected until the issue of data size is addressed. On that account, we are working towards enhancing it, by adding more sentences of our own manual annotation efforts.
  13. Figure: Nodes represent users and edges connect users that reply to each other. Sockpuppets (red nodes) tend to interact with other sockpuppets, and are more central in the network than ordinary users (blue nodes), where a sockpuppet can be defined as a user account that is controlled by an individual (or puppetmaster) who controls at least one other user account. It should be noted that not all people who have multiple online identities engage in malicious activities; however, our focus is on those who do. Of course, the proposed techniques for detecting multiple identities are applicable irrespective of the context (i.e., whether the motivation for creating multiple identities is malicious or not).
  14. How to represent each individual user: a feature vector for each of the feature sets
  15. How to represent pairs of users: we do this because the goal is to classify whether a pair of user accounts is likely to belong to the same natural person.