This document summarizes a tutorial on mining misinformation in social media. It discusses how misinformation spreads through social networks, fueled by spreaders and influenced users, and covers the challenges of detecting misinformation: its large scale, dynamic nature, and deceptive character. The tutorial outlines detection methods based on content, context, and propagation; discusses features such as text, images, and modeled message sequences; and emphasizes early detection because of misinformation's potential for societal harm.
1. Arizona State University Mining Misinformation in Social Media, November 21, 2017 1
Mining Misinformation in Social Media:
Understanding Its Rampant Spread, Harm, and Intervention
Liang Wu1, Giovanni Luca Ciampaglia2, Huan Liu1
1Arizona State University
2Indiana University Bloomington
Tutorial Web Page
• All materials and resources are available online: http://bit.ly/ICDMTutorial
Definition of Misinformation
• False and inaccurate information that is spontaneously
spread.
• Misinformation can be…
– Disinformation
– Rumors
– Urban legend
– Spam
– Troll
– Fake news
– …
https://www.kdnuggets.com/2016/08/misinformation-key-terms-explained.html
Ecosystem of Misinformation
• Spreaders
– Fabricate misinformation
– Motivations: spammers, fraudsters, …
• Misinformation
– Fake news, rumors, spam, clickbait, …
• Influenced users
– Echo chambers
– Filter bubbles
Misinformation Ramification
• Top 10 global risks highlighted for 2014 (World Economic Forum), selected items with scores:
1. Rising societal tensions in the Middle East and North Africa (4.07)
2. Widening income disparities (4.02)
3. Persistent structural unemployment (3.97)
…
10. The rapid spread of misinformation online (3.35)
Word of The Year
• Macquarie Dictionary Word of the Year 2016: "fake news"
• Oxford Dictionaries Word of the Year 2016: "post-truth"
Social Media
• Social media has changed the way we exchange and obtain information.
• 500 million tweets are posted per day, making social media an effective channel for information dissemination.
(Platform logos: Twitter, Facebook, RenRen.)
Social Media: A Channel for Misinformation
• False and inaccurate information is pervasive.
• Misinformation can be devastating.
– Cause undesirable consequences
– Wreak havoc
• Echo chamber: misinformation can be reinforced.
• Filter bubble: misinformation can be targeted.
Two Examples
• PizzaGate – Fake News has Real Consequences
– What made Edgar Maddison Welch "raid" a "pedo ring" on 12/4/2016?
– It all started with a post on Facebook, spread to Twitter, and then went viral via platforms like Breitbart and Infowars
• Anti-Vaccine Movement on Social Media: A
case of echo chambers and filter bubbles
– Peer-to-peer connection
– Groups
– Facebook feeds
PizzaGate
https://www.nytimes.com/interactive/2016/12/10/business/media/pizzagate.html
Timeline (Oct 2016 to 2017):
1. WikiLeaks began releasing the emails of John Podesta (Oct–Nov 2016).
2. Social media users on Reddit searched the releases for evidence of wrongdoing.
3. Discussions that included the word "pizza," including dinner plans, were found.
4. A participant connected the phrase "cheese pizza" to pedophiles ("c.p." → child pornography) (around Nov 3, 2016).
5. Following the use of "pizza," theorists focused on the Washington pizza restaurant Comet Ping Pong.
6. The theory snowballed, taking on the meme #PizzaGate; fake news articles emerged (around Nov 23, 2016).
7. The false stories swept up neighboring businesses and bands that had played at Comet; theories about kill rooms, underground tunnels, satanism, and even cannibalism emerged.
8. Edgar M. Welch, a 28-year-old from North Carolina, fired a rifle inside the pizzeria and surrendered after finding no evidence to support claims of child slaves (Dec 4, 2016).
9. The shooting did not put the theory to rest; purveyors of the theory and fake news pointed to the mainstream media as conspirators in a cover-up (2016–2017).
Challenges in Dealing with Misinformation
• Large-scale
– Misinformation can be rampant
• Dynamic
– It can happen fast
• Deceiving
– Hard to verify
• Homophily
– Consistent with one’s beliefs
Overview of Today’s Tutorial
• Introduction (40 minutes)
• Misinformation Detection (20 minutes)
• Misinformation in Social Media (40 minutes)
• Misinformation Spreader Detection (10 minutes)
• Resources (10 minutes)
Misinformation Detection
Misinformation in Social Media: An Example
Misinformation Spreader
Content of Misinformation
• Text
• Hashtag
• URL
• Emoticon
• Image
• Video (GIF)
Context of Misinformation
• Date, Time
• Location
Propagation of Misinformation
• Retweet
• Reply
• Like
Overview of Misinformation Detection
Misinformation detection branches:
• Content [1, 2, 3, 4]: individual messages or message clusters; supervised (classification) or unsupervised (anomaly detection)
• Context (when) [5, 6]: anomalous timing of bursts
• Propagation (who, how) [7, 8]
• Early detection: lack of data [9]; lack of labels [10]

[1] Qazvinian et al. "Rumor has it: Identifying misinformation in microblogs." EMNLP 2011.
[2] Castillo et al. "Predicting information credibility in time-sensitive social media." Internet Research 23.5 (2013).
[3] Zubiaga et al. "Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media."
[4] Wu et al. "Information Credibility Evaluation on Social Media." AAAI 2016.
[5] Wang et al. "Detecting rumor patterns in streaming social media." IEEE BigData 2015.
[6] Kwon et al. "Modeling Bursty Temporal Pattern of Rumors." ICWSM 2014.
[7] Wu et al. "Characterizing Social Media Messages by How They Propagate." WSDM 2018.
[8] Ma et al. "Detect Rumors in Microblog Posts Using Propagation Structure via Kernel Learning." ACL 2017.
[9] Sampson et al. "Leveraging the Implicit Structure within Social Media for Emergent Rumor Detection." CIKM 2016.
[10] Wu et al. "Gleaning Wisdom from the Past: Early Detection of Emerging Rumors in Social Media." SDM 2017.
Feature Engineering on Content
Text feature | Example
Length of post | #words, #characters
Punctuation marks | question mark (?), exclamation mark (!)
Emojis/emoticons | angry face ;-L
Sentiment | sentiment/swear/curse words
Pronouns (1st, 2nd, 3rd person) | I, me, myself, my, mine
URL | PageRank of the linked domain
Mention (@) | presence/count of mentions
Hashtag (#) | presence/count of hashtags
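The content features above can be computed with a few lines of code. A minimal sketch; the feature names and the sample post are illustrative, not the tutorial's exact feature set:

```python
import re

def text_features(post):
    """Extract simple content features of the kind listed above.
    Feature definitions here are illustrative choices."""
    words = post.split()
    return {
        "n_words": len(words),
        "n_chars": len(post),
        "n_question": post.count("?"),
        "n_exclaim": post.count("!"),
        "n_first_person": sum(w.lower().strip(".,!?") in
                              {"i", "me", "my", "mine", "myself"} for w in words),
        "n_urls": len(re.findall(r"https?://\S+", post)),
        "n_mentions": len(re.findall(r"@\w+", post)),
        "n_hashtags": len(re.findall(r"#\w+", post)),
    }

f = text_features("BREAKING! Sharks in the streets of Houston?! http://t.co/x #Harvey")
```

Each post then becomes a fixed-length numeric vector, ready for the classifiers discussed next.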
Misinformation Detection: Text Matching
• Text Matching
– Exact matching
– Relevance
• TF-IDF
• BM25
– Semantic
• Word2Vec
• Doc2Vec
• Drawback
– Low recall: reworded or paraphrased misinformation escapes matching
(Figure: matching a new post against known fake news, ranging from exact duplication and similar words to similar or different semantic representations.)
Starbird, Kate, et al. "Rumors, false flags, and digital vigilantes: Misinformation on twitter after the 2013 boston marathon bombing." iConference 2014 Proceedings (2014).
Jin, Zhiwei, et al. "Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter." International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer, Cham, 2017.
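Relevance-based matching can be sketched with TF-IDF vectors and cosine similarity. The corpus, query, and threshold below are toy values, not from the tutorial:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Known misinformation texts (a toy corpus; real systems match against
# curated databases of debunked claims).
known = [
    "sharks are swimming in the streets of houston",
    "two explosions in the white house",
]
vec = TfidfVectorizer().fit(known)
K = vec.transform(known)

def looks_like_known_misinformation(post, threshold=0.5):
    """Flag a post whose TF-IDF vector is close to known misinformation.
    The 0.5 threshold is an arbitrary illustrative choice."""
    sims = cosine_similarity(vec.transform([post]), K)[0]
    return bool(sims.max() >= threshold)

hit = looks_like_known_misinformation("sharks swimming in the streets of houston!")
miss = looks_like_known_misinformation("totally unrelated kitten video")
```

This illustrates the low-recall drawback directly: any rewording that drops the shared vocabulary (here, every word) pushes the similarity to zero.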
Misinformation Detection: Supervised Learning
• Message-based
– A vector represents a tweet
• Message cluster-based
– A vector represents a cluster of tweets
• Methods
– Random Forest
– SVM
– Naïve Bayes
– Decision Tree
– Maximum Entropy
– Logistic Regression
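A minimal sketch of the supervised message-based setting, with hypothetical feature vectors (say, #words, #question marks, #exclamation marks) and toy labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy feature matrix: rows are messages, columns are content features such as
# #words, #question marks, #exclamation marks (all values are made up).
X = np.array([[12, 0, 0], [8, 2, 3], [15, 0, 1], [6, 3, 4],
              [20, 1, 0], [7, 2, 5], [11, 0, 0], [9, 3, 2]])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 1 = misinformation (toy labels)

# Any of the listed classifiers (Random Forest, SVM, ...) plugs in here;
# logistic regression is just one concrete choice.
clf = LogisticRegression().fit(X, y)
pred = clf.predict([[10, 3, 4]])  # many ?/! marks: resembles the positive class
```

Swapping `LogisticRegression` for `RandomForestClassifier` or `SVC` changes only one line; the feature engineering stays the same.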
Visual Content-based Detection
• Diversity of Images
Jin, Zhiwei, et al. "Novel visual and statistical image features for microblogs news verification."
IEEE Transactions on Multimedia 19.3 (2017): 598-608.
Example image claims:
– "Texas Pizza Hut workers paddle through flood waters to deliver free pizzas by kayak"
– "There are sharks swimming in the streets of Houston during Hurricane Harvey"
References
• Starbird, Kate, et al. "Rumors, false flags, and digital vigilantes: Misinformation on twitter after the 2013 boston marathon bombing." iConference 2014 Proceedings (2014).
• Jin, Zhiwei, et al. "Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter." International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer, Cham, 2017.
• Gupta, Aditi, and Ponnurangam Kumaraguru. "Credibility ranking of tweets during high impact events." Proceedings of the 1st Workshop on Privacy and Security in Online Social Media. ACM, 2012.
• Yu, Suisheng, Mingcai Li, and Fengming Liu. "Rumor Identification with Maximum Entropy in MicroNet."
• Yang, Fan, et al. "Automatic detection of rumor on sina weibo." Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics. ACM, 2012.
• Zhang, Qiao, et al. "Automatic detection of rumor on social network." Natural Language Processing and Chinese Computing. Springer, Cham, 2015. 113-122.
• Castillo, Carlos, Marcelo Mendoza, and Barbara Poblete. "Information credibility on twitter." Proceedings of the 20th International Conference on World Wide Web. ACM, 2011.
• Castillo, Carlos, Marcelo Mendoza, and Barbara Poblete. "Predicting information credibility in time-sensitive social media." Internet Research 23.5 (2013): 560-588.
• Qazvinian, Vahed, et al. "Rumor has it: Identifying misinformation in microblogs." Proceedings of the Conference on Empirical Methods in Natural Language Processing. ACL, 2011.
• Wu, Shu, et al. "Information Credibility Evaluation on Social Media." AAAI 2016.
Modeling Message Sequence
• Content-only methods ignore the chronological order of messages
• Messages are generated as a temporal sequence; modeling posts as independent documents discards this structural information
Modeling Post Sequence: Message-based
• Message-based
– Linear-chain Conditional Random Fields (CRF) label each post in the sequence
Zubiaga, Arkaitz, Maria Liakata, and Rob Procter. "Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media."
Modeling Post Sequence: Cluster-based
• Message cluster-based
– A Recurrent Neural Network reads the sequence of posts; a classifier layer outputs the prediction
Ma et al. "Detecting Rumors from Microblogs with Recurrent Neural Networks." IJCAI 2016.
Personalized Misinformation Detection (PCA)
• Detect anomalous content of a user with PCA
• Main assumption: misinformation is likely to be eccentric relative to a user's normal content
• Detect misinformation as content outliers
– Tweet-based modeling
– Measure the distance between a new message and the user's historical posts
Zhang, Yan, et al. "A distance-based outlier detection method for rumor detection exploiting user behaviorial differences." ICoDSE 2016. IEEE, 2016.
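The distance-to-history idea can be sketched with PCA: project a new post onto the principal subspace of the user's historical posts and use the reconstruction distance as an outlier score. All data below are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy feature vectors for one user's historical posts; variance is mostly in
# the first two dimensions (all values are synthetic).
history = rng.normal(loc=[5.0, 1.0, 0.5], scale=[1.0, 0.5, 0.05], size=(200, 3))

pca = PCA(n_components=2).fit(history)

def reconstruction_distance(x):
    """Distance between a post and its projection onto the principal
    subspace of the user's history; large values flag out-of-character posts."""
    x = np.atleast_2d(x)
    recon = pca.inverse_transform(pca.transform(x))
    return float(np.linalg.norm(x - recon))

d_normal = reconstruction_distance([5.1, 0.9, 0.55])   # in-character post
d_odd = reconstruction_distance([5.0, 1.0, 7.0])       # far from the subspace
```

Thresholding this distance gives a per-user outlier detector; the paper's pipeline is more elaborate, but the core signal is this reconstruction gap.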
Personalized Misinformation Detection (Autoencoder)
• Detect anomalous content of a user with an autoencoder
• Multi-layer autoencoder
– Train an autoencoder on the user's historical data
– To test a message: feed it to the autoencoder, obtain the reconstruction, and compute the distance between the original and the reconstruction
Zhang, Yan, et al. "Detecting rumors on Online Social Networks using multi-layer autoencoder." TEMSCON 2017. IEEE, 2017.
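A minimal sketch of the autoencoder variant, using a linear one-unit bottleneck regressor as a stand-in for the paper's multi-layer autoencoder (architecture and data are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Toy historical posts: three features tied together by one latent factor,
# mimicking a user's consistent "style" (entirely synthetic data).
z = rng.normal(size=(300, 1))
history = np.hstack([z, 2 * z, -z]) + 0.05 * rng.normal(size=(300, 3))

# A regressor trained to reproduce its own input acts as an autoencoder;
# one linear hidden unit gives a PCA-like bottleneck (sizes are illustrative).
ae = MLPRegressor(hidden_layer_sizes=(1,), activation="identity",
                  max_iter=5000, random_state=0)
ae.fit(history, history)

def recon_error(x):
    """Distance between a message and its reconstruction; large values
    suggest content that is out of character for this user."""
    x = np.atleast_2d(x)
    return float(np.linalg.norm(x - ae.predict(x)))

e_typical = recon_error([1.0, 2.0, -1.0])    # consistent with history
e_atypical = recon_error([1.0, -2.0, 1.0])   # breaks the learned correlation
```

With deeper, nonlinear hidden layers the same reconstruction-error test captures manifolds that PCA cannot.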
Detecting Misinformation with Context
Context of Misinformation
• Date, Time
• Location
Peak Time of Misinformation
• Rebirth of misinformation
– Misinformation has multiple peaks of activity over time
– True information typically has only one
(Figures: temporal activity of misinformation on Twitter and on Facebook.)
Kwon et al. "Modeling Bursty Temporal Pattern of Rumors." ICWSM 2014.
Friggeri et al. "Rumor Cascades." ICWSM 2014.
Detecting Misinformation with Propagation
Propagation of Misinformation
• Retweet
• Reply
• Like
Detecting Misinformation with Propagation
• Misinformation is spread by similar users
– Bot armies
– Echo chambers of misinformation
• Intuition: misinformation can be distinguished by who spreads it and how it is spread
• Challenges
– Users may change accounts (bot armies)
– Data sparsity
Detecting Misinformation with Propagation
• Pipeline:
1. User embedding: embed users into representations learned from their posts, networks, and communities
2. Sequence modeling: represent a message's propagation pathway as the sequence of users who spread it (e.g., A → B → D)
3. Message classification: feed the modeled sequence to a classifier
Liang Wu, Huan Liu. "Characterizing Social Media Messages by How They Propagate." WSDM 2018.
Key Issue for Misinformation Detection
• April 2013: the AP account tweeted "Two Explosions in the White House and Barack Obama is injured"
– Truth: hackers had compromised the account
– Nevertheless, the tweet tipped the stock market by $136 billion within 2 minutes
Early Detection of Misinformation: Challenges
• Challenges of early detection
–Message cluster based methods
• Lack of data
–Supervised learning methods
• Lack of labels
Early Detection Challenge I: Lack of Data
• Early stage: only a few posts, sparsely scattered
• Most methods only prove effective at a later stage, once enough posts have accumulated
(Figure: sparse posts at the early stage vs. dense clusters at the later stage.)
Early Detection: Lack of Data
• Linking scattered messages
– Cluster messages, then merge individual messages into larger groups using:
• Hashtag linkage: posts sharing a hashtag
• Web linkage: posts sharing an embedded URL
Sampson et al. "Leveraging the implicit structure within social media for emergent rumor detection." CIKM 2016.
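A toy sketch of hashtag/web linkage: group posts that share a hashtag or an embedded URL. The posts and the grouping rule are illustrative; the paper's method is more involved:

```python
import re
from collections import defaultdict

# Hypothetical early-stage posts about an emerging story.
posts = [
    "Explosion downtown! #breaking http://ex.am/1",
    "Huge explosion reported #breaking",
    "So many casualties http://ex.am/1",
    "Cute cat pictures #caturday",
]

def link_posts(posts):
    """Group post indices by shared hashtags or URLs, a simplified version
    of hashtag/web linkage for merging sparse early-stage messages."""
    groups = defaultdict(set)
    for i, p in enumerate(posts):
        for key in re.findall(r"#\w+|https?://\S+", p):
            groups[key].add(i)
    return dict(groups)

groups = link_posts(posts)
```

Merging groups that share members (posts 0-2 here, via #breaking and the URL) yields larger clusters, giving cluster-based detectors enough data at the early stage.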
Early Detection Challenge II: Lack of Labels
• Lack of labels
– In traditional text categorization, articles within a category share vocabulary and writing style (e.g., sports articles resemble each other)
– Misinformation is heterogeneous: two rumors are unlikely to resemble each other (e.g., a rumor about the presidential election vs. a rumor about the Ferguson protests)
Early Detection (II): Lack of Labels
• Utilize user responses to prior misinformation
– Cluster misinformation with similar responses
– Select effective features shared within a cluster
• Example responses to one rumor:
– "Can't fix stupid but it can be blocked"
– "So, when did bearing false witness become a Christian value?"
– "Christians Must Support Trump or Face Death Camps. Does he still claim to be a Christian?"
• Example responses to a different rumor:
– "i've just seen the sign on fb. you can't fix stupid"
– "THIS IS PURE INSANITY. HOW ABOUT THIS STATEMENT"
– "No Mother Should Have To Fear For Her Son's Life Every Time He Robs A Store"
Wu et al. "Gleaning Wisdom from the Past: Early Detection of Emerging Rumors in Social Media." SDM 2017.
Early Detection Results: Lack of Data
• Effectiveness of linkage
(Figures: classification results without linkage, with hashtag linkage, and with web linkage.)
Early Detection Results: Lack of Labels
• Effectiveness of response-based features
(Figures: effectiveness of different methods over time; results at an early stage, 2 hours in.)
Overview of Misinformation Detection
(Recap of the detection overview slide: content [1-4]; context, i.e., when [5, 6]; propagation, i.e., who and how [7, 8]; early detection under lack of data [9] and lack of labels [10]. See the reference list on the earlier overview slide.)
Spread of Misinformation
Introduction
➢ What is Misinformation and Why It Spreads on Social Media
➢ Modeling the Spread of Misinformation
➢ Open Questions
○ What techniques are used to boost misinformation?
Pheme
❖ Wartime studies, types of rumors (e.g., pipe dreams) [Knapp 1944; Allport & Postman 1947]
❖ "Demand" for improvised news [Shibutani 1968]
❖ Two-step information diffusion [Katz & Lazarsfeld 1955]
❖ Reputation exchange [Rosnow & Fine 1976]
❖ Collective sensemaking, watercooler effect [Bordia & DiFonzo 2004]

"Swift is her walk, more swift her winged haste:
A monstrous phantom, horrible and vast.
As many plumes as raise her lofty flight,
So many piercing eyes inlarge her sight;
Millions of opening mouths to Fame belong,
And ev'ry mouth is furnish'd with a tongue,
And round with list'ning ears the flying plague is hung."
(Aeneid, Book IV)
Source: "Fake News. It's Complicated." First Draft News, medium.com/1st-draft
Echo Chambers
What is the role of online social networks and social media in fostering echo chambers, filter bubbles, segregation, and polarization?
Adamic & Glance (2005) [blogs]; Conover et al. (2011) [Twitter]
Recap: What Is Misinformation and Why It Spreads
❖ Misinformation has always existed
❖ Social media disseminate (mis)information very quickly
❖ Echo chambers insulate people from fact-checking and verification
Introduction
➢ What is Misinformation and How It Spreads
➢ Modeling the Spread of Misinformation
➢ Open Questions
○ What techniques are used to boost misinformation?
Models of Information Diffusion
❖ Compartmental models (SI, SIR, SIS, etc.) [Kermack and McKendrick 1927]
❖ Rumor-spreading models (DK, MT) [Daley and Kendall 1964; Maki 1973]
❖ Independent Cascade Model [Kempe et al. 2005]
❖ Threshold Model, Complex Contagion [Granovetter 1979; Centola 2010]

P_i(m) ∝ f(i): the probability of adopting a "meme" m at the i-th exposure, where f is monotonically increasing.
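The Independent Cascade Model from the list above can be simulated in a few lines; the graph and parameters are toy values:

```python
import random

def independent_cascade(graph, seeds, p, seed=42):
    """Simulate the Independent Cascade Model on a directed graph given as an
    adjacency-list dict: each newly activated node gets exactly one chance to
    activate each out-neighbor, succeeding with probability p."""
    rng = random.Random(seed)
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

# A toy line network 0 -> 1 -> 2 -> 3; with p = 1.0 the cascade reaches
# everyone, and with p = 0.0 it never leaves the seed set.
g = {0: [1], 1: [2], 2: [3]}
full = independent_cascade(g, seeds=[0], p=1.0)
none = independent_cascade(g, seeds=[0], p=0.0)
```

Threshold models differ in exactly the point the slide makes: adoption there depends on the fraction of already-active neighbors, not on independent per-edge coin flips.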
Simple vs Complex Contagion
❖ Complex contagion: strong concentration of communication inside communities
❖ Simple contagion: weak concentration
❖ Most memes spread like complex contagion
❖ Viral memes spread across communities more like diseases (simple contagion)
Weng et al. (2014) [Twitter]
Role of the Social Network and Limited Attention
❖ Spread among agents with limited attention on a social network is sufficient to explain virality patterns
❖ It is not necessary to invoke more complicated explanations based on intrinsic meme quality
Weng et al. (2014), Nature Sci. Rep.
Recap: Models of the Spread of Misinformation
❖ Simple vs complex contagion
❖ More realistic features:
➢ Agents have limited attention
➢ Social network structure
➢ Competition between different memes
❖ Tradeoff between information quality and diversity
Introduction
➢ What is Misinformation and Why It Spreads
➢ Modeling the Spread of Misinformation
➢ Open Questions
○ What techniques are used to boost misinformation?
Conclusions
❖ What is misinformation and why it spreads
➢ Online, it spreads through a mix of social, cognitive, and algorithmic biases.
❖ Modeling the spread of misinformation
➢ Social network structure, limited attention, and information overload make us vulnerable to misinformation.
❖ Open questions
➢ Bots are strategic super-spreaders and are effective at spreading misinformation.
❖ Tools to detect manipulation of public opinion may be first steps toward a trustworthy Web.
Recap: Open Questions
❖ Social bots amplify misinformation
➢ through social reinforcement
➢ through early amplification
➢ by targeting humans, possibly "influentials"
Supply of and Demand for Information
❖ Production of information is associated with shifts in collective attention
❖ Evidence that attention precedes production
❖ Higher demand → higher price → more production
Ciampaglia et al., Scientific Reports 2015 (source: Wikipedia)
Predicting Virality
Ingredients: structural trapping, social reinforcement, homophily.
M1: random sampling model
M2: random cascading model (structural trapping)
M3: social reinforcement model (structural trapping + social reinforcement)
M4: homophily model (structural trapping + homophily)
The models range from simple contagion to complex contagion.
Weng et al. (2014) [Twitter]
Misinformation Spreader Detection
Misinformation in Social Media: An Example
(Recap of the earlier example: a misinformation spreader; content of misinformation: text, hashtags, URLs, emoticons, images, video/GIF; context: date, time, location; propagation: retweets, replies, likes.)
Detecting Misinformation by Its Spreaders
• A large portion of OSN accounts are likely to be fake
– Facebook: an estimated 67.65 million to 137.76 million fake accounts
– Twitter: an estimated 48 million
A Misinformation Spreader
• Misinformation spreaders: users who deliberately spread misinformation to mislead others
• Example: a phishing link pointing to "Twivvter.com"
Types of Misinformation Spreaders
• Spammers
• Fraudsters
• Trolls
• Crowdturfers
• …
(Figure: the misinformation ecosystem of fake news, rumors, spam, and clickbait, and the spreaders behind them, e.g., spammers and fraudsters.)
Features for Capturing a Spreader
• What can be used to detect a spammer?
– Profile (profile features)
– Posts (text features)
– Friends (network features)
Feature Engineering: Profile
• Extracting features from a profile
– #followers, #followees (e.g., a small #followers suggests a suspicious account)
– Biography, registration time, screen name, etc.
Feature Engineering: Text
• Extracting text features from user posts
– Text: bag-of-words, TF-IDF, etc., yielding a textual feature vector per user
Feature Engineering: Network
• Extracting network features
– Network: adjacency matrix, number of followers, follower/followee ratio, centrality
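A small sketch of network feature extraction from an adjacency matrix; the graph and users are hypothetical:

```python
import numpy as np

# Toy directed follower graph: A[i, j] = 1 means user i follows user j.
# The users and edges are made up for illustration.
users = ["alice", "bob", "spammer"]
A = np.array([
    [0, 1, 0],   # alice follows bob
    [1, 0, 0],   # bob follows alice
    [1, 1, 0],   # the spammer follows both, hoping for follow-backs
])

followers = A.sum(axis=0)   # column sums: how many accounts follow each user
followees = A.sum(axis=1)   # row sums: how many accounts each user follows

# Follower/followee ratio; a very low ratio is a classic spam signal.
ratio = followers / np.maximum(followees, 1)
```

Richer network features (centrality, community membership) are computed from the same matrix.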
Overview: Misinformation Spreader Detection
Methods, by the data they use:
• Content: text mining [1]
• Content + network: text and graph mining [2, 3]
• Content + network under camouflage: instance (post/user) selection [4, 5, 6]

[1] Jindal, Nitin, and Bing Liu. "Review spam detection." WWW 2007.
[2] Hu, X., Tang, J., Zhang, Y., and Liu, H. "Social Spammer Detection in Microblogging." IJCAI 2013.
[3] Song, Yuqi, et al. "PUD: Social Spammer Detection Based on PU Learning." International Conference on Neural Information Processing. Springer, Cham, 2017.
[4] Wu, L., Hu, X., Morstatter, F., and Liu, H. "Adaptive Spammer Detection with Sparse Group Modeling." ICWSM 2017, pp. 319-326.
[5] Wu, Liang, et al. "Detecting Camouflaged Content Polluters." ICWSM 2017.
[6] Hooi, Bryan, et al. "Fraudar: Bounding graph fraud in the face of camouflage." KDD 2016.
Supervised Learning: Content + Network
• Features for supervised learning
– Text features (textual feature vectors)
– Network features (the adjacency matrix)
– Profile features
Traditional Approach: Content Modeling
• Supervised learning with text features
– Assumption: positive and negative accounts can be distinguished by their text
– A linear model maps each textual feature vector to a label; the coefficients are estimated from labeled accounts
Traditional Approach: Network Modeling
• Supervised learning with network features
– Assumption: friends are likely to have the same label
– The adjacency matrix is used to push connected users toward the same prediction
Emerging Challenge: Camouflage
• Content camouflage
– Copying content from legitimate users
– Exploiting compromised accounts
• Network camouflage
– Link farming with other spreaders and bots
– Link farming with normal users
Challenge (I): Camouflage
• To avoid being detected, spreaders:
– manipulate the text feature vector by posting content similar to regular users', undermining the assumption that accounts can be distinguished by text
– manipulate the adjacency matrix by harvesting links with other users, undermining the assumption that friends share the same label
Challenge (II): Network Camouflage
• Network camouflage undermines heuristic-based methods such as:
– #followers
– follower/followee ratio
– anomaly detection
Challenge III: Limited Label Information
• Labeling malicious accounts (positive examples)
– Suspended accounts
– Honeypots: bots created to lure other bots in the wild; any user that follows them is assumed to be a bot, on the assumption that normal users can easily recognize them
• Labeled camouflage, however, remains lacking
Camouflage
• Prior assumptions broken by camouflage:
– All suspended accounts are misinformation spreaders
– All posts of a spreader are malicious
• Remedies:
– Select a subset of users for training
– Select a subset of posts for training
Selecting Users for Training
• How to select the optimal set for training? Iterate:
1. Select a subset of users for training
2. Evaluate with a validation set
3. Update the training set
Wu et al. "Adaptive Spammer Detection with Sparse Group Modeling." ICWSM 2017.
Relaxation I: Group Structure
• Assumption: malicious accounts cannot join a legitimate community
– Organize users into groups
– Users in the same group should be similarly weighted
• Example group hierarchy (G_j^i denotes group j on layer i):
– Layer 0: G_1^0 = {1, 2, 3, 4, 5, 6, 7}
– Layer 1: G_1^1 = {1, 2, 3, 4}, G_2^1 = {5}, G_3^1 = {6, 7}
– Layer 2: G_1^2 = {1, 2}, G_2^2 = {3, 4}
Wu et al. "Adaptive Spammer Detection with Sparse Group Modeling." ICWSM 2017.
Relaxation II: Weighted Training
The weighted training objective is

min_{w,c} Σ_{i=1}^{N} c_i (x_i w − y_i)^2 + λ_1 ||w||_2^2 + λ_2 Σ_{i=0}^{d} Σ_{j=1}^{n_i} ||c_{G_j^i}||_2
subject to Σ_i c_i = K, 0 < c_i < 1

where the λ_1 term avoids overfitting and the λ_2 term is a group Lasso: an L1 norm at the inter-group level over an L2 norm at the intra-group level.
– d: depth of the hierarchy produced by the Louvain method
– n_i: number of groups on layer i
– c_{G_j^i}: weights of the nodes of group j on layer i
Wu et al. "Adaptive Spammer Detection with Sparse Group Modeling." ICWSM 2017.
Optimization
Alternating optimization, step 1: fix c and optimize w. The subproblem

min_w Σ_{i=1}^{m} c_i (x_i w − y_i)^2 + λ_1 ||w||_2^2

is a weighted ridge regression. Notation: c_i is the weight of instance i, x_i its attribute vector, w the regression coefficients, y_i the label, m the number of instances; ||w||_2^2 avoids overfitting.
Wu et al. "Adaptive Spammer Detection with Sparse Group Modeling." ICWSM 2017.
96. Optimization
Full objective:
min_{w,c} Σ_{i=1}^{N} c_i (x_i w − y_i)^2 + λ1 ||w||_2^2 + λ2 Σ_{i=0}^{d} Σ_{j=1}^{n_i} ||c_{G_j^i}||_2
subject to Σ_i c_i = K
Step 2: fix w and optimize c. Let t_i = (x_i w − y_i)^2 be the (now constant) residual of instance i; the λ1 term is also constant, leaving:
min_c Σ_{i=1}^{N} c_i t_i + λ2 Σ_{i=0}^{d} Σ_{j=1}^{n_i} ||c_{G_j^i}||_2
subject to Σ_i c_i = K
Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
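The alternating scheme on these two slides can be sketched in a few lines of numpy. This is a simplified illustration, not the paper's implementation: the group-Lasso term is dropped from the c-step and the strict inequalities 0 < c_i < 1 are relaxed to a box constraint, under which minimizing Σ c_i t_i with Σ c_i = K simply keeps the K instances with the smallest residuals. All data and parameter values are made up for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: N instances, p features, labels from a linear model plus noise.
# The first 5 labels are corrupted, mimicking mislabeled training instances.
N, p, K = 40, 3, 30
X = rng.normal(size=(N, p))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=N)
y[:5] += 10.0                       # corrupted labels

lam1 = 0.1
c = np.ones(N)                      # start with all instances weighted equally

for _ in range(10):
    # Step 1: fix c, solve the weighted ridge regression in closed form:
    # w = (X^T C X + lam1 I)^{-1} X^T C y
    C = np.diag(c)
    w = np.linalg.solve(X.T @ C @ X + lam1 * np.eye(p), X.T @ C @ y)
    # Step 2: fix w, update c. With the group-Lasso term dropped and the
    # box constraint 0 <= c_i <= 1, the minimizer of sum_i c_i t_i subject
    # to sum_i c_i = K puts weight 1 on the K smallest residuals.
    t = (X @ w - y) ** 2
    c = np.zeros(N)
    c[np.argsort(t)[:K]] = 1.0

print(c[:5])  # the corrupted instances should end up with weight 0
```

The corrupted instances accumulate large residuals, get weight 0 in the c-step, and the final w is then fit only on clean data; the group term (omitted here) additionally pushes users from the same community to be selected or discarded together.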
97. Experimental Results
• Collecting data with honeypots: http://infolab.tamu.edu/data/
Tweets | Users | ReTweets | Links | Spammers
4,453,380 | 38,400 | 223,115 | 8,739,105 | 19,200

Approaches | Precision | Recall | F-Score
SSDM [1] | 92.15% | 92.00% | 92.07%
NFS [2] | 88.16% | 65.67% | 75.27%
SGASD | 93.75% | 96.92% | 95.31%

[1] Hu et al. "Social Spammer Detection in Microblogging." IJCAI 2013.
[2] Ye et al. "Discovering Opinion Spammer Groups by Network Footprints." ECML-PKDD 2015.
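As a sanity check, each F-score reported above is the harmonic mean of the corresponding precision and recall:

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# Reproduce the table's F-scores from the reported precision/recall:
for name, p, r in [("SSDM", 0.9215, 0.9200),
                   ("NFS", 0.8816, 0.6567),
                   ("SGASD", 0.9375, 0.9692)]:
    print(f"{name}: {100 * f_score(p, r):.2f}%")
# SSDM: 92.07%, NFS: 75.27%, SGASD: 95.31%
```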
98. Content Camouflage
• Basic assumption of traditional methods:
– All content of a misinformation spreader is malicious
• Content camouflage: posts of a misinformation spreader may be legitimate
– Copy content from legitimate users
– Exploit compromised accounts
99. Content Camouflage: An Example
[Screenshot: two normal-looking posts published by a misinformation spreader]
100. Challenge: Lack of Labeled Data
• Labels of camouflage are costly to collect
101. Learning to Identify Camouflage
• Assumption: posts of misinformation spreaders are a mix of normal and malicious content.
• Introduce a weight for each post label.
• Select the posts that distinguish misinformation spreaders from normal users.
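The weighting idea can be illustrated with a small, hedged sketch. This is not the paper's formulation: the data is synthetic, and the centroid-based scoring rule below is a stand-in chosen only to show how camouflage posts receive low weights while discriminative posts receive high ones:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy feature vectors (e.g. bag-of-words projections). Spreader accounts
# contain malicious posts plus camouflage posts copied from normal users.
normal_posts = rng.normal(loc=0.0, size=(50, 5))
malicious_posts = rng.normal(loc=3.0, size=(30, 5))
camouflage_posts = rng.normal(loc=0.0, size=(20, 5))   # look normal
spreader_posts = np.vstack([malicious_posts, camouflage_posts])

# Weight each spreader post by how much closer it sits to the spreader
# centroid than to the normal centroid. Camouflage posts land near the
# normal centroid, so only genuinely malicious posts get high weight.
mu_norm = normal_posts.mean(axis=0)
mu_spread = spreader_posts.mean(axis=0)
d_norm = np.linalg.norm(spreader_posts - mu_norm, axis=1)
d_spread = np.linalg.norm(spreader_posts - mu_spread, axis=1)
weights = d_norm / (d_norm + d_spread)   # in (0, 1)

print(round(weights[:30].mean(), 2), round(weights[30:].mean(), 2))
```

Downweighting the camouflage posts keeps a classifier trained on spreader accounts from learning that normal-looking content is a spreader signal.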
102. Learning to Identify Camouflage: Formulation
Wu et al. Detecting Camouflaged Content Polluters. ICWSM 2017
103. Experimental Results
• Findings:
– Sophisticated misinformation spreaders first disguise themselves, and then do harm.
• Results: (reported as a table in the original slides)
Wu et al. Detecting Camouflaged Content Polluters. ICWSM 2017
104. Misinformation Spreader Detection
[1] Jindal and Liu. "Review Spam Detection." WWW 2007.
[2] Hu, Tang, Zhang, and Liu. "Social Spammer Detection in Microblogging." IJCAI 2013.
[3] Song et al. "PUD: Social Spammer Detection Based on PU Learning." ICONIP 2017.
[4] Wu, Hu, Morstatter, and Liu. "Adaptive Spammer Detection with Sparse Group Modeling." ICWSM 2017.
[5] Wu et al. "Detecting Camouflaged Content Polluters." ICWSM 2017.
[6] Hooi et al. "FRAUDAR: Bounding Graph Fraud in the Face of Camouflage." KDD 2016.
Data → Methods:
• Content → Text Mining [1]
• Content + Network → Text + Graph Mining [2, 3]
• Content + Network + Camouflage → Instance (Post/User) Selection [4, 5, 6]
105. Challenges in Dealing with Misinformation
• Large-scale
– Misinformation can be rampant
• Dynamic
– It can happen fast
• Deceiving
– Hard to verify
• Homophily
– Consistent with one’s beliefs
106. Codes, Platforms and Datasets
107. Platforms
• TweetTracker: Detecting Topic-centric Bots
• Hoaxy: Tracking Online Misinformation
• Botometer: Detecting Bots on Twitter
108. Fact-Checking Websites
• Online Fact-checking websites
– PolitiFact: http://www.politifact.com/
– Truthy: http://truthy.indiana.edu/
– Snopes: http://www.snopes.com/
– TruthOrFiction: https://www.truthorfiction.com/
– Weibo Rumor: http://service.account.weibo.com/
• Volunteering committee
109. Code and Data Repositories
• Honeypot: http://bit.ly/ASUHoneypot
• Identification: https://veri.ly/
• Diffusion:
– Python Networkx: https://networkx.github.io/
– Stanford SNAP: http://snap.stanford.edu/
• Datasets
– http://socialcomputing.asu.edu/pages/datasets
– http://bit.ly/asonam-bot-data
– https://github.com/jsampso/AMNDBots
– http://carl.cs.indiana.edu/data/#fact-checking
– http://snap.stanford.edu/data/index.html
110. Book Chapters
• "Mining Misinformation in Social Media", Chapter 5 in Big Data in Complex and Social Networks
– http://bit.ly/2AYr5KM
• "Detecting Crowdturfing in Social Media", in Encyclopedia of Social Network Analysis and Mining
– http://bit.ly/2hE6LXE
111. Twitter Data Analytics
• Common tasks in mining Twitter data:
– Collection
– Analysis
– Visualization
• Free download with code & data
tweettracker.fulton.asu.edu/tda/
112. Social Media Mining
• Social Media Mining: An Introduction (a textbook)
• A comprehensive coverage of social media mining techniques:
– Free Download
– Network Measures and Analysis
– Influence and Diffusion
– Community Detection
– Classification and Clustering
– Behavior Analytics
http://dmml.asu.edu/smm/
113. Challenges in Dealing with Misinformation
• Large-scale
– Misinformation can be rampant
• Dynamic
– It can happen fast
• Deceiving
– Hard to verify
• Homophily
– Consistent with one’s beliefs
114. Q&A
• Liang Wu, Giovanni Luca Ciampaglia, Huan Liu
– Liang Wu, Huan Liu: {wuliang, huanliu}@asu.edu
– Giovanni Luca Ciampaglia: gciampag@indiana.edu
• All materials and resources are available online:
http://bit.ly/ICDMTutorial
115. Acknowledgements
• DMML @ ASU
• NaN @ IUB
• MINERVA initiative through the ONR N000141310835 on
Multi-Source Assessment of State Stability