Arizona State University Mining Misinformation in Social Media, November 21, 2017 1
Mining Misinformation in Social Media:
Understanding Its Rampant Spread, Harm, and Intervention
Liang Wu1, Giovanni Luca Ciampaglia2, Huan Liu1
1Arizona State University
2Indiana University Bloomington
Tutorial Web Page
• All materials and resources are available
online:
http://bit.ly/ICDMTutorial
Introduction
Definition of Misinformation
• False or inaccurate information that is spontaneously spread.
• Misinformation can be…
– Disinformation
– Rumors
– Urban legend
– Spam
– Troll
– Fake news
– …
https://www.kdnuggets.com/2016/08/misinformation-key-terms-explained.html
Ecosystem of Misinformation
• Spreaders (motivations: spammer, fraudster, …)
– Fabrication
• Misinformation
– Fake news, rumors, spam, clickbait, …
• Influenced Users
– Echo chamber
– Filter bubble
Misinformation Ramification
• Top 10 global risks highlighted for 2014 (World Economic Forum):
– 1. Rising societal tensions in the Middle East and North Africa
– 2. Widening income disparities
– 3. Persistent structural unemployment
– …
– 10. The rapid spread of misinformation online
Word of The Year
• Macquarie Dictionary Word of the Year 2016: “fake news”
• Oxford Dictionaries Word of the Year 2016: “post-truth”
Social Media
• Social media has changed how information is exchanged and obtained.
• 500 million tweets are posted per day
– An effective channel for information dissemination
Social Media: A Channel for Misinformation
• False and inaccurate information is pervasive.
• Misinformation can be devastating.
– Cause undesirable consequences
– Wreak havoc
• Echo chamber: misinformation can be reinforced
• Filter bubble: misinformation can be targeted
Two Examples
• PizzaGate – Fake News has Real Consequences
– What made Edgar Maddison Welch “raid” a “pedo ring” on 12/4/2016?
– It all started with a post on Facebook, spread to Twitter, and then went viral via platforms like Breitbart and InfoWars
• Anti-Vaccine Movement on Social Media: A
case of echo chambers and filter bubbles
– Peer-to-peer connection
– Groups
– Facebook feeds
PizzaGate
https://www.nytimes.com/interactive/2016/12/10/business/media/pizzagate.html
1. WikiLeaks began releasing emails of John Podesta. (Oct-Nov 2016)
2. Social media users on Reddit searched the releases for evidence of wrongdoing.
3. Discussions were found that included the word “pizza,” including dinner plans.
4. A participant connected the phrase “cheese pizza” to pedophiles (“c.p.” → child pornography).
5. Following the use of “pizza,” theorists focused on the Washington pizza restaurant Comet Ping Pong.
6. The theory snowballed, taking on the meme #PizzaGate; fake news articles emerged.
7. The false stories swept up neighboring businesses and bands that had played at Comet. Theories about kill rooms, underground tunnels, satanism, and even cannibalism emerged.
8. Edgar M. Welch, a 28-year-old from North Carolina, fired a rifle inside the pizzeria (Dec 4, 2016) and surrendered after finding no evidence to support claims of child slaves.
9. The shooting did not put the theory to rest: purveyors of the theory and fake news pointed to the mainstream media as conspirators in a coverup to protect what they said was a crime ring. (2016-2017)
Challenges in Dealing with Misinformation
• Large-scale
– Misinformation can be rampant
• Dynamic
– It can happen fast
• Deceiving
– Hard to verify
• Homophily
– Consistent with one’s beliefs
Overview of Today’s Tutorial
• Introduction
• Misinformation Detection
• Misinformation in Social Media
• Misinformation Spreader Detection
• Resources
Session lengths: 40, 20, 40, 10, and 10 minutes.
Misinformation Detection
Misinformation in Social Media: An Example
Misinformation Spreader
Content of Misinformation
• Text
• Hashtag
• URL
• Emoticon
• Image
• Video (GIF)
Context of Misinformation
• Date, Time
• Location
Propagation of Misinformation
• Retweet
• Reply
• Like
Overview of Misinformation Detection
• Content: individual messages or message clusters, analyzed with supervised classification or unsupervised anomaly detection [1, 2, 3, 4]
• Context (when): anomalous timing of bursts [5, 6]
• Propagation (who and how): the spreaders of a message and its spreading pathways [7, 8]
• Early Detection: lack of data [9]; lack of labels [10]
[1] Qazvinian et al. "Rumor has it: Identifying misinformation in microblogs." EMNLP 2011.
[2] Castillo et al. "Predicting information credibility in time-sensitive social media." Internet Research 23.5 (2013).
[3] Zubiaga et al. "Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media."
[4] Wu et al. "Information Credibility Evaluation on Social Media." AAAI. 2016.
[5] Wang et al. "Detecting rumor patterns in streaming social media." IEEE BigData, 2015.
[6] Kwon et al. "Modeling Bursty Temporal Pattern of Rumors." ICWSM. 2014.
[7] Wu et al. "Characterizing Social Media Messages by How They Propagate." WSDM 2018.
[8] Ma et al. "Detect Rumors in Microblog Posts Using Propagation Structure via Kernel Learning." ACL 2017.
[9] Sampson et al. "Leveraging the Implicit Structure within Social Media for Emergent Rumor Detection." CIKM 2016.
[10] Wu et al. "Gleaning Wisdom from the Past: Early Detection of Emerging Rumors in Social Media." SDM 2017.
Feature Engineering on Content
• Length of post: #words, #characters
• Punctuation marks: question mark (?), exclamation mark (!)
• Emojis/emoticons: e.g., an angry-face emoticon
• Sentiment: sentiment/swear/curse words
• Pronouns (1st, 2nd, 3rd person): I, me, myself, my, mine
• URL: presence, PageRank of the domain
• Mention (@)
• Hashtag (#)
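The feature table above can be sketched as a small extractor; a minimal illustration in which the feature names and the example tweet are made up:

```python
import re

def extract_text_features(post):
    # Toy extractor mirroring the feature table above (names are illustrative).
    words = post.split()
    return {
        "n_words": len(words),                 # length of post in words
        "n_chars": len(post),                  # length in characters
        "n_question": post.count("?"),         # question marks
        "n_exclaim": post.count("!"),          # exclamation marks
        "n_first_person": sum(
            w.lower().strip(".,!?") in {"i", "me", "my", "mine", "myself"}
            for w in words),                   # 1st-person pronouns
        "n_urls": len(re.findall(r"https?://\S+", post)),
        "n_mentions": len(re.findall(r"@\w+", post)),
        "n_hashtags": len(re.findall(r"#\w+", post)),
    }

features = extract_text_features(
    "BREAKING!! Sharks in the subway?! I saw it myself @cnn #Harvey http://bit.ly/x")
```

A real pipeline would feed such vectors into the classifiers discussed in the following slides.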
Misinformation Detection: Text Matching
• Text Matching
– Exact matching
– Relevance
• TF-IDF
• BM25
– Semantic
• Word2Vec
• Doc2Vec
• Drawbacks
– Low Recall
Starbird, Kate, et al. "Rumors, false flags, and digital vigilantes: Misinformation on twitter after the 2013 boston marathon bombing."
iConference 2014 Proceedings (2014).
Jin, Zhiwei, et al. "Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter." International Conference on Social
Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer, Cham, 2017.
(Figure: known fake news items are matched against new posts at increasing levels of abstraction: exact duplication, similar words (relevance), similar semantic representation; posts expressing the same claim with a different representation are missed, hence the low recall.)
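The relevance level of matching can be sketched with TF-IDF vectors and cosine similarity; a minimal illustration with made-up example posts (a library such as scikit-learn would normally be used):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # Smoothed TF-IDF vectors (as dicts) for whitespace-tokenized documents.
    tokenized = [d.lower().split() for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    return [{t: tf * (math.log((1 + n) / (1 + df[t])) + 1)
             for t, tf in Counter(toks).items()} for toks in tokenized]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

known = "sharks swimming in the streets of houston"            # known fake claim
candidate = "sharks are swimming in houston streets right now"  # new post
unrelated = "pizza delivered by kayak through flood waters"
vecs = tfidf_vectors([known, candidate, unrelated])
sim_known = cosine(vecs[1], vecs[0])
sim_unrelated = cosine(vecs[1], vecs[2])
```

The candidate scores high against the known fake claim but zero against the unrelated post; a rephrased claim with no shared words would also score zero, which is exactly the low-recall drawback noted above.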
Misinformation Detection: Supervised Learning
• Message-based
– A vector represents a tweet
• Message cluster-based
– A vector represents a cluster of tweets
• Methods
– Random Forest
– SVM
– Naïve Bayes
– Decision Tree
– Maximum Entropy
– Logistic Regression
(Figure: pick the data granularity, individual posts vs. clusters of posts, then pick a method, e.g., Random Forest or SVM.)
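One of the listed methods, Naive Bayes, is simple enough to sketch end to end; the training posts and labels below are made up for illustration:

```python
import math
from collections import Counter, defaultdict

def train_nb(posts, labels):
    # Multinomial Naive Bayes with Laplace smoothing over word counts.
    word_counts, class_counts, vocab = defaultdict(Counter), Counter(labels), set()
    for post, y in zip(posts, labels):
        toks = post.lower().split()
        word_counts[y].update(toks)
        vocab.update(toks)
    return word_counts, class_counts, vocab

def predict_nb(model, post):
    word_counts, class_counts, vocab = model
    n = sum(class_counts.values())
    def log_posterior(y):
        total = sum(word_counts[y].values())
        return (math.log(class_counts[y] / n)
                + sum(math.log((word_counts[y][t] + 1) / (total + len(vocab)))
                      for t in post.lower().split()))
    return max(class_counts, key=log_posterior)

model = train_nb(
    ["breaking shocking secret they dont want you to know",
     "unbelievable shocking cover up exposed",
     "city council approves new budget",
     "local team wins the game"],
    ["rumor", "rumor", "news", "news"])
```

Swapping in Random Forest, SVM, or logistic regression only changes the model; the message-based vs. cluster-based featurization stays the same.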
Visual Content-based Detection
• Diversity of Images
Jin, Zhiwei, et al. "Novel visual and statistical image features for microblogs news verification."
IEEE Transactions on Multimedia 19.3 (2017): 598-608.
Example image captions: “Texas Pizza Hut workers paddle through flood waters to deliver free pizzas by kayak” (real) vs. “There are sharks swimming in the streets of Houston during Hurricane Harvey” (fake)
References
• Starbird, Kate, et al. "Rumors, false flags, and digital vigilantes: Misinformation on
twitter after the 2013 boston marathon bombing." iConference 2014 Proceedings
(2014).
• Jin, Zhiwei, et al. "Detection and Analysis of 2016 US Presidential Election Related
Rumors on Twitter." International Conference on Social Computing, Behavioral-
Cultural Modeling and Prediction and Behavior Representation in Modeling and
Simulation. Springer, Cham, 2017.
• Gupta, Aditi, and Ponnurangam Kumaraguru. "Credibility ranking of tweets during
high impact events." Proceedings of the 1st workshop on privacy and security in
online social media. ACM, 2012.
• Yu, Suisheng, Mingcai Li, and Fengming Liu. "Rumor Identification with Maximum
Entropy in MicroNet.“
• Yang, Fan, et al. "Automatic detection of rumor on sina weibo." Proceedings of the
ACM SIGKDD Workshop on Mining Data Semantics. ACM, 2012.
• Zhang, Qiao, et al. "Automatic detection of rumor on social network." Natural
Language Processing and Chinese Computing. Springer, Cham, 2015. 113-122.
• Castillo, Carlos, Marcelo Mendoza, and Barbara Poblete. "Information credibility on
twitter." Proceedings of the 20th international conference on World wide web.
ACM, 2011.
• Castillo, Carlos, Marcelo Mendoza, and Barbara Poblete. "Predicting information
credibility in time-sensitive social media." Internet Research 23.5 (2013): 560-588.
• Qazvinian, Vahed, et al. "Rumor has it: Identifying misinformation in microblogs."
Proceedings of the Conference on Empirical Methods in Natural Language
Processing. Association for Computational Linguistics, 2011.
• Wu, Shu, et al. "Information Credibility Evaluation on Social Media." AAAI. 2016.
Modeling Message Sequence
• Bag-of-words approaches ignore the chronological order of messages
• Messages are generated as a temporal sequence
– Modeling posts as unordered documents discards this structural information
Modeling Post Sequence: Message-based
• Message-based
– Conditional Random Fields (CRF): a linear-chain CRF labels each post in the timeline, conditioning on neighboring posts
Zubiaga, Arkaitz, Maria Liakata, and Rob Procter. "Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media."
Modeling Post Sequence: Cluster-based
• Message cluster-based
– Recurrent Neural Networks: the posts of a cluster form a temporal sequence fed to an RNN topped by a classifier layer
Ma et al. "Detecting Rumors from Microblogs with Recurrent Neural Networks." IJCAI. 2016.
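The sequence-to-score idea can be sketched as a forward pass of a tiny vanilla RNN; the weights below are fixed toy values, whereas in the cited work they are learned from data (and the inputs are tf-idf vectors per time interval, not single numbers):

```python
import math

def rnn_score(sequence, Wx, Wh, b, Wo):
    # Forward pass of a minimal vanilla RNN (hidden size 2) over per-post
    # feature values, ending in a sigmoid rumor score in (0, 1).
    h = [0.0, 0.0]
    for x in sequence:
        h = [math.tanh(Wx[j] * x + sum(Wh[j][k] * h[k] for k in range(2)) + b[j])
             for j in range(2)]
    logit = sum(Wo[j] * h[j] for j in range(2))   # classifier layer on last state
    return 1.0 / (1.0 + math.exp(-logit))

score = rnn_score([0.2, 0.9, 0.8],
                  Wx=[1.0, -0.5], Wh=[[0.5, 0.1], [0.0, 0.5]],
                  b=[0.0, 0.1], Wo=[2.0, -1.0])
```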
Personalized Misinformation Detection (PCA)
• Detecting anomalous content of a user with PCA
• Main assumption
– Misinformation likely to be eccentric to normal
content of a user
• Detecting misinformation as content outliers
– Tweet-based modeling
– Measure the distance between a new message and the user's historical posts
Zhang, Yan, et al. "A distance-based outlier detection method for rumor detection exploiting user behaviorial differences." ICoDSE 2016.
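The distance test can be sketched as follows; note this simplification measures plain Euclidean distance to the centroid of a user's historical post vectors, whereas the cited method works in a PCA-projected space:

```python
import math

def is_outlier(new_post, history, threshold):
    # Flag a post whose distance to the centroid of the user's historical
    # post vectors exceeds a threshold (Euclidean distance used for brevity;
    # the cited method projects with PCA first).
    d = len(history[0])
    centroid = [sum(v[i] for v in history) / len(history) for i in range(d)]
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_post, centroid)))
    return dist > threshold

history = [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]]   # a user's usual topic profile
eccentric = is_outlier([5.0, 5.0], history, threshold=2.0)
typical = is_outlier([1.05, 0.05], history, threshold=2.0)
```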
Personalized Misinformation Detection (Autoencoder)
• Detecting anomalous content of a user with Autoencoder
• Multi-layer Autoencoder
– Train an autoencoder with historical data
– To test a message:
• Feed it to the autoencoder
• Obtain the reconstructed data
• Calculate distance between the original and the
reconstruction
Zhang, Yan, et al. "Detecting rumors on Online Social Networks using multi-layer autoencoder." IEEE TEMSCON 2017.
Detecting Misinformation with Context
Context of Misinformation
• Date, Time
• Location
Peak Time of Misinformation
(Left: misinformation on Twitter; Kwon et al. "Modeling Bursty Temporal Pattern of Rumors." ICWSM 2014. Right: misinformation on Facebook; Friggeri et al. "Rumor Cascades." ICWSM 2014.)
• Rebirth of misinformation
– Misinformation has multiple peaks over time
– True information typically has only one peak
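The multiple-peaks signal can be sketched as counting local maxima in a daily volume series; the series below are made-up counts for illustration:

```python
def count_peaks(series):
    # Count strict local maxima in a daily message-volume series.
    return sum(1 for i in range(1, len(series) - 1)
               if series[i] > series[i - 1] and series[i] > series[i + 1])

rumor_volume = [1, 5, 2, 1, 6, 2, 1, 7, 1]   # recurring bursts (rebirth)
news_volume = [1, 3, 9, 4, 2, 1]             # a single burst
```

A production system would smooth the series and require a minimum peak height before counting.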
Detecting Misinformation with Propagation
Propagation of Misinformation
• Retweet
• Reply
• Like
• Misinformation is spread by similar users
– Bot army
– Echo chamber of misinformation
• Intuition: misinformation can be distinguished
by who spreads it, and how it is spread
• Challenges
– Users may change accounts (bot army)
– Data sparsity
Detecting Misinformation with Propagation
• User embedding: users are embedded into vector representations from their posts, network links, and community memberships
• Message classification: a message's propagation pathway (the sequence of users who spread it) is modeled as a sequence of user embeddings and fed to a classifier
Liang Wu, Huan Liu. "Characterizing Social Media Messages by How They Propagate." WSDM 2018.
Key Issue for Misinformation Detection
• April 2013: the hacked AP account tweeted “Two Explosions in the White House and Barack Obama is injured”
– Truth: hackers had compromised the account
– Before the hack was exposed, the false tweet briefly wiped about $136 billion off the stock market within 2 minutes
Early Detection of Misinformation: Challenges
• Challenges of early detection
–Message cluster based methods
• Lack of data
–Supervised learning methods
• Lack of labels
Early Detection Challenge I: Lack of Data
• Lack of data
• Early stage: few posts sparsely scattered
• Most methods prove effective in a later stage
(Figure: a few scattered posts at the early stage vs. many connected posts at a later stage.)
Early Detection: Lack of Data
• Linking scattered messages
– Clustering messages
– Merge individual messages
• Hashtag linkage
• Web Linkage
Sampson et al. "Leveraging the implicit structure within social media for emergent rumor detection." CIKM 2016.
(Messages are linked when they share a hashtag or a web link.)
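The merging step can be sketched with union-find over shared hashtags and links; the messages below are made up and described only by their hashtags and URLs:

```python
def link_messages(messages):
    # Merge messages that share a hashtag or URL into clusters via union-find.
    parent = list(range(len(messages)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    seen = {}  # hashtag/url -> index of the first message that used it
    for i, msg in enumerate(messages):
        for key in msg["hashtags"] | msg["urls"]:
            if key in seen:
                parent[find(i)] = find(seen[key])
            else:
                seen[key] = i
    clusters = {}
    for i in range(len(messages)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

msgs = [
    {"hashtags": {"#boston"}, "urls": set()},
    {"hashtags": {"#boston"}, "urls": {"bit.ly/a"}},
    {"hashtags": set(), "urls": {"bit.ly/a"}},
    {"hashtags": {"#weather"}, "urls": set()},
]
```

Messages 0-2 chain together through a shared hashtag and a shared link, giving a larger cluster to classify at an early stage.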
Early Detection Challenge II: Lack of Labels
• Lack of labels
– Traditional text categories
• Articles within the same category share similar vocabulary and writing styles
• E.g., sports news stories resemble one another
– Misinformation is heterogeneous
• Two rumors are unlikely to resemble each other, e.g., a rumor about the presidential election vs. one about the Ferguson protests
Early Detection (II): Lack of Labels
• Utilize user responses from prior
misinformation
– Clustering misinformation with similar responses
– Selecting effective features shared by a cluster
Responses to one rumor:
Post #1: “Can't fix stupid but it can be blocked”
Post #2: “So, when did bearing false witness become a Christian value?”
Post #3: “Christians Must Support Trump or Face Death Camps. Does he still claim to be a Christian?”
Responses to a different rumor:
Post #1: “i've just seen the sign on fb. you can't fix stupid”
Post #2: “THIS IS PURE INSANITY. HOW ABOUT THIS STATEMENT”
Post #3: “No Mother Should Have To Fear For Her Son's Life Every Time He Robs A Store”
Wu et al. "Gleaning Wisdom from the Past: Early Detection of Emerging Rumors in Social Media.“ SDM 2017
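The idea of reusing responses to prior rumors can be sketched with a crude vocabulary-overlap score; the cluster names and word sets below are made up, and this is a stand-in for the response-clustering and feature-selection pipeline of the cited work:

```python
def jaccard(a, b):
    # Jaccard similarity of two word sets.
    return len(a & b) / len(a | b) if a | b else 0.0

def match_prior_rumor(new_responses, prior_clusters):
    # Score a breaking story by how much its crowd responses resemble the
    # pooled response vocabulary of previously confirmed rumors.
    new_words = set()
    for r in new_responses:
        new_words |= set(r.lower().split())
    scores = {name: jaccard(new_words, words)
              for name, words in prior_clusters.items()}
    return max(scores, key=scores.get), scores

prior = {
    "skeptical": {"can't", "fix", "stupid", "false", "insanity"},
    "supportive": {"agree", "true", "exactly"},
}
best, scores = match_prior_rumor(
    ["you just can't fix stupid", "pure insanity"], prior)
```

A new story whose responses match a "skeptical" response cluster from past rumors can be flagged before the story itself accumulates enough labeled data.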
Early Detection Results: Lack of Data
• Effectiveness of linkage
(Figures: classification performance without linkage, with hashtag linkage, and with web linkage.)
Early Detection Results: Lack of Labels
• Effectiveness of leveraging responses to prior rumors
(Figures: effectiveness of different methods over time; results at an early stage, 2 hours in.)
Overview of Misinformation Detection (recap)
• Content: individual messages or message clusters, supervised classification or unsupervised anomaly detection
• Context (when): anomalous timing of bursts
• Propagation (who and how)
• Early Detection: lack of data; lack of labels
Spread of Misinformation
Mining Misinformation in Social Media. Giovanni Luca Ciampaglia (glciampaglia.com)
ICDM 2017, New Orleans, Nov 21, 2017
Introduction
➢ What is Misinformation and Why it Spreads on Social Media
➢ Modeling the Spread of Misinformation
➢ Open Questions
○ What techniques are used to boost misinformation?
Pheme
❖ Wartime studies, types of rumors (e.g., pipe dreams) [Knapp 1944; Allport & Postman 1947]
❖ “Demand” for Improvised News [Shibutani 1966]
❖ Two-step information diffusion [Katz & Lazarsfeld 1955]
❖ Reputation exchange [Rosnow & Fine 1976]
❖ Collective sensemaking, watercooler effect [Bordia & DiFonzo 2004]
Swift is her walk, more swift her winged haste:
A monstrous phantom, horrible and vast.
As many plumes as raise her lofty flight,
So many piercing eyes inlarge her sight;
Millions of opening mouths to Fame belong,
And ev'ry mouth is furnish'd with a tongue,
And round with list'ning ears the flying plague is hung.
Aeneid, Book IV
Publij Virgilij maronis opera cum quinque
vulgatis commentariis Seruii Mauri honorati
gram[m]atici: Aelii Donati: Christofori Landini:
Antonii Mancinelli & Domicii Calderini,
expolitissimisque figuris atque imaginibus
nuper per Sebastianum Brant superadditis,
exactissimeque revisis atque elimatis,
Straßburg: Grieninger 1502.
Source: “Fake News. It’s Complicated”. First Draft News medium.com/1st-draft
hoaxy.iuni.iu.edu
Query: “three million votes illegal aliens”
Echo chambers
What is the role of online
social networks and social
media in fostering echo
chambers, filter bubbles,
segregation, polarization?
Adamic & Glance (2005),
[Blogs]
Conover et al. (2011),
[Twitter]
Recap: What is misinformation and why it spreads
❖ Misinformation has always existed
❖ Social media disseminate
(mis)information very quickly
❖ Echo chambers insulate people from
fact-checking and verifications
Next: Modeling the Spread of Misinformation
Models of Information Diffusion
❖ Compartmental models (SI, SIR, SIS, etc.) [Kermack and McKendrick, 1927]
❖ Rumor spreading models (DK, MT) [Daley and Kendall 1964; Maki 1973]
❖ Independent Cascades Model [Kempe et al., 2005]
❖ Threshold Model, Complex Contagion [Granovetter 1978; Centola 2010]
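The simplest of these, the SIR compartmental model, can be sketched in a few lines; the parameter values below are made up for illustration:

```python
def sir(beta, gamma, s0, i0, r0, steps, dt=0.1):
    # Euler-discretized SIR dynamics over population fractions;
    # s + i + r is conserved at every step.
    s, i, r = s0, i0, r0
    for _ in range(steps):
        new_inf = beta * s * i * dt   # S -> I: susceptibles who hear the rumor
        new_rec = gamma * i * dt      # I -> R: spreaders who lose interest
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return s, i, r

# A rumor with transmission rate 0.5 and "recovery" (loss of interest) rate 0.1.
s, i, r = sir(beta=0.5, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, steps=1000)
```

With beta/gamma well above 1, almost the whole population ends up having been reached, which is why such models are a natural baseline for rampant spread.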
Simple vs Complex Contagion
• Simple contagion: the probability of adopting a “meme” m is the same at every exposure
• Complex contagion: P_i(m) ∝ f(i), with f monotonically increasing; the probability of adopting at the i-th exposure grows with repeated exposures
Complex contagion:
strong concentration of
communication inside
communities
Simple contagion:
weak concentration
❖ Most memes spread like complex contagion
❖ Viral memes spread across communities
more like diseases (simple contagion)
Weng et al. (2014), Nature Sci. Rep. [Twitter data]
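The contrast above can be sketched with a deterministic toy cascade on two communities; the graph and adoption rules below are made-up illustrations, not the models of the cited paper:

```python
def spread(adj, seeds, rule):
    # Deterministic cascade: under "simple" contagion a node adopts after any
    # exposure; under "complex" contagion it needs >= 2 adopting neighbors
    # (social reinforcement).
    adopted, changed = set(seeds), True
    while changed:
        changed = False
        for node, nbrs in adj.items():
            if node not in adopted:
                k = len(nbrs & adopted)
                if (rule == "simple" and k >= 1) or (rule == "complex" and k >= 2):
                    adopted.add(node)
                    changed = True
    return adopted

# Two triangles bridged by a single weak tie (2-3): a toy stand-in for communities.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
viral = spread(adj, {0, 1}, "simple")      # crosses the bridge like a disease
trapped = spread(adj, {0, 1}, "complex")   # stays inside the first community
```

The simple cascade reaches both communities through the single bridge; the complex cascade is trapped, matching the observation that viral memes crossing communities spread more like diseases.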
Role of the
social network
and limited
attention
❖ Spread among agents with limited attention
on social network is sufficient to explain
virality patterns
❖ Not necessary to invoke more complicated
explanations based on intrinsic meme quality
Weng et al. (2014), Nature Sci. Rep.
Can the Best Ideas Win?
• Each meme m has a fitness (quality) f(m); the probability of adopting it scales as P(m) ∝ f(m)^α, where α tunes how strongly quality drives adoption
(Figure: discriminative power, the Kendall's Tau correlation between quality and popularity, for high-quality vs. low-quality memes; "When do the best ideas win?")
Recap: Models of the Spread of Misinformation
❖ Simple vs complex contagion
❖ More realistic features
➢ Agents have limited attention
➢ Social network structure
➢ Competition between different memes
❖ Tradeoff between information quality and
diversity
Next: Open Questions: What techniques are used to boost misinformation?
❖ Bots are super-spreaders
❖ Bots are strategic
❖ Bots are effective
Shao et al. 2017 (CoRR)
Conclusions
❖ What misinformation is and why it spreads
➢ Online, it spreads through a mix of social, cognitive, and algorithmic biases.
❖ Modeling the spread of misinformation
➢ Social network structure, limited attention, and information overload make us vulnerable to misinformation.
❖ Open questions:
➢ Bots are strategic super-spreaders
➢ They are effective at spreading misinformation.
❖ Tools to detect manipulation of public opinion may be first steps toward a trustworthy Web.
Thanks!
cnets.indiana.edu
iuni.iu.edu
Marcella Tambuscio
Giovanni Luca Ciampaglia (gciampag@indiana.edu)
Recap: Open Questions
❖ Social Bots Amplify Misinformation
➢ Through social reinforcement
➢ Early amplification
➢ Target humans, possibly “influentials”
Demand and Supply of Information
• 2012 London Olympics [Wikipedia]; Ciampaglia et al., Sci. Rep. (2015)
(Figure: collective attention around terms such as “London”, “England”, “Usain Bolt”, “Olympics”, “Medal”, “2012 London Olympics”.)
Supply of and Demand for Information
❖ Production of information is associated with shifts in collective attention
❖ Evidence that attention precedes production
❖ Higher demand → higher price → more production
Ciampaglia et al., Scientific Reports 2015 (source: Wikipedia)
Predicting Virality
(Figure panels: structural trapping, social reinforcement, homophily; simple vs. complex contagion.)
• M1: random sampling model
• M2: random cascading model (structural trapping)
• M3: social reinforcement model (structural trapping + social reinforcement)
• M4: homophily model (structural trapping + homophily)
Weng et al. (2014) [Twitter]
Virality and Competition for Attention
(Figures: user popularity measured by #followers [Yahoo! Meme]; hashtag popularity measured by #daily retweets [Twitter].)
Low-Quality Information is Just as Likely to Go Viral
Source: Emergent.info [Facebook shares]
Misinformation Spreader Detection
Misinformation in Social Media: An Example (recap): the spreader, content, context, and propagation of a misinformation post.
Detecting Misinformation by Its Spreaders
• A large portion of OSN accounts are likely to be fake
– Facebook: an estimated 67.65 to 137.76 million
– Twitter: an estimated 48 million
A Misinformation Spreader
• Misinformation spreaders: users that deliberately
spread misinformation to mislead others
A phishing link to Twivvter.com (a typosquatted imitation of Twitter.com)
Types of Misinformation Spreaders
• Spammers
• Fraudsters
• Trolls
• Crowdturfers
• …
(Diagram: misinformation, including fake news, rumors, spam, and clickbait, and its spreaders, e.g., spammers and fraudsters.)
Features for Capturing a Spreader
• What can be used to detect a spammer?
– Profile
– Posts (Text)
– Friends (Network)
Profile Features
Post Features
Network Features
• Extracting features from a profile
– #followers, #followees
• E.g., a small #followers count → suspicious account
– Biography, registration time, screen name, etc.
Feature Engineering: Profile
Profile Features
• Extracting text features from user posts
– Text: BoW, TF-IDF, etc.
Feature Engineering: Text
…
A textual feature vector:
Feature Engineering: Network
• Extracting network features
– Adjacency matrix, number of followers, follower/followee ratio, centrality
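The listed network features can be sketched directly from the adjacency matrix; the 3-user matrix below is made up for illustration:

```python
def network_features(adj):
    # Per-user network features from a follow matrix: adj[i][j] = 1 iff user i
    # follows user j.
    n = len(adj)
    feats = []
    for i in range(n):
        followees = sum(adj[i])                       # out-degree
        followers = sum(adj[j][i] for j in range(n))  # in-degree
        ratio = followers / followees if followees else 0.0
        feats.append({"followers": followers,
                      "followees": followees,
                      "ratio": ratio})
    return feats

# User 0 follows everyone but has no followers: a suspiciously low ratio.
feats = network_features([[0, 1, 1],
                          [0, 0, 1],
                          [0, 0, 0]])
```

Centrality measures (e.g., PageRank, betweenness) would be computed from the same matrix, typically with a graph library.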
Overview: Misinformation Spreader Detection
[1] Jindal, Nitin, and Bing Liu. "Review spam detection." Proceedings of the 16th international
conference on World Wide Web. ACM, 2007.
[2] Hu, X., Tang, J., Zhang, Y. and Liu, H., 2013, August. Social Spammer Detection in
Microblogging. In IJCAI.
[3] Song, Yuqi, et al. "PUD: Social Spammer Detection Based on PU Learning.“ International
Conference on Neural Information Processing. Springer, Cham, 2017.
[4] Wu L, Hu X, Morstatter F, Liu H. Adaptive Spammer Detection with Sparse Group Modeling.
ICWSM 2017 (pp. 319-326).
[5] Wu, Liang, et al. "Detecting Camouflaged Content Polluters." ICWSM. 2017.
[6] Hooi, Bryan, et al. "Fraudar: Bounding graph fraud in the face of camouflage." Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
ACM, 2016.
(Overview diagram: content-based detection uses text mining [1]; content + network detection uses text + graph mining [2, 3]; camouflaged spreaders are handled on the data side via instance (post/user) selection [4, 5, 6].)
Supervised Learning: Content + Network
• Features for supervised learning
– Text features: a textual feature vector per user (from profile and posts)
– Network features: the adjacency matrix
Traditional Approach: Content Modeling
• Supervised learning with text features
– Assumption: positive and negative accounts can be distinguished by their text
– The model learns coefficients over the textual feature vector
Traditional Approach: Network Modeling
• Supervised learning with network features
– Assumption: friends are likely to have the same label
– The model learns coefficients with a graph regularizer over the adjacency matrix
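The "friends share labels" assumption can be sketched with a simple label-propagation surrogate for the graph regularizer; the two-clique graph and seed labels below are made up:

```python
def propagate_labels(adj, labels, iters=20):
    # Iteratively average neighbor scores while keeping labeled seeds clamped:
    # a simple surrogate for a graph regularizer that pulls friends toward
    # the same label. Scores: +1 = spammer, -1 = legitimate.
    scores = {u: labels.get(u, 0.0) for u in adj}
    for _ in range(iters):
        new = {}
        for u, nbrs in adj.items():
            if u in labels:
                new[u] = labels[u]  # labeled seeds stay fixed
            else:
                new[u] = sum(scores[v] for v in nbrs) / len(nbrs)
        scores = new
    return scores

# Two cliques; one seed labeled spammer (+1) and one labeled legitimate (-1).
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
       "d": {"e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
scores = propagate_labels(adj, {"a": 1.0, "d": -1.0})
```

Unlabeled users inherit the label of their community, which is also why link farming (the camouflage discussed next) can defeat this assumption.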
Emerging Challenge: Camouflage
• Content Camouflage
– Copy content from legitimate users
– Exploit compromised account
• Network Camouflage
– Link farming with other spreaders, bots
– Link farming with normal users
Challenge (I): Camouflage
• In order to avoid being detected:
– manipulate the text feature vector
• posting content similar to regular users’
– manipulate the adjacency matrix
• harvesting links with other users
Challenge (II): Network Camouflage
• Heuristic-based methods can be gamed
– #Followers
– Follower/followee ratio
• Anomaly detection
Challenge III: Limited Label Information
• Labeling a malicious account (positive)
– Suspended accounts
– Honeypots
• A set of bots created to lure other bots in the wild
• Any user that follows them is assumed to be a bot
• The assumption is that normal users can easily recognize and avoid them
• Lack of labeled camouflage
Camouflage
• Prior assumptions:
– All suspended accounts are misinformation spreaders
– All posts of a spreader are malicious
• Selecting a subset of users for training
• Selecting a subset of posts for training
Selecting Users for Training
Select a subset of
users for training
Evaluate with a
validation set
Update the
training set
• How to select the optimal set for training?
Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
Relaxation I: Group Structure
• Assumption: malicious accounts
cannot join a legitimate community
– organize users in groups
– users in the same group should be
similarly weighted
(Example hierarchy produced by the Louvain method:
level 0: G_1^0 = {1, 2, 3, 4, 5, 6, 7}
level 1: G_1^1 = {1, 2, 3, 4}, G_2^1 = {5}, G_3^1 = {6, 7}
level 2: G_1^2 = {1, 2}, G_2^2 = {3, 4})
Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
Relaxation II: Weighted Training

min_{w,c} Σ_{i=1}^{N} c_i (x_i w − y_i)^2 + λ1 ||w||_2^2 + λ2 Σ_{i=0}^{d} Σ_{j=1}^{n_i} ||c_{G_j^i}||_2
subject to Σ_i c_i = K, 0 < c_i < 1

• The first term is a weighted least-squares loss; the ||w||_2^2 term avoids overfitting
• The last term is a sparse group Lasso: an L1 norm on the inter-group level and an L2 norm on the intra-group level
• d: depth of the hierarchy produced by the Louvain method
• n_i: number of groups on layer i
• c_{G_j^i}: weights of the nodes of group j on layer i
Optimization
min_{w,c}  Σ_{i=1}^{N} c_i (x_i w − y_i)^2 + λ1 ||w||_2^2 + λ2 Σ_{i=0}^{d} Σ_{j=1}^{n_i} ||c_{G_j^i}||_2   (group Lasso)

subject to  Σ_i c_i = 1

c_i: weight of instance i
x_i: attribute vector of instance i
w: coefficients of the linear regression
y_i: label of instance i
||w||_2^2: avoids overfitting
N: number of instances
d: depth of the hierarchy from the Louvain method
n_i: number of groups on layer i
c_{G_j^i}: weights of the nodes of group j on layer i

Optimize w (with c fixed):

min_w  Σ_{i=1}^{N} c_i (x_i w − y_i)^2 + λ1 ||w||_2^2
Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
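With c fixed, the w-subproblem is a weighted ridge regression, which has the standard closed form w = (XᵀCX + λ1 I)⁻¹ XᵀCy with C = diag(c). A sketch of that closed form (an assumption about the solver, not code from the paper):

```python
import numpy as np

def solve_w(X, y, c, lam1):
    """Closed-form minimizer of sum_i c_i (x_i w - y_i)^2 + lam1 ||w||_2^2,
    i.e. weighted ridge regression: w = (X^T C X + lam1 I)^{-1} X^T C y."""
    C = np.diag(c)
    return np.linalg.solve(X.T @ C @ X + lam1 * np.eye(X.shape[1]), X.T @ C @ y)
```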
Arizona State University Mining Misinformation in Social Media, November 21, 2017 67
Optimization
(Same overall objective and notation as on the previous slide.)

Optimize c (with w fixed; let t_i = (x_i w − y_i)^2):

min_c  Σ_{i=1}^{N} c_i t_i + λ2 Σ_{i=0}^{d} Σ_{j=1}^{n_i} ||c_{G_j^i}||_2   subject to  Σ_i c_i = 1
Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
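With w fixed, the c-subproblem can be illustrated with projected subgradient descent onto the simplex {c : Σ c_i = 1, c_i ≥ 0}; the paper may use a different solver, so treat this as a sketch with illustrative names.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {c : sum c_i = 1, c_i >= 0},
    via the standard sort-based algorithm."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def solve_c(t, lam2, hierarchy, steps=200, lr=0.05):
    """Projected-subgradient sketch for
    min_c sum_i c_i t_i + lam2 * sum_groups ||c_G||_2, c on the simplex,
    where t_i = (x_i w - y_i)^2 are the residuals from the w-step."""
    c = np.full(len(t), 1.0 / len(t))
    for _ in range(steps):
        g = t.astype(float).copy()
        for layer in hierarchy:
            for grp in layer:
                idx = list(grp)
                n = np.linalg.norm(c[idx])
                if n > 0:
                    g[idx] += lam2 * c[idx] / n   # subgradient of ||c_G||_2
        c = project_simplex(c - lr * g)
    return c
```

Alternating the two subproblems (solve for w, then for c) until convergence gives the overall optimization scheme described on these slides.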
Arizona State University Mining Misinformation in Social Media, November 21, 2017 68
Experimental Results
1 Hu et al., “Social spammer detection in microblogging.”, IJCAI’13
2 Ye et al., “Discovering opinion spammer groups by network footprints.”, ECML-PKDD’15
Approaches Precision Recall F-Score
SSDM1 92.15% 92.00% 92.07%
NFS2 88.16% 65.67% 75.27%
SGASD 93.75% 96.92% 95.31%
• Collecting data with honeypots
– http://infolab.tamu.edu/data/
Tweets Users ReTweets Links Spammers
4,453,380 38,400 223,115 8,739,105 19,200
Arizona State University Mining Misinformation in Social Media, November 21, 2017 69
Content Camouflage
• Basic assumption of traditional methods:
– All content of a misinformation spreader is malicious
• Content camouflage: posts of a misinformation spreader may be legitimate
– Copy content from legitimate users
– Exploit compromised accounts
Arizona State University Mining Misinformation in Social Media, November 21, 2017 70
Content Camouflage: An Example
(Figure: two seemingly normal posts from a camouflaged misinformation spreader.)
Arizona State University Mining Misinformation in Social Media, November 21, 2017 71
Challenge: Lack of Labeled Data
• Labels of camouflage are costly to collect
Arizona State University Mining Misinformation in Social Media, November 21, 2017 72
Learning to Identify Camouflage
• Assumption: posts of misinformation spreaders are a mix of normal and malicious content.
• Introduce a weight for each post label.
• Select posts that distinguish misinformation spreaders from normal users.
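A toy heuristic illustrating the weighting idea (not the formulation from Wu et al.): down-weight spreader posts that look like normal content, so that camouflage contributes little to training.

```python
import numpy as np

def camouflage_weights(spreader_posts, normal_posts):
    """Toy heuristic (not the paper's model): each post is a feature
    vector; a spreader post's weight is its normalized distance to the
    centroid of normal posts, so camouflaged posts get low weight."""
    centroid = normal_posts.mean(axis=0)
    dist = np.linalg.norm(spreader_posts - centroid, axis=1)
    return dist / dist.sum()
```

The actual formulation learns the post weights jointly with the classifier rather than fixing them with a distance heuristic.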
Arizona State University Mining Misinformation in Social Media, November 21, 2017 73
Learning to Identify Camouflage: Formulation
Wu et al. Detecting Camouflaged Content Polluters. ICWSM 2017
Arizona State University Mining Misinformation in Social Media, November 21, 2017 74
Experimental Results
• Finding:
– Sophisticated misinformation spreaders first disguise themselves, then do harm.
• Results:
Wu et al. Detecting Camouflaged Content Polluters. ICWSM 2017
Arizona State University Mining Misinformation in Social Media, November 21, 2017 75
Misinformation Spreader Detection
[1] Jindal, Nitin, and Bing Liu. "Review spam detection." Proceedings of the 16th international
conference on World Wide Web. ACM, 2007.
[2] Hu, X., Tang, J., Zhang, Y. and Liu, H., 2013, August. Social Spammer Detection in
Microblogging. In IJCAI.
[3] Song, Yuqi, et al. "PUD: Social Spammer Detection Based on PU Learning.“ International
Conference on Neural Information Processing. Springer, Cham, 2017.
[4] Wu L, Hu X, Morstatter F, Liu H. Adaptive Spammer Detection with Sparse Group Modeling.
ICWSM 2017 (pp. 319-326).
[5] Wu, Liang, et al. "Detecting Camouflaged Content Polluters." ICWSM. 2017.
[6] Hooi, Bryan, et al. "Fraudar: Bounding graph fraud in the face of camouflage." Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
ACM, 2016.
Methods (and the data they exploit):
• Text mining [1]: content
• Text + graph mining [2, 3]: content + network
• Instance (post/user) selection [4, 5, 6]: content + network + camouflage
Arizona State University Mining Misinformation in Social Media, November 21, 2017 76
Challenges in Dealing with Misinformation
• Large-scale
– Misinformation can be rampant
• Dynamic
– It can happen fast
• Deceiving
– Hard to verify
• Homophily
– Consistent with one’s beliefs
Arizona State University Mining Misinformation in Social Media, November 21, 2017 77
Codes, Platforms and Datasets
Arizona State University Mining Misinformation in Social Media, November 21, 2017 78
Platforms
• TweetTracker: Detecting Topic-centric Bots
• Hoaxy: Tracking Online Misinformation
• Botometer: Detecting Bots on Twitter
Arizona State University Mining Misinformation in Social Media, November 21, 2017 79
Fact-Checking Websites
• Online Fact-checking websites
– PolitiFact: http://www.politifact.com/
– Truthy: http://truthy.indiana.edu/
– Snopes: http://www.snopes.com/
– TruthOrFiction: https://www.truthorfiction.com/
– Weibo Rumor: http://service.account.weibo.com/
• Volunteering committee
Arizona State University Mining Misinformation in Social Media, November 21, 2017 80
Code and Data Repositories
• Honeypot: http://bit.ly/ASUHoneypot
• Identification: https://veri.ly/
• Diffusion:
– Python Networkx: https://networkx.github.io/
– Stanford SNAP: http://snap.stanford.edu/
• Datasets
– http://socialcomputing.asu.edu/pages/datasets
– http://bit.ly/asonam-bot-data
– https://github.com/jsampso/AMNDBots
– http://carl.cs.indiana.edu/data/#fact-checking
– http://snap.stanford.edu/data/index.html
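For diffusion analysis with the tools above, a minimal independent-cascade sketch over a plain adjacency dict (libraries like NetworkX or SNAP would supply the real graphs; the function name and parameters here are illustrative):

```python
import random

def independent_cascade(adj, seeds, p=0.3, seed=0):
    """Minimal independent-cascade diffusion: each newly activated node
    gets one chance to activate each inactive neighbor with probability p."""
    rng = random.Random(seed)
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active
```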
Arizona State University Mining Misinformation in Social Media, November 21, 2017 81
Book Chapters
• “Mining Misinformation in Social Media”, Chapter 5 in Big Data in Complex and Social Networks
• http://bit.ly/2AYr5KM
• “Detecting Crowdturfing in Social Media”, Encyclopedia of Social Network Analysis and Mining
• http://bit.ly/2hE6LXE
Arizona State University Mining Misinformation in Social Media, November 21, 2017 82
Twitter Data Analytics
• Common tasks in mining Twitter Data.
– Free Download with Code & Data
– Collection
– Analysis
– Visualization
tweettracker.fulton.asu.edu/tda/
Arizona State University Mining Misinformation in Social Media, November 21, 2017 83
Social Media Mining
• Social Media Mining:
An Introduction – a textbook
• A comprehensive coverage of
social media mining techniques
– Free Download
– Network Measures and Analysis
– Influence and Diffusion
– Community Detection
– Classification and Clustering
– Behavior Analytics
http://dmml.asu.edu/smm/
Arizona State University Mining Misinformation in Social Media, November 21, 2017 84
Challenges in Dealing with Misinformation
• Large-scale
– Misinformation can be rampant
• Dynamic
– It can happen fast
• Deceiving
– Hard to verify
• Homophily
– Consistent with one’s beliefs
Arizona State University Mining Misinformation in Social Media, November 21, 2017 85
Q&A
• Liang Wu, Huan Liu: {wuliang, huanliu}@asu.edu
• Giovanni Luca Ciampaglia: gciampag@indiana.edu
• All materials and resources are available online: http://bit.ly/ICDMTutorial
Arizona State University Mining Misinformation in Social Media, November 21, 2017 86
Acknowledgements
• DMML @ ASU
• NaN @ IUB
• MINERVA initiative through the ONR N000141310835 on
Multi-Source Assessment of State Stability

ICDM 2017 tutorial misinformation

  • 1. Arizona State University Mining Misinformation in Social Media, November 21, 2017 1 Mining Misinformation in Social Media: Understanding Its Rampant Spread, Harm, and Intervention Liang Wu1, Giovanni Luca Ciampaglia2, Huan Liu1 1Arizona State University 2Indiana University Bloomington
  • 2. Arizona State University Mining Misinformation in Social Media, November 21, 2017 2 Tutorial Web Page • All materials and resources are available online: http://bit.ly/ICDMTutorial
  • 3. Arizona State University Mining Misinformation in Social Media, November 21, 2017 3 Introduction
  • 4. Arizona State University Mining Misinformation in Social Media, November 21, 2017 4 Definition of Misinformation • False and inaccurate information that is spontaneously spread. • Misinformation can be… – Disinformation – Rumors – Urban legend – Spam – Troll – Fake news – … https://www.kdnuggets.com/2016/08/misinformation-key-terms-explained.html
  • 5. Arizona State University Mining Misinformation in Social Media, November 21, 2017 5 Ecosystem of Misinformation Misinformation Fake News Rumors Spams Click- baits User User User User • Spreaders – Fabrication • Misinformation – Fake news – Rumor – … • Influenced Users – Echo chamber – Filter bubble Motivation Spammer Fraudster …
  • 6. Arizona State University Mining Misinformation in Social Media, November 21, 2017 6 Misinformation Ramification Top issues highlighted for 2014 • 1. Rising societal tensions in the Middle East and North Africa • 10. The rapid spread of misinformation online • 2. Widening income disparities • … • 3. Persistent structural unemployment 4.07 4.02 3.97 3.35 • Top 10 global risks – World Economic Forum
  • 7. Arizona State University Mining Misinformation in Social Media, November 21, 2017 7 Word of The Year • Macquarie Dictionary Word of the Year 2016 • Oxford Dictionaries Word of the Year 2016
  • 8. Arizona State University Mining Misinformation in Social Media, November 21, 2017 8 Social Media • Social media has changed the way of exchanging and obtaining information. • 500 million tweets are posted per day – An effective channel for information dissemination RenRenTwitter & Facebook
  • 9. Arizona State University Mining Misinformation in Social Media, November 21, 2017 9 Social Media: A Channel for Misinformation • False and inaccurate information is pervasive. • Misinformation can be devastating. – Cause undesirable consequences – Wreak havoc User User User User Echo Chamber: Misinformation can be reinforced Filter Bubble: Misinformation can be targeted
  • 10. Arizona State University Mining Misinformation in Social Media, November 21, 2017 10 Two Examples • PizzaGate – Fake News has Real Consequences – What made Edgar Maddison Welch “raid” a “pedo ring” on 12/1/2016? – All started with a post on Facebook, spread to Twitter and then went viral with platforms like Breitbart and Info-Wars • Anti-Vaccine Movement on Social Media: A case of echo chambers and filter bubbles – Peer-to-peer connection – Groups – Facebook feeds
  • 11. Arizona State University Mining Misinformation in Social Media, November 21, 2017 11 PizzaGate https://www.nytimes.com/interactive/2016/12/10/business/media/pizzagate.html •WikiLeaks began releasing emails of Podesta. •2 •Social media users on Reddit searched the releases for evidence of wrongdoing. •3 •Discussions were found that include the word pizza, including dinner plans. •4 •A participant connected the phrase “cheese pizza” to pedophiles (“c.p.” -->child pornography). •5 •Following the use of “pizza,” theorists focused on the Washington pizza restaurant Comet Ping Pong. •6 •The theory started snowballing, taking on the meme #PizzaGate. Fake news articles emerged. •7 •The false stories swept up neighboring businesses and bands that had played at Comet. Theories about kill rooms, underground tunnels, satanism and even cannibalism emerged. •8 •Edgar M. Welch, a 28-year-old from North Carolina, fired the rifle inside the pizzeria, and surrendered after finding no evidence to support claims of child slaves. •9 •The shooting did not put the theory to rest. Purveyors of the theory and fake news pointed to the mainstream media as conspirators of a coverup to protect what they said was a crime ring. Oct-Nov 2016 Nov 3rd, 2016 Nov 23rd, 2016 Dec 4th, 2016 2016-2017
  • 12. Arizona State University Mining Misinformation in Social Media, November 21, 2017 12 Challenges in Dealing with Misinformation • Large-scale – Misinformation can be rampant • Dynamic – It can happen fast • Deceiving – Hard to verify • Homophily – Consistent with one’s beliefs
  • 13. Arizona State University Mining Misinformation in Social Media, November 21, 2017 13 Overview of Today’s Tutorial • Introduction • Misinformation Detection • Misinformation in Social Media • Misinformation Spreader Detection • Resources 40 minutes 20 minutes 40 minutes 10 minutes 10 minutes
  • 14. Arizona State University Mining Misinformation in Social Media, November 21, 2017 14 Misinformation Detection
  • 15. Arizona State University Mining Misinformation in Social Media, November 21, 2017 15 Misinformation in Social Media: An Example
  • 16. Arizona State University Mining Misinformation in Social Media, November 21, 2017 16 Misinformation in Social Media: An Example
  • 17. Arizona State University Mining Misinformation in Social Media, November 21, 2017 17 Misinformation in Social Media: An Example
  • 18. Arizona State University Mining Misinformation in Social Media, November 21, 2017 18 Misinformation in Social Media: An Example Misinformation Spreader Content of Misinformation • Text • Hashtag • URL • Emoticon • Image • Video (GIF) Context of Misinformation • Date, Time • Location Propagation of Misinformation • Retweet • Reply • Like
  • 19. Arizona State University Mining Misinformation in Social Media, November 21, 2017 19 Overview of Misinformation Detection Misinformation Detection Content Context Propagation Early Detection Individual Message or Message Cluster + Supervised: Classification or Unsupervised: Anomaly Anomalous Time of Bursts Lack of Data Lack of Labels Who When How [1] Qazvinian et al. "Rumor has it: Identifying misinformation in microblogs." EMNLP 2011. [2] Castillo et al. "Predicting information credibility in time-sensitive social media." Internet Research 23.5 (2013). [3] Zubiaga et al. "Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media." [4] Wu et al. "Information Credibility Evaluation on Social Media." AAAI. 2016. [5] Wang et al. "Detecting rumor patterns in streaming social media." IEEE BigData, 2015. [6] Kwon et al. "Modeling Bursty Temporal Pattern of Rumors." ICWSM. 2014. [7] Wu et al. “Characterizing Social Media Messages by How They Propagate.“ WSDM 2018. [8] Ma et al. "Detect Rumors in Microblog Posts Using Propagation Structure via Kernel Learning." ACL 2017. [9] Sampson et al. "Leveraging the Implicit Structure within Social Media for Emergent Rumor Detection.“ CIKM 2016. [10] Wu et al. "Gleaning Wisdom from the Past: Early Detection of Emerging Rumors in Social Media.“ SDM 2017. [1, 2, 3, 4] [5, 6] [7, 8] [7] [8] [9] [10]
  • 20. Arizona State University Mining Misinformation in Social Media, November 21, 2017 20 Feature Engineering on Content Text Feature Example Length of post #words, #characters Punctuation marks Question mark ? Exclamation! Emojis/Emoticons Angry face ;-L Sentiment Sentiment/swear/curse words Pronoun (1st, 2nd, 3rd) I, me, myself, my, mine URL, PageRank of domain Mention (@) Hashtag (#)
  • 21. Arizona State University Mining Misinformation in Social Media, November 21, 2017 21 Misinformation Detection: Text Matching • Text Matching – Exact matching – Relevance • TF-IDF • BM25 – Semantic • Word2Vec • Doc2Vec • Drawbacks – Low Recall Starbird, Kate, et al. "Rumors, false flags, and digital vigilantes: Misinformation on twitter after the 2013 boston marathon bombing." iConference 2014 Proceedings (2014). Jin, Zhiwei, et al. "Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter." International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer, Cham, 2017. Fake News 1 Fake News 2 … Exact Duplication Similar Words Similar Representation Different Representation Relevance
  • 22. Arizona State University Mining Misinformation in Social Media, November 21, 2017 22 Misinformation Detection: Supervised Learning • Message-based – A vector represents a tweet • Message cluster-based – A vector represents a cluster of tweets • Methods – Random Forest – SVM – Naïve Bayes – Decision Tree – Maximum Entropy – Logistic Regression Individual Posts Clusters of Posts Picking Data Picking A Method Random Forest SVM …
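As a stand-in for the supervised methods listed above (Random Forest, SVM, etc. would normally come from an ML library), here is a toy multinomial Naive Bayes with Laplace smoothing trained on a fabricated four-post dataset; the posts and labels are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Fabricated training data: 1 = rumor-like, 0 = ordinary news.
train = [
    ("breaking shocking secret they hide the truth", 1),
    ("unbelievable cover up exposed share now", 1),
    ("city council approves new budget today", 0),
    ("local team wins the regional final", 0),
]

word_counts = defaultdict(Counter)  # per-class word frequencies
class_counts = Counter()
for text, y in train:
    class_counts[y] += 1
    word_counts[y].update(text.split())

vocab = {w for c in word_counts.values() for w in c}

def predict(text):
    best, best_lp = None, -math.inf
    for y in class_counts:
        # log prior + sum of smoothed log likelihoods
        lp = math.log(class_counts[y] / sum(class_counts.values()))
        total = sum(word_counts[y].values())
        for w in text.split():
            lp += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```

The same feature vectors work for either granularity on the slide: one vector per tweet (message-based) or one aggregated vector per cluster of tweets (cluster-based).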
  • 23. Arizona State University Mining Misinformation in Social Media, November 21, 2017 23 Visual Content-based Detection • Diversity of Images Jin, Zhiwei, et al. "Novel visual and statistical image features for microblogs news verification." IEEE Transactions on Multimedia 19.3 (2017): 598-608. Texas Pizza Hut workers paddle through flood waters to deliver free pizzas by kayak There are sharks swimming in the streets of Houston during Hurricane Harvey
  • 24. Arizona State University Mining Misinformation in Social Media, November 21, 2017 24 References • Starbird, Kate, et al. "Rumors, false flags, and digital vigilantes: Misinformation on twitter after the 2013 boston marathon bombing." iConference 2014 Proceedings (2014). • Jin, Zhiwei, et al. "Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter." International Conference on Social Computing, Behavioral- Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation. Springer, Cham, 2017. • Gupta, Aditi, and Ponnurangam Kumaraguru. "Credibility ranking of tweets during high impact events." Proceedings of the 1st workshop on privacy and security in online social media. ACM, 2012. • Yu, Suisheng, Mingcai Li, and Fengming Liu. "Rumor Identification with Maximum Entropy in MicroNet.“ • Yang, Fan, et al. "Automatic detection of rumor on sina weibo." Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics. ACM, 2012. • Zhang, Qiao, et al. "Automatic detection of rumor on social network." Natural Language Processing and Chinese Computing. Springer, Cham, 2015. 113-122. • Castillo, Carlos, Marcelo Mendoza, and Barbara Poblete. "Information credibility on twitter." Proceedings of the 20th international conference on World wide web. ACM, 2011. • Castillo, Carlos, Marcelo Mendoza, and Barbara Poblete. "Predicting information credibility in time-sensitive social media." Internet Research 23.5 (2013): 560-588. • Qazvinian, Vahed, et al. "Rumor has it: Identifying misinformation in microblogs." Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011. • Wu, Shu, et al. "Information Credibility Evaluation on Social Media." AAAI. 2016. Text Matching Message- based Cluster- based
  • 25. Arizona State University Mining Misinformation in Social Media, November 21, 2017 25 Modeling Message Sequence • The methods above ignore the chronological order of messages • Messages are generated as a temporal sequence – Modeling posts as static documents discards this structural information
  • 26. Arizona State University Mining Misinformation in Social Media, November 21, 2017 26 Modeling Post Sequence: Message-based • Message-based – Conditional Random Fields (CRF) Zubiaga, Arkaitz, Maria Liakata, and Rob Procter. "Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media." Linear Chain CRF
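The decoding step of a linear-chain model like the CRF above can be illustrated with Viterbi: given per-message label scores and label-transition scores, it recovers the best joint labeling of the whole thread. All scores below are fabricated, and a real CRF would learn them from features; this only shows the chain structure.

```python
LABELS = ["rumor", "non-rumor"]
# emission[t][label]: how well message t's features fit each label (fabricated)
emission = [
    {"rumor": 2.0, "non-rumor": 0.5},
    {"rumor": 0.4, "non-rumor": 0.6},
    {"rumor": 1.5, "non-rumor": 0.2},
]
# transition[a][b]: score of label b following label a (fabricated)
transition = {
    "rumor": {"rumor": 1.0, "non-rumor": 0.1},
    "non-rumor": {"rumor": 0.1, "non-rumor": 1.0},
}

def viterbi(emission, transition, labels):
    # best[t][y] = (score of best path ending in label y at step t, backpointer)
    best = [{y: (emission[0][y], None) for y in labels}]
    for t in range(1, len(emission)):
        best.append({})
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p][0] + transition[p][y] + emission[t][y])
                 for p in labels),
                key=lambda x: x[1])
            best[t][y] = (score, prev)
    # Backtrack from the best final label.
    y = max(labels, key=lambda l: best[-1][l][0])
    path = [y]
    for t in range(len(emission) - 1, 0, -1):
        y = best[t][y][1]
        path.append(y)
    return list(reversed(path))

path = viterbi(emission, transition, LABELS)
```

Note how the middle message's own features slightly favor "non-rumor", yet the chain's transition scores smooth the whole sequence to "rumor" — exactly the kind of context a per-message classifier cannot use.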
  • 27. Arizona State University Mining Misinformation in Social Media, November 21, 2017 27 Modeling Post Sequence: Cluster-based • Message cluster-based – Recurrent Neural Networks Ma et al. "Detecting Rumors from Microblogs with Recurrent Neural Networks." IJCAI. 2016. Classifier Recurrent Neural Network Classifier Layer
  • 28. Arizona State University Mining Misinformation in Social Media, November 21, 2017 28 Personalized Misinformation Detection (PCA) • Detecting anomalous content of a user with PCA • Main assumption – Misinformation likely to be eccentric to normal content of a user • Detecting misinformation as content outliers – Tweet-based modeling – Measure distance between a new message and historical data Zhang, Yan, et al. "A distance-based outlier detection method for rumor detection exploiting user behavioral differences." Data and Software Engineering (ICoDSE), 2016 International Conference on. IEEE, 2016. New Post New Post Historical Posts Measure Distance
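The distance-based intuition above can be sketched by comparing a new post against the centroid of a user's historical word-count vectors. Note this is a simplified stand-in: the cited work projects with PCA before measuring distance, whereas this toy uses raw counts, and all posts are fabricated.

```python
import math

def vec(text, vocab):
    # Word-count vector over a fixed vocabulary.
    words = text.lower().split()
    return [words.count(w) for w in vocab]

# Fabricated user history and candidate posts.
history = ["my cat photo", "another cat photo today", "cat nap time"]
new_post = "shocking election fraud exposed"

vocab = sorted({w for t in history + [new_post] for w in t.lower().split()})
hvecs = [vec(t, vocab) for t in history]
centroid = [sum(col) / len(hvecs) for col in zip(*hvecs)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

d_new = dist(vec(new_post, vocab), centroid)           # off-topic post
d_typical = dist(vec("cat photo again", vocab), centroid)  # on-topic post
```

A post far from the user's normal content (large distance) is flagged as a potential outlier; a threshold would be calibrated on held-out history.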
  • 29. Arizona State University Mining Misinformation in Social Media, November 21, 2017 29 Personalized Misinformation Detection (Autoencoder) • Detecting anomalous content of a user with Autoencoder • Multi-layer Autoencoder – Train an autoencoder with historical data – To test a message: • Feed it to the autoencoder • Obtain the reconstructed data • Calculate distance between the original and the reconstruction Zhang, Yan, et al. "Detecting rumors on Online Social Networks using multi-layer autoencoder." Technology & Engineering Management Conference (TEMSCON), 2017 IEEE. IEEE, 2017.
  • 30. Arizona State University Mining Misinformation in Social Media, November 21, 2017 30 Detecting Misinformation with Context Context of Misinformation • Date, Time • Location
  • 31. Arizona State University Mining Misinformation in Social Media, November 21, 2017 31 Peak Time of Misinformation Misinformation on Twitter Kwon et al. "Modeling Bursty Temporal Pattern of Rumors." ICWSM. 2014. Friggeri et al. "Rumor Cascades." ICWSM. 2014. Misinformation on Facebook • Rebirth of misinformation – Misinformation has multiple peaks over time – True information has only one
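The multiple-peaks ("rebirth") heuristic above can be sketched by counting prominent local maxima in a daily-share time series; the two series below are fabricated, and real detectors fit burst models rather than this simple rule.

```python
def count_peaks(series, min_height):
    """Count local maxima at or above min_height."""
    peaks = 0
    for i in range(1, len(series) - 1):
        if (series[i] >= min_height
                and series[i] > series[i - 1]
                and series[i] > series[i + 1]):
            peaks += 1
    return peaks

# Fabricated daily share counts.
rumor_shares = [1, 5, 40, 8, 2, 1, 30, 6, 1]   # bursts, dies down, resurges
news_shares  = [2, 10, 80, 25, 9, 4, 2, 1, 1]  # single burst, then decays
```

Under this rule the rumor-like series shows two peaks and the news-like series one, matching the slide's observation that true information tends to have a single burst.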
  • 32. Arizona State University Mining Misinformation in Social Media, November 21, 2017 32 Detecting Misinformation with Propagation Propagation of Misinformation • Retweet • Reply • Like
  • 33. Arizona State University Mining Misinformation in Social Media, November 21, 2017 33 • Misinformation is spread by similar users – Bot army – Echo chamber of misinformation • Intuition: misinformation can be distinguished by who spreads it, and how it is spread • Challenges – Users may change accounts (bot army) – Data sparsity Detecting Misinformation with Propagation
  • 34. Arizona State University Mining Misinformation in Social Media, November 21, 2017 34 Detecting Misinformation with Propagation • User Embedding: • Message Classification Liang Wu, Huan Liu. “Characterizing Social Media Messages by How They Propagate." WSDM 2018. B C A D Users Embed User Representations Posts Networks Community B C A D Propagation Pathways Sequence Modeling A B C D A B D Classifier
  • 35. Arizona State University Mining Misinformation in Social Media, November 21, 2017 35 Key Issue for Misinformation Detection • April 2013, AP tweeted, Two Explosions in the White House and Barack Obama is injured – Truth: hackers had hijacked the account – However, the fake tweet briefly wiped about $136 billion off the stock market within 2 minutes
  • 36. Arizona State University Mining Misinformation in Social Media, November 21, 2017 36 Early Detection of Misinformation: Challenges • Challenges of early detection –Message cluster based methods • Lack of data –Supervised learning methods • Lack of labels
  • 37. Arizona State University Mining Misinformation in Social Media, November 21, 2017 37 Early Detection Challenge I: Lack of Data • Lack of data • Early stage: few posts sparsely scattered • Most methods prove effective in a later stage Early Stage Later Stage
  • 38. Arizona State University Mining Misinformation in Social Media, November 21, 2017 38 Early Detection: Lack of Data • Linking scattered messages – Clustering messages – Merge individual messages • Hashtag linkage • Web Linkage Sampson et al. "Leveraging the implicit structure within social media for emergent rumor detection." CIKM 2016. Hashtag Web Link
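The hashtag/web-link linkage above amounts to clustering: messages sharing a hashtag or URL are merged into one group. A minimal sketch with union-find, over fabricated posts:

```python
# Fabricated posts; "tags" holds both hashtags and URLs.
posts = [
    {"id": 0, "tags": {"#houston", "http://a.com"}},
    {"id": 1, "tags": {"#houston"}},
    {"id": 2, "tags": {"http://a.com", "#sharks"}},
    {"id": 3, "tags": {"#election"}},
]

parent = {p["id"]: p["id"] for p in posts}

def find(x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

seen = {}  # tag -> first post id carrying it
for p in posts:
    for tag in p["tags"]:
        if tag in seen:
            union(p["id"], seen[tag])
        else:
            seen[tag] = p["id"]

clusters = {}
for p in posts:
    clusters.setdefault(find(p["id"]), []).append(p["id"])
```

Posts 0 and 2 never share a hashtag, yet they land in one cluster through the shared URL — the merged cluster then provides enough data for the cluster-based detectors discussed earlier.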
  • 39. Arizona State University Mining Misinformation in Social Media, November 21, 2017 39 Early Detection Challenge II: Lack of Labels • Lack of labels – Traditional text categories • Articles within the same category share similar vocabulary and writing styles • E.g., sports news are similar to each other – Misinformation is heterogeneous • Two rumors are unlikely to be similar to each other Rumor about Presidential Election Rumor about Ferguson Protest
  • 40. Arizona State University Mining Misinformation in Social Media, November 21, 2017 40 Early Detection (II): Lack of Labels • Utilize user responses from prior misinformation – Clustering misinformation with similar responses – Selecting effective features shared by a cluster Post #1: “Can't fix stupid but it can be blocked” Post #2: “So, when did bearing false witness become a Christian value?” Post #3: “Christians Must Support Trump or Face Death Camps. Does he still claim to be a Christian?” Post #1: “i've just seen the sign on fb. you can't fix stupid” Post #2: “THIS IS PURE INSANITY. HOW ABOUT THIS STATEMENT” Post #3: “No Mother Should Have To Fear For Her Son's Life Every Time He Robs A Store” Wu et al. "Gleaning Wisdom from the Past: Early Detection of Emerging Rumors in Social Media.“ SDM 2017
  • 41. Arizona State University Mining Misinformation in Social Media, November 21, 2017 41 Early Detection Results: Lack of Data • Effectiveness of linkage Classification without linkage Classification with hashtag linkage Classification with web linkage
  • 42. Arizona State University Mining Misinformation in Social Media, November 21, 2017 42 Early Detection Results: Lack of Labels • Effectiveness of leveraging prior rumors Effectiveness of different methods over time Results at an early stage (2 hours)
  • 43. Arizona State University Mining Misinformation in Social Media, November 21, 2017 43 Overview of Misinformation Detection Misinformation Detection Content Context Propagation Early Detection Individual Message or Message Cluster + Supervised: Classification or Unsupervised: Anomaly Anomalous Time of Bursts Lack of Data Lack of Labels Who When How [1] Qazvinian et al. "Rumor has it: Identifying misinformation in microblogs." EMNLP 2011. [2] Castillo et al. "Predicting information credibility in time-sensitive social media." Internet Research 23.5 (2013). [3] Zubiaga et al. "Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media." [4] Wu et al. "Information Credibility Evaluation on Social Media." AAAI. 2016. [5] Wang et al. "Detecting rumor patterns in streaming social media." IEEE BigData, 2015. [6] Kwon et al. "Modeling Bursty Temporal Pattern of Rumors." ICWSM. 2014. [7] Wu et al. "Characterizing Social Media Messages by How They Propagate." WSDM 2018. [8] Ma et al. "Detect Rumors in Microblog Posts Using Propagation Structure via Kernel Learning." ACL 2017. [9] Sampson et al. "Leveraging the Implicit Structure within Social Media for Emergent Rumor Detection." CIKM 2016. [10] Wu et al. "Gleaning Wisdom from the Past: Early Detection of Emerging Rumors in Social Media." SDM 2017. [1, 2, 3, 4] [5, 6] [7, 8] [7] [8] [9] [10]
  • 44. Arizona State University Mining Misinformation in Social Media, November 21, 2017 44 Spread of Misinformation
  • 45. Mining Misinformation in Social Media Giovanni Luca Ciampaglia glciampaglia.com ICDM 2017, New Orleans, Nov 21, 2017
  • 46. ➢ What is Misinformation and Why it Spreads on Social Media ➢ Modeling the Spread of Misinformation ➢ Open Questions ○ What techniques are used to boost misinformation? Introduction
  • 47. ➢ What is Misinformation and Why it Spreads ➢ Modeling the Spread of Misinformation ➢ Open Questions ○ What techniques are used to boost misinformation? Introduction
  • 48. Pheme ❖ Wartime studies, types of rumors (e.g., pipe dreams) [Knapp 1944, Allport & Postman, 1947] ❖ “Demand” for Improvised News [Shibutani, 1966] ❖ Two-step information diffusion [Katz & Lazarsfeld, 1955] ❖ Reputation exchange [Rosnow & Fine, 1976] ❖ Collective Sensemaking, Watercooler effect [Bordia & DiFonzo 2004] Swift is her walk, more swift her winged haste: A monstrous phantom, horrible and vast. As many plumes as raise her lofty flight, So many piercing eyes inlarge her sight; Millions of opening mouths to Fame belong, And ev'ry mouth is furnish'd with a tongue, And round with list'ning ears the flying plague is hung. Aeneid, Book IV Publij Virgilij maronis opera cum quinque vulgatis commentariis Seruii Mauri honorati gram[m]atici: Aelii Donati: Christofori Landini: Antonii Mancinelli & Domicii Calderini, expolitissimisque figuris atque imaginibus nuper per Sebastianum Brant superadditis, exactissimeque revisis atque elimatis, Straßburg: Grieninger 1502.
  • 49. Source: “Fake News. It’s Complicated”. First Draft News medium.com/1st-draft
  • 50. hoaxy.iuni.iu.edu Query: “three million votes illegal aliens”
  • 51. Echo chambers What is the role of online social networks and social media in fostering echo chambers, filter bubbles, segregation, polarization? Adamic & Glance (2005), [Blogs] Conover et al. (2011), [Twitter]
  • 52. Recap: What is misinformation and why it spreads ❖ Misinformation has always existed ❖ Social media disseminate (mis)information very quickly ❖ Echo chambers insulate people from fact-checking and verifications
  • 53. ➢ What is Misinformation and How it Spreads ➢ Modeling the Spread of Misinformation ➢ Open Questions ○ What techniques are used to boost misinformation? Introduction
  • 54. ❖ Compartmental models (SI, SIR, SIS, etc.) [Kermack and McKendrick, 1927] ❖ Rumor spreading models (DK, MT) [Daley and Kendall 1964, Maki 1973] ❖ Independent Cascades Model [Kempe et al., 2005] ❖ Threshold Model, Complex Contagion [Granovetter 1978, Centola 2010] Models of Information Diffusion P_i(m) ∝ f(i), with f monotonically increasing: the probability of adopting a “meme” at the i-th exposure
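The diffusion models above can be illustrated with a toy independent-cascade simulation: each newly "infected" node gets one chance to activate each neighbor with probability p. The graph and p are fabricated; threshold/complex-contagion models would instead condition activation on the number of exposed neighbors.

```python
import random

# Fabricated directed graph as an adjacency list.
edges = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}

def cascade(seed_nodes, p, rng):
    """Independent cascade: each new adopter tries each neighbor once."""
    active = set(seed_nodes)
    frontier = list(seed_nodes)
    while frontier:
        nxt = []
        for u in frontier:
            for v in edges[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

rng = random.Random(42)
reached = cascade([0], p=1.0, rng=rng)  # p=1: everything reachable adopts
```

Sweeping p between 0 and 1 (and averaging over many runs) reproduces the familiar transition from cascades that die out to ones that saturate the network.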
  • 55. Simple vs Complex Contagion Complex contagion: strong concentration of communication inside communities Simple contagion: weak concentration ❖ Most memes spread like complex contagion ❖ Viral memes spread across communities more like diseases (simple contagion) Weng et al. (2014) [Twitter]
  • 56. Weng et al. (2014), Nature Sci. Rep.
  • 57. Role of the social network and limited attention ❖ Spread among agents with limited attention on social network is sufficient to explain virality patterns ❖ Not necessary to invoke more complicated explanations based on intrinsic meme quality Weng et al. (2014), Nature Sci. Rep.
  • 58. Can the best ideas win? P(m) ∝ f(m)^α, where f is the fitness function
  • 59. Discriminative Power (Kendall’s Tau) When do the best ideas win? High Quality Low Quality
  • 60. Recap: Models of the Spread of Misinformation ❖ Simple vs complex contagion ❖ More realistic features ➢ Agents have limited attention ➢ Social network structure ➢ Competition between different memes ❖ Tradeoff between information quality and diversity
  • 61. ➢ What is Misinformation and Why it Spreads ➢ Modeling the Spread of Misinformation ➢ Open Questions ○ What techniques are used to boost misinformation? Introduction
  • 63. Bots are strategic Shao et al. 2017 (CoRR)
  • 64. Bots are effective Shao et al. 2017 (CoRR)
  • 65. Conclusions ❖ What is misinformation and why it spreads ➢ Online, it spreads through a mix of social, cognitive, and algorithmic biases. ❖ Modeling the spread of misinformation ➢ Social network structure, limited attention, and information overload make us vulnerable to misinformation. ❖ Open Questions: ➢ Bots are strategic superspreaders ➢ They are effective at spreading misinformation. ❖ Tools to detect manipulation of public opinions may be first steps toward a trustworthy Web.
  • 67. WMF Research Showcase, August 17, 2016 Giovanni Luca Ciampaglia gciampag@indiana.edu
  • 68. Recap: Open Questions ❖ Social Bots Amplify Misinformation ➢ Through social reinforcement ➢ Early amplification ➢ Target humans, possibly “influentials”
  • 69. Demand and Supply of Information — 2012 London Olympics [Wikipedia], Ciampaglia et al., Sci. Rep. (2015). (Figure: page views before/after the event, in days, for topics such as London, England, Usain Bolt, Olympics, Medal, 2012 London Olympics.)
  • 70. Supply of and Demand for Information ❖ Production of information is associated to shifts in collective attention ❖ Evidence that attention precedes production ❖ Higher demand → higher price → more production Ciampaglia et al. Scientific Reports 2015 source: Wikipedia
  • 71. Predicting Virality Structural Trapping Social Reinforcement Homophily M1: random sampling model M2: random cascading model (structural trapping) M3: social reinforcement model (structural trapping + social reinforcement) M4: homophily model (structural trapping + homophily) SIMPLE Contagion COMPLEX Contagion Weng et al. (2014), [Twitter]
  • 72. Virality and Competition for Attention User Popularity # followers [Yahoo! Meme] Hashtag Popularity # daily retweets [Twitter] 2B Views 55M Followers
  • 73. Low Quality Information just as Likely to Go Viral Source: Emergent.info [FB shares]
  • 74. Arizona State University Mining Misinformation in Social Media, November 21, 2017 45 Misinformation Spreader Detection
  • 75. Arizona State University Mining Misinformation in Social Media, November 21, 2017 46 Misinformation in Social Media: An Example Misinformation Spreader Content of Misinformation • Text • Hashtag • URL • Emoticon • Image • Video (GIF) Context of Misinformation • Date, Time • Location Propagation of Misinformation • Retweet • Reply • Like
  • 76. Arizona State University Mining Misinformation in Social Media, November 21, 2017 47 Detecting Misinformation by Its Spreaders Misinformation Spreader • A large portion of OSN accounts are likely to be fake – Facebook: 67.65 million – 137.76 million – Twitter: 48 million
  • 77. Arizona State University Mining Misinformation in Social Media, November 21, 2017 48 A Misinformation Spreader • Misinformation spreaders: users that deliberately spread misinformation to mislead others A phishing link to Twivvter.com
  • 78. Arizona State University Mining Misinformation in Social Media, November 21, 2017 49 Types of Misinformation Spreaders • Spammers • Fraudsters • Trolls • Crowdturfers • … Misinformation Fake News Rumor Spam Spammer Fraudster Clickbait …
  • 79. Arizona State University Mining Misinformation in Social Media, November 21, 2017 50 Features for Capturing a Spreader • What can be used to detect a spammer? – Profile – Posts (Text) – Friends (Network) Profile Features Post Features Network Features
  • 80. Arizona State University Mining Misinformation in Social Media, November 21, 2017 51 • Extracting features from a profile – #followers, #followees • E.g., small #followers  suspicious account – Biography, registration time, screen name, etc. Feature Engineering: Profile Profile Features
  • 81. Arizona State University Mining Misinformation in Social Media, November 21, 2017 52 • Extracting text features from user posts – Text: BoW, TF-IDF, etc. Feature Engineering: Text … A textual feature vector:
  • 82. Arizona State University Mining Misinformation in Social Media, November 21, 2017 53 • Extracting network features – Network: Adjacency matrix, number of follower, follower/followee ratio, centrality Feature Engineering: Network Adjacency matrix:
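The network features above can be sketched from a directed edge list: follower counts, followee counts, and the follower/followee ratio (a very low ratio — following many, followed by few — is a classic spammer signal). The edge list is fabricated; an edge a -> b means "a follows b".

```python
# Fabricated follow graph.
follows = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]

followers = {}  # user -> set of users following them
followees = {}  # user -> set of users they follow
for src, dst in follows:
    followers.setdefault(dst, set()).add(src)
    followees.setdefault(src, set()).add(dst)

def ratio(user):
    """Follower/followee ratio; inf if the user follows nobody."""
    nf = len(followers.get(user, set()))
    ng = len(followees.get(user, set()))
    return nf / ng if ng else float("inf")
```

These per-user scalars sit alongside the adjacency matrix itself, which supports heavier features such as centrality measures.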
  • 83. Arizona State University Mining Misinformation in Social Media, November 21, 2017 54 Overview: Misinformation Spreader Detection [1] Jindal, Nitin, and Bing Liu. "Review spam detection." Proceedings of the 16th international conference on World Wide Web. ACM, 2007. [2] Hu, X., Tang, J., Zhang, Y. and Liu, H., 2013, August. Social Spammer Detection in Microblogging. In IJCAI. [3] Song, Yuqi, et al. "PUD: Social Spammer Detection Based on PU Learning.“ International Conference on Neural Information Processing. Springer, Cham, 2017. [4] Wu L, Hu X, Morstatter F, Liu H. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017 (pp. 319-326). [5] Wu, Liang, et al. "Detecting Camouflaged Content Polluters." ICWSM. 2017. [6] Hooi, Bryan, et al. "Fraudar: Bounding graph fraud in the face of camouflage." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016. Content Network Text Mining [1] Content Network Camouflage Content Text + Graph Mining [2, 3] Data Methods Instance (Post/User) Selection [4, 5, 6]
  • 84. Arizona State University Mining Misinformation in Social Media, November 21, 2017 55 Supervised Learning: Content + Network … A textual feature vector: Adjacency matrix: Profile Features Post Features Network Features • Features for supervised learning – Text features – Network features
  • 85. Arizona State University Mining Misinformation in Social Media, November 21, 2017 56 Traditional Approach: Content Modeling … A textual feature vector: Positive and negative accounts can be distinguished with text : Coefficients to be estimated • Supervised learning with text features
  • 86. Arizona State University Mining Misinformation in Social Media, November 21, 2017 57 Traditional Approach: Network Modeling … A textual feature vector: Adjacency matrix: • Supervised learning with network features Friends are likely to have the same label : Coefficients to be estimated
  • 87. Arizona State University Mining Misinformation in Social Media, November 21, 2017 58 Emerging Challenge: Camouflage • Content Camouflage – Copy content from legitimate users – Exploit compromised account • Network Camouflage – Link farming with other spreaders, bots – Link farming with normal users
  • 88. Arizona State University Mining Misinformation in Social Media, November 21, 2017 59 Challenge (I): Camouflage • In order to avoid being detected: – manipulate the text feature vector • posting content similar to regular users’ – manipulate the adjacency matrix • harvesting links with other users Positive and negative accounts can be distinguished with text Friends are likely to have the same label
  • 89. Arizona State University Mining Misinformation in Social Media, November 21, 2017 60 Challenge (II): Network Camouflage • Heuristic-based methods • #Followers • Follower/followee ratio • Anomaly detection
  • 90. Arizona State University Mining Misinformation in Social Media, November 21, 2017 61 Challenge III: Limited Label Information • Label a malicious account (positive) – Suspended accounts – Honeypot • A set of bots created to lure other bots in the wild. • Any user that follows them is a bot. • The assumption is that normal users can easily recognize them. • Lack of labeled camouflage Honeypots
  • 91. Arizona State University Mining Misinformation in Social Media, November 21, 2017 62 Camouflage • Prior assumptions: – All suspended accounts are misinformation spreaders – All posts of a spreader are malicious • Selecting a subset of users for training • Selecting a subset of posts for training Positive and negative accounts can be distinguished with text Friends are likely to have the same label
  • 92. Arizona State University Mining Misinformation in Social Media, November 21, 2017 63 Selecting Users for Training Select a subset of users for training Evaluate with a validation set Update the training set • How to select the optimal set for training? Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
  • 93. Arizona State University Mining Misinformation in Social Media, November 21, 2017 64 Relaxation I: Group Structure • Assumption: malicious accounts cannot join a legitimate community – organize users in groups – users in the same group should be similarly weighted • Example group hierarchy: layer 0: G_1^0 = {1, 2, 3, 4, 5, 6, 7}; layer 1: G_1^1 = {1, 2, 3, 4}, G_2^1 = {5}, G_3^1 = {6, 7}; layer 2: G_1^2 = {1, 2}, G_2^2 = {3, 4} Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
  • 94. Arizona State University Mining Misinformation in Social Media, November 21, 2017 65 Relaxation II: Weighted Training • Objective: min_{w,c} Σ_{i=1}^{N} c_i (x_i·w − y_i)² + λ1 ||w||₂² + λ2 Σ_{i=0}^{d} Σ_{j=1}^{n_i} ||c_{G_j^i}||₂, subject to Σ_i c_i = K, 0 < c_i < 1 • The ||w||₂² term avoids overfitting; the last term is a group Lasso: an L1 norm at the inter-group level and an L2 norm at the intra-group level • d: depth of the hierarchy from the Louvain method; n_i: number of groups on layer i; c_{G_j^i}: weights of the nodes of group j on layer i Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
  • 95. Arizona State University Mining Misinformation in Social Media, November 21, 2017 66 Optimization • Full objective: min_{w,c} Σ_{i=1}^{N} c_i (x_i·w − y_i)² + λ1 ||w||₂² + λ2 Σ_{i=0}^{d} Σ_{j=1}^{n_i} ||c_{G_j^i}||₂ (group Lasso), subject to Σ_i c_i = K, 0 < c_i < 1 • c_i: weight of instance i; x_i: attribute vector of instance i; y_i: label of instance i; w: coefficients of the linear regression; N: number of instances; ||w||₂²: avoids overfitting; d: depth of the hierarchy from the Louvain method; n_i: number of groups on layer i; c_{G_j^i}: weights of the nodes of group j on layer i • Optimize w (with c fixed): min_w Σ_{i=1}^{N} c_i (x_i·w − y_i)² + λ1 ||w||₂² Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
  • 96. Arizona State University Mining Misinformation in Social Media, November 21, 2017 67 Optimization • Full objective: min_{w,c} Σ_{i=1}^{N} c_i (x_i·w − y_i)² + λ1 ||w||₂² + λ2 Σ_{i=0}^{d} Σ_{j=1}^{n_i} ||c_{G_j^i}||₂, subject to Σ_i c_i = K, 0 < c_i < 1 • Optimize c (with w fixed; write t_i = (x_i·w − y_i)²): min_c Σ_{i=1}^{N} c_i t_i + λ2 Σ_{i=0}^{d} Σ_{j=1}^{n_i} ||c_{G_j^i}||₂ Wu et al. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017
  • 97. Arizona State University Mining Misinformation in Social Media, November 21, 2017 68 Experimental Results • Collecting data with honeypots – http://infolab.tamu.edu/data/
Tweets: 4,453,380 | Users: 38,400 | ReTweets: 223,115 | Links: 8,739,105 | Spammers: 19,200
Approaches | Precision | Recall | F-Score
SSDM1 | 92.15% | 92.00% | 92.07%
NFS2 | 88.16% | 65.67% | 75.27%
SGASD | 93.75% | 96.92% | 95.31%
1 Hu et al., “Social spammer detection in microblogging.”, IJCAI’13
2 Ye et al., “Discovering opinion spammer groups by network footprints.”, ECML-PKDD’15
  • 98. Arizona State University Mining Misinformation in Social Media, November 21, 2017 69 Content Camouflage • Basic assumption for traditional methods: – All content of a misinformation spreader is malicious • Content camouflage: posts of a misinformation spreader may be legitimate – Copy content from legitimate users – Exploit compromised accounts
  • 99. Arizona State University Mining Misinformation in Social Media, November 21, 2017 70 Content Camouflage: An Example A normal post A normal post
  • 100. Arizona State University Mining Misinformation in Social Media, November 21, 2017 71 Challenge: Lack of Labeled Data • Labels of camouflage are costly to collect
  • 101. Arizona State University Mining Misinformation in Social Media, November 21, 2017 72 Learning to Identify Camouflage • Assumption: posts of misinformation spreaders are mixed with normal and malicious. • Introduce a weight for each post label. • Select posts that distinguish between misinformation spreaders and normal users.
  • 102. Arizona State University Mining Misinformation in Social Media, November 21, 2017 73 Learning to Identify Camouflage: Formulation Wu et al. Detecting Camouflaged Content Polluters. ICWSM 2017
  • 103. Arizona State University Mining Misinformation in Social Media, November 21, 2017 74 Experimental Results • Findings: – Sophisticated misinformation spreaders first disguise, and then do harm. • Results: Wu et al. Detecting Camouflaged Content Polluters. ICWSM 2017
  • 104. Arizona State University Mining Misinformation in Social Media, November 21, 2017 75 Misinformation Spreader Detection [1] Jindal, Nitin, and Bing Liu. "Review spam detection." Proceedings of the 16th international conference on World Wide Web. ACM, 2007. [2] Hu, X., Tang, J., Zhang, Y. and Liu, H., 2013, August. Social Spammer Detection in Microblogging. In IJCAI. [3] Song, Yuqi, et al. "PUD: Social Spammer Detection Based on PU Learning.“ International Conference on Neural Information Processing. Springer, Cham, 2017. [4] Wu L, Hu X, Morstatter F, Liu H. Adaptive Spammer Detection with Sparse Group Modeling. ICWSM 2017 (pp. 319-326). [5] Wu, Liang, et al. "Detecting Camouflaged Content Polluters." ICWSM. 2017. [6] Hooi, Bryan, et al. "Fraudar: Bounding graph fraud in the face of camouflage." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016. Content Network Text Mining [1] Content Network Camouflage Content Text + Graph Mining [2, 3] Data Methods Instance (Post/User) Selection [4, 5, 6]
  • 105. Arizona State University Mining Misinformation in Social Media, November 21, 2017 76 Challenges in Dealing with Misinformation • Large-scale – Misinformation can be rampant • Dynamic – It can happen fast • Deceiving – Hard to verify • Homophily – Consistent with one’s beliefs
  • 106. Arizona State University Mining Misinformation in Social Media, November 21, 2017 77 Codes, Platforms and Datasets
  • 107. Arizona State University Mining Misinformation in Social Media, November 21, 2017 78 Platforms • TweetTracker: Detecting Topic-centric Bots • Hoaxy: Tracking Online Misinformation • Botometer: Detecting Bots on Twitter
  • 108. Arizona State University Mining Misinformation in Social Media, November 21, 2017 79 Fact-Checking Websites • Online Fact-checking websites – PolitiFact: http://www.politifact.com/ – Truthy: http://truthy.indiana.edu/ – Snopes: http://www.snopes.com/ – TruthOrFiction: https://www.truthorfiction.com/ – Weibo Rumor: http://service.account.weibo.com/ • Volunteering committee
  • 109. Arizona State University Mining Misinformation in Social Media, November 21, 2017 80 Code and Data Repositories • Honeypot: http://bit.ly/ASUHoneypot • Identification: https://veri.ly/ • Diffusion: – Python Networkx: https://networkx.github.io/ – Stanford SNAP: http://snap.stanford.edu/ • Datasets – http://socialcomputing.asu.edu/pages/datasets – http://bit.ly/asonam-bot-data – https://github.com/jsampso/AMNDBots – http://carl.cs.indiana.edu/data/#fact-checking – http://snap.stanford.edu/data/index.html
  • 110. Arizona State University Mining Misinformation in Social Media, November 21, 2017 81 Book Chapters • “Mining Misinformation in Social Media’’, Chapter 5 in Big Data in Complex and Social Networks • http://bit.ly/2AYr5KM • “Detecting Crowdturfing in Social Media’’, Encyclopedia of Social Network Analysis and Mining • http://bit.ly/2hE6LXE
  • 111. Arizona State University Mining Misinformation in Social Media, November 21, 2017 82 Twitter Data Analytics • Common tasks in mining Twitter Data. – Free Download with Code & Data – Collection – Analysis – Visualization tweettracker.fulton.asu.edu/tda/
  • 112. Arizona State University Mining Misinformation in Social Media, November 21, 2017 83 Social Media Mining • Social Media Mining: An Introduction – a textbook • A comprehensive coverage of social media mining techniques – Free Download – Network Measures and Analysis – Influence and Diffusion – Community Detection – Classification and Clustering – Behavior Analytics http://dmml.asu.edu/smm/
  • 113. Arizona State University Mining Misinformation in Social Media, November 21, 2017 84 Challenges in Dealing with Misinformation • Large-scale – Misinformation can be rampant • Dynamic – It can happen fast • Deceiving – Hard to verify • Homophily – Consistent with one’s beliefs
  • 114. Arizona State University Mining Misinformation in Social Media, November 21, 2017 85 Q&A • Liang Wu, Giovanni Luca Ciampaglia, Huan Liu • {wuliang, huanliu}@asu.edu • All materials and resources are available online: • Giovanni Luca Ciampaglia • gciampag@indiana.edu http://bit.ly/ICDMTutorial
  • 115. Arizona State University Mining Misinformation in Social Media, November 21, 2017 86 Acknowledgements • DMML @ ASU • NaN @ IUB • MINERVA initiative through the ONR N000141310835 on Multi-Source Assessment of State Stability