The document discusses automatically identifying Islamophobia in social media text. It begins by introducing the speaker and their areas of research, including hate speech detection. It then provides background on Islamophobia, discussing its origins and definitions. The remainder of the document outlines a project to collect and annotate Twitter data containing mentions of Ilhan Omar to detect Islamophobic sentiment, discussing the pilot annotation process and lessons learned.
Automatically Identifying Islamophobia in Social Media
1. Automatically Identifying Islamophobia
in Social Media
Ted Pedersen
Department of Computer Science
University of Minnesota, Duluth
tpederse@umn.edu
@SeeTedTalk
http://www.d.umn.edu/~tpederse
2. My NLP Areas
Word Sense Disambiguation and Discrimination
Semantic Similarity and Relatedness
Text Classification / Sentiment Analysis
Humor Recognition
Hate Speech Detection (Words lead to deeds)
Islamophobia and NLP blog
Ethics and NLP
3. Today’s Agenda
Islamophobia in General
Islamophobia in Minnesota
Connections to Hate Speech Detection
Collecting and Annotating Twitter Data
Lessons Learned & the Way Forward
4. Islamophobia
A legacy of colonial histories,
particularly those that view the
Muslim world as exotic, savage,
dangerous, and/or desperate for
a “Clash of Civilizations.”
Orientalism (Said, 1978)
5. Islamophobia
A recent term for an older phenomenon
Runnymede Trust (1997, 2017)
Unfounded hostility towards Islam
Practical consequences of such hostility
in unfair discrimination against Muslim
individuals and communities
Exclusion of Muslims from mainstream
political and social affairs.
6. Islamophobia
Anti-Muslim racism
Are Muslims a race? No.
Race is constructed; marginalized groups are often racialized and thereby
assumed to share certain inherent features (often seen as limitations)
“Racism is the fatal coupling of power and difference that creates a
vulnerability to premature death.” Ruha Benjamin / Ruth Wilson Gilmore
Intersection of religion, ethnicity, race, gender, immigration status, …
7. Common Anti-Muslim Tropes (Bridge Institute)
Islam and Muslims are inherently
violent.
Islam and Muslims are oppressive
to women.
Islam and Muslims are intolerant
towards other religions.
Islam is a political ideology, not
a religion.
In the West, Muslims are using
non-violent stealth jihad to
implement Sharia Law.
Islam is foreign, medieval, and
at odds with Western modernity.
Islam is a monolith.
All Muslims are Arab or Brown.
14. Goal : Identify Islamophobia in Written Text
Why?
Relatively understudied form of Hate Speech in NLP.
Highly intersectional problem since Muslim identity is multi-faceted.
Significant influence on events in the World, the USA, and Minnesota.
How?
Use ideas from NLP, especially Hate Speech Detection.
Create annotated corpora in order to understand problem better, and then
apply Machine Learning or Deep Learning.
15. What is Hate Speech?
It exists along a spectrum of language :
Ordinary … Offensive … Hate Speech … Dangerous Speech
It takes many forms :
Insults, Profanity, Bullying, Harassment, Attacks, Abuse, Threats, …
It has many targets :
Racism, Sexism, Anti-Semitism, Islamophobia, …
Hate speech seeks to deny a person or group the right to exist in the future.
16. Growth of Hate Speech Detection I
The detection of offensive and abusive language and hate speech is a problem of
growing interest. Work began to really accelerate around 2015, perhaps given the
increasingly apparent problem that exists on social media platforms.
Abusive Language Workshop (2017, 2018, 2019), known since 2020 as
WOAH : Workshop on Online Abuse and Harms (Nov 20, EMNLP)
OffensEval (2019, 2020) Detecting Offensive Language (Dec 12-13, COLING)
SemEval 2019 Detecting Hate Speech Against Women and Immigrants
PAN 2021 Profiling Hate Speech Spreaders on Twitter (upcoming)
17. Growth of Hate Speech Detection II
Schmidt and Wiegand (2017) : A Survey on Hate Speech Detection using Natural
Language Processing, SocialNLP
Fortuna and Nunes (2018) : A Survey on Automatic Detection of Hate Speech in
Text, ACM Computing Surveys
Alan Turing Institute Hate Speech Hub and Reading List for Online Hate & Abuse
Hate Speech Data : more than 60 hate speech data sets in 15 languages
Workshops, shared tasks, survey papers, resources are all very positive. But …
18. Limitations of Hate Speech Detection I
The rise of shared tasks and data sets makes it possible to work on Hate Speech
without thinking too hard about the problem or the people. It’s all just data.
Keyword detection is easy to game and prone to false negatives / positives.
Gröndahl et al. 2018, All You Need is "Love": Evading Hate Speech Detection
There is no standard set of classes in which to categorize hate speech. Offensive,
profane, targeted, abusive, racist, threatening, strong, weak, implicit, explicit …
Hate Speech varies depending on the target, the speaker, and local context.
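The keyword limitation can be illustrated with a minimal sketch. The keyword list, the `flag_by_keyword` helper, and both example sentences are hypothetical and chosen only for illustration; they are not taken from the collected data or from the cited paper.

```python
import re

# Hypothetical keyword list; a real lexicon would be much longer.
KEYWORDS = {"traitor", "terrorist"}

def flag_by_keyword(text: str) -> bool:
    """Return True if any keyword appears as a whole token in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token in KEYWORDS for token in tokens)

# False positive: a defence of the target still contains the keyword.
defence = "Calling her a traitor is baseless and wrong."
# False negative: a trivially obfuscated spelling slips past the matcher.
evasion = "She is a tr@itor."

print(flag_by_keyword(defence))  # flagged, although it is not hateful
print(flag_by_keyword(evasion))  # missed, although it is hostile
```

The matcher sees only surface strings, so it cannot tell quotation or rebuttal from endorsement, and a single character substitution defeats it.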
19. Limitations of Hate Speech Detection II
Low Annotator Agreement :
Ross et al. (2016) : Measuring the Reliability of Hate Speech Annotations: The
Case of the European Refugee Crisis. NLP4CMC.
Waseem et al. (2016) : Are You a Racist or Am I Seeing Things? Annotator
Influence on Hate Speech Detection on Twitter. NLP+CSS.
Racial Bias :
Sap et al. (2019) : The Risk of Racial Bias in Hate Speech Detection. ACL.
Davidson et al. (2019) : Racial Bias in Hate Speech and Abusive Language
Detection Datasets. ALW.
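Annotator agreement is commonly summarized with a chance-corrected statistic. The sketch below computes Cohen's kappa for two annotators. The choice of kappa is an assumption (the slides do not name a metric), and the labels `hate` / `none` and the six example annotations are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels for six tweets from two annotators.
ann_1 = ["hate", "hate", "none", "none", "hate", "none"]
ann_2 = ["hate", "none", "none", "none", "hate", "hate"]
print(round(cohen_kappa(ann_1, ann_2), 3))
```

Here the annotators agree on four of six items, but half of that agreement is expected by chance, so kappa is much lower than the raw agreement rate.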
20. The Way Forward
Hate Speech Detection is not Just Another Classification Task. Seek out domain
expertise, build relationships, don’t reduce the problem to a data set.
Frey et al. (2018). Artificial Intelligence and Inclusion: Formerly Gang-Involved
Youth as Domain Experts for Analyzing Unstructured Twitter Data. Social
Science Computer Review.
Creating annotated data is likely necessary. Be careful to fully document the
decisions made along the way, paying special attention to annotator background.
Bender and Friedman (2018) Data Statements for NLP : Toward Mitigating
Systems Bias and Enabling Better Science. TACL.
21. How Can We Detect Islamophobia (with NLP)?
Carry out a Qualitative Analysis of text with input from domain experts.
Collect and annotate Tweets.
Seek out a diverse pool of annotators.
Develop annotation scheme / code book.
Be Iterative.
Carry out Quantitative Analysis using Machine Learning or Deep Learning.
22. Data Collection
Islamophobia is global, but has many local variations each with their own issues,
terminology, and ways of being expressed.
This suggests the need for the data to have a regional focus - Islamophobia in the
UK, France, India, the USA, Minnesota, etc.
While she is a national figure, Ilhan Omar is from Minnesota, and our data
collection starts with her.
Muslim, but also a black Somali woman who was an immigrant / refugee.
Highly intersectional identity.
23. Tweet Collection (using Twitter public API)
Collecting since April 2019, any tweet that includes one or more of :
‘Ilhan omar’, ilhan, omar, @ilhanmn, ilhanmn, #ilhanmn, #ilhanomar, #ilhan
Pilot Annotation based on April 2019 - April 2020, approx 5 million total tweets.
1020 Annotation based on Nov. 2019 - Oct. 2020, approx 10 million total tweets.
The Twitter public API does not return every matching tweet; it downsamples.
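The "one or more of" rule can be sketched as a local filter over tweet text. This is only an illustration: the actual matching was done by the Twitter API, and the case-insensitive token matching, the regular expression, and the `matches_track_terms` helper are assumptions.

```python
import re

# Track terms as listed on the slide.
TRACK_TERMS = ["ilhan omar", "ilhan", "omar", "@ilhanmn", "ilhanmn",
               "#ilhanmn", "#ilhanomar", "#ilhan"]

def matches_track_terms(text: str) -> bool:
    """Return True if the tweet text contains at least one track term."""
    # Keep @ and # attached to tokens so handles and hashtags stay intact.
    tokens = re.findall(r"[#@]?\w+", text.lower())
    token_set = set(tokens)
    joined = " ".join(tokens)
    for term in TRACK_TERMS:
        if " " in term:
            # Multi-word phrase: look for it as consecutive tokens.
            if f" {term} " in f" {joined} ":
                return True
        elif term in token_set:
            return True
    return False

print(matches_track_terms("Rep. Ilhan Omar spoke in Minneapolis today"))
print(matches_track_terms("#IlhanOmar is trending"))
print(matches_track_terms("Nothing relevant here"))
```

Under token matching the single words `ilhan` and `omar` already cover the phrase, so the list is broader than it first appears: any tweet about anyone named Omar is collected.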
24. Pilot Annotation
Data Statement Ilhan Omar
Islamophobia Data Set, created
during LREC 2020 Data Statements
workshop (May 11-13, 2020)
~5 million tweets from April 2019 -
April 2020. Selected those with
muslim, islam, quran, or koran
(220,000). Drew random subsets of
100 for pilot annotation. Low
annotator agreement.
Traitor/Not Loyal - Muslims are not loyal
and are beholden to some external
organization or government (potential
overlap with Terrorist, Sharia Law)
Terrorist/Sympathizer - Muslims are either
terrorists themselves or support those who
are. (potential overlap with Traitor)
False Religion - Islam is a false religion with
strange, primitive, evil practices.
Sharia Law - Muslims want to replace the
existing legal system with Sharia Law.
(potential overlap with Traitor).
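The pilot selection step (filter by seed terms, then draw random subsets of 100) can be sketched as follows. Substring matching, disjoint subsets, the fixed random seed, and the helper names `select_candidates` and `draw_subsets` are assumptions; the slide states only the terms and the subset size.

```python
import random

SEED_TERMS = ("muslim", "islam", "quran", "koran")  # terms from the slide

def select_candidates(tweets):
    """Keep tweets whose lowercased text contains any seed term."""
    return [t for t in tweets if any(s in t.lower() for s in SEED_TERMS)]

def draw_subsets(candidates, subset_size=100, n_subsets=2, seed=0):
    """Draw disjoint random subsets so no tweet is annotated twice."""
    needed = subset_size * n_subsets
    if needed > len(candidates):
        raise ValueError("not enough candidate tweets")
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    picked = rng.sample(candidates, needed)
    return [picked[i:i + subset_size] for i in range(0, needed, subset_size)]

# Hypothetical toy corpus standing in for the collected tweets.
toy_tweets = [f"tweet {i} mentions Islam" for i in range(300)]
toy_tweets += [f"tweet {i} is off topic" for i in range(50)]
subsets = draw_subsets(select_candidates(toy_tweets))
print(len(subsets), len(subsets[0]))
```

Recording the seed and the filter terms alongside the sample makes the selection reproducible, which fits the documentation goals of a data statement.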
25. 1020 Annotation (October 2020)
9.6 million tweets (incl. RT) collected Nov 2019 - Oct 2020.
1 million unique tweets.
Selected random samples of 384 tweets for annotation.
Agreement improved with more extensive set of labels.
Began to consider profile descriptions of “speakers” (Tweeters).
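The slide does not say why samples of 384 were drawn. One possible reading, offered here only as an assumption, is that 384 is the usual sample size for estimating a proportion at 95% confidence with a 5% margin of error. The sketch below works through that arithmetic with Cochran's formula; the function names are hypothetical.

```python
import math

def cochran_sample_size(z=1.96, p=0.5, margin=0.05):
    """Cochran's formula: sample size for estimating a proportion."""
    return (z ** 2) * p * (1 - p) / (margin ** 2)

def finite_population_correction(n0, population):
    """Adjust the sample size for a finite population."""
    return n0 / (1 + (n0 - 1) / population)

n0 = cochran_sample_size()
# About 1 million unique tweets, as stated on the slide.
n_unique = finite_population_correction(n0, 1_000_000)
print(round(n0, 2), math.floor(n_unique))
```

With a population of about one million the finite population correction barely changes the result, so the sample size is driven almost entirely by the chosen confidence level and margin.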
26. 1020 Annotation Labels
Neutral - apolitical or about someone
other than Ilhan Omar
Support - expresses support for position
or person of Ilhan Omar
Political - expression of political
difference of opinion with Ilhan Omar
Insult - personal insult directed at Ilhan
Omar not related to other labels
Immigration - Ilhan Omar has committed
fraud to remain in USA
Terrorist - Ilhan Omar is a terrorist or
supports them
Loyalty - Ilhan Omar is unAmerican,
disloyal, or a traitor
Jail - Ilhan Omar should be prosecuted,
convicted, or incarcerated
Sharia - Ilhan Omar wants to replace US
law with Sharia Law
Adultery - Ilhan Omar is an adulterer or
married to her brother
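A code book like this can be kept in machine-readable form so that annotations are checked against it. The sketch below is a minimal illustration: the label names and glosses follow the slide, while the dict layout, the one-label-per-tweet assumption, and the helpers `validate` and `label_distribution` are hypothetical.

```python
from collections import Counter

# Label names and glosses follow the slide; the layout is an assumption.
CODE_BOOK = {
    "Neutral": "apolitical or about someone other than Ilhan Omar",
    "Support": "expresses support for position or person of Ilhan Omar",
    "Political": "expression of political difference of opinion with Ilhan Omar",
    "Insult": "personal insult directed at Ilhan Omar not related to other labels",
    "Immigration": "Ilhan Omar has committed fraud to remain in USA",
    "Terrorist": "Ilhan Omar is a terrorist or supports them",
    "Loyalty": "Ilhan Omar is unAmerican, disloyal, or a traitor",
    "Jail": "Ilhan Omar should be prosecuted, convicted, or incarcerated",
    "Sharia": "Ilhan Omar wants to replace US law with Sharia Law",
    "Adultery": "Ilhan Omar is an adulterer or married to her brother",
}

def validate(annotations):
    """Return the labels in `annotations` that are not in the code book."""
    return sorted({label for label in annotations if label not in CODE_BOOK})

def label_distribution(annotations):
    """Count how often each code book label was used."""
    return Counter(label for label in annotations if label in CODE_BOOK)

# Hypothetical annotations for five tweets, one label each.
labels = ["Jail", "Loyalty", "Jail", "Neutral", "Treason"]
print(validate(labels))
print(label_distribution(labels).most_common(2))
```

Catching out-of-scheme labels early matters when the scheme is revised between annotation rounds, since older label names can otherwise linger in the data.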
31. 1 grams (muslim,islam,quran) in all 1020 Tweets
muslim (14,791), muslims (4,849), islamic (3,827),
islam (3,302), islamist (1,650), islamophobia (607),
islamists (600), quran (580), islamophobic (553),
congressmuslim (435)
32. 2 grams
a muslim (2,446), muslim brotherhood (1,631), the
muslim (1,440), islamic terrorist (594), anti muslim
(591), radical muslim (512), muslims in (483), muslim
woman (477), radical islamic (427), islam is (412)
33. 3 grams
the muslim brotherhood (624), is a muslim (488),
congressmuslim ilhan omar (376), as a muslim
(285), a muslim american (217), muslim ilhan omar
(197), muslim american trump (195), of the muslim
(192), a radical muslim (181), muslim anti
immigrant (181)
34. 4 grams
as a muslim american (198), a muslim american trump
(195), muslim american trump admirer (191), ahmed as
a muslim (183), muslim anti immigrant anti (175), she is
a muslim (171), somali congressmuslim ilhan omar
(166), omar is a muslim (151), muslim brotherhood ilhan
omar (136), muslim refugee dalia al (119)
35. 5 grams
as a muslim american trump (193), a muslim american
trump admirer (191), muslim american trump admirer i
(187), ahmed as a muslim american (182), muslim anti
immigrant anti black (152), qanta ahmed as a muslim
(144), icg obama isis soros muslim (117), obama isis soros
muslim brotherhood (117), isis soros muslim brotherhood
ilhan (116), omar and the progressive islamist (115)
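Counts like those on the n-gram slides can be produced with a short script. The sketch below is one way to do it, not necessarily the method used for the slides: the tokenizer, the `seeded_ngram_counts` helper, and the toy tweets are assumptions. Seeds are matched as substrings of tokens, which is consistent with tokens such as "congressmuslim" and "islamophobia" appearing in the 1 gram list.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def seeded_ngram_counts(tweets, n, seeds):
    """Count n-grams in which at least one token contains a seed string."""
    counts = Counter()
    for text in tweets:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if any(seed in token for token in gram for seed in seeds):
                counts[" ".join(gram)] += 1
    return counts

# Hypothetical toy input; the real counts come from the collected tweets.
toy = ["She is a Muslim American", "as a Muslim American voter"]
counts = seeded_ngram_counts(toy, 3, ("muslim", "islam", "quran"))
print(counts.most_common(3))
```

Because retweets repeat the same text, counts taken over all tweets are dominated by viral content; counting over unique tweets instead would give a different ranking.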
37. 1 grams (anti, treason, traitor, hates) in all Tweets
anti (22,403), treason (6,435), traitor
(6,324), hates (5,557), antifa (2,634),
antisemitic (1,892), treasonous (1,707),
traitors (1,702), antisemite (1,647),
antisemitism (1,308)
38. 2 grams
anti american (6,455), anti semitic (3,860),
anti semite (2,554), a traitor (2,440), for
treason (2,326), she hates (2,077), hates
america (2,064), anti semitism (1,986), an anti
(1,582), is anti (1,192)
39. 3 grams
is a traitor (1,065), she hates america (861),
an anti semite (661), a traitor to (570), is an
anti (564), this anti american (553), ilhan omar
hates (535), of anti semitism (514), is anti
american (478), an anti american (475)
40. 4 grams
anti semite ilhan omar (434), she is a traitor (421),
omar is a traitor (329), be hanged for treason
(311), omar is an anti (305), is a traitor to (299),
accused of anti semitism (261), ilhan omar hates
america (253), be tried for treason (246), account
suspended over treason (238)
41. 5 grams
after account suspended over treason (238), should be
hanged for treason (238), petition to demand this antisemite
(235), demand this antisemite terrorist sympathizer (235), to
demand this antisemite terrorist (235), account suspended
over treason tweet (230), this antisemite terrorist sympathizer
ilhan (228), antisemite terrorist sympathizer ilhan omar (219),
anti semite of the year (218), be hanged for treason if (216)
42. Most frequent 2 grams in (re)Tweeter profiles
#maga #kag (25416), trump supporter (18969), trump 2020 (14241), president
trump (13951), husband father (12562), pro life (11502), happily married (10383),
god family (9690), proud american (9281), god bless (9100), wife mother (8487),
lives matter (7699), love god (7609), wife mom (6833), #maga #trump2020 (6799),
maga kag (6195), jesus christ (6187), christian conservative (6103), #kag
#trump2020 (6096), family country (5749), business owner (5733), american
patriot (5055), bless america (4916), common sense (4672), #trump2020 #maga
(4478), black lives (4230), truth seeker (4138), conservative christian (4132),
father husband (3991), donald trump (3931), constitutional conservative (3908),
united states (3884), 2nd amendment (3841), mother grandmother (3811),
america great (3801), #maga #wwg1wga (3725), army veteran (3486), human
rights (3419), dog lover (3414), #wwg1wga #maga (3112), free speech (3044)
43. 1020 Annotation Labels
Neutral - apolitical or about someone
other than Ilhan Omar
Support - expresses support for position
or person of Ilhan Omar
Political - expression of political
difference of opinion with Ilhan Omar
Insult - personal insult directed at Ilhan
Omar not related to other labels
Immigration - Ilhan Omar has committed
fraud to remain in USA
Terrorist - Ilhan Omar is a terrorist or
supports them
Loyalty - Ilhan Omar is unAmerican,
disloyal, or a traitor
Jail - Ilhan Omar should be prosecuted,
convicted, or incarcerated
Sharia - Ilhan Omar wants to replace US
law with Sharia Law
Adultery - Ilhan Omar is an adulterer or
married to her brother
45. Lessons Learned
Impact of “lock her up” and “send her back” rhetoric clearly seen in annotation.
Annotation labels must be nuanced. Tweets can’t simply be labeled as Islamophobic
or not, since hateful comments may be based on gender, race, immigration or marital
status, or political beliefs in addition to or instead of religion.
A highly visible or politicized personality attracts a lot of repetitive and viral content
based on the most recent accusation or conspiracy.
Profile descriptions are indicative of certain types of hateful content.
46. Current Questions
Are there correlations between public events and hateful tweet activity?
What is the impact of Tweeter location on hateful tweet activity?
Are less prominent public figures who are Muslim targeted in the same way?
Are political figures who are known to be Christian, Jewish, Hindu, and other
religions targeted to greater or lesser extents?
Can crowdsourcing be effective for more nuanced annotation problems?
47. The Way Forward
Hate Speech and Islamophobia should not be reduced to data points
Do not treat these as Just Another Classification Task
Don’t rush annotations, don’t rush to train, test, and report F-scores.
Learn the domain. Consult domain experts. Train annotators carefully.
Annotation is a great opportunity to build relationships.
Must go beyond the text to consider the characteristics of the target, the speaker,
and the context in which the speech occurs.
53. 2 grams
sharia law (2409), under sharia (187),
wants sharia (178), for sharia (178), the
sharia (153), of sharia (137), to sharia
(129), in sharia (126), and sharia (102),
with sharia (99)
54. 3 grams
sharia law in (180), under sharia law (139),
wants sharia law (135), sharia law is (122),
sharia law and (101), in sharia law (98), for
sharia law (94), she wants sharia (90),
sharia law she (81), of sharia law (80)
55. 4 grams
she wants sharia law (68), omar calls for sharia (50),
ilhan omar suggests sharia (49), calls for sharia
flogging (48), for sharia flogging of (47), sharia
flogging of critics (46), sharia flogging for critics (44),
suggests sharia flogging for (42), omar suggests
sharia flogging (41), sharia law in the (38)
56. 5 grams
calls for sharia flogging of (47), ilhan omar calls for sharia
(47), omar calls for sharia flogging (46), for sharia flogging
of critics (45), sharia flogging of critics with (45), sharia
flogging for critics who (44), suggests sharia flogging for
critics (41), omar suggests sharia flogging for (41), ilhan
omar suggests sharia flogging (35), ilhan omar wants
sharia law (23)
58. 2 grams
a terrorist (4,282), muslim brotherhood
(1,631), the terrorist (1,151), this terrorist
(951), terrorist sympathizer (861), domestic
terrorist (752), terrorist and (669), islamic
terrorist (594), al qaeda (594), jihad rep (589)
59. 3 grams
is a terrorist (1,781), the muslim brotherhood
(624), jihad rep ilhan (561), terrorist ilhan omar
(365), a terrorist and (325), a domestic terrorist
(303), a terrorist sympathizer (268), terrorist
sympathizer ilhan (259), antisemite terrorist
sympathizer (235), this antisemite terrorist (235)
60. 4 grams
omar is a terrorist (774), jihad rep ilhan omar (558),
she is a terrorist (477), terrorist sympathizer ilhan omar
(249), this antisemite terrorist sympathizer (235),
demand this antisemite terrorist (235), antisemite
terrorist sympathizer ilhan (228), is a terrorist and
(188), she ‘s a terrorist (187), is a domestic terrorist
(159)
61. 5 grams
ilhan omar is a terrorist (484), demand this antisemite terrorist
sympathizer (235), to demand this antisemite terrorist (235),
this antisemite terrorist sympathizer ilhan (228), antisemite
terrorist sympathizer ilhan omar (219), terrorist sympathizer
ilhan omar ’s (215), jihad rep ilhan omar praises (149),
evil jihad rep ilhan omar (148), for killing top iranian terrorist
(144), pure evil jihad rep ilhan (128)
62. Prolific (re)Tweeters in all Tweets
Founder & President @TrumpStudents (127,578), Founder and Co-Chairman of
@TrumpStudents (75,769), GOP Candidate Running Against Ilhan Omar in MN
(75,299), Republican Candidate Running Against Ilhan Omar in MN5 (60,527),
Founder & Co-Chairman @TrumpStudents (59,357), Founder & President of
@TPUSA Chairman of @TrumpStudents (55,581) , Blank (53,060), Republican
Candidate for Congress in MN5 (48,236), Republican-Endorsed Candidate
Running Against Ilhan Omar in MN5 (46,192), JudicialWatch (42,067), Father and
Former Candidate for Florida's 3rd Congressional District (40,451), Republican
Candidate Running Against Ilhan Omar in MN5 (34,889), Sean Hannity (29,355),
45th President of the United States of America (29,255), Mom, Refugee,
Intersectional Feminist, 2017 Top Angler of the Governor's Fishing Opener and
Congresswoman for #MN05 (27,888).
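A tally like this one, keyed by profile description rather than by account name, can be sketched with a counter. The record layout (a dict with a `profile` field) and the `count_by_profile` helper are assumptions; mapping empty descriptions to "Blank" mirrors the "Blank" entry in the list above.

```python
from collections import Counter

def count_by_profile(records):
    """Count tweets per profile description; empty ones become 'Blank'."""
    counts = Counter()
    for record in records:
        description = (record.get("profile") or "").strip()
        counts[description or "Blank"] += 1
    return counts

# Hypothetical records standing in for collected (re)tweets.
records = [
    {"profile": "Dog lover"},
    {"profile": ""},
    {"profile": None},
    {"profile": "Dog lover "},
    {},
]
print(count_by_profile(records).most_common(2))
```

Keying on the description text means that one account appears under several entries if its profile changed during the collection period, which may explain the near-duplicate descriptions in the list.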