Understanding the world with NLP: interactions between society, behaviour and social media
1. Understanding the world with NLP:
interactions between society, behaviour and
social media
Dr. Diana Maynard
University of Sheffield, UK
27 February 2019, POLIMI, Milan
7. Men or women?
• 24% of all male internet users use Twitter
• 21% of all female internet users use Twitter
Out of all Twitter users (in 2019):
• 34.5% are women
• 65.5% are men
8. How old are Twitter users?
• Out of all Twitter users, which age range has the
highest percentage?
16. Top 10 most followed Twitter users
2017              | 2015              | 2013
Katy Perry        | Katy Perry        | Katy Perry
Justin Bieber     | Justin Bieber     | Justin Bieber
Barack Obama      | Taylor Swift      | Lady Gaga
Taylor Swift      | Barack Obama      | Barack Obama
Rihanna           | YouTube           | Taylor Swift
Ellen DeGeneres   | Lady Gaga         | YouTube
Lady Gaga         | Rihanna           | Britney Spears
YouTube           | Ellen DeGeneres   | Rihanna
Justin Timberlake | Twitter           | Instagram
Twitter           | Justin Timberlake | Justin Timberlake
17. Social media: a valuable source of information
(not just stupid stuff about pop stars)
• business insights
• sharing and receiving news
• campaigns
• all kinds of collective intelligence
• an alternative to traditional polls
• and much more
18. Why is social media interesting for NLP?
• Fast-growing, highly dynamic and high volume
source of data – big data!
• Reflects language used in today's society
• Reflects current views of society
• It's a great source of material for opinion mining
tools
• Challenging research area due to specialised use of
language
19. Gartner 3V definition of Big Data
• Volume
• Velocity
• Variety
• +Veracity
• High volume & velocity of messages:
• 500 million tweets per day
• Massive variety:
• Stock markets
• Earthquakes
• Social arrangements
20. Big Data is not new!
Staff sorting 4M used tickets from #London Underground
to analyse line use in 1939
21. Linguistic challenges of social media
• Language
• Problem: typically exhibits very different language style
• Solution: train specific language processing components
• Relevance
• Problem: topics and comments can rapidly diverge.
• Solution: train a classifier or use clustering techniques
• Lack of context
• Problem: hard to disambiguate entities
• Solution: data aggregation, metadata, entity linking
22. People don’t write “properly”
• Grundman:politics makes #climatechange scientific issue,people
don’t like knowitall rational voice tellin em wat 2do
• Want to solve the problem of #ClimateChange? Just #vote for a
#politician! Poof! Problem gone! #sarcasm #TVP #99%
• Human Caused #ClimateChange is a Monumental Scam!
http://www.youtube.com/watch?v=LiX792kNQeE … F**k yes!!
Lying to us like MOFO's Tax The Air We Breath! F**k Them!
• The last people I will listen2 about guns r those that know
nothing about them&politicians who live in states w/strictest
gun laws #cali #ny
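Text like the tweets above breaks tokenisers built for edited prose. As a rough illustration only (not the tokeniser used in GATE or in this work), a regular expression that treats URLs, hashtags and @mentions as single tokens might look like this. The pattern and the function name `tokenize` are assumptions made for the sketch.

```python
import re

# Alternatives are tried left to right, so URLs are matched before anything
# that would split them on punctuation.
TOKEN_RE = re.compile(
    r"""
    https?://\S+            # URLs kept whole
  | [@#]\w+                 # @mentions and #hashtags kept whole
  | \w+(?:['’]\w+)*         # words, including contractions
  | [^\w\s]                 # any other single symbol
    """,
    re.VERBOSE,
)

def tokenize(tweet):
    """Split a tweet into tokens without breaking hashtags, mentions or URLs."""
    return TOKEN_RE.findall(tweet)

tokens = tokenize("Want to solve the problem of #ClimateChange? Just #vote for a #politician!")
hashtags = [t for t in tokens if t.startswith("#")]
```

Note that this only segments the text. Non-standard spellings such as "listen2" or "wat 2do" still come out as ordinary tokens and need a separate normalisation step.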
24. Named Entity Recognition and Linking
NER (Professor Plum)
dbpedia.org/resource/.....
Michael_Jackson
Michael_Jackson_(writer)
Entity Linking
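To make the linking step concrete, here is a toy sketch of choosing between the two Michael_Jackson candidates by counting how many context words overlap with each candidate's keywords. The keyword sets are invented for illustration; a real system would derive them from a knowledge base such as DBpedia and would use a far richer similarity measure.

```python
# Hypothetical keyword sets for each candidate (illustration only).
CANDIDATES = {
    "Michael_Jackson": {"singer", "pop", "music", "album", "thriller", "dance"},
    "Michael_Jackson_(writer)": {"writer", "beer", "whisky", "journalist", "book"},
}

def link_entity(context, candidates=CANDIDATES):
    """Return the candidate whose keywords overlap most with the context words."""
    words = set(context.lower().split())
    scores = {name: len(words & keywords) for name, keywords in candidates.items()}
    # On a tie, max() returns the first candidate in insertion order.
    return max(scores, key=scores.get)

writer = link_entity("Michael Jackson wrote the best book about beer")
singer = link_entity("the new Michael Jackson album of pop music")
```

With very short messages there may be no overlapping words at all, which is why aggregation and metadata matter for tweets.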
25. Pipelines for tweets
Errors have a cumulative effect
Good performance is important at each stage
(Figure: per-stage vs. overall performance)
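A little arithmetic shows why errors accumulate. The sketch below assumes, as a simplification, that each stage's errors are independent and that an item is only correct overall if every stage gets it right. The stage names and accuracy figures are hypothetical, not measurements from this work.

```python
import math

# Hypothetical per-stage accuracies, for illustration only.
stage_accuracy = {
    "language identification": 0.97,
    "tokenisation": 0.98,
    "POS tagging": 0.90,
    "named entity recognition": 0.85,
}

def overall_accuracy(accuracies):
    """Multiply per-stage accuracies, assuming independent errors."""
    return math.prod(accuracies)

overall = overall_accuracy(stage_accuracy.values())
```

With these figures the overall accuracy is about 0.73, lower than that of the weakest single stage (0.85).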
26. Understanding tweets is hard even
for people
Branching out from Lincoln park after dark ... Hello
Russian Navy, it's like the same thing but with glitter!
??
27. It’s all about the Named Entities
Branching out from Lincoln park after dark ... Hello
Russian Navy, it's like the same thing but with glitter!
29.
Ecuador, 7.8 earthquake, April 2016, ~700 people died
Droughts, affecting 60 million in 34 countries
Maxwell, California, Feb 2017
Portugal, forest fires, 64 confirmed deaths, Jun 2017
Manchester, May 2017, 22 dead
Haiti, Hurricane Matthew, Oct 2016,
~500 people died, farming devastated
31. Uses of social media during disasters
• Broadcasting info about the disaster
• Requesting info from local people and
eyewitnesses
• Requesting and offering help and
support
• Disaster mapping
• Mobilising the crowd to support
initiatives
32. How?
• Using citizen
reporters, and digital
responders for
mapping crises
• Ushahidi deployed
over 50k times
• Free and open source
• Working with us on
the COMRADES
project
36. Behaviour Analysis
• Based on the assumption that users in different behavioural
stages communicate differently (different emotions,
directives, etc.)
Pajarito @lindopajarito . 2h
Our building needs 40% of all energy consumed in Switzerland! :(
Desirability: Negative sentiment (expressing personal
frustration: anger/sadness)
DJPajarito @DJPajaritoGenial . 12h
I'm so proud when I remember to save energy and I know
however small it's helping.
Buzz: Positive sentiment (happiness/joy). I/we + present tense
HotelPajarito @HotelPajarito . 18h
Join us today today to switch of a light for EH! :)
Invitation: Positive sentiment (happy) + use of vocatives
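The three stage descriptions can be read as simple rules over sentiment and surface cues. The following is a deliberately crude sketch of that idea, not the classifier used in the project: the word lists, the use of emoticons as sentiment cues, and the function name `behaviour_stage` are all assumptions made for illustration.

```python
# Hypothetical mini-lexicons (illustration only).
NEGATIVE = {":(", "angry", "sad", "frustrated"}
POSITIVE = {":)", "proud", "happy", "glad"}
INVITATION_CUES = {"join", "everyone", "please"}
FIRST_PERSON = {"i", "i'm", "we", "we're"}

def behaviour_stage(tweet):
    """Map a tweet to a behavioural stage using simple keyword rules."""
    tokens = {t.strip(".,!?") for t in tweet.lower().split()}
    if tokens & NEGATIVE:
        return "Desirability"      # negative sentiment
    if tokens & POSITIVE:
        if tokens & INVITATION_CUES:
            return "Invitation"    # positive sentiment + addressing others
        if tokens & FIRST_PERSON:
            return "Buzz"          # positive sentiment + I/we
    return "Unknown"

stages = [
    behaviour_stage("Our building needs 40% of all energy consumed in Switzerland! :("),
    behaviour_stage("I'm so proud when I remember to save energy"),
    behaviour_stage("Join us today to switch off a light! :)"),
]
```

A usable system would replace the word lists with proper sentiment analysis and would check tense and vocatives linguistically rather than by keyword.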
37. Social media and politics
• How do people talk about elections and political
events?
• How do the MPs talk about different topics?
• How does the public respond to them?
40. Parties / themes co-occurrence
Themes: UK economy; Europe; Tax and revenue; NHS; Borders and Immigration;
Scotland; Employment; Community and society; Public health; Media and
communications
Parties: LabourPartyCandidate; LabourPartyMP; ConservativePartyCandidate;
ConservativePartyMP; UKIPCandidate; OtherMP; SNPCandidate;
GreenPartyCandidate; LiberalDemocratsCandidate; SNPOther
41. Twits, twats and twaddle:
analysis of hate speech towards politicians
42. Online abuse
• Puts people off debating online
• Puts people off becoming
politicians
• Seems to be getting worse
• Might be particularly bad for
particular groups (females, ethnic
minorities, LGBT etc)
"I am seriously considering
whether or not to stand
next time"
"My staff try not to let
me go out alone"
"Misogynist comments,
sexual abuse … My
children saw this"
"death threats"
43. How do we identify hate speech?
• Let’s find all the “hate” terms in the messages and
look at who they’re directed at
• Then we can see who gets the most, and how this
changes over time
• Plug in a swear word lexicon, do a bit of number
crunching, and then head off to a desert island for a
nice holiday
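The "plug in a lexicon" idea is easy to sketch, and the sketch also shows why it is too optimistic. The code below is a naive baseline written for illustration, not the abuse classifier used in this work. The three-term lexicon, the account names and the sample tweets are made up.

```python
import re
from collections import Counter

# Tiny illustrative lexicon; a real one would be far larger.
ABUSE_LEXICON = {"idiot", "twat", "shut up"}

def count_abuse(tweets, lexicon=ABUSE_LEXICON):
    """Count, per recipient, the tweets containing at least one lexicon term.

    tweets: iterable of (recipient, text) pairs.
    """
    patterns = [
        re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in lexicon
    ]
    counts = Counter()
    for recipient, text in tweets:
        if any(p.search(text) for p in patterns):
            counts[recipient] += 1
    return counts

# Hypothetical sample data.
sample = [
    ("@mp_a", "You are an idiot"),
    ("@mp_a", "Thanks for your work on the NHS"),
    ("@mp_b", "Don't let them call you an idiot, you're great"),
]
counts = count_abuse(sample)
```

Both accounts end up with a count of 1, although the tweet to @mp_b is supportive. Plain term matching ignores who a term is aimed at and how it is used, so context has to be taken into account before a term is counted as abuse.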
52. Aims of the Analysis
• Who is being abused?
• Who is abusing them?
• What is the abuse about?
• Is it really getting worse?
53. Plan
• Collect tweets to and from politicians in the run-up
to the 2015 and 2017 UK elections
• Annotate all the interesting information (who, what,
when, where) with the social media toolkit
• Run an abuse classifier
• Analyse the results
67. Finding abusive terms
• 404 abusive terms collected
• But only annotated when
used in specific situations
Categories shown: racist and bigoted language; uncivil language; threats;
obscene nouns
Example terms shown: nigger, witch, homo, God botherer, shut up, f**k you,
idiot, kill, die, cunt, twat, rape
69. Did the abuse get worse?
There was more abuse in 2017 than in 2015
70. Who got the abuse in 2015?
• Men got more abuse than women
• Conservatives got more abuse than Labour
71. Who got the abuse in 2015?
A small number of
prominent MPs
72. What about in 2017?
The same thing happened (but to different people)
73. Check out the interactive version!
http://demos.gate.ac.uk/politics/buzzfeed/sunburst.html
74. Let’s look a bit deeper
• Did men get more abuse
because more men are
Conservatives?
• Did more prominent
people get more abuse
just because they got
more tweets?
• Structural equation
modelling using the
Lavaan package in R
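The second question above comes down to separating volume from rate. The talk's analysis uses structural equation modelling in R; the sketch below is a much simpler stand-in that only normalises abusive tweets by total tweets received. All the counts are invented for illustration and are not results from the study.

```python
# Hypothetical counts, for illustration only (not from the study).
received = {
    "prominent MP": {"tweets": 100_000, "abusive": 4_000},
    "backbench MP": {"tweets": 2_000, "abusive": 100},
}

def abuse_rate(stats):
    """Fraction of received tweets that were flagged as abusive."""
    return stats["abusive"] / stats["tweets"]

rates = {name: abuse_rate(stats) for name, stats in received.items()}
```

In this made-up example the prominent MP receives 40 times as many abusive tweets in absolute terms, yet a smaller share of their tweets is abusive (4% against 5%). Raw counts alone cannot tell these situations apart.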
75. Conclusions
• Analysing social media is super interesting
• But super tricky (even for humans)
• Need good underlying language analysis tailored to
social media, before attempting anything else
• Remember that correlation ≠ causation
76. The advertising bit
• Come on a GATE training course!
• 17-21 June in Sheffield
• https://gate.ac.uk/conferences/fig/fig12.html
• Come on a research visit to us!
• SoBigData Transnational Access Program
• Fully funded research visits of 2-8 weeks to work with us on
a social media analysis project
• http://sobigdata.eu/content/open-call-sobigdata-funded-transnational-access
77. Some useful links
• GATE http://gate.ac.uk
• GateCloud https://cloud.gate.ac.uk
• Come on a GATE training course!
• Blog posts about our GATE social media work
http://gate4ugc.blogspot.co.uk/search/label/Brexit
• UK elections monitor http://gate.ac.uk/projects/pft
• COMRADES project on social media during disasters
http://gate.ac.uk/projects/comrades
• WeVerify project looking at rumour detection
http://weverify.eu
78. Publications
• Diana Maynard, Ian Roberts, Mark A. Greenwood, Dominic Rout and Kalina
Bontcheva. A Framework for Real-time Semantic Social Media Analysis. Web
Semantics: Science, Services and Agents on the World Wide Web, 2017.
• G. Gorrell, M. Greenwood, I. Roberts, D. Maynard, K. Bontcheva. Twits, Twats
and Twaddle: Trends in Online Abuse towards UK Politicians. In Proceedings of
the 12th International Conference on Web and Social Media (ICWSM 2018),
25-28 June 2018, Stanford, US
• Diana Maynard, Kalina Bontcheva, Isabelle Augenstein. Natural Language
Processing for the Semantic Web. Morgan and Claypool, December 2016. ISBN:
9781627059091 (contains a chapter on social media analysis)
• M. Fernandez, L. Piccolo, D. Maynard, M. Wippoo, C. Meili, H. Alani.
Pro-Environmental Campaigns via Social Media: Analysing Awareness and
Behaviour Patterns. Journal of Web Science 2017.
• G. Resce and D. Maynard. What matters most to people around the world?
Retrieving Better Life Index priorities on Twitter. Journal of Technological
Forecasting & Social Change 2018.
• Available (and more) at https://gate.ac.uk/gate/doc/papers.html
79. Acknowledgements
This work was supported by:
• the European Union under the Information and
Communication Technologies (ICT) theme of the 7th
Framework and H2020 Programmes for R&D
• DecarboNet (610829) http://www.decarbonet.eu
• SoBigData (654024) http://www.sobigdata.eu
• COMRADES (687847) http://www.comrades-project.eu
• WeVerify (825297) https://weverify.eu/
• EPSRC grant EP/I004327/1 and British Academy under call
“The Humanities and Social Sciences: Tackling the UK’s
International Challenges”
• Nesta http://nesta.org.uk