The document discusses analyzing social media data, particularly tweets, for natural language processing tasks. It gives examples of analyzing tweets to understand information sharing during disasters, monitor opinions in real time, detect topics and analyze political discussions. It also discusses challenges in analyzing tweets, such as informal language, ambiguity, and misleading contexts or hashtags. Precise information extraction and annotation of tweets are needed to accurately identify hate speech and abuse, and to analyze their targets and how they change over time. A multi-step pipeline (collection, preprocessing, information extraction and classification) is proposed to study abuse toward politicians in tweets surrounding UK elections.
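The four pipeline stages named above can be sketched as a toy Python program. This is a minimal illustration under stated assumptions, not the system described in the talk: an in-memory list of strings stands in for tweet collection, and the three-word `ABUSE_TERMS` lexicon, the `Tweet` class and all function names are hypothetical. A real abuse classifier would rely on a curated lexicon or a trained model.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy lexicon for illustration only; a real system would use a
# curated resource or a trained model, not a handful of words.
ABUSE_TERMS = {"idiot", "liar", "traitor"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


@dataclass
class Tweet:
    text: str
    clean: str = ""
    mentions: list = field(default_factory=list)
    hashtags: list = field(default_factory=list)
    abusive: bool = False


def collect(raw_texts):
    """Stage 1: wrap raw strings (standing in for an API download) as Tweet objects."""
    return [Tweet(text=t) for t in raw_texts]


def preprocess(tweet):
    """Stage 2: remove URLs, lowercase, and collapse runs of whitespace."""
    no_urls = URL_RE.sub("", tweet.text)
    tweet.clean = " ".join(no_urls.lower().split())
    return tweet


def extract(tweet):
    """Stage 3: pull out who is addressed (@mentions) and topics (#hashtags)."""
    tweet.mentions = MENTION_RE.findall(tweet.clean)
    tweet.hashtags = HASHTAG_RE.findall(tweet.clean)
    return tweet


def classify(tweet):
    """Stage 4: flag the tweet if any alphabetic token is in the toy lexicon."""
    tokens = re.findall(r"[a-z']+", tweet.clean)
    tweet.abusive = any(tok in ABUSE_TERMS for tok in tokens)
    return tweet


def run_pipeline(raw_texts):
    """Chain the four stages over every collected tweet."""
    return [classify(extract(preprocess(t))) for t in collect(raw_texts)]


if __name__ == "__main__":
    results = run_pipeline([
        "@SomeMP you liar #GE2017 https://t.co/xyz",
        "Great debate tonight @OtherMP #GE2017",
    ])
    for t in results:
        # prints "['@somemp'] ['#ge2017'] True", then "['@othermp'] ['#ge2017'] False"
        print(t.mentions, t.hashtags, t.abusive)
```

Keeping each stage as a separate function means, for example, the keyword lookup could be swapped for a trained classifier without touching collection or extraction. Note that mentions come out lowercased because extraction runs on the preprocessed text.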
Understanding the world with NLP: interactions between society, behaviour and social media
1. Understanding the world with NLP: interactions between society, behaviour and social media
Dr. Diana Maynard
University of Sheffield, UK
27 February 2019, POLIMI, Milan
7. Men or women?
• 24% of all male internet users use Twitter
• 21% of all female internet users use Twitter
Out of all Twitter users (in 2019):
• 34.5% are women
• 65.5% are men
8. How old are Twitter users?
• Out of all Twitter users, which age range has the highest percentage?
16. Top 10 most followed Twitter users
2017                 2015                 2013
Katy Perry           Katy Perry           Katy Perry
Justin Bieber        Justin Bieber        Justin Bieber
Barack Obama         Taylor Swift         Lady Gaga
Taylor Swift         Barack Obama         Barack Obama
Rihanna              YouTube              Taylor Swift
Ellen DeGeneres      Lady Gaga            YouTube
Lady Gaga            Rihanna              Britney Spears
YouTube              Ellen DeGeneres      Rihanna
Justin Timberlake    Twitter              Instagram
Twitter              Justin Timberlake    Justin Timberlake
17. Social media: a valuable source of information
(not just stupid stuff about pop stars)
• business insights
• sharing and receiving news
• campaigns
• all kinds of collective intelligence
• an alternative to traditional polls
• and much more
18. Why is social media interesting for NLP?
• Fast-growing, highly dynamic and high-volume source of data – big data!
• Reflects the language used in today's society
• Reflects current views of society
• A great source of material for opinion mining tools
• A challenging research area due to the specialised use of language
19. Gartner 3V definition of Big Data
• Volume
• Velocity
• Variety
• +Veracity
• High volume & velocity of messages:
  • 500 million tweets per day
• Massive variety:
  • Stock markets
  • Earthquakes
  • Social arrangements
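To put the volume figure in perspective, the short calculation below converts 500 million tweets per day into average per-minute and per-second rates. It assumes the traffic is spread evenly over 24 hours, which real traffic is not, so the result is an average rather than a peak load.

```python
# Back-of-the-envelope throughput for "500 million tweets per day".
# Assumes an even spread over the day, so these are averages, not peaks.
TWEETS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
tweets_per_minute = TWEETS_PER_DAY / (24 * 60)

print(f"{tweets_per_second:,.0f} tweets per second on average")
print(f"{tweets_per_minute:,.0f} tweets per minute on average")
```

That is roughly 5,800 tweets every second, which is why processing has to keep pace with the stream rather than run occasionally in batch.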
20. Big Data is not new!
Staff sorting 4 million used London Underground tickets to analyse line use, 1939
21. Linguistic challenges of social media
• Language
  • Problem: social media text typically uses a very different style from standard written language
  • Solution: train language processing components specifically for it
• Relevance
  • Problem: topics and comments can rapidly diverge
  • Solution: train a classifier or use clustering techniques
• Lack of context
  • Problem: entities are hard to disambiguate
  • Solution: data aggregation, metadata, entity linking
22. People don’t write “properly”
• Grundman:politics makes #climatechange scientific issue,people don’t like knowitall rational voice tellin em wat 2do
• Want to solve the problem of #ClimateChange? Just #vote for a #politician! Poof! Problem gone! #sarcasm #TVP #99%
• Human Caused #ClimateChange is a Monumental Scam! http://www.youtube.com/watch?v=LiX792kNQeE … F**k yes!! Lying to us like MOFO's Tax The Air We Breath! F**k Them!
• The last people I will listen2 about guns r those that know nothing about them&politicians who live in states w/strictest gun laws #cali #ny
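A common first step with text like this is lexical normalisation: tokenising the tweet while keeping hashtags, @mentions and URLs intact, and expanding informal spellings such as "2do", "r" and "w/". The sketch below uses a tiny, hypothetical lookup table (`NORMALISATION`) built from the example tweets; it is an illustration under that assumption, not the normaliser used in GATE, and real systems rely on much larger, data-driven resources.

```python
import re

# Hypothetical lookup table built from the example tweets; real normalisers
# use far larger, data-driven dictionaries.
NORMALISATION = {
    "2do": "to do",
    "listen2": "listen to",
    "r": "are",
    "w/": "with",
    "em": "them",
    "wat": "what",
    "tellin": "telling",
}

# Alternatives are tried left to right: URLs, then hashtags/@mentions,
# then the "w/" abbreviation, then words, then single punctuation marks.
TOKEN_RE = re.compile(r"https?://\S+|[#@]\w+|w/|\w+(?:'\w+)?|[^\w\s]")


def normalise(tweet: str) -> list[str]:
    """Tokenise a tweet and expand known informal spellings."""
    out = []
    for tok in TOKEN_RE.findall(tweet):
        if tok.startswith(("#", "@", "http")):
            out.append(tok)  # keep hashtags, mentions and URLs untouched
        else:
            out.append(NORMALISATION.get(tok.lower(), tok))
    return out


print(normalise("The last people I will listen2 about guns r those that know "
                "nothing about them&politicians who live in states "
                "w/strictest gun laws #cali #ny"))
```

Keeping hashtags as single tokens matters because later stages (topic detection, sentiment) often treat them as strong signals in their own right.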
24. Named Entity Recognition and Linking
• NER (Professor Plum)
• Entity Linking: dbpedia.org/resource/.....
  • Michael_Jackson
  • Michael_Jackson_(writer)
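Entity linking has to choose between candidate knowledge-base resources such as Michael_Jackson and Michael_Jackson_(writer). One very simple way to illustrate why context matters is to score each candidate by how many words its description shares with the tweet (a Lesk-style overlap). The candidate descriptions below are hypothetical stand-ins for knowledge-base abstracts, and this sketch is not meant to reproduce the linker used in this work.

```python
import re

# Hypothetical, hand-written descriptions standing in for knowledge-base
# abstracts; keys are the candidate resource names from the slide.
CANDIDATES = {
    "Michael_Jackson": "american singer songwriter dancer pop music thriller album",
    "Michael_Jackson_(writer)": "british writer journalist beer whisky books",
}


def words(text: str) -> set[str]:
    """Lowercased alphabetic words in a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link(context: str, candidates: dict[str, str] = CANDIDATES) -> tuple[str, dict[str, int]]:
    """Pick the candidate whose description shares the most words with the context."""
    ctx = words(context)
    scores = {name: len(ctx & words(desc)) for name, desc in candidates.items()}
    best = max(scores, key=scores.get)  # ties go to the first candidate listed
    return best, scores


print(link("Michael Jackson's Thriller is still the best pop album"))
print(link("Reading Michael Jackson on whisky with a good beer"))
```

When a tweet shares no words with any description, every score is 0 and the first candidate wins by default, which is the "lack of context" problem for short messages.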
25. Pipelines for tweets
• Errors have a cumulative effect
• Good performance is important at each stage
[Chart: per-stage vs. overall performance]
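The cumulative effect can be made concrete with a simple model: if every error at any stage carries through to the final output, and errors at different stages are independent, overall accuracy is the product of the per-stage accuracies. Both assumptions are simplifications (later stages can sometimes recover, and errors often correlate), and the stage names and accuracy values below are hypothetical.

```python
from math import prod

# Hypothetical per-stage accuracies, chosen only for illustration.
STAGES = [
    ("language identification", 0.95),
    ("tokenisation", 0.95),
    ("POS tagging", 0.90),
    ("named entity recognition", 0.85),
]


def cumulative_accuracy(stages):
    """Running accuracy after each stage, assuming every error propagates
    to the final output and errors at different stages are independent."""
    running = 1.0
    history = []
    for name, acc in stages:
        running *= acc
        history.append((name, running))
    return history


for name, acc in cumulative_accuracy(STAGES):
    print(f"after {name:<26} {acc:.3f}")

print(f"overall: {prod(acc for _, acc in STAGES):.3f}")
```

Under these assumptions, even with every stage at 85% or better, the final output is correct only about 69% of the time.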
26. Understanding tweets is hard even for people
Branching out from Lincoln park after dark ... Hello Russian Navy, it's like the same thing but with glitter!
??
27. It’s all about the Named Entities
Branching out from Lincoln park after dark ... Hello Russian Navy, it's like the same thing but with glitter!
29.
• Ecuador, 7.8 earthquake, April 2016, ~700 people died
• Droughts, affecting 60 million people in 34 countries
• Maxwell, California, Feb 2017
• Portugal, forest fires, 64 confirmed deaths, Jun 2017
• Manchester, May 2017, 22 dead
• Haiti, Hurricane Matthew, Oct 2016, ~500 people died, farming devastated
31. Uses of social media during disasters
• Broadcasting info about the disaster
• Requesting info from local people and eyewitnesses
• Requesting and offering help and support
• Disaster mapping
• Mobilising the crowd to support initiatives
32. How?
• Using citizen reporters and digital responders to map crises
• Ushahidi deployed over 50k times
• Free and open source
• Working with us on the COMRADES project
36. Behaviour Analysis
• Based on the assumption that users in different behavioural stages communicate differently (different emotions, directives, etc.)
Pajarito @lindopajarito · 2h
Our building needs 40% of all energy consumed in Switzerland! :(
Desirability: negative sentiment (expressing personal frustration: anger/sadness)
DJPajarito @DJPajaritoGenial · 12h
I'm so proud when I remember to save energy and I know however small it's helping.
Buzz: positive sentiment (happiness/joy); I/we + present tense
HotelPajarito @HotelPajarito · 18h
Join us today today to switch of a light for EH! :)
Invitation: positive sentiment (happy) + use of vocatives
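The three behavioural stages are defined by combinations of sentiment and linguistic cues, which suggests a rule-based sketch. The version below is a toy illustration, not the method of the original study: the cue lists are hypothetical, vocatives are approximated by a few imperative and second-person words, and present-tense detection is left out because it would need a POS tagger.

```python
import re

# Toy cue lists invented for illustration; this is not the method used
# in the original study.
POSITIVE = {"proud", "happy", "glad", "helping", ":)"}
NEGATIVE = {"sad", "angry", "frustrated", "annoyed", ":("}
FIRST_PERSON = {"i", "i'm", "we", "we're"}
# Vocatives are approximated here by imperative and second-person cues.
ADDRESS_CUES = {"join", "you", "everyone", "let's"}

# Emoticons first, then words (with apostrophes), then single symbols.
TOKEN_RE = re.compile(r"[:;]-?[()]|[\w']+|[^\w\s]")


def behaviour_stage(tweet: str) -> str:
    """Assign a tweet to desirability, buzz or invitation using simple cues."""
    toks = [t.lower() for t in TOKEN_RE.findall(tweet)]
    pos = sum(t in POSITIVE for t in toks)
    neg = sum(t in NEGATIVE for t in toks)
    if neg > pos:
        return "desirability"  # negative sentiment, personal frustration
    if pos > neg:
        if any(t in ADDRESS_CUES for t in toks):
            return "invitation"  # positive sentiment + addressing the reader
        if any(t in FIRST_PERSON for t in toks):
            return "buzz"  # positive sentiment + I/we
    return "unknown"


for t in [
    "Our building needs 40% of all energy consumed in Switzerland! :(",
    "I'm so proud when I remember to save energy and I know however small it's helping.",
    "Join us today today to switch of a light for EH! :)",
]:
    print(behaviour_stage(t), "<-", t)
```

Checking the invitation cues before the first-person cues is a design choice: a tweet that both says "we" and invites the reader to act is treated as an invitation.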
37. Social media and politics
• How do people talk about elections and political events?
• How do the MPs talk about different topics?
• How does the public respond to them?
40. Parties / themes co-occurrence
• Themes: UK economy, Europe, Tax and revenue, NHS, Borders and Immigration, Scotland, Employment, Community and society, Public health, Media and communications
• Party categories: LabourPartyCandidate, LabourPartyMP, ConservativePartyCandidate, ConservativePartyMP, UKIPCandidate, OtherMP, SNPCandidate, GreenPartyCandidate, LiberalDemocratsCandidate, SNPOther
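Once each tweet has been tagged with a party category and a set of themes, the co-occurrence figures come down to counting (category, theme) pairs. The sketch below assumes that tagging has already been done; the labels follow the slide, but the annotated tweets are invented for illustration.

```python
from collections import Counter

# Hypothetical annotated tweets: (party category, themes detected in the tweet).
# The category and theme labels follow the slide; the data are invented.
annotated = [
    ("LabourPartyMP", {"NHS", "Public health"}),
    ("LabourPartyCandidate", {"NHS"}),
    ("ConservativePartyMP", {"UK economy", "Tax and revenue"}),
    ("ConservativePartyCandidate", {"UK economy"}),
    ("SNPCandidate", {"Scotland", "Europe"}),
]


def cooccurrence(tweets):
    """Count how often each party category co-occurs with each theme."""
    counts = Counter()
    for category, themes in tweets:
        for theme in themes:
            counts[(category, theme)] += 1
    return counts


counts = cooccurrence(annotated)
for (category, theme), n in sorted(counts.items()):
    print(f"{category:<28} {theme:<16} {n}")
```

A `Counter` returns 0 for pairs it has never seen, so the result can be read directly as a sparse category-by-theme matrix.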
41. Twits, twats and twaddle: analysis of hate speech towards politicians
42. Online abuse
• Puts people off debating online
• Puts people off becoming politicians
• Seems to be getting worse
• Might be particularly bad for certain groups (women, ethnic minorities, LGBT people, etc.)
"I am seriously considering whether or not to stand next time"
"My staff try not to let me go out alone"
"Misogynist comments, sexual abuse … My children saw this"
"death threats"
43. How do we identify hate speech?
• Let’s find all the “hate” terms in the messages and look at who they’re directed at
• Then we can see who gets the most, and how this changes over time
• Plug in a swear word lexicon, do a bit of number crunching, and then head off to a desert island for a nice holiday
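The "plug in a lexicon" plan is tongue-in-cheek, and a sketch of it shows why. The code below counts lexicon hits per @-mentioned target per month with no regard for context. The lexicon, handles and tweets are hypothetical (the study itself collected 404 abusive terms), and the second tweet shows the obvious failure: "kill jobs" is counted as abuse.

```python
import re
from collections import Counter

# Tiny hypothetical lexicon; the study itself collected 404 abusive terms.
LEXICON = {"idiot", "liar", "kill"}


def naive_abuse_counts(tweets):
    """Count lexicon hits per (@target, month), ignoring context entirely."""
    counts = Counter()
    for month, text in tweets:
        words = re.findall(r"[a-z']+", text.lower())
        hits = sum(w in LEXICON for w in words)
        if hits == 0:
            continue
        for target in re.findall(r"@\w+", text):
            counts[(target, month)] += hits
    return counts


tweets = [
    ("2017-05", "@mp_a you are an idiot"),
    ("2017-05", "@mp_b this policy will kill jobs in my town"),  # not abuse!
    ("2017-06", "@mp_a liar! idiot!"),
]
print(naive_abuse_counts(tweets))
```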
52. Aims of the Analysis
• Who is being abused?
• Who is abusing them?
• What is the abuse about?
• Is it really getting worse?
53. Plan
• Collect tweets to and from politicians in the run-up to the 2015 and 2017 UK elections
• Annotate all the interesting information (who, what, when, where) with the social media toolkit
• Run an abuse classifier
• Analyse the results
67. Finding abusive terms
• 404 abusive terms collected
• But only annotated when used in specific situations
• Categories: uncivil language, threats, obscene nouns, racist and bigoted language
• Example terms: nigger, witch, homo, God botherer, shut up, f**k you, idiot, kill, die, cunt, twat, rape
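Annotating a term "only when used in specific situations" means the matcher must look at context. As a hypothetical illustration (not the rules used in the study), the sketch below always counts some terms but counts ambiguous threat words such as "kill" and "die" only when a second-person pronoun or @mention occurs within a few tokens.

```python
import re

# Hypothetical rule for illustration: ambiguous threat words only count when
# the tweet appears to address someone nearby; other terms always count.
ALWAYS_ABUSIVE = {"idiot", "twat"}
CONTEXT_SENSITIVE = {"kill", "die"}
ADDRESSEE = {"you", "your", "u"}
WINDOW = 3  # tokens either side of the candidate term


def abusive_terms(text: str) -> list[str]:
    """Return the abusive terms found, applying the context rule."""
    tokens = re.findall(r"@\w+|[a-z']+", text.lower())
    found = []
    for i, tok in enumerate(tokens):
        if tok in ALWAYS_ABUSIVE:
            found.append(tok)
        elif tok in CONTEXT_SENSITIVE:
            nearby = tokens[max(0, i - WINDOW): i + WINDOW + 1]
            if any(t in ADDRESSEE or t.startswith("@") for t in nearby):
                found.append(tok)
    return found


print(abusive_terms("@mp_b this policy will kill jobs in my town"))
print(abusive_terms("I hope you die"))
```

A fixed window is still crude: a short tweet such as "@mp_b will kill jobs" would be flagged because the mention falls inside the window, which is why a trained classifier is used on top of term matching.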
69. Did the abuse get worse?
There was more abuse in 2017 than in 2015
70. Who got the abuse in 2015?
• Men got more abuse than women
• Conservatives got more abuse than Labour
71. Who got the abuse in 2015?
A small number of prominent MPs
72. What about in 2017?
The same thing happened (but to different people)
73. Check out the interactive version!
http://demos.gate.ac.uk/politics/buzzfeed/sunburst.html
74. Let’s look a bit deeper
• Did men get more abuse because more men are Conservatives?
• Did more prominent people get more abuse just because they got more tweets?
• Structural equation modelling using the lavaan package in R
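The second question is about volume: raw abuse counts favour people who simply receive more tweets. A first, much simpler check than the structural equation modelling used here is to rank by the share of incoming tweets that are abusive rather than by the raw count. The per-MP figures below are hypothetical, and this normalisation does not model gender, party and prominence jointly the way the SEM does.

```python
# Hypothetical counts per MP: (abusive tweets received, total tweets received).
received = {
    "MP_A": (900, 30_000),
    "MP_B": (120, 2_000),
    "MP_C": (40, 4_000),
}


def abuse_rates(data):
    """Share of each MP's incoming tweets that were classified as abusive."""
    return {mp: abusive / total for mp, (abusive, total) in data.items()}


rates = abuse_rates(received)
by_raw = sorted(received, key=lambda mp: received[mp][0], reverse=True)
by_rate = sorted(rates, key=rates.get, reverse=True)

print("ranked by raw count:", by_raw)
print("ranked by rate:     ", by_rate)
for mp in by_rate:
    abusive, total = received[mp]
    print(f"{mp}: {abusive} of {total} tweets abusive ({rates[mp]:.1%})")
```

In this invented example, the MP with the most abusive tweets in absolute terms is not the one with the highest proportion, which is exactly the kind of effect the questions above are probing.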
75. Conclusions
• Analysing social media is super interesting
• But super tricky (even for humans)
• Need good underlying language analysis tailored to social media before attempting anything else
• Remember that correlation ≠ causation
76. The advertising bit
• Come on a GATE training course!
  • 17-21 June in Sheffield
  • https://gate.ac.uk/conferences/fig/fig12.html
• Come on a research visit to us!
  • SoBigData Transnational Access Program
  • Fully funded research visits of 2-8 weeks to work with us on a social media analysis project
  • http://sobigdata.eu/content/open-call-sobigdata-funded-transnational-access
77. Some useful links
• GATE: http://gate.ac.uk
• GateCloud: https://cloud.gate.ac.uk
• Come on a GATE training course!
• Blog posts about our GATE social media work: http://gate4ugc.blogspot.co.uk/search/label/Brexit
• UK elections monitor: http://gate.ac.uk/projects/pft
• COMRADES project on social media during disasters: http://gate.ac.uk/projects/comrades
• WeVerify project looking at rumour detection: http://weverify.eu
78. Publications
• D. Maynard, I. Roberts, M. A. Greenwood, D. Rout and K. Bontcheva. A Framework for Real-time Semantic Social Media Analysis. Web Semantics: Science, Services and Agents on the World Wide Web, 2017.
• G. Gorrell, M. Greenwood, I. Roberts, D. Maynard and K. Bontcheva. Twits, Twats and Twaddle: Trends in Online Abuse towards UK Politicians. In Proceedings of the 12th International Conference on Web and Social Media (ICWSM 2018), 25-28 June 2018, Stanford, US.
• D. Maynard, K. Bontcheva and I. Augenstein. Natural Language Processing for the Semantic Web. Morgan and Claypool, December 2016. ISBN: 9781627059091 (contains a chapter on social media analysis).
• M. Fernandez, L. Piccolo, D. Maynard, M. Wippoo, C. Meili and H. Alani. Pro-Environmental Campaigns via Social Media: Analysing Awareness and Behaviour Patterns. Journal of Web Science, 2017.
• G. Resce and D. Maynard. What matters most to people around the world? Retrieving Better Life Index priorities on Twitter. Technological Forecasting and Social Change, 2018.
• These and more are available at https://gate.ac.uk/gate/doc/papers.html
79. Acknowledgements
This work was supported by:
• the European Union under the Information and Communication Technologies (ICT) theme of the 7th Framework and H2020 Programmes for R&D:
  • DecarboNet (610829) http://www.decarbonet.eu
  • SoBigData (654024) http://www.sobigdata.eu
  • COMRADES (687847) http://www.comrades-project.eu
  • WeVerify (825297) https://weverify.eu/
• EPSRC grant EP/I004327/1 and the British Academy under the call “The Humanities and Social Sciences: Tackling the UK’s International Challenges”
• Nesta http://nesta.org.uk