CUS 695 Project Presentation

The document analyzes social media data from Twitter during the 2016 US Presidential Election over one week. It identifies the major topics discussed and performs a sentiment analysis of significant terms. The analysis found that 18% of over 117,000 words were negative, while 11% were positive and 71% neutral. Most discussions were about political figures and groups, almost a quarter about Donald Trump, rather than issues. Future analysis could examine sentiment in each identified topic cluster.

Social Analytics on Twitter
By: Adam Ghassouine, Robert
Monegro and Adrian Duran
CUS 695 – Capstone Project
Dr. Giancarlo Crocetti
Mondays 7:10 p.m. – 9:10 p.m.

Executive Summary
This report analyzes social media data, specifically Twitter posts about the 2016 United States Presidential
Election, collected over a period of one week. Its purpose is to identify the major topics being discussed in
regard to the election. The method of analysis is a sentiment analysis of all significant terms related to the
topic of this study. All script files from this analysis can be found in the appendices of this report.
The analysis shows that during the 2016 Presidential Election more negative than positive language was used:
18% of 117,655 words were considered negative, 11% used positive language, and 71% were neutral.
The report finds support that most of the discussion on social media is gossip about political figures and
groups, almost a quarter of it about Donald Trump, rather than discussion of actual political issues, which is
not unexpected for Twitter. Recommendations for future analysis include analyzing the sentiment of each
cluster.

StopWord Analysis
• Using a StopWords dictionary, one can extract the frequency table of all words.
• The purpose of this analysis is to detect and remove unnecessary words that add little to no
substance to this research.
• ‘https’ appeared frequently and caused unnecessary n-grams to be generated, so the term
was removed.
• Another high-frequency word was ‘RT’, which marks a post that has been retweeted. This
term added nothing to the overall analysis.
• URLs in each post and random words with no meaning, such as ‘absfwi’ and ‘acbqdi’,
were also eliminated.
• The result of this step is a set of words that relate only to the 2016 Presidential Election.
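
The stopword step above can be sketched in a few lines of Python. This is only an illustration, not the RapidMiner process used in the project: the stopword set is hypothetical (the slides name only ‘https’ and ‘RT’), and the sample posts are invented.

```python
from collections import Counter

# Hypothetical stopword set for illustration; the slides name only 'https' and 'RT'.
STOPWORDS = {"https", "rt", "the", "a", "is", "to", "of"}


def term_frequencies(posts, stopwords=STOPWORDS):
    """Count how often each non-stopword token occurs across all posts."""
    counts = Counter()
    for post in posts:
        # Lowercase first so that 'RT' and 'rt' are treated as the same term.
        for token in post.lower().split():
            if token not in stopwords:
                counts[token] += 1
    return counts


# Invented sample posts, not taken from the project's dataset.
posts = ["RT the election is today", "https election results"]
freq = term_frequencies(posts)
print(freq.most_common(3))
```

Sorting the resulting table by count is also a quick way to spot further candidates for the stopword list, which is how terms like ‘https’ and ‘RT’ would surface.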

Sentiment Code
• Used to extract positive and negative scores to further discern sentiment for the
clusters generated

Sentiment Analysis
• The analysis shows that during the 2016
presidential election more negative than positive
language was used: 18% of 117,655 words were
considered negative, 11% used positive
language, and 71% were neutral.
• The 28 most commonly used positive words
included "Happiness, Congratulations,
Splendid, Excellent and Admirability".
• The 16 most commonly used negative words
included "Threaten, Downhill, Apocalyptic,
Negative, Trashed".
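
As a rough sketch of how percentage shares like these can be computed, the Python below labels each word against a small lexicon and reports the share of each label. The lexicon is a hypothetical stand-in built from a few of the words listed above, and the sample word list is invented; the project itself relied on SentiWordNet scores rather than fixed word sets.

```python
from collections import Counter

# Hypothetical mini-lexicon built from words on the slide; a stand-in for real scores.
POSITIVE = {"happiness", "congratulations", "splendid", "excellent"}
NEGATIVE = {"threaten", "downhill", "apocalyptic", "trashed"}


def sentiment_shares(words):
    """Return the percentage of words labeled negative, positive, and neutral."""
    labels = Counter()
    for word in words:
        word = word.lower()
        if word in POSITIVE:
            labels["positive"] += 1
        elif word in NEGATIVE:
            labels["negative"] += 1
        else:
            labels["neutral"] += 1
    total = sum(labels.values())
    if total == 0:
        return {"negative": 0.0, "positive": 0.0, "neutral": 0.0}
    return {
        key: round(100 * labels[key] / total, 1)
        for key in ("negative", "positive", "neutral")
    }


# Invented sample of ten words: one positive, two negative, seven neutral.
sample = ["Excellent", "debate", "threaten", "vote", "trashed",
          "poll", "rally", "today", "news", "ballot"]
shares = sentiment_shares(sample)
print(shares)
```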

Bibliography
http://i2.cdn.turner.com/cnnnext/dam/assets/160201150128-trump-clinton-split-portrait-exlarge-169.jpg
http://sentiwordnet.isti.cnr.it/
https://rapidminer.com/
https://twitter.com/?lang=en
http://aylien.com/

1. Social Analytics on Twitter

2. Executive Summary

3. Data Retrieval

4. Dataset

5. Data Processing

6. StopWord Analysis

7. Term Frequencies

8. Term Frequencies Results

9. Clustering Analysis

10. Clustering Analysis (cont.)

11. SentiWordNet

12. Sentiment Code

13. Sentiment Analysis

14. Cluster K-5

15. Cluster K-6

16. Cluster K-8

17. Cluster K-9

18. Cluster K-10

19. Sentiment Score

20. Bibliography

Editor's Notes

Completed pulling all tweets required to move on with the project: a modest number, approximately 7,000.
• Create a Twitter Connection by choosing a name and using the provided access token.
• Select the keyword with which to query Twitter posts (query=‘2016 election’).
• Select the number of Twitter posts to query at one time (limit=‘1000’).
• Run the operator frequently in order to build up a large collection of posts.
Wrote the data to a CSV file to aid the analysis.
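
A minimal sketch of collecting several query runs into one CSV file, using only Python's standard library. The field names `id` and `text` are hypothetical, and skipping posts whose id was already seen is an assumption made here because repeated runs of the same query are likely to return overlapping posts; the notes do not say how the project handled this.

```python
import csv
import io


def write_posts_csv(batches, out, fields=("id", "text")):
    """Write posts from several query runs to CSV, skipping ids already seen."""
    writer = csv.DictWriter(out, fieldnames=list(fields))
    writer.writeheader()
    seen = set()
    written = 0
    for batch in batches:
        for post in batch:
            if post["id"] in seen:
                continue  # assumed de-duplication across repeated runs
            seen.add(post["id"])
            writer.writerow({field: post[field] for field in fields})
            written += 1
    return written


# Invented batches standing in for two runs of the same query.
batches = [
    [{"id": 1, "text": "first post"}, {"id": 2, "text": "second post"}],
    [{"id": 2, "text": "second post"}, {"id": 3, "text": "third post"}],
]
buffer = io.StringIO()  # an open file object works the same way
count = write_posts_csv(batches, buffer)
print(count)
```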
The Process Documents from Data operator takes unstructured text and generates a vector space model using TF-IDF. The steps inside it were:
• Change all words to lower case.
• Remove the http hyperlinks from all posts using a regular expression.
• Apply tokenization to split all words at non-letters. This operator splits the text of a document into a sequence of tokens, and there are several options for specifying the splitting points. The default is to split at every non-letter character, which yields tokens consisting of a single word and is the most appropriate option before building the word vector. To build windows of tokens, one can instead split into complete sentences by setting the split mode to specify characters and entering all splitting characters. The third option lets one define regular expressions and is the most flexible for very special cases.
• Filter tokens to remove all words shorter than 3 letters.
• Filter stopwords to remove terms such as ‘https’, the @ symbol, and hashtags.
• Stem all tokens to their base form. This operator applies stemming algorithms written for the Snowball language; various stemming algorithms for different languages can be chosen.
• Generate n-grams of length 3. A term n-gram is defined as a series of consecutive tokens of length n; the n-grams generated by this operator consist of all series of consecutive tokens of length n.
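
The processing steps above can be approximated outside RapidMiner with a short Python sketch. Several parts are assumptions: the URL pattern is a stand-in because the original regular expression is not shown in these notes, the stopword set is hypothetical, the tokenizer handles ASCII letters only, and stemming is left out because the standard library has no Snowball stemmer.

```python
import re

URL_RE = re.compile(r"https?://\S+")  # assumed pattern; the original regex is not shown
STOPWORDS = {"the", "and", "for", "https"}  # hypothetical stopword set


def preprocess(text, n=3, min_len=3):
    """Lowercase, strip URLs, tokenize at non-letters, filter, and build n-grams."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    # Split at every run of non-letter characters (ASCII only in this sketch).
    tokens = [t for t in re.split(r"[^a-z]+", text) if t]
    # Drop tokens shorter than min_len letters and any stopwords.
    tokens = [t for t in tokens if len(t) >= min_len and t not in STOPWORDS]
    # Every run of n consecutive tokens, joined with an underscore.
    ngrams = ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return tokens, ngrams


# Invented example post.
tokens, ngrams = preprocess(
    "RT @user: The 2016 election results are in https://t.co/abc123 #vote"
)
print(tokens)
print(ngrams)
```

Note that ‘RT’ disappears here through the length filter rather than the stopword list, and that removing the URL before tokenizing keeps link fragments from turning into meaningless tokens.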
With the data formatted correctly, the next step was to see which terms appeared the most.
When a term appears in a few documents, but not in all documents, its importance increases. This means the term is very good at describing those documents.
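
The weighting idea described here is TF-IDF, and a bare-bones version can be written as follows. This is a sketch of the textbook formula (term frequency times the log of inverse document frequency); RapidMiner's exact normalization may differ, and the three sample documents are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute a TF-IDF weight for every term of every tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Invented documents, already tokenized.
docs = [
    ["trump", "election", "vote"],
    ["clinton", "election", "vote"],
    ["election", "debate", "trump"],
]
weights = tf_idf(docs)
print(weights[1])
```

In this toy corpus ‘election’ occurs in every document and so gets a weight of zero, while ‘clinton’, which occurs in only one document, gets the highest weight there.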
The analysis shows that the first 3-4 terms of each cluster properly convey the sentiment/topic of that cluster.
WordNet is a lexical database that groups words into sets of synonyms called synsets. WordNet does a great job of distinguishing different kinds of words such as nouns, verbs, adjectives, and adverbs. SentiWordNet is an extension of WordNet that provides three additional measures for each synset: a Positive Score, a Negative Score, and a Neutral Score.
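
In SentiWordNet the three scores of a synset sum to 1, so the neutral (objective) score follows from the other two. The sketch below shows that relationship; the `SynsetScore` class, the labeling rule, and the example values are all hypothetical and are not taken from the SentiWordNet data or from the project's scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynsetScore:
    """Hypothetical container for the positive and negative scores of one synset."""
    positive: float
    negative: float

    @property
    def neutral(self):
        # The three scores sum to 1, so the neutral score is the remainder.
        return 1.0 - self.positive - self.negative

    def label(self):
        # Assumed rule: the larger of the two polar scores decides the label.
        if self.positive > self.negative:
            return "positive"
        if self.negative > self.positive:
            return "negative"
        return "neutral"


# Made-up scores for illustration only.
score = SynsetScore(positive=0.625, negative=0.125)
print(score.neutral, score.label())
```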
Aylien is a RapidMiner application used to analyze the sentiment of text. Extracting sentiment from a piece of text such as a tweet, a review or an article can provide valuable insight into the author's emotions and perspective: whether the tone is positive, neutral or negative, and whether the text is subjective (reflecting the author's opinion) or objective (expressing a fact).
