Performing sentiment analysis on Twitter data
(2011 Norway attacks)
Team –
Aparna Dhanashri Jayaprakash – 50094768
Himanshu Yadav – 50093151
Inder Puneet Singh – 50094241
Sabah Abdul Mannan Khan – 50094894
Vidya Mulukutla – 50095830
Analysis of Twitter Data Set
Introduction
Big Data is increasingly pertinent in today’s digitalized world and is used in many different domains. Because social media is so pervasive, it is a natural source of data sets for analysis in areas ranging from politics to entertainment. We chose Twitter as our data source because its wide user base includes ordinary people as well as prominent figures from media, film, sports, and politics. Many analytical results can be derived from a platform as popular and widely used as Twitter, and we analyzed data generated on it with an implementation built on Apache Hadoop and Hive. To gauge the reactions of users who responded to significant events in July 2011, we performed sentiment analysis. Sentiment analysis, also known as opinion mining, is the process of extracting subjective information from text through natural language processing, computational linguistics, and text analysis. Two important and completely contrasting events took place in July 2011, and we carried out a comparative analysis of them. The events are described as follows:
The 2011 Norway attacks were the deadliest attacks in Norway since World War II. Two sequential attacks took place within about two hours of each other on 22nd July 2011. The first was a car bomb that exploded in the executive government quarter in Oslo, killing eight people and injuring around 209. The second was a mass shooting on the island of Utøya, at a summer camp organized by the youth division of the ruling Labour Party. A gunman, unidentified at the time, gained access to the camp and opened fire on the participants. This attack claimed 69 lives and injured 110 people. The perpetrator, Anders Behring Breivik, was sentenced to 21 years of preventive detention.
Amy Winehouse was a hugely popular British singer and songwriter. Her work was acclaimed both critically and commercially, and she won multiple Grammy Awards. Her sudden death from alcohol poisoning on 23rd July 2011 shocked millions of fans worldwide and sent social networking sites into a frenzy.
Hypothesis
We set out to evaluate how users in different geographical locations reacted to the two stories on Twitter. Our hypothesis was that the Norway attacks would affect the public more than Amy Winehouse’s death and would therefore generate more tweets, hashtags, and retweets, since the attacks were the graver event: many lives were lost and even more people were injured. We compared the two events using sentiment analysis.
Technology
For our implementation, we deployed Apache Hadoop on Amazon EC2 to process the data. The Hadoop master ran on the m1.large instance type, and the Hadoop slaves ran on the m1.small instance type. We chose the M1 general-purpose instance types primarily because they were inexpensive to run while offering moderately good CPU performance.
We used Apache Hive to analyze, summarize, and query the data with HiveQL, Hive’s SQL-like query language.
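The Hadoop jobs themselves are not part of this report. As a rough illustration of the MapReduce pattern that Hadoop applies (and that Hive compiles a HiveQL aggregation such as a GROUP BY into), the sketch below simulates the map, shuffle, and reduce phases in plain Python to count hashtags. The function names and sample tweets are hypothetical; on a real cluster the map and reduce functions would run as separate tasks (for example through Hadoop Streaming) and the framework would perform the shuffle.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Simplified hashtag pattern (Unicode-aware \w, so "#Utøya" is kept whole).
HASHTAG_RE = re.compile(r"#(\w+)")


def map_phase(tweet_text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (hashtag, 1) for every hashtag in one tweet."""
    for tag in HASHTAG_RE.findall(tweet_text):
        yield tag, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the partial counts for one hashtag."""
    return key, sum(values)


def count_hashtags(tweets: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over an in-memory list of tweets."""
    pairs = (pair for text in tweets for pair in map_phase(text))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Explosion in #Oslo #Norway", "#prayfornorway #Oslo", "RIP Amy"]
    print(count_hashtags(sample))
```

Because each reduce call only sees the values for a single key, the reduce step can be spread across many slave nodes, which is what makes this shape scale to large tweet collections.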
Data Preparation
Data Selection
The extracted data was split into separate tables to simplify analysis. One of the tables for the Norway attacks event, a count of hashtag occurrences, is reproduced below:
Hashtag          Count
Oslo               466
Norway             396
tcot               308
oslo               244
p2                 234
SAVEAMERICANOW     214
news               124
blamethemuslims    111
norway             110
breakingnews        93
isles               93
fb                  90
islanders           88
cnn                 82
Utoya               74
teaparty            61
osloexpl            55
News                55
prayfornorway       55
tlot                36
Breivik             34
socialmedia         34
politics            32
NFL                 32
utoya               27
PrayForNorway       27
Utøya               27
CNN                 26
Islam               24
oslobomb            24
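The table counts case and spelling variants separately (for example Oslo and oslo, or Utoya, utoya, and Utøya). If those variants should be merged before counting, one option is to fold each hashtag to a canonical key, as in the sketch below. The folding rules, including the small transliteration table for Norwegian letters, are illustrative assumptions and not part of the original pipeline.

```python
import unicodedata
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical transliteration table: "ø" and "æ" have no Unicode
# decomposition, so stripping combining marks alone would not fold them.
TRANSLITERATE = str.maketrans({"ø": "o", "æ": "ae"})


def canonical_hashtag(tag: str) -> str:
    """Fold a hashtag to a lowercase key without accents."""
    folded = tag.casefold().translate(TRANSLITERATE)
    # NFKD splits characters such as "å" into "a" plus a combining ring,
    # and the combining mark is then dropped.
    decomposed = unicodedata.normalize("NFKD", folded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def merge_variants(counts: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the counts of all hashtags that share a canonical key."""
    merged: Counter = Counter()
    for tag, n in counts:
        merged[canonical_hashtag(tag)] += n
    return dict(merged)
```

Applied to the rows above, Oslo (466) and oslo (244) would merge into a single oslo entry of 710, and Utoya, utoya, and Utøya into a single utoya entry of 128, which changes the ranking noticeably.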
Data Cleaning:
Contrary to our expectation that the data set would cover one specific period (say, one year), the extracted data spanned many years, so no single period contained a high density of information. Our first task was therefore to find events that occurred within a specific time period. In addition, because the data set was acquired from many different sources, it contained a lot of redundant data, which made removing duplicates mandatory before any analysis could be conducted.
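The two cleaning steps just described, restricting the data to a time window and removing duplicates, could look like the sketch below. It assumes each tweet is a dictionary with hypothetical `id`, `created_at` (ISO 8601 timestamp), and `text` fields, since the report does not specify the data set’s schema. The window of 20 July to 1 August 2011 is an assumption chosen to match the date range of the report’s charts. Duplicates are detected by tweet ID, falling back to whitespace-normalized text when no ID is present.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator

# Hypothetical analysis window (UTC); the end bound is exclusive.
WINDOW_START = datetime(2011, 7, 20, tzinfo=timezone.utc)
WINDOW_END = datetime(2011, 8, 2, tzinfo=timezone.utc)


def in_window(tweet: Dict[str, str]) -> bool:
    """True if the tweet's timestamp falls inside [WINDOW_START, WINDOW_END)."""
    created = datetime.fromisoformat(tweet["created_at"])
    if created.tzinfo is None:  # assumption: naive timestamps are UTC
        created = created.replace(tzinfo=timezone.utc)
    return WINDOW_START <= created < WINDOW_END


def dedup_key(tweet: Dict[str, str]) -> str:
    """Prefer the tweet ID; otherwise use the case-folded, whitespace-normalized text."""
    if tweet.get("id"):
        return "id:" + tweet["id"]
    return "text:" + " ".join(tweet["text"].split()).casefold()


def clean(tweets: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield tweets inside the window, skipping any whose key was already seen."""
    seen = set()
    for tweet in tweets:
        if not in_window(tweet):
            continue
        key = dedup_key(tweet)
        if key in seen:
            continue
        seen.add(key)
        yield tweet
```

Keeping only the set of keys in memory, rather than the tweets themselves, keeps the deduplication pass cheap even when it streams over a large file.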
Because we were dealing with large data sets, we partitioned the data to make the analysis easier and to improve query performance. Another important aspect of data cleaning is normalizing geotagged locations, because the same address can be written in many forms. For example, “Bangalore”, “Bangalore Karnataka”, and “Bangalore Karnataka India” are all different ways of writing the same location. For an accurate analysis, locations must be normalized into a single format. We did this with Google’s Geocoding API, which offers a straightforward way to convert an address into geographic coordinates (latitude and longitude) that can be used for map positioning.
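A minimal sketch of the geocoding step follows, assuming Google’s Geocoding web service JSON endpoint (`https://maps.googleapis.com/maps/api/geocode/json` with `address` and `key` parameters). `API_KEY` is a placeholder, and the light string cleanup before each request is an illustrative assumption. Results are cached so that each distinct cleaned spelling costs one request; the three Bangalore spellings above remain different cache keys but should resolve to the same coordinates. Response parsing is kept separate from the network call so it can be checked against a stored response.

```python
import json
import urllib.parse
import urllib.request
from functools import lru_cache
from typing import Optional, Tuple

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
API_KEY = "YOUR_API_KEY"  # placeholder; a real key is required


def clean_location(raw: str) -> str:
    """Drop commas, collapse whitespace, and case-fold the location string."""
    return " ".join(raw.replace(",", " ").split()).casefold()


def parse_geocode_response(payload: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first result, or None if the lookup failed."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


@lru_cache(maxsize=None)
def geocode(location: str) -> Optional[Tuple[float, float]]:
    """Query the Geocoding API once per distinct cleaned location string."""
    query = urllib.parse.urlencode({"address": location, "key": API_KEY})
    with urllib.request.urlopen(f"{GEOCODE_URL}?{query}", timeout=10) as resp:
        return parse_geocode_response(json.load(resp))


def normalize_location(raw: str) -> Optional[Tuple[float, float]]:
    """Map a free-text user location to coordinates (performs a network call)."""
    return geocode(clean_location(raw))
```

Rounding the returned coordinates, or keeping the API’s formatted address instead, would give a single comparable key per place for grouping tweets by location.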
Challenges faced during Implementation:
The main obstacles we encountered with the extracted data were:
• Duplicate files:
The extraction returned a large number of repeated files with identical content. This was a major nuisance, because additional processing was needed to filter out the files with unique content, and that processing was very time consuming.
• Parsing data:
Parsing is difficult and can fail for various reasons, for example when the Twitter data contains text in many languages. Another cause was JSON structures that were closed incorrectly, which prevented any data beyond that point from being read.
• Complete data not recovered:
Not all of the data could be recovered when extracting it through Apache Hive. Because we were dealing with large data sets, a lot of extra programming and debugging was required to fix this. The parsing exceptions that were raised were resolved by locating the erroneous files.
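Two of these problems, duplicate files and parse failures, can be contained with fairly little code. The sketch below assumes the raw data is stored as files holding one JSON tweet per line (an assumption; the report does not state the file layout). It skips any file whose content hash has already been seen, reads text explicitly as UTF-8 so multilingual tweets decode consistently, and records malformed lines instead of aborting, so that the erroneous files can be located afterwards. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def unique_files(paths: Iterable[Path]) -> List[Path]:
    """Keep the first file seen for each distinct content hash."""
    seen = set()
    kept = []
    for path in paths:
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            kept.append(path)
    return kept


def parse_json_lines(path: Path) -> Tuple[List[dict], List[int]]:
    """Parse one JSON object per line; return the records and the bad line numbers."""
    records, bad_lines = [], []
    with path.open(encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                bad_lines.append(lineno)  # e.g. a truncated or unclosed object
    return records, bad_lines


def load_all(paths: Iterable[Path]) -> Tuple[List[dict], Dict[str, List[int]]]:
    """Load every unique file and report which files contained malformed lines."""
    tweets: List[dict] = []
    errors: Dict[str, List[int]] = {}
    for path in unique_files(paths):
        records, bad = parse_json_lines(path)
        tweets.extend(records)
        if bad:
            errors[str(path)] = bad
    return tweets, errors
```

Because one malformed line only costs that line, a single unclosed JSON object no longer prevents the rest of the file from being read.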
Analysis
After data selection and cleaning, we selected tables representing various aspects of the two events (Amy Winehouse and the Norway attacks), both for a comparative analysis of the two events and for a sentiment analysis of each. The aspects that guide the analysis are:
Data Distribution, Hashtags count table, URLs count table, Tweet sentiment, and Famous tweeters.
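Most of these aspects reduce to extracting entities from tweet text and ranking them by frequency. The sketch below does this with simple regular expressions for hashtags, user mentions, and URLs. The patterns are simplified assumptions (Twitter’s own entity rules are more involved, and tweets obtained from the Twitter API already carry parsed entities), and `top_n` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Simplified patterns. The lookbehinds stop matches inside words,
# so "a@b.com" is not read as a mention and "page#top" not as a hashtag.
PATTERNS = {
    "hashtags": re.compile(r"(?<!\w)#(\w+)"),
    "mentions": re.compile(r"(?<!\w)@(\w{1,15})"),
    "urls": re.compile(r"https?://\S+"),
}


def extract_entities(text: str) -> Dict[str, List[str]]:
    """Return every hashtag, mention, and URL found in one tweet."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


def top_n(tweets: Iterable[str], kind: str, n: int = 10) -> List[Tuple[str, int]]:
    """Rank one entity kind ("hashtags", "mentions", or "urls") by frequency."""
    counts: Counter = Counter()
    for text in tweets:
        counts.update(extract_entities(text)[kind])
    return counts.most_common(n)
```

For example, `top_n(tweets, "mentions")` yields the kind of ranking shown in the User Mention Count charts, and `top_n(tweets, "urls")` the URL Share Count.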
Event 1: Amy Winehouse
[Chart: number of tweets for Event 1 (Amy Winehouse)]
[Chart: URL Share Count for Event 1. URLs shown: http://t.co/0IGT940, http://t.co/kLYO5t5, http://huff.to/oDwgHC, http://t.co/BtIzsiW, http://t.co/CahfKYh, http://on.msnbc.com/4dpW6f, http://nyp.st/qYGM9L, http://bit.ly/oapSdd, http://t.co/TkKR8Qm, http://n.pr/nnu5XS]
[Chart: Hashtag Count for Event 1]
[Chart: User Mention Count for Event 1. Accounts shown: SkyNewsBreak, YouTube, BreakingNews, HuffingtonPost, Reuters, NewYorkPost, iamshortymack, RollingStone, HotNewHipHop, mashable]
Event 2: Norway attacks
[Chart: number of tweets for Event 2 (Norway attacks)]
[Chart: URL Share Count for Event 2 (pie chart; individual shares range from 2% to 7%). URLs shown: http://on.mash.to/nViorD, http://bisi.pl/31b, http://bit.ly, http://budurl.com/2tl2, http://t.co/dPHb33j, http://bit.ly/qd41UN, http://apne.ws/qvdeXV, http://bit, http://t.co/AyS26mV, http://twitpic.com/5tzsmx, http://t.co/dXABr5T, http://apne.ws/qi7CM5]
[Chart: Hashtag Count for Event 2]
Comparison Analysis
The Amy Winehouse event occurred on 23rd July 2011, whereas the Norway attacks occurred on 22nd July 2011. As the charts show, the number of tweets for Event 1 peaked on the day of the event and then dropped steeply over the following week until it finally died down. The Norway attacks (Event 2), on the other hand, drew the most tweets on the day of the event and over the next couple of days, and the number of tweets declined more gradually. Notably, Event 1 garnered more than 20000 tweets on the day it occurred, whereas Event 2, despite being far more serious, drew far fewer tweets on the day it occurred.
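The comparison above rests on daily tweet counts per event. The sketch below shows that aggregation together with one simple way to quantify “steep” versus “gradual” decline: the fraction of the peak-day volume still observed a given number of days later. The input format (one date per tweet) and the decay measure are illustrative assumptions, not the method used to produce the charts.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def daily_counts(tweet_dates: Iterable[date]) -> Dict[date, int]:
    """Number of tweets per calendar day."""
    return dict(Counter(tweet_dates))


def peak_and_decay(counts: Dict[date, int], days_after: int = 3) -> Tuple[date, int, float]:
    """Return the peak day, its tweet count, and the fraction of that volume
    still observed `days_after` days later (missing days count as zero)."""
    peak_day = max(counts, key=counts.get)
    peak = counts[peak_day]
    later = counts.get(peak_day + timedelta(days=days_after), 0)
    return peak_day, peak, later / peak
```

A small fraction indicates a sharp drop-off like the one described for Event 1, while a larger fraction indicates the more gradual decline seen for Event 2.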
Sentiment Analysis
The positive, negative, and neutral tweets about the two events from 07/22/2011 to 07/31/2011 are visualized in the graphs below.
[Chart: User Mention Count for Event 2 (Norway attacks). Accounts shown: BreakingNews, Reuters, CBSNews, YouTube, HuffingtonPost, YahooNews, StateDept, mpoppel, ggreenwald, SenatorSanders]
Event 1: Amy Winehouse
Overall, Event 1 drew the most neutral tweets and the fewest positive tweets.
Event 2: Norway Attacks
Event 2 likewise drew the most neutral tweets and the fewest positive tweets overall. Interestingly, however, negative tweets outnumbered both neutral and positive tweets in the days following the event.
[Charts: Tweet Count per day by sentiment (Positive, Negative, Neutral), 20-Jul-11 to 1-Aug-11, one chart per event]
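The report does not describe how individual tweets were labeled positive, negative, or neutral. Purely as an illustration of one common approach, lexicon-based scoring, the sketch below counts matches against small positive and negative word lists and labels a tweet neutral when the two balance out. The word lists are tiny hypothetical examples; a real analysis would use an established sentiment lexicon or a trained classifier, and would also need to handle negation and sarcasm.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"love", "great", "beautiful", "legend", "hope", "peace"}
NEGATIVE = {"sad", "tragic", "terrible", "hate", "shocked", "horrific"}

# Lowercase word tokens (apostrophes kept so "can't" stays one token).
TOKEN_RE = re.compile(r"[a-z']+")


def classify(text: str) -> str:
    """Label a tweet by comparing its positive and negative word matches."""
    tokens = TOKEN_RE.findall(text.casefold())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_distribution(tweets: Iterable[str]) -> Counter:
    """Count how many tweets fall into each sentiment class."""
    return Counter(classify(t) for t in tweets)
```

Note that with a small lexicon most tweets match no words at all and fall into the neutral class, so a high neutral share can partly reflect lexicon coverage rather than user restraint.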
Conclusion
Managing huge amounts of data is becoming more convenient with the advent of distributed file systems and the processing frameworks built on them. They can manage and analyze large volumes of data, which helps assess a particular event’s significance over a period of time.
The analysis contradicts our initial hypothesis and leads us to conclude that the Amy Winehouse event drew as much attention on Twitter as an event as grave as the Norway attacks, if not more. The retweets that the events generated help identify the issues most discussed among Twitter users. It is extremely surprising that a celebrity’s death can take precedence over an attack on a nation. One possible explanation is that people are conscious and careful when commenting on sensitive issues and choose to refrain from expressing their views. The sentiment analysis supports this: since the graphs show mostly neutral tweets for both events, it can be interpreted that most people are reserved in their opinions and take a neutral stand on a public platform where most activity is scrutinized, especially on an issue as delicate as the Norway attacks.

Weitere ähnliche Inhalte

Ähnlich wie Twitter analysis

NMIX 4200 Final Paper Report
NMIX 4200 Final Paper ReportNMIX 4200 Final Paper Report
NMIX 4200 Final Paper Report
Patrick Grant
 
1312021 QNT275T Statistics for Decision Making homehtt
1312021 QNT275T Statistics for Decision Making homehtt1312021 QNT275T Statistics for Decision Making homehtt
1312021 QNT275T Statistics for Decision Making homehtt
ChantellPantoja184
 
1312021 QNT275T Statistics for Decision Making homehtt
1312021 QNT275T Statistics for Decision Making homehtt1312021 QNT275T Statistics for Decision Making homehtt
1312021 QNT275T Statistics for Decision Making homehtt
CicelyBourqueju
 
Harnessing Volume and Velocity Challenge on the Social Web using Crowd-Source...
Harnessing Volume and Velocity Challenge on the Social Web using Crowd-Source...Harnessing Volume and Velocity Challenge on the Social Web using Crowd-Source...
Harnessing Volume and Velocity Challenge on the Social Web using Crowd-Source...
Artificial Intelligence Institute at UofSC
 
Computer assisted research and reporting
Computer assisted research and reportingComputer assisted research and reporting
Computer assisted research and reporting
peterverweij
 

Ähnlich wie Twitter analysis (20)

NMIX 4200 Final Paper Report
NMIX 4200 Final Paper ReportNMIX 4200 Final Paper Report
NMIX 4200 Final Paper Report
 
data, big data, open data
data, big data, open datadata, big data, open data
data, big data, open data
 
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
Document(2)
Document(2)Document(2)
Document(2)
 
s00146-014-0549-4.pdf
s00146-014-0549-4.pdfs00146-014-0549-4.pdf
s00146-014-0549-4.pdf
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
 
D. Zardetto, Using Twitter data for the Social Mood on Economy Index
D. Zardetto, Using Twitter data for the Social Mood on Economy Index D. Zardetto, Using Twitter data for the Social Mood on Economy Index
D. Zardetto, Using Twitter data for the Social Mood on Economy Index
 
Team CDTW Capstone Presentation
Team CDTW Capstone Presentation Team CDTW Capstone Presentation
Team CDTW Capstone Presentation
 
CDTW Capstone Presentation
CDTW Capstone Presentation CDTW Capstone Presentation
CDTW Capstone Presentation
 
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and SharingData-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
 
1312021 QNT275T Statistics for Decision Making homehtt
1312021 QNT275T Statistics for Decision Making homehtt1312021 QNT275T Statistics for Decision Making homehtt
1312021 QNT275T Statistics for Decision Making homehtt
 
1312021 QNT275T Statistics for Decision Making homehtt
1312021 QNT275T Statistics for Decision Making homehtt1312021 QNT275T Statistics for Decision Making homehtt
1312021 QNT275T Statistics for Decision Making homehtt
 
Facebook Keynote for PhoCusWright India 2016
Facebook Keynote for PhoCusWright India 2016Facebook Keynote for PhoCusWright India 2016
Facebook Keynote for PhoCusWright India 2016
 
Presentation1.pdf
Presentation1.pdfPresentation1.pdf
Presentation1.pdf
 
Harnessing Volume and Velocity Challenge on the Social Web using Crowd-Source...
Harnessing Volume and Velocity Challenge on the Social Web using Crowd-Source...Harnessing Volume and Velocity Challenge on the Social Web using Crowd-Source...
Harnessing Volume and Velocity Challenge on the Social Web using Crowd-Source...
 
NSA Tweets Rohit kumar
NSA Tweets Rohit kumarNSA Tweets Rohit kumar
NSA Tweets Rohit kumar
 
Computer assisted research and reporting
Computer assisted research and reportingComputer assisted research and reporting
Computer assisted research and reporting
 
Social Media Training at AED: Day 2
Social Media Training at AED: Day 2Social Media Training at AED: Day 2
Social Media Training at AED: Day 2
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science
 

Kürzlich hochgeladen

Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 

Kürzlich hochgeladen (20)

Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 

Twitter analysis

  • 1. Performing sentiment analysis on Twitter data (2011 Norway attacks) Team – AparnaDhanashriJayaprakash – 50094768 HimanshuYadav – 50093151 Inder Puneet Singh – 50094241 Sabah Abdul Mannan Khan – 50094894 VidyaMulukutla - 50095830
  • 2. Analysis of Twitter Data Set Introduction Big Data is increasingly pertinent in today’s digitalized world and is being used in a lot of different domains. With social media being so pervasive, it makes logical sense to use it to generate the data sets for analysis in various areas from politics to entertainment.We have chosen ‘Twitter’ as our source for data since it has a wide user base that includes regular people as well as popular individuals from the fields of media, movies, sports and politics. There are a lot of analytical results that can be derived from a popular and widely used Social media platform like Twitter and we used the data generated from it through an implementation using Apache Hadoop and Hive. In order to gauge the reactions from the different users who responded to the significant events in the month of July 2011, we performed a Sentiment Analysis. Sentiment Analysis is the process of trying to gather subjective information through natural language processing, computational linguistics and text analysis. It is also known as opinion mining.There were two important and completely contrasting events that took place in July 2011for which we came up with a comparison analysis and the description of the events is as follows: The Norway attacks of 2011 were the most deadly attacks on the country. Two sequential explosions took place within a span of two hours on 22nd July 2011. The first one was a car bomb that took place in the executive governmental headquarters that killed eight people and injured around 209 people. The second one was a deadly assault that took place on an island. It was a summer camp organized by the youth division of the ruling party. An unidentified man gained access to the camp and open fired at the participating members. This attack claimed 69 lives and seriously injured 110 persons. The accused in the case, Anders Behring Breivik, was sentenced to 21 years in imprisonment.
  • 3. Analysis of Twitter Data Set Amy Winehouse was a hugely popular British singer and songwriter. Her work was critically as well as commercially appreciated and she won multiple Grammy Awards for her songs. Her sudden demise due to alcohol poisoning on 23rd July 2011 shocked millions of her fans worldwide and sent the online networking sites into frenzy. Hypothesis As per our hypothesis, we decided to evaluate how users from different geographical locations reacted to both the stories on twitter.We took the assumption that the Norway attackswould affect the public more as compared to the Amy Winehouse death and would garner more tweets, hashtags and retweets as it is a more important event in the sense that it was an attack in which many lives were lost and even more critically injured. We compared these two events using sentiment analysis. Technology For our implementation, we have used Apache Hadoop which was deployed on an Amazon EC2 instance for processing of data.For the installation of Hadoop master, we used m1.1large instance type whereas for the Hadoop slaves, we used m1.4small instance types. We elected the M1 general-purpose instance types primarily for their extremely low cost options for running applications. They are appropriate for a moderately good CPU performance. Apache Hive was used to analyze, summarize and query the data using a SQL type language known as HiveQL. Data Preparation Data Selection The data that was extracted was segregated into different tables for the sake of convenience of analysis. One of the tables from the Norway attacks event is as shown below -
  • 4. Analysis of Twitter Data Set Hashtag Count Oslo 466 Norway 396 tcot 308 oslo 244 p2 234 SAVEAMERICANOW 214 news 124 blamethemuslims 111 norway 110 breakingnews 93 isles 93 fb 90 islanders 88 cnn 82 Utoya 74 teaparty 61 osloexpl 55 News 55 prayfornorway 55 tlot 36 Breivik 34 socialmedia 34
politics 32
NFL 32
utoya 27
PrayForNorway 27
Utøya 27
CNN 26
Islam 24
oslobomb 24

Data Cleaning
Contrary to our expectation that the data set would be limited to one specific time period of, say, one year, the extracted data spanned many years, so no single period contained a high density of information. This meant that we first had to identify events that occurred within a specific time period. In addition, because the data set is gathered from many different sources, it contains a lot of redundant data, and duplicate records must be deleted before any analysis can be conducted. Since we were dealing with very large data sets, we also partitioned the data to make the analysis easier and to improve query performance.

Another important aspect of data cleaning is normalizing geotagged locations, because the same address can be written in several forms. For example, "Bangalore", "Bangalore Karnataka" and "Bangalore Karnataka India" are all different ways of writing the same location. To perform an accurate analysis, locations must be normalized and converted into a single, consistent format. We did this with Google's Geocoding API, which provides a straightforward way to convert an address into geographic coordinates (latitude and longitude) that can be used for map positioning.

Challenges faced during Implementation
Some of the problems we encountered with the extracted data were:
• Duplicate files: The extraction returned a large number of repeated files with identical content. Filtering these down to files with unique content required additional processing, which was very time-consuming.
• Parsing data: Parsing failed for several reasons, for example when tweets contained text in many languages, or when a JSON structure was closed incorrectly, which prevented any data beyond that point from being read.
• Complete data not recovered: Not all of the data was recovered when extracting it through Apache Hive. Because we were dealing with very large data sets, a lot of extra programming and debugging was required to repair the situation. Parsing exceptions were raised, and we resolved them by locating the erroneous files.

Analysis
After the data selection and cleaning process, we selected several tables, each representing a different aspect of the two events (Amy Winehouse and the Norway attacks), for a comparison analysis of the two events and a sentiment analysis of each. The aspects covered are data distribution, hashtag counts, URL counts, tweet sentiment and famous tweeters.

Event 1: Amy Winehouse
[Figure: Number of tweets per day (y-axis: No of Tweets)]
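Tweets-per-day counts like the ones in this chart depend on the cleaning steps described under Data Cleaning and on coping with the parsing problems listed under Challenges. The sketch below illustrates those steps in Python, not the team's Hadoop/Hive implementation, under assumptions that do not come from the report: tweets are stored one JSON object per line, each record carries an `id` and a `created_at` timestamp in Twitter's classic format, and the helper names `load_tweets` and `clean_and_partition` and the file name `part-0001.json` are hypothetical.

```python
import json
import tempfile
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

# Classic Twitter timestamp format, e.g. "Fri Jul 22 15:26:30 +0000 2011" (assumed input format).
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def load_tweets(paths):
    """Parse one JSON tweet per line, recording malformed lines instead of aborting.

    Returns (tweets, errors); each error is (file name, line number, message),
    which points at the erroneous file that needs repairing.
    """
    tweets, errors = [], []
    for path in paths:
        # UTF-8 decoding keeps tweets written in many languages intact.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line_no, line in enumerate(handle, start=1):
                if not line.strip():
                    continue
                try:
                    tweets.append(json.loads(line))
                except json.JSONDecodeError as exc:
                    errors.append((Path(path).name, line_no, exc.msg))
    return tweets, errors


def clean_and_partition(tweets, start, end):
    """Drop duplicate tweet ids, keep tweets in [start, end) and bucket them by UTC day.

    Returns {"YYYY-MM-DD": [tweet, ...]}, similar in spirit to a date-partitioned table.
    """
    seen_ids = set()
    partitions = defaultdict(list)
    for tweet in tweets:
        if tweet["id"] in seen_ids:
            continue  # the same tweet extracted more than once
        seen_ids.add(tweet["id"])
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        created = created.astimezone(timezone.utc)
        if start <= created < end:  # keep only the event window
            partitions[created.strftime("%Y-%m-%d")].append(tweet)
    return dict(partitions)


# Illustrative input: a duplicate, an out-of-window tweet and a truncated JSON record.
records = [
    {"id": 1, "created_at": "Fri Jul 22 15:26:30 +0000 2011", "text": "#Utøya"},
    {"id": 1, "created_at": "Fri Jul 22 15:26:30 +0000 2011", "text": "#Utøya"},
    {"id": 2, "created_at": "Sat Jul 23 16:00:00 +0000 2011", "text": "sample tweet"},
    {"id": 3, "created_at": "Mon Jan 03 09:00:00 +0000 2011", "text": "old tweet"},
]
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "part-0001.json"
    lines = [json.dumps(r, ensure_ascii=False) for r in records]
    lines.append('{"id": 4, "text": "truncated')  # JSON structure closed incorrectly
    sample.write_text("\n".join(lines) + "\n", encoding="utf-8")
    tweets, errors = load_tweets([sample])

window_start = datetime(2011, 7, 20, tzinfo=timezone.utc)
window_end = datetime(2011, 8, 1, tzinfo=timezone.utc)
partitions = clean_and_partition(tweets, window_start, window_end)
print({day: len(rows) for day, rows in partitions.items()})  # tweets per day
print(errors)
```

Bucketing by day mirrors the idea behind partitioning a Hive table by date: a query that filters on the partition column only needs to read the matching partitions.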
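The location normalization step described under Data Cleaning can be sketched as follows. The endpoint URL and the `status` and `results[0].geometry.location` response fields follow the public Google Geocoding API; the helper names, the rounding precision used to treat nearby coordinates as the same place, and the offline `stub_fetch` (which returns illustrative coordinates for Bangalore instead of calling the service) are assumptions for this example.

```python
import json
from urllib.parse import urlencode

# Google Geocoding API JSON endpoint; real requests need an API key.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def normalize_location(raw):
    """Lower-case and drop commas/extra spaces so trivial variants share one cache key."""
    return " ".join(raw.replace(",", " ").lower().split())


def geocode_url(address, api_key):
    """Build the request URL for one address."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})


def extract_lat_lng(body):
    """Return (lat, lng) of the first result in a Geocoding API response, or None."""
    payload = json.loads(body)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def resolve_locations(raw_locations, fetch, api_key, precision=2):
    """Map each free-text location to rounded coordinates, geocoding each normalized string once.

    `fetch` takes a URL and returns the response body, so it can be an HTTP call or a stub.
    Rounding (an assumption) lets variants that geocode to nearly the same point compare equal.
    """
    cache, resolved = {}, {}
    for raw in raw_locations:
        key = normalize_location(raw)
        if key not in cache:
            coords = extract_lat_lng(fetch(geocode_url(key, api_key)))
            if coords is not None:
                coords = (round(coords[0], precision), round(coords[1], precision))
            cache[key] = coords
        resolved[raw] = cache[key]
    return resolved


def stub_fetch(url):
    # Offline stand-in for an HTTP GET; returns an illustrative response for Bangalore.
    return json.dumps({"status": "OK",
                       "results": [{"geometry": {"location": {"lat": 12.9716, "lng": 77.5946}}}]})


variants = ["Bangalore", "Bangalore, Karnataka", "Bangalore Karnataka India"]
resolved = resolve_locations(variants, stub_fetch, api_key="YOUR_API_KEY")
print(resolved)
```

With a real API key, `fetch` could be `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`. Caching by normalized string keeps the number of API requests down when many tweets share the same location text.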
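The hashtag, URL and user-mention charts for each event are frequency tables. The report's analysis was built with Hadoop and Hive; purely as an illustration, the Python sketch below counts the same three kinds of entity. Extracting them from tweet text with regular expressions is an assumption (Twitter's tweet JSON also lists them in an `entities` field, which is more reliable), and the function name `top_entities` and the sample tweets are hypothetical.

```python
import re
from collections import Counter

# Regex approximations (assumptions) for the three entity types.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def top_entities(texts, n=10):
    """Count hashtags, @mentions and URLs in tweet texts; return the n most common of each."""
    hashtags, mentions, urls = Counter(), Counter(), Counter()
    for text in texts:
        hashtags.update(HASHTAG.findall(text))
        mentions.update(MENTION.findall(text))
        urls.update(URL.findall(text))
    return {
        "hashtags": hashtags.most_common(n),
        "mentions": mentions.most_common(n),
        "urls": urls.most_common(n),
    }


sample_texts = [
    "RT @Reuters: Explosion in Oslo http://t.co/example #oslobomb",
    "#PrayForNorway #oslobomb",
    "@Reuters thanks for the update",
]
result = top_entities(sample_texts, n=2)
print(result)
```

Counting here is case-sensitive and does not fold accents, so variants such as `utoya` and `Utøya` end up as separate rows, as they do in the hashtag table at the start of this section.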
[Figure: URL Share Count. Most-shared URLs: http://t.co/0IGT940, http://t.co/kLYO5t5, http://huff.to/oDwgHC, http://t.co/BtIzsiW, http://t.co/CahfKYh, http://on.msnbc.com/4dpW6f, http://nyp.st/qYGM9L, http://bit.ly/oapSdd, http://t.co/TkKR8Qm, http://n.pr/nnu5XS]
[Figure: Hashtag Count]
[Figure: User Mention Count. Most-mentioned users: SkyNewsBreak, YouTube, BreakingNews, HuffingtonPost, Reuters, NewYorkPost, iamshortymack, RollingStone, HotNewHipHop, mashable]

Event 2: Norway attacks
[Figure: Number of tweets per day (y-axis: No of Tweets)]
[Figure: URL Share Count. Most-shared URLs include: http://on.mash.to/nViorD, http://bisi.pl/31b, http://bit.ly, http://budurl.com/2tl2, http://t.co/dPHb33j, http://bit.ly/qd41UN, http://apne.ws/qvdeXV, http://bit, http://t.co/AyS26mV, http://twitpic.com/5tzsmx, http://t.co/dXABr5T, http://apne.ws/qi7CM5]
[Figure: Hashtag Count]
[Figure: User Mention Count. Most-mentioned users: BreakingNews, Reuters, CBSNews, YouTube, HuffingtonPost, YahooNews, StateDept, mpoppel, ggreenwald, SenatorSanders]

Comparison Analysis
The Amy Winehouse event occurred on 23rd July 2011, whereas the Norway attacks occurred on 22nd July 2011. As the charts show, tweets about event 1 peaked on the day of the event and then dropped steeply over the following week until they died down. The Norway attacks, on the other hand, drew the most tweets on the day of the event and over the next couple of days, and the number of tweets declined only gradually. It is interesting to note that event 1 reached its maximum of over 20,000 tweets on the day it occurred, whereas event 2, despite being far more serious, drew far fewer tweets on the day it occurred.

Sentiment Analysis
The positive, negative and neutral tweets about the two events over the period from 07/22/2011 to 07/31/2011 are visualized in the graphs below.
Event 1: Amy Winehouse
Overall, event 1 drew the most neutral tweets and the fewest positive tweets.
[Figure: Daily tweet count by sentiment (Positive, Negative, Neutral), 20 Jul 2011 to 1 Aug 2011]

Event 2: Norway Attacks
Overall, event 2 also drew the most neutral tweets and the fewest positive tweets. Interestingly, in the days following the attacks, negative tweets outnumbered both neutral and positive tweets.
[Figure: Daily tweet count by sentiment (Positive, Negative, Neutral), 20 Jul 2011 to 1 Aug 2011]
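This section does not say how individual tweets were labeled positive, negative or neutral. The sketch below assumes a simple lexicon-based approach, which is a swapped-in technique and not necessarily the one the team used: each tweet is scored by its positive minus negative word hits, and labels are then counted per day, the quantity plotted in the graphs above. The word lists, the function names `classify` and `daily_sentiment`, and the sample tweets are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative word lists (assumptions); a real study would use a full sentiment lexicon.
POSITIVE = {"love", "hope", "thanks", "beautiful", "great"}
NEGATIVE = {"sad", "shocked", "terrible", "tragic", "horrible", "hate"}
WORD = re.compile(r"[a-z']+")  # English-only tokenizer, another simplifying assumption


def classify(text):
    """Label a tweet by comparing its positive and negative lexicon hits."""
    words = WORD.findall(text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def daily_sentiment(tweets):
    """Count positive/negative/neutral labels per day from (day, text) pairs."""
    counts = defaultdict(Counter)
    for day, text in tweets:
        counts[day][classify(text)] += 1
    return dict(counts)


sample = [
    ("2011-07-22", "So sad and shocked by the news from Oslo"),
    ("2011-07-22", "Explosion reported in Oslo"),
    ("2011-07-23", "Thanks to everyone sending hope to Norway"),
]
daily = daily_sentiment(sample)
for day, labels in sorted(daily.items()):
    print(day, dict(labels))
```

In this sketch any tweet with no lexicon hits is labeled neutral, so the size of the word lists directly affects how many tweets end up in the neutral class.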
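The Comparison Analysis earlier in this section contrasts a steep drop-off in tweets about event 1 with a more gradual decline for event 2. One simple way to put a number on this, which is an assumption and not a measure used in the report, is the share of the peak-day volume that remains a fixed number of days later. The function name `peak_and_decay`, the three-day default and the daily counts in the example are hypothetical; the counts are made up and are not the report's data.

```python
from datetime import date, timedelta


def peak_and_decay(daily_counts, days_after=3):
    """Return (peak day, peak count, share of the peak remaining `days_after` days later).

    `daily_counts` maps datetime.date -> tweet count; days missing from it count as zero.
    A small share indicates a steep drop-off, a share close to 1 a gradual decline.
    """
    if not daily_counts:
        raise ValueError("daily_counts is empty")
    peak_day = max(daily_counts, key=daily_counts.get)
    peak = daily_counts[peak_day]
    later = daily_counts.get(peak_day + timedelta(days=days_after), 0)
    return peak_day, peak, (later / peak if peak else 0.0)


# Made-up counts for illustration only; they are not the report's data.
steep = {date(2011, 7, 23): 1000, date(2011, 7, 24): 300,
         date(2011, 7, 25): 100, date(2011, 7, 26): 50}
gradual = {date(2011, 7, 22): 800, date(2011, 7, 23): 700,
           date(2011, 7, 24): 650, date(2011, 7, 25): 600}
print(peak_and_decay(steep))
print(peak_and_decay(gradual))
```

A single ratio ignores the shape of the curve between the two days, so it complements rather than replaces the daily charts.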
Conclusion
Managing huge amounts of data is becoming easier with the advent of distributed file systems. Together with processing tools such as Hadoop and Hive, they make it possible to manage and analyze large volumes of data and to assess a particular event's significance over a period of time. The analysis contradicted the hypothesis we had initially assumed and led us to conclude that the Amy Winehouse event drew as much attention on Twitter as an event as grave as the Norway attacks, if not more. The retweets each event generated help identify the issues most discussed among Twitter users. It is surprising that a celebrity's death could take precedence over an attack on a nation. One possible explanation is that people are cautious about commenting on sensitive issues and choose not to express their views. The sentiment analysis is consistent with this: both events drew mostly neutral tweets, which suggests that most people are reserved in their opinions and take a neutral stance on a public platform where most activity is scrutinized, especially on an issue as delicate as the Norway attacks.
References
http://en.wikipedia.org/wiki/Sentiment_Analysis
http://en.wikipedia.org/wiki/Apache_Hive
http://aws.amazon.com/ec2/instance-types/#selecting-instance-types
https://developers.google.com/maps/documentation/geocoding/?hl=el