SlideShare a Scribd company logo
1 of 42
General Assembly
Data Science
Airline Tweets
Sentiment Analysis
Natural language data pre-processing steps
02
PRE-PROCESSING
What were the data collection process.
When, where, and how?
01
DATA SETS
AGENDA
Sentiment Analysis
Using machine learning algorithms, we attempt
to classify positive and negative sentiments
04
MACHINE LEARNING
BINARY CLASSIFICATIONS
Using machine learning to predict positive,
neutral and negative sentiments and determine
the best model for data with unknown sentiments
05
MACHINE LEARNING
MULTI-CLASS CLASSIFICATIONS
Without using machine learning, can we use
some basic rules to classify sentiments?
03
RULE-BASED CLASSIFICATIONS
07
CLOSING THOUGHTS
FEEDBACK | Q&A
The summary and statistics of the collected airline
tweets
06
REASULTS
DATA SETS
Where, when and how?
#unitedairlines
#unitedair
@united
UNITED
#americanairlines
#americanair
@AmericanAir
AA
#deltaairlines
#deltaair
@delta
DELTA
#southwestairlines
#southwestair
@southwestair
SOUTHWEST
FOUR MAJOR AIRLINES
Sentiment Analysis
Reformatted/cleaned tweets with graded
sentiment of Major Airlines from Feb 2015
14,640 Tweets
KAGGLE
Commercial datasets provided by Newsroom
with machine graded tweets
4,000 Tweets
Newsroom
Using Python and twython to retrieve tweets
through Twitter’s API during 7 days period.
80,121 Tweets
TWITTER API
k
SOURCES
Sentiment Analysis
itsvatime.com
600 + 600 + 600 + 600
DATA PRE-
PROCESSING
Natural language processes to prepare and transform the tweet
content for various classification models and sentiment analysis
Major processing steps
NATURAL LANGUGE PROCESSING
Sentiment Analysis
For machine learning
Vector  Matrix… get it?
4 Vectorizer
Perform regular
expression
2 RegEx
Tokenize all tweet
contents
1 Tokenization
Remove all the English
stop words
3 Stop Words
Replacing all &amp symbols with the
actual words ‘and’ with a space
re.sub(r'&', r' and', tweet)
Replace ‘&amp’ Signs
Remove all annoying white spaces
re.sub('[s]+', ' ', tweet)
Remove Additional Spaces
Convert all #hashtagwords to just
‘hashtagwords’
re.sub(r'#([^s]+)', r'1', tweet)
Replace #hastags
Turn all https://... and www… format to
display just the word ‘URL’
re.sub('((www.[^s]+)|(https?://[^s]+))','URL',tweet)
Eliminate URLs
Convert all @username to display just
‘username’ without the ‘@’ sign
re.sub(r'@([^s]+)', r'1', tweet)
Replace @username
Turn all letters to lower cases. One
option is to retain original upper cases
Lower Cases
REGULAR EXPRESSION
Sentiment Analysis
tweet.lower()
&
RULE BASED
CLASSIFICATION
First we will use two simple rule-based techniques to conduct our sentiment analysis to see what happens
• (pos) x (pos) = (pos)
This movie is very good
• (pos) x (neg) = (neg)
This movie is not good at all
• (neg) x (neg) = (pos)
This movie was not bad
Require a Pre-defined List of Positive & Negative Words
‘COMMON SENSE’
CLASSIFICATION RULES
Accuracy
NEWSROOM (4,000)
KAGGLE (14,600)
MANUAL LABEL (2,400)
49.50 %
52.45 %
47.53 %
VADER is a lexicon and rule-
based sentiment analysis tool
that is especially attuned to
sentiments expressed in social
media.
Valence Aware Dictionary and sEntiment Reasoner
V.A.D.E.R.
SENTIMENT SCORES
Examples
{‘neg’: 0.0, ‘neu’: 0.293, ‘pos’: 0.707, ‘compound’: 0.8646}
“VADER is very smart, handsome, and funny!”
{‘neg’: 0.0, ‘neu’: 0.254, ‘pos’: 0.746, ‘compound’: 0.8316}
“VADER is smart, handsome and funny.”
{‘neg’: 0.0, ‘neu’: 0.233, ‘pos’: 0.767, ‘compound’: 0.9342}
“VADER is VERY SMART, handsome, and FUNNY!!!”
{‘neg’: 0.779, ‘neu’: 0.221, ‘pos’: 0.0, ‘compound’: -0.5461}
“Today SUX!”
{‘neg’: 0.0, ‘neu’: 0.637, ‘pos’: 0.363, ‘compound’: 0.431}
“At least it isn’t a horrible book.”
{‘neg’: 0.154, ‘neu’: 0.42, ‘pos’: 0.426, ‘compound’: 0.5974}
“Today kinda sux! But I’ll get by, lol :)”
NEWSROOM (4,000)
KAGGLE (14,600)
MANUAL LABEL (2,400)
54.75 %
38.00 %
54.02 %
Accuracy
MACHINE
LEARNING
Part I – Binary Classifications
Random Forest
Perform Random Forest Model2
Hyperplane
Linear SVM
Perform Support Vector Classifier3
Data Prep
Remove All Neutral Sentiment1
MACHINE LEARNING I
Sentiment Analysis
Training Accuracy: 99.88%
Testing Accuracy: 86.49%
TRAINING: 11,541 TESTING: 1,843
Training Accuracy: 99.91%
Testing Accuracy: 84.48%
TRAINING: 1,124 TESTING: 290
Training Accuracy: 99.87%
Testing Accuracy: 89.58%
TRAINING: 9,266 TESTING: 2,275
RANDOM FOREST
Sentiment Analysis
k
k
k
Training Accuracy: 99.12%
Testing Accuracy: 86.38%
Training Accuracy: 99.56%
Testing Accuracy: 84.14%
Training Accuracy: 99.18%
Testing Accuracy: 90.95%
SVC
Sentiment Analysis
TRAINING: 11,541 TESTING: 1,843
k
TRAINING: 9,266 TESTING: 2,275
kk
TRAINING: 1,124 TESTING: 290
MACHINE LEARING
Part II – Multi-Class Classifications
Apply Model on Testing
model.fit ( )
AdaBoost
MACHINE LEARNING II
Sentiment Analysis
Grid Search Cross-Validation
GridSearchCV ( )
5-Fold Cross Validation
StratifiedKFold ( )
TRAINING: 2,040 TESTING: 360 TRAINING: 14,640 TESTING: 2,400
k
SVC
ADABOOST
MODEL PERFORMANCE
While the test accuracy is slightly lower, using Kaggle data as
training set proved to be a stronger predictor based on the cross-
validation accuracy results. In addition, it is superior in positive and
neutral precision, not to mention much higher training volume.
96.89%
1.43%
33.85%
66.94%
64.86%
65.25%
50.00%
60.34%
60.88%
67.80%
0% 20% 40% 60% 80% 100% 120%
[ -1] Precision
[ne] Precision
[+1] Precision
Test Accuracy
CV Accuracy
Kaggle->Tweet Tweet->Tweet
SVC
MODEL PERFORMANCE
Again, while we see the precision of the twitter data is superior, the
Kaggle data yields a more balanced results for all three sentiments.
Because the overall test accuracy is about the same, we chose to
use Kaggle data as training due to its superior CV accuracy as well
as its large sum of data points (~8X).
93.33%
14.29%
47.69%
69.72%
66.39%
79.34%
49.10%
60.34%
69.08%
71.91%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
[ -1] Precision
[ne] Precision
[+1] Precision
Test Accuracy
CV Accuracy
Kaggle->Tweet Tweet->Tweet
CONCLUSION
Support Vector Classifier
Model Parameters: decision_fun=‘ovo’, kernel=‘linear’, cost=0.4, gamma=0
Use Entire Kaggle Dataset as Training Set (14,640 data points)
Apply Model to Tweets of Unknown Sentiment
STATISTICS
Now we have the model, let’s take a look at the
results of collected tweets!
5,705
5,478
5,728
10,516
12,567
9,647
11,718
18,762
0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 20,000
United
Southwest
Delta
American
Original Tweets Re-Tweets
NUMBER OF TWEETS
Sentiment Analysis
OVERALL SENTIMENT
Sentiment Analysis
SENTIMENT
59.5 %26.0%
14.5%
52,694
Original Tweets
Positive
Neutral
Negative
0
2,000
4,000
6,000
8,000
10,000
12,000
14,000
16,000
18,000
20,000
American Delta Southwest United
Negative Netual Positive
Percentage of Original TweetsNumber of Original Tweets
SENTIMENT PER AIRLINE
Sentiment Analysis
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
American Delta Southwest United
Negative Netual Positive
-
Positive Sentiment
“Cenk Uygur”
Negative Sentiment
AMERICAN AIRLINE
Sentiment Analysis
-
Positive Sentiment
“Delta Assist”
Negative Sentiment
DELTA AIRLINE
Sentiment Analysis
“Dalton Rapattoni”
Positive Sentiment
“Dalton Rapattoni”
Negative Sentiment
SOUTHWEST AIRLINE
Sentiment Analysis
“Happy Birthday”, “90th"
Positive Sentiment
“Muslim”, “Shame”
Negative Sentiment
UNITED AIRLINE
Sentiment Analysis
RETWEET PER SENTIMENT
Sentiment Analysis
NEUTRAL POSITIVE
26.0%
NEGATIVE
31.7%
14.5%
19.1%
59.5%
49.2%
RetweetsOriginal Tweets
If the sentiment is classify accurately, people
are more likely to retweet a neutral or positive
tweets than a negative tweet.
RT @JackAndJackReal: Thanks @Delta for adding our
movie to all your flights! If any of you are traveling and
bored make sure u peep it
3,535 Retweets…
RT @ggreenwald: Arab-American family of 5 going on
vacation - 3 young kids in tow - removed from @United
plane for "safety reasons" https:/…
352 Retweets…
RT @UMG: The perfect trio @theweeknd @ddlovato and
Lucian Grainge #MusicIsUniversal #UMGShowcase w/
@AmericanAir & @Citi https://t.co/J
3,795 Retweets…
RT @DaltonRapattoni: Been home in Dallas for hours now
but still in the airport. @SouthwestAir PLEASE find my
bag. Please RT. #SWAFindDalto
1,952 Retweets…
RT @Mets: We’re partnering with @Delta to send a lucky
fan to games 1 & 2! RT for your chance to win
#DeltaMetsSweeps https://t.co/CDELSe84
17,093 Retweets…
RT @antoniodelotero: this skank bitch was SO RUDE to
me on my flight!!!! @united I asked for her name and she
goes, "it's Britney bitch.”
3,329 Retweets…
RT @JensenAckles: @AmericanAir I know it's crazy
#DFWsnow I'm just excited 2see my baby girl! Man
freaking out next 2 me isn't helping
10,260 Retweets…
RT @SouthwestAir: Yo #Kimye, we're really happy for you,
we'll let you finish, but South is one of the best names of all
time.
3,362 Retweets…
MOST RETWEETED TWEETS
Sentiment Analysis
TWEET
SOURCES
The majority of the tweets are from iPhone. There is also
almost 20% of tweets from Android devices. The rest
from the Web client and other applications. There is little
difference between the source of negative and positive
tweets; but there seems to be more alternative sources
for the neutral sentiment.
0%
10%
20%
30%
40%
50%
60%
iPhone
Android
Web
Others
Negative Neutral Positive
Photo Link Video LinkPhoto
16,787
SENTIMENT PER MEDIA TYPE
Sentiment Analysis
1,299 540
SENTIMENT
PositiveNeutralNegative
ENDING
THOUGHTS
Discussing final thoughts and some future steps
https://github.com/michaelucsb/Python_Twitter-Sentiment_GA
Each data point being
validated by multiple people
MULTIPLE VALIDATION
Naïve Bayes
Lemmatization
OTHER MODELS
Current coordinates data
isn’t great for Geo-location
analysis
COORDINATES
Unsupervised model to
discover words association
CLUSTERING
THOUGHTS & FUTURE STEPS
Sentiment Analysis
THANK
YOU
Q & A
Project Description
Prior to machine learning, the project conducted two rule-based senti-
ment classifications. One of which simply count the number of positive
and negative words, and the other utilizes VADER lexicon, a pre-built
sentiment analysis tool for Python. Both rule-based sentiment
classifications achieved an accuracy of 47% to 55%.
Next, the project used machine learning algorithms – Random Forest,
and Support Vector Classifier – to conduct binary classification on
positive and negative sentiments by removing all known neutral tweets.
85% to 90% accuracy were achieved on the test dataset.
Finally, the project used Ada Boost and Support Vector Classifier to
conduct multi-class classifications by including back all neutral tweets.
Through grid-search and cross-validation, the project achieved a 70%
overall accuracy on prediction and an 80% precision on negative
sentiment prediction on the test dataset.
Using the best model and parameters found, the model is applied to
tweets with unknown sentiments. The statistics summary were included
in the later half of the presentation. The presented word clouds also
captured interesting stories during the time the tweets were collected.
The project aims to use rule-based classification and machine learning
algorithms to predict content sentiments of Twitter feeds that mentioned
four major airlines: American Airline, United, Delta, and Southwest.
The project collected 80,121 tweets during a 7-day period via Twitter’s
API using Twython, a Python package; among these tweets, 2,400
randomly selected tweets were given a sentiment rating manually
(test). Through Kaggle, the project also obtained a cleaned dataset
contains 14,640 airline tweets with already rated sentiments (training).
To start off the project, the tweets were processed using tokenization,
regular expression, stop words removals, and vectorization. In regular
expression, the project eliminated all URLs, “@” sign, “#” sign, and
replaced “&” sign with the word “and”.

More Related Content

What's hot

Sentiment Analaysis on Twitter
Sentiment Analaysis on TwitterSentiment Analaysis on Twitter
Sentiment Analaysis on TwitterNitish J Prabhu
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter DataNurendra Choudhary
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysisSunil Kandari
 
Sentiment Analysis Using Twitter
Sentiment Analysis Using TwitterSentiment Analysis Using Twitter
Sentiment Analysis Using Twitterpiya chauhan
 
Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment AnalysisAyush Khandelwal
 
Sentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataSentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataHari Prasad
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Rachit Goel
 
social network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysissocial network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysisAshish Mundra
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on TwitterSmritiAgarwal26
 
Sentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big DataSentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big DataIswarya M
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in TwitterAyushi Dalmia
 
Twitter sentiment analysis project report
Twitter sentiment analysis project reportTwitter sentiment analysis project report
Twitter sentiment analysis project reportBharat Khanna
 
Presentation on Sentiment Analysis
Presentation on Sentiment AnalysisPresentation on Sentiment Analysis
Presentation on Sentiment AnalysisRebecca Williams
 
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Eirini Ntoutsi
 
Twitter sentiment analysis.pptx
Twitter sentiment analysis.pptxTwitter sentiment analysis.pptx
Twitter sentiment analysis.pptxRishita Gupta
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarRavi Kumar
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis worksCJ Jenkins
 
Sentiment Analysis Using Product Review
Sentiment Analysis Using Product ReviewSentiment Analysis Using Product Review
Sentiment Analysis Using Product ReviewAbdullah Moin
 

What's hot (20)

Sentiment Analaysis on Twitter
Sentiment Analaysis on TwitterSentiment Analaysis on Twitter
Sentiment Analaysis on Twitter
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter Data
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysis
 
Sentiment Analysis Using Twitter
Sentiment Analysis Using TwitterSentiment Analysis Using Twitter
Sentiment Analysis Using Twitter
 
Twitter Sentiment Analysis
Twitter Sentiment AnalysisTwitter Sentiment Analysis
Twitter Sentiment Analysis
 
Sentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataSentiment Analysis using Twitter Data
Sentiment Analysis using Twitter Data
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14
 
social network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysissocial network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysis
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on Twitter
 
Sentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big DataSentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big Data
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in Twitter
 
Twitter sentiment analysis project report
Twitter sentiment analysis project reportTwitter sentiment analysis project report
Twitter sentiment analysis project report
 
Presentation on Sentiment Analysis
Presentation on Sentiment AnalysisPresentation on Sentiment Analysis
Presentation on Sentiment Analysis
 
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
Sentiment Analysis of Social Media Content: A multi-tool for listening to you...
 
Twitter sentiment analysis.pptx
Twitter sentiment analysis.pptxTwitter sentiment analysis.pptx
Twitter sentiment analysis.pptx
 
Amazon seniment
Amazon senimentAmazon seniment
Amazon seniment
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumar
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
 
Sentiment Analysis Using Product Review
Sentiment Analysis Using Product ReviewSentiment Analysis Using Product Review
Sentiment Analysis Using Product Review
 

Viewers also liked

R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesJeffrey Breen
 
Sentiment Analysis - Analyzing Twitter Trend for Airline Travellers
Sentiment Analysis - Analyzing Twitter Trend for Airline TravellersSentiment Analysis - Analyzing Twitter Trend for Airline Travellers
Sentiment Analysis - Analyzing Twitter Trend for Airline TravellersAnimesh Dalakoti
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Dev Sahu
 
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...MMADave
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTrilok Sharma
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisFabio Benedetti
 
Sentiment Analysis in Dynamics CRM using Azure Text Analytics
Sentiment Analysis in Dynamics CRM using Azure Text AnalyticsSentiment Analysis in Dynamics CRM using Azure Text Analytics
Sentiment Analysis in Dynamics CRM using Azure Text AnalyticsLucas Alexander
 
Telecomin in indian industrial analysis
Telecomin in indian  industrial analysisTelecomin in indian  industrial analysis
Telecomin in indian industrial analysisNaveenkumar Sn
 
Industrial Analysis on power Industry
Industrial Analysis on power IndustryIndustrial Analysis on power Industry
Industrial Analysis on power IndustrySiva Konduri
 
Analysis of Industrial Wind Turbines_Final Presentation
Analysis of Industrial Wind Turbines_Final PresentationAnalysis of Industrial Wind Turbines_Final Presentation
Analysis of Industrial Wind Turbines_Final PresentationKofi Amoako-Gyan
 
industrial analysis
industrial analysisindustrial analysis
industrial analysistaimur2708
 
Airline Revenue Management Conference Presentations
Airline Revenue Management Conference PresentationsAirline Revenue Management Conference Presentations
Airline Revenue Management Conference PresentationsMillennium Aviation, Inc.
 
Peste Analysis Airline
Peste Analysis AirlinePeste Analysis Airline
Peste Analysis Airlinezeeshanvali
 
factors including economic and industrial analysis
factors including economic and industrial analysisfactors including economic and industrial analysis
factors including economic and industrial analysisvishnu1204
 
Toyota motors company-Industrial Analysis
Toyota motors company-Industrial AnalysisToyota motors company-Industrial Analysis
Toyota motors company-Industrial AnalysisMhara Dalangin
 
Web data from R
Web data from RWeb data from R
Web data from Rschamber
 
Emirates consumer behaviour anaysis
Emirates consumer  behaviour anaysisEmirates consumer  behaviour anaysis
Emirates consumer behaviour anaysisOlya Dyachuk
 
Industrial Analysis Presentation Oil & Gas
Industrial Analysis Presentation Oil & GasIndustrial Analysis Presentation Oil & Gas
Industrial Analysis Presentation Oil & GasSaravanan A
 

Viewers also liked (20)

R by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlinesR by example: mining Twitter for consumer attitudes towards airlines
R by example: mining Twitter for consumer attitudes towards airlines
 
Sentiment Analysis - Analyzing Twitter Trend for Airline Travellers
Sentiment Analysis - Analyzing Twitter Trend for Airline TravellersSentiment Analysis - Analyzing Twitter Trend for Airline Travellers
Sentiment Analysis - Analyzing Twitter Trend for Airline Travellers
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
 
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...
Who do Customers Like More? A Sentiment Analysis of SouthWest Air & Delta Air...
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVM
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment Analysis
 
Sentiment Analysis in Dynamics CRM using Azure Text Analytics
Sentiment Analysis in Dynamics CRM using Azure Text AnalyticsSentiment Analysis in Dynamics CRM using Azure Text Analytics
Sentiment Analysis in Dynamics CRM using Azure Text Analytics
 
Telecomin in indian industrial analysis
Telecomin in indian  industrial analysisTelecomin in indian  industrial analysis
Telecomin in indian industrial analysis
 
Industrial analysis on Civil Aviation sector
Industrial analysis on Civil Aviation sectorIndustrial analysis on Civil Aviation sector
Industrial analysis on Civil Aviation sector
 
Industrial Analysis on power Industry
Industrial Analysis on power IndustryIndustrial Analysis on power Industry
Industrial Analysis on power Industry
 
Analysis of Industrial Wind Turbines_Final Presentation
Analysis of Industrial Wind Turbines_Final PresentationAnalysis of Industrial Wind Turbines_Final Presentation
Analysis of Industrial Wind Turbines_Final Presentation
 
industrial analysis
industrial analysisindustrial analysis
industrial analysis
 
Airline Revenue Management Conference Presentations
Airline Revenue Management Conference PresentationsAirline Revenue Management Conference Presentations
Airline Revenue Management Conference Presentations
 
Peste Analysis Airline
Peste Analysis AirlinePeste Analysis Airline
Peste Analysis Airline
 
factors including economic and industrial analysis
factors including economic and industrial analysisfactors including economic and industrial analysis
factors including economic and industrial analysis
 
Toyota motors company-Industrial Analysis
Toyota motors company-Industrial AnalysisToyota motors company-Industrial Analysis
Toyota motors company-Industrial Analysis
 
Web data from R
Web data from RWeb data from R
Web data from R
 
Emirates consumer behaviour anaysis
Emirates consumer  behaviour anaysisEmirates consumer  behaviour anaysis
Emirates consumer behaviour anaysis
 
Industrial Analysis Presentation Oil & Gas
Industrial Analysis Presentation Oil & GasIndustrial Analysis Presentation Oil & Gas
Industrial Analysis Presentation Oil & Gas
 
Malaysian airlines power point
Malaysian airlines power pointMalaysian airlines power point
Malaysian airlines power point
 

Similar to Sentiment Analysis of Airline Tweets

Course Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine LearningCourse Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine LearningJohn Edward Slough II
 
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Edureka!
 
Findings, Conclusions, & RecommendationsReport Writing
Findings, Conclusions, & RecommendationsReport WritingFindings, Conclusions, & RecommendationsReport Writing
Findings, Conclusions, & RecommendationsReport WritingShainaBoling829
 
Textual & Sentiment Analysis of Movie Reviews
Textual & Sentiment Analysis of Movie ReviewsTextual & Sentiment Analysis of Movie Reviews
Textual & Sentiment Analysis of Movie ReviewsYousef Fadila
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Sagar Deogirkar
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9Roger Barga
 
Peterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_ProjectPeterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_Projectjpeterson2058
 
How Can Software Engineering Support AI
How Can Software Engineering Support AIHow Can Software Engineering Support AI
How Can Software Engineering Support AIWalid Maalej
 
Younger: Predicting Age with Deep Learning
Younger: Predicting Age with Deep LearningYounger: Predicting Age with Deep Learning
Younger: Predicting Age with Deep LearningPrem Ananda
 
Simple rules for building robust machine learning models
Simple rules for building robust machine learning modelsSimple rules for building robust machine learning models
Simple rules for building robust machine learning modelsKyriakos Chatzidimitriou
 
Classification decision tree
Classification  decision treeClassification  decision tree
Classification decision treeyazad dumasia
 
Machine Learning and Cognitive Fingerprinting - SparkCognition
Machine Learning and Cognitive Fingerprinting - SparkCognitionMachine Learning and Cognitive Fingerprinting - SparkCognition
Machine Learning and Cognitive Fingerprinting - SparkCognitionSparkCognition
 
Effort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsEffort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsDEEPRAJ PATHAK
 
Top 25 location R&D Hardware
Top 25 location R&D HardwareTop 25 location R&D Hardware
Top 25 location R&D HardwareCEB TalentNeuron
 
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...Matt Stubbs
 

Similar to Sentiment Analysis of Airline Tweets (20)

Course Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine LearningCourse Project for Coursera Practical Machine Learning
Course Project for Coursera Practical Machine Learning
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
 
Cist2014 slides
Cist2014 slidesCist2014 slides
Cist2014 slides
 
CAR EVALUATION DATABASE
CAR EVALUATION DATABASECAR EVALUATION DATABASE
CAR EVALUATION DATABASE
 
Findings, Conclusions, & RecommendationsReport Writing
Findings, Conclusions, & RecommendationsReport WritingFindings, Conclusions, & RecommendationsReport Writing
Findings, Conclusions, & RecommendationsReport Writing
 
Textual & Sentiment Analysis of Movie Reviews
Textual & Sentiment Analysis of Movie ReviewsTextual & Sentiment Analysis of Movie Reviews
Textual & Sentiment Analysis of Movie Reviews
 
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
Comparative Study of Machine Learning Algorithms for Sentiment Analysis with ...
 
Barga Data Science lecture 9
Barga Data Science lecture 9Barga Data Science lecture 9
Barga Data Science lecture 9
 
Peterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_ProjectPeterson_-_Machine_Learning_Project
Peterson_-_Machine_Learning_Project
 
How Can Software Engineering Support AI
How Can Software Engineering Support AIHow Can Software Engineering Support AI
How Can Software Engineering Support AI
 
Younger: Predicting Age with Deep Learning
Younger: Predicting Age with Deep LearningYounger: Predicting Age with Deep Learning
Younger: Predicting Age with Deep Learning
 
Simple rules for building robust machine learning models
Simple rules for building robust machine learning modelsSimple rules for building robust machine learning models
Simple rules for building robust machine learning models
 
Classification decision tree
Classification  decision treeClassification  decision tree
Classification decision tree
 
ASA.pptx
ASA.pptxASA.pptx
ASA.pptx
 
Machine Learning and Cognitive Fingerprinting - SparkCognition
Machine Learning and Cognitive Fingerprinting - SparkCognitionMachine Learning and Cognitive Fingerprinting - SparkCognition
Machine Learning and Cognitive Fingerprinting - SparkCognition
 
Effort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software ProjectsEffort Estimation Techniques used in Software Projects
Effort Estimation Techniques used in Software Projects
 
Top 25 location R&D Hardware
Top 25 location R&D HardwareTop 25 location R&D Hardware
Top 25 location R&D Hardware
 
Uncertainties in Deep Learning
Uncertainties in Deep LearningUncertainties in Deep Learning
Uncertainties in Deep Learning
 
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...
Big Data LDN 2017: Cognitive Search & Analytics – Bringing the Power of AI to...
 

Recently uploaded

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxolyaivanovalion
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 

Recently uploaded (20)

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

Sentiment Analysis of Airline Tweets

  • 1. General Assembly Data Science Airline Tweets Sentiment Analysis
  • 2. Natural language data pre-processing steps 02 PRE-PROCESSING What were the data collection process. When, where, and how? 01 DATA SETS AGENDA Sentiment Analysis
  • 3. Using machine learning algorithms, we attempt to classify positive and negative sentiments 04 MACHINE LEARNING BINARY CLASSIFICATIONS Using machine learning to predict positive, neutral and negative sentiments and determine the best model for data with unknown sentiments 05 MACHINE LEARNING MULTI-CLASS CLASSIFICATIONS Without using machine learning, can we use some basic rules to classify sentiments? 03 RULE-BASED CLASSIFICATIONS
  • 4. 07 CLOSING THOUGHTS FEEDBACK | Q&A The summary and statistics of the collected airline tweets 06 REASULTS
  • 7. Reformatted/cleaned tweets with graded sentiment of Major Airlines from Feb 2015 14,640 Tweets KAGGLE Commercial datasets provided by Newsroom with machine graded tweets 4,000 Tweets Newsroom Using Python and twython to retrieve tweets through Twitter’s API during 7 days period. 80,121 Tweets TWITTER API k SOURCES Sentiment Analysis
  • 9. DATA PRE- PROCESSING Natural language processes to prepare and transform the tweet content for various classification models and sentiment analysis
  • 10. Major processing steps NATURAL LANGUGE PROCESSING Sentiment Analysis For machine learning Vector  Matrix… get it? 4 Vectorizer Perform regular expression 2 RegEx Tokenize all tweet contents 1 Tokenization Remove all the English stop words 3 Stop Words
  • 11. Replacing all &amp symbols with the actual words ‘and’ with a space re.sub(r'&', r' and', tweet) Replace ‘&amp’ Signs Remove all annoying white spaces re.sub('[s]+', ' ', tweet) Remove Additional Spaces Convert all #hashtagwords to just ‘hashtagwords’ re.sub(r'#([^s]+)', r'1', tweet) Replace #hastags Turn all https://... and www… format to display just the word ‘URL’ re.sub('((www.[^s]+)|(https?://[^s]+))','URL',tweet) Eliminate URLs Convert all @username to display just ‘username’ without the ‘@’ sign re.sub(r'@([^s]+)', r'1', tweet) Replace @username Turn all letters to lower cases. One option is to retain original upper cases Lower Cases REGULAR EXPRESSION Sentiment Analysis tweet.lower() &
  • 12. RULE BASED CLASSIFICATION First we will use two simple rule-based techniques to conduct our sentiment analysis to see what happens
  • 13. • (pos) x (pos) = (pos) This movie is very good • (pos) x (neg) = (neg) This movie is not good at all • (neg) x (neg) = (pos) This movie was not bad Require a Pre-defined List of Positive & Negative Words ‘COMMON SENSE’ CLASSIFICATION RULES
  • 14. Accuracy NEWSROOM (4,000) KAGGLE (14,600) MANUAL LABEL (2,400) 49.50 % 52.45 % 47.53 %
  • 15. VADER is a lexicon and rule- based sentiment analysis tool that is especially attuned to sentiments expressed in social media. Valence Aware Dictionary and sEntiment Reasoner V.A.D.E.R. SENTIMENT SCORES
  • 16. Examples {‘neg’: 0.0, ‘neu’: 0.293, ‘pos’: 0.707, ‘compound’: 0.8646} “VADER is very smart, handsome, and funny!” {‘neg’: 0.0, ‘neu’: 0.254, ‘pos’: 0.746, ‘compound’: 0.8316} “VADER is smart, handsome and funny.” {‘neg’: 0.0, ‘neu’: 0.233, ‘pos’: 0.767, ‘compound’: 0.9342} “VADER is VERY SMART, handsome, and FUNNY!!!” {‘neg’: 0.779, ‘neu’: 0.221, ‘pos’: 0.0, ‘compound’: -0.5461} “Today SUX!” {‘neg’: 0.0, ‘neu’: 0.637, ‘pos’: 0.363, ‘compound’: 0.431} “At least it isn’t a horrible book.” {‘neg’: 0.154, ‘neu’: 0.42, ‘pos’: 0.426, ‘compound’: 0.5974} “Today kinda sux! But I’ll get by, lol :)”
  • 17. NEWSROOM (4,000) KAGGLE (14,600) MANUAL LABEL (2,400) 54.75 % 38.00 % 54.02 % Accuracy
  • 18. MACHINE LEARNING Part I – Binary Classifications
  • 19. Random Forest Perform Random Forest Model2 Hyperplane Linear SVM Perform Support Vector Classifier3 Data Prep Remove All Neutral Sentiment1 MACHINE LEARNING I Sentiment Analysis
  • 20. Training Accuracy: 99.88% Testing Accuracy: 86.49% TRAINING: 11,541 TESTING: 1,843 Training Accuracy: 99.91% Testing Accuracy: 84.48% TRAINING: 1,124 TESTING: 290 Training Accuracy: 99.87% Testing Accuracy: 89.58% TRAINING: 9,266 TESTING: 2,275 RANDOM FOREST Sentiment Analysis k k k
  • 21. Training Accuracy: 99.12% Testing Accuracy: 86.38% Training Accuracy: 99.56% Testing Accuracy: 84.14% Training Accuracy: 99.18% Testing Accuracy: 90.95% SVC Sentiment Analysis TRAINING: 11,541 TESTING: 1,843 k TRAINING: 9,266 TESTING: 2,275 kk TRAINING: 1,124 TESTING: 290
  • 22. MACHINE LEARING Part II – Multi-Class Classifications
  • 23. Apply Model on Testing model.fit ( ) AdaBoost MACHINE LEARNING II Sentiment Analysis Grid Search Cross-Validation GridSearchCV ( ) 5-Fold Cross Validation StratifiedKFold ( ) TRAINING: 2,040 TESTING: 360 TRAINING: 14,640 TESTING: 2,400 k SVC
  • 24. ADABOOST MODEL PERFORMANCE While the test accuracy is slightly lower, using Kaggle data as training set proved to be a stronger predictor based on the cross- validation accuracy results. In addition, it is superior in positive and neutral precision, not to mention much higher training volume. 96.89% 1.43% 33.85% 66.94% 64.86% 65.25% 50.00% 60.34% 60.88% 67.80% 0% 20% 40% 60% 80% 100% 120% [ -1] Precision [ne] Precision [+1] Precision Test Accuracy CV Accuracy Kaggle->Tweet Tweet->Tweet
  • 25. SVC MODEL PERFORMANCE Again, while we see the precision of the twitter data is superior, the Kaggle data yields a more balanced results for all three sentiments. Because the overall test accuracy is about the same, we chose to use Kaggle data as training due to its superior CV accuracy as well as its large sum of data points (~8X). 93.33% 14.29% 47.69% 69.72% 66.39% 79.34% 49.10% 60.34% 69.08% 71.91% 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% [ -1] Precision [ne] Precision [+1] Precision Test Accuracy CV Accuracy Kaggle->Tweet Tweet->Tweet
  • 26. CONCLUSION Support Vector Classifier Model Parameters: decision_fun=‘ovo’, kernel=‘linear’, cost=0.4, gamma=0 Use Entire Kaggle Dataset as Training Set (14,640 data points) Apply Model to Tweets of Unknown Sentiment
  • 27. STATISTICS Now we have the model, let’s take a look at the results of collected tweets!
  • 28. 5,705 5,478 5,728 10,516 12,567 9,647 11,718 18,762 0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 20,000 United Southwest Delta American Original Tweets Re-Tweets NUMBER OF TWEETS Sentiment Analysis
  • 29. OVERALL SENTIMENT Sentiment Analysis SENTIMENT 59.5 %26.0% 14.5% 52,694 Original Tweets Positive Neutral Negative
  • 30. 0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 20,000 American Delta Southwest United Negative Netual Positive Percentage of Original TweetsNumber of Original Tweets SENTIMENT PER AIRLINE Sentiment Analysis 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% American Delta Southwest United Negative Netual Positive
  • 31. - Positive Sentiment “Cenk Uygur” Negative Sentiment AMERICAN AIRLINE Sentiment Analysis
  • 32. - Positive Sentiment “Delta Assist” Negative Sentiment DELTA AIRLINE Sentiment Analysis
  • 33. “Dalton Rapattoni” Positive Sentiment “Dalton Rapattoni” Negative Sentiment SOUTHWEST AIRLINE Sentiment Analysis
  • 34. “Happy Birthday”, “90th" Positive Sentiment “Muslim”, “Shame” Negative Sentiment UNITED AIRLINE Sentiment Analysis
  • 35. RETWEET PER SENTIMENT Sentiment Analysis NEUTRAL POSITIVE 26.0% NEGATIVE 31.7% 14.5% 19.1% 59.5% 49.2% RetweetsOriginal Tweets If the sentiment is classify accurately, people are more likely to retweet a neutral or positive tweets than a negative tweet.
  • 36. RT @JackAndJackReal: Thanks @Delta for adding our movie to all your flights! If any of you are traveling and bored make sure u peep it 3,535 Retweets… RT @ggreenwald: Arab-American family of 5 going on vacation - 3 young kids in tow - removed from @United plane for "safety reasons" https:/… 352 Retweets… RT @UMG: The perfect trio @theweeknd @ddlovato and Lucian Grainge #MusicIsUniversal #UMGShowcase w/ @AmericanAir & @Citi https://t.co/J 3,795 Retweets… RT @DaltonRapattoni: Been home in Dallas for hours now but still in the airport. @SouthwestAir PLEASE find my bag. Please RT. #SWAFindDalto 1,952 Retweets… RT @Mets: We’re partnering with @Delta to send a lucky fan to games 1 & 2! RT for your chance to win #DeltaMetsSweeps https://t.co/CDELSe84 17,093 Retweets… RT @antoniodelotero: this skank bitch was SO RUDE to me on my flight!!!! @united I asked for her name and she goes, "it's Britney bitch.” 3,329 Retweets… RT @JensenAckles: @AmericanAir I know it's crazy #DFWsnow I'm just excited 2see my baby girl! Man freaking out next 2 me isn't helping 10,260 Retweets… RT @SouthwestAir: Yo #Kimye, we're really happy for you, we'll let you finish, but South is one of the best names of all time. 3,362 Retweets… MOST RETWEETED TWEETS Sentiment Analysis
  • 37. TWEET SOURCES The majority of the tweets are from iPhone. There is also almost 20% of tweets from Android devices. The rest from the Web client and other applications. There is little difference between the source of negative and positive tweets; but there seems to be more alternative sources for the neutral sentiment. 0% 10% 20% 30% 40% 50% 60% iPhone Android Web Others Negative Neutral Positive
  • 38. Photo Link Video LinkPhoto 16,787 SENTIMENT PER MEDIA TYPE Sentiment Analysis 1,299 540 SENTIMENT PositiveNeutralNegative
  • 40. https://github.com/michaelucsb/Python_Twitter-Sentiment_GA Each data point being validated by multiple people MULTIPLE VALIDATION Naïve Bayes Lemmatization OTHER MODELS Current coordinates data isn’t great for Geo-location analysis COORDINATES Unsupervised model to discover words association CLUSTERING THOUGHTS & FUTURE STEPS Sentiment Analysis
  • 42. Project Description Prior to machine learning, the project conducted two rule-based senti- ment classifications. One of which simply count the number of positive and negative words, and the other utilizes VADER lexicon, a pre-built sentiment analysis tool for Python. Both rule-based sentiment classifications achieved an accuracy of 47% to 55%. Next, the project used machine learning algorithms – Random Forest, and Support Vector Classifier – to conduct binary classification on positive and negative sentiments by removing all known neutral tweets. 85% to 90% accuracy were achieved on the test dataset. Finally, the project used Ada Boost and Support Vector Classifier to conduct multi-class classifications by including back all neutral tweets. Through grid-search and cross-validation, the project achieved a 70% overall accuracy on prediction and an 80% precision on negative sentiment prediction on the test dataset. Using the best model and parameters found, the model is applied to tweets with unknown sentiments. The statistics summary were included in the later half of the presentation. The presented word clouds also captured interesting stories during the time the tweets were collected. The project aims to use rule-based classification and machine learning algorithms to predict content sentiments of Twitter feeds that mentioned four major airlines: American Airline, United, Delta, and Southwest. The project collected 80,121 tweets during a 7-day period via Twitter’s API using Twython, a Python package; among these tweets, 2,400 randomly selected tweets were given a sentiment rating manually (test). Through Kaggle, the project also obtained a cleaned dataset contains 14,640 airline tweets with already rated sentiments (training). To start off the project, the tweets were processed using tokenization, regular expression, stop words removals, and vectorization. In regular expression, the project eliminated all URLs, “@” sign, “#” sign, and replaced “&” sign with the word “and”.