3. Sentiment Definition
• Sentiment means “a general thought, view, feeling, emotion,
opinion, or sense.” Sentiment analysis is “the use of natural
language processing, text analysis, and computational
linguistics to identify and extract subjective information in
source materials.”
• To perform sentiment analysis on some event, we need
to teach computers what a sentiment is (i.e., how to define
“positive” or “negative” and “good” or “bad”). This is where
machine learning comes in.
4. Sentiment Data
• Data for this project is fetched from Twitter
Downloading Data From Social API:
• Register your app with Twitter Apps
• Get the access token, access token secret,
consumer key, and consumer secret
• Save these credentials so your code can load
them when fetching data
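
The credential step above can be sketched in base R as follows. This is only a minimal sketch: the function name and the four environment variable names are hypothetical, and the slides do not say how the credentials are actually stored.

```r
# Hypothetical helper: reads the four Twitter credentials from
# environment variables (the variable names are assumptions).
load_twitter_credentials <- function() {
  keys <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  # Look up each variable; NA marks a variable that is not set.
  values <- vapply(
    keys,
    function(k) Sys.getenv(k, unset = NA_character_),
    character(1)
  )
  missing <- is.na(values) | !nzchar(values)
  if (any(missing)) {
    stop("Missing credentials: ", paste(keys[missing], collapse = ", "))
  }
  as.list(values)
}

# Assumption: if the twitteR package is the client in use, the values
# could then be passed to twitteR::setup_twitter_oauth() before
# fetching tweets with twitteR::searchTwitter().
```

Keeping the keys out of the script itself means the code can be shared without exposing them.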
5. Cleaning the data
• Remove HTML links, retweet entities, hashtags,
@-mentions, punctuation, numbers, extra white
space, and slang words
• Convert the collected tweets to lower case
• Remove NA values and duplicate tweets
• I have written a function that performs all of these
steps, in “cleanuptweets.R”
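
The cleaning steps above might look roughly like this in base R. This is a sketch, not the contents of cleanuptweets.R: the function name is hypothetical, the regular expressions are my own assumptions, and slang removal is left out because it needs a slang word list.

```r
# Hypothetical helper; not the author's cleanuptweets.R.
clean_tweets_sketch <- function(tweets) {
  x <- tweets[!is.na(tweets)]                      # drop NA entries
  x <- gsub("https?://\\S+", " ", x, perl = TRUE)  # remove links
  x <- gsub("\\bRT\\b", " ", x, perl = TRUE)       # remove the retweet marker
  x <- gsub("[@#]\\w+", " ", x, perl = TRUE)       # remove @-mentions and hashtags
  x <- gsub("[[:punct:]]", " ", x)                 # remove punctuation
  x <- gsub("[[:digit:]]", " ", x)                 # remove numbers
  x <- tolower(x)                                  # lower case
  x <- trimws(gsub("\\s+", " ", x))                # collapse extra white space
  x <- x[nzchar(x)]                                # drop tweets left empty
  unique(x)                                        # drop repeated tweets
}

raw <- c("RT @bob: I LOVE this!!! http://t.co/abc #happy 123",
         NA, "I love this", "Good   day")
cleaned <- clean_tweets_sketch(raw)
```

Here the hashtag step drops the whole tag; keeping the word and removing only the "#" would be an equally reasonable reading of the list above.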
6. Scoring the Sentiment
• To score the tweets, I tokenize each sentence into
words and check those words against lists of
positive and negative words.
• The positive and negative word lists of an opinion
lexicon are downloaded and compared against the tweets
• My score.sentiment() function computes the raw sentiment
with a simple matching algorithm: it does a boolean
match of each word against the lists and sums the matches
into total positive and negative scores
• It returns a data frame with each sentence and its
score
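
The matching idea above can be sketched in a few lines of base R. This is an illustration under assumptions, not the actual score.sentiment(): the function name and the tiny word lists are hypothetical, and taking the score as positive matches minus negative matches is my assumption about how the two totals are combined.

```r
# Hypothetical helper; the real score.sentiment() may differ.
score_sentiment_sketch <- function(sentences, pos_words, neg_words) {
  scores <- vapply(sentences, function(s) {
    # Tokenize: lower-case, then split on anything that is not a letter.
    words <- unlist(strsplit(tolower(s), "[^a-z']+"))
    words <- words[nzchar(words)]
    # Boolean match against each list, then sum the matches.
    sum(words %in% pos_words) - sum(words %in% neg_words)
  }, integer(1), USE.NAMES = FALSE)
  data.frame(text = sentences, score = scores, stringsAsFactors = FALSE)
}

# Toy word lists standing in for the downloaded opinion lexicon.
pos <- c("love", "great", "good")
neg <- c("bad", "terrible", "hate")
result <- score_sentiment_sketch(
  c("I love this great phone", "Terrible battery, bad screen", "It is a phone"),
  pos, neg
)
```

A score above zero reads as positive, below zero as negative, and zero as no lexicon evidence either way.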
7. Naive Bayes – Emotion
• Naive Bayes is a probabilistic model built on a
naive application of Bayes' theorem: it assumes
the words of a text are independent given the class
• Naive Bayes is a linear classifier
• For text, we use a probabilistic method such
as the Naive Bayes classifier (NBC).
• The classify_emotion() function in the sentiment
package is used (anger, disgust, fear, joy,
sadness, surprise)
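
To make the "naive" and "linear" points concrete, here is the standard Naive Bayes decision rule written out. The notation is my own, not from the sentiment package: c is an emotion class, w_1, ..., w_n are the words of a tweet, V is the vocabulary, and x_v is the number of times word v occurs in the tweet.

```latex
\begin{align*}
\hat{c} &= \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c) \\
        &= \arg\max_{c} \; \Big[ \log P(c) + \sum_{i=1}^{n} \log P(w_i \mid c) \Big] \\
        &= \arg\max_{c} \; \Big[ \log P(c) + \sum_{v \in V} x_v \log P(v \mid c) \Big]
\end{align*}
```

The product in the first line is the independence assumption. The last line is a weighted sum of the word counts x_v, which is why the classifier is linear in log space.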
8. Naïve Bayes – Polarity
• The classify_polarity() function built into the
sentiment package is used to classify the
polarity of the words in the tweets
• Each tweet is labeled positive, negative, or neutral
• The idea is to compute the log likelihood of a tweet
under each of the two classes (positive and negative).
Once these likelihoods are calculated, a ratio of the
pos-likelihood to neg-likelihood is calculated, and,
based on this ratio, the tweets are classified as
belonging to a particular class. It's important to note
that if this ratio turns out to be 1, then the overall
sentiment of the tweet is assumed to be "neutral".
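
The ratio test described above can be sketched as follows. This is a toy illustration, not the code inside classify_polarity(): the function name, the per-word likelihood tables, and the small likelihood given to unseen words are all assumptions.

```r
# Hypothetical helper illustrating the likelihood-ratio idea.
# pos_lik / neg_lik: named numeric vectors of P(word | class).
classify_polarity_sketch <- function(tweet, pos_lik, neg_lik, unseen = 1e-3) {
  words <- unlist(strsplit(tolower(tweet), "[^a-z']+"))
  words <- words[nzchar(words)]
  log_likelihood <- function(table) {
    p <- table[words]        # NA for words missing from the table
    p[is.na(p)] <- unseen    # assumed likelihood for unseen words
    sum(log(p))
  }
  log_pos <- log_likelihood(pos_lik)
  log_neg <- log_likelihood(neg_lik)
  ratio <- exp(log_pos - log_neg)   # pos-likelihood / neg-likelihood
  label <- if (isTRUE(all.equal(ratio, 1))) {
    "neutral"
  } else if (ratio > 1) {
    "positive"
  } else {
    "negative"
  }
  list(log_pos = log_pos, log_neg = log_neg, ratio = ratio, label = label)
}

# Toy likelihood tables (made-up numbers for illustration only).
pos_lik <- c(love = 0.05, great = 0.04, bad = 0.002)
neg_lik <- c(love = 0.003, great = 0.004, bad = 0.06, terrible = 0.05)
a <- classify_polarity_sketch("I love this great phone", pos_lik, neg_lik)
b <- classify_polarity_sketch("Terrible and bad", pos_lik, neg_lik)
n <- classify_polarity_sketch("It is a phone", pos_lik, neg_lik)
```

Words missing from both tables get the same likelihood in each class, so they cancel in the ratio, and a tweet with no known words comes out with a ratio of 1.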
10. Word cloud
• The most frequently used words associated with
each tweet emotion are shown in the form of a
word cloud
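
A word cloud is drawn from word frequencies, so the input can be prepared along these lines. The function name and the toy tweets are hypothetical, and the plotting call in the final comment assumes the wordcloud package, which the slides do not name.

```r
# Hypothetical helper: word frequencies per emotion label.
word_freq_by_emotion <- function(tweets, emotions) {
  lapply(split(tweets, emotions), function(txt) {
    words <- unlist(strsplit(tolower(txt), "[^a-z']+"))
    words <- words[nzchar(words)]
    sort(table(words), decreasing = TRUE)   # most frequent first
  })
}

freq <- word_freq_by_emotion(
  c("happy happy day", "sad day", "so happy"),
  c("joy", "sadness", "joy")
)

# Assumption: with the wordcloud package installed, one emotion could
# be drawn with
#   wordcloud::wordcloud(names(freq$joy), as.integer(freq$joy), min.freq = 1)
```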
12. Deployment
• I used an R Shiny app for deployment
• Shiny provides better, interactive visualization
• ui.R and server.R are the two files needed.
• Users who are not familiar with R need just three
lines of code to run the program
• The code can be found at my GitHub link