Snapshot of Winning Submissions: Jigsaw Academy ValueLabs Sentiment Analysis, Feb 2015
1. Jigsaw Academy and ValueLabs
Sentiment Analysis Competition
A Snapshot of the Winning Submissions
2. Business Objective
The company wants to see whether the sentiment score
derived from the comments section of the feedback form can be used to
predict the recommended score.
● To derive a sentiment score from the comments given by clients.
● To build a model that predicts the Recommended score (RECOM) from the derived
sentiment score.
4. Methodology
● Mine the "Comments" section and arrive at a sentiment score
● Build an association matrix to capture the correlation between words of interest
● Use clustering dendrograms to interpret the sentiment of the various QPs
based on word clusters
● Derive a linear regression model from the training dataset that
captures the dependency of RECOM on the other QPs
6. Results
Recommend(RECOM)
• Positive rating from the original dataset
• Positive cumulative weighted sentiment,
when SA was performed on the
“Comments”
• Positive sentiment using LRM
On Time Delivery(OTD)
• Low rating from the original dataset.
• Negative sentiment = -0.92, when SA
was performed on the “Comments”
• Word clusters from Dendrogram
• Need quality, delivery
Quality(QUAL)
• Low rating from the original dataset.
• Negative sentiment = -0.57, when SA was
performed on the “Comments”
• Word clusters from Dendrogram
• Need quality, delivery
• Improve process and testing
• Require detail understanding
Commitment(COMMIT)
• High rating from the original dataset.
• Positive sentiment = 3.85, when SA was
performed on the “Comments”
• Word clusters from Dendrogram
• Provide consistency, meet expectation
• Always commit really
• Happy service
7. Results
• Year 2011 has been the worst
Low rating from the original dataset.
Low sentiment, when SA was performed on the “Comments” section from 2011
Need to investigate what went wrong
• Year 2013 has been the best
High rating from the original dataset.
High sentiment, when SA was performed on the “Comments” section from 2013
Need to investigate what went right
• These countries have given low ratings and need investigation:
MAL, ME, ROW
UK, USA sometimes
• These countries have generally given high ratings (3/4):
IND, AUS, UK/Europe, USA
• The following LRM has been deduced
recom ~ (0.15*otd) + (0.15*flex) + (0.19*qual) + (0.11*proce) + (0.12*respons) + (0.11*partn) + (0.14*escal) + (0.12*minds)
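The LRM above is a plain weighted sum, so it can be evaluated directly. The following is a minimal sketch using the coefficients from the slide; the function name, the dictionary layout, and the example respondent are assumptions made for illustration, and no intercept is included because none is reported.

```python
# Coefficients copied from the slide's LRM (no intercept is reported there).
LRM_WEIGHTS = {
    "otd": 0.15, "flex": 0.15, "qual": 0.19, "proce": 0.11,
    "respons": 0.12, "partn": 0.11, "escal": 0.14, "minds": 0.12,
}

def predict_recom(ratings):
    """Return the weighted sum of the eight ratings used by the LRM."""
    return sum(weight * ratings[name] for name, weight in LRM_WEIGHTS.items())

# Hypothetical respondent who gives every parameter a rating of 3.
example = {name: 3 for name in LRM_WEIGHTS}
print(round(predict_recom(example), 2))  # 3.27
```

The coefficients sum to 1.09, so a respondent who gives the same rating r to every parameter gets a predicted RECOM of 1.09 * r.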
9. Methodology
● Overall Sentiment: Positive, with a sentiment score of approximately 2
● Sentiment Analysis of Comments
Extracted all positive and negative words
Identified positive and negative words in each comment
Used a dictionary to assign a sentiment score to each word
Identified negations and adjusted word sentiment
(the polarity of a negated word's sentiment was reversed)
Identified suggestions and recommendations
Calculated a sentiment score for each comment
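The dictionary lookup and negation steps listed for this submission can be sketched as follows. This is an illustrative assumption, not the submission's code: the slides do not name the dictionary or the negation rule, so the tiny lexicon (built from the frequent words reported in the word cloud slide), the negation list, and the "flip only the next word" rule are all hypothetical.

```python
import re

# Hypothetical lexicon; the real dictionary used is not named in the slides.
LEXICON = {"excellent": 1, "good": 1, "like": 1, "support": 1,
           "issue": -1, "issues": -1}
# Hypothetical negation cues.
NEGATIONS = {"not", "no", "never"}

def comment_score(comment):
    """Sum dictionary scores, reversing polarity of a word that follows a negation."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(token, 0)
        if negate:
            value = -value
        score += value
        negate = False  # a negation only affects the immediately following word
    return score

print(comment_score("Good support, no issues"))  # 3
print(comment_score("Not good"))                 # -1
```

A rule this simple misses negations separated from the sentiment word (for example "not very good"), so a real implementation would likely need a wider negation window.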
10. Model
Model Used: Sentiment Score Density as only Predictor Variable
RECOM = 9.848 + 3.848*(Score Density)
Here, the normalized score (Score Density) is used for the modeling
80% of the data points were used for training the model and 20% for testing
Results:
60% of cases were classified correctly
However, if only polarity is considered, 86% of cases were classified correctly
Of the incorrectly classified cases, 4 have a negative sentiment score. This
suggests that the model is poor at classifying cases with negative sentiment
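As a rough sketch of this setup, the snippet below applies the reported equation and performs an 80/20 split. Only the two coefficients come from the slide; the function names, the shuffling, and the seed are assumptions, and how Score Density itself is computed is not specified in the slides, so it is taken as a given input.

```python
import random

# Coefficients as reported on the slide.
INTERCEPT = 9.848
SLOPE = 3.848

def predict_recom_from_density(score_density):
    """Apply the reported one-predictor model to a Score Density value."""
    return INTERCEPT + SLOPE * score_density

def split_80_20(records, seed=0):
    """Shuffle a copy of the records and cut it 80/20 (the seed is arbitrary)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

train, test = split_80_20(range(10))
print(len(train), len(test))  # 8 2
```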
11. Word Cloud Analysis
Most Frequent Positive Polarity words:
Excellent, Good, Like & Support
Most Frequent Negative Polarity words:
Issues, Issue
13. Conclusion
● Comparing the sentiment score directly with the recommended score: although
the normalized score matches the recommended score exactly in only 35% of cases,
88% of cases were classified correctly in terms of polarity.
● Therefore, the sentiment score itself gives a better result than the regression
model for predicting the recommended score.
15. Methodology
Jeffrey Breen's sentiment
algorithm was implemented.
It estimates the sentiment by
assigning an integer score:
the number of occurrences of
negative words subtracted
from the number of
occurrences of positive words.
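The scoring rule described above (positive word count minus negative word count) can be sketched in a few lines. The tokenization and the word lists below are placeholders chosen for illustration; the lists actually used by this submission are not given in the slides.

```python
import re

def breen_score(comment, positive_words, negative_words):
    """Count positive and negative word occurrences and return their difference."""
    words = re.findall(r"[a-z']+", comment.lower())
    positives = sum(word in positive_words for word in words)
    negatives = sum(word in negative_words for word in words)
    return positives - negatives

# Placeholder word lists (assumptions, not the lists used in the submission).
POSITIVE = {"excellent", "good", "happy"}
NEGATIVE = {"issue", "issues", "delay"}

print(breen_score("Good team, excellent service, one issue", POSITIVE, NEGATIVE))  # 1
```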
16. Analysis
• The sentiment scores for all comments from 2009 to 2013 were plotted as a
histogram using ggplot.
• The plot is right-tailed, from which we can infer that most comments are positive
and very few are negative.
17. Analysis
Plot of year-wise score distribution: Based on the plot we
can infer that there are more positive comments for the
company and minimal negative comments.
Sentiment Score: The total positive count, negative count and
Sentiment Score percentage for each year from 2009 to 2013
have been tabulated below.
Ratings Score: The average rating for each record in the dataset is
calculated using the formula:
Score = (OTD + FLEX + QUAL + COMMIT + PROCE + TRU.REL +
RESPONS + PARTN + ESCAL + MINDS + OVERAL + RECOM) / 12
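The ratings score is a plain mean over the twelve rating columns. A minimal sketch, assuming each record is a dictionary keyed by the column names in the formula (the record layout and function name are assumptions):

```python
# Column names as they appear in the formula above.
RATING_COLUMNS = [
    "OTD", "FLEX", "QUAL", "COMMIT", "PROCE", "TRU.REL",
    "RESPONS", "PARTN", "ESCAL", "MINDS", "OVERAL", "RECOM",
]

def ratings_score(record):
    """Average the twelve ratings of a single feedback record."""
    return sum(record[column] for column in RATING_COLUMNS) / len(RATING_COLUMNS)

# Hypothetical record with a rating of 3 in every column.
print(ratings_score({column: 3 for column in RATING_COLUMNS}))  # 3.0
```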
18. Results
• From the table we can infer that the sentiment score
obtained using the Jeffrey Breen algorithm
almost coincides with the ratings score.
• Therefore we can conclude that the sentiment score
derived from the comments is useful in predicting the
recommended score.
• The straight line represents a simple linear regression between the
ratings score and the sentiment score.
• According to the plot, we can infer that the sentiment scores are
pretty close to the ratings scores.
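A simple linear regression like the one drawn in the plot can be fitted with ordinary least squares. The sketch below is generic: the slides do not say which score is on which axis, so the choice of x and y is left to the caller, and the sample values are made up purely to exercise the function.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points lying exactly on y = 2x + 1, not data from the competition.
slope, intercept = fit_line([0, 1, 2], [1, 3, 5])
print(slope, intercept)  # 2.0 1.0
```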