This presentation compares four tools for analysing the sentiment of free-text survey responses about a healthcare information website. It was completed by Despo Georgiou as part of her internship at UXLabs (http://uxlabs.co.uk)
5. Sentiment Analysis – Examples
Surveys: analyse open-ended questions
Business & Governments: assist in the
decision-making process & monitor
negative communication
Consumer feedback: analyse reviews
Health: analyse biomedical text
6. Aims & Objectives
Can existing Sentiment Analysis tools
respond to the needs of any healthcare-
related matter?
Is it possible to accurately replicate human
language using machines?
7. The case study details
8 survey questions (open & closed-ended)
Analysed 137 responses based on the
question: “What is your feedback?”
Commercial tools: Semantria & TheySay
Non-commercial tools: Google
Prediction API & WEKA
15. Google Prediction API
1) Pre-process the data:
punctuation & capital removal,
account for negation
2) Separate into training and testing sets
3) Insert pre-labelled data
4) Train model
5) Test model
6) Cross validation: 4-fold
7) Compare with baseline
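Steps 1, 2 and 6 above can be sketched in Python. The NOT_ prefix on the token following a negator is a common heuristic assumed here (the slides only state that negation was accounted for), and the fold layout is illustrative rather than the exact split used in the study:

```python
import re

def preprocess(text):
    """Step 1: lower-case, strip punctuation, and mark simple negation.

    Prefixing the token after a negator with NOT_ is an assumed
    heuristic, not necessarily the study's exact scheme.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    out, negate = [], False
    for tok in tokens:
        if tok in ("not", "no", "never") or tok.endswith("n't"):
            out.append(tok)
            negate = True
        elif negate:
            out.append("NOT_" + tok)
            negate = False
        else:
            out.append(tok)
    return out

def four_fold_splits(items):
    """Steps 2 & 6: yield (train, test) pairs for 4-fold cross-validation."""
    folds = [items[i::4] for i in range(4)]
    for i in range(4):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Pre-labelled responses (step 3) would then be fed through these splits to train and test the model.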
16. Google Prediction API – Results
Classification results (137 responses):
5 neutral, 122 negative, 10 positive
17. WEKA
1) Separate into training and testing sets
2) Choose graphical user interface: “The
Explorer”
3) Insert pre-labelled data
4) Pre-process the data:
punctuation, capital & stopword
removal and alphabetic tokenisation
18. WEKA
5) Consider resampling:
whether a balanced dataset is
preferred
6) Choose classifier: “Naïve Bayes”
7) Classify using cross validation: 4-fold
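Step 6's classifier can be illustrated with a minimal multinomial Naïve Bayes over bag-of-words counts with add-one smoothing. This is a sketch of the general technique, not a reproduction of WEKA's NaiveBayes implementation:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes over bag-of-words features."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n = sum(self.label_counts.values())
        v = len(self.vocab)
        best, best_lp = None, float("-inf")
        for label, count in self.label_counts.items():
            lp = math.log(count / n)  # log prior
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                # Laplace (add-one) smoothed log likelihood
                lp += math.log((self.word_counts[label][tok] + 1) / (total + v))
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

In the study itself this step is handled inside WEKA's Explorer, with 4-fold cross-validation as in step 7.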
19. WEKA – Results
Resampling:
10% increase in precision
6% increase in accuracy
Overall, 82% correctly classified
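The two metrics reported here can be computed from gold and predicted labels as follows; this is a generic sketch of the standard definitions, with hypothetical example labels:

```python
def accuracy(gold, pred):
    """Fraction of responses classified correctly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def precision(gold, pred, cls):
    """Fraction of responses predicted as `cls` that truly are `cls`."""
    predicted = [g for g, p in zip(gold, pred) if p == cls]
    return sum(g == cls for g in predicted) / len(predicted) if predicted else 0.0
```

Resampling towards a balanced dataset typically trades some majority-class accuracy for better minority-class precision, consistent with the gains reported above.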
20. The tools
Semantria: a score between -2 and 2
TheySay: three percentages for negative,
positive & neutral
Google Prediction API: three values for
negative, positive & neutral
WEKA: percentage of correctly classified
responses
26. Evaluation: Single-sentence responses
Accuracy based on correct classification:

Tool                       All responses   Single-sentence responses
Commercial tools
  Semantria                51.09%          53.49%
  TheySay                  68.61%          72.09%
Non-commercial tools
  Google Prediction API    72.25%          54%
  WEKA                     82.35%          70%
27. Conclusions
Semantria: business use
TheySay: prepare for competition &
academic research
Google Prediction API: classification
WEKA: extraction & classification in
healthcare
29. Conclusions
Is it possible to accurately replicate human
language using machines?
Approx. 70% accuracy for all tools
(except Semantria)
WEKA: most powerful tool
30. Conclusions
Can existing SA tools respond to the needs
of any healthcare-related matter?
Commercial tools cannot respond
Non-commercial tools can be trained
31. Limitations
Only four tools
Small dataset
Potential errors in manual classification
Detailed analysis of single-sentence
responses was omitted
32. Recommendations
Examine reliability of other commercial
tools
Investigate other non-commercial tools,
especially NLTK and GATE
Examine other classifiers (SVM & MaxEnt)
Investigate all of WEKA’s GUIs
33. Recommendations
Verify labels using more people
Label sentence as well as the whole
response
Investigate the negativity associated with
long reviews