A Probabilistic Approach to Tweets' Sentiment Classification - ACII 2013 Conference
1. Francesco Colace, Massimo De Santo, Luca Greco
DIEM – Università degli Studi di Salerno
{fcolace, desanto, lgreco}@unisa.it
ACII 2013 – Geneva, 2-5 September 2013
2. Web 2.0 (or Web X.Y) rules!
Social Networks, Blogs, Microblogs, Review
Collector Sites: a huge quantity of
heterogeneous and opinionated data
3. Open issues:
o How to manage this information?
o How to extract the sentiment inside the data?
o How to understand something about the users?
o How to evaluate people's opinions about given topics or
products?
Sentiment Analysis
4. Brief introduction to Sentiment Analysis
o Related Work
Towards a Sentiment Analysis Framework
o The Proposed Approach
• The LDA Approach
• The Mixed Graph of Terms
• A sentiment mining algorithm
Experimental results
Conclusions and Future Work
5. Sentiment:
o a thought, view, or attitude, especially one based mainly on emotion
rather than reason
Sentiment Analysis (also known as Opinion Mining):
o use of Natural Language Processing (NLP) and computational
techniques to automate the extraction and classification of sentiment
from unstructured texts
6. Consumer information
o Product reviews (Amazon, eBay, …)
Marketing
o Consumer attitudes
o Trends
Politics
o Politicians want to know voters’ points of view
o Voters want to know politicians’ stances and who else supports them
Social
o Find like-minded individuals or communities
7. Which features to adopt?
o Words
o Sentences
How to interpret features for sentiment detection?
o As a bag of words
o By the use of annotated lexicons
o According to syntactic patterns
o Analyzing the paragraph structure
9. With the Bag of Words approach, a document
can be represented as an unordered collection of words
Problems:
o Which words best express the sentiment in a text?
o How to compare different «bags of words» derived from texts with the
same sentiment?
o Can a bag of words represent the documents’
domain of interest?
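A minimal sketch of the bag-of-words representation described above. The tokenizer, the `overlap` measure, and the example texts are illustrative assumptions, not part of the original slides. It shows that word order is discarded, and that two texts with the same sentiment can still share no words, which is the comparison problem raised in the list:

```python
import re
from collections import Counter

def bag_of_words(text):
    """Return a multiset of lowercase word tokens; word order is discarded."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

def overlap(bag_a, bag_b):
    """Jaccard overlap between the vocabularies of two bags (hypothetical measure)."""
    vocab_a, vocab_b = set(bag_a), set(bag_b)
    if not vocab_a and not vocab_b:
        return 0.0
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)

# Same words in a different order give the same bag.
a = bag_of_words("I love this phone, love it")
b = bag_of_words("Love it, I love this phone")

# Another positive text with entirely different words.
c = bag_of_words("Great camera")

same_bag = (a == b)
shared_with_c = overlap(a, c)
```

Here `a` and `b` are identical bags, while `a` and `c` have no vocabulary in common even though both texts are positive.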
10. The mixed Graph of Terms is a «graph based» representation
of documents
In the proposed approach, a mixed Graph of Terms is obtained
by automatically extracting words with probabilistic
clustering techniques such as Latent Dirichlet Allocation (LDA)
In a mixed Graph of Terms the words are linked according to
their probability of co-occurrence, and «aggregating_word»
and «aggregated_words» can be recognized
Our proposal: a mixed Graph of Terms can be used as a
«sentiment filter»
11. In the proposed approach, two different layers can be
recognized in a mixed Graph of Terms:
The Aggregator Layer: the words with the highest degree of
interconnection with the other words in the documents
The “Aggregated Words” Layer: the words
that have the highest degree of interconnection with one or more
Aggregator Words
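The two-layer structure can be pictured with a toy sketch. This is not the authors' construction, which relies on LDA: as a loudly simplified stand-in, link weights here are plain document-level co-occurrence frequencies, and the function name `build_mgt`, its parameters, and the sample documents are all hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_mgt(docs, n_aggregators=2, n_aggregated=2):
    """Toy two-layer graph: aggregator word -> {aggregated word: link weight}.

    Assumption: weights are co-occurrence frequencies, not LDA-derived
    probabilities as in the proposed approach.
    """
    doc_sets = [set(d.lower().split()) for d in docs]
    pair_counts = Counter()
    for words in doc_sets:
        for u, v in combinations(sorted(words), 2):
            pair_counts[(u, v)] += 1

    # Symmetric weight: share of documents that contain both words.
    n_docs = len(doc_sets)
    weight = {}
    for (u, v), count in pair_counts.items():
        weight.setdefault(u, {})[v] = count / n_docs
        weight.setdefault(v, {})[u] = count / n_docs

    # Aggregator layer: words with the highest total link weight.
    ranked = sorted(weight, key=lambda w: (-sum(weight[w].values()), w))
    mgt = {}
    for aggregator in ranked[:n_aggregators]:
        # Aggregated layer: the words most strongly linked to this aggregator.
        neighbours = sorted(weight[aggregator].items(),
                            key=lambda kv: (-kv[1], kv[0]))
        mgt[aggregator] = dict(neighbours[:n_aggregated])
    return mgt

docs = [
    "good phone great battery",
    "good phone nice screen",
    "good great camera",
]
mgt = build_mgt(docs)
```

With these sample documents, "good" becomes an aggregator word because it co-occurs with every other word, and "great" and "phone" are attached to it as its most strongly linked aggregated words.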
12. In natural language processing, Latent Dirichlet Allocation (LDA) is a
generative model that allows sets of observations to be explained by
unobserved groups that explain why some parts of the data are similar
For example, if observations are words collected into documents, it
posits that each document is a mixture of a small number of topics and
that each word's creation is attributable to one of the document's topics
The basic idea is that the documents are represented as random
mixtures over latent topics, where a topic is characterized by a
distribution over words
Using the Latent Dirichlet Allocation technique, a set of
documents can be represented as a mixed Graph of Terms
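The generative story behind LDA can be sketched in a few lines. This shows only the forward direction (how a document is assumed to be produced); fitting LDA means inferring the topics and mixtures from observed text, which this sketch does not do. The two hand-written topics, the `alpha` values, and the function names are illustrative assumptions.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising Gamma samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, n_words, rng):
    """Generate one document following the LDA generative process."""
    theta = sample_dirichlet(alpha, rng)  # per-document mixture over topics
    words = []
    for _ in range(n_words):
        # Each word is attributed to one topic drawn from the mixture...
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # ...and is then drawn from that topic's distribution over words.
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words

# Hypothetical topics: each is a probability distribution over words.
topics = [
    {"love": 0.5, "great": 0.3, "phone": 0.2},
    {"hate": 0.5, "broken": 0.3, "phone": 0.2},
]
rng = random.Random(0)
theta, words = generate_document(topics, alpha=[0.5, 0.5], n_words=8, rng=rng)
```

Small `alpha` values push `theta` toward a single dominant topic, which matches the assumption that each document mixes only a small number of topics.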
15. Step_1: Learn a mixed Graph of Terms from
labelled documents (i.e. Positive or
Negative), obtaining:
o mGT positive
o mGT negative
Step_2: Use the mixed Graphs of Terms as filters
in order to classify the sentiment of texts
o Comparing concepts that appear both in the mGTs and
in the text
o Comparing words that appear both in the mGTs and in
the text
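Step_2 can be sketched as follows. The slides do not give the scoring rule, so the one used here is an assumption: a matched aggregator word (concept) counts 1, and each of its aggregated words found in the text adds its link weight. The function names, the "undecided" outcome, and the two tiny mGTs are hypothetical.

```python
def mgt_score(tokens, mgt):
    """Score how strongly a text matches one mGT (assumed scoring rule)."""
    present = set(tokens)
    score = 0.0
    for aggregator, aggregated in mgt.items():
        if aggregator in present:
            score += 1.0  # concept-level match
            # Word-level matches, weighted by link strength to the concept.
            score += sum(w for word, w in aggregated.items() if word in present)
    return score

def classify(text, mgt_positive, mgt_negative):
    """Use the positive and negative mGTs as filters and compare the scores."""
    tokens = text.lower().split()
    pos = mgt_score(tokens, mgt_positive)
    neg = mgt_score(tokens, mgt_negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "undecided"

# Hypothetical mGTs: aggregator word -> {aggregated word: link weight}.
mgt_positive = {"love": {"great": 0.6, "phone": 0.4}}
mgt_negative = {"hate": {"broken": 0.7, "phone": 0.3}}

label_a = classify("i love this great phone", mgt_positive, mgt_negative)
label_b = classify("i hate this broken phone", mgt_positive, mgt_negative)
label_c = classify("the weather today", mgt_positive, mgt_negative)
```

In this sketch aggregated words only count when their aggregator word is also present, so a shared word such as "phone" cannot decide the label on its own.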
22. Pros:
o Language independent
o Fast classification
o Continuous update
o Small training set
Cons:
o In general, the mGT building process takes a long time
o An annotated lexicon is needed
23. To improve the classification through continuous updates of
the training set
To introduce SentiWordNet as the annotated lexicon
To adopt an ontological formalism for a better
representation of the mGT
To build a bigger tweets’ dataset
24. Don’t forget to tweet your sentiment!!!