Our second presentation at CLIC 2014: "Geometric and Statistical Analysis of Topics and Emotions in Corpora", for which Francesco Tarasconi received the Distinguished Young Paper award, given to the 8 best papers of the conference with a young author.
Celi @Clic2014: Geometric and Statistical Analysis of Topic and Emotions in Corpora
1. Geometric and Statistical Analysis of Topics and Emotions in Corpora
Francesco Tarasconi - tarasconi@celi.it
Vittorio Di Tomaso - ditomaso@celi.it
Pisa, 9/12/2014
2. Introduction: Analysis of Emotions
Francesco Tarasconi and Vittorio Di Tomaso
NLP:
Topic detection
Sentiment analysis
Emotion detection
Many, potentially correlated, variables
Role of Data Analysis:
Define, visualize and understand emotional similarities
Focus of the present work: background, methodology, examples
5. Social TV, the “Second Screen”
Sharing of experiences (and emotions!) between viewers of the same program
Source: Blogmeter, www.blogmeter.it
Emotional profiles of audiences and, by extension, of whole shows / episodes
7. Vector Space Model Representations
DOC_i = { topic A, topic B, ... , emotion x, emotion y, ... }
Annotated documents as vectors in an (n_topic + n_emotion)-dimensional space
Document-annotation indicator matrix D
TOPIC_i = [ frequency 1, frequency 2, ... , frequency n_emotion ]
Topics as vectors in an n_emotion-dimensional space
Topic-emotion frequency matrix T
IMPRESSION_i = { topic A, emotion x }
Impressions as vectors in an (n_topic + n_emotion)-dimensional space
Impression-annotation indicator matrix J
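A minimal sketch of how the three matrices could be built from annotated documents. The document structure, the topic and emotion names, and the reading of an "impression" as every topic-emotion pair co-occurring in one document are assumptions made for illustration; the slides do not spell out the construction.

```python
import numpy as np

# Hypothetical annotated documents: each carries a set of topics and a set of emotions.
docs = [
    {"topics": {"judges"}, "emotions": {"joy"}},
    {"topics": {"judges", "singer"}, "emotions": {"anger"}},
    {"topics": {"singer"}, "emotions": {"joy", "surprise"}},
]
topics = ["judges", "singer"]            # illustrative topic labels
emotions = ["anger", "joy", "surprise"]  # illustrative emotion labels
labels = topics + emotions
col = {name: k for k, name in enumerate(labels)}

# D: one row per document, one 0/1 column per topic or emotion annotation.
D = np.zeros((len(docs), len(labels)), dtype=int)
for i, d in enumerate(docs):
    for name in d["topics"] | d["emotions"]:
        D[i, col[name]] = 1

# Impressions (assumed definition): every (topic, emotion) pair found in the same document.
impressions = [
    (t, e)
    for d in docs
    for t in sorted(d["topics"])
    for e in sorted(d["emotions"])
]

# J: one row per impression, with exactly one topic column and one emotion column set.
J = np.zeros((len(impressions), len(labels)), dtype=int)
for i, (t, e) in enumerate(impressions):
    J[i, col[t]] = 1
    J[i, col[e]] = 1

# T: topic-by-emotion frequency table, obtained by cross-tabulating the two blocks of J.
n_topic = len(topics)
T = J[:, :n_topic].T @ J[:, n_topic:]
print(T)
```

Under this reading, T is simply the off-diagonal block of the Burt matrix J^T J, which is why the same data can feed both simple and multiple correspondence analysis.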
8. Emotional Distances Between Topics
Key elements:
1) High variance in topic absolute frequencies
2) High variance in emotion absolute frequencies
3) A graphical representation is required
4) Why are two topics similar? A graphical representation can be obtained by dimension reduction.
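One common way to compare topics whose absolute frequencies differ widely is the chi-square distance between row profiles, which is the metric underlying correspondence analysis. The slide does not give a formula, so the function below is a sketch under that assumption, and the counts are made up.

```python
import numpy as np

def chi2_distances(T):
    """Pairwise chi-square distances between the row profiles of a count table.

    Assumes every row and every column of T has a positive total.
    """
    T = np.asarray(T, dtype=float)
    col_mass = T.sum(axis=0) / T.sum()            # average emotion profile
    profiles = T / T.sum(axis=1, keepdims=True)   # each topic's emotion shares
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2 / col_mass).sum(axis=2))

# Hypothetical topic-by-emotion counts: topics 0 and 1 differ in volume only.
T_example = [[10, 20, 30],
             [1, 2, 3],
             [30, 20, 10]]
dist = chi2_distances(T_example)
print(np.round(dist, 3))
```

Topics 0 and 1 end up at distance zero despite a tenfold difference in volume, because only the shape of the emotional profile enters the distance. Dividing by the column masses keeps very frequent emotions from dominating the comparison.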
9. Simple and Multiple Correspondence Analysis
Strong link with PCA: dimension reduction, eigenvalue methods
CA (Hirschfeld, 1935) of contingency table T
SVD of standardized residual matrix
Principal coordinates and symmetric map
Inertia and quality of the representation
MCA of indicator matrix J or Burt matrix J^T J
Analysis of surveys (Benzécri, 1960s – 1970s)
As a geometric method (Le Roux and Rouanet, 2004)
Adjustment of inertia (Greenacre, 2006)
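The simple CA steps listed above (SVD of the standardized residual matrix, principal coordinates, inertia) can be sketched with NumPy as follows. This is an illustrative implementation of the textbook procedure, not the authors' code, and the count table is hypothetical.

```python
import numpy as np

def correspondence_analysis(T):
    """Simple CA of a contingency table with positive row and column totals."""
    T = np.asarray(T, dtype=float)
    P = T / T.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses (topics)
    c = P.sum(axis=0)                    # column masses (emotions)
    # Standardized residuals: departure from independence, scaled by the masses.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of rows and columns, as used in a symmetric map.
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    inertia = sv ** 2                    # principal inertias, one per axis
    return row_coords, col_coords, inertia

# Hypothetical topic-by-emotion counts.
T_example = np.array([[20, 5, 5],
                      [5, 15, 10],
                      [2, 8, 30]])
row_coords, col_coords, inertia = correspondence_analysis(T_example)

# Share of total inertia shown by a two-dimensional map (quality of the display).
quality_2d = inertia[:2].sum() / inertia.sum()
print(np.round(row_coords[:, :2], 3), round(float(quality_2d), 3))
```

The total inertia equals the chi-square statistic of the table divided by its grand total, so the map decomposes the same departure from independence that a chi-square test measures.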
10. Why MCA
1) It accounts for different volumes in the original variables (masses), but focuses on the shape of data (residuals)
2) Graphical method
3) Symmetric treatment of topics and emotions
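To illustrate the symmetric treatment, the sketch below runs MCA as a CA of an impression indicator matrix J with two variables (topic and emotion) and rescales the principal inertias with the Benzécri-style formula that Greenacre's adjustment builds on. The helper names and the counts are hypothetical, and the formula is the standard textbook one rather than anything quoted from the slides.

```python
import numpy as np

def principal_inertias(N):
    """Principal inertias from the SVD of the standardized residuals of table N."""
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False) ** 2

def indicator_from_table(T):
    """Expand a topic-by-emotion count table into one indicator row per impression."""
    T = np.asarray(T, dtype=int)
    n_topic, n_emotion = T.shape
    ti, ei = np.indices(T.shape)
    t_idx = np.repeat(ti.ravel(), T.ravel())
    e_idx = np.repeat(ei.ravel(), T.ravel())
    J = np.zeros((T.sum(), n_topic + n_emotion), dtype=int)
    J[np.arange(len(t_idx)), t_idx] = 1
    J[np.arange(len(e_idx)), n_topic + e_idx] = 1
    return J

def adjusted_mca_inertias(J, n_vars=2):
    """Rescale indicator-matrix inertias; axes at or below 1/n_vars are dropped."""
    lam = principal_inertias(J)
    keep = lam > 1.0 / n_vars + 1e-9
    q = n_vars
    return (q / (q - 1)) ** 2 * (lam[keep] - 1.0 / q) ** 2

# Hypothetical topic-by-emotion counts.
T_example = np.array([[20, 5, 5],
                      [5, 15, 10],
                      [2, 8, 30]])
J_example = indicator_from_table(T_example)
adjusted = adjusted_mca_inertias(J_example)
print(np.round(adjusted, 4))
print(np.round(principal_inertias(T_example)[:2], 4))
```

With only two variables, the adjusted MCA inertias coincide with the principal inertias of the simple CA of T, so the two analyses describe the same geometry, while MCA places topic and emotion categories in one cloud.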
16. Conclusions and Further Research
We have shown how to represent and highlight important emotional relations between topics using carefully chosen multivariate techniques.
In future we would like to:
add information about the authors to our analysis;
study in greater detail the clouds of impressions, documents and authors.
17. We would like to thank:
V. Cosenza and S. Monotti Graziadei for stimulating this research;
the ISI-CRT foundation and CELI S.R.L. for the support provided through the Lagrange Project;
A. Bolioli for the essential help and supervision in the preparation of this paper.
Thank you for your attention!