Interaction-level relations for Opinion Analysis Putting forth the benefits of Textometry
1. Interaction-level relations for Opinion Analysis
Putting forth the benefits of Textometry
Sentiment Analysis Symposium 2011
Manhattan Conference Center, New York, USA
Marguerite Leenhardt - PhD Student in Applied Linguistics, NLP, Textometry SYLED/CLA2T - Paris 3 University
mleenhardt@le-semiopole.fr
April 12th, 2011
2. TEXTOMETRY?
- branch of the statistical study of linguistic data
- text considered as possessing its own internal structure
- bypassing the information extraction step (qualitative coding)
> applying statistical and probabilistic calculations to the units that make up comparable texts in a corpus
> mostly based on the hypergeometric model and proximity algorithms
> reveals structures that would remain hidden due to the quantity of data
- robust method processing data without the constraints of external resources (lexicons, dictionaries, ontologies)
TWOFOLD TEXT SEGMENTATION PROCESS GENERATES THE DATASET’S CANVAS/FRAMEWORK
- CONTENTS: textual sequences organized in sentences, paragraphs, ...
- CONTAINERS: annotation systems (e.g. sentence or paragraph segmentation markers, considered a specific type of annotation on contents)
- analyzing the distribution of objects within the corpus framework
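The hypergeometric model mentioned on this slide can be illustrated with a small sketch. It assumes the usual textometric setting: a word occurs F times in a corpus of T tokens, and f of those occurrences fall in a part of t tokens. The function names and the toy counts are hypothetical and are not taken from the talk.

```python
from math import comb


def prob_at_least(f, F, t, T):
    """P(X >= f) when t tokens are drawn from T, of which F are the word."""
    total = comb(T, t)
    favourable = sum(
        comb(F, k) * comb(T - F, t - k)
        for k in range(f, min(F, t) + 1)
    )
    return favourable / total


def specificity(f, F, t, T):
    """Return ('+', p) if the word is over-represented in the part, else ('-', p)."""
    expected = F * t / T
    if f >= expected:
        return "+", prob_at_least(f, F, t, T)
    # Under-representation: P(X <= f) = 1 - P(X >= f + 1)
    return "-", 1.0 - prob_at_least(f + 1, F, t, T)


# Toy counts (hypothetical): a word occurring 3 times in a 10-token corpus,
# with all 3 occurrences falling in a 4-token part.
sign, p = specificity(f=3, F=3, t=4, T=10)
print(sign, p)
```

A small probability with a "+" sign suggests the word is characteristic of that part; no lexicon or ontology is needed, only the counts.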
3. IDENTIFYING MAJOR TRENDS AND OPPOSITIONS IN A DATASET
- Corpus Cocoon: online media analysis following a product launch - 40,000 words
- Factorial Correspondence Analysis is used to determine the distance between textual objects, compared on the basis of a proximity algorithm (positioning sets of elements in the corpus space)
- The closest objects heavily cite the press release; blogs cite Named Entities (brand and product) but diverge from the press release.
Figure: Factorial Correspondence Analysis (AFC) output comparing users’ comments across different web media; French corpus
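The notion of distance between textual objects can be sketched with the chi-square distance between row profiles, which is the metric correspondence analysis relies on. This is only an illustration of the distance, not the full factorial decomposition, and the table of counts below is invented for the example.

```python
from math import sqrt


def chi2_distance(table, i, j):
    """Chi-square distance between the profiles of rows i and j of a contingency table."""
    grand_total = sum(sum(row) for row in table)
    # Mass of each column: its share of all occurrences in the table.
    col_mass = [sum(col) / grand_total for col in zip(*table)]
    row_i, row_j = table[i], table[j]
    total_i, total_j = sum(row_i), sum(row_j)
    return sqrt(sum(
        (a / total_i - b / total_j) ** 2 / mass
        for a, b, mass in zip(row_i, row_j, col_mass)
        if mass > 0
    ))


# Hypothetical counts: rows are sources, columns are two word forms.
counts = [
    [10, 0],  # press release
    [10, 0],  # news site repeating the release
    [0, 10],  # blog using different vocabulary
]

print(chi2_distance(counts, 0, 1))  # identical profiles
print(chi2_distance(counts, 0, 2))  # divergent profiles
```

Sources that reuse the press release wording end up with near-identical profiles and therefore a distance close to zero, while sources with their own vocabulary lie further away.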
4. INTERACTION-LEVEL RELATIONS: WHY?
- textual interactions as the main material for Opinion Mining/Sentiment Analysis
- contextual analysis as an important challenge (Pang & Lee, 2008) and a major resource for interpretation (Somasundaran, 2010): interactional features are informative on a global scale (discourse ≠ interaction)
- Textometry as a means to go beyond local context boundaries by taking global dimensions into account: text is considered a component in and of itself (bottom-up approach)
- "A lot of information is often not captured in the handbuilt model and lost." (Boiy et al., 2007)
- qualitative coding should not be the first approach but a second step, after mining corpus-based knowledge
5. INTERACTION-LEVEL RELATIONS: HOW?
Annotating interactional relations between users’ contributions in a given discussion
> Linking and specifying containers
> Corpus enhanced with qualitative information
> Acquiring information on the context: conversational tree
> Determining zones of intensity in a discussion feed (computer-assisted task)
> Analyzing the linguistic specificity of linked containers vs. the whole corpus
> Building corpus-driven linguistic resources (textometric objects)
Diagram labels:
- Named Entity Recognition + matching paraphrases
- Corpus-driven lexical resource (LR) for thematic analysis
- Corpus-driven lexical resource (LR) for opinion
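The conversational tree and the zones of intensity can be sketched as follows. This is a minimal illustration under stated assumptions: the field names `id` and `reply_to` are hypothetical, and "intensity" is taken here to mean the number of replies (direct or nested) a contribution attracts, which is one possible reading and not necessarily the measure used in the talk.

```python
from collections import defaultdict


def build_tree(contributions):
    """Map each contribution id to the ids of its direct replies."""
    children = defaultdict(list)
    for c in contributions:
        children[c["reply_to"]].append(c["id"])
    return children


def reply_count(children, node):
    """Number of direct and nested replies below a contribution."""
    return sum(1 + reply_count(children, child) for child in children.get(node, []))


# Hypothetical discussion feed: reply_to is None for top-level contributions.
feed = [
    {"id": 1, "reply_to": None},
    {"id": 2, "reply_to": 1},
    {"id": 3, "reply_to": 1},
    {"id": 4, "reply_to": 2},
    {"id": 5, "reply_to": None},
]

tree = build_tree(feed)
intensity = {top: reply_count(tree, top) for top in tree[None]}
print(intensity)
```

The linked containers in the most active threads can then be compared with the whole corpus to look for their characteristic vocabulary.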
6. PROJECTING THE CORPUS-DRIVEN LINGUISTIC RESOURCES FOR OPINION
- Corpus Cocoon: the LR is projected onto the dataset’s canvas/framework to highlight the distribution of opinions among UGCs (adaptation of the Appraisal Theory scale for opinion orientation)
- Distributional Inventory is used to identify major trends in opinion expression; here, most UGCs are not relevant, as they only cite the brand in congratulation messages to the bloggers who posted on the product launch.
Figure: opinion distribution among users’ comments
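Projecting a lexical resource onto the containers can be sketched as a simple tally. The lexicon entries, the orientation labels and the comments below are invented for illustration; the actual LR and the adapted Appraisal Theory scale are not given on the slide.

```python
import re
from collections import Counter


def project(lexicon, comments):
    """Count comments by the opinion orientation of the lexicon entries they contain."""
    distribution = Counter()
    for text in comments:
        tokens = re.findall(r"\w+", text.lower())
        labels = {lexicon[tok] for tok in tokens if tok in lexicon}
        if not labels:
            distribution["none"] += 1      # no opinion term found
        elif len(labels) == 1:
            distribution[labels.pop()] += 1
        else:
            distribution["mixed"] += 1     # terms of several orientations
    return distribution


# Hypothetical lexicon and user comments.
lexicon = {"love": "positive", "great": "positive", "disappointing": "negative"}
comments = [
    "Congrats on the post!",
    "I love it, great design",
    "Disappointing but great packaging",
    "Thanks for sharing",
]

distribution = project(lexicon, comments)
print(distribution)
```

Comments that only congratulate the blogger fall into the "none" bucket, which mirrors the observation that most UGCs in this corpus carry no opinion on the product.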