Interaction-level relations for Opinion Analysis
               Putting forth the benefits of Textometry

               Sentiment Analysis Symposium 2011
               Manhattan Conference Center, New York, USA



                Marguerite Leenhardt - PhD Student in Applied Linguistics, NLP, Textometry
                SYLED/CLA2T - Paris 3 University
                mleenhardt@le-semiopole.fr




                                                                                                                                April 12th, 2011

Tuesday, April 12, 2011
TEXTOMETRY?

              - a branch of the statistical study of linguistic data

              - the text is considered as possessing its own internal structure

              - bypasses the information extraction step (qualitative coding)

                 > applies statistical and probabilistic calculations to the units that make up comparable texts in a corpus
                 > mostly based on the hypergeometric model and proximity algorithms
                 > reveals structures that would otherwise remain hidden because of the quantity of data

              - a robust method that processes data without depending on external resources (lexicons, dictionaries, ontologies)

              - analyzes the distribution of objects within the corpus framework

              A TWOFOLD TEXT SEGMENTATION PROCESS GENERATES THE DATASET’S CANVAS/FRAMEWORK

              - CORPUS

                 > CONTENTS: textual sequences organized in sentences, paragraphs, ...
                 > CONTAINERS: annotation systems (e.g. sentence or paragraph segmentation markers, considered a specific type of
                 annotation on contents)

              [Diagram: corpus units labeled a., b., c. and d.]
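
The hypergeometric model mentioned on this slide can be illustrated with a small sketch. It assumes one common textometric use of the model, scoring how surprising a word's frequency in one part of the corpus is given its frequency in the whole corpus. The function name and the parameters `T`, `t`, `F` and `k` are illustrative assumptions, not taken from the slides.

```python
from math import comb


def hypergeom_upper_tail(T: int, t: int, F: int, k: int) -> float:
    """Probability of seeing at least k occurrences of a word in a part.

    Assumed notation (not from the slides):
      T: total number of tokens in the corpus
      t: number of tokens in the part under study
      F: total frequency of the word in the corpus
      k: observed frequency of the word in the part
    The part is modelled as t tokens drawn without replacement from the corpus.
    """
    total = comb(T, t)
    upper = min(F, t)  # the word cannot occur more than F or t times in the part
    favourable = sum(comb(F, i) * comb(T - F, t - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical figures: a 1,000-token corpus, a 100-token part,
# and a word seen 20 times overall and 8 times in the part.
p = hypergeom_upper_tail(T=1000, t=100, F=20, k=8)
print(f"P(X >= 8) = {p:.6f}")
```

A small probability suggests the word is over-represented in that part compared with the corpus as a whole; the threshold for calling it "specific" is a choice left to the analyst.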




IDENTIFYING MAJOR TRENDS AND OPPOSITIONS IN A DATASET

              - Cocoon corpus: online media analysis following a product launch - 40,000 words

              - Factorial Correspondence Analysis is used to determine the distance between textual objects, which are compared using a
              proximity algorithm (positioning sets of elements in the corpus space)

              - The closest objects heavily cite the press release; blogs cite Named Entities (brand and product) but diverge from the press
              release.

              [Figure: Factorial Correspondence Analysis (AFC) output comparing users’ comments across different web platforms; French corpus]
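
The notion of distance between textual objects can be sketched as follows. This is not a full correspondence analysis: it only computes the chi-square distance between row profiles, the metric that correspondence analysis represents in its factorial space. The function name, source names and counts are hypothetical and do not come from the Cocoon corpus.

```python
from math import sqrt


def chi2_distances(table):
    """Chi-square distances between the row profiles of a contingency table.

    table maps a text name to a list of word counts (same column order for
    every text). Assumes no empty row and no empty column.
    """
    names = list(table)
    n_cols = len(table[names[0]])
    grand_total = sum(sum(row) for row in table.values())
    # Mass of each column (word) in the whole table.
    col_mass = [sum(table[n][j] for n in names) / grand_total for j in range(n_cols)]
    # Row profile: relative frequency of each word within one text.
    profiles = {n: [c / sum(table[n]) for c in table[n]] for n in names}
    return {
        (a, b): sqrt(
            sum((profiles[a][j] - profiles[b][j]) ** 2 / col_mass[j] for j in range(n_cols))
        )
        for a in names
        for b in names
    }


# Hypothetical counts of three word forms in three sources.
counts = {
    "press_release": [30, 10, 0],
    "news_site": [28, 12, 1],
    "blog": [5, 9, 20],
}
d = chi2_distances(counts)
print(d["press_release", "news_site"], d["press_release", "blog"])
```

With these made-up counts, the news site sits closer to the press release than the blog does, which is the kind of opposition the slide describes.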


INTERACTION-LEVEL RELATIONS: WHY?

              - textual interactions are the main material for Opinion Mining/Sentiment Analysis

              - contextual analysis is an important challenge (Pang & Lee, 2008) and a major resource for
              interpretation (Somasundaran, 2010): interactional features are informative on a global scale (discourse
              ≠ interaction)

              - Textometry is a means of going beyond local context boundaries by taking global dimensions into
              account: the text is considered a component in and of itself (bottom-up approach)

              - «A lot of information is often not captured in the handbuilt model and lost.» (Boiy et al., 2007)

              - qualitative coding should not be the first approach but a second step, after mining corpus-based
              knowledge




INTERACTION-LEVEL RELATIONS: HOW?

              Annotating interactional relations between users’ contributions in a given discussion
              > linking and specifying containers

              > Corpus enhanced with qualitative information
              > Acquiring information on the context: conversational tree
              > Determining zones of intensity in a discussion feed (computer-assisted task)

              > Analyzing the linguistic specificity of linked containers vs. the whole corpus
              > Building corpus-driven linguistic resources (textometric objects)

              Right-hand column of the slide:
              - Named Entity Recognition + matching paraphrases
              - Corpus-driven lexical resource (LR) for thematic analysis
              - Corpus-driven lexical resource (LR) for opinion
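
The conversational tree and the zones of intensity can be sketched as below. The slides do not define how intensity is measured, so this sketch assumes, purely for illustration, that a zone of intensity is a contribution whose reply subtree reaches a minimum size. The function names, the `min_size` parameter and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def build_tree(replies):
    """Map each parent id to the list of its direct replies.

    replies is a list of (contribution_id, parent_id) pairs;
    parent_id is None for a contribution that starts a thread.
    """
    children = defaultdict(list)
    for contribution_id, parent_id in replies:
        children[parent_id].append(contribution_id)
    return children


def subtree_size(children, node):
    """Number of contributions in the subtree rooted at node (node included)."""
    return 1 + sum(subtree_size(children, child) for child in children.get(node, []))


def intensity_zones(replies, min_size=3):
    """Contributions whose subtree has at least min_size nodes, largest first.

    Subtree size as a proxy for intensity is an assumption of this sketch.
    """
    children = build_tree(replies)
    sizes = {cid: subtree_size(children, cid) for cid, _ in replies}
    return sorted((cid for cid, s in sizes.items() if s >= min_size), key=lambda c: -sizes[c])


# Hypothetical discussion feed: c1 and c5 start threads, the others are replies.
feed = [("c1", None), ("c2", "c1"), ("c3", "c1"), ("c4", "c2"), ("c5", None), ("c6", "c5")]
print(intensity_zones(feed))
```

Once the linked contributions are identified this way, their vocabulary can be compared with that of the whole corpus, as the slide suggests.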




PROJECTING THE CORPUS-DRIVEN LINGUISTIC RESOURCES FOR OPINION

              - Cocoon corpus: the LR is projected onto the dataset’s canvas/framework to highlight the distribution of opinions
              among UGCs (adaptation of the Appraisal Theory scale for opinion orientation)

              - A Distributional Inventory is used to identify major trends in opinion expression; here, most UGCs are not
              relevant, as they only cite the brand in congratulation messages to the bloggers who posted about the product launch.

              [Figure: Opinion distribution among users’ comments]
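
Projecting a lexical resource onto the corpus and taking an inventory of its distribution can be sketched as follows. The lexicon entries, the orientation labels and the sample comments are hypothetical; the actual LR and the adapted Appraisal Theory scale are not given in the slides.

```python
import re
from collections import Counter

# Hypothetical opinion lexicon: word form -> orientation label.
OPINION_LR = {
    "love": "positive",
    "great": "positive",
    "disappointing": "negative",
    "congrats": "not relevant",
}


def project(lexicon, comments):
    """Count the orientation labels found in each container.

    comments is a list of (container_name, text) pairs, where the container
    is the segment of the corpus the comment belongs to (e.g. a web platform).
    """
    inventory = {}
    for container, text in comments:
        tokens = re.findall(r"\w+", text.lower())
        counts = inventory.setdefault(container, Counter())
        counts.update(lexicon[tok] for tok in tokens if tok in lexicon)
    return inventory


# Hypothetical user comments.
sample = [
    ("blog", "Congrats on the post, great review"),
    ("forum", "Disappointing battery, but I love the design"),
]
for container, counts in project(OPINION_LR, sample).items():
    print(container, dict(counts))
```

A per-container count like this is enough to see which orientation dominates in which part of the corpus; a real resource would also need to handle inflected forms and multi-word expressions.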




«I» NETWORK IN THE ORANGE CORPUS




«FORFAIT» IN THE ORANGE CORPUS




ORANGE LEXICO-SEMANTIC NETWORK




Thank you!
           Marguerite Leenhardt PhD student
           mleenhardt@le-semiopole.fr




