SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
A tailor-made one-size-fits-all
approach to sentiment analysis


            Diana Maynard
       University of Sheffield, UK




    Sentment Analysis Symposium, San Francisco, October 2012
What's the problem?

• Like most language engineering tasks, sentiment analysis works best
  on a specific domain, in a single language, where the task is clear
  and the range of options limited
• This means that tailor-made approaches to SA are the key to
  success
• Off-the-shelf tools for sentiment analysis are unlikely to work well
  when used for different tasks
• This is great when you want to find opinions about the latest films
  from a single film review site, written in English, as you can train
  models for this purpose and get reasonable results
• But what if you want to do more complex forms of analysis on a
  potentially unknown set of documents containing potentially
  unknown kinds of entities?
               Sentment Analysis Symposium, San Francisco, October 2012
Language processing tradeoffs




Sentment Analysis Symposium, San Francisco, October 2012
The task
• Two very different domains:
   – Greek and Austrian Parliamentary texts
   – German rock concerts
• We want to find out ultimately:
  – What are the opinions on crucial social events and who are the
    key people involved?
  – How are these opinions distributed in relation to demographic
    user data?
  – How have these opinions evolved?
   – Who are the opinion leaders?
   – What is their impact and influence?
• The first task is to investigate public opinion about key entities and
  events
                Sentment Analysis Symposium, San Francisco, October 2012
Sentiment analysis is hard...
●
    And it's even harderin our case because:
    ●
        we have lots of different text types and domains
    ●
        we're processing social media
    ●
        we're processing multiple languages
    ●
        we don't necessarily know what we're looking for
●
    Experimenting with different techniques and subcomponents
    to build up a complex system
●
    Experimenting with techniques for cross-language adaptation




                Sentment Analysis Symposium, San Francisco, October 2012
GATE to the rescue...
• We use GATE, a tool for language engineering, because it is robust
  and adaptable with a component-based architecture
• Lots of experience adapting IE applications to different languages
  and domains
• Can we do the same thing for sentiment analysis?
• Main challenges:
  – how can we easily port sentiment analysis to a new language,
    and how can we process documents in multiple languages most
    efficiently?
  – how can we easily adapt applications which work on standard
    reviews to deal with noisy social media?


              Sentment Analysis Symposium, San Francisco, October 2012
We use a rule-based approach because...

• Although ML applications are typically used for Opinion Mining, this
  task involves documents from many different text types, genres,
  languages and domains
• This is problematic for ML because it requires many applications
  trained on the different datasets, and methods to deal with acquisition
  of training material
• Aim of using a rule-based system is that the bulk of it can be used
  across different kinds of texts, with only the pre-processing and some
  sentiment dictionaries which are domain and language-specific




                Sentment Analysis Symposium, San Francisco, October 2012
Application Components

• Structural pre-processing, specific to social media types
• Linguistic pre-processing (including language detection)
• Standard language-specific NE, term and event recognition
• Additional targeted language- and task-specific gazetteer lookup
• JAPE grammars for opinion finding
• Aggregation and summarisation of opinions




               Sentment Analysis Symposium, San Francisco, October 2012
Conditional processing in GATE

    In GATE, you can set a processing resource in your application to run
    or not depending on certain circumstances

    You can have several different PRs loaded, and let the system
    automatically choose which one to run, for each document.

    This is very helpful when you have texts in multiple languages, or of
    different types, which might require different kinds of processing

    For example, if you have a mixture of German and English
    documents in your corpus, you might have some PRs which are
    language-dependent and some which are not

    You can set up the application to run the relevant PRs on the right
    documents automatically.


                Sentment Analysis Symposium, San Francisco, October 2012
Conditional processing with different languages

• Suppose we have a corpus with documents in German and English, and
  we only want to process the English texts.
• First we must distinguish between the two kinds of text, using a
  language identification tool
• We use TextCat (a GATE plugin) to add a feature on each document,
  telling us which language the document is in
• Then we run a conditional processing pipeline, that only runs the
  subsequent PRs if the value of the language feature on the document
  is English
• The other documents will not be processed
• What if we want to process both German and English documents?
  – we can just call some language-specific PRs conditionally, and use
    the language-neutral PRs on all documents
                Sentment Analysis Symposium, San Francisco, October 2012
What if the documents themselves are in
                multiple languages?

    Segment Processing PR enables you to process labelled sections of a
    document independently, one at a time

    It then merges back the individual sections once they've been processed

    Useful for
      
           when you want annotations in different sections to be
           independent of each other
      
           when you only want to process certain sections within a
           document
      
           processing multiple languages separately within a single
           document



                 Sentment Analysis Symposium, San Francisco, October 2012
English and German content




Sentment Analysis Symposium, San Francisco, October 2012
Sentiment Finding in Rock am Ring




    Sentment Analysis Symposium, San Francisco, October 2012
Creating language-specific resources

• Linguistic pre-processing
   – standard tools (POS tagging, language ID, etc)
    – NE and term recognition: existing language-specific plugins
• Gazetteers
   – Automatic translation of gazetteer lists
    – Collection of lists from the web, e.g. swear words, typical
      phrases
    – Automatic gazetteer induction from texts: bootstrapping
      approach
• Everything else is language-independent


              Sentment Analysis Symposium, San Francisco, October 2012
Creating domain-specific resources

• Unlike the language issue, documents are only ever of a single type
• Social media poses challenges for linguistic processing
   – requires specially developed PRs with more flexible matching,
     special handling of tweets etc (tokeniser, POS tagger etc)
  “RT @Bthompson WRITEZ: @libbyabrego honored?! Everybody
  knows the libster is nice with it...lol...(thankkkks a bunch;))”
   – See our work on the TrendMiner project for more details of this
     problem and how we handle it
   – Phenomena like sarcasm occur more often on less formal texts
• We can again use conditional processing on a document-by-
  document basis to deal with these issues

              Sentment Analysis Symposium, San Francisco, October 2012
Conclusions

• It's best if you can tailor your application as much as possible
  ot the domain and language
• But if you have to process multiple kinds of text, a modular
  rule-based approach can allow you to combine the specific
  resources with the generic resources
• GATE is ideally set up for this (of course, other tools are
  available too...)
• We use a rule-based approach, but lots of current research on
  automatic induction of new training data on different kinds of
  text


             Sentment Analysis Symposium, San Francisco, October 2012
More information

• GATE http://gate.ac.uk (general info, download, tutorials, demos,
  references etc)
• The EU-funded ARCOMEM and TrendMiner projects are dealing
  with lots of issues about opinion and trend mining from social
  media, and use GATE for this.
   – http://www.arcomem.eu
   – http://www.trendminer-project.eu/
• More information on dealing with the problems of social media:
   – D. Maynard and K. Bontcheva and D. Rout. Challenges in developing
     opinion mining tools for social media. In Proceedings of @NLP can u
     tag #usergeneratedcontent?! Workshop at LREC 2012, May 2012,
     Istanbul, Turkey.

              Sentment Analysis Symposium, San Francisco, October 2012
Sentment Analysis Symposium, San Francisco, October 2012

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (7)

Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
Multilingual vocabularies for the Web: Session on multilingual vocabularies, ...
 
Language Access for Legal Aid Websites
Language Access for Legal Aid WebsitesLanguage Access for Legal Aid Websites
Language Access for Legal Aid Websites
 
George Androulakis
George AndroulakisGeorge Androulakis
George Androulakis
 
Arabic Script SHOULD NOT be so Scary!
Arabic Script SHOULD NOT be so Scary!Arabic Script SHOULD NOT be so Scary!
Arabic Script SHOULD NOT be so Scary!
 
Kyra Pollitt
Kyra PollittKyra Pollitt
Kyra Pollitt
 
Diversity In Localization (Olga Melnikova)
Diversity In Localization (Olga Melnikova)Diversity In Localization (Olga Melnikova)
Diversity In Localization (Olga Melnikova)
 
Growing Your Freelance Business (Olga Melnikova)
Growing Your Freelance Business (Olga Melnikova)Growing Your Freelance Business (Olga Melnikova)
Growing Your Freelance Business (Olga Melnikova)
 

Andere mochten auch

What do Data Visualisations want?
What do Data Visualisations want?What do Data Visualisations want?
What do Data Visualisations want?
Farida Vis
 
Cloud and Big Data Conference Images
Cloud and Big Data Conference ImagesCloud and Big Data Conference Images
Cloud and Big Data Conference Images
PatrickCrompton
 
Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013
Lora Cecere
 

Andere mochten auch (20)

an introduction to social media and research
an introduction to social media and researchan introduction to social media and research
an introduction to social media and research
 
The Big Picture: Responsive Images in Action #scd14
The Big Picture: Responsive Images in Action #scd14The Big Picture: Responsive Images in Action #scd14
The Big Picture: Responsive Images in Action #scd14
 
Research Presentation: How Numbers are Powering the Next Era of Marketing
Research Presentation: How Numbers are Powering the Next Era of MarketingResearch Presentation: How Numbers are Powering the Next Era of Marketing
Research Presentation: How Numbers are Powering the Next Era of Marketing
 
Efficient data reduction and analysis of DECam images using multicore archite...
Efficient data reduction and analysis of DECam images using multicore archite...Efficient data reduction and analysis of DECam images using multicore archite...
Efficient data reduction and analysis of DECam images using multicore archite...
 
Where do images fit in the era of ‘Big Data’?
Where do images fit in the era of ‘Big Data’?Where do images fit in the era of ‘Big Data’?
Where do images fit in the era of ‘Big Data’?
 
Big Data has Big Problems
Big Data has Big ProblemsBig Data has Big Problems
Big Data has Big Problems
 
What do Data Visualisations want?
What do Data Visualisations want?What do Data Visualisations want?
What do Data Visualisations want?
 
Cloud and Big Data Conference Images
Cloud and Big Data Conference ImagesCloud and Big Data Conference Images
Cloud and Big Data Conference Images
 
ESRC Research Methods Festival - From Flickr to Snapchat: The challenge of an...
ESRC Research Methods Festival - From Flickr to Snapchat: The challenge of an...ESRC Research Methods Festival - From Flickr to Snapchat: The challenge of an...
ESRC Research Methods Festival - From Flickr to Snapchat: The challenge of an...
 
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
Twitter analytics: some thoughts on sampling, tools, data, ethics and user re...
 
Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013
 
Are You Ready for the Era of Big Data and Extreme Information Management?
Are You Ready for the Era of Big Data and Extreme Information Management?Are You Ready for the Era of Big Data and Extreme Information Management?
Are You Ready for the Era of Big Data and Extreme Information Management?
 
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t...
 
Picturing the Social: Talk for Transforming Digital Methods Winter School
Picturing the Social: Talk for Transforming Digital Methods Winter SchoolPicturing the Social: Talk for Transforming Digital Methods Winter School
Picturing the Social: Talk for Transforming Digital Methods Winter School
 
Presentation Visual Aids V3
Presentation Visual Aids  V3Presentation Visual Aids  V3
Presentation Visual Aids V3
 
Whitepaper: Thriving in the Big Data era Manage Data before Data Manages you
Whitepaper: Thriving in the Big Data era Manage Data before Data Manages you Whitepaper: Thriving in the Big Data era Manage Data before Data Manages you
Whitepaper: Thriving in the Big Data era Manage Data before Data Manages you
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Using Apache Solr for Images as Big Data: Presented by Kerry Koitzsch, Wipro...
Using Apache Solr for Images as Big Data: Presented by Kerry Koitzsch,  Wipro...Using Apache Solr for Images as Big Data: Presented by Kerry Koitzsch,  Wipro...
Using Apache Solr for Images as Big Data: Presented by Kerry Koitzsch, Wipro...
 
Business Model Innovation: Designing and testing business model in a corporat...
Business Model Innovation: Designing and testing business model in a corporat...Business Model Innovation: Designing and testing business model in a corporat...
Business Model Innovation: Designing and testing business model in a corporat...
 
Big Data, Small Dashboard
Big Data, Small DashboardBig Data, Small Dashboard
Big Data, Small Dashboard
 

Ähnlich wie A tailor-made one-size-fits-all approach to sentiment analysis

GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012
MediaEval2012
 
Programming language design and implemenation
Programming language design and implemenationProgramming language design and implemenation
Programming language design and implemenation
Ashwini Awatare
 
Opinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianOpinion mining for social media and news items in Romanian
Opinion mining for social media and news items in Romanian
Traian Rebedea
 
Introduction to NLP_1.pptx
Introduction to NLP_1.pptxIntroduction to NLP_1.pptx
Introduction to NLP_1.pptx
jkamble
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptx
jkamble
 
ENGL 202 Project 1 Slidedoc 1: How to do the Analyzing Professional Writing P...
ENGL 202 Project 1 Slidedoc 1: How to do the Analyzing Professional Writing P...ENGL 202 Project 1 Slidedoc 1: How to do the Analyzing Professional Writing P...
ENGL 202 Project 1 Slidedoc 1: How to do the Analyzing Professional Writing P...
Jodie Nicotra
 
Scientific and technical translation in English - week 3 2019
Scientific and technical translation in English - week 3 2019Scientific and technical translation in English - week 3 2019
Scientific and technical translation in English - week 3 2019
Ron Martinez
 

Ähnlich wie A tailor-made one-size-fits-all approach to sentiment analysis (20)

GATE: a text analysis tool for social media
GATE: a text analysis tool for social mediaGATE: a text analysis tool for social media
GATE: a text analysis tool for social media
 
Arcomem training entities-and-events_advanced
Arcomem training entities-and-events_advancedArcomem training entities-and-events_advanced
Arcomem training entities-and-events_advanced
 
Text analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATEText analysis and Semantic Search with GATE
Text analysis and Semantic Search with GATE
 
Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)Introduction to natural language processing (NLP)
Introduction to natural language processing (NLP)
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
Arcomem training opinions_advanced
Arcomem training opinions_advancedArcomem training opinions_advanced
Arcomem training opinions_advanced
 
Fossetcon15
Fossetcon15Fossetcon15
Fossetcon15
 
Scale2016
Scale2016Scale2016
Scale2016
 
It’s getting crowded! A critical view of what crowdsourcing can do for termin...
It’s getting crowded! A critical view of what crowdsourcing can do for termin...It’s getting crowded! A critical view of what crowdsourcing can do for termin...
It’s getting crowded! A critical view of what crowdsourcing can do for termin...
 
GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012
 
Addis Ababa University.pptx
Addis Ababa University.pptxAddis Ababa University.pptx
Addis Ababa University.pptx
 
Programming language design and implemenation
Programming language design and implemenationProgramming language design and implemenation
Programming language design and implemenation
 
Opinion mining for social media and news items in Romanian
Opinion mining for social media and news items in RomanianOpinion mining for social media and news items in Romanian
Opinion mining for social media and news items in Romanian
 
Introduction to NLP_1.pptx
Introduction to NLP_1.pptxIntroduction to NLP_1.pptx
Introduction to NLP_1.pptx
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptx
 
ENGL 202 Project 1 Slidedoc 1: How to do the Analyzing Professional Writing P...
ENGL 202 Project 1 Slidedoc 1: How to do the Analyzing Professional Writing P...ENGL 202 Project 1 Slidedoc 1: How to do the Analyzing Professional Writing P...
ENGL 202 Project 1 Slidedoc 1: How to do the Analyzing Professional Writing P...
 
Scientific and technical translation in English - week 3 2019
Scientific and technical translation in English - week 3 2019Scientific and technical translation in English - week 3 2019
Scientific and technical translation in English - week 3 2019
 
Unit 5f.pptx
Unit 5f.pptxUnit 5f.pptx
Unit 5f.pptx
 
Text Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATEText Analysis and Semantic Search with GATE
Text Analysis and Semantic Search with GATE
 

Mehr von Diana Maynard

Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Diana Maynard
 

Mehr von Diana Maynard (20)

Filth and lies: analysing social media
Filth and lies: analysing social mediaFilth and lies: analysing social media
Filth and lies: analysing social media
 
Adding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long wayAdding value to NLP: a little semantics goes a long way
Adding value to NLP: a little semantics goes a long way
 
Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...Methodological possibilities for strengthening the monitoring of SDG indicato...
Methodological possibilities for strengthening the monitoring of SDG indicato...
 
Getting the-most-out-of-conferences
Getting the-most-out-of-conferencesGetting the-most-out-of-conferences
Getting the-most-out-of-conferences
 
Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...Understanding the world with NLP: interactions between society, behaviour and...
Understanding the world with NLP: interactions between society, behaviour and...
 
The language of social media
The language of social mediaThe language of social media
The language of social media
 
Text analysis-semantic-search
Text analysis-semantic-searchText analysis-semantic-search
Text analysis-semantic-search
 
Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...Using language to save the world: interactions between society, behaviour and...
Using language to save the world: interactions between society, behaviour and...
 
Adapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutionsAdapting NLP tools to diverse data: challenges and solutions
Adapting NLP tools to diverse data: challenges and solutions
 
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
Ontologies as bridges between data sources and user queries: the KNOWMAK proj...
 
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
20 Years of Text Mining Applications with GATE: from Donald Trump to curing c...
 
Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real world
 
Tools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media AnalysisTools for (Almost) Real-Time Social Media Analysis
Tools for (Almost) Real-Time Social Media Analysis
 
Social media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATESocial media analytics as a service: tools from GATE
Social media analytics as a service: tools from GATE
 
Cls8 decarbonet
Cls8 decarbonetCls8 decarbonet
Cls8 decarbonet
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
 
Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?Can Social Media Analysis Improve Collective Awareness of Climate Change?
Can Social Media Analysis Improve Collective Awareness of Climate Change?
 
Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?Do we really know what people mean when they tweet?
Do we really know what people mean when they tweet?
 
Disability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged SwordDisability and Adventure Travel: the Double-Edged Sword
Disability and Adventure Travel: the Double-Edged Sword
 
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
Who cares about sarcastic tweets? Investigating the impact of sarcasm on sent...
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Kürzlich hochgeladen (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

A tailor-made one-size-fits-all approach to sentiment analysis

  • 1. A tailor-made one-size-fits-all approach to sentiment analysis Diana Maynard University of Sheffield, UK Sentment Analysis Symposium, San Francisco, October 2012
  • 2. What's the problem? • Like most language engineering tasks, sentiment analysis works best on a specific domain, in a single language, where the task is clear and the range of options limited • This means that tailor-made approaches to SA are the key to success • Off-the-shelf tools for sentiment analysis are unlikely to work well when used for different tasks • This is great when you want to find opinions about the latest films from a single film review site, written in English, as you can train models for this purpose and get reasonable results • But what if you want to do more complex forms of analysis on a potentially unknown set of documents containing potentially unknown kinds of entities? Sentment Analysis Symposium, San Francisco, October 2012
  • 3. Language processing tradeoffs Sentment Analysis Symposium, San Francisco, October 2012
  • 4. The task • Two very different domains: – Greek and Austrian Parliamentary texts – German rock concerts • We want to find out ultimately: – What are the opinions on crucial social events and who are the key people involved? – How are these opinions distributed in relation to demographic user data? – How have these opinions evolved? – Who are the opinion leaders? – What is their impact and influence? • The first task is to investigate public opinion about key entities and events Sentment Analysis Symposium, San Francisco, October 2012
  • 5. Sentiment analysis is hard... ● And it's even harderin our case because: ● we have lots of different text types and domains ● we're processing social media ● we're processing multiple languages ● we don't necessarily know what we're looking for ● Experimenting with different techniques and subcomponents to build up a complex system ● Experimenting with techniques for cross-language adaptation Sentment Analysis Symposium, San Francisco, October 2012
  • 6. GATE to the rescue... • We use GATE, a tool for language engineering, because it is robust and adaptable with a component-based architecture • Lots of experience adapting IE applications to different languages and domains • Can we do the same thing for sentiment analysis? • Main challenges: – how can we easily port sentiment analysis to a new language, and how can we process documents in multiple languages most efficiently? – how can we easily adapt applications which work on standard reviews to deal with noisy social media? Sentment Analysis Symposium, San Francisco, October 2012
  • 7. We use a rule-based approach because... • Although ML applications are typically used for Opinion Mining, this task involves documents from many different text types, genres, languages and domains • This is problematic for ML because it requires many applications trained on the different datasets, and methods to deal with acquisition of training material • Aim of using a rule-based system is that the bulk of it can be used across different kinds of texts, with only the pre-processing and some sentiment dictionaries which are domain and language-specific Sentment Analysis Symposium, San Francisco, October 2012
  • 8. Application Components • Structural pre-processing, specific to social media types • Linguistic pre-processing (including language detection) • Standard language-specific NE, term and event recognition • Additional targeted language- and task-specific gazetteer lookup • JAPE grammars for opinion finding • Aggregation and summarisation of opinions Sentment Analysis Symposium, San Francisco, October 2012
  • 9. Conditional processing in GATE  In GATE, you can set a processing resource in your application to run or not depending on certain circumstances  You can have several different PRs loaded, and let the system automatically choose which one to run, for each document.  This is very helpful when you have texts in multiple languages, or of different types, which might require different kinds of processing  For example, if you have a mixture of German and English documents in your corpus, you might have some PRs which are language-dependent and some which are not  You can set up the application to run the relevant PRs on the right documents automatically. Sentment Analysis Symposium, San Francisco, October 2012
  • 10. Conditional processing with different languages • Suppose we have a corpus with documents in German and English, and we only want to process the English texts. • First we must distinguish between the two kinds of text, using a language identification tool • We use TextCat (a GATE plugin) to add a feature on each document, telling us which language the document is in • Then we run a conditional processing pipeline, that only runs the subsequent PRs if the value of the language feature on the document is English • The other documents will not be processed • What if we want to process both German and English documents? – we can just call some language-specific PRs conditionally, and use the language-neutral PRs on all documents Sentment Analysis Symposium, San Francisco, October 2012
  • 11. What if the documents themselves are in multiple languages?  Segment Processing PR enables you to process labelled sections of a document independently, one at a time  It then merges back the individual sections once they've been processed  Useful for  when you want annotations in different sections to be independent of each other  when you only want to process certain sections within a document  processing multiple languages separately within a single document Sentment Analysis Symposium, San Francisco, October 2012
  • 12. English and German content Sentment Analysis Symposium, San Francisco, October 2012
  • 13. Sentiment Finding in Rock am Ring Sentment Analysis Symposium, San Francisco, October 2012
  • 14. Creating language-specific resources • Linguistic pre-processing – standard tools (POS tagging, language ID, etc) – NE and term recognition: existing language-specific plugins • Gazetteers – Automatic translation of gazetteer lists – Collection of lists from the web, e.g. swear words, typical phrases – Automatic gazetteer induction from texts: bootstrapping approach • Everything else is language-independent Sentment Analysis Symposium, San Francisco, October 2012
  • 15. Creating domain-specific resources • Unlike the language issue, documents are only ever of a single type • Social media poses challenges for linguistic processing – requires specially developed PRs with more flexible matching, special handling of tweets etc (tokeniser, POS tagger etc) “RT @Bthompson WRITEZ: @libbyabrego honored?! Everybody knows the libster is nice with it...lol...(thankkkks a bunch;))” – See our work on the TrendMiner project for more details of this problem and how we handle it – Phenomena like sarcasm occur more often on less formal texts • We can again use conditional processing on a document-by- document basis to deal with these issues Sentment Analysis Symposium, San Francisco, October 2012
  • 16. Conclusions • It's best if you can tailor your application as much as possible ot the domain and language • But if you have to process multiple kinds of text, a modular rule-based approach can allow you to combine the specific resources with the generic resources • GATE is ideally set up for this (of course, other tools are available too...) • We use a rule-based approach, but lots of current research on automatic induction of new training data on different kinds of text Sentment Analysis Symposium, San Francisco, October 2012
  • 17. More information • GATE http://gate.ac.uk (general info, download, tutorials, demos, references etc) • The EU-funded ARCOMEM and TrendMiner projects are dealing with lots of issues about opinion and trend mining from social media, and use GATE for this. – http://www.arcomem.eu – http://www.trendminer-project.eu/ • More information on dealing with the problems of social media: – D. Maynard and K. Bontcheva and D. Rout. Challenges in developing opinion mining tools for social media. In Proceedings of @NLP can u tag #usergeneratedcontent?! Workshop at LREC 2012, May 2012, Istanbul, Turkey. Sentment Analysis Symposium, San Francisco, October 2012
  • 18. Sentment Analysis Symposium, San Francisco, October 2012