RIP Boris Strugatski
Science Fiction will never be the same
Implicit Sentiment Mining
     (do you tweet like Hamas?)

          Maksim Tsvetovat
           Jacqueline Kazil
        Alexander Kouznetsov
My book
Twitter predicts stock market
Sentiment Mining, old-school

• Start with a corpus of words that have sentiment
  orientation (bad/good):
     • “awesome” : +1
     • “horrible”: -1
     • “donut” : 0 (neutral)

• Compute sentiment of a text by averaging all
  words in text
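A minimal sketch of the averaging approach described above. The three-word lexicon comes from the slide; the function name `average_sentiment` and the choice to score unknown words as 0 are assumptions made for illustration.

```python
# Toy lexicon taken from the slide; a real one would hold thousands of scored words.
LEXICON = {"awesome": 1, "horrible": -1, "donut": 0}

def average_sentiment(text, lexicon=LEXICON):
    """Average the scores of all words; unknown words count as neutral (assumption)."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(lexicon.get(w, 0) for w in words) / len(words)
```

Because every word is weighted equally, one charged word in a long neutral sentence barely moves the score.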
…however…
• This doesn’t quite work (not reliably, at least).

• Human emotions are actually quite complex




• ….. Anyone surprised?
We do things like this:



“This restaurant would deserve highest praise if
      you were a cockroach” (a real Yelp review ;-)
We do things like this:



  “This is only a flesh wound!”
We do things like this:



“This concert was f**ing awesome!”
We do things like this:



“My car just got rear-ended! F**ing awesome!”
We do things like this:



“A rape is a gift from God” (he lost! Good ;-)
To sum up…

• Ambiguity is rampant

• Context matters

• Homonyms are everywhere

• Neutral words become charged as discourse
 changes, charged words lose their meaning
More Sentiment Analysis

• We can parse text using POS (part-of-speech)
  identification

• This helps with homonyms and some
  ambiguity
More Sentiment Analysis

• Create rules with amplifier words and inverter
  words:
   – “This concert (np) was (v) f**ing (AMP) awesome (+1)” = +2

   – “But the opening act (np) was (v) not (INV) great (+1)” = -1

   – “My car (np) got (v) rear-ended (v)! F**ing (AMP)
     awesome (+1)” = +2??
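The amplifier/inverter rules above can be sketched roughly as follows. The word sets and the function name `rule_score` are hypothetical; the only behaviour taken from the slide is that an amplifier doubles the next sentiment word and an inverter flips its sign. POS tagging is left out.

```python
# Hypothetical word lists for illustration only.
LEXICON = {"awesome": 1, "great": 1, "horrible": -1}
AMPLIFIERS = {"f**ing", "very"}
INVERTERS = {"not", "never"}

def rule_score(text):
    """Sum lexicon scores, letting modifiers act on the sentiment word that follows them."""
    score = 0
    modifier = 1
    for raw in text.split():
        word = raw.strip(".,!?\"").lower()
        if word in AMPLIFIERS:
            modifier *= 2        # amplifier: double the next sentiment word
        elif word in INVERTERS:
            modifier *= -1       # inverter: flip the sign of the next sentiment word
        elif word in LEXICON:
            score += modifier * LEXICON[word]
            modifier = 1
        else:
            modifier = 1         # any other word cancels a pending modifier
    return score
```

Run on the three slide examples, this sketch gives +2, -1 and +2: the rear-ended car still scores as positive, which is exactly the failure the last example points at.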
To do this properly…
• Valence (good vs. bad)

• Relevance (me vs. others)

• Immediacy (now/later)

• Certainty (definitely/maybe)
• …and about 9 more less-significant dimensions


        Samsonovich A., Ascoli G.: Cognitive map dimensions of the human value
        system extracted from the natural language. In Goertzel B. (Ed.): Advances in
        Artificial General Intelligence (Proc. 2006 AGIRI Workshop), IOS Press,
        pp. 111-124 (2007).
This is hard



• But worth it?
  Michelle de Haaff (2010), Sentiment Analysis, Hard But Worth It!, CustomerThink
Sentiment, Gangnam Style!
Hypothesis


• Support for a political candidate, party, brand,
  country, etc. can be detected by observing
  indirect indicators of sentiment in text
Mirroring – unconscious copying
  of words or body language




 Fay, W. H.; Coleman, R. O. (1977). "A human sound transducer/reproducer: Temporal
 capabilities of a profoundly echolalic child". Brain and Language 4 (3): 396–402
Marker words
• All speakers have some words and
  expressions in common (e.g.
  conservative, liberal, party designation,
  etc)
• However, every speaker has a set of
  trademark words and expressions that
  make them unique.
GOP Presidential Candidates
Israel vs. Hamas on Twitter
Observing Mirroring

• We detect marker words and expressions in
 social media speech and compute sentiment
 by observing and counting mirrored phrases
The research question


• Is media biased towards Israel or Hamas in
  the current conflict?

• What is the slant of various media sources?
Data harvest
• Get Twitter feeds for:
   – @IDFSpokesperson
   – @AlQuassam
   – Twitter feeds for CNN, BBC, CNBC, NPR, Al-Jazeera,
     FOX News – all filtered to only include articles on
     Israel and Gaza

• (more text == more reliable results)
Fast Computational Linguistics
Text Cleaning
• Tweet text is dirty
  (RT, VIA, #this and @that, ROFL, etc)
• Use a stoplist to produce a stripped-down tweet

import string

# Stoplist, one entry per line (abbreviated on the slide with "...")
stoplist_str = """
a
a's
able
about
...
...
z
zero
rt
via
"""

# Split on newlines and drop blank entries
stoplist = [w.strip() for w in stoplist_str.split('\n') if w.strip() != '']
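One way the stoplist might be applied to a raw tweet is sketched below. The short `STOPLIST`, the function name `clean_tweet`, and the decision to drop URLs, @mentions and #hashtags outright are all assumptions; the slide only says that such tokens make tweet text dirty.

```python
import re
import string

# Hypothetical short stoplist; the slide's full list is elided.
STOPLIST = {"a", "a's", "able", "about", "the", "is", "to", "rt", "via"}

def clean_tweet(text, stoplist=STOPLIST):
    """Lowercase a tweet and return its tokens minus URLs, mentions, hashtags and stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)        # drop @mentions and #hashtags
    tokens = [t.strip(string.punctuation) for t in text.split()]
    return [t for t in tokens if t and t not in stoplist]
```

A set is used for the stoplist so that each membership check is constant time, which matters when cleaning a stream.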
Language ID

• Language identification is pretty easy…

• Every language has a characteristic
  distribution of tri-grams (3-letter sequences);
  – E.g. English is heavy on “the” trigram

• Use open-source library “guess-language”
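To illustrate the trigram idea without depending on the library, here is a toy stand-in. It is not how “guess-language” is implemented; the function names and the overlap score are assumptions, and real profiles would be built from large corpora rather than one sentence per language.

```python
from collections import Counter

def trigram_profile(text):
    """Count 3-letter sequences inside each word of the text."""
    counts = Counter()
    for word in text.lower().split():
        word = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(word) - 2):
            counts[word[i:i + 3]] += 1
    return counts

def overlap(a, b):
    """Shared trigram mass: for each common trigram, take the smaller count."""
    return sum((a & b).values())

def guess(text, profiles):
    """Return the language whose profile overlaps most with the text."""
    query = trigram_profile(text)
    return max(profiles, key=lambda lang: overlap(query, profiles[lang]))

# Tiny hypothetical reference samples
profiles = {
    "en": trigram_profile("the cat and the dog saw the other bird"),
    "de": trigram_profile("der hund und die katze sehen den vogel"),
}
```

Even in this tiny English sample, “the” is the most frequent trigram, in line with the slide's remark.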
Stemming
• Stemming identifies root of a word, stripping
  away:
  – Suffixes, prefixes, verb tense, etc

• “stemmer”, “stemming”, “stemmed” ->
  “stem”
• “go”, “going”, “gone” -> “go”
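A deliberately naive sketch of suffix stripping, just to show the mechanics. This is a toy, not the Porter algorithm or any library's stemmer; the suffix list, the `IRREGULAR` table and the name `toy_stem` are assumptions. Irregular forms such as “gone” cannot be reached by stripping suffixes, so the sketch handles them with a lookup table.

```python
# Hypothetical exceptions table for irregular forms.
IRREGULAR = {"gone": "go", "went": "go"}
SUFFIXES = ("ing", "ed", "er", "s")

def toy_stem(word):
    """Strip one common suffix and undo consonant doubling (toy rules only)."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            stem = word[:-len(suffix)]
            # Undo consonant doubling, e.g. "stemm" -> "stem"
            if len(stem) >= 3 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word
```

Rules this crude will mangle many ordinary words, which is why production pipelines normally rely on an established stemmer.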
Term Networks
• Output of the cleaning step is a term
   vector
• Union of term vectors is a term network
• 2-mode network linking speakers with
   bigrams
• 2-mode network linking locations with
   bigrams
• Edge weight = number of occurrences
   of edge bigram/location or
   candidate/location
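A 2-mode speaker/bigram network with occurrence counts as edge weights could be held in a plain counter, as sketched here. The representation (a `Counter` keyed by `(speaker, bigram)`) and the function names are assumptions, not the authors' data structure.

```python
from collections import Counter

def bigrams(tokens):
    """Adjacent token pairs from a cleaned term vector."""
    return list(zip(tokens, tokens[1:]))

def add_to_network(network, speaker, tokens):
    """Add one term vector; edge weight = number of times speaker used the bigram."""
    for bg in bigrams(tokens):
        network[(speaker, bg)] += 1

# Usage: the network grows as the union of all term vectors seen so far
network = Counter()
add_to_network(network, "cnn", ["rocket", "fire", "gaza"])
add_to_network(network, "cnn", ["rocket", "fire"])
```

The same structure works for the location/bigram network by passing a location in place of the speaker.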
Build a larger net

• Periodically purge single co-occurrences
  – Edge weights are power-law distributed
  – Single co-occurrences account for ~ 90% of data

• Periodically discount and purge old
  co-occurrences
  – Discourse changes, data should reflect it.
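The two maintenance passes might look like this, assuming the network is a dict mapping edges to weights. The thresholds, the decay factor and the function names are illustrative guesses; the slides do not give the actual values.

```python
def purge_singletons(network, min_weight=2):
    """Keep only edges seen at least min_weight times."""
    return {edge: w for edge, w in network.items() if w >= min_weight}

def discount(network, factor=0.5, floor=0.5):
    """Decay every weight by factor, then drop edges that fell to floor or below."""
    decayed = {edge: w * factor for edge, w in network.items()}
    return {edge: w for edge, w in decayed.items() if w > floor}
```

Running `discount` on a schedule lets bigrams that have dropped out of the discourse fade away, while frequently repeated ones survive.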
Israel vs. Hamas on Twitter
Israel, Hamas and Media
Metrics computation

• Extract ego-networks for IDF and HAMAS
• Extract ego-networks for media organizations
• Compute Hamming distance H(c,l)
   – Defined here as the cardinality of the intersection set between two networks
   – Or… how much does CNN mirror Hamas? What about FOX?

• Normalize to percentage of support
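The metric steps above can be sketched as follows, assuming the network is a mapping keyed by `(speaker, bigram)` and that an ego-network is simply the set of bigrams a speaker uses. All function names are hypothetical. Note that an intersection count measures similarity, whereas a textbook Hamming distance counts differences; the sketch follows the slide's definition.

```python
def ego_network(network, speaker):
    """Set of bigrams linked to one speaker."""
    return {bg for (who, bg) in network if who == speaker}

def mirror_count(network, source, side):
    """Cardinality of the intersection between two ego-networks."""
    return len(ego_network(network, source) & ego_network(network, side))

def support_share(network, source, side_a, side_b):
    """Normalize the two mirror counts so they sum to 1; None if nothing is shared."""
    a = mirror_count(network, source, side_a)
    b = mirror_count(network, source, side_b)
    total = a + b
    if total == 0:
        return None
    return (a / total, b / total)
```

With this normalization the two shares for a source always add up to 1, so a value near 0.5 means the source mirrors both sides about equally.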
Aggregate & Normalize


• Aggregate speech
  differences and
  similarities by
  media source
• Normalize values
Media Sources, Hamas and IDF
Source      IDF           Hamas
NPR         0.579395354   0.420604646
AlJazeera   0.530344094   0.469655906
CNN         0.585616438   0.414383562
BBC         0.537492158   0.462507842
FOX         0.49329523    0.50670477
CNBC        0.601137576   0.398862424
Ron Paul, Romney, Gingrich, Santorum
         March 2012 (based on Twitter Support)
[Bar chart by state: MT, MN, UT, MD, ID, IA, IL, AR, AK, PA, LA, HI, SD, KY, KS, OK, GA, CO, RI, NE, NC, NJ, WY, WV, WA; horizontal axis from 0 to 1.2]
Conclusions

• This works pretty well! ;-)

• However – it only works in
  aggregates, especially on Twitter.

• More text == better accuracy.
Conclusions

• The algorithm is cheap:
  – O(n) for words on ingest – real-time on a stream

  – O(n^2) for storage (pruning helps a lot)

• Storage can go to Redis
  – make use of built-in set operations
Implicit Sentiment Mining in Twitter Streams
