Social Text, Sentiment and Tone Analysis
Advanced Analytics and Insights

Ruben Pertusa Lopez, SolidQ, Business Intelligence DPE & MAP 2013 (rpertusa@solidq.com)
Paco Gonzalez, SolidQ, Mentor (paco@solidq.com)



                                                                        April 10-12 | Chicago, IL
Please silence
cell phones

Paco & Ruben

•   Conference Speakers
•   Based in Europe
•   Book Authors
•   PhD Candidate in Data Mining
•   Project Managers
•   Microsoft Certified Professionals



Goals

• This session will help us understand how to analyze
  sentiment / tone using cutting-edge Microsoft
  technologies

• This is NOT a research session on NLP (Natural Language
  Processing). Don’t be scared! ☺



Agenda

• Overview of Sentiment Analysis

• Gathering and Storing Data

• Sentiment Analysis Techniques

• Business Analytics for Sentiment & Structured data,
  together

Overview of Sentiment Analysis

What is Sentiment?

Feelings, Opinions, Emotions
•   Like
•   Dislike
•   Good
•   Bad
•   …




Some examples
   paulo @paulomors
   Barcelona Beating Milan 4-0 and a gallon of Coke, best day of the year ☺

   αυѕтιη ✰ @AustinJohnsto
   A 2 hour delay is like when a restaurant only has Pepsi products. It'll work but ill still be
   very disappointed.

   Trace @trace_haf
   Drinking more Coke than water every day does no good for you health.

   DONALD @donald150
   My budget decides whether coke or pepsi, always buy the cheaper

Not only Twitter!
               The Walking Dead Season 3 (3300 customer reviews)
    Awesome season opener! October 15, 2012 By K. Erwin
    After sticking through season two, which was alot of looking for a little girl and
    standing around on a farm waiting for something to happen, I hoped that they'd
    pick up the pace a little with season three. The premiere doesn't dissapoint!


    The Walking Dead Season 3 Mid-Premiere (72K Likes, 7K comments)
    Timothy Berteau You guys are crazy. One boring episode and you think the show sucks,
    did you forget how awesome the two episodes before it were? Not every episode is
    going to be action packed. Season 1 and 2 had some very boring moments as well.
    18 February at 00:58 via mobile · 5 likes

DEMO
Looking at some Twitter Sentiment


Surrounded by opinions
• 12 Tb/day; 21 Pb Hadoop cluster
• 7 Pb/month (search queries info)
• 1 Tb of tweets/day; 7 Tb of data/day
• 75 Mi scores/day; 4B graph edges/day

Millions of opinions

Valuable Information for our Business

Questions
• Is this review positive or negative?
• Does this Twitter user like or dislike my new show?
• What are they saying about our company or services?
• What are Facebook users’ attitudes toward the next
  election?
                               Is that the only valuable info?
What is Sentiment Analysis?

Text Categorization (Opinion Mining)
• Positive / Negative / Neutral
• 1, 2, 3, 4, 5 Stars
• For / Against

Text, Photos, Videos, etc. → IN → Sentiment Analysis Techniques → OUT → Category

DEMO
Twitter for Analytics


Gathering and Storing Data




Gathering data from sources

Public/Private APIs
•   Limitations
•   Privacy
•   Format & Structure of source data
•   Updates

Webcrawlers? Call the cops!



How may it look?

JSON Example (1 tweet)
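
The slide's screenshot of a tweet payload did not survive extraction. As a rough sketch only, a single tweet might look like the record below. The field names (`id_str`, `created_at`, `text`, `user.screen_name`) follow the classic Twitter REST API tweet object, the id and timestamp are placeholders, and `flatten_tweet` is a hypothetical helper that keeps just the fields a sentiment pipeline is likely to need.

```python
import json

# Hypothetical, heavily trimmed tweet payload (placeholder id and date).
raw = """
{
  "id_str": "0000000000",
  "created_at": "Wed Apr 10 12:00:00 +0000 2013",
  "text": "My budget decides whether coke or pepsi, always buy the cheaper",
  "user": {"screen_name": "donald150"}
}
"""


def flatten_tweet(payload: str) -> dict:
    """Parse one tweet and keep only the fields used later on."""
    tweet = json.loads(payload)
    return {
        "tweet_id": tweet["id_str"],
        "created_at": tweet["created_at"],
        "user": tweet["user"]["screen_name"],
        "text": tweet["text"],
    }


row = flatten_tweet(raw)
print(row["user"], "->", row["text"])
```

A real payload carries many more fields (entities, coordinates, retweet counts); flattening early is a design choice that keeps downstream steps simple.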




Storing the gathered data

Why?
• Valuable data!
• Store now. Figure out later


How?
• Structured (Relational Database – SQL Server)
• Semi-Structured (Hadoop Cluster – Microsoft HDInsight)



Doctor, We need some help

The 4 V’s

                Volume

                Velocity

                Variety

               Variability


Start building our system



Extract → Transform → Load
Extract → Load

DEMO
Let’s gather some Twitter data ☺


Sentiment Analysis Techniques

Understanding the problem

                     Data != Sentiment Data


Identify relevant parts
• Nouns
• Adjectives/Adverbs
• Verbs

Drop anything else
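
As a toy illustration of this filtering step, the sketch below keeps only tokens tagged as nouns, adjectives, adverbs, or verbs. The `POS` table is a hand-made, hypothetical stand-in; a real pipeline would use an actual part-of-speech tagger instead.

```python
import re

# Toy part-of-speech table (hypothetical); a real tagger would replace it.
POS = {
    "coke": "NOUN", "drink": "NOUN", "market": "NOUN",
    "best": "ADJ", "coolest": "ADJ",
    "is": "VERB",
    "the": "DET", "on": "PREP",
}
KEEP = {"NOUN", "ADJ", "ADV", "VERB"}


def relevant_tokens(text: str) -> list[str]:
    """Lower-case, tokenize, and drop every token whose tag is not in KEEP."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if POS.get(t) in KEEP]


kept = relevant_tokens("Coke is the best & coolest drink on the market")
print(kept)
```

Unknown words are dropped here because they have no tag; whether that is acceptable depends on how complete the tagger is.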

Sentiment Analysis Techniques

Techniques
• Natural Language Processing
• Basic Statistics
  •   Clustering
  •   Fuzzy Components
  •   Classification
  •   Estimation


Tools
HDInsight, SSIS, SSAS DM, FullText Search (Semantic Search)

Dictionaries
Match words
“Coke is the best & coolest drink on the market”

Tone       Index   Dictionary
Positive      10   amazing
Positive       9   awesome
Positive       8   best
Positive       7   excellent
Positive       6   exciting
Positive       5   great
Positive       4   good
Positive       3   rocks
Positive       2   cool
Positive       1   :)
Neutral        0
Negative      -1   :(
Negative      -2   poor
Negative      -3   bad
Negative      -4   criticized
Negative      -5   attacked
Negative      -6   humiliated
Negative      -7   sucks
Negative      -8   terrible
Negative      -9   horrible
Negative     -10   worthless

Some Other Database Example
SentiWordNet
http://sentiwordnet.isti.cnr.it/

A closer look at the example
“Coke is the best & coolest drink on the market”

Fuzzy Match
• “Best” matches “best” = +8
• “coolest” matches “cool” = +2

Total = +10 ☺

Tone       Index   Dictionary
Positive      10   amazing
Positive       9   awesome
Positive       8   best
Positive       7   excellent
Positive       6   exciting
Positive       5   great
Positive       4   good
Positive       3   rocks
Positive       2   cool
Positive       1   :)
Neutral        0
Negative      -1   :(
Negative      -2   poor

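
The dictionary lookup on this slide can be sketched in a few lines. This is an illustration under assumptions, not the presenters' implementation: the standard-library `difflib.get_close_matches` stands in for whatever fuzzy-matching component the demo used, and the `cutoff` of 0.7, the tokenizer, and the `score` and `tone` helpers are hypothetical choices.

```python
import difflib
import re

# Tone dictionary copied from the slide: term -> index.
DICTIONARY = {
    "amazing": 10, "awesome": 9, "best": 8, "excellent": 7, "exciting": 6,
    "great": 5, "good": 4, "rocks": 3, "cool": 2, ":)": 1,
    ":(": -1, "poor": -2, "bad": -3, "criticized": -4, "attacked": -5,
    "humiliated": -6, "sucks": -7, "terrible": -8, "horrible": -9,
    "worthless": -10,
}


def score(text: str, cutoff: float = 0.7) -> int:
    """Sum the index of the closest dictionary term for each token."""
    # Words plus the two emoticons that appear in the dictionary.
    tokens = re.findall(r"[a-z']+|:\)|:\(", text.lower())
    total = 0
    for token in tokens:
        match = difflib.get_close_matches(token, list(DICTIONARY), n=1, cutoff=cutoff)
        if match:
            total += DICTIONARY[match[0]]
    return total


def tone(total: int) -> str:
    """Map a numeric score to the slide's three tone categories."""
    if total > 0:
        return "Positive"
    if total < 0:
        return "Negative"
    return "Neutral"


total = score("Coke is the best & coolest drink on the market")
print(total, tone(total))
```

With these settings the sentence reproduces the slide's arithmetic: "best" contributes +8 and "coolest" fuzzy-matches "cool" for +2. A lower cutoff would catch more inflected forms but also more false matches, so the threshold needs tuning against real data.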
DEMO
Give me your sentiment!


Sentiment Analysis Challenges

Phrase level polarity
• “The US fears happy caffeine consumers”.


Context polarity
• Context: The date that Steve Jobs died.
• Opinion 1: “This is the worst day of my life”.
• Opinion 2: “The world is a better place today”.


Irony, Sarcasm, Irregular forms, Pragmatic information…
It looks like…

Extract → Transform → Load
Extract → Load

SA Techniques ☺
Combining Sentiment Analysis with Structured Data

MORE Valuable Information for our
Business

Questions
• Is this positive/negative review affecting my sales?

• Have we increased our sales because of positive tweets? Can we
  learn anything from opinions to improve our products?

• How are people responding to this campaign?
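
One way to start answering such questions is to line up sentiment with structured figures on a shared key such as the date. The sketch below is only an illustration: every number in it is made up, and the helper names `sentiment_by_day` and `join_with_sales` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical illustration data, not real measurements:
# (date, sentiment score of one tweet) and total sales per day.
tweet_scores = [
    ("2013-04-10", 10), ("2013-04-10", -3),
    ("2013-04-11", 8), ("2013-04-11", 4),
]
daily_sales = {"2013-04-10": 1200.0, "2013-04-11": 1500.0}


def sentiment_by_day(scores):
    """Average the tweet scores for each day."""
    buckets = defaultdict(list)
    for day, value in scores:
        buckets[day].append(value)
    return {day: mean(values) for day, values in buckets.items()}


def join_with_sales(scores, sales):
    """Return (day, average sentiment, sales) rows, ordered by day."""
    averages = sentiment_by_day(scores)
    return [(day, averages.get(day), amount) for day, amount in sorted(sales.items())]


rows = join_with_sales(tweet_scores, daily_sales)
for row in rows:
    print(row)
```

Days without any tweets get `None` as their sentiment, which keeps the sales rows intact instead of silently dropping them.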


Bring our cubes into a new world

New ways of analysis

Better decisions
• Drive next campaign, new wave of products, etc.


Estimate future figures

DEMO
Our cubes have feelings ☺


Final picture

Extract → Transform → Load
Extract → Load

SA Techniques ☺
Brief Summary

During this session…
• Business Insights from social opinions

• Sentiment Analysis loves Big Data ☺

• Microsoft Technologies can help us with some SA Techniques

• Using Sentiment Analysis offers a huge competitive advantage

QUESTIONS
Positive / Negative / Neutral ones ☺


Contact us!

Rubén Pertusa López (rpertusa@solidq.com)
@rpertusa
Data Platform Engineer, SolidQ
Microsoft Active Professional 2013


Paco González (paco@solidq.com)
@pacosql
Mentor, SolidQ




Win a Microsoft Surface Pro!
Complete an online SESSION EVALUATION
to be entered into the draw.

Draw closes April 12, 11:59pm CT
Winners will be announced on the PASS BA
Conference website and on Twitter.

Go to passbaconference.com/evals or follow the QR code link displayed on
session signage throughout the conference venue.

Your feedback is important and valuable. All feedback will be used to improve
and select sessions for future events.
Thank you!

Weitere ähnliche Inhalte

Ähnlich wie Social text sentiment and tone analysis [aai 201] - (4160)

Action research for_librarians_carl2012
Action research for_librarians_carl2012Action research for_librarians_carl2012
Action research for_librarians_carl2012srosenblatt
 
Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real worldDiana Maynard
 
Data Management for Citizen Science
Data Management for Citizen ScienceData Management for Citizen Science
Data Management for Citizen ScienceAndrea Wiggins
 
Developing in R - the contextual Multi-Armed Bandit edition
Developing in R - the contextual Multi-Armed Bandit editionDeveloping in R - the contextual Multi-Armed Bandit edition
Developing in R - the contextual Multi-Armed Bandit editionRobin van Emden
 
Corporate Social Networking
Corporate Social NetworkingCorporate Social Networking
Corporate Social NetworkingAndy Hadfield
 
Using a data visualization tool to drive data curiosity
Using a data visualization tool to drive data curiosityUsing a data visualization tool to drive data curiosity
Using a data visualization tool to drive data curiosityInnoTech
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012Michael Wilde
 
"The Data Janitor 101", Daniel Molnar, Senior Data Scientist at Microsoft
"The Data Janitor 101", Daniel Molnar, Senior Data Scientist at Microsoft "The Data Janitor 101", Daniel Molnar, Senior Data Scientist at Microsoft
"The Data Janitor 101", Daniel Molnar, Senior Data Scientist at Microsoft Dataconomy Media
 
CSI: Clinical Site Intelligence
CSI: Clinical Site IntelligenceCSI: Clinical Site Intelligence
CSI: Clinical Site IntelligencegoBalto
 
What is devops
What is devopsWhat is devops
What is devopsAaron Blythe
 
What Questions Are Worth Answering?
What Questions Are Worth Answering?What Questions Are Worth Answering?
What Questions Are Worth Answering?Ehren Reilly
 
Classroom action research
Classroom action researchClassroom action research
Classroom action researchsukong
 
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...Linuxmalaysia Malaysia
 
JAX London 2016: "Empathy - The hidden ingredient of good software development?"
JAX London 2016: "Empathy - The hidden ingredient of good software development?"JAX London 2016: "Empathy - The hidden ingredient of good software development?"
JAX London 2016: "Empathy - The hidden ingredient of good software development?"Daniel Bryant
 
Four Short Foibles of Organizational Data
Four Short Foibles of Organizational DataFour Short Foibles of Organizational Data
Four Short Foibles of Organizational DataLars von Sneidern
 
Transversal social media monitoring overview (october 2012) revised
Transversal social media monitoring overview (october 2012) revisedTransversal social media monitoring overview (october 2012) revised
Transversal social media monitoring overview (october 2012) revisedTransversal Ltd
 
Science of Social Media Personal Branding Keynote
Science of Social Media Personal Branding KeynoteScience of Social Media Personal Branding Keynote
Science of Social Media Personal Branding KeynoteDevon Smith
 
Why IT Needs Artistic Sensibilities
Why IT Needs Artistic SensibilitiesWhy IT Needs Artistic Sensibilities
Why IT Needs Artistic SensibilitiesVince Kellen, Ph.D.
 
Messy Research: How to Make Qualitative Data Quantifiable and Make Messy Data...
Messy Research: How to Make Qualitative Data Quantifiable and Make Messy Data...Messy Research: How to Make Qualitative Data Quantifiable and Make Messy Data...
Messy Research: How to Make Qualitative Data Quantifiable and Make Messy Data...Gigi Johnson
 

Ähnlich wie Social text sentiment and tone analysis [aai 201] - (4160) (20)

Action research for_librarians_carl2012
Action research for_librarians_carl2012Action research for_librarians_carl2012
Action research for_librarians_carl2012
 
Challenges of social media analysis in the real world
Challenges of social media analysis in the real worldChallenges of social media analysis in the real world
Challenges of social media analysis in the real world
 
Data Management for Citizen Science
Data Management for Citizen ScienceData Management for Citizen Science
Data Management for Citizen Science
 
Developing in R - the contextual Multi-Armed Bandit edition
Developing in R - the contextual Multi-Armed Bandit editionDeveloping in R - the contextual Multi-Armed Bandit edition
Developing in R - the contextual Multi-Armed Bandit edition
 
Corporate Social Networking
Corporate Social NetworkingCorporate Social Networking
Corporate Social Networking
 
Using a data visualization tool to drive data curiosity
Using a data visualization tool to drive data curiosityUsing a data visualization tool to drive data curiosity
Using a data visualization tool to drive data curiosity
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012
 
"The Data Janitor 101", Daniel Molnar, Senior Data Scientist at Microsoft
"The Data Janitor 101", Daniel Molnar, Senior Data Scientist at Microsoft "The Data Janitor 101", Daniel Molnar, Senior Data Scientist at Microsoft
"The Data Janitor 101", Daniel Molnar, Senior Data Scientist at Microsoft
 
CSI: Clinical Site Intelligence
CSI: Clinical Site IntelligenceCSI: Clinical Site Intelligence
CSI: Clinical Site Intelligence
 
What is devops
What is devopsWhat is devops
What is devops
 
What Questions Are Worth Answering?
What Questions Are Worth Answering?What Questions Are Worth Answering?
What Questions Are Worth Answering?
 
Classroom action research
Classroom action researchClassroom action research
Classroom action research
 
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th Novembe...
 
JAX London 2016: "Empathy - The hidden ingredient of good software development?"
JAX London 2016: "Empathy - The hidden ingredient of good software development?"JAX London 2016: "Empathy - The hidden ingredient of good software development?"
JAX London 2016: "Empathy - The hidden ingredient of good software development?"
 
Four Short Foibles of Organizational Data
Four Short Foibles of Organizational DataFour Short Foibles of Organizational Data
Four Short Foibles of Organizational Data
 
Transversal social media monitoring overview (october 2012) revised
Transversal social media monitoring overview (october 2012) revisedTransversal social media monitoring overview (october 2012) revised
Transversal social media monitoring overview (october 2012) revised
 
Science of Social Media Personal Branding Keynote
Science of Social Media Personal Branding KeynoteScience of Social Media Personal Branding Keynote
Science of Social Media Personal Branding Keynote
 
Data Mining & Engineering
Data Mining & EngineeringData Mining & Engineering
Data Mining & Engineering
 
Why IT Needs Artistic Sensibilities
Why IT Needs Artistic SensibilitiesWhy IT Needs Artistic Sensibilities
Why IT Needs Artistic Sensibilities
 
Messy Research: How to Make Qualitative Data Quantifiable and Make Messy Data...
Messy Research: How to Make Qualitative Data Quantifiable and Make Messy Data...Messy Research: How to Make Qualitative Data Quantifiable and Make Messy Data...
Messy Research: How to Make Qualitative Data Quantifiable and Make Messy Data...
 

KĂźrzlich hochgeladen

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 

KĂźrzlich hochgeladen (20)

Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 

Social text sentiment and tone analysis [aai 201] - (4160)

  • 1. Social Text, Sentiment and Tone Analysis Advanced Analytics and Insights Ruben Pertusa Lopez, SolidQ, Business Intelligence DPE & MAP 2013 (rpertusa@solidq.com) Paco Gonzalez, SolidQ, Mentor (paco@solidq.com) April 10-12 | Chicago, IL
  • 2. Please silence cell phones April 10-12 | Chicago, IL
  • 3. Paco & Ruben • Conference Speakers • Based in Europe • Book Authors • PHD Candidate on Data Mining • Project Managers • Microsoft Certified Professionals 3
  • 4. Goals • This session will help us understand how to analyze sentiment / tone using cutting edge Microsoft Technologies • This is NOT a research session on NLP (Natural Language Processing). Don’t be scared! ☺ 4
  • 5. Agenda • Overview of Sentiment Analysis • Gathering and Storing Data • Sentiment Analysis Techniques • Business Analytics for Sentiment & Structured data, together 5
  • 6. Overview of Sentiment Analysis April 10-12 | Chicago, IL
  • 7. What is Sentiment? Feelings, Opinions, Emotions • Like • Dislike • Good • Bad • … 7
  • 8. Some examples paulo @paulomors Barcelona Beating Milan 4-0 and a gallon of Coke, best day of the year ☺ ☺ αυѕтιη ✰ @AustinJohnsto A 2 hour delay is like when a restaurant only has Pepsi products. It'll work but ill still be very disappointed. Trace @trace_haf Drinking more Coke than water every day does no good for you health. ☺? ? DONALD @donald150 My budget decides whether coke or pepsi, always buy the cheaper 8
  • 9. Not only Twitter! The Walking Dead Season 3 (3300 customer reviews) Awesome season opener! October 15, 2012 By K. Erwin After sticking through season two, which was alot of looking for a little girl and standing around on a farm waiting for something to happen, I hoped that they'd pick up the pace a little with season three. The premiere doesn't dissapoint! The Walking Dead Season 3 Mid-Premiere (72K Likes, 7K comments) Timothy Berteau You guys are crazy. One boring episode and you think the show sucks, did you forget how awesome the two episodes before it were? Not every episode is going to be action packed. Season 1 and 2 had some very boring moments as well. 18 February at 00:58 via mobile · 5 likes 9
  • 10. DEMO Looking at some Twitter Sentiment 10
  • 11. Surrounded by opinions [Infographic of data volumes: 12 Tb/day; 21 Pb Hadoop cluster; 7 Pb/month (search queries info); 1 Tb/day of tweets; 7 Tb/day of data; 75 Mi scores/day; 4B graph edges/day] Millions of opinions 11
  • 12. Valuable Information for our Business Questions • Is this review positive or negative? • Does this Twitter user like or dislike my new show? • What are they saying about our company or services? • What are Facebook users’ attitudes toward the next election? Is that the only valuable info? 12
  • 13. What is Sentiment Analysis? Text Categorization (Opinion Mining) • Positive / Negative / Neutral • 1, 2, 3, 4, 5 Stars • For / Against [Diagram: Text, Photos, Videos, etc. IN → Sentiment Analysis Techniques → OUT Category] 13
  • 15. Gathering and Storing Data April 10-12 | Chicago, IL
  • 16. Gathering data from sources Public/Private APIs • Limitations • Privacy • Format & Structure of source data • Updates Webcrawlers? Call the cops! 16
  • 17. How may it look? JSON Example (1 tweet) 17
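The JSON shown on this slide was an image and is not preserved in the transcript. As a rough sketch of what one tweet might look like and how it could be read, the example below assumes field names from the Twitter REST API v1.1 tweet object (`created_at`, `id_str`, `text`, `lang`, `user.screen_name`). The id and timestamp are invented for illustration, the text is borrowed from slide 8, and `flatten_tweet` is a hypothetical helper.

```python
import json

# Hypothetical payload: field names follow the Twitter REST API v1.1 tweet
# object; the id and timestamp are invented for illustration.
raw_tweet = """
{
  "created_at": "Wed Apr 10 15:00:00 +0000 2013",
  "id_str": "1234567890",
  "text": "My budget decides whether coke or pepsi, always buy the cheaper",
  "lang": "en",
  "user": {"screen_name": "donald150"}
}
"""

def flatten_tweet(raw):
    """Keep only the fields a sentiment pipeline is likely to need."""
    tweet = json.loads(raw)
    return {
        "id": tweet["id_str"],
        "user": tweet["user"]["screen_name"],
        "text": tweet["text"],
        "lang": tweet.get("lang", "und"),  # "und" = undetermined language
        "created_at": tweet["created_at"],
    }

row = flatten_tweet(raw_tweet)
print(row["user"], "->", row["text"])
```

A real payload carries many more fields; the point is only that the opinion text sits inside a nested, semi-structured document.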
  • 18. Storing the gathered data Why? • Valuable data! • Store now. Figure out later How? • Structured (Relational Database – SQL Server) • Semi-Structured (Hadoop Cluster – Microsoft HDInsight) 18
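"Store now, figure out later" can be sketched as keeping the untouched payload next to a key and a fetch time. The example uses Python's built-in `sqlite3` purely as a stand-in for a relational store such as SQL Server; the table name, column names, and `store_tweet` helper are assumptions made for this illustration, not part of the session.

```python
import json
import sqlite3

# sqlite3 is only a stand-in for a relational store such as SQL Server;
# the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE raw_tweets (
        tweet_id   TEXT PRIMARY KEY,
        fetched_at TEXT NOT NULL,
        payload    TEXT NOT NULL   -- untouched JSON, parsed later
    )
    """
)

def store_tweet(conn, payload, fetched_at):
    """Insert the raw payload once; re-fetching the same tweet is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO raw_tweets VALUES (?, ?, ?)",
        (payload["id_str"], fetched_at, json.dumps(payload)),
    )

tweet = {
    "id_str": "1234567890",
    "text": "Coke is the best & coolest drink on the market",
}
store_tweet(conn, tweet, "2013-04-10T15:00:00Z")
store_tweet(conn, tweet, "2013-04-10T15:05:00Z")  # duplicate, ignored
conn.commit()

stored = conn.execute("SELECT payload FROM raw_tweets").fetchall()
print(len(stored), json.loads(stored[0][0])["text"])
```

Keeping the raw JSON means new fields can be extracted later without gathering the data again.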
  • 19. Doctor, We need some help The 4 V’s Volume Velocity Variety Variability 19
  • 20. Doctor, We need some help The 4 V’s Volume Velocity Variety Variability 20
  • 21. Start building our system [Diagram: Extract, Transform, Load] 21
  • 22. DEMO Let’s gather some Twitter data ☺ 22
  • 23. Sentiment Analysis Techniques April 10-12 | Chicago, IL
  • 24. Understanding the problem Data != Sentiment Data Identify relevant parts • Nouns • Adjectives/Adverbs • Verbs Drop anything else 24
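The "identify relevant parts, drop anything else" step can be sketched as a filter on part-of-speech tags. The lookup table below is a toy, hand-written for this one sentence; a real pipeline would use a proper part-of-speech tagger, and the names `TOY_POS` and `relevant_words` are hypothetical.

```python
import re
from typing import List

# Toy part-of-speech lookup, hand-written for this example only; a real
# pipeline would use a proper POS tagger instead of a fixed word list.
TOY_POS = {
    "coke": "NOUN", "drink": "NOUN", "market": "NOUN",
    "best": "ADJ", "coolest": "ADJ",
    "is": "VERB",
    "the": "DET", "on": "PREP",
}
RELEVANT = {"NOUN", "ADJ", "ADV", "VERB"}

def relevant_words(text: str) -> List[str]:
    """Return the words tagged as noun, adjective/adverb or verb."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Unknown words get no tag and are dropped along with everything else.
    return [t for t in tokens if TOY_POS.get(t) in RELEVANT]

print(relevant_words("Coke is the best & coolest drink on the market"))
```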
  • 25. Sentiment Analysis Techniques Techniques • Natural Language Processing • Basic Statistics • Clustering • Fuzzy Components • Classification • Estimation Tools HDInsight, SSIS, SSAS DM, FullText Search (Semantic Search) 25
  • 26. Dictionaries
      Tone Index Dictionary
        Tone      Index  Word
        Positive    10   amazing
        Positive     9   awesome
        Positive     8   best
        Positive     7   excellent
        Positive     6   exciting
        Positive     5   great
        Positive     4   good
        Positive     3   rocks
        Positive     2   cool
        Positive     1   :)
        Neutral      0
        Negative    -1   :(
        Negative    -2   poor
        Negative    -3   bad
        Negative    -4   criticized
        Negative    -5   attacked
        Negative    -6   humiliated
        Negative    -7   sucks
        Negative    -8   terrible
        Negative    -9   horrible
        Negative   -10   worthless
      Match words: “Coke is the best & coolest drink on the market”
      Some Other Database Example: SentiWordNet http://sentiwordnet.isti.cnr.it/
  • 27. A closer look at the example
      “Coke is the best & coolest drink on the market”
      Fuzzy Match against the Tone Index Dictionary (excerpt shown: +10 amazing to -2 poor)
        “Best” matches “best” = +8
        “coolest” matches “cool” = +2
        Total = +10 ☺
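The dictionary lookup on slides 26 and 27 can be sketched in a few lines. The word list and indexes are the ones from the Tone Index Dictionary slide; the matching rule is an assumption: a simple prefix test stands in for the fuzzy match used in the session, and the function names are hypothetical.

```python
import re

# Tone Index Dictionary from the slide (word -> index).
TONE_INDEX = {
    "amazing": 10, "awesome": 9, "best": 8, "excellent": 7, "exciting": 6,
    "great": 5, "good": 4, "rocks": 3, "cool": 2, ":)": 1,
    ":(": -1, "poor": -2, "bad": -3, "criticized": -4, "attacked": -5,
    "humiliated": -6, "sucks": -7, "terrible": -8, "horrible": -9,
    "worthless": -10,
}

def match_word(token):
    """Crude stand-in for a fuzzy match: the longest dictionary word that
    the token starts with ("coolest" -> "cool"). Returns None if no match."""
    candidates = [w for w in TONE_INDEX if token.startswith(w)]
    return max(candidates, key=len) if candidates else None

def tone_score(text):
    """Sum the index of every matched word or emoticon in the text."""
    tokens = re.findall(r":\)|:\(|[a-z]+", text.lower())
    matched = (match_word(t) for t in tokens)
    return sum(TONE_INDEX[w] for w in matched if w is not None)

def tone_label(score):
    """Map a numeric score to the three categories used in the session."""
    return "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"

score = tone_score("Coke is the best & coolest drink on the market")
print(score, tone_label(score))  # 10 Positive, matching the slide's total
```

The prefix rule is deliberately naive: it would also match "badge" to "bad", which is one reason a real fuzzy-matching component with a similarity threshold is preferable.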
  • 28. DEMO Give me your sentiment! 28
  • 29. Sentiment Analysis Challenges Phrase level polarity • “The US fears happy caffeine consumers”. Context polarity • Context: The date that Steve Jobs died. • Opinion 1: “This is the worst day of my life”. • Opinion 2: “The world is a better place today”. Irony, Sarcasm, Irregular forms, Pragmatic information… 29
  • 30. It looks like… [Diagram: Extract, Transform, Load, with SA Techniques ☺] 30
  • 31. Combining Sentiment Analysis with Structured Data April 10-12 | Chicago, IL
  • 32. MORE Valuable Information for our Business Questions • Is this positive/negative review affecting my sales? • Have we increased our sales because of positive tweets? • Can we learn anything from opinions to improve our products? • How are people responding to this campaign? 32
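Answering these questions means putting sentiment next to structured figures such as sales. A minimal sketch of that join follows, aggregating tone scores per day and lining them up with daily units sold. All numbers are invented toy data, and the function and field names are hypothetical; in the session this role is played by cubes, not Python.

```python
from collections import defaultdict

# Invented toy data for illustration only: (date, tone score) per opinion,
# and units sold per day from a hypothetical sales system.
scored_opinions = [
    ("2013-04-10", 10), ("2013-04-10", -3),
    ("2013-04-11", 8), ("2013-04-11", 4),
    ("2013-04-12", -7),
]
daily_sales = {"2013-04-10": 120, "2013-04-11": 150, "2013-04-12": 90}

def daily_sentiment(opinions):
    """Average tone score per day."""
    by_day = defaultdict(list)
    for day, score in opinions:
        by_day[day].append(score)
    return {day: sum(s) / len(s) for day, s in by_day.items()}

def join_with_sales(sentiment, sales):
    """One row per day that has both a sentiment value and a sales figure."""
    return [
        {"date": d, "avg_tone": sentiment[d], "units_sold": sales[d]}
        for d in sorted(sentiment.keys() & sales.keys())
    ]

rows = join_with_sales(daily_sentiment(scored_opinions), daily_sales)
for row in rows:
    print(row)
```

Once both measures share a date key, they can be analyzed side by side like any other pair of measures.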
  • 33. Bring our cubes into a new world New ways of analysis Better decisions • Drive next campaign, new wave of products, etc. Estimate future figures ☺ + 33
  • 34. DEMO Our cubes have feelings ☺ 34
  • 35. Final picture [Diagram: Extract, Transform, Load, with SA Techniques ☺ +] 35
  • 36. Brief Summary During this session... • Business Insights from social opinions • Sentiment Analysis loves Big Data ☺ • Microsoft Technologies can help us with some SA Techniques • Using Sentiment Analysis gives a huge competitive advantage 36
  • 37. QUESTIONS Positive / Negative / Neutral ones ☺ 37
  • 38. Contact us! Rubén Pertusa López (rpertusa@solidq.com) @rpertusa Data Platform Engineer, SolidQ Microsoft Active Professional 2013 Paco González (paco@solidq.com) @pacosql Mentor, SolidQ 38
  • 39. Win a Microsoft Surface Pro! Complete an online SESSION EVALUATION to be entered into the draw. Draw closes April 12, 11:59pm CT Winners will be announced on the PASS BA Conference website and on Twitter. Go to passbaconference.com/evals or follow the QR code link displayed on session signage throughout the conference venue. Your feedback is important and valuable. All feedback will be used to improve and select sessions for future events.
  • 40. Platinum Sponsor Thank you! Diamond Sponsor April 10-12, Chicago, IL